The Common Lisp Word Scrambler

In a previous post, I said that I’d like to write a program that gives me all the permutations of the letters in a word that spell valid English words. Well, I’ve gotten a significant portion of the way there. I have a function that will take any word and scramble it into all its possible permutations, valid English or not. Here are a few examples:

CL-USER> (scramble "hi")
("hi" "ih")
CL-USER> (scramble "the")
("the" "teh" "hte" "het" "eth" "eht")
CL-USER> (scramble "horse")
("horse" "hores" "hosre" "hoser" "hoers" "hoesr" "hrose" "hroes" "hrsoe"
 "hrseo" "hreos" "hreso" "hsore" "hsoer" "hsroe" "hsreo" "hseor" "hsero"
 "heors" "heosr" "heros" "herso" "hesor" "hesro" "ohrse" "ohres" "ohsre"
 "ohser" "ohers" "ohesr" "orhse" "orhes" "orshe" "orseh" "orehs" "oresh"
 "oshre" "osher" "osrhe" "osreh" "osehr" "oserh" "oehrs" "oehsr" "oerhs"
 "oersh" "oeshr" "oesrh" "rhose" "rhoes" "rhsoe" "rhseo" "rheos" "rheso"
 "rohse" "rohes" "roshe" "roseh" "roehs" "roesh" "rshoe" "rsheo" "rsohe"
 "rsoeh" "rseho" "rseoh" "rehos" "rehso" "reohs" "reosh" "resho" "resoh"
 "shore" "shoer" "shroe" "shreo" "sheor" "shero" "sohre" "soher" "sorhe"
 "soreh" "soehr" "soerh" "srhoe" "srheo" "srohe" "sroeh" "sreho" "sreoh"
 "sehor" "sehro" "seohr" "seorh" "serho" "seroh" "ehors" "ehosr" "ehros"
 "ehrso" "ehsor" "ehsro" "eohrs" "eohsr" "eorhs" "eorsh" "eoshr" "eosrh"
 "erhos" "erhso" "erohs" "erosh" "ersho" "ersoh" "eshor" "eshro" "esohr"
 "esorh" "esrho" "esroh")

Here’s the code:

(defun all-permutations (list)
  (cond ((null list) nil)
        ((null (cdr list)) (list list))
        (t (loop for element in list
             append (mapcar (lambda (l) (cons element l))
                            (all-permutations (remove element list)))))))

(defun make-distinct (list)
  (let ((count 0)
        (result nil))
    (dolist (element list)
      (push (list element count) result)
      (incf count))
  (reverse result)))

(defun make-simple (list)
  (let ((result nil))
    (dolist (element list)
      (push (first element) result))
    (reverse result)))

(defun clean-list (list)
  (let ((result nil))
    (dolist (element list)
      (push (list-to-string (make-simple element)) result))
    (reverse result)))

(defun scramble-list (list)
  (clean-list (all-permutations (make-distinct list))))

(defun scramble (string)
  (scramble-list (loop for char across string collect char)))

(defun list-to-string (list)
  (concatenate 'string list))

The next step will be to make the program generate only valid English words. My initial thought here was that I would program in all the rules for what makes a valid English word. After I thought about that a little bit, though, I realized that I can’t even think of all the rules or even necessarily express what they are. I just know a valid English word when I see it.

For this reason, I thought it might be better to make the computer learn, with human help, what sequences are valid or invalid. This might be done by generating a list, then asking a person which items are valid. Let’s take the output of “apple” as an example:

CL-USER> (scramble "apple")
("apple" "appel" "aplpe" "aplep" "apepl" "apelp" "apple" "appel" "aplpe"
 "aplep" "apepl" "apelp" "alppe" "alpep" "alppe" "alpep" "alepp" "alepp"
 "aeppl" "aeplp" "aeppl" "aeplp" "aelpp" "aelpp" "paple" "papel" "palpe"
 "palep" "paepl" "paelp" "ppale" "ppael" "pplae" "pplea" "ppeal" "ppela"
 "plape" "plaep" "plpae" "plpea" "pleap" "plepa" "peapl" "pealp" "pepal"
 "pepla" "pelap" "pelpa" "paple" "papel" "palpe" "palep" "paepl" "paelp"
 "ppale" "ppael" "pplae" "pplea" "ppeal" "ppela" "plape" "plaep" "plpae"
 "plpea" "pleap" "plepa" "peapl" "pealp" "pepal" "pepla" "pelap" "pelpa"
 "lappe" "lapep" "lappe" "lapep" "laepp" "laepp" "lpape" "lpaep" "lppae"
 "lppea" "lpeap" "lpepa" "lpape" "lpaep" "lppae" "lppea" "lpeap" "lpepa"
 "leapp" "leapp" "lepap" "leppa" "lepap" "leppa" "eappl" "eaplp" "eappl"
 "eaplp" "ealpp" "ealpp" "epapl" "epalp" "eppal" "eppla" "eplap" "eplpa"
 "epapl" "epalp" "eppal" "eppla" "eplap" "eplpa" "elapp" "elapp" "elpap"
 "elppa" "elpap" "elppa")

Since it’s sometimes debatable whether a word could work or not, it would probably be best to ask whether a word is definitely invalid or not, rather than the other way around.

Is “apple” invalid? No. Is “appel” invalid? No. Is “aplpe” invalid? Definitely. But what part of it is invalid? Knowing that it’s invalid isn’t necessarily all that useful to the computer unless we know why it’s invalid.

Maybe the way to go is to keep track of specific instances of invalid words, then see what all those words have in common and form rules based on that. For “aplpe,” the invalid parts are “aplpe,” “aplp,” “plpe,” “apl,” “plp,” “lpe,” “pl,” “lp,”, “p” and “l”. The valid parts are “ap,” “pe” and “a”.

So far, here’s our list of valid words and invalid words:

Valid Invalid
  • apple
  • appel
  • ap
  • pe
  • a
  • aplpe
  • aplp
  • plpe
  • apl
  • lpe
  • pl
  • lp
  • p
  • l

Let’s move on to a slightly more interesting example. One of the permutations of is “peapl,” which is invalid. Add an “e” to the end of that to make “peaple,” though and it’s fine. Let’s see what happens when we apply the foregoing technique to this invalid word:

Invalid parts: eapl, apl, pl, p, l

Valid parts: peap, pea, eap, pe, ea, ap, a

That makes this our new list:

Valid Invalid
  • apple
  • appel
  • peap
  • eap
  • pea
  • ap
  • pe
  • a
  • aplpe
  • aplp
  • eapl
  • plpe
  • apl
  • lpe
  • pl
  • lp
  • p
  • l

I just noticed that I missed something. If we’re dissecting the invalid words and adding the valid and invalid parts to our list, we should be doing the same thing, or at least something similar, with the valid words. But since all parts of a valid word are valid letter sequences, shouldn’t every part of a valid word end up in the “valid” column? What happens if we do this for the word “apple”?

Valid parts: app, ple, ap, le

Invalid parts: appl, pple, ppl, pp, pl

That’s a little disconcerting. What we’ve illustrated here is that valid words can be comprised of invalid letter sequences, as long as some other letters before or after the invalid sequence set things straight again. What’s the difference between the “pl” in “apple” and the “pl” in “peapl“? What’s the difference between the “pp” in “apple” and the “pp” in “ppl”? Well, in those particular cases, “pl” can’t end a word and “pp” can’t be followed by a consonant. The only way to really preserve these distinctions is to keep track of the entire word in the valid/invalid list.

Even though our latest findings make things a little more complicated, they don’t make our valid/invalid list any less true. All the words in the “valid” list are valid and all the ones in the “invalid” list are not. However, just because a word contains something invalid doesn’t mean the entire word is invalid. The only good our list does for us right now is it tells us that if a word exactly matches something on the “invalid” list, it’s not valid. That’s not a terribly comprehensive solution. We’ll have to come up with something better.

More thoughts to come on this in future posts. My brain hurts.

2 Responses to “The Common Lisp Word Scrambler”

  1. http://epc.buffalo.edu/authors/hennessey/data/jabber/index.html

    It works by sticking letters together when they would form one of the 40-something basic sounds of the English language, which if I remember correctly is called a phoneme. This continues following a set of rules about which phonemes are allowed to connect to each other, until it forms entirely believable nonsense words. It’s how Lewis Caroll came up with many of his nonsense words (only in his head, not with a computer, obviously) for the Jabberwocky poem, hence the name.

  2. Tanner Swett says:

    A word is invalid if and only if it contains an invalid sequence. For example, “aplpe” is invalid because it contains “plp”, and “peapl” is invalid because it contains “pl$” (where $ denotes the end of the word). See if you can make your program automatically find out what’s an invalid sequence and what’s not. Alternatively, skip all that difficult stuff and read about the sonority hierarchy.

Leave a Reply