With Christmas around the corner and some spare time on my hand and decided to find a programmatic solution to a game of Bananagrams.
Bananagrams is a word game similar to Scrabble, invented by Abraham Nathanson in 2006. From a bag of 144 lettered tiles, each player randomly picks the predetermined number of tiles (usually 21 letters).The goal of the game foreach player is to use all of the given letters by creating the word grid of intersecting or interlocking letters that form valid English words. Players can rearrange their grid as many times they like.The player that uses up all of their tiles first, wins the game. The major skill that is required for this game is the ability to create anagram words formed by rearranging the letters by using all the original letters exactly once.
In general, the unwritten strategy for the game usually follows the following two simple rules:Build the longest possible initial word Keep the open layout structure, don’t cram the letters into a small grid
That’s the strategy I also use whenever I play the games. However, Saahil Agrawal and David Kwok of Stanford University in a paper named Bananas for Bananagrams offered another interesting ideaon how to approach the problem. Their strategy suggested the use ofa heuristic method of assigning ascore for each of the words based on its component letters. Essentially they propose to givethe less frequent English letters ahigher number of points with a notion that starting the grid with letters that are more difficult to play, allows for the more flexible use of letters that are easier to play. Something similar is a strategy employed by Scrabble players, who create words with the highest possible combined letter value.Create mysql English Word Dictionary
First I needed a good dictionary.To build one, I’veconverted a text file containing over 466k English words (found on https://github.com/dwyl/english-words ) into MySQL database. I’ve also played with a dictionary of English words that I created from Aspell, here is a short article that describes How to dump and convert Aspell dictionary to wordlist or searchable MySQL/MariaDB database .
Once I had the database of English words in MySQL, I realized, that both dictionaries contain way too many words, some of them so unusual that they would likely never be regularly played by any human player. I needed to be able to select and search only among the words that are recognizable by most people. To resolve the problem, I implemented another database column, that added to each of the words their rank value based on the frequency in which the word occursin a typical English text. You can read more about a word frequency on https://www.wordfrequency.info, who also show some examples of frequency lists based on the COCA corpus.
Once done, I had a list of Englishwords and their associated frequency rank, which I could search by using SQL queries.
The following is my ‘dictionaries’ database and ‘en_english479k’ table definition:CREATE TABLE `en_english479k` ( `id` INT(11) NOT NULL AUTO_INCREMENT, `word` VARCHAR(50) NULL DEFAULT NULL, `description` VARCHAR(255) NULL DEFAULT NULL, `rank` INT(11) NULL DEFAULT NULL, PRIMARY KEY (`id`), UNIQUE INDEX `word_2` (`word`), INDEX `rank` (`rank`), FULLTEXT INDEX `word` (`word`) ) COLLATE='utf8_general_ci' ENGINE=MyISAM ;
Here is a SQL Dump in case someone wants to replicate it: 484k English Words with Frequency Rank.zip (3.8 MB Zip)Search Query MySQL Search Search the longest possible initial word
The initial goal was to create a SQL query that would allow me to search the entire database for a word that contains as many of the letter tiles that I’ve randomly picked from the bag of lettered tiles. It’s somewhat of a tricky exercise because the query must also be capable of selecting words in situations where certain letters occur more than once, yet capable of finding words that might or might not contain any of the letters.
Ultimately, I’ve settled on using REGEX and combination of filters that used a MySQL BETWEEN condition, with order based on ranking the words by their total length (longer the better) and English word frequency rank. This is what the final SQL query looks like fora randomly selected letter sequence:‘ANNEAGANRIT‘.
5 STEP MySQL Query First, build a Regex (1) based on the given lettered tiles. Then filter letters by using BETWEEN condition (2) for each character, defining the minimum and the maximumnumber of times a character can occur in the word. This part of the SQL will need to be generated programmatically, by counting the letter occurrence in the word. Select the allowed minimum and maximum gate for word frequency rank. In this case, I decided to only search my words among the top 50 thousand English words. Set the minimum word length. We want to solve the Bananagrams board only by using English words that are at least 3-letter long. I found that those solutions are the most readable. And finally closing the SQL by leveraging ORDER (3), sorting the words based on the descending word length and ascending frequency rank.
Let’s run the query on the randomly selected tiles ‘ANNEAGANRIT’:
As we can see the best result and the longest English word for the randomly selected tiles ‘ANNEAGANRIT’, is the word: ‘ ARGENTINA ‘.
Note: The importance of using the English word frequency rank is quite obvious when doing a search without the frequency rank, where the solution to randomly selected tiles ‘ANNEAGANRIT’ tiles is a quite obscure English word: ‘ ANTENNARIA ‘. While ‘Antennaria’ might be a valid English word, most of us would agree that it’s hardly recognizable and not really fit for purposes of building a human-like version of the Bananagrams solver.BTW.I had to look it up :) Antennaria is a genus of herbaceous perennial plants.
PHP Function to Find the Longest Word +Remainder of Characters
Now that we have a working SQL that can find the longest recognizable English words based on the given set tiles, we should wrap the SQL query into a PHP function that can be reused.
Input: Takes two URL variables, the sequence of characters and search offset. Result: Provide the longest found English word for the combination of submitted characters as well as the remainder of characters that could be used. The API for finding words could look like this