Back to Feed
ProjectsAI

CS50AI - Crossword generator

Check out the code for this project here

Building an AI Crossword Generator

For this project I built an AI to create crossword puzzles. This was made as part of Harvard's CS50’s Introduction to Artificial Intelligence with Python. This was a good introduction to the branh of artificial intelligence called Constraint Satisfaction Problems (CSPs).

2023-11-19-crossword-generator credit

The Architecture

Here we can divide the crossword puzzle into three section:

  • Variables: The blank slots in the grid for example "Across 1" or "Down 3".

  • Domains: This is the list of vocabulary. I did this in my __init__ function where I create a self.domains dictionary that maps every variable to all the possible words the program is allowed to put into that slot.

  • Constraints: These are the rules of the game.

We have two main constraints that the program needs to consider. Firstly we have the node consistency rule. This is saying that a 5-letter slot is only able to accept a 5-letter word. I do this with my enforce_node_consistency function.

The other constraint we have to use is the arch consistency. This says that if an across word and a down word insect at for example the 3rd letter of the across word and the 1st letter of the down work the characters need to be the same. We represent this by saying for the variables XX and YY intersecting at indices ii and jj, the constraint is X[i]=Y[j]X[i] = Y[j].

AC3 and Smart Backtracking

If we were to just guess words randomly it would take way to long to solve. To make it more efficient I have a solve function that utilizes these 2 concepts:

1. Enforcing Arc Consistency (The AC3 Algorithm)

Before we get the AI to start guessing we prune the list of vocabulary. We use the ac3 algorithm paired with a revise helper function. What this does it it looks at every intersection or arc between two grid slots. If one of the across words does not share a an overlapping letter with any down word the word is deleted from the across domain. This saves the program alot of time by removing the impossible words.

2. Backtracking Search with Heuristics

We then take the list of pruned words and use a backtrack function on them. What this does it it places a word on the board and checks if it violates any of the rules then moves onto the next slot. If it hits a dead end and it cant place any more words it undoes the previous word and tres a different one.

To stop the AI from just guessing I have these heuristics that help to guide its decision:

  • Minimum Remaining Values (MRV): I get the program to try and fill in the hardest slots on the obard first. I do this by having a select_unassigned_variable function that searches for the slot with the fewest possible word options remaining.

  • Least Constraining Value (LCV): When the program is choose which word to put in a slot we have a order_domain_values function that sorts the options based on which word eliminates the fewest options for its neighbours. This means the program leaves room for itself in the future

What's Next?

If I were to do this again I would implement these features to improve the program:

  • Dynamic Grid Generation: At the moment I am giving the program a premade grid structure. In the future I would add it so it dynamically creates the crossword grid for the AI to then fill in.

  • Automated Clue Generation: I would get an LLM to automatically generate the clues for each of the words so it could be used a fully functional crossword like one you would find in a newspaper.