Back to Feed
ProjectsOther

CS50AI - 7 degrees of kevin bacon

Check out the code for this project here

Solving the Six Degrees of Kevin Bacon with Graph Search

The "Six degrees of Kevin Bacon" is an idea that any actor in Hollywood can be connected to Kevin Bacon with no more than six steps. A "step" means that if two actors star in the same film you can take a step between them.

I undertook this project as part of Harvard's CS50’s Introduction to Artificial Intelligence with Python course. It was a good introduction to the course and was a fun and interesting way of implementing a classic search problem and is good way to learn more skills in programming. It is also easy to get the data for this as there is a large easily accessible IMDb database.

2023-11-05-seven-degrees credit

How the program is setup

To solve this problem I translated Hollywood into a graph G=(V,E)G=(V, E) where each of the components are:

  • Vertices (V): The actors (represented by their unique IMDb person_id).

  • Edges (E): The movies connecting them (represented by movie_id).

In our program we take an initial state of actor A and our goal is to find our final state of Actor B. The job of our program is to return the shorted sequence of movies that connects Actor A and Actor B.

How the search works

There are many different search algorithms. For example, if I had used a Depth-first Search(DFS) where the algorithm goes as far as it can down one path before backtracking the algorithm would have found a path. However, this would have been very inefficient and likely not found the shortest path and just found a valid path.

As I want to determine the shortest path not just a path I implemented the Breadth-First Search (BFS) algorithm as it would be more likely to determine a shorter path in this case.

The way that BFS works is that it explored the graph in concentric circles. This means if we start at actor A it would first look at all actors that are 1 degree of separation then if it didn't find actor B it would then look at 2 degrees of separation and so on. To achieve this in the code I have a shortest_path function which uses a QueueFrontier which is a first in first out queue which ensures the algorithm checks all closer connections before searching further away.

How the code works

Here is a breakdown of how the code works:

  • Firstly I have a load_data function which puts the CSVs into three dictionaries (names, people, and movies). This makes it much quicker for the program to look up the movies that actors are in.

  • I then have a Node class where every time the program checks a new actor, it makes a Node object. This node contains the state (the current actor's ID), the action (the movie that got the program here), and the parent (the previous Node).

  • To make sure that the program does not run in infinite loops and just keep going back and forward between actors I included an explored set. This keeps track of the actors wer have already explored and before exploring new actors the program checks the explored set to prevent checking the same actor twice.

  • When node.state == target the program ends. When this happens the program takes the winning Node and traces the parent attributes all the way back to the source. It then outputs the final list of the movie actor chain.

Looking Back

This was a fun way to learn a fundamental concept in AI and programming in general and was a really good way to introduce the CS50AI course. It helped sharpen my programming skills and I am definitely glad they included this project in the CS50AI course and would happily do it again even with more advanced search algorithms.