Creating my AI Movie Trends Explorer

Rahul Bhattacharya Dec 17, 2023 05:27 AM

Synopsis: In this project I explore how movie trends evolve over time by analyzing ratings, genres, and popularity patterns. It provides an interactive platform where users can explore historical data, apply filters, and visualize changes dynamically, offering deeper insight into shifting audience preferences and behaviors.

It started with a simple thought during a weekend binge of old films. I wondered how trends in movies evolve with time. Ratings shift, genres rise and fall, and audience attention drifts in ways that are not obvious. I wanted something that could let me look at this visually instead of trying to guess by memory. That thought slowly shaped into this project. Dataset used here.

The result is a Streamlit app where I can upload processed movie data and interact with it. Instead of static charts, I wanted sliders, filters, and search that react instantly. To make this work on GitHub Pages and Colab, I needed the right files and code placed properly. In this blog I will explain every file, every helper, and every conditional. I will not stop at describing what the code does; I will also explain why each piece was needed in the overall design. This breakdown is long but complete. It is the exact journey I followed while making the project work end-to-end.

requirements.txt

This file lists all the Python dependencies needed to run the project. Without installing these packages, the Streamlit app will fail. Each entry sets a minimum version (>=) rather than an exact pin, chosen to be recent enough to include the features used in the app.

streamlit>=1.36
pandas>=2.0
numpy>=1.24
scikit-learn>=1.4
scipy>=1.11
altair>=5.0
joblib>=1.3

I used Streamlit for the web interface, pandas and numpy for data handling, scikit-learn for clustering, scipy for numerical tasks, altair for plotting, and joblib to load saved models. Keeping the list in one file lets anyone cloning the repo set up the environment quickly.
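
As a quick sanity check after installing, the minimum versions above can be compared against what is actually installed. The sketch below is not part of the repo: the helper names (`parse_requirement`, `version_tuple`, `check_requirements`) are hypothetical, and the version comparison is deliberately naive (numeric dot-separated components only), so treat it as an illustration rather than a replacement for pip's own resolver.

```python
from importlib import metadata

# Copied from requirements.txt above.
REQUIREMENTS = """\
streamlit>=1.36
pandas>=2.0
numpy>=1.24
scikit-learn>=1.4
scipy>=1.11
altair>=5.0
joblib>=1.3
"""


def parse_requirement(line):
    """Split a 'name>=version' line into (name, minimum_version)."""
    name, _, minimum = line.strip().partition(">=")
    return name.strip(), minimum.strip()


def version_tuple(version):
    """Naively turn '1.36.0' into (1, 36, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements(text):
    """Return {name: status} where status is 'ok', 'too old', or 'missing'."""
    report = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, minimum = parse_requirement(line)
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if ok else "too old"
    return report


if __name__ == "__main__":
    for name, status in check_requirements(REQUIREMENTS).items():
        print(f"{name}: {status}")
```

Running it prints one line per package, which makes a missing or outdated dependency obvious before launching the app.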

README.md

The README gives a quick introduction and step-by-step setup instructions. It describes what data should be uploaded, what outputs are expected, and how to launch the app locally. It acts as the entry point for anyone discovering the repo.

# Movie Trends Explorer

Interactive dashboard to explore movie trends (ratings by year, genre breakdowns), plus optional clustering and semantic search on overviews.

## Setup

1. Upload your raw CSV to `data/movies.csv`.
2. Open the Colab notebook steps (see README top). Run cells to generate:
   - `data/movies_clean.csv`
   - `data/agg_ratings_by_year.csv`
   - `data/genre_exploded.csv`
   - `data/tfidf_vectorizer.pkl` (optional)
   - `data/kmeans.pkl` (optional)
   - `data/pca_2d.csv` (optional)
3. Commit and push to GitHub.

## Run Locally

```bash
pip install -r requirements.txt
streamlit run app.py
```

I kept it simple so that contributors and recruiters could grasp the purpose without reading code. It also mentions optional model files that enhance clustering and search. This separation makes the base app functional even if machine learning extras are missing.
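
One way to realize that separation is to check which artifacts exist before enabling a feature. The following is a minimal sketch, not the code from app.py: the function name `available_features` and the feature labels are hypothetical, and the grouping of optional files into features is my assumption based on the file list in the README.

```python
from pathlib import Path

# Files the README lists as always generated by the notebook.
REQUIRED = ["movies_clean.csv", "agg_ratings_by_year.csv", "genre_exploded.csv"]

# Optional artifacts, grouped by the extra feature they would unlock.
# This grouping is an assumption made for the sketch.
OPTIONAL = {
    "clustering": ["kmeans.pkl", "pca_2d.csv"],
    "semantic_search": ["tfidf_vectorizer.pkl"],
}


def available_features(data_dir="data"):
    """Return (missing_required, enabled_optional) for a data directory."""
    root = Path(data_dir)
    # Required files that are absent; an empty list means the base app can run.
    missing = [name for name in REQUIRED if not (root / name).exists()]
    # A feature is enabled only when every file it depends on is present.
    enabled = {
        feature: all((root / name).exists() for name in files)
        for feature, files in OPTIONAL.items()
    }
    return missing, enabled
```

In the app, an empty `missing` list would let the base charts render, while each `False` flag would hide the matching section instead of raising an error.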

Data Folder

The data/ folder contains both raw and processed data. Each file serves a role:

movies.csv is the raw dataset.
movies_clean.csv is a cleaned version for display.
agg_ratings_by_year.csv contains pre-aggregated ratings by year.
genre_exploded.csv breaks movies by genre for charts.
pca_2d.csv holds dimensionality reduction results for scatter plots.
tfidf_vectorizer.pkl is a saved text vectorizer.
kmeans.pkl is a trained clustering model.

By storing these, I reduced runtime processing. Streamlit reruns the script on every interaction and hosted apps restart often, so precomputing avoids long waits each time.
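
To make the precomputation concrete, here is a rough sketch of how two of these files could be derived with pandas. It is not the notebook's actual code: the column names (`title`, `year`, `rating`, `genres`) and the pipe-separated genre format are assumptions about the raw CSV, and the tiny inline frame stands in for data/movies_clean.csv.

```python
import pandas as pd

# Stand-in for data/movies_clean.csv; column names are assumed, not taken from the repo.
movies = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "year": [1999, 1999, 2005],
        "rating": [7.0, 8.0, 6.5],
        "genres": ["Drama|Crime", "Drama", "Comedy"],
    }
)

# Pre-aggregate ratings per year so the app only has to read a small table.
agg_ratings_by_year = (
    movies.groupby("year", as_index=False)
    .agg(mean_rating=("rating", "mean"), n_movies=("rating", "size"))
)

# One row per (movie, genre) pair, which is the shape genre charts need.
genre_exploded = (
    movies.assign(genre=movies["genres"].str.split("|"))
    .explode("genre")
    .drop(columns="genres")
    .reset_index(drop=True)
)

# In a real run these would be written once, for example:
# agg_ratings_by_year.to_csv("data/agg_ratings_by_year.csv", index=False)
# genre_exploded.to_csv("data/genre_exploded.csv", index=False)
```

Doing this once in the notebook means the app only needs a cheap CSV read at startup instead of repeating the groupby and explode on every rerun.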

app.py

This is the core of the project. It defines helpers for loading data, functions that draw the UI, and the main layout logic. I will expand on each block in order.