Creating my AI Coffee Sales Forecasting App

This project began from the recognition that coffee sales follow patterns over time. Daily transactions accumulate into a history that contains useful signals. By applying regression methods, these signals can be used to generate forecasts. The purpose of this project is to transform a plain CSV file of sales into an interactive web application that provides predictions. Streamlit was chosen as the framework because it allows building and deploying applications quickly with minimal overhead. Dataset used here.

requirements.txt

The requirements.txt file defines the dependencies required for this project. Streamlit Cloud and other hosting platforms rely on this file to automatically install packages before running the application. If this file is missing, deployment fails.

streamlit>=1.36
pandas>=2.0
numpy>=1.24
prophet>=1.1.5
plotly>=5.20
scikit-learn>=1.3

  • streamlit: powers the user interface. All interactive elements such as sliders, charts, and text output are unavailable without it.
  • pandas: loads and manipulates the sales data. It parses the date column, coerces the revenue values, and aggregates transactions into a daily time series.
  • numpy: imported alongside pandas for numerical operations.
  • prophet: provides the forecasting model. It fits the daily revenue series and generates predictions with uncertainty intervals.
  • plotly: renders the interactive history and forecast charts.
  • scikit-learn: listed for preprocessing and metric utilities; the model fitting itself is handled by Prophet.

Most of these dependencies map directly to import lines and function calls in app.py. Without this file in the repository, the environment created on Streamlit Cloud would not have the correct libraries, and the app would stop at import errors.

README.md

The README.md file documents usage instructions. It is necessary for any user who clones the repository and attempts to run the project. Without it, the structure of the repository is not clear.

# Coffee Sales Forecaster

Forecast **daily revenue** from `data/coffee_sales.csv` using Prophet.  
No local setup required: use **Colab** for quick testing and **Streamlit Cloud** for hosting.

## 1) Data
Place your CSV at:
data/coffee_sales.csv

## 2) Run in Colab
- Open a new Colab notebook.
- Copy blocks A–F from the instructions.
- Update `GITHUB_RAW_URL` to your repo path if desired.

## 3) Deploy to Streamlit
- Push `app.py`, `requirements.txt`, and your `data/` folder to GitHub.
- Go to streamlit.io → Deploy App → point to this repo → `app.py`.
- Optional: edit the default GitHub raw URL at the top of the app, or just upload CSV in the UI.

## Notes
- The app aggregates transaction rows to **daily revenue**.
- Forecast horizon selectable (7–120 days).
- No model files are saved.

The file describes the setup steps: where the dataset must be placed, how to test the blocks in Colab, and how to deploy by pushing app.py, requirements.txt, and the data/ folder to GitHub. Locally, the app launches with the command streamlit run app.py once the packages in requirements.txt are installed. These instructions are tied to deployment: when pushing the repository to GitHub or running on Streamlit Cloud, the README guides collaborators and reviewers to replicate the project consistently.

data/coffee_sales.csv

The dataset is stored in data/coffee_sales.csv. This file provides the training material for the forecasting model. Its presence in the repository is essential because the code in app.py expects it to be available relative to the script path.

hour_of_day,cash_type,money,coffee_name,Time_of_Day,Weekday,Month_name,Weekdaysort,Monthsort,Date,Time
10,card,38.7,Latte,Morning,Fri,Mar,5,3,2024-03-01,10:15:50.520000
12,card,38.7,Hot Chocolate,Afternoon,Fri,Mar,5,3,2024-03-01,12:19:22.539000
12,card,38.7,Hot Chocolate,Afternoon,Fri,Mar,5,3,2024-03-01,12:20:18.089000
13,card,28.9,Americano,Afternoon,Fri,Mar,5,3,2024-03-01,13:46:33.006000
13,card,38.7,Latte,Afternoon,Fri,Mar,5,3,2024-03-01,13:48:14.626000
15,card,33.8,Americano with Milk,Afternoon,Fri,Mar,5,3,2024-03-01,15:39:47.726000
16,card,38.7,Hot Chocolate,Afternoon,Fri,Mar,5,3,2024-03-01,16:19:02.756000
18,card,33.8,Americano with Milk,Night,Fri,Mar,5,3,2024-03-01,18:39:03.580000
19,card,38.7,Cocoa,Night,Fri,Mar,5,3,2024-03-01,19:22:01.762000

The Date column contains the transaction dates, which must be parsed into datetime objects for time series analysis. The money column contains the numeric sale amounts; these are the values that get aggregated into the daily revenue target for the model. If this file is absent or corrupted, the application cannot proceed beyond the data loading step. Including the dataset in the repository ensures reproducibility of results.
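
The aggregation step can be previewed in isolation. Below is a minimal sketch of the idea, assuming only the Date and money columns from the sample above; the to_daily function in app.py does the same thing with extra column detection and type coercion.

import pandas as pd

# Load the raw transactions (path assumed to match the repository layout)
df = pd.read_csv("data/coffee_sales.csv")

# Parse dates, coerce revenue values, and drop rows without a usable date
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
df = df.dropna(subset=["Date"])
df["money"] = pd.to_numeric(df["money"], errors="coerce").fillna(0)

# Sum transaction amounts per calendar day to get the daily revenue series
daily = df.groupby(pd.Grouper(key="Date", freq="D"))["money"].sum().reset_index()
daily.columns = ["ds", "y"]  # Prophet expects columns named ds and y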

app.py

This file integrates the entire workflow. Each function and block has a specific role that maps to forecasting tasks.

import os

This import block initializes the environment. Streamlit is imported for the web interface, pandas and numpy for loading and preparing the dataset, plotly for the charts, and Prophet for the forecasting model itself. The direct link to requirements.txt is visible here: if a package is omitted from that file, these imports fail as soon as the app starts on Streamlit Cloud. Most of the import lines correspond to concrete functionality used later in the code.

import io
import pandas as pd
import numpy as np
import streamlit as st
import plotly.express as px
from prophet import Prophet

st.set_page_config(page_title="Coffee Sales Forecaster", layout="wide")
st.title("Coffee Sales Forecaster")

st.markdown(
    "Upload your CSV or provide a GitHub raw URL. "
    "The app aggregates to daily revenue and fits a Prophet model (no saved models)."
)

# Default to your repo file
DEFAULT_RAW_URL = (
    "https://raw.githubusercontent.com/RahulBhattacharya1/ai_coffee_forecast/"
    "refs/heads/main/data/coffee_sales.csv"
)

raw_url = st.text_input("GitHub Raw CSV URL (optional if you upload)", value=DEFAULT_RAW_URL)
uploaded = st.file_uploader("Upload CSV (optional)", type=["csv"])

# Forecast controls
horizon = st.slider("Forecast Horizon (days)", min_value=7, max_value=120, value=30, step=1)
use_weekly = st.checkbox("Weekly seasonality", value=True)
use_daily = st.checkbox("Daily seasonality", value=True)
visible_past_days = st.number_input(
    "History window to show on chart (days)",
    min_value=30,
    max_value=365,
    value=90,
    step=15
)

@st.cache_data(show_spinner=False)
def load_csv(raw_url: str, uploaded_file):

This function handles data loading. The st.cache_data decorator memoizes the result, so the CSV is not re-read on every widget interaction. If a file has been uploaded through the widget it is read directly; otherwise the function falls back to the GitHub raw URL, and it raises an error when neither source is available. This is the first step of the data flow: dataset → aggregation → model → predictions → visualization. If this function fails, nothing downstream can run.

    if uploaded_file is not None:
        return pd.read_csv(uploaded_file)

    if not raw_url:
        raise RuntimeError("No URL or file provided.")
    return pd.read_csv(raw_url)

def pick_column(candidates, cols_lower):
    for c in candidates:

        if c in cols_lower:
            return cols_lower[c]
    return None

def to_daily(df_in: pd.DataFrame) -> pd.DataFrame:
    df = df_in.copy()

    # map lowercase -> original name
    cols_lower = {c.lower(): c for c in df.columns}

    def pick_column(candidates):
        for c in candidates:

            if c in cols_lower:
                return cols_lower[c]
        return None

    date_col = pick_column(["date", "order_date", "ds", "timestamp", "datetime"])
    value_col = pick_column(["money", "amount", "revenue", "sales", "total", "price", "y"])

    if date_col is None or value_col is None:
        raise ValueError(
            "CSV must include a date column and a numeric revenue column. "
            "Accepted date names: Date/Order_Date/ds/Timestamp/Datetime; "
            "value names: money/amount/revenue/sales/total/price/y."
        )

    # coerce types
    df[date_col] = pd.to_datetime(df[date_col], errors="coerce")
    df[value_col] = pd.to_numeric(df[value_col], errors="coerce")
    df = df.dropna(subset=[date_col])
    df[value_col] = df[value_col].fillna(0)

    # group by day using Grouper to ensure a named column
    daily = (
        df.groupby(pd.Grouper(key=date_col, freq="D"))[value_col]
          .sum()
          .reset_index()
    )

    # force canonical names regardless of whatever pandas called them
    daily.columns = ["ds", "y"]
    daily = daily.sort_values("ds")
    return daily

def fit_prophet(daily_df: pd.DataFrame, horizon_days: int, weekly=True, daily_seas=True):
    m = Prophet(
        weekly_seasonality=weekly,
        daily_seasonality=daily_seas,
        yearly_seasonality=False,
        seasonality_mode="additive",
    )
    m.fit(daily_df)
    future = m.make_future_dataframe(periods=horizon_days, freq="D")
    fcst = m.predict(future)
    return m, fcst

try:
    df_raw = load_csv(raw_url, uploaded)
    st.success(f"Loaded dataset with shape {df_raw.shape}")
    with st.expander("Preview data"):
        st.dataframe(df_raw.head(30), use_container_width=True)

    daily = to_daily(df_raw)

    c1, c2 = st.columns(2)
    with c1:
        fig_hist = px.line(daily, x="ds", y="y", title="Daily Revenue (Actual)")
        st.plotly_chart(fig_hist, use_container_width=True)
    with c2:
        st.metric("Total Revenue", f"{daily['y'].sum():,.2f}")
        st.metric("Avg Daily Revenue", f"{daily['y'].mean():,.2f}")
        st.metric("Days of Data", f"{daily.shape[0]}")

    # Simple validation split: last ~20% capped between 7 and 28 days
    h_val = min(28, max(7, daily.shape[0] // 5))

    if daily.shape[0] > h_val:
        train = daily.iloc[:-h_val].copy()
        test = daily.iloc[-h_val:].copy()
    else:
        train = daily.copy()
        test = pd.DataFrame(columns=["ds", "y"])

    with st.spinner("Training Prophet..."):
        m, fcst = fit_prophet(train, horizon, weekly=use_weekly, daily_seas=use_daily)

    # Join forecasts with actuals
    plot_df = fcst[["ds", "yhat", "yhat_lower", "yhat_upper"]].merge(daily, on="ds", how="left")

    # Focused chart: last N past days + all future
    max_actual_date = daily["ds"].max()
    cutoff = max_actual_date - pd.Timedelta(days=int(visible_past_days))
    plot_recent = plot_df[plot_df["ds"] >= cutoff].copy()

    # Future table
    future_only = plot_df[plot_df["ds"] > max_actual_date].copy()
    future_only = future_only[["ds", "yhat", "yhat_lower", "yhat_upper"]].rename(
        columns={"ds": "Date", "yhat": "Forecast", "yhat_lower": "Lo", "yhat_upper": "Hi"}
    )
    last_forecast_date = plot_df["ds"].max()
    future_days = future_only.shape[0]

    st.subheader("Forecast View")
    st.caption(
        f"Showing last {visible_past_days} days of history + {future_days} future days "
        f"(ends {last_forecast_date.date()})."
    )

    fig_fc = px.line(
        plot_recent,
        x="ds",
        y=["y", "yhat"],
        title="Actual vs Forecast (focused view)",
        labels={"ds": "Date", "value": "Revenue"},
    )
    fig_fc.update_layout(legend_title_text="")
    st.plotly_chart(fig_fc, use_container_width=True)

    st.subheader("Forecast (Future Days)")
    st.dataframe(future_only, use_container_width=True)
    st.caption(
        f"Forecast horizon = {future_days} days (slider). "
        f"Last forecast date = {last_forecast_date.date()}."
    )

    with st.expander("Trend and Seasonality"):
        st.pyplot(m.plot(fcst))
        st.pyplot(m.plot_components(fcst))

except Exception as e:
    st.error(str(e))

Technical Summary

The project is structured around four key files: requirements.txt, README.md, data/coffee_sales.csv, and app.py. Each file has a distinct role. The requirements file ensures reproducible environments. The README provides setup instructions. The dataset delivers the training signal. The Python script connects the components into a deployable application.
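
Assuming the repository name from the default raw URL in app.py, the layout that Streamlit Cloud reads looks roughly like this:

ai_coffee_forecast/
├── app.py                # Streamlit application
├── requirements.txt      # dependencies installed by Streamlit Cloud
├── README.md             # setup and deployment instructions
└── data/
    └── coffee_sales.csv  # raw transaction history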

The functions in app.py have narrow responsibilities. load_csv handles reading the CSV from an upload or a GitHub raw URL. to_daily parses and aggregates transactions into the daily series. fit_prophet fits the model and produces the forecast frame. The main try block orchestrates user interaction, validation splitting, and plotting. This modular structure prevents overlap of responsibilities and simplifies maintenance.
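
One gap worth noting: the validation split creates a test frame, but app.py never scores it. A hypothetical addition inside the try block, after fit_prophet runs, could compare the held-out days against the forecast, and it would give scikit-learn from requirements.txt a concrete role. The test and fcst names come from the code above; only dates covered by both frames are compared, which depends on the chosen horizon.

from sklearn.metrics import mean_absolute_error

# Hypothetical check: score the held-out window created by the split above
eval_df = test.merge(fcst[["ds", "yhat"]], on="ds", how="inner")
if not eval_df.empty:
    mae = mean_absolute_error(eval_df["y"], eval_df["yhat"])
    st.metric("Validation MAE", f"{mae:,.2f}")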

Deployment to Streamlit Cloud depends on this repository structure being in place. The requirements file installs dependencies. The dataset must be available at the expected path. The README guides collaborators. The script is executed with streamlit run app.py. Together, these pieces produce a forecasting tool that behaves consistently across environments.

This strict design avoids ambiguity. Each file and each dependency maps to a necessary part of the system, and each block of code has a single job. The result is a compact but complete forecasting application.