
Uber dataset with 5,000 rides.xlsx

The message you shared looks like a blog post excerpt (dated August 31, 2025, from “lexsense”) describing a synthetic or sample Uber dataset with 5,000 rides in .xlsx format (converted to .csv for the app). It includes typical columns like:

  • Timestamps (e.g., request/start/end times)
  • City
  • Distance
  • Fare
  • Driver/customer IDs
  • Payment type
  • Ratings
  • Ride status (e.g., completed, cancelled)
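Before building anything on top of the file, it helps to confirm that it actually has these fields. Below is a minimal pandas sketch. The snake_case names in EXPECTED_COLUMNS (timestamp, ride_status, ratings, etc.) are assumptions chosen to match the skeleton code further down. The helper names normalize_columns and missing_columns are hypothetical, and your export may label the columns differently.

```python
import re

import pandas as pd

# Assumed snake_case column names; adjust to whatever your export actually uses.
EXPECTED_COLUMNS = {
    "timestamp", "city", "distance", "fare", "driver_id",
    "customer_id", "payment_type", "ratings", "ride_status",
}


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose headers are lowercased, with runs of non-alphanumerics
    collapsed to underscores (e.g. 'Ride Status' -> 'ride_status')."""
    out = df.copy()
    out.columns = [
        re.sub(r"[^0-9a-z]+", "_", str(col).strip().lower()).strip("_")
        for col in out.columns
    ]
    return out


def missing_columns(df: pd.DataFrame, expected=EXPECTED_COLUMNS) -> list:
    """Sorted list of expected columns absent from df, so an app can fail early."""
    return sorted(set(expected) - set(df.columns))


if __name__ == "__main__":
    raw = pd.DataFrame(columns=["Timestamp", "City", "Fare", "Ride Status", "Driver ID"])
    df = normalize_columns(raw)
    print(list(df.columns))    # ['timestamp', 'city', 'fare', 'ride_status', 'driver_id']
    print(missing_columns(df)) # ['customer_id', 'distance', 'payment_type', 'ratings']
```

In a dashboard, a non-empty result from missing_columns could be shown with an error message instead of letting a later KeyError surface.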

The post offers a pre-built Streamlit dashboard (uber_dashboard_streamlit.py) for interactive visualization, with features like filters, KPIs (total rides, revenue, avg fare/distance, completion rate, avg rating), time-based charts, payment breakdowns, city-level insights, peak hours, cancellations, and a driver leaderboard.

Since I don’t have direct access to your local files or the exact dataset/Streamlit code, here’s how to proceed based on common practices and similar public resources:

If You’re Looking to Run or Recreate This Dashboard

  1. Dataset Availability
    I couldn't find a public Uber dataset with exactly 5,000 rides and these exact columns. Public ones tend to be small personal exports, NYC-specific pickup data, or larger fare-prediction sets. Close alternatives include:
  • GeeksforGeeks sample UberDataset.csv (small, with START_DATE, END_DATE, CATEGORY, etc.) — download from their media link.
  • Kaggle datasets like “Uber Rides Dataset” by mubasherbajwa or “Uber Trips Dataset” by rohiteng (some have ~50k rows but include distance, fare, status, payment, timestamps, city-like info).
  • GitHub repos (e.g., Geo-y20/Uber-Rides-Data-Analysis) with UberDataset.csv containing ride details.
    If your .xlsx is custom or synthetic, stick with it and convert it to CSV in Excel or with pandas if needed:
   import pandas as pd

   # Reading .xlsx files requires the openpyxl package (pip install openpyxl)
   df = pd.read_excel("Uber dataset with 5,000 rides.xlsx")
   df.to_csv("uber_data.csv", index=False)
  2. Running the Provided Streamlit App
    Follow the instructions in the post:
  • Place uber_dashboard_streamlit.py and uber_data.csv in the same folder.
  • Install dependencies:
    pip install streamlit pandas numpy matplotlib
  • Launch:
    streamlit run uber_dashboard_streamlit.py
  • Open the app in your browser (Streamlit serves it at http://localhost:8501 by default). Use the sidebar to filter by date, city, status, etc. If the CSV path is hardcoded, edit the script: look for pd.read_csv("uber_data.csv") or similar and change the path.
  3. Typical Code Structure for Such a Dashboard
    If you don’t have the .py file or want to rebuild/enhance it, here’s a basic skeleton matching the described features:
   import streamlit as st
   import pandas as pd
   import plotly.express as px  # Used for the trend chart below (pip install plotly)

   st.set_page_config(page_title="Uber Rides Dashboard", layout="wide")

   # Load data
   @st.cache_data
   def load_data():
       return pd.read_csv("uber_data.csv")  # Update path if needed

   df = load_data()

   # Assume datetime parsing (adjust column names)
   df['timestamp'] = pd.to_datetime(df['timestamp'])  # or 'request_time', etc.
   df['date'] = df['timestamp'].dt.date
   df['hour'] = df['timestamp'].dt.hour

   # Sidebar filters
   st.sidebar.header("Filters")
   date_range = st.sidebar.date_input("Date Range", (df['date'].min(), df['date'].max()))
   cities = st.sidebar.multiselect("City", options=df['city'].unique(), default=df['city'].unique())
   statuses = st.sidebar.multiselect("Ride Status", options=df['ride_status'].unique(), default=df['ride_status'].unique())

   # date_input returns a 1-tuple while the user is still choosing the end date
   if len(date_range) != 2:
       st.stop()
   start_date, end_date = date_range

   # Apply filters
   filtered_df = df[(df['date'].between(start_date, end_date)) &
                    (df['city'].isin(cities)) &
                    (df['ride_status'].isin(statuses))]

   # KPIs
   col1, col2, col3, col4 = st.columns(4)
   col1.metric("Total Rides", len(filtered_df))
   col2.metric("Total Revenue", f"${filtered_df['fare'].sum():,.2f}")
   col3.metric("Avg Fare", f"${filtered_df['fare'].mean():.2f}")
   col4.metric("Avg Rating", f"{filtered_df['ratings'].mean():.1f} ⭐")

   # Charts (expand as needed)
   st.subheader("Rides & Revenue Over Time")
   # Named aggregation gives the plotted columns readable names
   daily = filtered_df.groupby('date').agg(rides=('ride_status', 'count'),
                                           revenue=('fare', 'sum')).reset_index()
   fig = px.line(daily, x='date', y=['rides', 'revenue'], title="Trends")
   st.plotly_chart(fig)

   # More sections: Payment split (pie chart), Ratings hist, City bar, etc.
   # Driver leaderboard example
   st.subheader("Driver Leaderboard")
   driver_stats = filtered_df.groupby('driver_id').agg(
       trips=('ride_status', 'count'),
       earnings=('fare', 'sum'),
       avg_rating=('ratings', 'mean')
   ).sort_values('earnings', ascending=False).head(10)
   st.dataframe(driver_stats)

   # Add more: peak hours (bar chart on 'hour'), cancellations by city, etc.
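
The two "add more" ideas in the final comment can be computed with plain pandas before charting. Below is a minimal sketch. The column names and the lowercase status value "cancelled" are assumptions, and the helper names rides_by_hour and cancellations_by_city are hypothetical.

```python
import pandas as pd


def rides_by_hour(df: pd.DataFrame, time_col: str = "timestamp") -> pd.Series:
    """Ride counts for each hour of day 0-23, with 0 for hours that had no rides."""
    hours = pd.to_datetime(df[time_col]).dt.hour.astype(int)
    return hours.value_counts().reindex(range(24), fill_value=0)


def cancellations_by_city(df: pd.DataFrame, status_col: str = "ride_status",
                          city_col: str = "city", cancelled: str = "cancelled") -> pd.DataFrame:
    """Per-city ride count, cancellation count and rate, highest rate first."""
    # Case-insensitive match on the (assumed) cancellation status value
    is_cancelled = df[status_col].astype(str).str.lower() == cancelled
    summary = (
        df.assign(is_cancelled=is_cancelled)
        .groupby(city_col)
        .agg(rides=("is_cancelled", "size"), cancellations=("is_cancelled", "sum"))
    )
    summary["cancel_rate"] = summary["cancellations"] / summary["rides"]
    return summary.sort_values("cancel_rate", ascending=False)
```

In the skeleton, st.bar_chart(rides_by_hour(filtered_df)) would draw the peak-hours chart, and st.dataframe(cancellations_by_city(filtered_df)) would show the cancellation table.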

The trend chart already uses Plotly. Seaborn (rendered with st.pyplot) or Altair (rendered with st.altair_chart) are alternatives for fancier visuals.
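For the payment split mentioned in the skeleton's comments, the chart needs a small summary table first. Below is a minimal pandas sketch. The payment_type and fare column names are assumptions and payment_split is a hypothetical helper. Rides are counted per row, including rows with a missing fare.

```python
import pandas as pd


def payment_split(df: pd.DataFrame, payment_col: str = "payment_type",
                  fare_col: str = "fare") -> pd.DataFrame:
    """Rides, revenue and share of rides per payment type, most rides first."""
    split = (
        df.groupby(payment_col)
        .agg(rides=(fare_col, "size"), revenue=(fare_col, "sum"))
        .reset_index()
    )
    split["ride_share"] = split["rides"] / split["rides"].sum()
    return split.sort_values("rides", ascending=False, ignore_index=True)
```

With Plotly, px.pie(payment_split(filtered_df), names="payment_type", values="rides") is one way to turn this into the pie chart.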

Enhancements You Mentioned

  • Geo Features + Maps — Yes, possible if your dataset includes pickup/dropoff lat/long (common in NYC Uber datasets). Use st.map() or folium/pydeck in Streamlit.
  • Convert to Plotly Dash — Dash is great for more complex callbacks. I can outline a Dash version if preferred (multi-page, advanced interactivity).
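By default, st.map() looks for latitude/longitude columns named lat/latitude and lon/longitude, so pickup coordinates usually need renaming and cleaning first. Below is a minimal pandas sketch. It assumes hypothetical pickup_lat and pickup_lon columns (your dataset may use other names or have none), and pickup_points is a hypothetical helper.

```python
import pandas as pd


def pickup_points(df: pd.DataFrame, lat_col: str = "pickup_lat",
                  lon_col: str = "pickup_lon") -> pd.DataFrame:
    """Return a lat/lon frame with non-numeric, missing, or out-of-range rows dropped."""
    points = pd.DataFrame({
        "lat": pd.to_numeric(df[lat_col], errors="coerce"),
        "lon": pd.to_numeric(df[lon_col], errors="coerce"),
    }).dropna()
    # Keep only coordinates that are physically possible
    valid = points["lat"].between(-90, 90) & points["lon"].between(-180, 180)
    return points[valid].reset_index(drop=True)
```

Inside the app, st.map(pickup_points(filtered_df)) would then plot the cleaned pickups.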


How to run

  1. Put both files in the same folder.
  2. In a terminal, run pip install streamlit pandas numpy matplotlib, then streamlit run uber_dashboard_streamlit.py
  3. If your CSV isn’t in the same folder, update the CSV path in the app’s sidebar.
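As an alternative to changing the path in the sidebar, Streamlit passes any arguments after -- on the streamlit run command line through to the script's sys.argv. Below is a minimal sketch of reading a path that way. The --csv flag and the csv_path_from_args helper are hypothetical and not part of the post's script.

```python
import argparse
from pathlib import Path


def csv_path_from_args(argv=None, default: str = "uber_data.csv") -> Path:
    """Read an optional --csv path from argv (sys.argv[1:] when argv is None)."""
    parser = argparse.ArgumentParser(description="Uber rides dashboard")
    parser.add_argument("--csv", default=default, help="Path to the rides CSV")
    # parse_known_args ignores flags this sketch doesn't define instead of exiting
    args, _unknown = parser.parse_known_args(argv)
    return Path(args.csv).expanduser()
```

Launched as streamlit run uber_dashboard_streamlit.py -- --csv /path/to/uber_data.csv, the script could then call pd.read_csv(csv_path_from_args()).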

What you get

  1. Sidebar filters: date range, city, status, payment type, distance & fare sliders.
  2. KPI cards: rides, revenue, avg fare, avg distance, completion rate, avg rating.
  3. Charts: rides & revenue over time, payment split, ratings distribution, rides by city, revenue by city, peak hours, cancellations by city.
  4. Driver leaderboard with trips, earnings, and avg rating.
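The skeleton above shows four KPIs. The post's cards also include average distance and completion rate. Below is a minimal pandas sketch of all six. It assumes the column names used in the skeleton plus a distance column, and a lowercase "completed" status. compute_kpis is a hypothetical helper. Revenue here sums fare over every filtered row and skips missing fares; restrict it to completed rides if cancelled rows carry a fare.

```python
import pandas as pd


def compute_kpis(df: pd.DataFrame, status_col: str = "ride_status",
                 completed: str = "completed") -> dict:
    """Headline numbers for the KPI cards; averages are None when df is empty."""
    total = len(df)
    if total == 0:
        return {"rides": 0, "revenue": 0.0, "avg_fare": None, "avg_distance": None,
                "completion_rate": None, "avg_rating": None}
    is_completed = df[status_col].astype(str).str.lower() == completed
    return {
        "rides": total,
        "revenue": float(df["fare"].sum()),          # NaN fares are skipped
        "avg_fare": float(df["fare"].mean()),
        "avg_distance": float(df["distance"].mean()),
        "completion_rate": float(is_completed.mean()),  # share of rows completed
        "avg_rating": float(df["ratings"].mean()),   # unrated rides are skipped
    }
```

A card could then show the rate with, for example, f"{kpis['completion_rate']:.0%}" after checking that it isn't None.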
