Google Cloud x MLB Hackathon Resources
Welcome to the Google Cloud x MLB™ Hackathon!
Calling all Devs: Step up to the plate and knock it out of the park! Use Google Cloud's heavy-hitting data and AI lineup (Gemini, Imagen, Vertex AI... the whole roster!) and real data from Major League Baseball™ to build the future of fan engagement. Showcase your AI skills, craft impactful applications, and revolutionize how baseball fans experience the game.
Ready to hit a grand slam? Build a project using Google Cloud AI that revolutionizes MLB™ fan experience.
Let's build the future of baseball together!
This repository contains resources and instructions to help you get started with accessing MLB's data services for your hackathon project. Stay tuned to this repo as more datasets become available.
MLB GUMBO Data Access
A primary data source for this hackathon will be MLB's GUMBO (Grand Unified Master Baseball Object) data feeds, which are available without authentication.
About GUMBO
The GUMBO (Grand Unified Master Baseball Object) live data feed provides a standardized JSON response that summarizes the entire state of a selected game upon each update. Unlike previous live event data feeds, GUMBO provides complete game information with every object creation, rather than incremental updates.
Key Benefits
- Complete Game State: No need to maintain game state locally or build upon message sequences - each GUMBO object includes complete and current dataset for the entire game
- Standard JSON Format: GUMBO follows true JSON standards, unlike previous feed formats
- Flexible Access Methods: Available via:
- Websocket listener (push updates every 1-2 seconds)
- Stats API (pull updates every 12 seconds)
- Development-Friendly: Use actual production data for development at any time without requiring scheduled test data delivery
API Endpoints
Access GUMBO data using these base URLs (replace {game_pk} with the specific game ID):
Current Game State:
https://statsapi.mlb.com/api/v1.1/game/{game_pk}/feed/liveSpecific Game Timestamp:
https://statsapi.mlb.com/api/v1.1/game/{game_pk}/feed/live?timecode=yyyymmdd_######List of Game Update Timestamps:
https://statsapi.mlb.com/api/v1.1/game/{game_pk}/feed/live/timestamps
Common API Query Examples
Here are some useful examples of common Stats API queries:
Get 2024 MLB Regular Season Schedule:
https://statsapi.mlb.com/api/v1/schedule?sportId=1&season=2024&gameType=RParameters explained:
sportId=1: MLB (1 represents Major League Baseball)season=2024: The season yeargameType=R: Regular season games (R = Regular Season, P = Postseason, S = Spring Training)
Get Los Angeles Dodgers 2024 Roster:
https://statsapi.mlb.com/api/v1/teams/119/roster?season=2024Parameters explained:
119: Team ID for LA Dodgersseason=2024: The season year
Get Team Information:
https://statsapi.mlb.com/api/v1/teams/119- Returns detailed information about a specific team (119 = Dodgers)
- Add
?season=2024to get team info for a specific season
Get Player Information:
https://statsapi.mlb.com/api/v1/people/660271- Returns detailed information about a specific player (660271 = Shohei Ohtani)
- Add
?season=2024to get player info for a specific season
Get Live Game Data:
https://statsapi.mlb.com/api/v1.1/game/716463/feed/live- Returns live GUMBO feed for a specific game
- Game PKs can be obtained from the schedule endpoint
Common Query Parameters:
hydrate: Add additional data to the response (e.g.,?hydrate=stats,team)fields: Limit the response to specific fieldsseason: Specify a season yeardate: Specify a specific date (format: MM/DD/YYYY)
Note: All endpoints return JSON data. You can use tools like curl or Python's requests library to fetch the data:
import requests
# Example: Get Dodgers roster
url = "https://statsapi.mlb.com/api/v1/teams/119/roster?season=2024"
response = requests.get(url)
data = response.json()
Getting Started
- Review the provided documentation in this repository for detailed information about:
- Available endpoints
- Data structure
- Query parameters
- Response formats
Historical Data Availability
The MLB data feeds provide different levels of historical data granularity depending on the time period:
- 1901-1968: Boxscore level only
- 1969-1988: Play-by-play level
- 1989-2007: Pitch-by-pitch level
- 2008-2014: Pitch-by-pitch with pitch speed/break information (Pitch F/x)
- 2015-Present: Pitch-by-pitch with enhanced metrics:
- Pitch speed
- Exit velocity
- Home Run distance
Minor League Coverage
- 2021: Florida State League (A) added
- 2022: Pacific Coast League (AAA) added
- 2023: International League (AAA) added
Accessing Dataset Files
All datasets for the hackathon are available in our public Google Cloud Storage bucket. The datasets are organized into the following categories:
- MLB Caption Data
- MLB Fan Content Interaction Data
- MLB StatsAPI Documentation
- Game Data (including 2024 home runs dataset)
Access Methods
Direct Browser Access: Access the datasets through the Google Cloud Console: GCP MLB Hackathon 2025 Bucket
Command Line Access: Using
gsutil(part of Google Cloud SDK):# Download all files gsutil -m cp -r gs://gcp-mlb-hackathon-2025/* .
# Download specific dataset gsutil cp gs://gcp-mlb-hackathon-2025/datasets/2024-mlb-homeruns.csv .
## Additional Resources
### Getting Started Guide
Check out our [Google Colab Notebook](https://colab.research.google.com/drive/1QcZD-_VK-Fa9ZC_iNy6Cth0n67KF2dSC?usp=sharing) for interactive examples and tutorials to help you get started with the MLB data and Google Cloud AI tools.
### MLB Glossary
For help understanding baseball terminology and statistics, refer to the [MLB Glossary](https://www.mlb.com/glossary). This comprehensive resource explains baseball terms, statistics, and metrics used throughout MLB's data services.
More information about available datasets and hackathon themes will be provided in this repository. Stay tuned for updates!
Built With
- docker-google-cloud-platform
- docker-version-control-git-github/gitlab
- express.js
- firebase-hosting
- firebase-hosting-backend-python
- flutter
- gmail-api-multilingual-support-python-google-cloud-translation-api
- google-cloud-libraries-google-cloud-sdk
- html/css-react.js
- jupyter-notebook-content-delivery-javascript
- languages-frameworks/libraries-tools-frontend-javascript
- material-ui
- natural-language
- node.js-flask/django
- nodemailer
- nosql-firestore
- open
- pg
- pgadmin-ai/ml-python-google-cloud-video-intelligence
- postgresql
- postman-database-sql
- psycopg2-firebase-console
- python-firebase-admin
- sendgrid
- smtplib-firebase-cloud-messaging
- source
- tailwindcss-webpack
- text-to-speech-api-google-cloud-console-hosting/deployment-google-app-engine
- text-to-speech-apis-google-cloud-console
- translation
- vite
Log in or sign up for Devpost to join the conversation.