SC Data is a Python package and related tools making use of
sparecores-crawler to pull and
standardize data on cloud compute resources. This repository
runs the crawler every 5 minutes to update spot prices, and every hour
to update all cloud resources, in both an internal SCD table and a public
SQLite snapshot.
Stable version from PyPI:
pip install sparecores-data
Most recent version from GitHub:
pip install "sparecores-data @ git+https://git@github.com/SpareCores/sc-data.git"
For easy access to the most recent version of the SQLite database file, import
the db object of the sc_data Python package, which runs an updater thread
in the background to keep the SQLite file up-to-date:
from sc_data import db
print(db.path)

The database is cached locally in a persistent directory and automatically updated when needed. On import, the package:
- Checks the local cache for a valid (non-stale) database
- If cached and fresh, uses it immediately
- Otherwise, downloads the latest version from our public S3 bucket
- Falls back to a limited version bundled with the package (without pricing information) if the download fails
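
The lookup order above can be sketched roughly as follows. This is an illustration only, not the package's actual code: the helper names `is_fresh` and `pick_database` are hypothetical, and it assumes staleness is judged by the file's modification time against the Cache TTL, which the text above does not confirm.

```python
import time
from pathlib import Path
from typing import Callable, Optional

CACHE_TTL_SECONDS = 86400  # documented default Cache TTL (1 day)


def is_fresh(path: Path, ttl: float = CACHE_TTL_SECONDS,
             now: Optional[float] = None) -> bool:
    """Return True if the file exists and was modified less than `ttl` seconds ago.

    Assumption: freshness is based on the file's mtime.
    """
    if not path.is_file():
        return False
    now = time.time() if now is None else now
    return (now - path.stat().st_mtime) < ttl


def pick_database(cached: Path, bundled: Path,
                  download: Callable[[Path], None]) -> Path:
    """Hypothetical helper: use the cache, else download, else the bundled file."""
    if is_fresh(cached):
        return cached
    try:
        # `download` is a caller-supplied function assumed to write the
        # latest database to `cached` and to raise OSError on failure
        # (urllib.error.URLError is a subclass of OSError).
        download(cached)
        return cached
    except OSError:
        return bundled
```

Passing the download step in as a callable keeps the sketch independent of any particular HTTP client and makes the fallback path easy to exercise offline.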
The cache is stored in a platform-specific location:
- Linux: $XDG_CACHE_HOME/sparecores-data/ or ~/.cache/sparecores-data/
- macOS: ~/Library/Caches/sparecores-data/
- Windows: %LOCALAPPDATA%/sparecores-data/
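
To see where the cache would land on a given machine, the locations listed above can be resolved with a small sketch like the one below. The function `cache_dir` is hypothetical and not part of sc_data. Treating every non-macOS, non-Windows platform like Linux, and falling back to ~/AppData/Local when LOCALAPPDATA is unset, are assumptions.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional

APP_DIR = "sparecores-data"


def cache_dir(platform: Optional[str] = None,
              env: Optional[Mapping[str, str]] = None,
              home: Optional[Path] = None) -> Path:
    """Resolve the documented cache directory for the given platform."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    if platform == "darwin":
        return home / "Library" / "Caches" / APP_DIR
    if platform.startswith("win"):
        # Assumption: fall back to ~/AppData/Local if LOCALAPPDATA is unset.
        base = env.get("LOCALAPPDATA")
        return (Path(base) if base else home / "AppData" / "Local") / APP_DIR
    # Linux (and, by assumption, any other platform): XDG cache directory.
    base = env.get("XDG_CACHE_HOME")
    return (Path(base) if base else home / ".cache") / APP_DIR
```

Calling `cache_dir()` with no arguments uses the current platform, environment, and home directory. The parameters exist only so the other platforms can be checked from one machine.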
To enforce waiting for the update to complete, you can use the updated event:
db.updated.wait()

The package comes with the following set of default parameters, which can be overridden by builtins or environment variables:
| Configuration | Description | Default Value | Builtin Name | Environment Variable |
|---|---|---|---|---|
| Custom Database Path | Custom file path for the database (bypasses cache) | - | sc_data_db_path | SC_DATA_DB_PATH |
| Disable Updates | Whether to disable automatic updates | False | sc_data_no_update | SC_DATA_NO_UPDATE |
| Database URL | The URL of the most recent version of the database file | https://...sc-data-all.db.bz2 | sc_data_db_url | SC_DATA_DB_URL |
| HTTP Timeout | The timeout in seconds for downloading the database file | 30 | sc_data_http_timeout | SC_DATA_HTTP_TIMEOUT |
| Refresh Interval | The interval in seconds to check for database updates | 600 | sc_data_db_refresh_seconds | SC_DATA_DB_REFRESH_SECONDS |
| Cache TTL | Time in seconds before the cached database is considered stale | 86400 (1 day) | sc_data_db_cache_ttl | SC_DATA_DB_CACHE_TTL |

Note: Setting SC_DATA_DB_PATH disables caching and uses the specified file directly.
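
As a rough illustration of how such an override lookup might work, the sketch below resolves one setting from an attribute on Python's `builtins` module, then from the environment variable, then from the default. The function `resolve_setting` is hypothetical. Both the precedence order and the reading of "builtins" as attributes on the `builtins` module are assumptions, not something the table above confirms.

```python
import builtins
import os
from typing import Any, Callable


def resolve_setting(name: str, default: Any,
                    cast: Callable[[str], Any] = str) -> Any:
    """Hypothetical lookup: builtins attribute, then environment variable, then default.

    `name` is the lowercase builtin name from the table; the environment
    variable is assumed to be the same name in upper case.
    """
    if hasattr(builtins, name):
        return getattr(builtins, name)
    raw = os.environ.get(name.upper())
    if raw is not None:
        # Environment values are always strings, so convert them.
        return cast(raw)
    return default
```

For example, `resolve_setting("sc_data_http_timeout", 30, int)` returns the default of 30 when neither override is set. Whatever the real lookup looks like, overrides most likely need to be in place before `from sc_data import db` runs, since the package checks the cache on import.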