Spare Cores Data

Project Status: Beta | Maintenance Status: Active | License: CC-BY-SA 4.0 | NGI Search Open Call 3 beneficiary

SC Data is a Python package and a set of related tools that use sparecores-crawler to pull and standardize data on cloud compute resources. This repository runs the crawler every 5 minutes to update spot prices and every hour to update all cloud resources, writing the results to an internal SCD table and to a public SQLite snapshot.

Installation

Stable version from PyPI:

pip install sparecores-data

Most recent version from GitHub:

pip install "sparecores-data @ git+https://git@github.com/SpareCores/sc-data.git"

Usage

For easy access to the most recent version of the SQLite database file, import the db object of the sc_data Python package, which runs an updater thread in the background to keep the SQLite file up-to-date:

from sc_data import db
print(db.path)
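
Because db.path points to an ordinary SQLite file, it can be opened with Python's standard sqlite3 module. The following is a minimal sketch that lists the tables in such a file without assuming any table names; list_tables is a hypothetical helper for illustration, not part of sc_data.

```python
import sqlite3
from pathlib import Path


def list_tables(path):
    """Return the names of all tables in the SQLite file at `path`.

    Hypothetical helper for illustration; not part of sc_data.
    """
    # Open read-only through a file: URI so the file is never modified.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]
```

With the package installed, calling list_tables(db.path) after the import above should show which tables the current snapshot contains.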

The database is cached locally in a persistent directory and automatically updated when needed. On import, the package:

  1. Checks the local cache for a valid (non-stale) database
  2. If cached and fresh, uses it immediately
  3. Otherwise, downloads the latest version from our public S3 bucket
  4. Falls back to a limited version bundled with the package (without pricing information) if download fails
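
The four steps above can be sketched as a single decision function. This is an illustration of the described behavior, not the package's actual code: pick_database, its parameters, and the download callable are hypothetical, and judging freshness by the file's modification time is an assumption.

```python
import time
from pathlib import Path


def pick_database(cache_file, bundled_file, ttl_seconds, download, now=None):
    """Return (path, source) following the four startup steps.

    `download` is a hypothetical callable that writes a fresh database to
    the given path and raises OSError on failure.
    """
    now = time.time() if now is None else now
    cache_file = Path(cache_file)
    # Steps 1-2: use the cached file if it exists and is younger than the TTL
    # (assumption: age is measured from the file's modification time).
    if cache_file.exists() and now - cache_file.stat().st_mtime < ttl_seconds:
        return cache_file, "cache"
    # Step 3: otherwise try to download a fresh copy into the cache.
    try:
        download(cache_file)
        return cache_file, "download"
    except OSError:
        # Step 4: fall back to the limited file bundled with the package.
        return Path(bundled_file), "bundled"
```

Returning the source alongside the path makes it easy for a caller to notice when it is working with the bundled file, which lacks pricing information.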

The cache is stored in a platform-specific location:

  • Linux: $XDG_CACHE_HOME/sparecores-data/ or ~/.cache/sparecores-data/
  • macOS: ~/Library/Caches/sparecores-data/
  • Windows: %LOCALAPPDATA%/sparecores-data/
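
To locate the cache from a script (for example, to inspect or clear it), the locations listed above can be reproduced with the standard library. This is a sketch only: cache_dir is a hypothetical helper, and both the Windows fallback when LOCALAPPDATA is unset and the treatment of all other platforms like Linux are assumptions.

```python
import os
import sys
from pathlib import Path

APP_DIR = "sparecores-data"


def cache_dir(platform=None, env=None, home=None):
    """Return the expected cache directory for the given platform.

    Hypothetical helper mirroring the documented locations.
    """
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform == "darwin":
        return home / "Library" / "Caches" / APP_DIR
    if platform.startswith("win"):
        # Assumption: use ~/AppData/Local when LOCALAPPDATA is not set.
        base = env.get("LOCALAPPDATA") or str(home / "AppData" / "Local")
        return Path(base) / APP_DIR
    # Linux (and, by assumption, any other platform): honor XDG_CACHE_HOME.
    base = env.get("XDG_CACHE_HOME") or str(home / ".cache")
    return Path(base) / APP_DIR
```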

To block until the update has completed, wait on the updated event:

db.updated.wait()
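
If db.updated behaves like a standard threading.Event (an assumption; the text above only calls it an event), wait() also accepts a timeout, so a caller does not have to block indefinitely on a slow network. The sketch below demonstrates the pattern with a plain threading.Event and a fake updater thread standing in for the package's objects.

```python
import threading
import time

# Stand-in for db.updated (assumed to behave like threading.Event).
updated = threading.Event()


def fake_updater():
    """Pretend to download the database, then signal completion."""
    time.sleep(0.1)
    updated.set()


threading.Thread(target=fake_updater, daemon=True).start()

# wait() returns True once the event is set, or False if the timeout expires.
finished = updated.wait(timeout=5)
print("fresh data available" if finished else "still updating, using current file")
```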

Configuration

The package comes with the following set of default parameters, which can be overridden by builtins or environment variables:

| Configuration | Description | Default Value | Builtin Name | Environment Variable |
| --- | --- | --- | --- | --- |
| Custom Database Path | Custom file path for the database (bypasses cache) | - | sc_data_db_path | SC_DATA_DB_PATH |
| Disable Updates | Whether to disable automatic updates | False | sc_data_no_update | SC_DATA_NO_UPDATE |
| Database URL | The URL of the most recent version of the database file | https://...sc-data-all.db.bz2 | sc_data_db_url | SC_DATA_DB_URL |
| HTTP Timeout | The timeout in seconds for downloading the database file | 30 | sc_data_http_timeout | SC_DATA_HTTP_TIMEOUT |
| Refresh Interval | The interval in seconds to check for database updates | 600 | sc_data_db_refresh_seconds | SC_DATA_DB_REFRESH_SECONDS |
| Cache TTL | Time in seconds before the cached database is considered stale | 86400 (1 day) | sc_data_db_cache_ttl | SC_DATA_DB_CACHE_TTL |

Note: Setting SC_DATA_DB_PATH disables caching and uses the specified file directly.
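
As an illustration of how a setting could be looked up from the two override mechanisms, here is a minimal sketch. resolve_setting is a hypothetical helper, and the precedence shown (builtin first, then environment variable, then default) is an assumption rather than documented behavior.

```python
import builtins
import os


def resolve_setting(name, default=None):
    """Look up a setting such as "http_timeout".

    Hypothetical helper; the lookup order is an assumption:
    builtin attribute, then environment variable, then default.
    """
    builtin_name = "sc_data_" + name
    env_name = "SC_DATA_" + name.upper()
    if hasattr(builtins, builtin_name):
        return getattr(builtins, builtin_name)
    # Environment values are always strings; callers must convert them.
    return os.environ.get(env_name, default)
```

Since the package checks its cache on import, overrides such as builtins.sc_data_http_timeout = 10 or the SC_DATA_HTTP_TIMEOUT environment variable presumably need to be set before from sc_data import db is executed.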
