This is an evolving repo optimized for machine-learning projects aimed at designing a new algorithm. Such projects require sweeping over different hyperparameters, comparing to baselines, and iteratively refining the algorithm. Based on cookiecutter-data-science.
- `src`: main code for modeling (e.g. model architecture)
- `experiments`: code for running experiments (e.g. loading data, training models, evaluating models)
- `scripts`: scripts for hyperparameter sweeps (python scripts that launch jobs in the `experiments` folder with different hyperparams)
- `notebooks`: jupyter notebooks for analyzing results and making figures
- `tests`: unit tests
- Setup using uv (requires installing uv, then running a script via `uv run <script>`)
- This installs a package named `src` for importing
- See `pyproject.toml` for dependencies (not all are required)
- Example run: run `uv run scripts/01_train_basic_models.py` (which calls `experiments/01_train_model.py`), then view the results in `notebooks/01_model_results.ipynb`
- Keep tests updated and run them using `uv run pytest`
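A minimal sketch of what a unit test in `tests` might look like; the `accuracy` helper below is a toy stand-in (real tests would import code from the installed `src` package):

```python
# tests/test_metrics.py -- minimal pytest sketch; accuracy() is a hypothetical
# toy helper, standing in for code imported from the `src` package
def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def test_accuracy_perfect():
    assert accuracy([1, 0, 1], [1, 0, 1]) == 1.0


def test_accuracy_half():
    assert accuracy([1, 0], [1, 1]) == 0.5
```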
- scripts sweep over hyperparameters using easy-to-specify python code
- experiments automatically cache runs that have already completed
- caching uses the (non-default) arguments in the argparse namespace
- notebooks can easily evaluate results aggregated over multiple experiments using pandas
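A sketch of how a sweep script in `scripts` can specify hyperparameters as plain python and launch the experiment script once per combination; the grid values and flag names here are illustrative, not the repo's actual ones:

```python
# scripts sketch: expand an easy-to-specify grid into one command line per run
# (grid values and flag names are illustrative)
import itertools
import sys

PARAMS = {
    "lr": [0.1, 0.01],
    "seed": [0, 1, 2],
}


def build_commands(params, script="experiments/01_train_model.py"):
    """One command per point in the grid; note all args are passed as strings."""
    keys = list(params)
    commands = []
    for values in itertools.product(*(params[k] for k in keys)):
        cmd = [sys.executable, script]
        for k, v in zip(keys, values):
            cmd += [f"--{k}", str(v)]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    import subprocess

    for cmd in build_commands(PARAMS):
        subprocess.run(cmd, check=True)  # experiments skip runs already cached
```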
- See some useful packages here
- Avoid notebooks whenever possible (ideally, use them only for analyzing results and making figures)
- Paths should be specified relative to a file's location (e.g. `os.path.join(os.path.dirname(__file__), 'data')`)
- Naming variables: put the main thing first, followed by its modifiers (e.g. `X_train`, `acc_test`)
- Binary arguments should start with the word "use" (e.g. `--use_caching`) and take values 0 or 1
- Use logging instead of print
- Use argparse and sweep over hyperparams using python scripts (or custom things, like amulet)
- Note: arguments get passed as strings, so don't pass args that aren't primitives or lists of primitives (handle more complex structures inside the experiments code)
- Each run should save a single pickle file of its results
- All experiments that depend on each other should run end-to-end with one script (caching things along the way)
- Keep requirements updated in `pyproject.toml`
- Follow sklearn apis whenever possible
- Use Hugging Face whenever possible, then PyTorch
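The guidelines above can be sketched as one experiment script: logging instead of print, argparse with a `--use_caching` flag taking 0 or 1, paths relative to `__file__`, a cache key built from the non-default arguments, and a single pickle per run. The specific flags and the hash-based cache key are illustrative assumptions, not the repo's actual implementation:

```python
# experiments sketch: argparse + logging + caching on non-default args,
# saving one pickle per run (flags and cache-key scheme are illustrative)
import argparse
import hashlib
import logging
import os
import pickle

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def get_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.1)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--use_caching", type=int, default=1, choices=[0, 1])
    return parser


def cache_key(args, parser):
    """Hash only the arguments that differ from their argparse defaults."""
    non_default = {
        k: v for k, v in sorted(vars(args).items()) if v != parser.get_default(k)
    }
    return hashlib.md5(repr(non_default).encode()).hexdigest()


if __name__ == "__main__":
    parser = get_parser()
    args = parser.parse_args()
    results_dir = os.path.join(os.path.dirname(__file__), "..", "results")
    os.makedirs(results_dir, exist_ok=True)
    out_file = os.path.join(results_dir, cache_key(args, parser) + ".pkl")
    if args.use_caching and os.path.exists(out_file):
        logger.info("run already cached: %s", out_file)
    else:
        acc_test = 1.0 - args.lr  # placeholder for real training/evaluation
        results = {**vars(args), "acc_test": acc_test}
        with open(out_file, "wb") as f:
            pickle.dump(results, f)  # a single pickle file of this run's results
        logger.info("saved results to %s", out_file)
```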