feat: add .env configuration support and validation#14
Conversation
Summary of ChangesHello @pi-dal, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the project's configurability by introducing a robust system for managing settings via Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a robust and well-designed configuration system using .env files, python-dotenv, and custom helper functions. The integration into existing scripts is consistent, allowing for flexible configuration precedence (CLI args > .env > code defaults). The addition of a validate_config.py script and comprehensive unit tests significantly improves the maintainability and reliability of the configuration setup. The handling of empty strings in .env as "unset" and the bidirectional fallback for N_SPLITS/NUM_SPLITS are thoughtful design choices.
| FACTOR=1.0 | ||
| ORIGINAL_MAX_POSITION_EMBEDDINGS=131072 | ||
| EXPECTED_RUNS=1 | ||
| REASONING_EFFORT= |
| # ------------------------------ | ||
| EVAL_TYPE=code | ||
| NUM_WORKERS=4 | ||
| MAX_ITEMS= |
| return bool(load_dotenv(dotenv_path=dotenv_path, override=override)) | ||
|
|
||
|
|
||
| def _raw_env(name: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]: |
There was a problem hiding this comment.
The _raw_env function is a core helper for all env_* functions, handling the fetching, stripping, and None conversion of environment variables. Adding a docstring to explain its purpose and behavior would improve code clarity and maintainability.
| def _raw_env(name: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]: | |
| def _raw_env(name: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]: | |
| """Fetches an environment variable, strips whitespace, and treats empty strings as None.""" |
| from env_config import load_env | ||
|
|
||
|
|
||
| def _is_probably_path(value: str) -> bool: |
There was a problem hiding this comment.
The _is_probably_path function uses a heuristic to determine if a string is likely a filesystem path. While practical, it's a heuristic. Adding a docstring to briefly explain the heuristic's logic and its potential limitations (e.g., "This is a heuristic and might not cover all cases of remote model IDs that look like paths, or vice-versa") would be beneficial for clarity and to understand potential edge cases in validation.
| def _is_probably_path(value: str) -> bool: | |
| def _is_probably_path(value: str) -> bool: | |
| """Heuristic to determine if a string is likely a filesystem path rather than a remote model ID.""" |
| if db_uri.startswith("sqlite:////"): | ||
| # urlparse keeps the extra leading slash in path (e.g. '//var/db.sqlite3'); | ||
| # if we feed that into sqlite3's file: URI it becomes 'file://var/..' (authority='var'). | ||
| while sqlite_path.startswith("//"): | ||
| sqlite_path = sqlite_path[1:] | ||
| else: | ||
| # sqlite:///relative.db -> parsed.path='/relative.db' (strip to make it relative) | ||
| sqlite_path = sqlite_path.lstrip("/") |
There was a problem hiding this comment.
The parsing logic for SQLite paths from DB_URI is quite intricate due to how urlparse handles different numbers of slashes for absolute vs. relative paths in file:-like URIs. Adding a detailed comment explaining why sqlite_path needs to be manipulated in this specific way (e.g., how urlparse treats sqlite:////abs/path vs sqlite:///rel/path) would greatly improve the clarity and maintainability of this section.
| if db_uri.startswith("sqlite:////"): | |
| # urlparse keeps the extra leading slash in path (e.g. '//var/db.sqlite3'); | |
| # if we feed that into sqlite3's file: URI it becomes 'file://var/..' (authority='var'). | |
| while sqlite_path.startswith("//"): | |
| sqlite_path = sqlite_path[1:] | |
| else: | |
| # sqlite:///relative.db -> parsed.path='/relative.db' (strip to make it relative) | |
| sqlite_path = sqlite_path.lstrip("/") | |
| if db_uri.startswith("sqlite:////"): | |
| # urlparse keeps the extra leading slash in path (e.g. '//var/db.sqlite3'); | |
| # if we feed that into sqlite3's file: URI it becomes 'file://var/..' (authority='var'). | |
| # We need to strip leading slashes to get a correct absolute path for os.path.exists. | |
| while sqlite_path.startswith("//"): | |
| sqlite_path = sqlite_path[1:] | |
| else: | |
| # sqlite:///relative.db -> parsed.path='/relative.db' (strip to make it relative) | |
| # We need to strip the single leading slash to get a correct relative path. | |
| sqlite_path = sqlite_path.lstrip("/") |
…I#10) - Add .env.example template with all configurable environment variables - Add env_config.py module with typed env var helpers (load_env, env_str, env_int, env_float, env_bool, env_present) - Add validate_config.py script for configuration validation - Integrate env config into 9 Python scripts (infer_split_merge.py, infer_self_play.py, etc.) - Update README.md with Configuration section - Add .env to .gitignore - Add comprehensive unit tests Precedence: CLI args > .env > code defaults
9e8e5b7 to
9f44c67
Compare
|
Rebased onto latest main and resolved conflicts. Added namespaced env var support to avoid collisions, while keeping existing unprefixed vars as fallback for compatibility. PR is now mergeable. |
|
Looks good, thank you! |
Summary
.envfile configuration support with environment variable loading helpersvalidate_config.pyscript for configuration validationChanges
New Files
.env.exampleenv_config.pyload_env(),env_str(),env_int(),env_float(),env_bool(),env_present()helpersvalidate_config.pytests/test_env_config.pytests/test_validate_config.pyModified Files
.gitignore.envto prevent committing local configurationREADME.mdinfer_split_merge.pyinfer_self_play.pyprepare_self_play_data.pyprepare_sft_data_code.pyself_play_eval.pytest_cases_generation.pytest_cases_postprocess.pydeduplicate_problems.pyUsage
Design Decisions
CLI args > .env > code defaultspython-dotenvis not installedN_SPLITS/NUM_SPLITSare interchangeable for convenienceCloses #10