
feat: add .env configuration support and validation #14

Merged: cafe3310 merged 1 commit into inclusionAI:main from pi-dal:issue/10-env-config on Jan 31, 2026

pi-dal commented Jan 26, 2026

Contributor

Summary

  • Add .env file configuration support with environment variable loading helpers
  • Add validate_config.py script for configuration validation
  • Integrate env config into 9 entry-point Python scripts
  • Add comprehensive unit tests

Changes

New Files

| File | Description |
| --- | --- |
| .env.example | Template configuration file with all environment variables documented |
| env_config.py | Utility module with load_env(), env_str(), env_int(), env_float(), env_bool(), env_present() helpers |
| validate_config.py | Configuration validation script (checks paths, permissions, types) |
| tests/test_env_config.py | Unit tests for env_config module |
| tests/test_validate_config.py | Unit tests for validation logic |
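
To illustrate the kind of path and permission check described for validate_config.py, here is a minimal sketch. It is not the script's actual code: `check_readable_dir` and the `OUTPUT_DIR` variable name are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def check_readable_dir(name, value):
    """Return a list of problems for a directory-valued setting (hypothetical check)."""
    problems = []
    if not os.path.isdir(value):
        problems.append(f"{name}: directory does not exist: {value}")
    elif not os.access(value, os.R_OK):
        problems.append(f"{name}: directory is not readable: {value}")
    return problems


# Usage: an existing directory passes, a missing one yields one problem.
with tempfile.TemporaryDirectory() as tmp:
    ok = check_readable_dir("OUTPUT_DIR", tmp)  # OUTPUT_DIR is an assumed name
    missing = check_readable_dir("OUTPUT_DIR", os.path.join(tmp, "nope"))
```

Collecting problems in a list rather than raising on the first one lets a validator report every misconfigured variable in a single run.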

Modified Files

| File | Changes |
| --- | --- |
| .gitignore | Add .env to prevent committing local configuration |
| README.md | Add "Configuration" section with usage instructions |
| infer_split_merge.py | Integrate env config for all CLI arguments |
| infer_self_play.py | Integrate env config for all CLI arguments |
| prepare_self_play_data.py | Integrate env config for all CLI arguments |
| prepare_sft_data_code.py | Integrate env config for all CLI arguments |
| self_play_eval.py | Integrate env config for all CLI arguments |
| test_cases_generation.py | Integrate env config for all CLI arguments |
| test_cases_postprocess.py | Integrate env config for all CLI arguments |
| deduplicate_problems.py | Integrate env config for all CLI arguments |

Usage

# 1. Copy the example env file
cp .env.example .env

# 2. Edit with your configuration
vim .env

# 3. Validate your setup
python validate_config.py

# 4. Run scripts (will use .env values as defaults)
python infer_split_merge.py  # no need to pass --model_path if set in .env
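
Step 4 relies on .env values acting as defaults for CLI arguments. A minimal sketch of how that can work with argparse follows. It assumes an environment variable named `MODEL_PATH` backing `--model_path`; `build_parser` and the default path are hypothetical and not taken from the PR.

```python
import argparse
import os


def build_parser(environ=None):
    """Build a parser whose defaults come from the environment (hypothetical helper)."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # MODEL_PATH is an assumed variable name; an empty value is treated as unset.
    parser.add_argument(
        "--model_path",
        default=env.get("MODEL_PATH", "").strip() or "/models/default",
    )
    return parser


# Precedence: CLI args > .env > code defaults
env = {"MODEL_PATH": "/models/from-env"}
from_env = build_parser(env).parse_args([])
from_cli = build_parser(env).parse_args(["--model_path", "/models/from-cli"])
from_default = build_parser({}).parse_args([])
```

Because the environment only supplies the argparse default, an explicit flag on the command line always wins.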

Design Decisions

  • Precedence: CLI args > .env > code defaults
  • Optional dependency: Falls back gracefully if python-dotenv is not installed
  • Bidirectional fallback: N_SPLITS / NUM_SPLITS are interchangeable for convenience
  • Error handling: Invalid env values exit with clear messages (no stack traces)
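
The optional-dependency and bidirectional-fallback decisions above can be sketched as follows. This is an illustration under assumptions, not the code in env_config.py: `load_env_optional` and `first_set` are hypothetical names, and only the `load_dotenv(dotenv_path=..., override=...)` call is taken from the PR.

```python
import os


def load_env_optional(dotenv_path=".env", override=False):
    """Load a .env file if python-dotenv is installed; otherwise do nothing."""
    try:
        from dotenv import load_dotenv  # optional dependency
    except ImportError:
        return False  # fall back to whatever is already in os.environ
    return bool(load_dotenv(dotenv_path=dotenv_path, override=override))


def first_set(names, environ=None):
    """Return the first non-empty value among several interchangeable names."""
    env = os.environ if environ is None else environ
    for name in names:
        value = (env.get(name) or "").strip()
        if value:
            return value
    return None


# Either spelling may be set; each reader lists its preferred name first.
n_splits = first_set(["N_SPLITS", "NUM_SPLITS"], {"NUM_SPLITS": "8"})
num_splits = first_set(["NUM_SPLITS", "N_SPLITS"], {"N_SPLITS": "4"})
```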

Closes #10

@gemini-code-assist


Summary of Changes

Hello @pi-dal, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's configurability by introducing a robust system for managing settings via .env files. It streamlines the setup process for users by centralizing configuration and providing a dedicated validation tool to catch common errors early. The integration across core scripts means that most operational parameters can now be managed externally, improving flexibility and maintainability, while comprehensive testing ensures the reliability of these new features.

Highlights

  • Environment Variable Support: Introduced comprehensive support for loading configuration from .env files, allowing users to define parameters outside of the codebase.
  • Configuration Validation: Added a new script, validate_config.py, to perform checks on environment variables, ensuring paths exist, permissions are correct, and types are valid, preventing runtime errors.
  • Integration Across Scripts: Integrated the new environment configuration system into 9 key Python entry-point scripts, enabling them to automatically pick up settings from .env files.
  • Unit Testing: Provided extensive unit tests for both the environment configuration loader (env_config.py) and the validation logic (validate_config.py) to ensure robustness.
  • Configuration Precedence: Established a clear precedence order for configuration values: Command Line Interface (CLI) arguments override .env settings, which in turn override hardcoded defaults.
  • Optional Dependency: The .env loading mechanism gracefully falls back if the python-dotenv library is not installed, maintaining functionality without strict dependency.
  • Bidirectional Fallback: Implemented a convenient bidirectional fallback for common environment variable names, such as N_SPLITS and NUM_SPLITS, to enhance user experience.
  • Improved Error Handling: Ensured that invalid environment variable values result in clear, user-friendly error messages and a clean exit, avoiding noisy stack traces.
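
As an illustration of the clean-exit behavior in the last highlight, here is a minimal sketch of an integer helper that reports an invalid value without a traceback. The name `env_int_or_exit` and the message wording are hypothetical; the PR's own env_int() may behave differently.

```python
import sys


def env_int_or_exit(name, default, environ):
    """Parse an integer env value; exit with a short message if it is invalid."""
    raw = (environ.get(name) or "").strip()
    if not raw:
        return default  # empty or missing counts as unset
    try:
        return int(raw)
    except ValueError:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"Invalid value for {name}: {raw!r} (expected an integer)")


workers = env_int_or_exit("NUM_WORKERS", 1, {"NUM_WORKERS": "4"})
```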


gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a robust and well-designed configuration system using .env files, python-dotenv, and custom helper functions. The integration into existing scripts is consistent, allowing for flexible configuration precedence (CLI args > .env > code defaults). The addition of a validate_config.py script and comprehensive unit tests significantly improves the maintainability and reliability of the configuration setup. The handling of empty strings in .env as "unset" and the bidirectional fallback for N_SPLITS/NUM_SPLITS are thoughtful design choices.

Comment thread .env.example
FACTOR=1.0
ORIGINAL_MAX_POSITION_EMBEDDINGS=131072
EXPECTED_RUNS=1
REASONING_EFFORT=


medium

The REASONING_EFFORT variable is left empty. While env_str correctly interprets this as None, it would be helpful for users if the comment provided more context on its expected values or format, or explicitly stated that it's optional and can be left empty.

Comment thread .env.example
# ------------------------------
EVAL_TYPE=code
NUM_WORKERS=4
MAX_ITEMS=


medium

The MAX_ITEMS variable is left empty. Similar to REASONING_EFFORT, providing an example value or a note that it's optional would improve clarity for users.

Comment thread env_config.py
    return bool(load_dotenv(dotenv_path=dotenv_path, override=override))


def _raw_env(name: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:


medium

The _raw_env function is a core helper for all env_* functions, handling the fetching, stripping, and None conversion of environment variables. Adding a docstring to explain its purpose and behavior would improve code clarity and maintainability.

Suggested change
- def _raw_env(name: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
+ def _raw_env(name: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
+     """Fetches an environment variable, strips whitespace, and treats empty strings as None."""

Comment thread validate_config.py
from env_config import load_env


def _is_probably_path(value: str) -> bool:


medium

The _is_probably_path function uses a heuristic to determine if a string is likely a filesystem path. While practical, it's a heuristic. Adding a docstring to briefly explain the heuristic's logic and its potential limitations (e.g., "This is a heuristic and might not cover all cases of remote model IDs that look like paths, or vice-versa") would be beneficial for clarity and to understand potential edge cases in validation.

Suggested change
- def _is_probably_path(value: str) -> bool:
+ def _is_probably_path(value: str) -> bool:
+     """Heuristic to determine if a string is likely a filesystem path rather than a remote model ID."""
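
The PR's actual heuristic is not shown in this thread. As a purely hypothetical illustration of what such a check and its limitations might look like, consider the sketch below; `looks_like_path` and its rules are assumptions, not the code in validate_config.py.

```python
import os


def looks_like_path(value: str) -> bool:
    """Hypothetical heuristic: does this string look like a local filesystem path?"""
    # Explicit path prefixes are a strong signal.
    if value.startswith(("/", "./", "../", "~")):
        return True
    # Otherwise, treat the value as a path only if it exists on disk.
    return os.path.exists(value)


absolute = looks_like_path("/data/models/foo")
relative = looks_like_path("./checkpoints")
remote_id = looks_like_path("some-org/some-model-that-does-not-exist-xyz")
```

Under these assumed rules, a relative path such as `models/foo` that has not been created yet is indistinguishable from a remote ID of the form `org/name`, which is the kind of edge case the suggested docstring would warn about.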

Comment thread validate_config.py
Comment on lines +126 to +133
if db_uri.startswith("sqlite:////"):
    # urlparse keeps the extra leading slash in path (e.g. '//var/db.sqlite3');
    # if we feed that into sqlite3's file: URI it becomes 'file://var/..' (authority='var').
    while sqlite_path.startswith("//"):
        sqlite_path = sqlite_path[1:]
else:
    # sqlite:///relative.db -> parsed.path='/relative.db' (strip to make it relative)
    sqlite_path = sqlite_path.lstrip("/")


medium

The parsing logic for SQLite paths from DB_URI is quite intricate due to how urlparse handles different numbers of slashes for absolute vs. relative paths in file:-like URIs. Adding a detailed comment explaining why sqlite_path needs to be manipulated in this specific way (e.g., how urlparse treats sqlite:////abs/path vs sqlite:///rel/path) would greatly improve the clarity and maintainability of this section.

Suggested change
- if db_uri.startswith("sqlite:////"):
-     # urlparse keeps the extra leading slash in path (e.g. '//var/db.sqlite3');
-     # if we feed that into sqlite3's file: URI it becomes 'file://var/..' (authority='var').
-     while sqlite_path.startswith("//"):
-         sqlite_path = sqlite_path[1:]
- else:
-     # sqlite:///relative.db -> parsed.path='/relative.db' (strip to make it relative)
-     sqlite_path = sqlite_path.lstrip("/")
+ if db_uri.startswith("sqlite:////"):
+     # urlparse keeps the extra leading slash in path (e.g. '//var/db.sqlite3');
+     # if we feed that into sqlite3's file: URI it becomes 'file://var/..' (authority='var').
+     # We need to strip leading slashes to get a correct absolute path for os.path.exists.
+     while sqlite_path.startswith("//"):
+         sqlite_path = sqlite_path[1:]
+ else:
+     # sqlite:///relative.db -> parsed.path='/relative.db' (strip to make it relative)
+     # We need to strip the single leading slash to get a correct relative path.
+     sqlite_path = sqlite_path.lstrip("/")
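
The urlparse behavior discussed in this thread can be checked in isolation. The sketch below wraps the same normalization in a standalone function; the name `sqlite_path_from_uri` is hypothetical, and the surrounding validation code is omitted.

```python
from urllib.parse import urlparse


def sqlite_path_from_uri(db_uri: str) -> str:
    """Extract a filesystem path from a sqlite:/// or sqlite://// URI."""
    sqlite_path = urlparse(db_uri).path
    if db_uri.startswith("sqlite:////"):
        # Four slashes: urlparse yields '//var/db.sqlite3'; collapse to one leading '/'.
        while sqlite_path.startswith("//"):
            sqlite_path = sqlite_path[1:]
    else:
        # Three slashes: urlparse yields '/relative.db'; drop the slash to keep it relative.
        sqlite_path = sqlite_path.lstrip("/")
    return sqlite_path


raw_abs = urlparse("sqlite:////var/db.sqlite3").path
raw_rel = urlparse("sqlite:///relative.db").path
abs_path = sqlite_path_from_uri("sqlite:////var/db.sqlite3")
rel_path = sqlite_path_from_uri("sqlite:///relative.db")
```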

…I#10)

- Add .env.example template with all configurable environment variables
- Add env_config.py module with typed env var helpers (load_env, env_str, env_int, env_float, env_bool, env_present)
- Add validate_config.py script for configuration validation
- Integrate env config into 9 Python scripts (infer_split_merge.py, infer_self_play.py, etc.)
- Update README.md with Configuration section
- Add .env to .gitignore
- Add comprehensive unit tests

Precedence: CLI args > .env > code defaults

pi-dal force-pushed the issue/10-env-config branch 2 times, most recently from 9e8e5b7 to 9f44c67 on January 31, 2026 06:32

pi-dal commented Jan 31, 2026

Contributor Author

Rebased onto latest main and resolved conflicts. Added namespaced env var support to avoid collisions, while keeping existing unprefixed vars as fallback for compatibility. PR is now mergeable.

@cafe3310

Contributor

Looks good, thank you!

cafe3310 merged commit 22cf6c0 into inclusionAI:main on Jan 31, 2026

Development

Successfully merging this pull request may close these issues.

Add .env Configuration Support and Validation

2 participants