feat: add .env configuration support and validation by pi-dal · Pull Request #14 · inclusionAI/PromptCoT

pi-dal · 2026-01-26T05:25:16Z

Summary

Add .env file configuration support with environment variable loading helpers
Add validate_config.py script for configuration validation
Integrate env config into 9 entry-point Python scripts
Add comprehensive unit tests

Changes

New Files

| File | Description |
| --- | --- |
| `.env.example` | Template configuration file with all environment variables documented |
| `env_config.py` | Utility module with `load_env()`, `env_str()`, `env_int()`, `env_float()`, `env_bool()`, `env_present()` helpers |
| `validate_config.py` | Configuration validation script (checks paths, permissions, types) |
| `tests/test_env_config.py` | Unit tests for env_config module |
| `tests/test_validate_config.py` | Unit tests for validation logic |
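
For reviewers who want a feel for what "checks paths, permissions, types" could look like, here is a minimal sketch. It is not the code in `validate_config.py`: the function name `check_config` and the variables `DATA_PATH` and `OUTPUT_DIR` are assumptions made for illustration; only `NUM_WORKERS` appears in this PR's `.env.example`.

```python
import os
from typing import List, Mapping


def check_config(environ: Mapping[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems: List[str] = []

    # Path check (DATA_PATH is a hypothetical variable name).
    data_path = environ.get("DATA_PATH", "").strip()
    if data_path and not os.path.exists(data_path):
        problems.append(f"DATA_PATH does not exist: {data_path}")

    # Permission check (OUTPUT_DIR is a hypothetical variable name).
    output_dir = environ.get("OUTPUT_DIR", "").strip()
    if output_dir and os.path.isdir(output_dir) and not os.access(output_dir, os.W_OK):
        problems.append(f"OUTPUT_DIR is not writable: {output_dir}")

    # Type check: an empty value counts as unset, anything else must parse as int.
    num_workers = environ.get("NUM_WORKERS", "").strip()
    if num_workers:
        try:
            int(num_workers)
        except ValueError:
            problems.append(f"NUM_WORKERS must be an integer, got {num_workers!r}")

    return problems
```

Collecting problems into a list rather than failing on the first one lets a validation script report every issue in a single run.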

Modified Files

| File | Changes |
| --- | --- |
| `.gitignore` | Add `.env` to prevent committing local configuration |
| `README.md` | Add "Configuration" section with usage instructions |
| `infer_split_merge.py` | Integrate env config for all CLI arguments |
| `infer_self_play.py` | Integrate env config for all CLI arguments |
| `prepare_self_play_data.py` | Integrate env config for all CLI arguments |
| `prepare_sft_data_code.py` | Integrate env config for all CLI arguments |
| `self_play_eval.py` | Integrate env config for all CLI arguments |
| `test_cases_generation.py` | Integrate env config for all CLI arguments |
| `test_cases_postprocess.py` | Integrate env config for all CLI arguments |
| `deduplicate_problems.py` | Integrate env config for all CLI arguments |

Usage

# 1. Copy the example env file
cp .env.example .env

# 2. Edit with your configuration
vim .env

# 3. Validate your setup
python validate_config.py

# 4. Run scripts (will use .env values as defaults)
python infer_split_merge.py  # no need to pass --model_path if set in .env

Design Decisions

Precedence: CLI args > .env > code defaults
Optional dependency: Falls back gracefully if python-dotenv is not installed
Bidirectional fallback: N_SPLITS / NUM_SPLITS are interchangeable for convenience
Error handling: Invalid env values exit with clear messages (no stack traces)
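
The precedence rule and the optional dependency can be sketched roughly as follows. This is an illustration, not the code in this PR: the env variable name `MODEL_PATH`, the fallback value `"default-model"`, and the helper names `load_env_if_available` and `build_parser` are assumptions; `--model_path` is the flag mentioned in the usage example.

```python
import argparse
import os
from typing import Mapping, Optional

try:
    from dotenv import load_dotenv  # optional dependency (python-dotenv)
except ImportError:
    load_dotenv = None


def load_env_if_available(dotenv_path: Optional[str] = None) -> bool:
    """Load a .env file when python-dotenv is installed; otherwise do nothing."""
    if load_dotenv is None:
        return False
    # override=False keeps variables that are already set in the real environment.
    return bool(load_dotenv(dotenv_path=dotenv_path, override=False))


def build_parser(environ: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """The env value (if set and non-empty) becomes the argparse default."""
    env = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--model_path",
        default=env.get("MODEL_PATH") or "default-model",  # .env > code default
    )
    return parser
```

Because the env value is only used as the argparse default, an explicit `--model_path` on the command line always wins, which gives CLI args > .env > code defaults.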

Closes #10

gemini-code-assist · 2026-01-26T05:25:38Z

Summary of Changes

Hello @pi-dal, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's configurability by introducing a robust system for managing settings via .env files. It streamlines the setup process for users by centralizing configuration and providing a dedicated validation tool to catch common errors early. The integration across core scripts means that most operational parameters can now be managed externally, improving flexibility and maintainability, while comprehensive testing ensures the reliability of these new features.

Highlights

Environment Variable Support: Introduced comprehensive support for loading configuration from .env files, allowing users to define parameters outside of the codebase.
Configuration Validation: Added a new script, validate_config.py, to perform checks on environment variables, ensuring paths exist, permissions are correct, and types are valid, preventing runtime errors.
Integration Across Scripts: Integrated the new environment configuration system into 9 key Python entry-point scripts, enabling them to automatically pick up settings from .env files.
Unit Testing: Provided extensive unit tests for both the environment configuration loader (env_config.py) and the validation logic (validate_config.py) to ensure robustness.
Configuration Precedence: Established a clear precedence order for configuration values: Command Line Interface (CLI) arguments override .env settings, which in turn override hardcoded defaults.
Optional Dependency: The .env loading mechanism gracefully falls back if the python-dotenv library is not installed, maintaining functionality without strict dependency.
Bidirectional Fallback: Implemented a convenient bidirectional fallback for common environment variable names, such as N_SPLITS and NUM_SPLITS, to enhance user experience.
Improved Error Handling: Ensured that invalid environment variable values result in clear, user-friendly error messages and a clean exit, avoiding noisy stack traces.
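
As a rough illustration of the bidirectional fallback and the clean-exit behaviour described in the highlights, a helper along these lines would do the job. The function name `env_int_with_alias` and its signature are hypothetical and are not taken from `env_config.py`.

```python
import sys
from typing import Mapping


def env_int_with_alias(primary: str, alias: str, default: int,
                       environ: Mapping[str, str]) -> int:
    """Read an int from `primary`, falling back to `alias`, then to `default`."""
    for name in (primary, alias):
        raw = (environ.get(name) or "").strip()
        if not raw:
            continue  # unset or empty: try the next name
        try:
            return int(raw)
        except ValueError:
            # sys.exit with a string prints it to stderr and exits with status 1,
            # so the user sees a short message instead of a traceback.
            sys.exit(f"Invalid integer for {name}: {raw!r}")
    return default
```

A script that prefers `N_SPLITS` would call `env_int_with_alias("N_SPLITS", "NUM_SPLITS", 1, os.environ)`, and one that prefers `NUM_SPLITS` would swap the first two arguments, which is what makes the fallback bidirectional.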


¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a robust and well-designed configuration system using .env files, python-dotenv, and custom helper functions. The integration into existing scripts is consistent, allowing for flexible configuration precedence (CLI args > .env > code defaults). The addition of a validate_config.py script and comprehensive unit tests significantly improves the maintainability and reliability of the configuration setup. The handling of empty strings in .env as "unset" and the bidirectional fallback for N_SPLITS/NUM_SPLITS are thoughtful design choices.

gemini-code-assist · 2026-01-26T05:27:57Z

+FACTOR=1.0
+ORIGINAL_MAX_POSITION_EMBEDDINGS=131072
+EXPECTED_RUNS=1
+REASONING_EFFORT=


The REASONING_EFFORT variable is left empty. While env_str correctly interprets this as None, it would be helpful for users if the comment provided more context on its expected values or format, or explicitly stated that it's optional and can be left empty.

gemini-code-assist · 2026-01-26T05:27:57Z

+# ------------------------------
+EVAL_TYPE=code
+NUM_WORKERS=4
+MAX_ITEMS=


The MAX_ITEMS variable is left empty. Similar to REASONING_EFFORT, providing an example value or a note that it's optional would improve clarity for users.

gemini-code-assist · 2026-01-26T05:27:57Z

+    return bool(load_dotenv(dotenv_path=dotenv_path, override=override))
+
+
+def _raw_env(name: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:


The _raw_env function is a core helper for all env_* functions, handling the fetching, stripping, and None conversion of environment variables. Adding a docstring to explain its purpose and behavior would improve code clarity and maintainability.

Suggested change

 def _raw_env(name: str, environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
+    """Fetches an environment variable, strips whitespace, and treats empty strings as None."""

gemini-code-assist · 2026-01-26T05:27:58Z

+from env_config import load_env
+
+
+def _is_probably_path(value: str) -> bool:


The _is_probably_path function uses a heuristic to decide whether a string is likely a filesystem path. A docstring that briefly explains the heuristic's logic and its limitations (e.g., "This is a heuristic and might not cover all cases of remote model IDs that look like paths, or vice-versa") would improve clarity and make potential edge cases in validation easier to understand.
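
The body of `_is_probably_path` is not shown in this hunk, so the following is only a guess at the kind of heuristic being discussed. The function name `looks_like_local_path` and every rule in it are assumptions for illustration.

```python
import os


def looks_like_local_path(value: str) -> bool:
    """Hypothetical heuristic: guess whether `value` is a filesystem path.

    Limitation: a remote model ID of the form "org/name" is indistinguishable
    from a relative path, so it is treated as a path only if it exists locally.
    """
    if value.startswith(("/", "./", "../", "~")):
        return True  # explicit absolute, relative, or home-relative prefix
    if "\\" in value:
        return True  # Windows-style separators
    return os.path.exists(value)
```

The ambiguity noted in the docstring is the edge case the comment refers to: the same string can be a valid remote ID and a valid relative path.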

Suggested change

 def _is_probably_path(value: str) -> bool:
+    """Heuristic to determine if a string is likely a filesystem path rather than a remote model ID."""

gemini-code-assist · 2026-01-26T05:27:58Z

+            if db_uri.startswith("sqlite:////"):
+                # urlparse keeps the extra leading slash in path (e.g. '//var/db.sqlite3');
+                # if we feed that into sqlite3's file: URI it becomes 'file://var/..' (authority='var').
+                while sqlite_path.startswith("//"):
+                    sqlite_path = sqlite_path[1:]
+            else:
+                # sqlite:///relative.db -> parsed.path='/relative.db' (strip to make it relative)
+                sqlite_path = sqlite_path.lstrip("/")


The parsing logic for SQLite paths from DB_URI is quite intricate due to how urlparse handles different numbers of slashes for absolute vs. relative paths in file:-like URIs. Adding a detailed comment explaining why sqlite_path needs to be manipulated in this specific way (e.g., how urlparse treats sqlite:////abs/path vs sqlite:///rel/path) would greatly improve the clarity and maintainability of this section.

Suggested change

             if db_uri.startswith("sqlite:////"):
                 # urlparse keeps the extra leading slash in path (e.g. '//var/db.sqlite3');
                 # if we feed that into sqlite3's file: URI it becomes 'file://var/..' (authority='var').
+                # We need to strip leading slashes to get a correct absolute path for os.path.exists.
                 while sqlite_path.startswith("//"):
                     sqlite_path = sqlite_path[1:]
             else:
                 # sqlite:///relative.db -> parsed.path='/relative.db' (strip to make it relative)
+                # We need to strip the single leading slash to get a correct relative path.
                 sqlite_path = sqlite_path.lstrip("/")
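
The slash handling is easier to follow with the `urlparse` results spelled out. The sketch below wraps the branch from the hunk above in a standalone function; the name `sqlite_path_from_uri` is hypothetical, and the surrounding validation code is not reproduced.

```python
from urllib.parse import urlparse


def sqlite_path_from_uri(db_uri: str) -> str:
    """Extract the filesystem path from a sqlite:/// or sqlite://// URI."""
    sqlite_path = urlparse(db_uri).path
    if db_uri.startswith("sqlite:////"):
        # Four slashes mean an absolute path; urlparse returns '//var/db.sqlite3',
        # so collapse the leading slashes down to a single one.
        while sqlite_path.startswith("//"):
            sqlite_path = sqlite_path[1:]
    else:
        # Three slashes mean a relative path; urlparse returns '/relative.db',
        # so drop the leading slash.
        sqlite_path = sqlite_path.lstrip("/")
    return sqlite_path


# What urlparse itself reports for the two forms:
absolute_raw = urlparse("sqlite:////var/db.sqlite3").path  # '//var/db.sqlite3'
relative_raw = urlparse("sqlite:///relative.db").path      # '/relative.db'
```

With this function, `sqlite:////var/db.sqlite3` resolves to `/var/db.sqlite3` and `sqlite:///relative.db` resolves to `relative.db`.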

…I#10)

- Add .env.example template with all configurable environment variables
- Add env_config.py module with typed env var helpers (load_env, env_str, env_int, env_float, env_bool, env_present)
- Add validate_config.py script for configuration validation
- Integrate env config into 9 Python scripts (infer_split_merge.py, infer_self_play.py, etc.)
- Update README.md with Configuration section
- Add .env to .gitignore
- Add comprehensive unit tests

Precedence: CLI args > .env > code defaults

pi-dal · 2026-01-31T06:36:04Z

Rebased onto latest main and resolved conflicts. Added namespaced env var support to avoid collisions, while keeping existing unprefixed vars as fallback for compatibility. PR is now mergeable.
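
A namespaced lookup with an unprefixed fallback could look roughly like this. The prefix `PROMPTCOT_` and the function name `env_lookup` are assumptions; the comment above does not state the actual prefix used in the PR.

```python
from typing import Mapping, Optional


def env_lookup(name: str, environ: Mapping[str, str],
               prefix: str = "PROMPTCOT_") -> Optional[str]:
    """Prefer the namespaced variable, then fall back to the unprefixed one."""
    for key in (prefix + name, name):
        raw = (environ.get(key) or "").strip()
        if raw:
            return raw
    return None  # neither form is set (or both are empty)
```

Checking the prefixed name first avoids collisions with unrelated tools that read a generic name such as `NUM_WORKERS`, while existing `.env` files without the prefix keep working.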

cafe3310 · 2026-01-31T06:55:22Z

Looks good, thank you!

pi-dal mentioned this pull request Jan 26, 2026

Add . env Configuration Support and Validation #10

Closed

5 tasks

gemini-code-assist Bot reviewed Jan 26, 2026


zhanghuidinah mentioned this pull request Jan 30, 2026

Developer Activities: Call for Participation! oceanbase/seekdb#123

Closed

3 tasks

pi-dal force-pushed the issue/10-env-config branch 2 times, most recently from 9e8e5b7 to 9f44c67 Compare January 31, 2026 06:32

cafe3310 merged commit 22cf6c0 into inclusionAI:main Jan 31, 2026

hnwyllmm mentioned this pull request Mar 12, 2026

Developer Activities: Call for Participation! oceanbase/seekdb#252

Closed

3 tasks


cafe3310 merged 1 commit into inclusionAI:main from pi-dal:issue/10-env-config


