Bug Description
Summary
OpenViking's configuration file parser accepts unknown fields without warning or validation error. This leads to subtle misconfigurations where users accidentally misspell field names or use deprecated options, resulting in silent failures that are difficult to debug.
Problem
When a user makes a typo in their config file (e.g., "log_level" instead of "level" under the log section), the misspelled field is ignored and the setting silently falls back to its default without any warning. Users then believe they have configured something that is actually being ignored.
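A minimal sketch of how this can happen, assuming the loader reads each setting with a default. DEFAULT_LEVEL and read_log_level are hypothetical names used for illustration and are not taken from the OpenViking code:

```python
import json

# Hypothetical default, for illustration only
DEFAULT_LEVEL = "INFO"

def read_log_level(raw_config: str) -> str:
    """Return the configured log level, falling back to the default."""
    config = json.loads(raw_config)
    log_section = config.get("log", {})
    # A misspelled key is never looked up, so no error or warning is raised
    return log_section.get("level", DEFAULT_LEVEL)

correct = '{"log": {"level": "DEBUG"}}'
typo = '{"log": {"log_level": "DEBUG"}}'

print(read_log_level(correct))  # DEBUG
print(read_log_level(typo))     # INFO (the misspelled field is ignored)
```

Both inputs parse cleanly, so nothing tells the user that the second one had no effect.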
Root Cause Analysis
Code Location: openviking/config/ module
The config loading code likely uses a permissive JSON parsing approach without schema validation:
# Likely current implementation (simplified)
import json

def load_config(config_path):
    with open(config_path) as f:
        config = json.load(f)
    # Silently accepts any fields, no validation
    return config

A proper implementation should:
- Define a JSON schema for the configuration
- Validate the loaded config against the schema
- Warn about unknown fields
- Error on required fields that are missing
Steps to Reproduce
- Create a config file with a typo:

{
  "storage": {
    "workspace": "/tmp/viking"
  },
  "log": {
    "level": "DEBUG",
    "output": "stdout"
  },
  "embedding": {
    "dense": {
      "api_base": "https://api.openai.com/v1",
      "api_key": "sk-xxx",
      "provider": "openai",
      "dimention": 1024,
      "model": "text-embedding-3-large"
    }
  }
}

- Notice that "dimention" is misspelled (should be "dimension").
- Start openviking-server. It starts successfully.
- Check the logs. The config is loaded, but "dimention" is ignored and the dimension falls back to some default.
- All embedding requests fail or return unexpected results because the embedding dimension is wrong.
Expected Behavior
OpenViking should either:
- (A) Reject the config file with an error: "Unknown config field 'dimention' in embedding.dense"
- (B) At minimum, log a warning: "Ignoring unknown config field 'dimention' — did you mean 'dimension'?"
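Both options could be sketched with the standard library alone. The example below is an illustration under assumptions, not OpenViking's actual code: the KNOWN_FIELDS map lists only the fields that appear in this report's example config (the vlm section is left out because its fields are not listed here), and find_unknown_fields, check_config, and ConfigError are hypothetical names. The suggestion comes from difflib.get_close_matches.

```python
import difflib
import logging

logger = logging.getLogger("config")

class ConfigError(ValueError):
    """Hypothetical error type for rejected configs."""

# Assumed field map, limited to fields shown in this report.
# A dict value is a subsection; None marks a leaf value.
KNOWN_FIELDS = {
    "storage": {"workspace": None},
    "log": {"level": None, "output": None},
    "embedding": {
        "dense": {
            "api_base": None, "api_key": None, "provider": None,
            "dimension": None, "model": None,
        }
    },
}

def find_unknown_fields(config, known=KNOWN_FIELDS, path=""):
    """Return one message per field that is missing from the known-field map."""
    messages = []
    for key, value in config.items():
        if key not in known:
            where = path or "top level"
            message = f"Unknown config field '{key}' in {where}"
            close = difflib.get_close_matches(key, list(known), n=1)
            if close:
                message += f" (did you mean '{close[0]}'?)"
            messages.append(message)
        elif isinstance(value, dict) and isinstance(known[key], dict):
            # Recurse with the subsection's own field map
            child = f"{path}.{key}" if path else key
            messages.extend(find_unknown_fields(value, known[key], child))
    return messages

def check_config(config, strict=False):
    """Option (A) when strict is True, option (B) otherwise."""
    messages = find_unknown_fields(config)
    if strict and messages:
        raise ConfigError("; ".join(messages))
    for message in messages:
        logger.warning(message)
    return messages

print(check_config({"embedding": {"dense": {"dimention": 1024}}}))
# ["Unknown config field 'dimention' in embedding.dense (did you mean 'dimension'?)"]
```

Keeping the field map nested means each section is checked only against its own fields, so a valid nested key such as "workspace" is not reported as unknown at the top level.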
Actual Behavior
The config loader silently ignores the misspelled field, so the misconfiguration goes unnoticed until something fails downstream.
Environment
- OS: Ubuntu 22.04
- Python: 3.12
- OpenViking: 0.2.9
Impact
- Medium severity — causes confusing debugging sessions
- Affects all config sections (storage, log, embedding, vlm)
- Particularly dangerous for:
  - dimension/dimention typo in embedding (causes embedding/retrieval failures)
  - max_concurrent/max_concurent typo (causes resource limit issues)
  - api_base/apiBase inconsistency (causes connection failures)
Proposed Fix
Implement config validation using JSON Schema or similar:
import json

import jsonschema

class ConfigValidationError(Exception):
    """Raised when the config file fails schema validation."""

CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "storage": {
            "type": "object",
            "properties": {
                "workspace": {"type": "string"},
            },
            "required": ["workspace"],
            "additionalProperties": False,
        },
        "log": {
            "type": "object",
            "properties": {
                "level": {"enum": ["DEBUG", "INFO", "WARNING", "ERROR"]},
                "output": {"enum": ["stdout", "file"]},
            },
            "additionalProperties": False,
        },
        # ... etc
    },
    "additionalProperties": False,  # Reject unknown top-level fields
}

def load_config(path: str) -> dict:
    with open(path) as f:
        config = json.load(f)
    try:
        jsonschema.validate(config, CONFIG_SCHEMA)
    except jsonschema.ValidationError as e:
        # Provide helpful error message
        raise ConfigValidationError(f"Invalid config: {e.message}") from e
    return config

Alternative Approach
If full schema validation is too restrictive for forward compatibility, at minimum log warnings for unknown fields:
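import logging

logger = logging.getLogger(__name__)
# This flat set is checked at every nesting level, so it must also list nested field names
KNOWN_FIELDS = {"storage", "log", "embedding", "vlm", ...}

def warn_unknown_fields(config: dict, path: str = ""):
    for key in config:
        if key not in KNOWN_FIELDS:
            # suggest_similar is a helper still to be written (difflib.get_close_matches could back it)
            logger.warning(f"Unknown config field '{key}' at '{path}', ignoring. Did you mean one of: {suggest_similar(key, KNOWN_FIELDS)}?")
        if isinstance(config[key], dict):
            warn_unknown_fields(config[key], f"{path}.{key}")

Labels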
bug, config, validation