[Bug]: Config file validation silently ignores unknown fields, causing subtle misconfigurations #855

@Financier-Nuri

Bug Description

Summary

OpenViking's configuration file parser accepts unknown fields without warning or validation error. This leads to subtle misconfigurations where users accidentally misspell field names or use deprecated options, resulting in silent failures that are difficult to debug.

Problem

When a user makes a typo in their config file (e.g., "log_level" instead of "level" under the log section), the misspelled key is ignored and the real setting falls back to its default, with no warning. Users end up believing they have configured something that is actually being ignored.

Root Cause Analysis

Code Location: openviking/config/ module

The config loading code likely uses a permissive JSON parsing approach without schema validation:

# Likely current implementation (simplified)
import json

def load_config(config_path):
    with open(config_path) as f:
        config = json.load(f)

    # Silently accepts any fields, no validation
    return config

A proper implementation should:

  1. Define a JSON schema for the configuration
  2. Validate the loaded config against the schema
  3. Warn about unknown fields
  4. Error on required fields that are missing
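
The four steps above can be sketched with the standard library alone. This is only an illustration, not OpenViking's code: `SCHEMA`, `REQUIRED`, `ConfigError`, and `check_config` are hypothetical names, and the nested-dict "schema" is a simplified stand-in for a real JSON Schema that covers only the fields shown in this issue.

```python
import json
import warnings

# Hypothetical mini-schema: field name -> nested schema (dict) or None for a leaf.
SCHEMA = {
    "storage": {"workspace": None},
    "log": {"level": None, "output": None},
}
# Hypothetical required fields, written as dotted paths.
REQUIRED = ["storage.workspace"]


class ConfigError(ValueError):
    """Raised when a required field is missing."""


def find_unknown(config, schema, prefix=""):
    """Return dotted paths of keys that the schema does not define."""
    unknown = []
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if key not in schema:
            unknown.append(path)
        elif isinstance(value, dict) and isinstance(schema[key], dict):
            unknown.extend(find_unknown(value, schema[key], path))
    return unknown


def has_path(config, dotted):
    """True if every segment of the dotted path exists in the config."""
    node = config
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return False
        node = node[part]
    return True


def check_config(text):
    config = json.loads(text)
    # Warn about unknown fields instead of dropping them silently.
    for path in find_unknown(config, SCHEMA):
        warnings.warn(f"Unknown config field '{path}'")
    # Fail hard when a required field is missing.
    missing = [p for p in REQUIRED if not has_path(config, p)]
    if missing:
        raise ConfigError(f"Missing required config field(s): {', '.join(missing)}")
    return config
```

Unknown fields only produce warnings here, while missing required fields raise, which mirrors the split between items 3 and 4.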

Steps to Reproduce

  1. Create a config file with a typo:
{
  "storage": {
    "workspace": "/tmp/viking"
  },
  "log": {
    "level": "DEBUG",
    "output": "stdout"
  },
  "embedding": {
    "dense": {
      "api_base": "https://api.openai.com/v1",
      "api_key": "sk-xxx",
      "provider": "openai",
      "dimention": 1024,
      "model": "text-embedding-3-large"
    }
  }
}
  2. Notice dimention is misspelled (should be dimension)

  3. Start openviking-server: it starts successfully

  4. Check logs: config is loaded but dimention is ignored, and dimension falls back to some default

  5. All embedding requests fail or return unexpected results because the embedding dimension is wrong
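
The silent fallback in these steps is easy to reproduce in isolation. The snippet below is a guess at the pattern involved, not OpenViking's actual lookup code: `read_dimension` is a hypothetical helper, and `ASSUMED_DEFAULT_DIMENSION` is a made-up value, since the real default is not stated in this report.

```python
import json

# Made-up value for illustration; the real default is not stated in this issue.
ASSUMED_DEFAULT_DIMENSION = 512

raw = '{"embedding": {"dense": {"provider": "openai", "dimention": 1024}}}'


def read_dimension(config: dict) -> int:
    """Permissive lookup: a missing key quietly yields the default."""
    dense = config.get("embedding", {}).get("dense", {})
    return dense.get("dimension", ASSUMED_DEFAULT_DIMENSION)


config = json.loads(raw)

# The misspelled key is present in the parsed data but never read,
# so the lookup returns the default instead of the intended 1024.
print("dimention" in config["embedding"]["dense"])
print(read_dimension(config))
```

Nothing in this path raises or logs, which is why the server starts cleanly and the problem only surfaces later in embedding requests.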

Expected Behavior

OpenViking should either:

  • (A) Reject the config file with an error: "Unknown config field 'dimention' in embedding.dense"
  • (B) At minimum, log a warning: "Ignoring unknown config field 'dimention' — did you mean 'dimension'?"
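
A "did you mean" hint like the one in option (B) can be produced with `difflib.get_close_matches` from the standard library. In this sketch `unknown_field_message` is a hypothetical helper, and `DENSE_FIELDS` is simply the set of keys from the example config above (with the typo corrected), not an authoritative list of supported options.

```python
import difflib

# Keys taken from the example config in this issue; not an authoritative list.
DENSE_FIELDS = ["api_base", "api_key", "provider", "dimension", "model"]


def unknown_field_message(key, known_fields, section):
    """Build a warning for an unknown key, with a suggestion when one is close."""
    matches = difflib.get_close_matches(key, known_fields, n=1)
    message = f"Ignoring unknown config field '{key}' in {section}"
    if matches:
        message += f" (did you mean '{matches[0]}'?)"
    return message


print(unknown_field_message("dimention", DENSE_FIELDS, "embedding.dense"))
print(unknown_field_message("apiBase", DENSE_FIELDS, "embedding.dense"))
```

With the default similarity cutoff, both the dimention typo and the apiBase casing variant resolve to their snake_case counterparts, while a key with no close match gets the plain message with no suggestion.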

Actual Behavior

The misspelled field is ignored without any warning or error, so the setting the user intended never takes effect.

Environment

  • OS: Ubuntu 22.04
  • Python: 3.12
  • OpenViking: 0.2.9

Impact

  • Medium severity — causes confusing debugging sessions
  • Affects all config sections (storage, log, embedding, vlm)
  • Particularly dangerous for:
    • dimension / dimention typo in embedding (causes embedding/retrieval failures)
    • max_concurrent / max_concurent typo (causes resource limit issues)
    • api_base / apiBase inconsistency (causes connection failures)

Proposed Fix

Implement config validation using JSON Schema or similar:

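import json

import jsonschema


class ConfigValidationError(ValueError):
    """Raised when the config file does not match CONFIG_SCHEMA."""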

CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "storage": {
            "type": "object",
            "properties": {
                "workspace": {"type": "string"},
            },
            "required": ["workspace"],
            "additionalProperties": False,
        },
        "log": {
            "type": "object",
            "properties": {
                "level": {"enum": ["DEBUG", "INFO", "WARNING", "ERROR"]},
                "output": {"enum": ["stdout", "file"]},
            },
            "additionalProperties": False,
        },
        # ... etc
    },
    "additionalProperties": False,  # Reject unknown top-level fields
}

def load_config(path: str) -> dict:
    with open(path) as f:
        config = json.load(f)
    
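    try:
        jsonschema.validate(config, CONFIG_SCHEMA)
    except jsonschema.ValidationError as e:
        # Report where the problem is (e.g. "log") alongside the message
        location = ".".join(str(p) for p in e.absolute_path) or "<root>"
        raise ConfigValidationError(f"Invalid config at {location}: {e.message}") from e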
    
    return config

Alternative Approach

If full schema validation is too restrictive for forward compatibility, at minimum log warnings for unknown fields:

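import difflib
import logging

logger = logging.getLogger(__name__)
# Known fields per nesting level; None means "not checked further". Sub-fields of embedding and vlm still need filling in.
KNOWN_FIELDS = {"storage": {"workspace": None}, "log": {"level": None, "output": None}, "embedding": None, "vlm": None}

def warn_unknown_fields(config: dict, known: dict = KNOWN_FIELDS, path: str = ""):
    for key, value in config.items():
        where = f"{path}.{key}" if path else key
        if key not in known:
            hint = difflib.get_close_matches(key, list(known), n=1)
            logger.warning(f"Unknown config field '{where}', ignoring." + (f" Did you mean '{hint[0]}'?" if hint else ""))
        elif isinstance(value, dict) and isinstance(known[key], dict):
            warn_unknown_fields(value, known[key], where)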

Labels

bug, config, validation
