Standardize credentials and configuration passing for Python components #1149

@junhaoliao

Request

Currently, the start_clp.py script uses inconsistent patterns for passing configuration and
credentials to containers, leading to complex merging logic and security concerns. The main issues
are:

  1. Ad-hoc configuration merging: Many components receive a temporary YAML file that is a merged
    result of parts of clp-config.yml and credentials.yml. This requires logic within the
    start_clp.py script to perform the merge and generates numerous temporary files.
  2. Non-uniform configuration subsets and credential handling: The approach differs across components.
    • clp-db-table-creator receives only a specific section (e.g., a db-config.yml containing just
      the fields of the database section of clp-config.yml, hoisted to the top level) rather than
      a standard full configuration object.
    • Credential passing is inconsistent: some components, as mentioned above, receive merged
      credentials, while clp-webui already uses environment variables.

We shall standardize configuration and credential delivery to improve consistency and security by:

  • Separating credentials from general configuration.
    • Passing all credentials via environment variables, rather than writing them to temporary files where they could leak.
  • Using a single generated configuration file for all non-credential configuration data, instead of generating a config file per component.
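
As a rough illustration of the separation described above, the launcher could strip credential fields out of the merged configuration and collect them as environment variables before writing the single generated file. This is only a sketch under stated assumptions: the `database` / `username` / `password` key names and the `split_credentials` helper are hypothetical, and only the `CLP_DB_USER` and `CLP_DB_PASS` names come from this proposal.

```python
import copy
from typing import Dict, Tuple

# Hypothetical mapping from (section, key) in the loaded config to an env var name.
# CLP_DB_USER / CLP_DB_PASS are the names proposed in this issue; the section and
# key names are assumptions made for illustration.
CREDENTIAL_ENV_VARS = {
    ("database", "username"): "CLP_DB_USER",
    ("database", "password"): "CLP_DB_PASS",
}


def split_credentials(config: dict) -> Tuple[dict, Dict[str, str]]:
    """Return (config without credential fields, env vars holding the credentials)."""
    sanitized = copy.deepcopy(config)  # leave the caller's config untouched
    env: Dict[str, str] = {}
    for (section, key), env_name in CREDENTIAL_ENV_VARS.items():
        value = sanitized.get(section, {}).pop(key, None)
        if value is not None:
            env[env_name] = str(value)
    return sanitized, env


if __name__ == "__main__":
    merged = {
        "database": {"host": "localhost", "username": "clp-user", "password": "secret"},
        "queue": {"port": 5672},
    }
    generated_config, container_env = split_credentials(merged)
    # generated_config is what would be written to clp-config-generated.yml;
    # container_env is what would be passed to each container's environment.
    print(generated_config)
    print(sorted(container_env))
```

With this shape, the same sanitized dictionary can be serialized once and mounted into every container, while the credentials never touch disk.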

Possible implementation

  1. Modify the CLPConfig class to include a generated_config_file_path field that stores the path to the generated config file.

  2. Update the start_clp.py script to:

    • Generate a single clp-config-generated.yml file containing all non-credential configuration
    • Pass database credentials via environment variables (CLP_DB_USER, CLP_DB_PASS) instead of embedding them in temporary files
  3. Standardize configuration passing for Python components:

    1. Modify the clp-db-table-creator components to:
      • Read their configuration from the generated clp-config-generated.yml file instead of separate db-config files
      • Load database credentials from environment variables
    2. Modify the following Python components started by start_clp.py to load their non-credential configuration from clp-config-generated.yml and read their credentials from environment variables:
      • clp-db-table-creator (composed of components/clp-py-utils/clp_py_utils/create-db-tables.py, initialize-clp-metadata-db.py, and initialize-orchestration-db.py)
      • clp-compression-scheduler (components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py)
      • clp-query-scheduler (components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py)
      • clp-compression-worker (components/job-orchestration/job_orchestration/executor/compress/compression_task.py)
      • clp-query-worker (components/job-orchestration/job_orchestration/executor/query/fs_search_task.py and extract_stream_task.py)
      • clp-reducer (components/job-orchestration/job_orchestration/reducer/reducer.py)
  4. Update all other affected Python components and their wrappers to read credentials from environment variables rather than from configuration files, including:

    • compress.sh
    • decompress.sh
    • search.sh
    • admin-tools/archive-manager.sh
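
On the component side, reading the credentials could look roughly like the sketch below. It assumes the `CLP_DB_USER` and `CLP_DB_PASS` variables named in step 2; the `DbCredentials` class and `load_db_credentials` function are hypothetical names for illustration, not existing clp_py_utils APIs.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Env var names taken from this proposal.
DB_USER_ENV_VAR = "CLP_DB_USER"
DB_PASS_ENV_VAR = "CLP_DB_PASS"


@dataclass(frozen=True)
class DbCredentials:
    """Hypothetical container for database credentials read from the environment."""

    username: str
    password: str


def load_db_credentials(environ: Mapping[str, str] = os.environ) -> DbCredentials:
    """Read database credentials from `environ`, failing loudly if any are unset or empty."""
    missing = [name for name in (DB_USER_ENV_VAR, DB_PASS_ENV_VAR) if not environ.get(name)]
    if missing:
        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
    return DbCredentials(
        username=environ[DB_USER_ENV_VAR],
        password=environ[DB_PASS_ENV_VAR],
    )
```

Accepting the environment as a parameter keeps the function easy to test without mutating `os.environ`. Failing early with the names of the missing variables (never their values) would make a misconfigured container easier to diagnose.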

Metadata


Labels: enhancement (New feature or request)
