Request
Currently, the start_clp.py script uses inconsistent patterns for passing configuration and
credentials to containers, leading to complex merging logic and security concerns. The main issues
are:
- Ad-hoc configuration merging: Many components receive a temporary YAML file that is a merged
result of parts of clp-config.yml and credentials.yml. This requires logic within the
start_clp.py script to perform the merge and generates numerous temporary files.
- Non-uniform configuration subsets & credential handling: The approach differs across components.
  - clp-db-creator receives only a specific section (e.g., a db-config.yml with all fields at
    the top-level from just the database section of clp-config.yml) rather than a standard
    full configuration object.
  - Credential passing is inconsistent: some components, as mentioned above, get merged credentials,
    while clp-webui already uses environment variables.
We shall standardize configuration and credential delivery to improve consistency and security by:
- Separating credentials from general configuration.
- Passing all credentials via environment variables to avoid leaking them in temporary files.
- Using a single generated configuration file for all non-credential configuration data, instead of generating config files per component.
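A minimal sketch of the separation described above, assuming the merged configuration is available as a plain dictionary. The `database`, `username`, and `password` keys and the `split_db_credentials` helper are hypothetical and only illustrate the idea; the `CLP_DB_USER` and `CLP_DB_PASS` names come from the proposal in this issue.

```python
import copy
from typing import Any, Dict, Tuple

# Hypothetical credential keys; the real field names in clp-config.yml and
# credentials.yml may differ. The env var names are the ones proposed here.
DB_CREDENTIAL_ENV_VARS = {"username": "CLP_DB_USER", "password": "CLP_DB_PASS"}


def split_db_credentials(config: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Return (config without DB credentials, env vars carrying the DB credentials)."""
    sanitized = copy.deepcopy(config)  # leave the caller's dict untouched
    env: Dict[str, str] = {}
    database = sanitized.get("database", {})
    for key, env_name in DB_CREDENTIAL_ENV_VARS.items():
        if key in database:
            # Move the secret out of the config and into the environment mapping.
            env[env_name] = str(database.pop(key))
    return sanitized, env


# Example input standing in for today's merged temporary YAML file.
merged = {
    "database": {"host": "localhost", "port": 3306, "username": "clp-user", "password": "s3cret"},
    "logs_directory": "var/log",
}
generated_config, container_env = split_db_credentials(merged)
```

Under this sketch, `generated_config` is what would be written once to the single generated configuration file, and `container_env` is what would be handed to each container through its environment.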
Possible implementation
- Modify CLPConfig class to include a generated_config_file_path field to store the path to the generated config file.
- Update start_clp.py script to:
  - Generate a single clp-config-generated.yml file containing all non-credential configuration
  - Pass database credentials via environment variables (CLP_DB_USER, CLP_DB_PASS) instead of embedding them in temporary files
- Standardize configuration passing for Python components:
  - Modify the clp-db-table-creator components to:
    - Read their configuration from the generated clp-config-generated.yml file instead of separate db-config files
    - Load database credentials from environment variables
  - Modify the following Python components in start_clp to:
    - Load their non-credential configuration from clp-config-generated.yml
    - Read credentials from environment variables:
      - clp-db-table-creator (composed of components/clp-py-utils/clp_py_utils/create-db-tables.py, initialize-clp-metadata-db.py, and initialize-orchestration-db.py)
      - clp-compression-scheduler (components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py)
      - clp-query-scheduler (components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py)
      - clp-compression-worker (components/job-orchestration/job_orchestration/executor/compress/compression_task.py)
      - clp-query-worker (components/job-orchestration/job_orchestration/executor/query/fs_search_task.py and extract_stream_task.py)
      - clp-reducer (components/job-orchestration/job_orchestration/reducer/reducer.py)
- Update all other affected Python components and their wrappers to read credentials from environment variables rather than from configuration files, which includes:
  - compress.sh
  - decompress.sh
  - search.sh
  - admin-tools/archive-manager.sh
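On the component side, reading the credentials could look roughly like the sketch below. It assumes only the `CLP_DB_USER` and `CLP_DB_PASS` variables named above; the `DbCredentials` class and `load_db_credentials` function are hypothetical names for illustration, not existing CLP APIs.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DbCredentials:
    """Hypothetical container for credentials read from the environment."""

    username: str
    password: str


def load_db_credentials(env: Mapping[str, str] = os.environ) -> DbCredentials:
    """Read CLP_DB_USER and CLP_DB_PASS, failing fast if either is unset."""
    missing = [name for name in ("CLP_DB_USER", "CLP_DB_PASS") if name not in env]
    if missing:
        raise ValueError(f"Missing required environment variables: {', '.join(missing)}")
    return DbCredentials(username=env["CLP_DB_USER"], password=env["CLP_DB_PASS"])


# Usage with an explicit mapping; a component would normally rely on os.environ.
creds = load_db_credentials({"CLP_DB_USER": "clp-user", "CLP_DB_PASS": "s3cret"})
```

Accepting the environment as a parameter keeps the function easy to test, and raising on a missing variable surfaces a misconfigured container at startup rather than at the first database call.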