Request
The current CLP package flow doesn't handle the default dataset properly.
- On the compression script end, we set the dataset to "default" if not set (the same line appears in both compress.py and compress_from_s3.py):
    dataset = CLP_DEFAULT_DATASET_NAME if dataset is None else dataset
- The native compression script accepts an optional dataset without further checking:
    args_parser.add_argument(
- In the compression job config, the dataset is also nullable (declared twice in job_config.py):
    dataset: str | None = None
- In the compression job executor, the dataset is nullable, but it is used without being checked:
    archive_output_dir = archive_output_dir / dataset
The flow works well if everything is submitted from the compression script end, as that ensures the dataset will always be "default". However, it doesn't work if we submit compression jobs directly to the CLP DB, because the executor can then receive a null dataset that it never checks.
We should make the dataset handling consistent with the config definition.
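To illustrate the failure mode, here is a minimal sketch of the unchecked join, assuming `archive_output_dir` is a `pathlib.Path`. The function name `build_archive_output_dir` and the base path are hypothetical and are not taken from the CLP code.

```python
from __future__ import annotations

from pathlib import Path


def build_archive_output_dir(base_dir: Path, dataset: str | None) -> Path:
    # Mirrors the unchecked join from the executor: no None check on dataset.
    return base_dir / dataset


# A job submitted directly to the DB may carry no dataset at all.
try:
    build_archive_output_dir(Path("/var/archives"), None)
    join_failed = False
except TypeError:
    # pathlib refuses to join a path with None.
    join_failed = True
```

Under that assumption, a job whose config has `dataset` set to `None` would fail at the path join rather than fall back to the default dataset.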
Possible implementation
- Allow the dataset to be nullable.
- Don't set it to "default" in the compression script.
- Handle the dataset in the compression job.
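One possible reading of the steps above, sketched under assumptions: the dataset stays nullable everywhere and is resolved exactly once inside the compression job. The helper names `resolve_dataset` and `get_archive_output_dir` are hypothetical, and the value of `CLP_DEFAULT_DATASET_NAME` is assumed to be "default" based on the description above.

```python
from __future__ import annotations

from pathlib import Path

# Assumed value; the real constant lives in the CLP package.
CLP_DEFAULT_DATASET_NAME = "default"


def resolve_dataset(dataset: str | None) -> str:
    # Hypothetical helper: apply the default in one place, inside the job.
    return CLP_DEFAULT_DATASET_NAME if dataset is None else dataset


def get_archive_output_dir(base_dir: Path, dataset: str | None) -> Path:
    # The join is now safe whether the job came from a script or the DB.
    return base_dir / resolve_dataset(dataset)
```

With this shape, the compression scripts could pass the dataset through unchanged, and jobs inserted directly into the DB would get the same behavior as jobs submitted through the scripts. Whether a null dataset should map to the default or be handled some other way is a design decision this sketch does not settle.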
Code references (commit bfc474f) for the snippets cited above:
- clp/components/clp-package-utils/clp_package_utils/scripts/compress.py, line 207
- clp/components/clp-package-utils/clp_package_utils/scripts/compress_from_s3.py, line 262
- clp/components/clp-package-utils/clp_package_utils/scripts/native/compress.py, line 314
- clp/components/job-orchestration/job_orchestration/scheduler/job_config.py, line 25
- clp/components/job-orchestration/job_orchestration/scheduler/job_config.py, line 35
- clp/components/job-orchestration/job_orchestration/executor/compress/compression_task.py, line 399