refactor(clp-package): Upgrade Pydantic to V2; Update existing code to match V2 syntax; Fix various typing errors.#1338
Conversation
WalkthroughMigrates codebase Pydantic v1→v2: replaces parse_obj/parse_raw/copy/dict/json → model_validate/model_validate_json/model_copy/model_dump/model_dump_json; swaps validator/root_validator → field_validator/model_validator; updates pydantic dependency to ^2.11.9; plus small typing, enum wrapping, and minor cleanup changes across components. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller as Component
participant Reader as read_yaml_config_file()
participant PydModel as PydanticModel
Note over Caller,Reader: Load raw config (YAML/dict)
Caller->>Reader: read config
Reader->>PydModel: model_validate(raw_config)
alt validation succeeds
PydModel-->>Caller: validated model instance
else validation fails
PydModel-->>Caller: raises ValidationError
end
Note over Caller,PydModel: Serialization / copy API change
Caller->>PydModel: previously .dict() / .copy(deep=True)
Caller->>PydModel: now .model_dump() / .model_copy(deep=True)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related issues
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Ruff (0.13.1)components/clp-py-utils/clp_py_utils/clp_config.py�[1;31mruff failed�[0m Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🧪 Early access (Sonnet 4.5): enabledWe are currently testing the Sonnet 4.5 model, which is expected to improve code review quality. However, this model may lead to increased noise levels in the review comments. Please disable the early access features if the noise level causes any inconvenience. Note:
Comment |
… to `@model_validator(mode="after")` and replace `root_validator` with `Field` for sweep interval validation.
…antic v2) to QueryScheduler fields.
3620bd1 to
838d79c
Compare
…compatibility with Pydantic v2.
…tibility with Pydantic v2.
…tibility with Pydantic v2.
…compatibility with Pydantic v2.
…ine casting for config values.
…ist[str] instead of [str].
… to string before appending to command list
…zations in s3_utils.py
…ement of s3_utils.py
…o `model_validate_json()` for compatibility with Pydantic v2.
… validators for clarity and consistency with Pydantic v2 conventions.
…`value` for consistency with Pydantic v2 conventions.
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (13)
components/job-orchestration/job_orchestration/garbage_collector/garbage_collector.py (1)
45-52: Add the config path to the validation failure logRight now the log only prints the exception message, so operators can’t easily tell which config file blew up when several are in play. Including the file path will make diagnosis much quicker.
- logger.error(err) + logger.error("Failed to load config from %s: %s", config_path, err)components/clp-py-utils/clp_py_utils/s3_utils.py (1)
77-98: Fail fast when credentials are missingNow that
aws_credentialsis typed as optional, we should check forNoneimmediately. Otherwise a misconfiguredAwsAuthenticationinstance (for example, one constructed programmatically and bypassing the Pydantic validator) will raise a confusingAttributeErrorwhen we accessaws_credentials.access_key_id. Raising a clearValueErrorkeeps the error handling consistent with the other branches.elif AwsAuthType.credentials == auth.type: aws_credentials = auth.credentials + if aws_credentials is None: + raise ValueError("credentials must be set when type is AwsAuthType.credentials")components/job-orchestration/job_orchestration/scheduler/scheduler_data.py (1)
34-40: Fix enum join bug in validator and tighten the signature.The error path will raise a TypeError due to joining Enum objects. Also prefer value as the parameter name.
Apply:
- def valid_status(cls, field): - supported_status = [CompressionTaskStatus.SUCCEEDED, CompressionTaskStatus.FAILED] - if field not in supported_status: - raise ValueError(f'must be one of the following {"|".join(supported_status)}') - return field + def valid_status(cls, value): + supported_status = [CompressionTaskStatus.SUCCEEDED, CompressionTaskStatus.FAILED] + if value not in supported_status: + allowed = "|".join(s.name for s in supported_status) + raise ValueError(f"must be one of: {allowed}") + return valueOptional follow-up (cleaner type-level restriction; outside selected lines):
# Replace the field type to remove the need for manual validation: from typing import Literal status: Literal[CompressionTaskStatus.SUCCEEDED, CompressionTaskStatus.FAILED]components/job-orchestration/job_orchestration/executor/compress/compression_task.py (5)
49-60: Parameterize UPDATE to prevent SQL injection and quoting bugsDirectly interpolating values into SQL is unsafe and can break on quotes; use placeholders so the driver escapes/serializes values (including datetimes) correctly.
Apply this diff:
def update_compression_task_metadata(db_cursor, task_id, kv): if not len(kv): raise ValueError("Must specify at least one field to update") - field_set_expressions = [f'{k}="{v}"' for k, v in kv.items()] - query = f""" - UPDATE {COMPRESSION_TASKS_TABLE_NAME} - SET {", ".join(field_set_expressions)} - WHERE id={task_id} - """ - db_cursor.execute(query) + field_set_expressions = [f"{k}=%s" for k in kv.keys()] + query = f""" + UPDATE {COMPRESSION_TASKS_TABLE_NAME} + SET {", ".join(field_set_expressions)} + WHERE id=%s + """ + params = [*kv.values(), task_id] + db_cursor.execute(query, params)
62-73: Parameterize incremental UPDATE to prevent SQL injectionSame issue here; build k=k+%s expressions and pass values separately.
Apply this diff:
def increment_compression_job_metadata(db_cursor, job_id, kv): if not len(kv): raise ValueError("Must specify at least one field to update") - field_set_expressions = [f"{k}={k}+{v}" for k, v in kv.items()] - query = f""" - UPDATE {COMPRESSION_JOBS_TABLE_NAME} - SET {", ".join(field_set_expressions)} - WHERE id={job_id} - """ - db_cursor.execute(query) + field_set_expressions = [f"{k}={k}+%s" for k in kv.keys()] + query = f""" + UPDATE {COMPRESSION_JOBS_TABLE_NAME} + SET {", ".join(field_set_expressions)} + WHERE id=%s + """ + params = [*kv.values(), job_id] + db_cursor.execute(query, params)
550-561: Good Pydantic v2 migration; include exception detail in error outputmodel_validate is correct. Consider returning the underlying exception message to aid debugging.
[Suggest diff below keeps the same logging while surfacing the cause in the task result.]
- except Exception as ex: - error_msg = "Failed to load worker config" + except Exception as ex: + error_msg = f"Failed to load worker config: {ex}" logger.exception(error_msg) return CompressionTaskResult( task_id=task_id, status=CompressionTaskStatus.FAILED, duration=0, error_message=error_msg, )
396-405: Ensure stderr log file is closed on exceptionsOpen file handle isn’t protected by a context manager; a mid-function exception can leak the descriptor.
- stderr_log_file = open(stderr_log_path, "w") + with open(stderr_log_path, "w") as stderr_log_file: + # Start compression + logger.debug("Compressing...") + compression_successful = False + proc = subprocess.Popen( + compression_cmd, stdout=subprocess.PIPE, stderr=stderr_log_file, env=compression_env + ) + # ... move the rest of the function contents that use stderr_log_file inside this with-block ...
502-504: Guard unlink to avoid spurious FileNotFoundErrorOnly unlink the temp file if it exists.
- if logs_list_path: - logs_list_path.unlink() + if logs_list_path.exists(): + logs_list_path.unlink()components/clp-package-utils/clp_package_utils/scripts/native/compress.py (1)
254-259: Guard against unexpected extras when validating OutputConfig.If
OutputConfigdoesn’t ignore extras, this may raise. Safer to exclude fields not inOutputConfig.- clp_output_config = OutputConfig.model_validate(clp_config.archive_output.model_dump()) + clp_output_config = OutputConfig.model_validate( + clp_config.archive_output.model_dump(exclude={"storage", "retention_period"}) + )If
OutputConfigis configured withextra="ignore", this is optional. Please confirm.components/clp-package-utils/clp_package_utils/general.py (1)
315-321: Deep‑copystream_outputfor consistency and safety.Avoids accidental mutation of the root config by workers.
- worker_config.stream_output = clp_config.stream_output + worker_config.stream_output = clp_config.stream_output.model_copy(deep=True)components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (3)
514-526: Useexclude_none=Trueto keep task payloads minimal.Shrinks serialized job config sent to workers.
- job_config = job.get_config().model_dump() + job_config = job.get_config().model_dump(exclude_none=True)
999-1012: Bug: Comparing task status to job status enum.This causes successful extraction tasks to be treated as failures.
- if not QueryJobStatus.SUCCEEDED == task_result.status: + if task_result.status != QueryTaskStatus.SUCCEEDED:
1162-1169: Handle empty config files before validation.
read_yaml_config_filecan return None; guard to avoid confusing errors.- clp_config = CLPConfig.model_validate(read_yaml_config_file(config_path)) + raw = read_yaml_config_file(config_path) + clp_config = CLPConfig() if raw is None else CLPConfig.model_validate(raw)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (3)
components/clp-package-utils/poetry.lockis excluded by!**/*.lockcomponents/clp-py-utils/poetry.lockis excluded by!**/*.lockcomponents/job-orchestration/poetry.lockis excluded by!**/*.lock
📒 Files selected for processing (25)
components/clp-package-utils/clp_package_utils/general.py(3 hunks)components/clp-package-utils/clp_package_utils/scripts/archive_manager.py(1 hunks)components/clp-package-utils/clp_package_utils/scripts/decompress.py(1 hunks)components/clp-package-utils/clp_package_utils/scripts/native/compress.py(3 hunks)components/clp-package-utils/clp_package_utils/scripts/native/utils.py(1 hunks)components/clp-package-utils/clp_package_utils/scripts/start_clp.py(2 hunks)components/clp-package-utils/pyproject.toml(1 hunks)components/clp-py-utils/clp_py_utils/clp_config.py(25 hunks)components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py(1 hunks)components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py(1 hunks)components/clp-py-utils/clp_py_utils/s3_utils.py(4 hunks)components/clp-py-utils/pyproject.toml(1 hunks)components/job-orchestration/job_orchestration/executor/compress/compression_task.py(3 hunks)components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py(4 hunks)components/job-orchestration/job_orchestration/executor/query/fs_search_task.py(3 hunks)components/job-orchestration/job_orchestration/executor/query/utils.py(1 hunks)components/job-orchestration/job_orchestration/executor/utils.py(1 hunks)components/job-orchestration/job_orchestration/garbage_collector/garbage_collector.py(2 hunks)components/job-orchestration/job_orchestration/reducer/reducer.py(1 hunks)components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(3 hunks)components/job-orchestration/job_orchestration/scheduler/compress/partition.py(2 hunks)components/job-orchestration/job_orchestration/scheduler/job_config.py(2 hunks)components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py(8 hunks)components/job-orchestration/job_orchestration/scheduler/scheduler_data.py(3 hunks)components/job-orchestration/pyproject.toml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-01-16T16:58:43.190Z
Learnt from: haiqi96
PR: y-scope/clp#651
File: components/clp-package-utils/clp_package_utils/scripts/compress.py:0-0
Timestamp: 2025-01-16T16:58:43.190Z
Learning: In the clp-package compression flow, path validation and error handling is performed at the scheduler level rather than in the compress.py script to maintain simplicity and avoid code duplication.
Applied to files:
components/clp-package-utils/clp_package_utils/scripts/decompress.pycomponents/job-orchestration/job_orchestration/executor/compress/compression_task.pycomponents/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py
🧬 Code graph analysis (16)
components/job-orchestration/job_orchestration/reducer/reducer.py (2)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
CLPConfig(848-1038)components/clp-py-utils/clp_py_utils/core.py (1)
read_yaml_config_file(56-62)
components/job-orchestration/job_orchestration/executor/utils.py (2)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
WorkerConfig(1041-1058)components/clp-py-utils/clp_py_utils/core.py (1)
read_yaml_config_file(56-62)
components/clp-package-utils/clp_package_utils/scripts/native/compress.py (1)
components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
OutputConfig(37-43)
components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (2)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
CLPConfig(848-1038)components/clp-py-utils/clp_py_utils/core.py (1)
read_yaml_config_file(56-62)
components/clp-py-utils/clp_py_utils/s3_utils.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (4)
S3Credentials(506-523)S3Storage(608-639)FsStorage(588-605)AwsAuthentication(526-556)
components/job-orchestration/job_orchestration/executor/compress/compression_task.py (3)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
WorkerConfig(1041-1058)Database(177-281)components/job-orchestration/job_orchestration/scheduler/job_config.py (2)
ClpIoConfig(46-48)PathsToCompress(16-20)components/clp-py-utils/clp_py_utils/sql_adapter.py (1)
SQL_Adapter(60-131)
components/job-orchestration/job_orchestration/scheduler/scheduler_data.py (3)
components/job-orchestration/job_orchestration/scheduler/job_config.py (2)
SearchJobConfig(75-93)QueryJobConfig(59-60)components/job-orchestration/job_orchestration/scheduler/query/reducer_handler.py (1)
ReducerHandlerMessageQueues(26-45)components/job-orchestration/job_orchestration/scheduler/constants.py (2)
QueryJobType(77-80)QueryTaskStatus(64-74)
components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
StorageType(114-116)StorageEngine(103-105)
components/clp-package-utils/clp_package_utils/general.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
CLPConfig(848-1038)
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (4)
components/job-orchestration/job_orchestration/scheduler/job_config.py (4)
ExtractIrJobConfig(63-67)ExtractJsonJobConfig(70-72)QueryJobConfig(59-60)SearchJobConfig(75-93)components/job-orchestration/job_orchestration/scheduler/scheduler_data.py (5)
get_config(59-59)get_config(68-69)get_config(78-79)get_config(96-97)QueryTaskResult(100-104)components/clp-py-utils/clp_py_utils/clp_config.py (1)
CLPConfig(848-1038)components/clp-py-utils/clp_py_utils/core.py (1)
read_yaml_config_file(56-62)
components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (4)
components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
ClpIoConfig(46-48)components/job-orchestration/job_orchestration/scheduler/scheduler_data.py (1)
CompressionTaskResult(28-40)components/clp-py-utils/clp_py_utils/clp_config.py (1)
CLPConfig(848-1038)components/clp-py-utils/clp_py_utils/core.py (1)
read_yaml_config_file(56-62)
components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (3)
components/clp-py-utils/clp_py_utils/sql_adapter.py (1)
SQL_Adapter(60-131)components/clp-py-utils/clp_py_utils/clp_config.py (1)
Database(177-281)components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
SearchJobConfig(75-93)
components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py (2)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
CLPConfig(848-1038)components/clp-py-utils/clp_py_utils/core.py (1)
read_yaml_config_file(56-62)
components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py (3)
components/job-orchestration/job_orchestration/scheduler/job_config.py (2)
ExtractIrJobConfig(63-67)ExtractJsonJobConfig(70-72)components/clp-py-utils/clp_py_utils/sql_adapter.py (1)
SQL_Adapter(60-131)components/clp-py-utils/clp_py_utils/clp_config.py (1)
Database(177-281)
components/job-orchestration/job_orchestration/garbage_collector/garbage_collector.py (2)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
CLPConfig(848-1038)components/clp-py-utils/clp_py_utils/core.py (1)
read_yaml_config_file(56-62)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
components/clp-py-utils/clp_py_utils/clp_logging.py (2)
is_valid_logging_level(32-33)get_valid_logging_level(28-29)components/clp-py-utils/clp_py_utils/core.py (1)
make_config_path_absolute(42-53)
🔇 Additional comments (43)
components/clp-package-utils/clp_package_utils/scripts/decompress.py (1)
152-154: Conversion avoidsshlex.jointype errorsCasting the mounted path to
strkeeps the command list homogeneous and preventsshlex.join(cmd)from blowing up on aPath. Nice catch.components/clp-package-utils/clp_package_utils/scripts/native/utils.py (1)
66-69: Pydantic v2 serialisation update looks goodSwitching to
model_dump()aligns with Pydantic v2 expectations and keeps the msgpack payload unchanged, so no further action needed.components/job-orchestration/job_orchestration/executor/utils.py (1)
20-20: Upgrade tomodel_validatelooks correctMigration from
parse_objtomodel_validatekeeps the loader behaviour intact while aligning with Pydantic v2.components/clp-package-utils/clp_package_utils/scripts/start_clp.py (2)
117-124: Type hint now matches howcmdis usedThe
List[str]annotation reflects the in-place list mutation pattern we rely on. Looks good.
141-158: Updated signature stays consistent with call sitesChanging
cmd_to_runtoList[str]keeps the contract aligned with how we build the docker exec commands. No issues spotted.components/job-orchestration/job_orchestration/executor/query/utils.py (1)
34-38: Migration tomodel_dump()looks correctReplacing
dict()withmodel_dump()keeps the return value as a plaindictwhile aligning with the Pydantic v2 API used elsewhere in the PR. Callers expecting the serialised payload will continue to work.components/job-orchestration/job_orchestration/executor/query/fs_search_task.py (3)
187-188: Accurate switch tomodel_validateforDatabase.The migration away from
parse_objkeeps the validation semantics intact while aligning with the Pydantic v2 API surface—no issues here.
201-202:SearchJobConfigvalidation updated correctly.Adopting
model_validateensures the config payload still round-trips through Pydantic with the expected constraints; looking good.
231-231:model_dumpis the right replacement.Serializing the task results through
model_dump()mirrors the old behaviour and is V2-compliant—no further changes needed.components/job-orchestration/job_orchestration/scheduler/compress/partition.py (2)
37-44: Pydantic v2 JSON serialisation swap looks goodUsing
model_dump_json(exclude_none=True)keeps the payload identical to the previousjson()call while aligning with the v2 API, so the task argument remains a JSON string as expected.
83-94:model_dumpusage maintains the brotli/msgpack payloadSwitching to
model_dump/model_dump_jsonpreserves the dict and JSON structures thatmsgpackand downstream consumers expect, so the compressed payload and the task hand-off remain compatible.components/job-orchestration/job_orchestration/garbage_collector/garbage_collector.py (1)
87-88: LGTM on the task failure loggingDropping the unused exception binding still preserves the traceback via
logger.exception, so this cleanup looks good to me.components/job-orchestration/job_orchestration/executor/query/extract_stream_task.py (4)
57-75: Migration tomodel_validatereads clean.
Switching IR job validation tomodel_validate(...)keeps the semantics intact while aligning this path with the Pydantic v2 API.
91-145: JSON job parsing looks spot on.
AdoptingExtractJsonJobConfig.model_validate(...)ensures we stay consistent with the new validator surface without altering downstream behaviour.
202-203: Database config re-validation checks out.
Rehydrating the DB config withDatabase.model_validate(...)maintains the same guarantees under Pydantic v2 before handing it toSQL_Adapter.
295-295: Result serialisation change LGTM.
model_dump()is the right v2 replacement for the previous.dict()call and preserves the response contract.components/job-orchestration/job_orchestration/scheduler/scheduler_data.py (6)
19-19: Pydantic v2 imports look correct.Using BaseModel, ConfigDict, and field_validator aligns with v2.
32-32: LGTM: Optional default for error_message.This matches Pydantic v2 semantics for truly optional fields.
52-53: LGTM: Optional defaults added.Relaxing these to None avoids required-field friction in v2.
83-85: LGTM: model_config with arbitrary_types_allowed.Necessary for asyncio.Task and custom queues on this model only.
90-91: LGTM: Optional async/task fields defaulting to None.Matches the model_config change and prevents validation errors.
104-104: LGTM: Optional error_log_path.Consistent with v2 optional field handling.
components/job-orchestration/job_orchestration/executor/compress/compression_task.py (2)
566-567: LGTM on Database.model_validate and adapter constructionCorrect v2 migration; types align with SQL_Adapter expectations.
612-613: LGTM on returning model_dump()Correct v2 replacement for dict(); ensures a plain dict for Celery serialization.
components/clp-package-utils/clp_package_utils/general.py (2)
239-241: Good migration to Pydantic v2 copy API.
422-423: LGTM: config loading viamodel_validate.components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (5)
130-138: LGTM:ExtractIrJobConfigmigration tomodel_validate.
176-184: LGTM:ExtractJsonJobConfigmigration tomodel_validate.
666-670: LGTM: Validation to access dataset viaQueryJobConfig.
688-698: LGTM:SearchJobConfigmigration tomodel_validate.
905-919: LGTM:QueryTaskResultmigration tomodel_validate.components/job-orchestration/job_orchestration/reducer/reducer.py (1)
46-49: LGTM:model_validatefor config load with error handling.components/clp-py-utils/clp_py_utils/clp_config.py (1)
613-619: Same issue forstaging_directory.Apply the same fix as for
directory.- def validate_staging_directory(cls, value): - if "" == value: + def validate_staging_directory(cls, value): + if isinstance(value, pathlib.Path) and value.as_posix() == "": raise ValueError("staging_directory cannot be empty") return valuecomponents/clp-package-utils/pyproject.toml (1)
16-16: Pydantic v2 dependency bump looks solid.
The new caret constraint lines up with the rest of the migration and keeps Poetry’s view of the dependency tree consistent.components/job-orchestration/pyproject.toml (1)
21-21: Version bump aligns with the orchestration stack.
Great to see the scheduler package tracking the same 2.11.9 floor so runtime workers and schedulers stay in lock-step.components/clp-py-utils/pyproject.toml (1)
18-18: Dependency update keeps shared utils consistent.
Matching the shared utils’ requirement with the rest of the codebase avoids subtle resolver mismatches.components/clp-package-utils/clp_package_utils/scripts/archive_manager.py (1)
184-190: Explicit enum casting is the right call.
Normalizing the config-derived strings back intoStorageType/StorageEnginekeeps the downstream comparisons resilient to serialisation changes in v2.components/clp-py-utils/clp_py_utils/initialize-orchestration-db.py (1)
47-48: Themodel_validateswitch is on point.
Swapping outparse_objformodel_validateis the compatible v2 path and keeps the error handling logic unchanged.components/clp-py-utils/clp_py_utils/initialize-clp-metadata-db.py (1)
48-49: Good migration tomodel_validate.
This keeps the metadata initialiser in sync with the rest of the config loaders while preserving the existing failure modes.components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (3)
243-245: Nice use ofmodel_validateon packed configs.
Validating the msgpack payloads through the v2 API maintains the prior guarding without introducing extra conversions.
417-418: Task result validation remains robust.
Re-validating each Celery result withCompressionTaskResult.model_validatekeeps the scheduler protected against malformed payloads under the new API.
487-488: Config loading migration looks good.
UsingCLPConfig.model_validatehere mirrors the other entry points and maintains consistent credential loading.components/job-orchestration/job_orchestration/scheduler/job_config.py (1)
87-93: Field validator update is clean.
Recasting the network-address check withfield_validatorkeeps the port guard intact while complying with the v2 decorator pattern.
| msgpack.packb(clp_io_config.model_dump(exclude_none=True, exclude_unset=True)), quality=4 | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Use msgpack with explicit use_bin_type=True for safety.
Prevents potential string/bytes ambiguity across environments.
- msgpack.packb(clp_io_config.model_dump(exclude_none=True, exclude_unset=True)), quality=4
+ msgpack.packb(
+ clp_io_config.model_dump(exclude_none=True, exclude_unset=True),
+ use_bin_type=True
+ ),
+ quality=4📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| msgpack.packb(clp_io_config.model_dump(exclude_none=True, exclude_unset=True)), quality=4 | |
| ) | |
| msgpack.packb( | |
| clp_io_config.model_dump(exclude_none=True, exclude_unset=True), | |
| use_bin_type=True | |
| ), | |
| quality=4 |
🤖 Prompt for AI Agents
In components/clp-package-utils/clp_package_utils/scripts/native/compress.py
around lines 120-121, the msgpack.packb call is missing use_bin_type=True which
can cause string/bytes ambiguity across environments; update the call to include
use_bin_type=True (i.e., msgpack.packb(clp_io_config.model_dump(...),
use_bin_type=True, quality=4) or add use_bin_type=True alongside the existing
args) so msgpack serializes binary data reliably.
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/clp-py-utils/clp_py_utils/clp_config.py(26 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-26T21:12:21.615Z
Learnt from: junhaoliao
PR: y-scope/clp#1338
File: components/clp-py-utils/clp_py_utils/clp_config.py:753-753
Timestamp: 2025-09-26T21:12:21.615Z
Learning: Python linting rules that catch YODA conditions exist but are not enabled for Python modules in the CLP package due to time constraints. The team prefers to avoid YODA conditions in Python for consistency.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2024-10-10T05:49:36.543Z
Learnt from: LinZhihao-723
PR: y-scope/clp#554
File: components/core/src/clp/ffi/SchemaTreeNode.hpp:0-0
Timestamp: 2024-10-10T05:49:36.543Z
Learning: Yoda conditions are part of the project's coding standards for C++ code and should be used in conditional statements.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-07-28T23:14:57.751Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/utils/config.py:42-45
Timestamp: 2025-07-28T23:14:57.751Z
Learning: In the CLP project, the team prefers Yoda-style conditions (e.g., `0 == len(dataset_name)`) over standard Python idioms to help catch accidental assignment operators (writing `=` instead of `==`). Additionally, the team finds using `not` for string emptiness checks ambiguous and prefers explicit length comparisons.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-08-17T15:26:22.790Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/utils/config.py:0-0
Timestamp: 2025-08-17T15:26:22.790Z
Learning: In the CLP project, Yoda-style conditions (literal on the left) are only used for `== 0` checks, not for other comparison operators like `>` or `<`. This is specifically to catch accidental assignment operators (writing `=` instead of `==`) in equality comparisons against zero.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
🧬 Code graph analysis (1)
components/clp-py-utils/clp_py_utils/clp_config.py (2)
components/clp-py-utils/clp_py_utils/clp_logging.py (2)
is_valid_logging_level(32-33)get_valid_logging_level(28-29)components/clp-py-utils/clp_py_utils/core.py (1)
make_config_path_absolute(42-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (41)
components/clp-py-utils/clp_py_utils/clp_config.py (41)
4-14: LGTM: Pydantic v2 imports correctly added.The new imports align with Pydantic v2 API requirements and support the migration from v1 decorators and methods.
134-142: LGTM: Pydantic v2 field validator correctly applied.The migration from
@validatorto@field_validatorwith@classmethodis properly implemented according to Pydantic v2 patterns.
144-152: LGTM: Pydantic v2 field validator correctly applied.The migration from
@validatorto@field_validatorwith@classmethodis properly implemented according to Pydantic v2 patterns.
154-174: LGTM: Pydantic v2 model validator correctly applied.The migration from
@root_validatorto@model_validator(mode="after")is correctly implemented. The validator receives the model instance (self) and returns it, which aligns with Pydantic v2 expectations.
189-197: LGTM: Pydantic v2 field validator correctly applied.The migration from
@validatorto@field_validatorwith@classmethodis properly implemented according to Pydantic v2 patterns.
199-204: LGTM: Pydantic v2 field validator correctly applied.The migration from
@validatorto@field_validatorwith@classmethodis properly implemented according to Pydantic v2 patterns.
206-217: LGTM: Validator correctly uses shared helper.The migration to
@field_validatorand delegation to_validate_portfollows the established pattern for consistent validation across the codebase.
268-268: LGTM: Pydantic v2 model method correctly updated.The migration from
dict()tomodel_dump()aligns with Pydantic v2 API.
290-295: LGTM: Helper function correctly updated with class parameter.The helper function signature change to accept
clsparameter supports the shared validation pattern used by field validators.
298-300: LGTM: Helper function correctly updated with class parameter.The helper function signature change to accept
clsparameter supports the shared validation pattern used by field validators.
316-320: LGTM: Pydantic v2 field validator correctly applied.The migration from
@validatorto@field_validatorwith@classmethodand delegation to shared helper is properly implemented.
324-347: LGTM: Pydantic v2 field validators correctly applied.All field validators have been properly migrated from
@validatorto@field_validatorwith@classmethod, and the port validator correctly delegates to the shared helper.
353-357: LGTM: Pydantic v2 field validator correctly applied.The migration from
@validatorto@field_validatorwith@classmethodand delegation to shared helper is properly implemented.
363-367: LGTM: Pydantic v2 field validator correctly applied.The migration from
@validatorto@field_validatorwith@classmethodand delegation to shared helper is properly implemented.
378-389: LGTM: Pydantic v2 field validators correctly applied.Both field validators have been properly migrated from
@validatorto@field_validatorwith@classmethod, and the port validator correctly delegates to the shared helper.
392-392: LGTM: Pydantic v2 model method correctly updated.The migration from
dict()tomodel_dump()aligns with Pydantic v2 API.
418-442: LGTM: Pydantic v2 field validators correctly applied.All three field validators have been properly migrated from
@validatorto@field_validatorwith@classmethod. The host and base_port validators correctly delegate to shared helpers.
452-486: LGTM: Pydantic v2 field validators correctly applied.All field validators have been properly migrated from
@validatorto@field_validatorwith@classmethod, with appropriate validation logic for each field.
499-510: LGTM: Pydantic v2 field validators correctly applied.Both field validators have been properly migrated from
@validatorto@field_validatorwith@classmethod, with the port validator correctly delegating to the shared helper.
513-513: LGTM: Pydantic v2 model method correctly updated.The migration from
dict()tomodel_dump()aligns with Pydantic v2 API.
538-552: LGTM: Pydantic v2 field validators correctly applied.Both field validators have been properly migrated from
@validatorto@field_validatorwith@classmethodwith appropriate validation logic.
565-585: LGTM: Pydantic v2 model validator correctly applied.The migration from
@root_validator(pre=True)to@model_validator(mode="before")is correctly implemented. The validator receives raw data and returns it, which aligns with Pydantic v2 "before" mode expectations.
594-606: LGTM: Pydantic v2 field validators correctly applied.Both field validators have been properly migrated from
@validatorto@field_validatorwith@classmethodwith appropriate validation logic.
614-614: LGTM: Pydantic v2 model method correctly updated.The migration from
dict()tomodel_dump()aligns with Pydantic v2 API.
621-625: LGTM: Directory validator updated for Pydantic v2.The migration to
@field_validatorwithmode="before"ensures the validator runs before Pydantic's type conversion topathlib.Path, allowing string validation to work properly.
631-633: LGTM: Pydantic v2 model method correctly updated.The migration from
dict()tomodel_dump()aligns with Pydantic v2 API.
641-645: LGTM: Directory validator updated for Pydantic v2.The migration to
@field_validatorwithmode="before"ensures the validator runs before Pydantic's type conversion topathlib.Path, allowing string validation to work properly.
647-658: Simplify S3 key_prefix validation; remove unnecessary hasattr check.Since
self.s3_configis validated by Pydantic, thehasattrcheck is unnecessary and can be removed.@model_validator(mode="after") def validate_key_prefix(self): - s3_config = self.s3_config - if not hasattr(s3_config, "key_prefix"): - raise ValueError("s3_config must have field key_prefix") - key_prefix = s3_config.key_prefix + key_prefix = self.s3_config.key_prefix if "" == key_prefix: raise ValueError("s3_config.key_prefix cannot be empty") if not key_prefix.endswith("/"): raise ValueError('s3_config.key_prefix must end with "/"') return self
664-666: LGTM: Pydantic v2 model method correctly updated.The migration from
dict()tomodel_dump()aligns with Pydantic v2 API.
722-762: LGTM: Pydantic v2 field validators correctly applied.All field validators have been properly migrated from
@validatorto@field_validatorwith@classmethodwith appropriate validation logic for each field.
771-773: LGTM: Pydantic v2 model method correctly updated.The migration from
dict()tomodel_dump()aligns with Pydantic v2 API.
780-785: LGTM: Pydantic v2 field validator correctly applied.The migration from
@validatorto@field_validatorwith@classmethodis properly implemented according to Pydantic v2 patterns.
794-796: LGTM: Pydantic v2 model method correctly updated.The migration from
dict()tomodel_dump()aligns with Pydantic v2 API.
805-831: LGTM: Pydantic v2 field validators correctly applied.All field validators have been properly migrated from
@validatorto@field_validatorwith@classmethod, with host and port validators correctly delegating to shared helpers.
835-838: LGTM: Pydantic v2 configuration and Field usage correctly applied.The migration from
class Configtomodel_config = ConfigDict()and the use ofFieldwith constraints (gt=0) follows Pydantic v2 best practices.
845-849: LGTM: Pydantic v2 field validator correctly applied.The migration from
@validatorto@field_validatorwith@classmethodand delegation to shared helper is properly implemented.
856-866: LGTM: Pydantic v2 field validators correctly applied.Both field validators have been properly migrated from
@validatorto@field_validatorwith@classmethod, correctly delegating to shared helpers.
885-885: LGTM: Default instance correctly specified.The explicit default assignment maintains the existing behaviour after Pydantic v2 migration.
1034-1034: LGTM: Pydantic v2 model method correctly updated.The migration from
dict()tomodel_dump()aligns with Pydantic v2 API.
1079-1079: LGTM: Pydantic v2 model method correctly updated.The migration from
dict()tomodel_dump()aligns with Pydantic v2 API.
1098-1110: Fix directory validation to handle both string and Path inputs.The current implementation only accepts strings, but the directory fields are typed as
pathlib.Path. This will cause validation failures when Path objects (including default values) are passed.def _validate_directory(value: Any): """ Validates that the given value represents a directory path. :param value: - :raise ValueError: if the value is not of type str. - :raise ValueError: if the value is an empty string. + :raise ValueError: if the value is not a string or pathlib.Path, or if it is empty. """ - if not isinstance(value, str): - raise ValueError("must be a string.") - - if "" == value.strip(): - raise ValueError("cannot be empty") + if isinstance(value, pathlib.Path): + s = str(value) + elif isinstance(value, str): + s = value + else: + raise ValueError("must be a string or pathlib.Path") + + if "" == s.strip(): + raise ValueError("cannot be empty")
| def _validate_port(cls, value): | ||
| min_valid_port = 0 | ||
| max_valid_port = 2**16 - 1 | ||
| if min_valid_port > field or max_valid_port < field: | ||
| if min_valid_port > value or max_valid_port < value: | ||
| raise ValueError( | ||
| f"{cls.__name__}.port is not within valid range " f"{min_valid_port}-{max_valid_port}." | ||
| ) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Simplify comparison; avoid negated inequality.
Use direct comparison for clarity instead of negated inequality.
- if min_valid_port > value or max_valid_port < value:
+ if value < min_valid_port or value > max_valid_port:
raise ValueError(
f"{cls.__name__}.port is not within valid range " f"{min_valid_port}-{max_valid_port}."
)Based on learnings
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _validate_port(cls, value): | |
| min_valid_port = 0 | |
| max_valid_port = 2**16 - 1 | |
| if min_valid_port > field or max_valid_port < field: | |
| if min_valid_port > value or max_valid_port < value: | |
| raise ValueError( | |
| f"{cls.__name__}.port is not within valid range " f"{min_valid_port}-{max_valid_port}." | |
| ) | |
| def _validate_port(cls, value): | |
| min_valid_port = 0 | |
| max_valid_port = 2**16 - 1 | |
| if value < min_valid_port or value > max_valid_port: | |
| raise ValueError( | |
| f"{cls.__name__}.port is not within valid range " f"{min_valid_port}-{max_valid_port}." | |
| ) |
🤖 Prompt for AI Agents
In components/clp-py-utils/clp_py_utils/clp_config.py around lines 303 to 309,
replace the negated/indirect range check ("if min_valid_port > value or
max_valid_port < value") with a direct comparison such as "if value <
min_valid_port or value > max_valid_port" to improve clarity; keep the existing
ValueError message and bounds variables unchanged.
|
|
||
| query_engine = package.get("query_engine") | ||
| presto = values.get("presto") | ||
| @model_validator(mode="after") |
There was a problem hiding this comment.
cc @davemarco / @hoophalab for awareness of the change
There was a problem hiding this comment.
this probably will break the original desired behaviour. read the comment above
There was a problem hiding this comment.
i tried for hours to switch branches and come up with different inputs, but i found no behavioural difference with or without pre=True. i can't be sure if i understand the We set pre=True ... presto config itself comment correctly. (it seems the comment was proposed by @kirkrodrigues ?)
could you share some config example to help understand the difference?
i did dig up the conversations in #1198 and found this from Haiqi:
#1198 (review)
One note to add:
The reason we put pre=true in root_validator is to prevent validating Presto config when it is not allowed.For example, if user specifies a Presto config with invalid port when query_engine is NOT presto. Without pre=true, the script will first complain about invalid port, and only if user fixes the port, it will then let user know that presto is not allowed when query_engine is not presto.
so i tested with this clp-config.yaml
package:
storage_engine: "clp"
query_engine: "presto"
presto:
host: 1
port: -1- with the existing code on
main@a040dfb4f4a14ef22f135dac13d90ff2b6984e07:junhao@ASUS-X870E:~/workspace/2-clp/build/clp-package$ ./sbin/start-clp.sh 2025-09-28T21:48:13.377 ERROR [start_clp] Failed to load config. Traceback (most recent call last): File "/home/junhao/workspace/2-clp/build/clp-package/lib/python3/site-packages/clp_package_utils/scripts/start_clp.py", line 1185, in main clp_config = load_config_file(config_file_path, default_config_file_path, clp_home) File "/home/junhao/workspace/2-clp/build/clp-package/lib/python3/site-packages/clp_package_utils/general.py", line 422, in load_config_file clp_config = CLPConfig.parse_obj(raw_clp_config) File "pydantic/main.py", line 532, in pydantic.main.BaseModel.parse_obj File "pydantic/main.py", line 347, in pydantic.main.BaseModel.__init__ pydantic.error_wrappers.ValidationError: 2 validation errors for CLPConfig package -> __root__ query_engine 'presto' is only compatible with storage_engine 'clp-s'. (type=value_error) presto -> port Presto.port is not within valid range 0-65535. (type=value_error) - I modified the code to use
@root_validatorwithoutpre=True:->diff --git a/components/clp-py-utils/clp_py_utils/clp_config.py b/components/clp-py-utils/clp_py_utils/clp_config.py --- a/components/clp-py-utils/clp_py_utils/clp_config.py (revision a040dfb4f4a14ef22f135dac13d90ff2b6984e07) +++ b/components/clp-py-utils/clp_py_utils/clp_config.py (date 1759110179774) @@ -989,7 +989,7 @@ # We set `pre=True` so that we print errors about a mismatch between the query engine and presto # config before errors about the presto config itself. - @root_validator(pre=True) + @root_validator def validate_presto_config(cls, values): package = values.get("package") if not isinstance(package, Package): @@ -997,7 +997,7 @@ # later and throw an error). return values - query_engine = package.get("query_engine") + query_engine = package.query_engine presto = values.get("presto") if query_engine == QueryEngine.PRESTO and presto is None: raise ValueError(
junhao@ASUS-X870E:~/workspace/2-clp/build/clp-package$ ./sbin/start-clp.sh 2025-09-28T21:50:54.323 ERROR [start_clp] Failed to load config. Traceback (most recent call last): File "/home/junhao/workspace/2-clp/build/clp-package/lib/python3/site-packages/clp_package_utils/scripts/start_clp.py", line 1185, in main clp_config = load_config_file(config_file_path, default_config_file_path, clp_home) File "/home/junhao/workspace/2-clp/build/clp-package/lib/python3/site-packages/clp_package_utils/general.py", line 422, in load_config_file clp_config = CLPConfig.parse_obj(raw_clp_config) File "pydantic/main.py", line 532, in pydantic.main.BaseModel.parse_obj File "pydantic/main.py", line 347, in pydantic.main.BaseModel.__init__ pydantic.error_wrappers.ValidationError: 2 validation errors for CLPConfig package -> __root__ query_engine 'presto' is only compatible with storage_engine 'clp-s'. (type=value_error) presto -> port Presto.port is not within valid range 0-65535. (type=value_error) - with the code on this PR branch where i changed it to
@model_validate(mode="after)with Pydantic V2, which is similar to@root_validatein Pydantic V1: https://github.com/y-scope/clp/blob/2ca7cb95ecc25ca9ecd844177d267df54153cc74/components/clp-py-utils/clp_py_utils/clp_config.pyjunhao@ASUS-X870E:~/workspace/2-clp/build/clp-package$ ./sbin/start-clp.sh 2025-09-28T22:00:01.268 ERROR [start_clp] Failed to load config. Traceback (most recent call last): File "/home/junhao/workspace/2-clp/build/clp-package/lib/python3/site-packages/clp_package_utils/scripts/start_clp.py", line 1183, in main clp_config = load_config_file(config_file_path, default_config_file_path, clp_home) File "/home/junhao/workspace/2-clp/build/clp-package/lib/python3/site-packages/clp_package_utils/general.py", line 422, in load_config_file clp_config = CLPConfig.model_validate(raw_clp_config) File "/home/junhao/workspace/2-clp/build/clp-package/lib/python3/site-packages/pydantic/main.py", line 705, in model_validate return cls.__pydantic_validator__.validate_python( pydantic_core._pydantic_core.ValidationError: 2 validation errors for CLPConfig package Value error, query_engine 'presto' is only compatible with storage_engine 'clp-s'. [type=value_error, input_value={'storage_engine': 'clp',...query_engine': 'presto'}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/value_error presto.port Value error, Presto.port is not within valid range 0-65535. [type=value_error, input_value=-1, input_type=int] For further information visit https://errors.pydantic.dev/2.11/v/value_error
The query_engine mismatch error was reported in all above cases.
There was a problem hiding this comment.
in case we no longer need to replicate the previous handling, i have removed the outdated comment in 167f2a6. If mode="before" is indeed needed, i can revert and update the comment.
… for s3_config key_prefix.
gibber9809
left a comment
There was a problem hiding this comment.
Nice work! Just a few minor comments/clarifications, then this change should be good to go.
| env_vars: Optional[Dict[str, str]] | ||
| aws_credentials: Optional[S3Credentials] |
There was a problem hiding this comment.
Just for my own understanding, why do we remove the explicit = None for these optional arguments here, but add it at clp_config.py:538? Also seems like there are several other places with explicit = None that we've left untouched.
There was a problem hiding this comment.
why do we remove the explicit = None for these optional arguments here
the None value is never used because new values will be assigned by one of the if-branches. the "unused variable error" is actually marked by PyCharm
add it at clp_config.py:538
Pydantic V2 requires that all fields with optional types be assigned None values at their declarations inside the models.
Also seems like there are several other places with explicit
= Nonethat we've left untouched.
Likely in some later branches the variable is untouched, so the None value can be used.
Eden-D-Zhang
left a comment
There was a problem hiding this comment.
No comments from me, just existing suggestion from Devin.
…)` - Apply suggestion from @gibber9809 Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/clp-py-utils/clp_py_utils/clp_config.py(27 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-09-26T21:12:21.655Z
Learnt from: junhaoliao
PR: y-scope/clp#1338
File: components/clp-py-utils/clp_py_utils/clp_config.py:753-753
Timestamp: 2025-09-26T21:12:21.655Z
Learning: Python linting rules that catch YODA conditions exist but are not enabled for Python modules in the CLP package due to time constraints. The team prefers to avoid YODA conditions in Python for consistency.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2024-10-10T05:49:36.543Z
Learnt from: LinZhihao-723
PR: y-scope/clp#554
File: components/core/src/clp/ffi/SchemaTreeNode.hpp:0-0
Timestamp: 2024-10-10T05:49:36.543Z
Learning: Yoda conditions are part of the project's coding standards for C++ code and should be used in conditional statements.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-07-28T23:14:57.751Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/utils/config.py:42-45
Timestamp: 2025-07-28T23:14:57.751Z
Learning: In the CLP project, the team prefers Yoda-style conditions (e.g., `0 == len(dataset_name)`) over standard Python idioms to help catch accidental assignment operators (writing `=` instead of `==`). Additionally, the team finds using `not` for string emptiness checks ambiguous and prefers explicit length comparisons.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
📚 Learning: 2025-08-17T15:26:22.790Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1100
File: integration-tests/tests/utils/config.py:0-0
Timestamp: 2025-08-17T15:26:22.790Z
Learning: In the CLP project, Yoda-style conditions (literal on the left) are only used for `== 0` checks, not for other comparison operators like `>` or `<`. This is specifically to catch accidental assignment operators (writing `=` instead of `==`) in equality comparisons against zero.
Applied to files:
components/clp-py-utils/clp_py_utils/clp_config.py
🧬 Code graph analysis (1)
components/clp-py-utils/clp_py_utils/clp_config.py (3)
components/clp-package-utils/clp_package_utils/general.py (1)
validate_port(195-207)components/clp-py-utils/clp_py_utils/clp_logging.py (2)
is_valid_logging_level(32-33)get_valid_logging_level(28-29)components/clp-py-utils/clp_py_utils/core.py (1)
make_config_path_absolute(42-53)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: package-image
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
303-309: Avoid Yoda-style comparison for consistency.Line 306 uses
min_valid_port > value or max_valid_port < value. Per project guidelines, Yoda conditions should be avoided in Python for consistency.Based on learnings
Apply this diff:
def _validate_port(cls, value): min_valid_port = 0 max_valid_port = 2**16 - 1 - if min_valid_port > value or max_valid_port < value: + if value < min_valid_port or value > max_valid_port: raise ValueError( f"{cls.__name__}.port is not within valid range " f"{min_valid_port}-{max_valid_port}." )
| def _validate_directory(value: Any): | ||
| """ | ||
| Validates that the given value represents a directory path. | ||
|
|
||
| :param value: | ||
| :raise ValueError: if the value is not of type str. | ||
| :raise ValueError: if the value is an empty string. | ||
| """ | ||
| if not isinstance(value, str): | ||
| raise ValueError("must be a string.") | ||
|
|
||
| if "" == value.strip(): | ||
| raise ValueError("cannot be empty") |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Clarify type handling in _validate_directory.
The helper only accepts str but validators calling it (lines 626-630, 646-650) use mode="before" with fields typed as pathlib.Path. While this works because YAML parsing produces strings before Pydantic converts them to Path objects, the implementation and docstring could be clearer:
- The docstring states "if value is not of type str" but doesn't explain why Path objects would be invalid
- The error message "must be a string" is confusing for a field declared as
pathlib.Path
Consider either:
- Updating the docstring to clarify this runs in "before" mode on raw string input
- Or handling both
strandPathtypes if the helper might be reused in "after" validators
Apply this diff to improve clarity:
def _validate_directory(value: Any):
"""
Validates that the given value represents a directory path.
+ Expected to run in 'before' mode validators on raw string input.
:param value:
- :raise ValueError: if the value is not of type str.
+ :raise ValueError: if the value is not a string (raw config input).
:raise ValueError: if the value is an empty string.
"""
if not isinstance(value, str):
- raise ValueError("must be a string.")
+ raise ValueError("directory must be a string in raw config.")
if "" == value.strip():
raise ValueError("cannot be empty")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _validate_directory(value: Any): | |
| """ | |
| Validates that the given value represents a directory path. | |
| :param value: | |
| :raise ValueError: if the value is not of type str. | |
| :raise ValueError: if the value is an empty string. | |
| """ | |
| if not isinstance(value, str): | |
| raise ValueError("must be a string.") | |
| if "" == value.strip(): | |
| raise ValueError("cannot be empty") | |
| def _validate_directory(value: Any): | |
| """ | |
| Validates that the given value represents a directory path. | |
| Expected to run in 'before' mode validators on raw string input. | |
| :param value: | |
| :raise ValueError: if the value is not a string (raw config input). | |
| :raise ValueError: if the value is an empty string. | |
| """ | |
| if not isinstance(value, str): | |
| raise ValueError("directory must be a string in raw config.") | |
| if "" == value.strip(): | |
| raise ValueError("cannot be empty") |
🤖 Prompt for AI Agents
components/clp-py-utils/clp_py_utils/clp_config.py around lines 1100 to 1112:
the _validate_directory helper currently only accepts str and its
docstring/error message are misleading when used as a Pydantic "before"
validator for fields typed as pathlib.Path; update the function to either (a)
clarify the docstring to state it runs in "before" mode on raw YAML/string input
and that Path objects are not expected here, and make the error messages clearer
(e.g., "directory path must be a non-empty string"), or (b) broaden the
implementation to accept both str and pathlib.Path (convert Path to str via
str(value) before validation) and update the docstring to reflect both accepted
types and behavior. Ensure consistent, user-friendly error messages and keep
behavior unchanged for current callers.
gibber9809
left a comment
There was a problem hiding this comment.
LGTM! I'm fine with the PR title as well.
| @validator("network_address") | ||
| @field_validator("network_address") | ||
| @classmethod | ||
| def validate_network_address(cls, field): |
There was a problem hiding this comment.
this is purely stylistic - we should change the second argument's name (field) to value to match the convention in the latest Pydantic docs
similarly, we should do the same for valid_status in components/job-orchestration/job_orchestration/scheduler/scheduler_data.py
@coderabbitai create an issue to track this
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track the stylistic parameter rename
gh issue create \
--title "Rename field validator parameter from 'field' to 'value' for Pydantic v2 convention" \
--body "## Description
Update field validator parameter names to match Pydantic v2 documentation conventions by renaming \`field\` to \`value\`.
## Locations to update
1. \`validate_network_address\` in \`components/job-orchestration/job_orchestration/scheduler/job_config.py\`
2. \`valid_status\` in \`components/job-orchestration/job_orchestration/scheduler/scheduler_data.py\`
## Context
This is a stylistic change to align with the latest Pydantic v2 documentation conventions where field validators use \`value\` as the parameter name for the field being validated.
## References
- PR #1338: https://github.com/y-scope/clp/pull/1338
- Comment: https://github.com/y-scope/clp/pull/1338#discussion_r2395502615
- Requested by: @junhaoliao" \
--assignee junhaoliao
Length of output: 42
@junhaoliao Issue created! I've opened #1362 to track renaming the field parameter to value in both validate_network_address and valid_status field validators to match Pydantic v2 conventions. The issue has been assigned to you with references back to this PR and comment.
|
|
||
| class QueryTaskResult(BaseModel): | ||
| status: QueryTaskStatus | ||
| task_id: str |
There was a problem hiding this comment.
Pydantic V2 seems to enforce stricter type checks, and creating a QueryTaskResult with the task_id being an int actually causes failure in IR extraction jobs (i.e., the "Link to log-viewer" feature in the webui).
@coderabbitai file an issue to track the bug
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Create a GitHub issue to track the Pydantic V2 type enforcement bug
gh issue create \
--title "Pydantic V2 strict type checking causes IR extraction job failure with integer task_id" \
--body "## Description
Pydantic V2 enforces stricter type checks than V1. The \`QueryTaskResult\` class defines \`task_id: str\`, but somewhere in the codebase an integer is being passed for \`task_id\` when creating a \`QueryTaskResult\` instance. This causes IR extraction jobs to fail, breaking the \"Link to log-viewer\" feature in the web UI.
## Context
- Reported in PR #1338: https://github.com/y-scope/clp/pull/1338
- Comment: https://github.com/y-scope/clp/pull/1338#discussion_r2395517772
- Affected component: \`components/job-orchestration/job_orchestration/scheduler/scheduler_data.py\`
- Affected class: \`QueryTaskResult\`
- Field: \`task_id: str\`
## Impact
IR extraction jobs fail when using Pydantic V2, preventing users from accessing the log-viewer link feature in the web UI.
## Reported by
@junhaoliao" \
--assignee junhaoliao \
--repo y-scope/clp
Length of output: 42
@junhaoliao I've created issue #1363 to track this bug: #1363
The issue documents the Pydantic V2 strict type checking problem where QueryTaskResult is being instantiated with an integer task_id instead of a string, causing IR extraction jobs to fail and breaking the log-viewer link feature in the web UI.
…o match V2 syntax; Fix various typing errors. (y-scope#1338) Co-authored-by: Eden Zhang <49173122+Eden-D-Zhang@users.noreply.github.com> Co-authored-by: Devin Gibson <gibber9809@users.noreply.github.com>
Description
Upgrade
bump-pydanticto auto-migrate existing code, changes include:@validator→@field_validator@root_validator(pre=True)→@model_validator(mode="before")class Config: ...syntax tomodel_config = ConfigDict(...)Pydantic v2 Migration
@root_validatorto@model_validator(mode="after")Fieldparse_obj→model_validatedict()→model_dump()copy()→model_copy()json()→model_dump_json()parse_raw()→model_validate_json()values→data,field→value) for Pydantic v2 conventionsFix other typing errors
Optional,ArchiveOutput)clp_site_packages_dir, s3_utils variables)container_paths_to_extract_file_pathto string before appendingStorageTypeandStorageEnginecasting for config valuescmdparameters ([str]→List[str])exceptblocksChecklist
breaking change.
Validation performed
1.
taskto build the package.2. Tested below
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Chores