refactor(clp-package): Replace hard-coded CLP metadata table names with table_prefix from connection params and table suffix constants.#864
Conversation
## Walkthrough
This set of changes refactors how database table names are constructed throughout several components. Hardcoded table name suffixes are replaced with imported constants, and table prefixes are dynamically retrieved from configuration or connection parameters at runtime. Multiple functions and class constructors are updated to accept additional parameters for database connection configuration, enabling dynamic assembly of table names. The modifications affect SQL queries in scripts and core orchestration modules, but do not alter the overall logic, error handling, or control flow.
## Changes
| File(s) | Change Summary |
|-------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py` | Replaces hardcoded table suffixes in SQL queries with imported constants (`ARCHIVE_TAGS_TABLE_SUFFIX`, `ARCHIVES_TABLE_SUFFIX`, `FILES_TABLE_SUFFIX`). Table names are now dynamically constructed using these suffixes and the table prefix from configuration. |
| `components/clp-package-utils/clp_package_utils/scripts/native/decompress.py` | Replaces use of the `CLP_METADATA_TABLE_PREFIX` constant with dynamic retrieval of the table prefix from connection parameters and uses the `FILES_TABLE_SUFFIX` constant to construct the files table name in SQL queries. |
| `components/clp-package-utils/clp_package_utils/scripts/start_clp.py` | Updates imports to use table suffix constants and the `Database` class. Refactors the `start_webui` function to dynamically construct archive and files table names using the table prefix from the database configuration and the respective suffix constants. Database connection parameters are now accessed via the `Database` object. |
| `components/job-orchestration/job_orchestration/executor/compress/compression_task.py` | Imports `ARCHIVE_TAGS_TABLE_SUFFIX` and updates the `update_tags` function to use this constant for constructing the archive tags table name in SQL insert statements, replacing the hardcoded string. |
| `components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py` | Removes `CLP_METADATA_TABLE_PREFIX` and uses the `TAGS_TABLE_SUFFIX` constant. Table names for tag operations are now dynamically constructed using the table prefix from the connection configuration and the suffix constant in SQL queries. |
| `components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py` | Adds a `table_prefix` parameter to several functions and class constructors. Table names in SQL queries are now built dynamically using the table prefix from connection parameters and imported table suffix constants. Function and constructor signatures are updated accordingly. |
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant Caller
participant Config
participant DB
Caller->>Config: Get DB connection params (includes table_prefix)
Caller->>Caller: Construct table name (table_prefix + TABLE_SUFFIX)
Caller->>DB: Execute SQL query using dynamic table name
DB-->>Caller: Return resultsPossibly related PRs
|
There was a problem hiding this comment.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (1)
389-403: 🛠️ Refactor suggestionRefactor tag-filter query construction to avoid brittle
%string formattingMixing f-strings with legacy
%interpolation is hard to read and easy to break (e.g., iftagsis empty, the generated SQL isIN (), causing a syntax error).A clearer, fully-parameterised approach:
- filter_clauses.append( - f"id IN (SELECT archive_id FROM {table_prefix}{ARCHIVE_TAGS_TABLE_SUFFIX} WHERE " - f"tag_id IN (SELECT tag_id FROM {table_prefix}{TAGS_TABLE_SUFFIX} WHERE tag_name IN " - f"(%s)))" % ", ".join(["%s" for _ in search_config.tags]) - ) + placeholders = ", ".join(["%s"] * len(search_config.tags)) + filter_clauses.append( + f"id IN (SELECT archive_id FROM {table_prefix}{ARCHIVE_TAGS_TABLE_SUFFIX} " + f"WHERE tag_id IN (SELECT tag_id FROM {table_prefix}{TAGS_TABLE_SUFFIX} " + f"WHERE tag_name IN ({placeholders})))" + )No mixed formatting, and the placeholder count always matches the parameter list.
🧹 Nitpick comments (2)
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
865-868: Re-use the existingDatabaseinstance instead of re-parsing it
clp_config.databaseis already a validatedDatabasemodel instance.
Re-parsing it withDatabase.parse_objintroduces unnecessary overhead and double-validation. You can safely drop the extra call:- database_config = Database.parse_obj(clp_config.database) + database_config = clp_config.databaseThis keeps the intent clear and shaves a few milliseconds off start-up.
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (1)
386-388: Usetyping.Anyinstead of the built-inanyin type hints
any(lower-case) refers to the built-in function and is not a valid typing annotation.
Using it here disables static type-checking and may confuse IDEs.- clp_metadata_db_conn_params: Dict[str, any], + clp_metadata_db_conn_params: Dict[str, Any],Apply the same change to the other new signatures for consistency.
Also applies to: 419-420, 535-536
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py(4 hunks)components/clp-package-utils/clp_package_utils/scripts/native/decompress.py(2 hunks)components/clp-package-utils/clp_package_utils/scripts/start_clp.py(2 hunks)components/job-orchestration/job_orchestration/executor/compress/compression_task.py(2 hunks)components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py(2 hunks)components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py(11 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (1)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
Database(88-167)
components/job-orchestration/job_orchestration/scheduler/query/query_scheduler.py (2)
components/clp-py-utils/clp_py_utils/clp_config.py (1)
CLPConfig(603-776)components/job-orchestration/job_orchestration/scheduler/job_config.py (3)
ExtractIrJobConfig(62-66)ExtractJsonJobConfig(69-71)SearchJobConfig(74-92)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: lint-check (ubuntu-latest)
- GitHub Check: lint-check (macos-latest)
🔇 Additional comments (14)
components/job-orchestration/job_orchestration/executor/compress/compression_task.py (2)
13-13: Good addition of the table suffix import.Importing this constant aligns with the PR's objective to make table names configurable rather than hardcoded.
72-76: Well implemented table name construction.The function now properly uses the
table_prefixparameter combined with theARCHIVE_TAGS_TABLE_SUFFIXconstant instead of a hardcoded table name. This approach provides more flexibility and consistency in handling database table names.components/job-orchestration/job_orchestration/scheduler/compress/compression_scheduler.py (3)
18-18: Appropriate replacement of hardcoded prefix with table suffix constant.The import has been correctly updated to use the
TAGS_TABLE_SUFFIXconstant instead of the previously hardcodedCLP_METADATA_TABLE_PREFIX.
236-241: Well-implemented dynamic table name construction.The code now properly extracts the table prefix from database connection parameters and uses it with the
TAGS_TABLE_SUFFIXconstant to dynamically construct the table name. This replaces the previously hardcoded prefix approach.
244-246: Consistent implementation of dynamic table naming.The SQL query correctly uses the dynamically constructed table name with the table prefix and suffix constant, maintaining consistency with the overall refactoring approach.
components/clp-package-utils/clp_package_utils/scripts/native/decompress.py (3)
12-16: Appropriate import structure for table suffix constants.The import statement has been properly updated to include the necessary
FILES_TABLE_SUFFIXconstant, along with other required imports, aligning with the refactoring goal.
51-52: Good addition of table prefix extraction.These new lines correctly retrieve the connection parameters and extract the table prefix, which is necessary for constructing the dynamic table name.
57-57: Proper implementation of dynamic table name in SQL query.The SQL query now correctly uses the dynamically constructed table name by combining the table prefix with the
FILES_TABLE_SUFFIXconstant, replacing the hardcoded approach.components/clp-package-utils/clp_package_utils/scripts/native/archive_manager.py (5)
10-15: Good import structure for table suffix constants.All necessary table suffix constants are properly imported for use throughout the file.
253-255: Well-implemented dynamic table name in query.The SQL query now correctly uses the
table_prefixcombined with theARCHIVES_TABLE_SUFFIXconstant to construct the table name dynamically.
316-323: Consistent implementation in DELETE query.The DELETE query properly uses the dynamic table name construction approach, maintaining consistency with the refactoring strategy.
335-340: Good refactoring of the FILES table reference.The query now properly uses
table_prefixwithFILES_TABLE_SUFFIXto construct the table name dynamically.
342-347: Consistent implementation for archive tags table.The DELETE query for the archive tags table correctly follows the same dynamic table name construction pattern as used elsewhere.
components/clp-package-utils/clp_package_utils/scripts/start_clp.py (1)
871-876: Double-check that all metadata tables needed by the WebUI are passedOnly the archives and files table names are updated here.
If the WebUI still queriestagsorarchive_tagstables (older versions did), missing entries inmeteor_settingswill result in runtime SQL errors.Please verify WebUI code paths and, if necessary, add:
"SqlDbClpTagsTableName": f"{table_prefix}{TAGS_TABLE_SUFFIX}", "SqlDbClpArchiveTagsTableName": f"{table_prefix}{ARCHIVE_TAGS_TABLE_SUFFIX}",before shipping.
| table_prefix = clp_metadata_db_conn_params["table_prefix"] | ||
| query = f"""SELECT archive_id, id as file_split_id | ||
| FROM {table_prefix}{FILES_TABLE_SUFFIX} WHERE | ||
| orig_file_id = '{orig_file_id}' AND | ||
| begin_message_ix <= {msg_ix} AND | ||
| (begin_message_ix + num_messages) > {msg_ix} |
There was a problem hiding this comment.
Prevent SQL-injection by parameterising orig_file_id
orig_file_id is interpolated directly into the SQL string, which can allow injection if the ID ever comes from untrusted input. Parameterise it instead:
- query = f"""SELECT archive_id, id as file_split_id
- FROM {table_prefix}{FILES_TABLE_SUFFIX} WHERE
- orig_file_id = '{orig_file_id}' AND
+ query = f"""SELECT archive_id, id as file_split_id
+ FROM {table_prefix}{FILES_TABLE_SUFFIX} WHERE
+ orig_file_id = %s AND
begin_message_ix <= {msg_ix} AND
(begin_message_ix + num_messages) > {msg_ix}
"""
...
- cursor.execute(query)
+ cursor.execute(query, (orig_file_id,))This eliminates the risk and keeps the style consistent with archive_exists().
Also applies to: 467-468
kirkrodrigues
left a comment
There was a problem hiding this comment.
For the PR title, how about:
refactor(clp-package): Replace hard-coded CLP metadata table names with `table_prefix` from connection params and table suffix constants.
table_prefix from Database connection parameters and table suffix constants.table_prefix from connection params and table suffix constants.
Co-authored-by: kirkrodrigues <2454684+kirkrodrigues@users.noreply.github.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
…th `table_prefix` from connection params and table suffix constants. (y-scope#864)
Description
CLP_METADATA_TABLE_PREFIXwith thetable_prefixfield indatabase_config.get_clp_connection_params_and_type(disable_localhost_socket_connection=true)archives,tags,archive_tagsandfileswith their corresponding suffix constants.These will help in future PRs due to:
table_prefixinto{table_prefix}{dataset}_when usingCLP_Stable_prefixupdated by grep-ping*_TABLE_SUFFIXvariables.This PR succeeds:
clp-sto compression task workers. #819datasetfields to input configs of compression and search jobs. #839CLP_Sstorage engine. #831table-nametodataset-namefor consistency; Update the name of the column metadata table name to put the dataset name first. #846and prepares for step no. 3 of the dataset feature implementation plan.
Checklist
breaking change.
Validation performed
Summary by CodeRabbit
Summary by CodeRabbit