clp-package: Record search job and search task statistics in the database.#416

Merged: wraymo merged 21 commits into y-scope:main from wraymo:search_scheduler_profiling on Jun 8, 2024

@wraymo wraymo commented May 27, 2024

Contributor

References

Description

Currently, we store only search jobs in the database, and we don't know how long each task takes to search an archive. This PR introduces search task profiling to provide better insight into the durations.

  • Adds a search task table and SearchTaskStatus.
  • Inserts corresponding search tasks into the database when dispatching a search job.
  • Updates search task metadata before and after binary execution in the search executor.
  • Logs completion details and updates the search job status after all tasks are complete.
  • Manages search task metadata appropriately upon cancellation, based on the search task status.
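The task lifecycle listed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the PR's implementation: it uses an in-memory `sqlite3` database as a stand-in for the package's metadata database, and the `SearchTaskStatus` members, the column names, and the helpers `create_search_tasks_table`, `insert_tasks`, and `run_task` are all hypothetical.

```python
import sqlite3
import time
from enum import IntEnum


class SearchTaskStatus(IntEnum):
    # Hypothetical members and values; the PR's actual enum may differ.
    PENDING = 0
    RUNNING = 1
    SUCCEEDED = 2
    FAILED = 3
    CANCELLED = 4


def create_search_tasks_table(conn: sqlite3.Connection) -> None:
    # Assumed columns: one row per (job, archive) pair plus timing fields.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS search_tasks (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            job_id INTEGER NOT NULL,
            archive_id TEXT NOT NULL,
            status INTEGER NOT NULL,
            start_time REAL,
            duration REAL
        )
        """
    )


def insert_tasks(conn: sqlite3.Connection, job_id: int, archive_ids: list) -> list:
    """Insert one PENDING task per archive when a job is dispatched."""
    task_ids = []
    for archive_id in archive_ids:
        cursor = conn.execute(
            "INSERT INTO search_tasks (job_id, archive_id, status) VALUES (?, ?, ?)",
            (job_id, archive_id, int(SearchTaskStatus.PENDING)),
        )
        task_ids.append(cursor.lastrowid)
    conn.commit()
    return task_ids


def run_task(conn: sqlite3.Connection, task_id: int, search_fn) -> SearchTaskStatus:
    """Update task metadata before and after running the search for one archive."""
    start_time = time.time()  # Wall-clock start, stored for reference.
    started = time.monotonic()  # Monotonic clock, used to measure the duration.
    conn.execute(
        "UPDATE search_tasks SET status = ?, start_time = ? WHERE id = ?",
        (int(SearchTaskStatus.RUNNING), start_time, task_id),
    )
    conn.commit()

    succeeded = search_fn()  # Stand-in for executing the search binary.

    status = SearchTaskStatus.SUCCEEDED if succeeded else SearchTaskStatus.FAILED
    conn.execute(
        "UPDATE search_tasks SET status = ?, duration = ? WHERE id = ?",
        (int(status), time.monotonic() - started, task_id),
    )
    conn.commit()
    return status
```

With a layout like this, per-archive durations can be read directly from the task rows, and a job can be marked complete once none of its tasks are still `PENDING` or `RUNNING`.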

Validation performed

  • Started the package and compressed the hive-24hrs dataset.
  • In the command line, executed sbin/search.sh "org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties" and checked search_jobs and search_tasks in the database.
  • In the webui, ran query org*, and waited until it finished. Checked search_jobs and search_tasks in the database.
  • Ran the same query, and cancelled it before it finished. Checked search_jobs and search_tasks in the database.

@wraymo wraymo changed the title from "clp-package: Add search task profiling and fix several issues with search job cancellation" to "clp-package: Add search task profiling" Jun 6, 2024
@wraymo wraymo requested a review from kirkrodrigues June 6, 2024 18:08

async def handle_jobs(
    db_conn_pool,
    database_connection_params: Dict[str, any],

Member


How about clp_metadata_db_conn_params? (Same suggestion in other methods.)

    return task_ids


def set_task_status(

Member


Can we deduplicate this with set_job_status?

@wraymo wraymo requested a review from kirkrodrigues June 7, 2024 21:04
the update is conditional on the job's current status matching `prev_status`. If `kwargs` are
specified, the fields identified by the args are also updated.
Sets the status of the job or the tasks identified by `job_id` to `status`. If `prev_status` is
specified,the update is conditional on the job's current status matching `prev_status`. If

Member


Suggested change
specified,the update is conditional on the job's current status matching `prev_status`. If
specified, the update is conditional on the job/task's current status matching `prev_status`. If
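For readers following the thread, the behaviour described in this docstring (a single setter shared by jobs and tasks, an optional `prev_status` guard, and extra fields passed as `kwargs`) might look roughly like the sketch below. It is not the code from this PR: `set_status`, the table names passed to it, and the column names are assumptions, and `sqlite3` stands in for the real metadata database.

```python
import sqlite3
from typing import Any, Optional

# Assumed mapping from table name to the column that holds the job id.
_JOB_ID_COLUMNS = {"search_jobs": "id", "search_tasks": "job_id"}


def set_status(
    conn: sqlite3.Connection,
    table: str,
    job_id: int,
    status: int,
    prev_status: Optional[int] = None,
    **kwargs: Any,
) -> bool:
    """
    Set the status of the job, or of the job's tasks, identified by `job_id`.
    If `prev_status` is given, only rows currently in that status are updated.
    Any `kwargs` are written as additional column values.
    Returns True if at least one row was updated.
    """
    if table not in _JOB_ID_COLUMNS:
        raise ValueError(f"Unsupported table: {table}")
    for column in kwargs:
        # Column names are interpolated into the query, so validate them.
        if not column.isidentifier():
            raise ValueError(f"Invalid column name: {column}")

    assignments = ["status = ?"] + [f"{column} = ?" for column in kwargs]
    params = [status, *kwargs.values(), job_id]
    query = (
        f"UPDATE {table} SET {', '.join(assignments)} "
        f"WHERE {_JOB_ID_COLUMNS[table]} = ?"
    )
    if prev_status is not None:
        # Conditional update: skip rows whose status has already changed.
        query += " AND status = ?"
        params.append(prev_status)

    cursor = conn.execute(query, params)
    conn.commit()
    return cursor.rowcount > 0
```

The boolean return value lets a caller tell whether the guarded transition actually happened, which matters when a cancellation races with a task that has just finished.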

@wraymo wraymo requested a review from kirkrodrigues June 8, 2024 01:52

@kirkrodrigues kirkrodrigues left a comment

Member


For the PR title, how about:

clp-package: Record search job and search task statistics in the database.

@wraymo wraymo merged commit 64e5941 into y-scope:main Jun 8, 2024
@wraymo wraymo changed the title from "clp-package: Add search task profiling" to "clp-package: Record search job and search task statistics in the database." Jun 8, 2024
junhaoliao pushed a commit to junhaoliao/clp that referenced this pull request May 17, 2026

3 participants