
Optimize test_search_with_max_results using bulk insert (10x faster)#19272

Merged
harupy merged 3 commits into master from copilot/optimize-bulk-insert-performance on Dec 8, 2025

Copilot AI commented Dec 8, 2025

Related Issues/PRs

What changes are proposed in this pull request?

Replaces 1200 individual _run_factory() calls with a single SQLAlchemy bulk insert in test_search_with_max_results, reducing test execution time from 7.44s to 0.76s (~10x speedup).

Changes:

  • Generate all 1200 run UUIDs upfront
  • Use session.add_all() with a generator expression to bulk-create SqlRun objects
  • Use enumerate(run_uuids) for cleaner iteration
  • Use constants (RunStatus.to_string(RunStatus.RUNNING), entities.LifecycleStage.ACTIVE) instead of hardcoded strings

Test behavior and assertions remain unchanged.
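For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the same bulk-insert approach, assuming SQLAlchemy 1.4+ and an in-memory SQLite database. The `Run` model, the `bulk_insert_runs` helper, and the `RUNNING`/`ACTIVE` constants are hypothetical, simplified stand-ins for MLflow's `SqlRun`, `RunStatus`, and `LifecycleStage`; they are not the actual MLflow schema or test code.

```python
import uuid

from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Run(Base):
    # Hypothetical, simplified stand-in for MLflow's SqlRun model
    __tablename__ = "runs"
    run_uuid = Column(String(32), primary_key=True)
    experiment_id = Column(Integer, nullable=False)
    status = Column(String(20), nullable=False)
    start_time = Column(Integer, nullable=False)
    lifecycle_stage = Column(String(20), nullable=False)


# Hypothetical constants standing in for RunStatus.to_string(RunStatus.RUNNING)
# and entities.LifecycleStage.ACTIVE
RUNNING = "RUNNING"
ACTIVE = "active"


def bulk_insert_runs(session: Session, experiment_id: int, n: int) -> list[str]:
    """Hypothetical helper: add n runs to one session, return IDs in creation order."""
    run_uuids = [uuid.uuid4().hex for _ in range(n)]  # generate all IDs upfront
    session.add_all(
        Run(
            run_uuid=run_uuid,
            experiment_id=experiment_id,
            status=RUNNING,
            start_time=i,  # increasing start_time, mirroring the test
            lifecycle_stage=ACTIVE,
        )
        for i, run_uuid in enumerate(run_uuids)
    )
    return run_uuids


engine = create_engine("sqlite://")  # in-memory SQLite database
Base.metadata.create_all(engine)

with Session(engine) as session:
    run_uuids = bulk_insert_runs(session, experiment_id=1, n=1200)
    session.commit()  # one commit covers all 1200 rows
    count = session.scalar(select(func.count()).select_from(Run))
```

The design point is that every row goes through one session and one commit, instead of one `_run_factory()` call per run.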

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)
Original prompt

tests/store/tracking/test_sqlalchemy_store.py::test_search_with_max_results is slow because it inserts runs one by one. Use a bulk insert for faster execution.

diff --git a/tests/store/tracking/test_sqlalchemy_store.py b/tests/store/tracking/test_sqlalchemy_store.py
index d93564eeb7..ed11e48528 100644
--- a/tests/store/tracking/test_sqlalchemy_store.py
+++ b/tests/store/tracking/test_sqlalchemy_store.py
@@ -2398,11 +2398,23 @@ def test_search_full(store: SqlAlchemyStore):
 
 def test_search_with_max_results(store: SqlAlchemyStore):
     exp = _create_experiments(store, "search_with_max_results")
-    runs = [
-        _run_factory(store, _get_run_configs(exp, start_time=r)).info.run_id for r in range(1200)
-    ]
+    # Bulk insert runs using SQLAlchemy for performance
+    run_uuids = [uuid.uuid4().hex for _ in range(1200)]
+    with store.ManagedSessionMaker() as session:
+        session.add_all(
+            SqlRun(
+                run_uuid=run_uuids[i],
+                name="name",
+                experiment_id=int(exp),
+                user_id="Anderson",
+                status="RUNNING",
+                start_time=i,
+                lifecycle_stage="active",
+            )
+            for i in range(1200)
+        )
     # reverse the ordering, since we created in increasing order of start_time
-    runs.reverse()
+    runs = list(reversed(run_uuids))
 
     assert runs[:1000] == _search_runs(store, exp)
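The `list(reversed(run_uuids))` step works because runs are created with increasing `start_time`, and the assertion expects the search to return them newest first, compared against only the first 1,000 reversed IDs. A pure-Python sketch of that expectation, assuming descending `start_time` order and a 1,000-result cap (`search_first_page` is a hypothetical helper, not MLflow's `_search_runs`):

```python
import uuid


def search_first_page(runs: dict[str, int], max_results: int = 1000) -> list[str]:
    """Hypothetical search: sort run IDs by start_time descending, keep max_results."""
    ordered = sorted(runs, key=lambda run_id: runs[run_id], reverse=True)
    return ordered[:max_results]


# IDs generated in creation order; each run's start_time equals its creation index
run_uuids = [uuid.uuid4().hex for _ in range(1200)]
start_times = {run_id: i for i, run_id in enumerate(run_uuids)}

expected = list(reversed(run_uuids))[:1000]
actual = search_first_page(start_times)
```

Because every `start_time` in this setup is distinct, the descending order is unambiguous; with duplicate start times, a real search would need a tiebreaker for the comparison to be stable.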


@harupy harupy marked this pull request as ready for review December 8, 2025 10:27
Copilot AI review requested due to automatic review settings December 8, 2025 10:27

Copilot AI left a comment


Copilot wasn't able to review any files in this pull request.



Copilot AI and others added 2 commits December 8, 2025 10:31
Co-authored-by: harupy <17039389+harupy@users.noreply.github.com>
…results

Co-authored-by: harupy <17039389+harupy@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Optimize test_search_with_max_results for faster execution" to "Optimize test_search_with_max_results using bulk insert (10x faster)" Dec 8, 2025
Copilot AI requested a review from harupy December 8, 2025 10:40
@github-actions github-actions bot added area/build Build and test infrastructure for MLflow rn/none List under Small Changes in Changelogs. labels Dec 8, 2025
@harupy harupy enabled auto-merge December 8, 2025 11:15
@harupy harupy added this pull request to the merge queue Dec 8, 2025
Merged via the queue into master with commit 280474e Dec 8, 2025
52 of 54 checks passed
@harupy harupy deleted the copilot/optimize-bulk-insert-performance branch December 8, 2025 11:20

3 participants