⚡️ Speed up function visualize_tenengrad_measure by 56% in PR #1857 (camera-focus-v2)
#1859
⚡️ This pull request contains optimizations for PR #1857
If you approve this dependent PR, these changes will be merged into the original PR branch camera-focus-v2.
📄 56% (0.56x) speedup for visualize_tenengrad_measure in inference/core/workflows/core_steps/classical_cv/camera_focus/v2.py
⏱️ Runtime: 52.1 milliseconds → 33.3 milliseconds (best of 128 runs)
📝 Explanation and details
The optimized code achieves a 56% speedup through two main optimizations:
1. Zebra Mask Caching (Primary Speedup)
The original code called _create_zebra_mask(gray.shape) on every invocation of _apply_zebra_warnings, which involves expensive NumPy operations (np.ogrid and modulo arithmetic over the entire image). The optimized version introduces a module-level cache, _ZEBRA_MASK_CACHE, that stores zebra masks by shape, eliminating this redundant computation for repeated calls with the same image dimensions.
Impact from line profiler:
_create_zebra_mask took 15.2ms (59.1%) of _apply_zebra_warnings time
2. In-Place Arithmetic in Tenengrad Computation
The original computed focus_measure = gx**2 + gy**2, creating three temporary arrays. The optimized version uses in-place arithmetic to reduce those intermediate allocations.
Impact from line profiler:
gx**2 + gy**2 took 8.74ms (38.7%) of _compute_tenengrad time
Performance Across Test Cases
The optimization shows consistent gains across the test cases.
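As an illustration of the in-place arithmetic described under optimization 2, the sketch below contrasts the naive expression with a version that writes the sum back into an existing buffer. The exact code from the PR is not shown on this page, so this is only one plausible form. The function names tenengrad_naive and tenengrad_in_place are hypothetical, and random arrays stand in for the Sobel gradients gx and gy.

```python
import numpy as np


def tenengrad_naive(gx, gy):
    # Allocates three arrays: gx**2, gy**2, and their sum.
    return gx**2 + gy**2


def tenengrad_in_place(gx, gy):
    # Hypothetical variant: two allocations instead of three.
    # The inputs are left untouched; only our own buffers are reused.
    focus = np.multiply(gx, gx)      # first allocation, holds gx squared
    gy_sq = np.multiply(gy, gy)      # second allocation, holds gy squared
    np.add(focus, gy_sq, out=focus)  # sum written into the existing buffer
    return focus


# Stand-in gradients; in the real code these would come from a Sobel filter.
rng = np.random.default_rng(0)
gx = rng.standard_normal((480, 640))
gy = rng.standard_normal((480, 640))
focus_measure = tenengrad_in_place(gx, gy)
```

If the caller no longer needs gy, squaring it with out=gy would remove one more allocation, at the cost of mutating the input. Whether the PR does that is not stated here.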
Workload Context
Based on function_references, visualize_tenengrad_measure is called from a workflow run() method that processes video frames or batched images. The zebra mask cache is most beneficial when successive calls share the same image dimensions, as with video frames or uniformly sized batches.
The cache stores masks keyed by shape tuple, so memory usage grows only with the number of unique image dimensions encountered, typically 1-2 entries in production workflows.
✅ Correctness verification report:
To edit these changes, git checkout codeflash/optimize-pr1857-2025-12-29T20.24.13 and push.