@codeflash-ai
Contributor

@codeflash-ai codeflash-ai bot commented Dec 29, 2025

⚡️ This pull request contains optimizations for PR #1857

If you approve this dependent PR, these changes will be merged into the original PR branch camera-focus-v2.

This PR will be automatically closed if the original PR is merged.


📄 56% (0.56x) speedup for visualize_tenengrad_measure in inference/core/workflows/core_steps/classical_cv/camera_focus/v2.py

⏱️ Runtime : 52.1 milliseconds → 33.3 milliseconds (best of 128 runs)
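
The percentages in this report appear to follow (old - new) / new. That is an inference from the reported figures, not something the report states. A minimal sketch that reproduces three of the numbers (`speedup_percent` is a hypothetical helper name):

```python
def speedup_percent(old_runtime, new_runtime):
    """Relative speedup in percent, measured against the new (faster) runtime."""
    return (old_runtime - new_runtime) / new_runtime * 100.0


# Figures taken from this report; each pair uses a single time unit.
overall = speedup_percent(52.1, 33.3)         # whole benchmark, milliseconds
large_image = speedup_percent(4.67, 1.98)     # 512x512 test, milliseconds
no_overlays = speedup_percent(2290.0, 619.0)  # 2.29 ms vs 619 μs, in microseconds
```

Under this convention, a 56% speedup corresponds to roughly a 36% reduction in wall-clock time (33.3 / 52.1 ≈ 0.64).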

📝 Explanation and details

The optimized code achieves a 56% speedup through two main optimizations:

1. Zebra Mask Caching (Primary Speedup)

The original code called _create_zebra_mask(gray.shape) on every invocation of _apply_zebra_warnings, which involves expensive NumPy operations (np.ogrid and modulo arithmetic over the entire image). The optimized version introduces a module-level cache _ZEBRA_MASK_CACHE that stores zebra masks by shape, eliminating this redundant computation for repeated calls with the same image dimensions.

Impact from line profiler:

  • Original: _create_zebra_mask took 15.2ms (59.1%) of _apply_zebra_warnings time
  • Optimized: Cache lookup + occasional mask creation takes only 5.3ms (28.6%) on cache misses, near-zero on hits
  • This optimization is especially effective in video processing or batch workflows where frames have consistent dimensions

2. In-Place Arithmetic in Tenengrad Computation

The original computed focus_measure = gx**2 + gy**2, which allocates three full-size arrays (one for each square, plus one for the sum). The optimized version reuses the existing gx and gy buffers instead:

np.square(gx, out=gx)
np.square(gy, out=gy)
np.add(gx, gy, out=gx)

Impact from line profiler:

  • Original: gx**2 + gy**2 took 8.74ms (38.7%) of _compute_tenengrad time
  • Optimized: In-place operations take 1.13ms (8.8%) total
  • Reduces memory allocations from ~3 full-size arrays to none, at the cost of overwriting gx and gy
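
A minimal sketch comparing the two forms, assuming `gx` and `gy` are floating-point gradient arrays that are not needed after the call (the function names are hypothetical):

```python
import numpy as np


def gradient_energy_alloc(gx, gy):
    """Allocating form: builds gx**2, gy**2 and their sum as new arrays."""
    return gx**2 + gy**2


def gradient_energy_inplace(gx, gy):
    """In-place form: squares both inputs in their own buffers, then sums into gx."""
    np.square(gx, out=gx)
    np.square(gy, out=gy)
    np.add(gx, gy, out=gx)
    return gx  # same object as the gx argument, now holding gx**2 + gy**2
```

Because the in-place form destroys the original gradients, it is only safe when nothing downstream reads `gx` or `gy` again. With integer inputs the squares could also overflow the array's dtype, which is why the sketch assumes floats.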

Performance Across Test Cases

The optimization shows consistent gains:

  • Large images (512×512): 136-175% faster - cache effectiveness shines with multiple calls
  • Small images: 8-14% faster - in-place arithmetic benefits dominate
  • No visualization mode (512×512): 270% faster - Tenengrad in-place optimization is isolated; the same mode on small images gains only about 2-3%

Workload Context

Based on function_references, visualize_tenengrad_measure is called from a workflow run() method that processes video frames or batched images. The zebra mask cache will be particularly beneficial when:

  • Processing video streams with fixed resolution
  • Running multiple workflow invocations on same-sized images
  • The camera focus visualization is enabled (which is the common case per the function signature defaults)

The cache stores masks by shape tuple as keys, so memory usage grows only with the number of unique image dimensions encountered - typically 1-2 entries in production workflows.
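
For a rough sense of the memory cost, the sketch below estimates the cache footprint, assuming one byte per pixel (as for a NumPy boolean array) and one entry per distinct shape. The helper name and the example resolutions are illustrative assumptions.

```python
def estimated_cache_bytes(shapes, bytes_per_pixel=1):
    """Total bytes held if one mask is cached per distinct (height, width)."""
    return sum(h * w * bytes_per_pixel for h, w in set(shapes))


# Repeated shapes share one entry, so a fixed-resolution stream costs one mask.
single_stream = estimated_cache_bytes([(512, 512)] * 1000)
# Two camera resolutions (assumed here for illustration) cost two masks.
two_streams = estimated_cache_bytes([(1080, 1920), (720, 1280)])
```

With these assumptions, a single 512×512 stream holds 256 KiB and the two-resolution case just under 3 MB. The cache has no eviction, so inputs with many distinct sizes would grow it without bound.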

Correctness verification report:

| Test | Status |
|---|---|
| ⏪ Replay Tests | 🔘 None Found |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 47 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
import cv2
import numpy as np
import pytest

# Function under test
from inference.core.workflows.core_steps.classical_cv.camera_focus.v2 import (
    visualize_tenengrad_measure,
)

GRID_DIVISIONS = {
    "None": 0,
    "2x2": 2,
    "3x3": 3,
    "4x4": 4,
    "5x5": 5,
}

# --- TESTS BEGIN HERE ---


# Helper: Dummy Detections class for bbox tests
class DummyDetections:
    def __init__(self, xyxy):
        self.xyxy = xyxy

    def __len__(self):
        return len(self.xyxy)


# BASIC TEST CASES


def test_basic_rgb_image_returns_tuple_and_correct_shape():
    # 3x3 RGB image, all mid-gray
    img = np.full((3, 3, 3), 128, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img
    )  # 349μs -> 315μs (10.8% faster)


def test_basic_grayscale_image_returns_bgr_and_focus_value():
    # 3x3 grayscale image, all mid-gray
    img = np.full((3, 3), 128, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img
    )  # 256μs -> 225μs (13.7% faster)


def test_basic_no_visualization_returns_input():
    # 3x3 image, all visualizations off
    img = np.full((3, 3, 3), 128, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img,
        show_zebra_warnings=False,
        grid_overlay="None",
        show_hud=False,
        show_focus_peaking=False,
        show_center_marker=False,
    )  # 48.7μs -> 47.1μs (3.42% faster)


def test_basic_grid_overlay_works():
    # 10x10 image, grid overlay 2x2
    img = np.full((10, 10, 3), 128, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img, grid_overlay="2x2"
    )  # 347μs -> 314μs (10.5% faster)


def test_basic_focus_peaking_overlay_changes_image():
    # 5x5 image with sharp edge
    img = np.zeros((5, 5, 3), dtype=np.uint8)
    img[:, :2] = 255  # left two columns white, remaining three black
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img, show_focus_peaking=True
    )  # 376μs -> 346μs (8.68% faster)


def test_basic_bbox_focus_measure():
    # 5x5 image, single bbox covering the left two columns
    img = np.full((5, 5, 3), 128, dtype=np.uint8)
    bbox = [[0, 0, 2, 5]]  # x from 0 to 2, full height
    det = DummyDetections(bbox)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img, detections=det
    )  # 354μs -> 325μs (8.87% faster)


# EDGE TEST CASES


def test_edge_empty_image_raises():
    # Empty image should raise error (cv2 fails on empty)
    img = np.zeros((0, 0, 3), dtype=np.uint8)
    with pytest.raises(cv2.error):
        visualize_tenengrad_measure(img)  # 26.3μs -> 25.6μs (2.86% faster)


def test_edge_single_pixel_image():
    # 1x1 image, should not crash and return valid output
    img = np.full((1, 1, 3), 128, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img
    )  # 341μs -> 318μs (7.14% faster)


def test_edge_all_underexposed_zebra_overlay():
    # All pixels below underexposed threshold
    img = np.full((10, 10, 3), 0, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img,
        underexposed_threshold=16,
        overexposed_threshold=239,
        show_zebra_warnings=True,
    )  # 353μs -> 322μs (9.55% faster)


def test_edge_all_overexposed_zebra_overlay():
    # All pixels above overexposed threshold
    img = np.full((10, 10, 3), 255, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img,
        underexposed_threshold=16,
        overexposed_threshold=239,
        show_zebra_warnings=True,
    )  # 349μs -> 320μs (9.09% faster)


def test_edge_focus_peaking_no_gradient():
    # Flat image: focus measure should be zero
    img = np.full((10, 10, 3), 128, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img, show_focus_peaking=True
    )  # 343μs -> 312μs (9.79% faster)


def test_edge_bbox_out_of_bounds():
    # Bbox outside image should not crash
    img = np.full((10, 10, 3), 128, dtype=np.uint8)
    bbox = [[-10, -10, 20, 20]]  # way outside
    det = DummyDetections(bbox)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img, detections=det
    )  # 358μs -> 326μs (9.58% faster)


def test_edge_bbox_inverted_coords():
    # Bbox with x2 < x1 or y2 < y1 should be ignored
    img = np.full((10, 10, 3), 128, dtype=np.uint8)
    bbox = [[5, 5, 2, 2]]  # invalid bbox
    det = DummyDetections(bbox)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img, detections=det
    )  # 343μs -> 314μs (9.50% faster)


def test_edge_grid_overlay_invalid_string():
    # Unknown grid_overlay string disables grid
    img = np.full((10, 10, 3), 128, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img, grid_overlay="invalid"
    )  # 334μs -> 303μs (10.4% faster)


def test_edge_grayscale_input_with_visualizations():
    # Grayscale image, overlays enabled
    img = np.full((10, 10), 128, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img
    )  # 254μs -> 224μs (13.5% faster)


def test_edge_disable_all_visualizations_grid_overlay_none():
    # All overlays off, grid_overlay 'None'
    img = np.full((10, 10, 3), 128, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img,
        show_zebra_warnings=False,
        grid_overlay="None",
        show_hud=False,
        show_focus_peaking=False,
        show_center_marker=False,
    )  # 48.6μs -> 47.6μs (2.21% faster)


# LARGE SCALE TEST CASES


def test_large_image_performance_and_shape():
    # 512x512 image, overlays enabled
    img = np.full((512, 512, 3), 128, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img
    )  # 4.67ms -> 1.98ms (136% faster)


def test_large_image_with_many_bboxes():
    # 256x256 image, 100 bboxes
    img = np.full((256, 256, 3), 128, dtype=np.uint8)
    bboxes = []
    for i in range(100):
        x1 = i
        y1 = i
        x2 = min(255, i + 10)
        y2 = min(255, i + 10)
        bboxes.append([x1, y1, x2, y2])
    det = DummyDetections(bboxes)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img, detections=det
    )  # 1.94ms -> 1.62ms (19.7% faster)
    for val in bbox_focus:
        pass


def test_large_image_grid_overlay_and_focus_peaking():
    # 512x512 image, overlays enabled
    img = np.full((512, 512, 3), 128, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img, grid_overlay="5x5", show_focus_peaking=True
    )  # 4.80ms -> 1.99ms (141% faster)


def test_large_image_with_sharp_edge_high_focus():
    # 512x512 image, left half white, right half black
    img = np.zeros((512, 512, 3), dtype=np.uint8)
    img[:, :256] = 255
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img
    )  # 13.3ms -> 9.95ms (33.4% faster)


def test_large_image_disable_all_visualizations():
    # 512x512 image, all overlays off
    img = np.full((512, 512, 3), 128, dtype=np.uint8)
    out, focus, bbox_focus = visualize_tenengrad_measure(
        img,
        show_zebra_warnings=False,
        grid_overlay="None",
        show_hud=False,
        show_focus_peaking=False,
        show_center_marker=False,
    )  # 2.29ms -> 619μs (270% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import cv2
import numpy as np

# imports
import pytest
from inference.core.workflows.core_steps.classical_cv.camera_focus.v2 import (
    visualize_tenengrad_measure,
)

# function to test (as provided above, not repeated here for brevity)

# --- UNIT TESTS FOR visualize_tenengrad_measure ---


# Helper: create a simple grayscale or color image
def make_image(shape=(32, 32), color=None, value=None):
    if color is not None:
        arr = np.full(shape + (3,), color, dtype=np.uint8)
    elif value is not None:
        arr = np.full(shape, value, dtype=np.uint8)
    else:
        arr = np.random.randint(0, 256, shape + (3,), dtype=np.uint8)
    return arr


# Helper: Dummy Detections class for bbox testing
class DummyDetections:
    def __init__(self, xyxy):
        self.xyxy = xyxy

    def __len__(self):
        return len(self.xyxy)


# ------------- BASIC TEST CASES -------------


def test_basic_color_image_returns_expected_shapes():
    """Test that output image has same shape, and focus_value is float, bbox list is empty."""
    img = make_image((32, 32), color=(120, 130, 140))
    out, focus, bbox = visualize_tenengrad_measure(img)  # 412μs -> 373μs (10.5% faster)


def test_basic_gray_image_input():
    """Test that grayscale input is handled and output is 3-channel."""
    img = make_image((32, 32), value=128)
    out, focus, bbox = visualize_tenengrad_measure(img)  # 311μs -> 273μs (14.2% faster)


def test_no_visualization_returns_input():
    """If all overlays are off, output should be input (or BGR-converted for gray)."""
    img = make_image((16, 16), color=(10, 20, 30))
    out, focus, bbox = visualize_tenengrad_measure(
        img,
        show_zebra_warnings=False,
        grid_overlay="None",
        show_hud=False,
        show_focus_peaking=False,
        show_center_marker=False,
    )  # 51.6μs -> 50.3μs (2.49% faster)


def test_focus_value_higher_for_sharp_edge():
    """A sharp edge image should have higher focus value than a blurred one."""
    img = np.zeros((32, 32, 3), dtype=np.uint8)
    img[:, :16] = 0
    img[:, 16:] = 255
    out1, focus1, _ = visualize_tenengrad_measure(img)  # 481μs -> 443μs (8.46% faster)
    img_blur = cv2.GaussianBlur(img, (7, 7), 0)
    out2, focus2, _ = visualize_tenengrad_measure(
        img_blur
    )  # 638μs -> 546μs (16.9% faster)


def test_bbox_focus_measures():
    """Test that bbox focus measures are returned and correct length."""
    img = make_image((32, 32), color=(100, 110, 120))
    det = DummyDetections([[0, 0, 16, 16], [16, 16, 32, 32]])
    out, focus, bbox = visualize_tenengrad_measure(
        img, detections=det
    )  # 425μs -> 389μs (9.36% faster)


# ------------- EDGE TEST CASES -------------


def test_all_black_image_focus_zero():
    """Completely black image should have focus value zero."""
    img = make_image((32, 32), color=(0, 0, 0))
    out, focus, bbox = visualize_tenengrad_measure(img)  # 431μs -> 393μs (9.40% faster)


def test_all_white_image_focus_zero():
    """Completely white image should have focus value zero."""
    img = make_image((32, 32), color=(255, 255, 255))
    out, focus, bbox = visualize_tenengrad_measure(img)  # 428μs -> 391μs (9.64% faster)


def test_small_image_1x1():
    """Test that 1x1 image does not crash and returns expected output."""
    img = make_image((1, 1), color=(123, 234, 56))
    out, focus, bbox = visualize_tenengrad_measure(img)  # 337μs -> 316μs (6.78% faster)


def test_odd_shape_image():
    """Test non-square image shape is handled."""
    img = make_image((17, 31), color=(50, 60, 70))
    out, focus, bbox = visualize_tenengrad_measure(img)  # 370μs -> 334μs (10.8% faster)


def test_invalid_grid_overlay_string():
    """Unknown grid_overlay string should not draw grid and not error."""
    img = make_image((32, 32), color=(10, 20, 30))
    out, focus, bbox = visualize_tenengrad_measure(
        img, grid_overlay="not_a_grid"
    )  # 390μs -> 353μs (10.6% faster)


def test_extreme_thresholds_disable_zebra():
    """Set thresholds so no pixels are under/overexposed; image remains unchanged except overlays."""
    img = make_image((16, 16), color=(100, 100, 100))
    out, focus, bbox = visualize_tenengrad_measure(
        img, underexposed_threshold=0, overexposed_threshold=255
    )  # 351μs -> 317μs (10.9% faster)


def test_focus_peaking_disabled():
    """If focus peaking is off, overlay should not have green highlights."""
    img = make_image((32, 32), color=(120, 130, 140))
    out1, _, _ = visualize_tenengrad_measure(
        img, show_focus_peaking=False
    )  # 387μs -> 349μs (10.9% faster)
    out2, _, _ = visualize_tenengrad_measure(
        img, show_focus_peaking=True
    )  # 364μs -> 328μs (11.0% faster)


def test_center_marker_disabled():
    """If center marker is off, overlay should not have central cross."""
    img = make_image((32, 32), color=(120, 130, 140))
    out1, _, _ = visualize_tenengrad_measure(
        img, show_center_marker=False
    )  # 386μs -> 348μs (10.7% faster)
    out2, _, _ = visualize_tenengrad_measure(
        img, show_center_marker=True
    )  # 363μs -> 325μs (11.8% faster)


def test_hud_overlay_disabled():
    """If HUD is off, overlay should not have HUD box."""
    img = make_image((32, 32), color=(120, 130, 140))
    out1, _, _ = visualize_tenengrad_measure(
        img, show_hud=False
    )  # 150μs -> 112μs (33.3% faster)
    out2, _, _ = visualize_tenengrad_measure(
        img, show_hud=True
    )  # 366μs -> 332μs (10.5% faster)


def test_grid_overlay_works():
    """Test that grid overlay draws lines (image changes)."""
    img = make_image((32, 32), color=(120, 130, 140))
    out1, _, _ = visualize_tenengrad_measure(
        img, grid_overlay="None"
    )  # 385μs -> 347μs (11.2% faster)
    out2, _, _ = visualize_tenengrad_measure(
        img, grid_overlay="3x3"
    )  # 364μs -> 327μs (11.4% faster)


def test_bbox_out_of_bounds():
    """Test that bbox outside image bounds is handled gracefully."""
    img = make_image((32, 32), color=(100, 110, 120))
    det = DummyDetections([[100, 100, 200, 200]])  # Out of bounds
    out, focus, bbox = visualize_tenengrad_measure(
        img, detections=det
    )  # 394μs -> 353μs (11.6% faster)


def test_bbox_partial_out_of_bounds():
    """Test that partially out-of-bounds bbox is clipped."""
    img = make_image((32, 32), color=(100, 110, 120))
    det = DummyDetections([[-10, -10, 10, 10]])
    out, focus, bbox = visualize_tenengrad_measure(
        img, detections=det
    )  # 405μs -> 370μs (9.62% faster)


# ------------- LARGE SCALE TEST CASES -------------


def test_large_image_performance_and_output():
    """Test function works on a large image (but <1000x1000 for speed)."""
    img = make_image((512, 512), color=(128, 128, 128))
    out, focus, bbox = visualize_tenengrad_measure(
        img
    )  # 5.46ms -> 1.99ms (175% faster)


def test_many_bboxes():
    """Test with many bounding boxes (50 here)."""
    img = make_image((128, 128), color=(50, 60, 70))
    boxes = []
    for i in range(50):  # 50 boxes
        x1, y1 = i, i
        x2, y2 = min(127, i + 10), min(127, i + 10)
        boxes.append([x1, y1, x2, y2])
    det = DummyDetections(boxes)
    out, focus, bbox = visualize_tenengrad_measure(
        img, detections=det
    )  # 1.04ms -> 944μs (10.6% faster)


def test_large_grid_overlay():
    """Test with a dense grid overlay (5x5)."""
    img = make_image((256, 256), color=(128, 128, 128))
    out, focus, bbox = visualize_tenengrad_measure(
        img, grid_overlay="5x5"
    )  # 1.15ms -> 796μs (44.6% faster)


def test_large_image_with_all_overlays():
    """Test with all overlays enabled on a large image."""
    img = make_image((512, 512), color=(200, 210, 220))
    out, focus, bbox = visualize_tenengrad_measure(
        img,
        show_zebra_warnings=True,
        grid_overlay="4x4",
        show_hud=True,
        show_focus_peaking=True,
        show_center_marker=True,
    )  # 4.81ms -> 1.99ms (141% faster)


# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-pr1857-2025-12-29T20.24.13, commit, and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Dec 29, 2025
@codeflash-ai codeflash-ai bot added ⚡️ codeflash Optimization PR opened by Codeflash AI 🎯 Quality: High Optimization Quality according to codeflash labels Dec 29, 2025
@shntu
Contributor

shntu commented Dec 29, 2025

This would create a cache that would mostly be cache misses, because the images are likely to have slightly different masks for exposure warnings - closing.

@shntu shntu closed this Dec 29, 2025
@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr1857-2025-12-29T20.24.13 branch December 29, 2025 20:34