feature: Add Text Display Visualization Block #1895
- Introduced a new visualization block for displaying customizable text on images.
- Added utility functions for text layout and drawing.
bugbot run
box_x = 0 if box_w > img_w else max(0, min(box_x, img_w - box_w))
box_y = 0 if box_h > img_h else max(0, min(box_y, img_h - box_h))
⚡️ Codeflash found a 180% speedup (about 2.8x faster) for clamp_box in inference/core/workflows/core_steps/visualizations/text_display/utils.py
⏱️ Runtime : 1.03 milliseconds → 369 microseconds (best of 195 runs)
📝 Explanation and details
The optimization replaces nested max(0, min(...)) function calls with explicit if-elif chains, yielding a 180% speedup (1.03ms → 369μs).
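The percentage can be checked against the two quoted runtimes. The sketch below assumes "speedup" means old runtime divided by new runtime, minus one; the helper name `speedup_percent` is made up for illustration and is not part of Codeflash or this repository.

```python
def speedup_percent(old_seconds: float, new_seconds: float) -> float:
    """Return how much faster the new runtime is, in percent.

    Assumption: speedup is defined as old / new - 1, so 100% means
    the new code finishes in half the time.
    """
    if new_seconds <= 0:
        raise ValueError("new_seconds must be positive")
    return (old_seconds / new_seconds - 1.0) * 100.0


# Runtimes quoted in the comment: 1.03 milliseconds -> 369 microseconds.
old_runtime = 1.03e-3
new_runtime = 369e-6

percent = speedup_percent(old_runtime, new_runtime)
ratio = old_runtime / new_runtime
print(f"{percent:.0f}% faster, ratio old/new = {ratio:.2f}")
```

With the rounded runtimes shown above the result lands just under 180%, so the reported figure is consistent once rounding of the displayed timings is taken into account.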
Key Performance Gains:
- Fewer operations in the common case: when the box fits the image, the original code always calls max(0, min(box_x, img_w - box_w)), even if box_x is already in range. The optimized version leaves an in-range coordinate untouched after a few comparisons. When box_w > img_w, the original's conditional expression already skips max/min, which matches the lack of improvement in the oversized-box test cases.
- Reduces function call overhead: Python function calls (max, min) carry overhead. The optimized version uses direct comparisons and assignments, which are cheaper.
- Simpler branching: the if-elif chain replaces nested function calls with plain comparisons. Any branch-prediction benefit is likely minor in CPython compared with the saved calls.
Test Case Performance:
- Best speedups (150-200%): Cases where boxes fit within bounds or need simple clamping (most common scenarios)
- Slight regressions (1-10% slower): Cases where box_w > img_w AND box_h > img_h (rare edge case requiring both early exits)
- Stress tests: Show consistent 160-220% improvements, indicating the optimization scales well
Impact on Workloads:
The function is called from compute_layout() during text overlay rendering, a common operation in computer vision pipelines. Since text bounding boxes typically fit within image bounds (the common case), this optimization directly benefits the hot path. The 180% speedup means clamp_box itself runs about 2.8x faster; the end-to-end gain for text visualization workflows depends on how much of each frame's time is spent in this function, which at roughly a microsecond per call is likely a small share.
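The two forms discussed above can be compared side by side. This is a minimal sketch, not the repository's code: the function names are made up, and the argument order and the `(box_x, box_y)` return value are assumptions inferred from the generated tests in this comment.

```python
from itertools import product


def clamp_box_original(box_x, box_y, box_w, box_h, img_w, img_h):
    """Clamp with the nested max/min form quoted in the review."""
    box_x = 0 if box_w > img_w else max(0, min(box_x, img_w - box_w))
    box_y = 0 if box_h > img_h else max(0, min(box_y, img_h - box_h))
    return box_x, box_y


def clamp_box_branching(box_x, box_y, box_w, box_h, img_w, img_h):
    """Clamp with the if/elif form from the suggested change."""
    if box_w > img_w:
        box_x = 0
    elif box_x < 0:
        box_x = 0
    elif box_x > img_w - box_w:
        box_x = img_w - box_w
    if box_h > img_h:
        box_y = 0
    elif box_y < 0:
        box_y = 0
    elif box_y > img_h - box_h:
        box_y = img_h - box_h
    return box_x, box_y


# Compare both forms over a small grid of positions, box sizes and image sizes.
positions = (-10, 0, 45, 95, 200)
sizes = (0, 10, 100, 120)
image_sizes = (0, 100)

mismatches = [
    args
    for args in product(positions, positions, sizes, sizes, image_sizes, image_sizes)
    if clamp_box_original(*args) != clamp_box_branching(*args)
]
print(f"mismatching inputs: {len(mismatches)}")
```

A grid check like this only samples the input space; it complements the generated regression tests rather than replacing them.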
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⏪ Replay Tests | 🔘 None Found |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 1210 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Click to see Generated Regression Tests
import pytest # used for our unit tests
from inference.core.workflows.core_steps.visualizations.text_display.utils import (
clamp_box,
)
# unit tests
# -------------------------
# Basic Test Cases
# -------------------------
def test_box_fits_within_image_top_left_corner():
# Box fits entirely within image, positioned at top-left
codeflash_output = clamp_box(
0, 0, 10, 10, 100, 100
) # 1.81μs -> 743ns (143% faster)
def test_box_fits_within_image_center():
# Box fits entirely within image, positioned at center
codeflash_output = clamp_box(
45, 45, 10, 10, 100, 100
) # 1.76μs -> 678ns (159% faster)
def test_box_near_right_edge():
# Box near right edge, should clamp to img_w - box_w
codeflash_output = clamp_box(
95, 10, 10, 10, 100, 100
) # 1.77μs -> 853ns (107% faster)
def test_box_near_bottom_edge():
# Box near bottom edge, should clamp to img_h - box_h
codeflash_output = clamp_box(
10, 95, 10, 10, 100, 100
) # 1.76μs -> 803ns (119% faster)
def test_box_exactly_fills_image():
# Box exactly fills image, should be placed at (0, 0)
codeflash_output = clamp_box(
0, 0, 100, 100, 100, 100
) # 1.62μs -> 649ns (150% faster)
# -------------------------
# Edge Test Cases
# -------------------------
def test_box_width_larger_than_image():
# Box width larger than image width, should clamp to x=0
codeflash_output = clamp_box(
50, 10, 120, 10, 100, 100
) # 1.42μs -> 669ns (113% faster)
def test_box_height_larger_than_image():
# Box height larger than image height, should clamp to y=0
codeflash_output = clamp_box(
10, 50, 10, 120, 100, 100
) # 1.40μs -> 689ns (103% faster)
def test_box_width_and_height_larger_than_image():
# Both box width and height larger than image, should clamp to (0, 0)
codeflash_output = clamp_box(
50, 50, 120, 120, 100, 100
) # 608ns -> 635ns (4.25% slower)
def test_box_negative_position_clamped_to_zero():
# Negative box position, should clamp to (0, 0)
codeflash_output = clamp_box(
-10, -10, 10, 10, 100, 100
) # 2.04μs -> 738ns (176% faster)
def test_box_position_exceeds_image_bounds():
# Box position outside image, should clamp to max allowed position
codeflash_output = clamp_box(
200, 200, 10, 10, 100, 100
) # 1.77μs -> 844ns (109% faster)
def test_box_zero_width_and_height():
# Box with zero width/height, should clamp to (0, 0)
codeflash_output = clamp_box(
10, 10, 0, 0, 100, 100
) # 1.73μs -> 674ns (157% faster)
def test_box_width_equal_to_image_width():
# Box width equals image width, should clamp to x=0
codeflash_output = clamp_box(
50, 10, 100, 10, 100, 100
) # 1.75μs -> 779ns (124% faster)
def test_box_height_equal_to_image_height():
# Box height equals image height, should clamp to y=0
codeflash_output = clamp_box(
10, 50, 10, 100, 100, 100
) # 1.74μs -> 744ns (134% faster)
def test_box_at_maximum_possible_position():
# Box at maximum possible position
codeflash_output = clamp_box(
90, 90, 10, 10, 100, 100
) # 1.65μs -> 621ns (165% faster)
def test_box_minimum_size_at_maximum_position():
# Box of size 1 at maximum position
codeflash_output = clamp_box(
99, 99, 1, 1, 100, 100
) # 1.62μs -> 632ns (156% faster)
def test_box_position_and_size_zero():
# All values zero, should clamp to (0, 0)
codeflash_output = clamp_box(0, 0, 0, 0, 0, 0) # 1.60μs -> 628ns (155% faster)
def test_box_size_zero_with_nonzero_image():
# Box size zero, image size nonzero, position arbitrary
codeflash_output = clamp_box(
50, 50, 0, 0, 100, 100
) # 1.66μs -> 622ns (167% faster)
def test_box_size_equals_image_size_and_position_nonzero():
# Box size equals image size, position nonzero, should clamp to (0, 0)
codeflash_output = clamp_box(
10, 10, 100, 100, 100, 100
) # 1.63μs -> 781ns (109% faster)
def test_box_size_just_one_less_than_image():
# Box size just one less than image, position at edge
codeflash_output = clamp_box(
99, 99, 99, 99, 100, 100
) # 1.64μs -> 730ns (125% faster)
def test_box_size_one_with_large_image():
# Box size one, image size large, position at edge
codeflash_output = clamp_box(
999, 999, 1, 1, 1000, 1000
) # 2.08μs -> 913ns (128% faster)
def test_box_negative_size():
# Negative box size, should clamp position to (0, 0)
codeflash_output = clamp_box(
10, 10, -10, -10, 100, 100
) # 1.87μs -> 788ns (138% faster)
def test_image_size_zero():
# Image size zero, box size nonzero, should clamp to (0, 0)
codeflash_output = clamp_box(10, 10, 10, 10, 0, 0) # 636ns -> 696ns (8.62% slower)
def test_box_size_larger_than_zero_image():
# Box size larger than zero image, should clamp to (0, 0)
codeflash_output = clamp_box(10, 10, 10, 10, 0, 0) # 628ns -> 649ns (3.24% slower)
# -------------------------
# Large Scale Test Cases
# -------------------------
def test_large_box_and_image():
# Large box and image, box fits inside
codeflash_output = clamp_box(
500, 500, 100, 100, 1000, 1000
) # 2.12μs -> 872ns (143% faster)
def test_large_box_exceeds_image_bounds():
# Large box, position exceeds image bounds, should clamp to max allowed
codeflash_output = clamp_box(
950, 950, 100, 100, 1000, 1000
) # 1.95μs -> 951ns (105% faster)
def test_large_box_larger_than_image():
# Box larger than image, should clamp to (0, 0)
codeflash_output = clamp_box(
500, 500, 2000, 2000, 1000, 1000
) # 691ns -> 702ns (1.57% slower)
def test_large_box_zero_size():
# Large image, box size zero, position arbitrary
codeflash_output = clamp_box(
999, 999, 0, 0, 1000, 1000
) # 1.96μs -> 797ns (146% faster)
def test_large_box_negative_position():
# Large image, negative box position, should clamp to (0, 0)
codeflash_output = clamp_box(
-100, -100, 100, 100, 1000, 1000
) # 1.94μs -> 740ns (162% faster)
def test_large_box_near_right_bottom_edge():
# Large image, box near right and bottom edge
codeflash_output = clamp_box(
995, 995, 10, 10, 1000, 1000
) # 1.86μs -> 915ns (103% faster)
def test_large_box_width_equal_to_image():
# Large image, box width equal to image width, should clamp to x=0
codeflash_output = clamp_box(
500, 500, 1000, 10, 1000, 1000
) # 2.14μs -> 911ns (135% faster)
def test_large_box_height_equal_to_image():
# Large image, box height equal to image height, should clamp to y=0
codeflash_output = clamp_box(
500, 500, 10, 1000, 1000, 1000
) # 2.13μs -> 879ns (143% faster)
def test_many_boxes_in_bounds():
# Test many boxes within bounds to check performance and correctness
for i in range(0, 1000, 100):
codeflash_output = clamp_box(
i, i, 10, 10, 1000, 1000
) # 10.3μs -> 3.63μs (184% faster)
def test_many_boxes_out_of_bounds():
# Test many boxes out of bounds to check clamping
for i in range(900, 1100, 10):
codeflash_output = clamp_box(
i, i, 100, 100, 1000, 1000
) # 18.2μs -> 6.99μs (160% faster)
def test_many_boxes_larger_than_image():
# Test many boxes larger than image
for w in range(1001, 1100, 10):
for h in range(1001, 1100, 10):
codeflash_output = clamp_box(50, 50, w, h, 1000, 1000)
def test_many_boxes_zero_size():
# Test many boxes with zero size
for i in range(0, 1000, 100):
codeflash_output = clamp_box(
i, i, 0, 0, 1000, 1000
) # 10.2μs -> 3.52μs (188% faster)
# -------------------------
# Mutation-sensitive cases
# -------------------------
def test_mutation_sensitive_x_clamping():
# If the x clamping logic is changed, this should fail
codeflash_output = clamp_box(
999, 10, 10, 10, 1000, 1000
) # 1.90μs -> 887ns (114% faster)
def test_mutation_sensitive_y_clamping():
# If the y clamping logic is changed, this should fail
codeflash_output = clamp_box(
10, 999, 10, 10, 1000, 1000
) # 1.96μs -> 884ns (121% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
import pytest # used for our unit tests
from inference.core.workflows.core_steps.visualizations.text_display.utils import (
clamp_box,
)
# function to test
# (already imported above)
# unit tests
class TestClampBoxBasicCases:
"""Test basic functionality under normal conditions."""
def test_box_at_origin(self):
"""Test box positioned at (0, 0) - should remain unchanged."""
# Box at origin within a 100x100 image
box_x, box_y = clamp_box(
0, 0, 50, 50, 100, 100
) # 1.93μs -> 738ns (162% faster)
def test_box_in_middle(self):
"""Test box positioned in the middle of the image - should remain unchanged."""
# 20x20 box at position (40, 40) in a 100x100 image
box_x, box_y = clamp_box(
40, 40, 20, 20, 100, 100
) # 1.86μs -> 683ns (172% faster)
def test_box_at_maximum_valid_position(self):
"""Test box at the rightmost and bottommost valid position."""
# 30x30 box at position (70, 70) in a 100x100 image (box ends at 100, 100)
box_x, box_y = clamp_box(
70, 70, 30, 30, 100, 100
) # 1.78μs -> 652ns (173% faster)
def test_box_at_right_edge(self):
"""Test box positioned exactly at the right edge."""
# 50x20 box at position (50, 10) in a 100x100 image
box_x, box_y = clamp_box(
50, 10, 50, 20, 100, 100
) # 1.81μs -> 675ns (169% faster)
def test_box_at_bottom_edge(self):
"""Test box positioned exactly at the bottom edge."""
# 20x50 box at position (10, 50) in a 100x100 image
box_x, box_y = clamp_box(
10, 50, 20, 50, 100, 100
) # 1.75μs -> 640ns (174% faster)
class TestClampBoxOutOfBoundsCases:
"""Test behavior when box position is out of bounds."""
def test_negative_x_position(self):
"""Test box with negative x coordinate - should clamp to 0."""
# 30x30 box at position (-10, 20) in a 100x100 image
box_x, box_y = clamp_box(
-10, 20, 30, 30, 100, 100
) # 2.10μs -> 828ns (154% faster)
def test_negative_y_position(self):
"""Test box with negative y coordinate - should clamp to 0."""
# 30x30 box at position (20, -10) in a 100x100 image
box_x, box_y = clamp_box(
20, -10, 30, 30, 100, 100
) # 1.98μs -> 759ns (161% faster)
def test_both_negative_positions(self):
"""Test box with both negative coordinates - should clamp both to 0."""
# 30x30 box at position (-5, -15) in a 100x100 image
box_x, box_y = clamp_box(
-5, -15, 30, 30, 100, 100
) # 1.93μs -> 801ns (141% faster)
def test_x_beyond_right_edge(self):
"""Test box positioned beyond the right edge - should clamp x."""
# 30x30 box at position (80, 20) in a 100x100 image (would end at 110)
box_x, box_y = clamp_box(
80, 20, 30, 30, 100, 100
) # 1.74μs -> 807ns (116% faster)
def test_y_beyond_bottom_edge(self):
"""Test box positioned beyond the bottom edge - should clamp y."""
# 30x30 box at position (20, 80) in a 100x100 image (would end at 110)
box_x, box_y = clamp_box(
20, 80, 30, 30, 100, 100
) # 1.70μs -> 764ns (122% faster)
def test_both_beyond_edges(self):
"""Test box positioned beyond both right and bottom edges."""
# 30x30 box at position (90, 85) in a 100x100 image
box_x, box_y = clamp_box(
90, 85, 30, 30, 100, 100
) # 1.70μs -> 800ns (112% faster)
def test_far_beyond_edges(self):
"""Test box positioned very far beyond image bounds."""
# 20x20 box at position (1000, 2000) in a 100x100 image
box_x, box_y = clamp_box(
1000, 2000, 20, 20, 100, 100
) # 1.84μs -> 895ns (105% faster)
class TestClampBoxOversizedBoxCases:
"""Test behavior when box is larger than the image."""
def test_box_wider_than_image(self):
"""Test box wider than image - x should be clamped to 0."""
# 150x30 box at position (10, 20) in a 100x100 image
box_x, box_y = clamp_box(
10, 20, 150, 30, 100, 100
) # 1.42μs -> 703ns (101% faster)
def test_box_taller_than_image(self):
"""Test box taller than image - y should be clamped to 0."""
# 30x150 box at position (20, 10) in a 100x100 image
box_x, box_y = clamp_box(
20, 10, 30, 150, 100, 100
) # 1.45μs -> 673ns (115% faster)
def test_box_larger_in_both_dimensions(self):
"""Test box larger than image in both dimensions - both should be 0."""
# 200x200 box at position (50, 50) in a 100x100 image
box_x, box_y = clamp_box(
50, 50, 200, 200, 100, 100
) # 587ns -> 650ns (9.69% slower)
def test_box_exactly_wider_than_image(self):
"""Test box exactly one pixel wider than image."""
# 101x50 box at position (5, 10) in a 100x100 image
box_x, box_y = clamp_box(
5, 10, 101, 50, 100, 100
) # 1.42μs -> 676ns (109% faster)
def test_box_exactly_taller_than_image(self):
"""Test box exactly one pixel taller than image."""
# 50x101 box at position (10, 5) in a 100x100 image
box_x, box_y = clamp_box(
10, 5, 50, 101, 100, 100
) # 1.40μs -> 687ns (104% faster)
def test_oversized_box_with_negative_position(self):
"""Test oversized box with negative position - should still clamp to 0."""
# 150x150 box at position (-10, -20) in a 100x100 image
box_x, box_y = clamp_box(
-10, -20, 150, 150, 100, 100
) # 607ns -> 616ns (1.46% slower)
class TestClampBoxEdgeCases:
"""Test edge cases and boundary conditions."""
def test_box_same_size_as_image(self):
"""Test box exactly the same size as image - only valid position is (0, 0)."""
# 100x100 box in a 100x100 image
box_x, box_y = clamp_box(
0, 0, 100, 100, 100, 100
) # 1.76μs -> 672ns (162% faster)
def test_box_same_size_as_image_with_offset(self):
"""Test box same size as image but with non-zero position - should clamp to 0."""
# 100x100 box at position (10, 20) in a 100x100 image
box_x, box_y = clamp_box(
10, 20, 100, 100, 100, 100
) # 1.77μs -> 789ns (124% faster)
def test_zero_width_box(self):
"""Test box with zero width."""
# 0x50 box at position (50, 25) in a 100x100 image
box_x, box_y = clamp_box(
50, 25, 0, 50, 100, 100
) # 1.71μs -> 641ns (166% faster)
def test_zero_height_box(self):
"""Test box with zero height."""
# 50x0 box at position (25, 50) in a 100x100 image
box_x, box_y = clamp_box(
25, 50, 50, 0, 100, 100
) # 1.64μs -> 629ns (160% faster)
def test_zero_size_box(self):
"""Test box with zero width and height."""
# 0x0 box at position (30, 40) in a 100x100 image
box_x, box_y = clamp_box(
30, 40, 0, 0, 100, 100
) # 1.67μs -> 617ns (171% faster)
def test_zero_width_image(self):
"""Test with zero width image."""
# 50x50 box at position (10, 10) in a 0x100 image
box_x, box_y = clamp_box(
10, 10, 50, 50, 0, 100
) # 1.46μs -> 694ns (110% faster)
def test_zero_height_image(self):
"""Test with zero height image."""
# 50x50 box at position (10, 10) in a 100x0 image
box_x, box_y = clamp_box(
10, 10, 50, 50, 100, 0
) # 1.44μs -> 693ns (108% faster)
def test_zero_size_image(self):
"""Test with zero size image."""
# 50x50 box at position (10, 10) in a 0x0 image
box_x, box_y = clamp_box(10, 10, 50, 50, 0, 0) # 627ns -> 607ns (3.29% faster)
def test_one_pixel_box(self):
"""Test 1x1 pixel box."""
# 1x1 box at position (50, 60) in a 100x100 image
box_x, box_y = clamp_box(
50, 60, 1, 1, 100, 100
) # 1.76μs -> 660ns (166% faster)
def test_one_pixel_box_at_edge(self):
"""Test 1x1 pixel box at maximum position."""
# 1x1 box at position (99, 99) in a 100x100 image
box_x, box_y = clamp_box(
99, 99, 1, 1, 100, 100
) # 1.77μs -> 619ns (186% faster)
def test_one_pixel_box_beyond_edge(self):
"""Test 1x1 pixel box beyond edge."""
# 1x1 box at position (100, 100) in a 100x100 image
box_x, box_y = clamp_box(
100, 100, 1, 1, 100, 100
) # 1.68μs -> 779ns (115% faster)
def test_one_pixel_image(self):
"""Test with 1x1 pixel image."""
# 1x1 box at position (0, 0) in a 1x1 image
box_x, box_y = clamp_box(0, 0, 1, 1, 1, 1) # 1.67μs -> 637ns (162% faster)
class TestClampBoxLargeScaleCases:
"""Test performance and scalability with large data samples."""
def test_very_large_image_dimensions(self):
"""Test with very large image dimensions (4K resolution)."""
# 100x100 box at position (2000, 1500) in a 3840x2160 image
box_x, box_y = clamp_box(
2000, 1500, 100, 100, 3840, 2160
) # 2.12μs -> 900ns (135% faster)
def test_very_large_image_with_clamping(self):
"""Test clamping with very large image dimensions."""
# 200x200 box at position (10000, 5000) in a 3840x2160 image
box_x, box_y = clamp_box(
10000, 5000, 200, 200, 3840, 2160
) # 1.97μs -> 972ns (102% faster)
def test_8k_resolution_image(self):
"""Test with 8K resolution image."""
# 500x500 box at position (4000, 2000) in a 7680x4320 image
box_x, box_y = clamp_box(
4000, 2000, 500, 500, 7680, 4320
) # 1.92μs -> 829ns (132% faster)
def test_extremely_large_box_on_large_image(self):
"""Test very large box on large image."""
# 5000x3000 box at position (1000, 500) in a 7680x4320 image
box_x, box_y = clamp_box(
1000, 500, 5000, 3000, 7680, 4320
) # 1.82μs -> 726ns (151% faster)
def test_oversized_box_on_large_image(self):
"""Test oversized box on large image."""
# 10000x5000 box at position (100, 200) in a 7680x4320 image
box_x, box_y = clamp_box(
100, 200, 10000, 5000, 7680, 4320
) # 684ns -> 694ns (1.44% slower)
def test_multiple_clamp_operations(self):
"""Test multiple clamping operations to ensure consistency."""
# Perform 500 clamping operations with various parameters
for i in range(500):
box_x, box_y = clamp_box(
i * 2, i * 3, 50, 50, 1000, 1000
) # 427μs -> 140μs (205% faster)
def test_rapid_clamping_with_varying_positions(self):
"""Test rapid clamping with varying positions."""
# Test 300 different positions
for x in range(0, 3000, 10):
box_x, box_y = clamp_box(
x, x // 2, 100, 100, 1920, 1080
) # 258μs -> 86.8μs (198% faster)
def test_large_negative_positions(self):
"""Test with very large negative positions."""
# Box at position (-10000, -5000) in a 1920x1080 image
box_x, box_y = clamp_box(
-10000, -5000, 200, 150, 1920, 1080
) # 1.86μs -> 726ns (157% faster)
def test_large_positive_positions(self):
"""Test with very large positive positions."""
# Box at position (1000000, 500000) in a 1920x1080 image
box_x, box_y = clamp_box(
1000000, 500000, 200, 150, 1920, 1080
) # 1.73μs -> 841ns (106% faster)
def test_stress_test_various_box_sizes(self):
"""Stress test with various box sizes."""
# Test 200 different box sizes
img_w, img_h = 2000, 2000
for size in range(10, 2010, 10):
box_x, box_y = clamp_box(
100, 100, size, size, img_w, img_h
) # 174μs -> 54.7μs (219% faster)
if size > img_w:
pass
else:
pass
if size > img_h:
pass
else:
pass
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
To test or edit this optimization locally: git merge codeflash/optimize-pr1895-2026-01-08T17.52.47
- box_x = 0 if box_w > img_w else max(0, min(box_x, img_w - box_w))
- box_y = 0 if box_h > img_h else max(0, min(box_y, img_h - box_h))
+ if box_w > img_w:
+     box_x = 0
+ elif box_x < 0:
+     box_x = 0
+ elif box_x > img_w - box_w:
+     box_x = img_w - box_w
+ if box_h > img_h:
+     box_y = 0
+ elif box_y < 0:
+     box_y = 0
+ elif box_y > img_h - box_h:
+     box_y = img_h - box_h
blended = cv2.addWeighted(overlay, alpha, roi, 1 - alpha, 0)

# Write blended result back to image
img[y1_clamped:y2_clamped, x1_clamped:x2_clamped] = blended
⚡️ Codeflash found an 18% speedup (about 1.18x faster) for draw_background_with_alpha in inference/core/workflows/core_steps/visualizations/text_display/utils.py
⏱️ Runtime : 3.03 milliseconds → 2.57 milliseconds (best of 250 runs)
📝 Explanation and details
The optimized code achieves an 18% speedup through a single, impactful change in the draw_background_with_alpha function:
Key Optimization: In-Place Alpha Blending
The critical change is replacing:
blended = cv2.addWeighted(overlay, alpha, roi, 1 - alpha, 0)
img[y1_clamped:y2_clamped, x1_clamped:x2_clamped] = blended
with:
cv2.addWeighted(overlay, alpha, roi, 1 - alpha, 0, dst=roi)
img[y1_clamped:y2_clamped, x1_clamped:x2_clamped] = roi
Why this is faster:
- Eliminates temporary array allocation: The original code creates a new blended array to store the result. By using the dst=roi parameter, cv2.addWeighted writes directly into the existing roi array, eliminating one memory allocation.
- Reduces memory operations: Line profiler shows the cv2.addWeighted call drops from ~1.04ms to ~0.69ms (33% faster), and the subsequent assignment operation drops from ~0.42ms to ~0.29ms (31% faster).
- Better cache locality: Since roi is a view into the original image array, writing directly to it keeps the data in cache rather than creating a separate result buffer.
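The view behaviour mentioned in the last point can be illustrated with NumPy alone. This is a sketch under assumptions: it takes the explanation's word that `roi` is a basic slice of `img`, and it uses a plain float blend as a stand-in for `cv2.addWeighted`, so it does not call OpenCV at all. The array sizes and colors are arbitrary.

```python
import numpy as np

# A small gray image and a flat overlay color (values chosen for illustration).
img = np.full((6, 6, 3), 100, dtype=np.uint8)
overlay_color = np.array([200, 0, 0], dtype=np.float32)
alpha = 0.5

# Basic slicing returns a view, so roi shares memory with img.
roi = img[1:4, 2:5]
assert np.shares_memory(roi, img)

# Stand-in for the blend: computed in float, rounded, clipped to the uint8 range.
blended = np.clip(np.rint(alpha * overlay_color + (1.0 - alpha) * roi), 0, 255)

# Writing through the view updates img in place.
roi[...] = blended.astype(np.uint8)

print(img[2, 3])  # a pixel inside the blended region
print(img[0, 0])  # a pixel outside the region, left unchanged
```

If `roi` really is a view in the actual function, the following `img[...] = roi` assignment copies the region onto itself and may be redundant. Whether it can be dropped depends on how `roi` is created there, which this page does not show.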
Performance Impact Analysis
Based on the function_references, draw_background_with_alpha is called from draw_background, which is likely in the rendering path for text display visualizations. The optimization particularly benefits:
- Large rectangles (e.g., test_large_full_image_rectangle: 133% faster on 500×500 images) - The memory savings compound with larger regions
- Repeated operations (e.g., test_large_performance_multiple_calls: 4.5% faster over 50 calls) - Reduced GC pressure accumulates over many draws
- Alpha blending scenarios where background_opacity < 1.0 - All alpha-blended backgrounds benefit from this optimization
The optimization has minimal impact on small rectangles (often <2% change) but provides substantial gains when drawing larger backgrounds, making it valuable for typical visualization workloads where text overlays with semi-transparent backgrounds are common.
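For reference, the per-channel arithmetic behind the blend can be written out in pure Python. This is a sketch of the documented `cv2.addWeighted` formula (`src1 * alpha + src2 * beta + gamma`, saturated to the output type) with `beta = 1 - alpha` and `gamma = 0` as in the call above. The helper names are made up, and rounding of exact .5 ties may differ from OpenCV.

```python
def blend_channel(fg: int, bg: int, alpha: float) -> int:
    """Blend one 8-bit channel: fg * alpha + bg * (1 - alpha), saturated to 0..255."""
    value = fg * alpha + bg * (1.0 - alpha)
    # Round to the nearest integer, then clip to the uint8 range.
    return max(0, min(255, round(value)))


def blend_pixel(fg, bg, alpha):
    """Blend two pixels channel by channel."""
    return tuple(blend_channel(f, b, alpha) for f, b in zip(fg, bg))


# Half-opacity background (200, 0, 0) over a gray pixel (100, 100, 100).
print(blend_pixel((200, 0, 0), (100, 100, 100), 0.5))

# The formula itself does not clamp alpha; only the result saturates.
print(blend_channel(200, 10, 2.0))
print(blend_channel(200, 10, -0.5))
```

The last two calls show that an out-of-range alpha is not the same as alpha clamped to 0 or 1 unless the calling function clamps it first. Whether draw_background_with_alpha does so is not visible here.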
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⏪ Replay Tests | 🔘 None Found |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 176 Passed |
| 📊 Tests Coverage | 93.9% |
🌀 Click to see Generated Regression Tests
import cv2
import numpy as np
# imports
import pytest
from inference.core.workflows.core_steps.visualizations.text_display.utils import (
draw_background_with_alpha,
)
# Helper function to create a blank image
def blank_img(width, height, color=(0, 0, 0)):
"""Create a blank image of the given color (BGR)."""
arr = np.zeros((height, width, 3), dtype=np.uint8)
arr[:, :] = color
return arr
# Helper function to check if two images are equal
def images_equal(img1, img2):
return np.array_equal(img1, img2)
# =======================
# BASIC TEST CASES
# =======================
def test_basic_full_alpha_rectangle():
"""Test drawing a solid rectangle with alpha=1 (should fully overwrite region)."""
img = blank_img(10, 10, color=(0, 0, 0))
draw_background_with_alpha(
img, (2, 2), (7, 7), (10, 20, 30), alpha=1.0, border_radius=0
) # 15.4μs -> 14.8μs (4.10% faster)
roi = img[2:7, 2:7]
def test_basic_zero_alpha_rectangle():
"""Test drawing with alpha=0 (should not change the image)."""
img = blank_img(10, 10, color=(50, 60, 70))
img_before = img.copy()
draw_background_with_alpha(
img, (2, 2), (7, 7), (100, 110, 120), alpha=0.0, border_radius=0
) # 14.3μs -> 14.2μs (0.253% faster)
def test_basic_half_alpha_rectangle():
"""Test drawing with alpha=0.5 (should blend colors equally)."""
img = blank_img(10, 10, color=(100, 100, 100))
draw_background_with_alpha(
img, (0, 0), (10, 10), (200, 0, 0), alpha=0.5, border_radius=0
) # 13.9μs -> 14.1μs (1.38% slower)
# Center pixel should be average of (100,100,100) and (200,0,0)
expected = np.array([150, 50, 50], dtype=np.uint8)
def test_basic_rounded_rectangle():
"""Test that drawing with border_radius>0 does not raise and modifies image."""
img = blank_img(20, 20, color=(0, 0, 0))
draw_background_with_alpha(
img, (5, 5), (15, 15), (0, 255, 0), alpha=1.0, border_radius=4
) # 27.1μs -> 27.1μs (0.285% faster)
def test_basic_non_square_rectangle():
"""Test drawing a non-square rectangle."""
img = blank_img(20, 10, color=(0, 0, 0))
draw_background_with_alpha(
img, (2, 1), (18, 8), (123, 222, 111), alpha=1.0, border_radius=0
) # 13.8μs -> 13.6μs (1.53% faster)
roi = img[1:8, 2:18]
# =======================
# EDGE TEST CASES
# =======================
def test_edge_rectangle_outside_image():
"""Rectangle completely outside image should not change the image."""
img = blank_img(10, 10, color=(10, 20, 30))
img_before = img.copy()
draw_background_with_alpha(
img, (20, 20), (30, 30), (255, 0, 0), alpha=1.0, border_radius=0
) # 2.59μs -> 2.65μs (2.38% slower)
def test_edge_rectangle_partially_outside_image():
"""Rectangle partially outside image should be clamped to image bounds."""
img = blank_img(10, 10, color=(0, 0, 0))
draw_background_with_alpha(
img, (-5, -5), (5, 5), (255, 255, 255), alpha=1.0, border_radius=0
) # 14.8μs -> 14.6μs (1.23% faster)
# Only top-left 5x5 should be white
roi = img[0:5, 0:5]
def test_edge_zero_area_rectangle():
"""Rectangle with zero area should not modify the image."""
img = blank_img(10, 10, color=(1, 2, 3))
img_before = img.copy()
draw_background_with_alpha(
img, (5, 5), (5, 10), (10, 20, 30), alpha=1.0, border_radius=0
) # 2.48μs -> 2.71μs (8.21% slower)
draw_background_with_alpha(
img, (5, 5), (10, 5), (10, 20, 30), alpha=1.0, border_radius=0
) # 1.59μs -> 1.52μs (4.07% faster)
def test_edge_negative_border_radius():
"""Negative border_radius should be treated as 0 (rectangle)."""
img = blank_img(10, 10, color=(0, 0, 0))
draw_background_with_alpha(
img, (2, 2), (8, 8), (50, 100, 150), alpha=1.0, border_radius=-5
) # 14.8μs -> 14.7μs (0.993% faster)
roi = img[2:8, 2:8]
def test_edge_large_border_radius():
"""border_radius larger than half the rect min side should be clamped."""
img = blank_img(10, 10, color=(0, 0, 0))
# border_radius=100, but max possible is 3 for a 7x7 rect
draw_background_with_alpha(
img, (2, 2), (9, 9), (100, 200, 50), alpha=1.0, border_radius=100
) # 26.2μs -> 26.5μs (0.974% slower)
def test_edge_alpha_out_of_bounds():
"""Alpha < 0 should act as 0, alpha > 1 as 1 (cv2.addWeighted clamps)."""
img = blank_img(10, 10, color=(10, 10, 10))
img_copy = img.copy()
draw_background_with_alpha(
img, (0, 0), (10, 10), (200, 0, 0), alpha=-0.5, border_radius=0
) # 13.8μs -> 13.4μs (2.99% faster)
draw_background_with_alpha(
img, (0, 0), (10, 10), (200, 0, 0), alpha=2.0, border_radius=0
) # 6.73μs -> 6.54μs (2.89% faster)
def test_edge_single_pixel_rectangle():
"""Test drawing a 1x1 rectangle."""
img = blank_img(5, 5, color=(0, 0, 0))
draw_background_with_alpha(
img, (2, 2), (3, 3), (255, 100, 50), alpha=1.0, border_radius=0
) # 13.1μs -> 13.0μs (0.600% faster)
def test_edge_rectangle_touching_image_border():
"""Test rectangle exactly on the image border."""
img = blank_img(5, 5, color=(0, 0, 0))
draw_background_with_alpha(
img, (0, 0), (5, 5), (11, 22, 33), alpha=1.0, border_radius=0
) # 13.3μs -> 13.0μs (2.04% faster)
# =======================
# LARGE SCALE TEST CASES
# =======================
def test_large_full_image_rectangle():
"""Draw a rectangle covering the whole image (500x500)."""
img = blank_img(500, 500, color=(10, 20, 30))
draw_background_with_alpha(
img, (0, 0), (500, 500), (100, 150, 200), alpha=0.7, border_radius=0
) # 696μs -> 298μs (133% faster)
# Test a few random pixels for correct blending
expected = (0.7 * np.array([100, 150, 200]) + 0.3 * np.array([10, 20, 30])).astype(
np.uint8
)
for y in [0, 250, 499]:
for x in [0, 250, 499]:
pass
def test_large_many_small_rectangles():
"""Draw many small rectangles over a large image."""
img = blank_img(100, 100, color=(0, 0, 0))
for i in range(0, 100, 10):
for j in range(0, 100, 10):
draw_background_with_alpha(
img, (i, j), (i + 10, j + 10), (i, j, 255), alpha=1.0, border_radius=3
)
# Test that the center of each square is colored and not black
for i in range(5, 100, 10):
for j in range(5, 100, 10):
pass
def test_large_performance_multiple_calls():
"""Test many sequential calls do not crash or slow down."""
img = blank_img(50, 50, color=(0, 0, 0))
for i in range(50):
draw_background_with_alpha(
img,
(i, 0),
(min(i + 10, 50), 50),
(i * 5 % 256, i * 7 % 256, i * 11 % 256),
alpha=0.3 + 0.01 * i,
border_radius=i % 6,
) # 734μs -> 702μs (4.54% faster)
def test_large_alpha_gradient():
"""Draw rectangles with increasing alpha to form a gradient."""
img = blank_img(100, 10, color=(0, 0, 0))
for i in range(10):
draw_background_with_alpha(
img,
(i * 10, 0),
((i + 1) * 10, 10),
(255, 0, 0),
alpha=i / 9,
border_radius=0,
) # 66.7μs -> 63.7μs (4.59% faster)
# Middle should be blended
mid = img[5, 45]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
To test or edit this optimization locally: git merge codeflash/optimize-pr1895-2026-01-08T17.56.22
- blended = cv2.addWeighted(overlay, alpha, roi, 1 - alpha, 0)
- # Write blended result back to image
- img[y1_clamped:y2_clamped, x1_clamped:x2_clamped] = blended
+ cv2.addWeighted(overlay, alpha, roi, 1 - alpha, 0, dst=roi)
+ # Write blended result back to image
+ img[y1_clamped:y2_clamped, x1_clamped:x2_clamped] = roi
✅ Bugbot reviewed your changes and found no bugs!
…nference into feature-text-display-workflow
…y/v1.py Co-authored-by: Grzegorz Klimaszewski <166530809+grzegorz-roboflow@users.noreply.github.com>
Resolves DG-1
Description
Please include a summary of the change and which issue is fixed or implemented. Please also include relevant motivation and context (e.g. links, docs, tickets etc.).
List any dependencies that are required for this change.
Type of change
Please delete options that are not relevant.
How has this change been tested, please provide a testcase or example of how you tested the change?
YOUR_ANSWER
Any specific deployment considerations
For example, documentation changes, usability, usage/costs, secrets, etc.
Docs
Note
Introduces a new roboflow_core/text_display@v1 visualization block for rendering text onto images with parameter interpolation, styling, and flexible positioning.
- text_display/v1.py implementing TextDisplayVisualizationBlockV1 with templated text (using {{ $parameters.* }}), optional parameter operations, and options for text_color, background_color (including transparency), background_opacity, font_scale, font_thickness, padding, text_align, border_radius, and positioning via position_mode (absolute/relative with anchor, offset_x, offset_y)
- text_display/utils.py with layout and drawing utilities: compute_layout, draw_text_lines, draw_background (alpha/rounded corners), and anchor-based positioning
- loader.py (imports and inclusion in load_blocks) so it's available to workflows; outputs updated image via OUTPUT_IMAGE_KEY
Written by Cursor Bugbot for commit 520dcd9. This will update automatically on new commits.
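To make the templated text option concrete, here is a hypothetical sketch of `{{ $parameters.* }}` interpolation. It is not the block's implementation, which is not shown on this page: the function name `render_text`, the regular expression, and the choice to leave unknown placeholders untouched are all assumptions made for illustration.

```python
import re

# Matches placeholders such as "{{ $parameters.count }}" (hypothetical pattern).
_PLACEHOLDER = re.compile(r"\{\{\s*\$parameters\.(\w+)\s*\}\}")


def render_text(template: str, parameters: dict) -> str:
    """Replace each {{ $parameters.name }} with the matching parameter value."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in parameters:
            # Design choice for this sketch: keep unknown placeholders as written.
            return match.group(0)
        return str(parameters[name])

    return _PLACEHOLDER.sub(substitute, template)


print(render_text("Detected: {{ $parameters.count }} objects", {"count": 3}))
```

Leaving unknown placeholders in place makes a missing parameter visible in the rendered image; a real block might instead raise an error or substitute a default.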