Synthetic Data Generation Pipeline for Retail Object Detection

Inspiration

Training a robust object detector requires thousands of labeled images, but collecting and hand-labeling real photos of retail products is slow, expensive, and often impractical. A single SKU might appear in hundreds of lighting conditions, shelf arrangements, and occlusion states that would take weeks to photograph.

The inspiration was a simple question: what if we could render the training set instead of photographing it?

Photorealistic 3D renderers like Blender's Cycles path tracer can produce images virtually indistinguishable from photographs. If we automate the scene setup, randomizing camera angle, lighting color temperature, object placement, and background, then each render is effectively a new, uniquely labeled training example. The annotation comes for free from the renderer itself, with no human labeling required.


What It Does

SDG Pipeline is a headless Blender automation framework that:

  1. Loads .glb product models (6 Coca-Cola SKUs: Coke, Diet Coke, Coke Zero, Cherry Coke, Fanta, Sprite) into a 3D scene
  2. Randomizes every controllable variable: camera position, focal length, light color temperature, object transforms, PBR material roughness/metallic, and background texture
  3. Renders a photorealistic RGB image via Cycles path tracing
  4. Extracts a per-instance binary segmentation mask for each object using Blender's compositor Object Index pass
  5. Writes COCO-format JSON annotations (bounding boxes + polygon segmentation) ready to drop into any standard detection or segmentation training pipeline
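For reference, this is the shape of one emitted annotation record; the values below are illustrative, not taken from an actual render:

```python
# One COCO-style annotation for a single detected instance (illustrative values).
annotation = {
    "image_id": 0,
    "category_id": 1,              # e.g. the "coke" category
    "bbox": [412, 188, 96, 240],   # [x, y, width, height] in pixels
    "segmentation": [[412, 188, 508, 188, 508, 428, 412, 428]],  # polygon(s), flat x,y pairs
    "area": 96 * 240,              # pixel area of the instance
    "iscrowd": 0,                  # individual object, not a crowd region
}
```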

Target output: 50 demo images, architected to scale to 2,000+ with multi-GPU parallelism.


How We Built It

The pipeline is organized into five phases spanning seven Python modules:

  Phase  Module                Responsibility
  1      asset_registry.py     Load and catalog .glb models, distractors, textures, backgrounds
  2      scene_builder.py      Import models into Blender collections; assign a pass_index per instance
  3      randomizer.py         Randomize camera, lights, object placement, and materials
  4      renderer.py           Configure Cycles; rebuild the compositor with one ID_MASK node per object
  4      mask_extractor.py     Load per-instance binary mask PNGs from disk
  5      annotation_writer.py  Compute bounding boxes and polygon segmentation; write COCO JSON
  5      run.py                Main loop with argparse (--start, --end, --config)

Randomization Design

Every render is seeded deterministically as seed + img_idx, making the dataset fully reproducible:

rng = PCG64(seed + i),  i ∈ [0, N)
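With NumPy's PCG64 this scheme looks like the following sketch (the function name is illustrative):

```python
import numpy as np

def rng_for_image(seed: int, img_idx: int) -> np.random.Generator:
    """One independent, reproducible RNG stream per image index."""
    return np.random.Generator(np.random.PCG64(seed + img_idx))

# The same (seed, index) pair always yields the same draws, so any
# single image in the dataset can be regenerated bit-identically.
a = rng_for_image(1234, 7).uniform(0.0, 1.0, size=3)
b = rng_for_image(1234, 7).uniform(0.0, 1.0, size=3)
assert np.allclose(a, b)
```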

Camera position is sampled in spherical coordinates:

r  ~ U(r_min,  r_max)
θ  ~ U(θ_min, θ_max)
φ  ~ U(0,     2π)
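A sketch of the camera sampler, converting the spherical draw to Cartesian coordinates (bounds and names below are illustrative, not the project's actual config values):

```python
import numpy as np

def sample_camera_position(rng, r_range, theta_range):
    """Draw (r, theta, phi) per the distributions above and convert to XYZ.

    theta is the polar angle measured from +Z; phi sweeps the full circle.
    """
    r = rng.uniform(*r_range)
    theta = rng.uniform(*theta_range)
    phi = rng.uniform(0.0, 2.0 * np.pi)
    return np.array([
        r * np.sin(theta) * np.cos(phi),
        r * np.sin(theta) * np.sin(phi),
        r * np.cos(theta),
    ])

rng = np.random.default_rng(0)
pos = sample_camera_position(rng, (1.5, 3.0), (np.deg2rad(30), np.deg2rad(75)))
```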

Light color is sampled as a blackbody color temperature in Kelvin, then converted to RGB:

T ~ U(4000K, 15000K)  →  (R, G, B)
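Blender's Blackbody shader node can do this conversion natively; an equivalent CPU-side sketch using a widely used piecewise fit (Tanner Helland's published constants, not the project's own code) looks like:

```python
import math

def kelvin_to_rgb(temp_k: float):
    """Approximate blackbody color as (r, g, b) in [0, 1].

    Piecewise fit valid roughly for 1000K-40000K; warm temperatures
    come out red-heavy, cool temperatures blue-heavy.
    """
    t = temp_k / 100.0
    if t <= 66:
        r = 255.0
        g = 99.4708025861 * math.log(t) - 161.1195681661
    else:
        r = 329.698727446 * (t - 60) ** -0.1332047592
        g = 288.1221695283 * (t - 60) ** -0.0755148492
    if t >= 66:
        b = 255.0
    elif t <= 19:
        b = 0.0
    else:
        b = 138.5177312231 * math.log(t - 10) - 305.0447927307

    def clamp(x):
        return min(255.0, max(0.0, x)) / 255.0

    return clamp(r), clamp(g), clamp(b)
```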

Grouped Arrangements

A major feature is grouped object bunching: placing multiple cans in tight N×M grid formations that match real retail shelf imagery. Each object in a group still gets its own pass_index, so per-instance masks and COCO annotations work unchanged.

The placement algorithm:

  1. Samples a grid size [N, M] from a weighted distribution (e.g. 2×2 most likely)
  2. Creates an invisible Blender Empty as a group pivot
  3. Lays members out in a centered grid at spacing derived from obj.dimensions (the mesh's real bounding box)
  4. Places the pivot as a unit with AABB collision avoidance against already-placed objects
  5. Tears down the pivot after the render, returning its members to the object pool for the next frame
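Step 3 amounts to computing centered grid offsets around the pivot; a minimal sketch, with a `dims` tuple standing in for `obj.dimensions` and the `gap` parameter as an illustrative addition:

```python
import numpy as np

def grid_offsets(n_rows, n_cols, dims, gap=0.0):
    """Centered N x M grid of local (x, y) offsets for one group.

    dims is the member's (x, y) footprint; spacing = footprint + gap,
    so gap=0 gives the flush shelf-contact arrangement.
    """
    dx, dy = dims[0] + gap, dims[1] + gap
    xs = (np.arange(n_cols) - (n_cols - 1) / 2.0) * dx
    ys = (np.arange(n_rows) - (n_rows - 1) / 2.0) * dy
    return [(x, y) for y in ys for x in xs]
```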

Masking: ID_MASK via Object Index Pass

Each target object is assigned obj.pass_index = inst_id. The compositor is rebuilt before each render with one CompositorNodeIDMask node per object, wired from the RenderLayers.IndexOB socket to a PNG File Output slot. One Cycles render writes all masks simultaneously.
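Once the per-instance mask PNGs are on disk, the tight COCO bounding box follows directly from each binary mask; a minimal NumPy sketch, with the array standing in for a decoded PNG:

```python
import numpy as np

def mask_to_coco_bbox(mask: np.ndarray):
    """Return [x, y, width, height] in COCO convention, or None if empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None  # object fully occluded or off-frame: skip the annotation
    x0, y0 = int(xs.min()), int(ys.min())
    return [x0, y0, int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1]
```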

Multi-GPU Parallelism

The image range [start, end) is partitioned across G GPUs, each running its own Blender process:

job_g = [ floor(g * N/G),  floor((g+1) * N/G) )

Partial COCO JSONs are merged post-run with proper ID remapping via merge_coco.py.
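The partition formula in code (function name illustrative; merge_coco.py's ID remapping is not shown):

```python
def partition_jobs(n_images: int, n_gpus: int):
    """Split [0, N) into G contiguous half-open ranges, one per GPU.

    Integer division implements the floor(g * N / G) formula above,
    so the ranges tile [0, N) exactly even when G does not divide N.
    """
    return [(g * n_images // n_gpus, (g + 1) * n_images // n_gpus)
            for g in range(n_gpus)]
```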


Challenges We Faced

1. Cryptomatte MurmurHash Bug → Zero Annotations

The original implementation used Cryptomatte (the more accurate path as it handles semi-transparent pixels and depth-of-field). Blender's Cryptomatte EXR stores object identity as a MurmurHash3 x86 32-bit (seed=0) hash of the object name. Our initial Python reimplementation used MurmurHash2, producing mismatched hashes and zero matched annotations on every render. Hours were lost before the algorithm mismatch was discovered.

Resolution: Switched to the Object Index / ID_MASK approach entirely. For opaque rigid objects rendered in Cycles without depth of field (our aluminium cans), ID_MASK produces identical results with far less complexity and no hashing at all.

2. Windows DLL Conflict: OPEN_EXR_MULTILAYER Crash

The Blender compositor's OPEN_EXR_MULTILAYER File Output node crashes on Windows with an MSVCP140.dll runtime conflict when Blender's bundled OpenEXR and the pip openexr package both try to load. This silently produced zero mask files.

Resolution: Replaced the EXR output path with direct PNG output from CompositorNodeIDMask. No external dependencies needed.

3. bpy.context at Module Import Time

Accessing bpy.context or bpy.data at module import time causes silent failures or crashes because Blender's context isn't ready until the script actually runs. Every module had to be audited to ensure all bpy access happens inside function bodies.

4. Collision-Aware Placement for Groups

Placing individual objects with AABB collision avoidance is straightforward, but placing a group as a rigid unit is not. It requires computing the group's composite AABB from all member dimensions, then retrying pivot placement while checking against all previously placed AABBs. Getting flush contact (spacing = 0) right without objects overlapping or drifting required deriving spacing from obj.dimensions (the post-scale mesh bounds) rather than from a fixed constant.
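The composite-AABB test can be sketched in 2D as follows (function names are illustrative; the real placement loop also resamples pivot positions on rejection):

```python
def group_aabb(member_aabbs):
    """Union of member AABBs, each given as (min_x, min_y, max_x, max_y)."""
    xs0, ys0, xs1, ys1 = zip(*member_aabbs)
    return (min(xs0), min(ys0), max(xs1), max(ys1))

def overlaps(a, b, margin=0.0):
    """True if two AABBs intersect, with an optional safety margin.

    Two boxes are disjoint exactly when one lies entirely to the left,
    right, front, or back of the other; overlap is the negation.
    """
    return not (a[2] + margin <= b[0] or b[2] + margin <= a[0] or
                a[3] + margin <= b[1] or b[3] + margin <= a[1])
```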


What We Learned

  • The rendering pipeline is the easy part. Blender's Cycles is powerful and the Python API is surprisingly complete. The hard work is in the annotation pipeline: correctly extracting, matching, and formatting mask data.
  • Synthetic data generation is a systems engineering problem, not just a graphics problem. Deterministic seeding, collision-aware placement, compositor output naming quirks, and COCO format compliance are all load-bearing.
  • Cryptomatte is complex for good reason. Its coverage-weighted approach handles transparency and DoF correctly. For opaque hard-body objects, Object Index is simpler and sufficient.
  • Test with a single image first. Most bugs (zero annotations, missing masks, wrong file paths) manifest on image 0. The --debug mode (1 image, no randomization) saved hours of debugging time.

Built With

Blender (bpy, Cycles), Python, COCO-format annotations