# Synthetic Data Generation Pipeline for Retail Object Detection

## Inspiration
Training a robust object detector requires thousands of labeled images, but collecting and hand-labeling real photos of retail products is slow, expensive, and often impractical. A single SKU might appear in hundreds of lighting conditions, shelf arrangements, and occlusion states that would take weeks to photograph.
The inspiration was a simple question: what if we could render the training set instead of photographing it?
Photorealistic 3D renderers like Blender's Cycles path tracer can produce images virtually indistinguishable from photographs. If we automate the scene setup, randomizing camera angle, lighting color temperature, object placement, and background, each render is effectively a new, uniquely-labeled training example. The annotation comes for free from the renderer itself, with no human labeling required.
## What It Does
SDG Pipeline is a headless Blender automation framework that:
- Loads `.glb` product models (6 Coca-Cola SKUs: Coke, Diet Coke, Coke Zero, Cherry Coke, Fanta, Sprite) into a 3D scene
- Randomizes every controllable variable: camera position, focal length, light color temperature, object transforms, PBR material roughness/metallic, background texture
- Renders a photorealistic RGB image via Cycles path tracing
- Extracts a per-instance binary segmentation mask for each object using Blender's compositor Object Index pass
- Writes COCO-format JSON annotations (bounding boxes + polygon segmentation) ready to drop into any standard detection or segmentation training pipeline
Target output: 50 demo images, architected to scale to 2,000+ with multi-GPU parallelism.
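Annotations coming "for free" boils down to reading each rendered mask back and deriving geometry from it. Below is a minimal numpy sketch of the bbox/area step; the function name is hypothetical, and the real pipeline additionally traces polygon contours for the segmentation field:

```python
import numpy as np

def mask_to_coco_bbox(mask: np.ndarray):
    """Binary mask (H, W) -> COCO bbox [x, y, width, height] plus pixel area."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # object fully occluded or out of frame: skip the annotation
    x, y = int(xs.min()), int(ys.min())
    w = int(xs.max()) - x + 1
    h = int(ys.max()) - y + 1
    return [x, y, w, h], int(mask.sum())
```

COCO stores boxes as top-left corner plus width/height in pixels, which is why the extents are converted from inclusive min/max indices.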
## How We Built It

The pipeline is organized into five phases across seven Python modules:
| Phase | Module | Responsibility |
|---|---|---|
| 1 | `asset_registry.py` | Load and catalog `.glb` models, distractors, textures, backgrounds |
| 2 | `scene_builder.py` | Import models into Blender collections; assign `pass_index` per instance |
| 3 | `randomizer.py` | Randomize camera, lights, object placement, materials |
| 4 | `renderer.py` | Configure Cycles; rebuild compositor with one ID_MASK node per object |
| 4 | `mask_extractor.py` | Load per-instance binary mask PNGs from disk |
| 5 | `annotation_writer.py` | Compute bounding boxes, polygon segmentation, write COCO JSON |
| 5 | `run.py` | Main loop with argparse (`--start`, `--end`, `--config`) |
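The `run.py` main loop can be sketched as follows. The flags match the table above; the default values and the phase-call stand-ins in the loop body are illustrative assumptions, not the actual implementation:

```python
import argparse
import sys

def parse_args(argv=None):
    # Flags mirror run.py's interface; the defaults here are illustrative
    p = argparse.ArgumentParser(description="Synthetic data generation runner")
    p.add_argument("--start", type=int, default=0, help="first image index (inclusive)")
    p.add_argument("--end", type=int, default=50, help="last image index (exclusive)")
    p.add_argument("--config", default="config.yaml", help="path to pipeline config")
    return p.parse_args(argv)

def main(argv=None):
    # Inside Blender, script args come after the "--" separator:
    #   blender --background --python run.py -- --start 0 --end 50
    if argv is None and "--" in sys.argv:
        argv = sys.argv[sys.argv.index("--") + 1:]
    args = parse_args(argv)
    for i in range(args.start, args.end):
        # one iteration = build scene, randomize, render, extract masks, write COCO
        # (each phase lives in its own module; the calls are omitted here)
        pass
    return args
```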
### Randomization Design

Every render is seeded deterministically with `seed + img_idx`, making the dataset fully reproducible:

```
rng = PCG64(seed + i),  i ∈ [0, N)
```

Camera position is sampled in spherical coordinates:

```
r ~ U(r_min, r_max)
θ ~ U(θ_min, θ_max)
φ ~ U(0, 2π)
```

Light color is sampled as a blackbody color temperature in Kelvin, then converted to RGB:

```
T ~ U(4000 K, 15000 K) → (R, G, B)
```
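The three sampling rules above can be sketched in numpy. Function names and the θ bounds are hypothetical, and the Kelvin-to-RGB conversion uses a standard curve-fit approximation (Tanner Helland's), which is not necessarily the exact conversion the pipeline ships:

```python
import numpy as np

def make_rng(seed: int, img_idx: int) -> np.random.Generator:
    # One independent, reproducible stream per image: PCG64(seed + i)
    return np.random.Generator(np.random.PCG64(seed + img_idx))

def sample_camera(rng, r_min=0.5, r_max=2.0, theta_min=0.2, theta_max=1.2):
    # r ~ U(r_min, r_max), θ ~ U(θ_min, θ_max), φ ~ U(0, 2π); bounds are illustrative
    r = rng.uniform(r_min, r_max)
    theta = rng.uniform(theta_min, theta_max)
    phi = rng.uniform(0.0, 2.0 * np.pi)
    # Spherical -> Cartesian, Z up as in Blender
    return (r * np.sin(theta) * np.cos(phi),
            r * np.sin(theta) * np.sin(phi),
            r * np.cos(theta))

def kelvin_to_rgb(temp_k: float):
    # Rough blackbody approximation, adequate for tinting scene lights
    t = temp_k / 100.0
    r = 255.0 if t <= 66 else 329.70 * (t - 60) ** -0.1332
    g = 99.47 * np.log(t) - 161.12 if t <= 66 else 288.12 * (t - 60) ** -0.0755
    b = 255.0 if t >= 66 else (0.0 if t <= 19 else 138.52 * np.log(t - 10) - 305.04)
    return tuple(float(np.clip(c, 0, 255) / 255.0) for c in (r, g, b))
```

Because each image gets its own generator derived from `seed + i`, any single frame can be re-rendered in isolation without replaying the whole dataset.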
### Grouped Arrangements
A major feature is grouped object bunching: placing multiple cans in tight N×M grid formations that match real retail shelf imagery. Each object in a group still gets its own pass_index, so per-instance masks and COCO annotations work unchanged.
The placement algorithm:
- Samples a grid size `[N, M]` from a weighted distribution (e.g. 2×2 most likely)
- Creates an invisible Blender Empty as a group pivot
- Lays members out in a centered grid at spacing derived from `obj.dimensions` (the mesh's real bounding box)
- Places the pivot as a unit with AABB collision avoidance against already-placed objects
- Tears down the pivot after the render, returning objects to the pool for the next frame
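The steps above can be sketched as a 2D ground-plane placement loop. Names and the retry budget are illustrative, and the real pipeline derives the cell size from `obj.dimensions` rather than taking it as a parameter:

```python
import random

def aabb_overlap(a, b):
    # a, b: (min_x, min_y, max_x, max_y) footprints on the ground plane
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def place_group(placed, n, m, dx, dy, area=4.0, tries=50, rng=random):
    """Try to place an N x M grid of cells (size dx x dy) without overlapping
    any AABB already in `placed`. Returns member centers, or None on failure."""
    half_w, half_h = n * dx / 2.0, m * dy / 2.0
    for _ in range(tries):
        # Sample a pivot, then test the group's composite AABB as one rigid unit
        cx = rng.uniform(-area + half_w, area - half_w)
        cy = rng.uniform(-area + half_h, area - half_h)
        group_box = (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
        if any(aabb_overlap(group_box, p) for p in placed):
            continue
        placed.append(group_box)
        # Centered grid around the pivot; cell size == mesh extent gives flush contact
        return [(cx + (i - (n - 1) / 2.0) * dx, cy + (j - (m - 1) / 2.0) * dy)
                for i in range(n) for j in range(m)]
    return None  # scene too crowded; the caller can skip this group
```

Treating the whole group as a single AABB is what lets members sit flush against each other while the group as a whole still avoids every other placed object.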
### Masking: ID_MASK via Object Index Pass

Each target object is assigned `obj.pass_index = inst_id`. The compositor is rebuilt before each render with one `CompositorNodeIDMask` node per object, wired from the `RenderLayers.IndexOB` socket to a PNG File Output slot. One Cycles render writes all masks simultaneously.
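A compositor rebuild along these lines might look like the sketch below. It only runs inside Blender's embedded Python, so it is not standalone-runnable; the node and socket names follow Blender's Python API, but the view-layer lookup and the slot naming scheme are assumptions:

```python
import bpy  # only available inside Blender

def rebuild_mask_compositor(scene, objects, out_dir):
    scene.view_layers[0].use_pass_object_index = True  # enable the IndexOB pass
    scene.use_nodes = True
    tree = scene.node_tree
    tree.nodes.clear()

    rl = tree.nodes.new("CompositorNodeRLayers")
    out = tree.nodes.new("CompositorNodeOutputFile")
    out.base_path = out_dir
    out.format.file_format = "PNG"

    for i, obj in enumerate(objects, start=1):
        obj.pass_index = i  # 0 is the background, so instance ids start at 1
        id_mask = tree.nodes.new("CompositorNodeIDMask")
        id_mask.index = obj.pass_index
        slot = f"mask_{obj.name}_"  # hypothetical per-object file naming
        out.file_slots.new(slot)
        tree.links.new(rl.outputs["IndexOB"], id_mask.inputs["ID value"])
        tree.links.new(id_mask.outputs["Alpha"], out.inputs[slot])
```

Each ID_MASK node emits a white-on-black alpha for exactly one `pass_index`, so a single render pass fans out into one PNG per object through the File Output slots.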
### Multi-GPU Parallelism

The image range `[start, end)` is partitioned across G GPUs, each running its own Blender process:

```
job_g = [ floor(g * N/G), floor((g+1) * N/G) )
```

Partial COCO JSONs are merged post-run, with proper ID remapping, via `merge_coco.py`.
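The partitioning formula and the merge's ID remapping can be sketched as plain functions. This illustrates the idea, not the actual `merge_coco.py`:

```python
def partition(n_images: int, n_gpus: int):
    # job_g = [ floor(g*N/G), floor((g+1)*N/G) ): contiguous, covers [0, N) exactly
    return [(g * n_images // n_gpus, (g + 1) * n_images // n_gpus)
            for g in range(n_gpus)]

def merge_coco(parts):
    """Merge partial COCO dicts, reassigning image/annotation ids to stay unique."""
    merged = {"images": [], "annotations": [], "categories": parts[0]["categories"]}
    next_img, next_ann = 1, 1
    for part in parts:
        remap = {}  # old image id (local to this part) -> new global id
        for img in part["images"]:
            remap[img["id"]] = next_img
            merged["images"].append({**img, "id": next_img})
            next_img += 1
        for ann in part["annotations"]:
            merged["annotations"].append({**ann, "id": next_ann,
                                          "image_id": remap[ann["image_id"]]})
            next_ann += 1
    return merged
```

The remap table is essential: each worker numbers its images from its own range, so annotation `image_id` references must be rewritten against the merged numbering.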
## Challenges We Faced

### 1. Cryptomatte MurmurHash Bug → Zero Annotations
The original implementation used Cryptomatte (the more accurate path as it handles semi-transparent pixels and depth-of-field). Blender's Cryptomatte EXR stores object identity as a MurmurHash3 x86 32-bit (seed=0) hash of the object name. Our initial Python reimplementation used MurmurHash2, producing mismatched hashes and zero matched annotations on every render. Hours were lost before the algorithm mismatch was discovered.
Resolution: Switched entirely to the Object Index / ID_MASK approach. For opaque rigid objects rendered without depth of field (aluminium cans in Cycles), ID_MASK produces identical results with far less complexity and no hashing at all.
### 2. Windows DLL Conflict: OPEN_EXR_MULTILAYER Crash

The Blender compositor's `OPEN_EXR_MULTILAYER` File Output node crashes on Windows with an `MSVCP140.dll` runtime conflict when Blender's bundled OpenEXR and the pip `openexr` package both try to load. This silently produced zero mask files.
Resolution: Replaced the EXR output path with direct PNG output from `CompositorNodeIDMask`. No external dependencies needed.
### 3. `bpy.context` at Module Import Time

Accessing `bpy.context` or `bpy.data` at module import time causes silent failures or crashes because Blender's context isn't ready until the script actually runs. Every module had to be audited to ensure all `bpy` access happens inside function bodies.
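The pattern the audit enforced looks like this sketch (Blender-only code; the `CYCLES` assignment is just a placeholder body):

```python
import bpy  # importing the module itself is safe

# BAD: evaluated at import time, before Blender's context exists
# scene = bpy.context.scene

def build_scene():
    # GOOD: bpy.context is touched only when the function is called,
    # i.e. after Blender has finished setting up the script environment
    scene = bpy.context.scene
    scene.render.engine = "CYCLES"
```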
### 4. Collision-Aware Placement for Groups

Placing individual objects with AABB collision avoidance is straightforward, but placing a group as a rigid unit is not: it requires computing the group's composite AABB from all member dimensions, then retrying placement of the pivot while checking against all previously placed AABBs. Getting flush contact (spacing = 0) right, without objects overlapping or drifting, required deriving spacing from `obj.dimensions` (post-scale mesh bounds) rather than a fixed constant.
## What We Learned
- The rendering pipeline is the easy part. Blender's Cycles is powerful and the Python API is surprisingly complete. The hard work is in the annotation pipeline, correctly extracting, matching, and formatting mask data.
- Synthetic data generation is a systems engineering problem, not just a graphics problem. Deterministic seeding, collision-aware placement, compositor output naming quirks, and COCO format compliance are all load-bearing.
- Cryptomatte is complex for good reason. Its coverage-weighted approach handles transparency and DoF correctly. For opaque hard-body objects, Object Index is simpler and sufficient.
- Test with a single image first. Most bugs (zero annotations, missing masks, wrong file paths) manifest on image 0. The `--debug` mode (1 image, no randomization) saved hours of debugging time.