Goodarz Mehr, Azim Eskandarian
Virginia Commonwealth University
SimBEV.mp4
[2026/1/15] SimBEV2X coming soon...
[2026/1/15] SimBEV 3.1 is released, adding support for 3D semantic occupancy ground truth.
[2025/12/12] SimBEV 3.0 is released, with support for new 3D and BEV classes, randomly-generated hazard areas, an interactive visualizer, and more. SimBEV Dataset v2 coming soon...
[2025/8/15] SimBEV 2.0 is released, with support for new 3D and BEV classes, continuous weather shifts, and more.
[2025/4/15] Our implementation of UniTR trained on the SimBEV dataset is released.
[2025/2/9] Our implementation of BEVFusion trained on the SimBEV dataset is released.
[2025/2/6] Initial release of dataset, code, and paper.
SimBEV is a configurable and scalable synthetic driving data generation tool based on the CARLA Simulator. It supports a comprehensive array of sensors and incorporates information from various sources to capture accurate bird's-eye view (BEV) and 3D semantic occupancy ground truth alongside 3D object bounding boxes to enable a variety of perception tasks, including BEV segmentation, 3D semantic occupancy prediction, and 3D object detection. SimBEV is used to create the SimBEV dataset, a large collection of annotated perception data from diverse driving scenarios.
A data sample generated by SimBEV. The left half depicts a 360-degree view of the ego (data collection) vehicle's surroundings produced by different camera types (from top to bottom: RGB, semantic segmentation, instance segmentation, depth, and optical flow cameras). On the right half, views of lidar, semantic lidar, radar, and BEV ground truth data are shown from top to bottom. Some images also contain 3D object bounding boxes.
SimBEV randomizes a variety of simulation parameters to create a diverse set of scenarios. To create a new dataset, SimBEV generates and collects data from consecutive episodes, or scenes. The user configures the desired number of scenes for each map for the training, validation, and test sets, a variety of simulation parameters, and the sensors that should be used. The user can add more scenes to an existing SimBEV dataset, replace individual scenes, or replay individual scenes to collect additional data. SimBEV works with any CARLA map, including custom maps created by the user.
SimBEV currently supports five camera types (RGB, semantic segmentation, instance segmentation, depth, and optical flow), lidar, semantic lidar, radar, GNSS, IMU, and a custom voxel detection sensor inspired by Co3SOP. The user has full control over each sensor's characteristics (e.g. camera resolution or number of lidar channels), but the placement of the sensors is fixed for now. In addition to sensor data that can be used as ground truth (e.g. semantic segmentation and depth images, semantic lidar point cloud, etc.), SimBEV currently offers three annotation types: 3D object bounding boxes, BEV ground truth, and HD map information.
SimBEV currently produces 3D object bounding boxes for the following 10 classes: car, truck, bus, motorcycle, bicycle, pedestrian, traffic light, traffic sign, traffic cone, and barrier. For each class, the bounding boxes are categorized as easy, medium, or hard based on detection difficulty. Moreover, SimBEV currently supports the following 14 BEV ground truth classes: road, hazard area, road line, sidewalk, crosswalk, traffic cone, barrier, car, truck, bus, motorcycle, bicycle, rider, pedestrian.
The SimBEV dataset (collected using SimBEV 1.0) is a collection of 320 scenes spread across 11 CARLA maps and contains data from all supported sensors. With each scene lasting 16 seconds at a frame rate of 20 Hz, the SimBEV dataset contains 102,400 annotated frames, 8,315,935 3D object bounding boxes (3,792,499 of which are valid, i.e., not fully occluded and visible to the sensors), and 2,793,491,357 BEV ground truth labels.
We developed and tested SimBEV on a system with the following specifications:
- AMD Ryzen 9 9950X (any 9th Gen or newer Intel or 3rd Gen or newer Ryzen 7/9 CPU will probably work)
- 96 GB RAM (32 GB is probably enough)
- Nvidia GeForce RTX 4090
- Ubuntu 22.04
To run SimBEV, your system must satisfy CARLA 0.9.16's minimum system requirements.
To run SimBEV, you must use our custom version of CARLA (built from source from this fork of the ue4-dev branch). Please download it from here.
We have not tested SimBEV with the standard version of CARLA 0.9.16 or CARLA 0.10.0 and advise against using them with SimBEV. The standard CARLA 0.9.16 lacks the additions listed below that SimBEV relies on, making it effectively incompatible, and, while CARLA 0.10.0 offers superior graphics, it lacks some features of the UE4-based CARLA that SimBEV depends on (e.g. customizable weather, large maps, etc.). We will make SimBEV available for CARLA 0.10.* when it reaches feature parity with the UE4-based CARLA.
Some of the enhancements in our version are:
- Addition of three new sports cars to CARLA's vehicle library using existing 3D models: sixth generation Ford Mustang, Toyota GR Supra, and Bugatti Chiron. The Ford Mustang is SimBEV's default data collection vehicle.
NewCars.mp4
- Addition of lights (headlights, taillights, blinkers, etc.) to older vehicle models in CARLA's library that lacked them, and a redesign of existing vehicle lights in Blender using a new multi-layer approach that better visualizes modern multi-purpose lights.
- Addition of a set of 160 standard paint colors for most vehicle models (apart from a few like the firetruck) to choose from, and fixing paint color randomization issues for a few vehicles (e.g. the bus).
- Updates to the vehicle dynamics parameters of vehicle models to better match each vehicle's real-world behavior and performance.
- Addition of, or updates to, pedestrian navigation information for CARLA's Town12, Town13, and Town15 maps.
- Update to motorcycle and bicycle models to select their driver model randomly, instead of always using the same model.
- Addition of lights to buildings in Town12 and fixing issues that prevented full control over building/street lights in Town12 and Town15.
- Update to the crosswalk information in the OpenDRIVE map files of Town12, Town13, and Town15.
- Improvements to CARLA's Traffic Manager, including enhancements to the lane changing behavior of vehicles on autopilot and their reaction to static props (street barriers, traffic cones, etc.).
- Enhancements to the collision mesh of vehicle and pedestrian models that should result in a more realistic depiction of them in point cloud data (see a sample comparison between the old (left) and new (right) models below).
- Addition of a custom voxel detection sensor that assigns a semantic class to every occupied voxel within a specified grid around the ego vehicle.
VoxelDetector.webm
- Several bug fixes and improvements, some of which have been contributed to the main CARLA repository as well (see e.g. PR #9381, #9421, #9422, #9423, #9427, and #9471).
We recommend using SimBEV with Docker. The base Docker image is Ubuntu 22.04 with CUDA 13.0.2 and Vulkan SDK 1.3.204. If you want to use a different base image, you may have to modify ubuntu2204/x86_64 when fetching keys on line 61 of the Dockerfile, based on your Ubuntu release and system architecture. Ensure that the libnvidia-gl and libnvidia-common version numbers on line 65 of the Dockerfile match your Nvidia driver version.
- Install Docker on your system.
- Install the Nvidia Container Toolkit. It exposes your Nvidia graphics card to Docker containers.
- Clone this repository:
git clone https://github.com/GoodarzMehr/SimBEV.git && cd SimBEV
- Build the SimBEV Docker image (this will take several minutes):
docker build --no-cache --rm --build-arg ARG -t simbev:develop .

The following optional build arguments (ARG) are available:

- USER: username inside each container, set to sb by default.
- CARLA_VERSION: installed CARLA version, set to 0.9.16 by default.
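For instance,

docker build --no-cache --rm --build-arg CARLA_VERSION=0.9.16 -t simbev:develop .

builds the image with the CARLA version passed explicitly (here set to its default value).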
- Launch a container:
docker run --runtime=nvidia --privileged --gpus all --network=host -e DISPLAY=$DISPLAY \
  -v [path/to/CARLA]:/home/carla \
  -v [path/to/SimBEV]:/home/simbev \
  -v [path/to/dataset]:/dataset \
  --shm-size 32g -it simbev:develop /bin/bash

Use nvidia-smi to ensure your graphics card(s) is (are) visible inside the container. Use vulkaninfo --summary to ensure Vulkan has access to your graphics card(s).

- Install CARLA inside the container by running:

pip install carla/PythonAPI/carla/dist/carla-0.9.16-cp310-cp310-linux_x86_64.whl

- In a separate terminal window, enter the container as the root user by running docker exec -it -u 0 [container name] /bin/bash. Then, run:

cd simbev && python setup.py develop

Exit the container as the root user but stay inside it as the sb (non-root) user.
If you would like to use SimBEV without Docker, you can install the dependencies using the requirements file and then follow steps 6 and 7 above.
In the simbev directory, use the config.yaml file to configure SimBEV's behavior (for a detailed explanation of available parameters see the sample_config.yaml file). Set mode in the config.yaml file to create to create a new SimBEV dataset. If a SimBEV dataset already exists (in the path provided by path), SimBEV compares the number of existing and desired scenes for each map and creates additional ones if necessary. This feature can be used to continue creating a dataset in the event of a crash or expand an already existing one. Now, run
simbev configs/config.yaml [options]

options can be any of the following:

- --path: path for saving the dataset (/dataset by default).
- --render: visualize captured sensor data.
- --save: save captured sensor data (used by default).
- --no-save: do not save captured sensor data.

For instance,

simbev configs/config.yaml --render --no-save

visualizes sensor data as it is being captured without saving it.
You can pause/resume the simulation at any time by pressing F9.
If you would like to replace a number of existing scenes, set mode in the config.yaml file to replace and specify the list of scenes that should be replaced using the replacement_scene_config field.
If you would like to replay/augment a number of existing scenes, set mode in the config.yaml file to replay and specify the list of scenes that should be replayed using the replay_scene_config field. SimBEV will use the saved CARLA log file of the specified scenes to replay them. This can be useful if you want to collect additional data from a scene. For example, if you have already collected RGB camera data and would like to collect semantic lidar and radar data when replaying the scene, set the use_rgb_camera field in the config.yaml file to False and set use_semantic_lidar and use_radar to True. Note that because UE4 selects the rider model for motorcycles and bicycles at random each time, riders may differ when a scene is replayed; this is usually a very small discrepancy, and everything else in the replayed scene should exactly match the original.
An optional post-processing step calculates the number of lidar and radar points inside each 3D object bounding box (0 for all objects if that data is not collected), alongside a valid flag indicating whether the object is fully occluded (False) or visible to the data collection vehicle (True). By default, an object is valid if the number of points inside its bounding box is non-zero and invalid otherwise. However, if you have collected instance segmentation images, you can use the --use-seg argument to have those images assist in determining the validity of objects (if the number of points inside the object's bounding box is zero but the object is visible in the images, it is still considered valid). The post-processing step also determines the detection difficulty of an object (easy, medium, or hard) based on the object's class, its distance to the data collection vehicle, and the number of points inside its bounding box. This information is appended to the bounding box data. Finally, if you have collected 3D semantic occupancy data, the post-processing step fills in the semantic labels of voxels inside objects, since in many cases the raw voxels represent only the surface shell of objects.
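The default validity rule can be summarized with the following minimal sketch (illustrative only, not SimBEV's actual implementation; the helper name and the visible_in_instance_seg flag are hypothetical, while num_lidar_pts and num_radar_pts follow the bounding box fields described under Data Format):

```python
# Minimal sketch of the default validity rule described above (not SimBEV's actual code).
# num_lidar_pts/num_radar_pts follow the det ground truth fields; visible_in_instance_seg
# is a hypothetical flag derived from the instance segmentation images when --use-seg is set.
def is_valid(num_lidar_pts: int, num_radar_pts: int,
             use_seg: bool = False, visible_in_instance_seg: bool = False) -> bool:
    # Default rule: an object is valid if any lidar or radar point falls inside its box.
    if num_lidar_pts + num_radar_pts > 0:
        return True
    # With --use-seg, an object with no points can still be valid if it is visible
    # in the instance segmentation images.
    return use_seg and visible_in_instance_seg
```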
To post-process the data, in the simbev directory run
simbev-postprocess [options]

options can be any of the following:

- --path: path for saving the dataset (/dataset by default).
- --process-bbox: post-process 3D object bounding boxes (used by default).
- --no-process-bbox: do not post-process 3D object bounding boxes.
- --use-seg: use instance segmentation images to help with post-processing 3D object bounding boxes.
- --fill-voxels: post-process 3D semantic occupancy data.
- --morph-kernel-size: kernel size used for morphological closing (3 by default).
- --num-gpus: number of GPUs used for post-processing 3D semantic occupancy data (-1, i.e. all available GPUs, by default).
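For instance,

simbev-postprocess --use-seg --fill-voxels

post-processes 3D object bounding boxes using instance segmentation images to help determine object validity and fills in the 3D semantic occupancy data.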
The post-processing step will create a new det folder under ground-truth (see Data Format for more information) and move the files of the original det folder to a new old_det folder.
To visualize certain types of collected data (those that are not readily viewable; semantic segmentation images, for example, are already saved as .png files), run

simbev-visualize [mode] [options]

Setting mode to interactive launches SimBEV's interactive visualizer for point cloud (lidar, semantic lidar, radar) and voxel data, allowing the user to evaluate and inspect each scene and frame, as shown below:
SimBEVInteractiveViz.mp4
For all other modes, a new viz folder in the dataset's path is created where the visualizations are stored. Visualizations involving 3D object bounding boxes require data to be post-processed first.
mode can be all, or any combination of the following:
- rgb: RGB images with 3D object bounding boxes overlaid.
- depth: depth images.
- flow: optical flow images.
- lidar, lidar-with-bbox: top-down view of lidar point clouds, without and with 3D object bounding boxes overlaid, respectively.
- lidar3d, lidar3d-with-bbox: 3D view of lidar point clouds, without and with 3D object bounding boxes overlaid, respectively.
- semantic-lidar, semantic-lidar3D: top-down and 3D views of semantic lidar point clouds, respectively.
- radar, radar-with-bbox: top-down view of radar point clouds, without and with 3D object bounding boxes overlaid, respectively.
- radar3d, radar3d-with-bbox: 3D view of radar point clouds, without and with 3D object bounding boxes overlaid, respectively.
Visualization modes involving point clouds have two default views, NEAR and FAR, as defined in the visualization_handlers file, where you can also define your custom view if needed.
options can be any of the following:
- --path: path to the dataset (/dataset by default).
- -s, --scene: list of scene numbers to visualize; can be individual numbers or a range (-1, i.e. all scenes, by default).
- -f, --frame: list of frame numbers to visualize; can be individual numbers or a range (-1, i.e. all frames, by default).
- --ignore-valid-flag: display all 3D bounding boxes regardless of the value of their valid flag.
For instance, using

simbev-visualize rgb depth lidar3d semantic-lidar radar-with-bbox --scene 0 12 27-32 --frame 3 30-49 300

visualizes RGB images with 3D bounding boxes overlaid, depth images, lidar point clouds from a 3D perspective view, semantic lidar point clouds from a top-down view, and radar point clouds from a top-down view with 3D bounding boxes overlaid for frames 3, 30 to 49, and 300 of scenes 0, 12, and 27 to 32.
Consult our implementations of BEVFusion and UniTR for how to use the SimBEV dataset.
The placement and coordinate system of the sensors are shown on the left and tabulated on the right. Coordinate values are relative to a FLU (Front-Left-Up) coordinate system positioned at the center of the ground plane of the vehicle's 3D bounding box.
Sensors in SimBEV are referenced using the {subtype}-{position} format (which turns into {position} when subtype is not available). For cameras, subtype can be one of RGB (RGB camera), SEG (semantic segmentation camera), IST (instance segmentation camera), DPT (depth camera), or FLW (optical flow camera), while position can be one of CAM_FRONT_LEFT, CAM_FRONT, CAM_FRONT_RIGHT, CAM_BACK_RIGHT, CAM_BACK, CAM_BACK_LEFT. For instance, DPT-CAM_BACK_LEFT denotes the back left depth camera. For lidar, since there is only one position, regular lidar is denoted by LIDAR while semantic lidar is denoted by SEG-LIDAR. For radar, subtype is not available and position can be one of RAD_LEFT, RAD_FRONT, RAD_RIGHT, RAD_BACK. GNSS and IMU are simply denoted as GNSS and IMU, respectively. The voxel detector is denoted as VOXEL-GRID, and the post-processed 3D semantic occupancy data is denoted as VOXEL-GRID-FILLED.
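Combining this naming convention with the file naming scheme described for the sweeps folder below, a sensor file path can be assembled as in the following sketch (an illustrative helper, not part of SimBEV's API; the 4-digit zero-padding follows the RGB-CAM_BACK_LEFT example given there):

```python
from pathlib import Path

# Illustrative helper (not part of SimBEV's API) that builds a sweep file path from the
# {subtype}-{position} sensor name and the naming scheme described under Data Format.
def sweep_path(dataset_root: str, sensor: str, scene: int, frame: int, ext: str) -> Path:
    name = f"SimBEV-scene-{scene:04d}-frame-{frame:04d}-{sensor}.{ext}"
    return Path(dataset_root) / "sweeps" / sensor / name

# e.g. the back left depth camera image of frame 12 of scene 27:
print(sweep_path("/dataset", "DPT-CAM_BACK_LEFT", 27, 12, "png"))
```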
A generic SimBEV dataset uses the following folder structure.
simbev/
|
├── configs/
|
├── console_logs/
|
├── ground-truth/
| ├── det/
| ├── old_det/ (if 3D object bounding boxes are post-processed)
| ├── seg/
| ├── seg_viz/
| ├── hd_map/
|
├── infos/
| ├── simbev_infos_train.json
| ├── simbev_infos_val.json
| ├── simbev_infos_test.json
|
├── logs/
|
├── sweeps/
| ├── RGB-CAM_FRONT_LEFT/
| ├── RGB-CAM_FRONT/
| ├── RGB-CAM_FRONT_RIGHT/
| ├── RGB-CAM_BACK_LEFT/
| ├── RGB-CAM_BACK/
| ├── RGB-CAM_BACK_RIGHT/
| ├── SEG-CAM_FRONT_LEFT/
| ├── SEG-CAM_FRONT/
| ├── SEG-CAM_FRONT_RIGHT/
| ├── SEG-CAM_BACK_LEFT/
| ├── SEG-CAM_BACK/
| ├── SEG-CAM_BACK_RIGHT/
| ├── IST-CAM_FRONT_LEFT/
| ├── IST-CAM_FRONT/
| ├── IST-CAM_FRONT_RIGHT/
| ├── IST-CAM_BACK_LEFT/
| ├── IST-CAM_BACK/
| ├── IST-CAM_BACK_RIGHT/
| ├── DPT-CAM_FRONT_LEFT/
| ├── DPT-CAM_FRONT/
| ├── DPT-CAM_FRONT_RIGHT/
| ├── DPT-CAM_BACK_LEFT/
| ├── DPT-CAM_BACK/
| ├── DPT-CAM_BACK_RIGHT/
| ├── FLW-CAM_FRONT_LEFT/
| ├── FLW-CAM_FRONT/
| ├── FLW-CAM_FRONT_RIGHT/
| ├── FLW-CAM_BACK_LEFT/
| ├── FLW-CAM_BACK/
| ├── FLW-CAM_BACK_RIGHT/
| ├── LIDAR/
| ├── SEG-LIDAR/
| ├── RAD_LEFT/
| ├── RAD_FRONT/
| ├── RAD_RIGHT/
| ├── RAD_BACK/
| ├── GNSS/
| ├── IMU/
| ├── VOXEL-GRID/
| ├── VOXEL-GRID-FILLED/ (if semantic occupancy data is post-processed)
|
├── viz/ (if data is visualized)
The configs folder contains the config file used for each scene, with the files using the SimBEV-scene-{scene number}.yaml naming scheme. The files are usually identical, unless the dataset was expanded or some scenes were replaced or augmented using a different configuration. If an existing scene is augmented, the new config file uses the SimBEV-scene-{scene number}-augment-{i}.yaml naming scheme, where i is the index of the augmentation attempt (i.e. i is 0 for the first attempt, 1 for the second, etc.).
The console_logs folder contains the logging output written to the console/terminal.
The ground-truth folder contains the ground truth files for each frame, with the files using the SimBEV-scene-{scene number}-frame-{frame number}-{type}.{data type} naming scheme. For the det, seg, seg_viz, and hd_map folders, type and data type are GT_DET and bin; GT_SEG and npz; GT_SEG_VIZ and jpg; and HD_MAP and json, respectively.
The det folder contains the 3D object ground truth files for each frame. In each file, the following information is provided for each object:
- id: object ID supplied by CARLA
- type: object type, e.g. vehicle.ford.mustang_2016 or walker.pedestrian.0051
- is_alive: True if the object is alive, False if destroyed
- is_active: True if the object is active, False otherwise
- is_dormant: True if the object is dormant, False otherwise
- parent: ID of the parent object if one exists, None otherwise
- attributes: object attributes, e.g. has_lights, color, role_name, etc. for a car
- semantic_tags: object semantic tags
- bounding_box: global coordinates of the corners of the object's 3D bounding box
- location: location ($x$, $y$, $z$) of the object (in a right-handed coordinate frame)
- rotation: rotation (roll, pitch, yaw) of the object (in a right-handed coordinate frame)
- linear_velocity: linear velocity of the object (m/s)
- angular_velocity: angular velocity of the object (deg/s)
- distance_to_ego: distance of the object from the data collection vehicle (m)
- angle_to_ego: angle of the object to the data collection vehicle (deg, vehicle's front vector is 0, positive CCW)
- [requires post-processing] num_lidar_pts: number of lidar points inside the object's 3D bounding box
- [requires post-processing] num_radar_pts: number of radar points inside the object's 3D bounding box
- [requires post-processing] valid_flag: True if the object is visible to the data collection vehicle, False otherwise
- [requires post-processing] class: class of the object
- [requires post-processing] difficulty: detection difficulty of the object, can be easy, medium, or hard
- [traffic light only] green_time: duration the traffic light stays green (s)
- [traffic light only] yellow_time: duration the traffic light stays yellow (s)
- [traffic light only] red_time: duration the traffic light stays red (s)
- [traffic light only] state: current state of the traffic light (i.e. green, yellow, or red)
- [traffic light only] opendrive_id: OpenDRIVE ID of the traffic light
- [traffic light only] pole_index: index of the traffic light's pole within the traffic light group
- [traffic sign only] sign_type: traffic sign's type, if it can be extracted from CARLA; generally stop, yield, or speed_limit; in Town12, Town13, and Town15 the speed limit is provided as well, e.g. speed_limit_30 (30 km/h speed limit) or speed_limit_55_min_40 (55 km/h speed limit, 40 km/h minimum speed limit)
The seg folder contains the BEV ground truth files for each frame. BEV ground truth is a binary array with one channel per class, in the following order: road, hazard, road_line, sidewalk, crosswalk, traffic_cone, barrier, car, truck, bus, motorcycle, bicycle, rider, pedestrian. The second and third dimensions of the array are the spatial dimensions of the BEV grid.
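As an illustration, the BEV ground truth for a frame could be loaded and one class channel inspected as follows (a minimal sketch; the path follows the naming scheme above, and the key used to index the .npz archive is not assumed, so the sketch reads whatever array name the file reports):

```python
import numpy as np

# Sketch: load the BEV ground truth for frame 12 of scene 27. The class order follows
# the list above (channel 0 = road, ..., channel 13 = pedestrian).
data = np.load("/dataset/ground-truth/seg/SimBEV-scene-0027-frame-0012-GT_SEG.npz")
print(data.files)                 # inspect the stored array name(s)
bev = data[data.files[0]]         # binary array with one channel per BEV class
road_mask = bev[0].astype(bool)   # channel 0: road
print(bev.shape, road_mask.sum(), "road cells")
```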
The seg_viz folder contains the visualization of the BEV ground truth for each frame.
The hd_map folder contains information about the waypoint at the ego vehicle's location for each frame, which, when combined with the CARLA map's OpenDRIVE file data, should provide accurate map information about the area around the ego vehicle. The following information is provided for each waypoint:

- id: waypoint ID supplied by CARLA
- s: distance along the road section
- road_id: OpenDRIVE ID of the road the waypoint belongs to
- section_id: OpenDRIVE ID of the road section the waypoint belongs to
- lane_id: OpenDRIVE ID of the lane the waypoint belongs to
- lane_type: type of the lane the waypoint belongs to; should be Driving, but other possible values include Sidewalk, Shoulder, Curb, etc.
- lane_width: width of the lane the waypoint belongs to
- lane_change: type of lane change permitted by the lane
- is_junction: whether the waypoint is in a junction
- junction_id: OpenDRIVE ID of the junction if the waypoint is in a junction
- is_intersection: whether the waypoint is in an intersection
- transform: global coordinate transform (location, rotation) of the waypoint
- left/right_lane_marking: information about the left/right lane markings, including type (e.g. Solid, Broken, SolidBroken, etc.), width, color, and lane_change
- left/right_lane: information about the corresponding waypoint in the left/right lane, including id, s, road_id, section_id, lane_id, lane_type, lane_width, and lane_change
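For example, the waypoint information for a frame might be consumed as follows (a minimal sketch, assuming the JSON keys match the field names listed above):

```python
import json

# Sketch: read the ego waypoint information for frame 12 of scene 27 and print basic lane info.
# Assumes the JSON keys match the field names listed above.
with open("/dataset/ground-truth/hd_map/SimBEV-scene-0027-frame-0012-HD_MAP.json") as f:
    waypoint = json.load(f)

print("road", waypoint["road_id"], "lane", waypoint["lane_id"], waypoint["lane_type"])
print("lane width:", waypoint["lane_width"], "in junction:", waypoint["is_junction"])
```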
The infos folder contains the info files for each data split, with the files using the simbev_infos_{split}.json naming scheme, where split is either train, val, or test. Each file consists of metadata and data. metadata contains coordinate transformation matrices for all sensors (i.e. sensor2lidar_translation, sensor2lidar_rotation, sensor2ego_translation, and sensor2ego_rotation), as well as the camera intrinsics matrix. data contains scene information, divided into scene_info and scene_data for each scene. scene_info includes the overall scene information, while scene_data provides information about individual frames, including file paths for collected sensor data and the corresponding ground truth.
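A rough sketch of how an info file might be read (assuming the nesting described above; whether data is stored as a list or as a dict keyed by scene is not specified here, so the sketch handles both):

```python
import json

# Sketch: iterate over the training split's scenes using the structure described above.
with open("/dataset/infos/simbev_infos_train.json") as f:
    infos = json.load(f)

print(infos["metadata"].keys())   # sensor transforms and camera intrinsics

scenes = infos["data"]
# "data" holds per-scene entries; adapt the iteration to the actual on-disk structure.
entries = scenes.values() if isinstance(scenes, dict) else scenes
for scene in entries:
    print(scene["scene_info"])    # overall scene information
    # scene["scene_data"] provides per-frame file paths for sensor data and ground truth
```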
The logs folder contains the CARLA log file for each scene, with the files using the SimBEV-scene-{scene number}.log naming scheme. Log files can be used by SimBEV to replay scenes and collect additional data.
The sweeps folder contains collected sensor data for each frame, with the files using the {sensor}/SimBEV-scene-{scene number}-frame-{frame number}-{sensor}.{type} naming scheme. For instance, the back left RGB camera image for frame 12 of scene 27 is saved as RGB-CAM_BACK_LEFT/SimBEV-scene-0027-frame-0012-RGB-CAM_BACK_LEFT.jpg. We briefly discuss how each sensor's data is saved below. See CARLA's sensors documentation for more details.
- RGB camera: images are saved as .jpg files.
- Semantic segmentation camera: images are saved as .png files.
- Instance segmentation camera: images are saved as .png files.
- Depth camera: images are saved as .png files.
- Optical flow camera: images are saved as a $(h, w, 2)$ NumPy array where $h$ and $w$ are the image height and width, respectively.
- Lidar: point clouds are saved as a $(n, 3)$ NumPy array where the columns represent the $x$, $y$, and $z$ values, respectively.
- Semantic lidar: point clouds are saved as a $(n, 6)$ NumPy array where the columns represent the $x$, $y$, and $z$ values, the cosine of the incidence angle, and the index and semantic tag of the hit object, respectively.
- Radar: point clouds are saved as a $(n, 4)$ NumPy array where the columns represent the depth, altitude angle, azimuth angle, and velocity, respectively.
- GNSS: data is saved as a [latitude, longitude, altitude] NumPy array.
- IMU: data is saved as a [$\dot{x}$, $\dot{y}$, $\dot{z}$, $\dot{\phi}$, $\dot{\theta}$, $\dot{\psi}$, $\psi$] NumPy array.
- Voxel detector: data is saved as a $(d, w, h)$ NumPy array where the dimensions represent the $x$, $y$, and $z$ directions of the vehicle's FLU coordinate system, respectively. Each cell contains the semantic (class) label of the object that overlaps with that cell, unless the cell is unoccupied, in which case its value is 0.
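As an example of working with the point cloud arrays described above, the following sketch loads a lidar sweep and converts a radar sweep from its (depth, altitude, azimuth, velocity) representation to Cartesian coordinates (illustrative only; the .npy extension and the exact angle conventions are assumptions, so consult CARLA's sensor documentation for the precise definitions):

```python
import numpy as np

# Sketch: load one lidar and one radar sweep for frame 12 of scene 27.
# The .npy extension is an assumption; adjust to the actual file type on disk.
lidar = np.load("/dataset/sweeps/LIDAR/SimBEV-scene-0027-frame-0012-LIDAR.npy")          # (n, 3): x, y, z
radar = np.load("/dataset/sweeps/RAD_FRONT/SimBEV-scene-0027-frame-0012-RAD_FRONT.npy")  # (n, 4)

depth, altitude, azimuth, velocity = radar.T

# Standard spherical-to-Cartesian conversion of the radar returns in the sensor frame
# (angle conventions assumed).
x = depth * np.cos(altitude) * np.cos(azimuth)
y = depth * np.cos(altitude) * np.sin(azimuth)
z = depth * np.sin(altitude)
radar_xyz = np.stack([x, y, z], axis=1)

print(lidar.shape, radar_xyz.shape)
```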
Models are trained on the SimBEV dataset's train set and evaluated on its test set with the hyperparameters their authors used for the nuScenes dataset.
3D object detection results:

| Model | Modality | mAP (%) | mATE (m) | mAOE (rad) | mASE | mAVE (m/s) | SDS (%) | Checkpoint |
|---|---|---|---|---|---|---|---|---|
| BEVFusion-C | C | 22.1 | 0.744 | 1.04 | 0.137 | 4.65 | 25.1 | Checkpoint |
| BEVFusion-L | L | 48.1 | 0.144 | 0.133 | 0.134 | 1.56 | 56.4 | Checkpoint |
| BEVFusion | C+L | 48.1 | 0.146 | 0.122 | 0.127 | 1.54 | 56.6 | Checkpoint |
| UniTR | C+L | 47.7 | 0.113 | 0.224 | 0.090 | 0.55 | 61.7 | Checkpoint |
| UniTR+LSS | C+L | 47.8 | 0.113 | 0.207 | 0.085 | 0.53 | 62.2 | Checkpoint |
BEV segmentation results (per-class IoU and mIoU):

| Model | Modality | Road | Car | Truck | Bus | Motorcycle | Bicycle | Rider | Pedestrian | mIoU | Checkpoint |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BEVFusion-C | C | 76.0 | 17.2 | 5.1 | 22.9 | 0.0 | 0.0 | 0.0 | 0.0 | 15.2 | Checkpoint |
| BEVFusion-L | L | 87.7 | 70.6 | 73.5 | 81.5 | 32.5 | 3.6 | 18.4 | 18.9 | 48.3 | Checkpoint |
| BEVFusion | C+L | 88.4 | 72.7 | 74.5 | 80.0 | 36.3 | 3.6 | 23.3 | 20.0 | 50.0 | Checkpoint |
| UniTR | C+L | 92.8 | 73.8 | 67.7 | 51.7 | 36.5 | 11.4 | 36.2 | 27.5 | 49.7 | Checkpoint |
| UniTR+LSS | C+L | 93.3 | 72.8 | 69.4 | 58.5 | 35.9 | 6.3 | 31.6 | 12.9 | 47.6 | Checkpoint |
SimBEV is based on CARLA and we are grateful to the team that maintains it. SimBEV has also taken inspiration from the nuScenes, SHIFT, OPV2V, and V2X-Sim datasets, as well as Co3SOP.
The sixth generation Ford Mustang model is based on this BlenderKit model by Kentik Khudosovtsev.
Hazard area static props are based on this Roadside Construction asset by Quixel Megascans.
If SimBEV is useful or relevant to your research, please kindly recognize our contributions by citing our paper:
@article{mehr2025simbev,
title={SimBEV: A Synthetic Multi-Task Multi-Sensor Driving Data Generation Tool and Dataset},
author={Mehr, Goodarz and Eskandarian, Azim},
journal={arXiv preprint arXiv:2502.01894},
year={2025}
}