A four-wheel ground robot, simulated in ROS / Gazebo, that drives over a city-scale road network and classifies the road surface in front of it as pothole or plain road from a forward-facing camera feed using a ResNet-50 transfer-learning model.
This was my undergrad (BTech ECE) final-year capstone at SVNIT Surat (2020-2021), built with a 4-person team and submitted in May 2021. Project guide: Prof. A. H. Lalluwadia.
Potholes are a real safety and cost problem. In India alone, pothole-related accidents kill roughly 3,000 people a year; in the US, the American Automobile Association estimated USD 3 billion / year in vehicle damage from potholes. Most existing solutions either rely on citizen reporting (slow) or on accelerometer-based detection from a vehicle that has already hit the pothole (too late). We wanted to prototype the upstream version: a small autonomous robot that scans the road ahead with a camera and flags potholes before a vehicle drives over them.
We chose to build the entire system in simulation so we could iterate on the perception, the robot design, and the world all in one place without owning the hardware.
End to end, the project has three layers:
- Robot model. A four-wheel differential-drive bot, designed in SolidWorks, exported to URDF, and spawned in Gazebo. Onboard sensor: a single forward-facing RGB camera (Gazebo libgazebo_ros_camera.so plugin).
- Simulated world. A custom Gazebo world populated with road segments, buildings, trees, street lamps and intentionally-placed pothole meshes so we could rehearse the full driving + detection loop.
- Perception. A ResNet-50 CNN (pretrained on ImageNet, head fine-tuned on a Kaggle "pothole and plain road" dataset) consumes the camera frames and emits a binary classification per frame.
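
For readers new to differential drive, the robot-model layer comes down to a small piece of kinematics: a commanded forward speed and turn rate are converted into left-side and right-side wheel speeds. The sketch below shows the idealized textbook conversion only. In this project the Gazebo diff-drive plugin does the equivalent work internally, and the `wheel_separation` and `wheel_radius` values used here are hypothetical, not measurements from the URDF.

```python
def twist_to_wheel_speeds(linear, angular, wheel_separation, wheel_radius):
    """Convert a body command into (left, right) wheel angular speeds in rad/s.

    linear: forward speed in m/s
    angular: turn rate in rad/s, positive = counter-clockwise (ROS convention)
    wheel_separation, wheel_radius: in metres (hypothetical values in the demo)
    """
    # Each side moves at the body speed plus or minus the rotational contribution.
    v_left = linear - angular * wheel_separation / 2.0
    v_right = linear + angular * wheel_separation / 2.0
    # Divide by the wheel radius to go from linear speed to wheel angular speed.
    return v_left / wheel_radius, v_right / wheel_radius


# Driving straight: both sides spin at the same rate.
straight = twist_to_wheel_speeds(0.5, 0.0, wheel_separation=0.4, wheel_radius=0.1)

# Turning in place: the two sides spin in opposite directions.
spin = twist_to_wheel_speeds(0.0, 1.0, wheel_separation=0.4, wheel_radius=0.1)
```

On a four-wheel bot both wheels on a side receive the same speed, so real turns involve some wheel slip that this idealized model ignores.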
My personal scope on the team was the simulation and perception integration side: getting the URDF + Gazebo world wired together, exposing the camera topic, training the classifier in Colab, and writing the ROS node that calls the model on each captured frame.
A full driving-and-detection video is at Video Result.mp4 (77 MB) and the project report PDF is at Project Report.pdf.
                  +-----------------------------+
                  |        Gazebo world         |
                  |   (FinalWorldTest.world)    |
                  |   roads + pothole meshes    |
                  +--------------+--------------+
                                 |
                        spawns + simulates
                                 v
+---------------+    +-----------+-----------+    +--------------------+
|  cmd_vel.py   |--->|       bot_urdf1       |--->| /bot_urdf1/camera1 |
|   (teleop)    |    | (4-wheel diff-drive   |    |     /image_raw     |
+---------------+    |  URDF + camera link)  |    +----------+---------+
                     +-----------------------+               |
                                                             v
                                                 +-----------------------+
                                                 |     Prediction.py     |
                                                 |      (ROS node)       |
                                                 |  load ResNet-50 .h5   |
                                                 |    resize 256x256     |
                                                 |  argmax -> Pothole /  |
                                                 |      Plain Road       |
                                                 +-----------------------+
A single catkin package, bot_urdf1, holds everything:
Project Files/project_ws/src/pkb/src/bot_urdf1/
  urdf/bot_urdf1.urdf            # SolidWorks -> URDF; includes camera sensor + diff-drive plugin
  launch/gazebo.launch           # spawns the world + the bot + tf_footprint_base
  launch/display.launch          # robot_state_publisher + joint_state_publisher_gui + RViz
  launch/FinalWorldTest.world    # the city world with planted potholes
  config/bot.yaml                # joint group effort controllers (l_/r_con_position_controller)
  src/cmd_vel.py                 # keyboard teleop publishing to /cmd_vel
  src/Prediction.py              # the perception node (TF / Keras)
  src/teleop_twist_keyboard.py   # standard ROS teleop, included for convenience
  Models/, World/, meshes/, ...  # Gazebo model assets (textures, .rar / .zip source meshes)
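
To make the teleop entry in the listing concrete, here is a minimal sketch of how a keyboard key could be turned into a message on /cmd_vel. It is not the code in cmd_vel.py: the key bindings, speeds, node name, and helper names are all hypothetical. Only the /cmd_vel topic comes from this README, and `geometry_msgs/Twist` is the standard ROS message type for velocity commands.

```python
# Hypothetical bindings: key -> (linear m/s, angular rad/s).
KEY_BINDINGS = {
    "w": (0.5, 0.0),    # forward
    "s": (-0.5, 0.0),   # reverse
    "a": (0.0, 1.0),    # turn left (positive angular.z is counter-clockwise)
    "d": (0.0, -1.0),   # turn right
}


def key_to_command(key):
    """Return (linear, angular) for a key; unknown keys stop the robot."""
    return KEY_BINDINGS.get(key, (0.0, 0.0))


def make_publisher():
    """Create the /cmd_vel publisher. ROS imports stay local to this function."""
    import rospy
    from geometry_msgs.msg import Twist

    rospy.init_node("teleop_sketch")  # hypothetical node name
    return rospy.Publisher("/cmd_vel", Twist, queue_size=1)


def publish_key(publisher, key):
    """Build a Twist from a key press and publish it."""
    from geometry_msgs.msg import Twist

    linear, angular = key_to_command(key)
    msg = Twist()
    msg.linear.x = linear
    msg.angular.z = angular
    publisher.publish(msg)
```

Defaulting unknown keys to a zero command is a deliberate choice in this sketch: a stray key press stops the bot rather than leaving the last velocity in effect.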
Training script: Project Files/Detection/pothole-detection-v2.ipynb
- Backbone: ResNet-50, ImageNet weights, include_top=False, pooling='max'.
- Head: Dropout(0.20) -> Dense(2048, relu) -> Dense(1024, relu) -> Dense(512, relu) -> Dense(2, softmax).
- Input: 256 x 256 x 3 RGB.
- Optimizer: Adam (lr=1e-5), categorical cross-entropy, ReduceLROnPlateau on val_acc.
- Dataset: Kaggle "Pothole and Plain Road Images" (binary classification), 75/25 train/val split.
- Inference: Prediction.py polls /bot_urdf1/camera1/image_raw, resizes to 256 x 256, runs model.predict, and prints "Pothole Detected" or "Plain Road".
The report also compares ResNet-34 vs Inception-V3 as alternatives (Chapter 5, Table 5.3) before settling on the ResNet family for accuracy-per-parameter reasons.
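
The architecture listed above can be sketched in Keras roughly as follows. This is a reconstruction from the bullet points, not a copy of the notebook. Freezing the backbone, the `factor` and `patience` values for ReduceLROnPlateau, and the function names are assumptions. The monitor is written as `val_accuracy`, the TensorFlow 2 name for what older Keras called `val_acc`. The small helper at the top only counts the trainable parameters of the dense head, using the fact that ResNet-50 with global pooling outputs 2048 features.

```python
def head_param_count(in_features, layer_widths):
    """Count weights + biases for a stack of fully connected layers."""
    total = 0
    for width in layer_widths:
        total += in_features * width + width  # weight matrix + bias vector
        in_features = width
    return total


def build_model(input_shape=(256, 256, 3), num_classes=2):
    """Build the ResNet-50 + dense-head classifier described in this README."""
    from tensorflow import keras  # imported lazily so the helper above has no TF dependency

    base = keras.applications.ResNet50(
        include_top=False, weights="imagenet",
        input_shape=input_shape, pooling="max",
    )
    base.trainable = False  # ASSUMPTION: only the head is trained

    model = keras.Sequential([
        base,
        keras.layers.Dropout(0.20),
        keras.layers.Dense(2048, activation="relu"),
        keras.layers.Dense(1024, activation="relu"),
        keras.layers.Dense(512, activation="relu"),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=1e-5),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def make_lr_callback():
    """ReduceLROnPlateau on validation accuracy; factor/patience are hypothetical."""
    from tensorflow import keras

    return keras.callbacks.ReduceLROnPlateau(
        monitor="val_accuracy", factor=0.5, patience=3
    )


# Dense head on top of the 2048-feature ResNet-50 output.
print(head_param_count(2048, [2048, 1024, 512, 2]))  # 6820354
```

The head alone carries about 6.8 million parameters, which is a sizeable fraction of the whole model and worth keeping in mind when comparing architectures.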
This project was developed on ROS Melodic / Ubuntu 18.04 with Gazebo 9. It should run on Noetic / Ubuntu 20.04 with minor changes: Python 3 print syntax and a fix for the y_pred = True typo in Prediction.py.
Prereqs:
sudo apt install ros-melodic-desktop-full \
ros-melodic-joint-state-publisher-gui \
ros-melodic-effort-controllers \
ros-melodic-gazebo-ros-pkgs
pip install tensorflow keras opencv-python numpy

Build the workspace:
cd "Project Files/project_ws"
catkin_make
source devel/setup.bash

Launch the simulated world + spawn the robot:
roslaunch bot_urdf1 gazebo.launch

In a second terminal, drive the bot with the keyboard:
rosrun bot_urdf1 cmd_vel.py
# or the standard ROS teleop:
rosrun bot_urdf1 teleop_twist_keyboard.py

In a third terminal, run the perception node (expects model_1.h5 in cwd):
rosrun bot_urdf1 Prediction.py

To visualize in RViz instead of Gazebo:
roslaunch bot_urdf1 display.launch gui:=True

| Layer | Choice |
|---|---|
| Robotics OS | ROS Melodic |
| Simulator | Gazebo 9 (gazebo_ros, libgazebo_ros_camera.so) |
| Robot model | SolidWorks -> URDF, diff-drive, single RGB camera |
| Control | effort_controllers/JointGroupEffortController |
| Perception | TensorFlow / Keras, ResNet-50 transfer learning |
| Vision utils | OpenCV (cv2.resize, image read) |
| Language | Python 2.7 (ROS nodes), Python 3 (training notebook) |
| Build | catkin |
- The robot drives reliably under teleop through the simulated world.
- The classifier was trained for 50 epochs on the Kaggle dataset; accuracy/loss curves and per-image predictions are in Chapter 6 of the project report. We have not re-verified those numbers since 2021, so I am deliberately not quoting a headline accuracy here.
- End-to-end loop (drive -> capture frame -> classify -> print label) works in simulation. The recorded run is in Video Result.mp4.
- Classification, not detection. The model gives a whole-frame binary label, not a bounding box or a depth-localized pothole position. The next step would be a Faster R-CNN / YOLO-style detector, which the report scopes out in Chapter 2.6.
- Inference loop in Prediction.py is shell-based (os.system('rosrun image_view image_saver ...')) rather than subscribing directly to the camera topic via cv_bridge. That was a pragmatic shortcut to ship; the right fix is a proper subscriber.
- Single camera, no depth. A stereo pair or a depth sensor would let us localize potholes in 3D rather than just flag their presence.
- Sim-only. Never deployed to a physical bot.
- There is a Python typo in Prediction.py (y_pred = True should be ==) that I haven't fixed in place, to preserve the original submission state. Worth fixing on a port to Noetic.
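
For the "proper subscriber" fix mentioned in the limitations, a possible shape is sketched below. It is not the submitted Prediction.py. It assumes a Python 3 / Noetic environment with cv_bridge and TensorFlow installed. The class index order in `CLASS_NAMES`, the node name, and the absence of extra pixel scaling are all assumptions that would need checking against the training notebook. The topic name, model file name, and 256 x 256 input size come from this README.

```python
CAMERA_TOPIC = "/bot_urdf1/camera1/image_raw"  # from this README
MODEL_PATH = "model_1.h5"                      # from this README
INPUT_SIZE = (256, 256)
# ASSUMPTION: index 0 = plain road, index 1 = pothole. Verify against the
# class_indices used during training before trusting the printed labels.
CLASS_NAMES = ("Plain Road", "Pothole Detected")


def label_from_probs(probs, class_names=CLASS_NAMES):
    """Return the class name with the highest softmax probability (argmax)."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best]


def main():
    """Subscribe to the camera topic and classify each incoming frame."""
    # ROS and ML imports stay local so label_from_probs works without them.
    import cv2
    import numpy as np
    import rospy
    from cv_bridge import CvBridge
    from sensor_msgs.msg import Image
    from tensorflow.keras.models import load_model

    rospy.init_node("pothole_classifier")  # hypothetical node name
    bridge = CvBridge()
    model = load_model(MODEL_PATH)

    def on_image(msg):
        # Convert the ROS image to an RGB array, then match the model input size.
        frame = bridge.imgmsg_to_cv2(msg, desired_encoding="rgb8")
        frame = cv2.resize(frame, INPUT_SIZE)
        # ASSUMPTION: no further scaling; mirror the notebook's preprocessing.
        batch = np.expand_dims(frame.astype("float32"), axis=0)
        probs = model.predict(batch)[0]
        rospy.loginfo(label_from_probs(probs.tolist()))

    # queue_size=1 drops stale frames if inference is slower than the camera.
    rospy.Subscriber(CAMERA_TOPIC, Image, on_image, queue_size=1)
    rospy.spin()

# In a node file, call main() under the usual `if __name__ == "__main__":` guard.
```

Comparing the argmax result through a helper like `label_from_probs` also sidesteps the `=` versus `==` mistake, because the label is looked up rather than tested against a boolean.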
.
├── Project Files/
│ ├── Detection/ # training notebook + standalone OpenCV/Keras script
│ ├── World/ # Gazebo screenshots + recorded run video
│ └── project_ws/ # catkin workspace (src + cached build/devel)
│ └── src/pkb/src/bot_urdf1/ # the ROS package
├── Project Report.pdf # 56-page report, ECED SVNIT, May 2021
├── Project Presentation.pptx
├── Video Result.mp4 # end-to-end demo
├── media/ # README screenshots
├── LICENSE
└── README.md
Team (BTech ECE, SVNIT Surat, May 2021):
- Dhruvil Parikh (U17EC153), me; simulation + perception integration
- Pankaj Kumar Vijayvergiya (U17EC122)
- Prakash Saini (U17EC151)
- Sarvesh Dubey (U17EC152)
Guide: Prof. A. H. Lalluwadia, ECED, SVNIT Surat.
Curated and re-published here as part of my GitHub portfolio: github.com/dparikh79 · dparikh79.github.io


