Inspiration

We believed we could significantly reduce casualties in close-quarters combat situations, and reduce risk in more general combat situations, by simulating interactions between agents, optimising those interactions, and then using the results to train humans to make correct decisions in a variety of scenarios.

What it does

Users can generate their own custom close-quarters combat scenarios by setting walls, goals, and the numbers of 'attackers' and 'defenders'. We also used Claude to convert free-text user instructions into a standardised form, which is then used to generate the simulation scenario; the user can manually adjust this converted scenario to fine-tune it to their exact requirements. In addition, we provide the option to generate random scenarios, which can help users develop their intuition by watching optimal strategies emerge.
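As a minimal sketch of the text-to-scenario step: Claude is prompted to emit JSON in a standardised shape, which is validated before building the simulation. The field names and prompt below are illustrative assumptions, not the project's actual schema.

```python
import json

# Hypothetical standardised schema for a converted scenario.
REQUIRED_KEYS = {"walls", "goals", "attackers", "defenders"}

# Illustrative prompt template sent to Claude along with the user's text.
PROMPT_TEMPLATE = (
    "Convert the following scenario description into JSON with keys "
    "'walls' (list of [x1, y1, x2, y2]), 'goals' (list of [x, y]), "
    "'attackers' (int), 'defenders' (int). Respond with JSON only.\n\n{text}"
)

def parse_scenario(json_text):
    """Parse Claude's JSON reply and check it against the expected schema."""
    scenario = json.loads(json_text)
    missing = REQUIRED_KEYS - scenario.keys()
    if missing:
        raise ValueError(f"scenario missing keys: {sorted(missing)}")
    return scenario
```

Validating the reply before use means a malformed model response fails loudly rather than producing a broken simulation, and the parsed dict is what the user then adjusts manually.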

First, we created a heuristic minimax algorithm to provide a baseline strategy for our agents to improve upon.
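The core of a heuristic minimax baseline can be sketched as a depth-limited game-tree search that falls back on a heuristic evaluation at the depth cutoff. The callback-based structure here is an assumption; the project's actual state representation and heuristic will differ.

```python
def minimax(state, depth, maximizing, get_moves, apply_move, evaluate):
    """Depth-limited minimax: returns (value, best_move) for the side to play.

    get_moves(state)    -> list of legal moves
    apply_move(state,m) -> successor state
    evaluate(state)     -> heuristic score from the maximizer's perspective
    """
    moves = get_moves(state)
    if depth == 0 or not moves:
        return evaluate(state), None  # heuristic cutoff or terminal state
    best_move = None
    best = float("-inf") if maximizing else float("inf")
    for m in moves:
        value, _ = minimax(apply_move(state, m), depth - 1,
                           not maximizing, get_moves, apply_move, evaluate)
        if (maximizing and value > best) or (not maximizing and value < best):
            best, best_move = value, m
    return best, best_move
```

With attackers as the maximizer and defenders as the minimizer, the heuristic can be something like kills minus deaths minus distance to goal, giving the RL model a non-trivial baseline to beat.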

We then ran simulations on these scenarios, leveraging statistics such as the distribution of distances between attackers, the distance and displacement covered by attackers, average exploration, vision radius, and noise radius to train a reinforcement learning model that produces optimal sets of moves for the attackers. The strength of a set of moves is determined by the number of deaths and by a score given by the number of kills and proximity to goal points.
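The move-strength signal described above can be sketched as a weighted combination of kills, deaths, and goal proximity. The weights here are hypothetical placeholders; the project's actual weighting may differ.

```python
def move_strength(kills, deaths, dist_to_goal,
                  w_kill=1.0, w_death=1.0, w_dist=0.1):
    """Score a set of moves: reward kills, penalise deaths and
    distance remaining to the nearest goal point. Higher is better."""
    return w_kill * kills - w_death * deaths - w_dist * dist_to_goal
```

A scalar signal of this shape is what the reinforcement learning model maximises when comparing candidate sets of attacker moves.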

How we built it

We used object-oriented programming to model the interactions between agents, React to build our UI, and the Claude API to streamline the path from user input to the backend.

Challenges we ran into

We had to train a model within 24 hours, and when trying to implement the twin delayed algorithm (mentioned in future improvements) we found we lacked GPU power. It was also difficult to define an objective function for the model, as the problem does not follow a standard pattern such as a time series. Relatedly, representing the data was challenging: it came in a variety of forms and, again, contained non-standard objects. Developing additional features, such as checkpoints, also led to some front-end issues.

What's next for SimCQC

We have begun experimenting with a twin delayed deep deterministic policy gradient (TD3) algorithm and have an initial version of code that uses a custom Gazebo environment for off-policy reinforcement learning. We use the Bellman equation to learn the Q-function, and the Q-function to subsequently learn the policy. The system trains the drone across three linear and three angular velocities while prioritizing collision-probability constraints. We are still working on the ROS2 integration, which would allow us to communicate in real time with autonomous aerial vehicles.
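The Bellman backup at the heart of this approach can be illustrated with a tabular sketch: the Q-value of a state-action pair is nudged toward the immediate reward plus the discounted best future value. TD3 itself replaces the table with twin neural critics and delayed policy updates, so this is the underlying idea rather than the TD3 implementation; the state and action names are hypothetical.

```python
def bellman_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One Bellman backup on a dict-based Q-table.

    Q        : dict mapping (state, action) -> value
    alpha    : learning rate, gamma : discount factor
    """
    # Bellman target: immediate reward plus discounted best future value.
    target = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
    # Move the current estimate a step toward the target.
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))
    return Q[(s, a)]
```

Once the Q-function is learned, the policy follows by acting greedily with respect to it, which is the tabular analogue of learning the policy from the critic in TD3.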

Furthermore, we aim to let this AI recommend adjustments to the coordination strategies that our users input.

Built With
