Background
GEANT4
GEANT4 (Geometry and Tracking) is a platform for the simulation of the passage of particles through matter using Monte Carlo methods.
More information: https://en.wikipedia.org/wiki/Geant4
Preparing the Dataset
The following dataset is from a simplified GEANT4 based simulation of electron-proton inelastic scattering measured by a particle detector system.
Exploring the Dataset
There are a total of 7 columns. The id column identifies the particle by its PDG code (e.g., positron (-11), pion (211), kaon (321), and proton (2212)). The p column is the momentum in GeV/c. The theta column is the polar angle in radians, and the beta column is the particle's speed as a fraction of the speed of light (β = v/c), measured by the time-of-flight system. The nphe column is the number of photoelectrons. The ein column is the inner energy deposit (GeV). The eout column is the outer energy deposit (GeV).
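To illustrate the column layout, here is a minimal sketch using pandas with a tiny synthetic frame (the values are illustrative, not from the real simulation; the PDG-code-to-name mapping follows the description above):

```python
import pandas as pd

# Map the PDG Monte Carlo codes from the dataset description to particle names.
PDG_NAMES = {-11: "positron", 211: "pion", 321: "kaon", 2212: "proton"}

# A tiny synthetic frame with the same 7-column layout (values are illustrative).
df = pd.DataFrame({
    "id": [-11, 211, 321, 2212],
    "p": [1.2, 0.9, 1.5, 2.1],          # momentum [GeV/c]
    "theta": [0.3, 0.5, 0.4, 0.2],      # polar angle [rad]
    "beta": [0.999, 0.98, 0.95, 0.91],  # v/c from time of flight
    "nphe": [18, 2, 1, 0],              # photoelectron count
    "ein": [0.08, 0.03, 0.02, 0.05],    # inner energy deposit [GeV]
    "eout": [0.04, 0.01, 0.01, 0.02],   # outer energy deposit [GeV]
})
df["name"] = df["id"].map(PDG_NAMES)
print(df[["id", "name", "p", "beta"]])
```

In the real workflow, `pd.read_csv` on the exported simulation file would produce a frame with the same columns.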
Beta vs. Momentum
The Beta vs. Momentum plot shows the relationship between the β measured by the ToF (time-of-flight) system and the momentum p obtained from the TPC. The visible bands correspond to e+ (positron), π+ (pion), K+ (kaon), and p (proton).
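The bands separate because, for a fixed momentum, a heavier particle moves more slowly: β = p / √(p² + m²) in natural units. A small sketch (PDG rest masses in GeV/c²) shows the expected ordering of the bands at p = 1 GeV/c:

```python
import numpy as np

# Expected beta at momentum p for rest mass m: beta = p / sqrt(p^2 + m^2)
# (natural units: p in GeV/c, m in GeV/c^2). Masses are PDG values.
MASSES = {"e+": 0.000511, "pi+": 0.1396, "K+": 0.4937, "p": 0.9383}

def expected_beta(p, m):
    """Relativistic beta for a particle of momentum p and rest mass m."""
    return p / np.hypot(p, m)

p = 1.0  # GeV/c
for name, m in MASSES.items():
    print(f"{name}: beta = {expected_beta(p, m):.4f}")
```

At any given momentum the positron band sits highest (β ≈ 1) and the proton band lowest, which is what makes the plot useful for particle identification.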
k-Nearest Neighbors (k-NN)
The k-nearest neighbors algorithm (k-NN) is a non-parametric classification method. For a classification problem, the input consists of the k closest training examples in the data set, and the output is a class membership. An object is classified by a plurality vote of its neighbors: it is assigned to the class most common among its k nearest neighbors. If k = 1, the object is simply assigned to the class of its single nearest neighbor.
More information: https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
Random Forests
Random forests, or random decision forests, are an ensemble learning method that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by the most trees. Random forests correct for individual decision trees' tendency to overfit their training set.
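A hedged sketch of the same idea with scikit-learn, on a synthetic four-class problem (a stand-in for the four particle species, not the project's real features):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the detector features: 6 inputs, 4 classes.
X, y = make_classification(n_samples=600, n_features=6, n_informative=4,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample with random feature subsets;
# the forest predicts the class selected by the most trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
acc = forest.score(X_te, y_te)
print(f"random forest test accuracy: {acc:.2f}")
```

Averaging over many decorrelated trees is what keeps the ensemble from overfitting the way a single deep tree would.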
More information: https://en.wikipedia.org/wiki/Random_forest
Multilayer Perceptron (MLP)
A multilayer perceptron (MLP) is a class of feedforward artificial neural network (ANN). An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. MLPs are trained with a supervised learning technique called backpropagation. The ReLU activation function was used in this project.
More information: https://en.wikipedia.org/wiki/Multilayer_perceptron