Inspiration

An image can be described using color and texture features derived from its constituent pixels. Since each image has its own set of features, these features can be used to distinguish between images, enabling both classification and similarity computation. In this phase of the project, we use Personalized PageRank, Support Vector Machines, Decision Trees, and Locality Sensitive Hashing to perform classification and similarity computation on these features. We also experiment with a clustering algorithm and use it for classification. Lastly, we built a relevance feedback system that reorders results according to the user's selection of relevant and irrelevant images.

What it does

PageRank (PR): PageRank is an algorithm used to rank the nodes in a directed graph. It assigns an importance score to each node depending on the number of links the node has as well as the quality of the nodes it is connected to. If a node has many incoming links and is associated with many other important nodes, it is assigned a higher rank, indicating that it is an important node in the graph.

o Personalized PageRank (PPR): Personalized PageRank is an extension of the PageRank algorithm wherein, instead of restarting from a random node in the graph, restarts are only allowed from a given predetermined set of nodes. This biases and localizes the random walk towards the given set of nodes.

o Locality Sensitive Hashing (LSH): LSH is a hashing algorithm. It reduces the search space by reducing the dimensionality while maintaining the local relations of the original data, thereby assisting in identifying approximate nearest neighbors. In essence, it reduces the number of comparisons needed to identify data points that are similar to each other in the given feature space. Consider two points, x and y. For a locality-sensitive hash function H:

▪ The probability that H(x) = H(y) is high if x and y are close to each other in the feature space.
▪ The probability that H(x) = H(y) is low if x and y are far from each other in the feature space.

o K-Means Clustering: K-Means clustering is an unsupervised learning technique. Given the number of clusters K, K-Means finds the K clusters such that the error (the sum of distances of points to their respective cluster centers) is minimized. We used 10e4 as the threshold to stop the iteration: the program terminates and returns the cluster centers when the error falls below this threshold. Centers are initialized randomly based on the data distribution, which can lead to local minima, so the centers found in each run may differ. To address this, we ran K-Means 100 times and return the centers that give the least error.
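The restart strategy above can be sketched as follows. This is an illustrative sketch, not the project code; the function names (`kmeans`, `best_of_n_runs`) and the small default tolerance are our own choices:

```python
import numpy as np

def kmeans(X, k, threshold=1e-4, max_iter=300, rng=None):
    """One K-Means run: random initialization from the data, iterating
    until the drop in total error falls below the threshold."""
    rng = np.random.default_rng(rng)
    centers = X[rng.choice(len(X), k, replace=False)]
    prev_error = np.inf
    for _ in range(max_iter):
        # assign each point to its nearest center
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        error = dists[np.arange(len(X)), labels].sum()
        if prev_error - error < threshold:
            break
        prev_error = error
        # move each center to the mean of its assigned points
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels, error

def best_of_n_runs(X, k, runs=100):
    """Re-run K-Means and keep the centers with the least error,
    mitigating bad random initializations (local minima)."""
    return min((kmeans(X, k, rng=r) for r in range(runs)),
               key=lambda result: result[2])
```

Because each run only reaches a local minimum, keeping the lowest-error result across runs approximates the globally best clustering.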

o Support Vector Machine (SVM): SVM is a supervised learning classifier that tries to find an optimal hyperplane dividing two classes. It is a large-margin classifier that tries to maximise the margin between the two classes. The "support vectors" it stands on are the closest points of the two classes, which the algorithm uses to calculate the optimal decision boundary. We train a linear SVM that calibrates its weights, initialized as zero, to find this decision boundary. Classification of an image is done through the dot product of these weights with the feature vector of the image: if the dot product is greater than 1, the image is classified with the positive label, whatever the positive label was set as.

o Decision Tree: We implement a standard decision tree for two-class (binary) classification of dorsal and palmar images. The Gini index and information gain are calculated at every level for each of the remaining features to further fit the data to the corresponding labels. We use the Gini index to select the most discriminative feature for training. Information gain can be used to prune the tree if we are not satisfied with the value added by a new level.
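As a concrete reference, a split-quality computation could look like the sketch below. Note this uses the textbook Gini impurity (1 − Σ p² over class proportions) rather than the A/(A + B) form described later in this writeup; the function names are our own:

```python
import numpy as np

def gini(labels):
    """Standard Gini impurity of a label array: 1 - sum_c p_c^2."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - float((p ** 2).sum())

def split_impurity(feature_values, labels, threshold):
    """Size-weighted impurity of splitting on feature <= threshold;
    the lowest value marks the most discriminative split."""
    mask = feature_values <= threshold
    n = len(labels)
    return (mask.sum() / n) * gini(labels[mask]) + \
           ((~mask).sum() / n) * gini(labels[~mask])
```

A perfect dorsal/palmar split yields impurity 0, while a split that leaves both classes evenly mixed scores 0.5; the tree greedily picks the feature and threshold with the lowest value at each node.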

How I built it

Description of the Proposed Solution/Implementation

● Task 1:

○ Inputs:
■ Number of latent semantics to be considered, k
■ Folder containing labeled images
■ Folder containing unlabeled images

○ Initially, we collect the dorsal images and the palmar images from the labeled folder. We apply the Color Moments feature extractor (we also tried Histogram of Oriented Gradients) to extract the features. We then perform SVD on these feature vectors to get k latent semantics.
○ In doing so, we get the respective V matrix (latent semantics in the feature space), which we save.
○ At the same time, we take the average of the U matrix as well. Intuitively, this treats the dorsal and palmar images as their respective clusters; taking the mean of the respective U matrices gives us the respective cluster centers.
○ We then calculate the average intra-cluster distance for each cluster and store this too.
○ Once we get the input test image, its feature vector is converted to the latent semantic space by taking a dot product with the V matrix of the dorsal space and the V matrix of the palmar space.
○ This gives us the representation of the input test image in the new vector spaces.
○ To perform classification, we find the distance between the input test image in the new latent semantic space and the centroid in that space, and divide it by the average intra-cluster distance of that set of labels.
○ The smaller this value, the closer the point is to that distribution. The point is assigned the class label of whichever distribution it is closest to.

○ Outputs:
■ Class labels of the test images as estimated by the model
■ Classification accuracy
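The Task 1 pipeline can be sketched as below: fit one latent space per label, then classify by normalized distance to each centroid. This is an illustrative sketch under our own naming (`fit_label_space`, `classify`), not the project code:

```python
import numpy as np

def fit_label_space(features, k):
    """SVD on one label's feature matrix: keep the top-k latent
    semantics (V), the centroid of the projected images, and the
    average intra-cluster distance used for normalization."""
    U, S, Vt = np.linalg.svd(features, full_matrices=False)
    V = Vt[:k].T                       # latent semantics in feature space
    projected = features @ V
    centroid = projected.mean(axis=0)
    avg_dist = np.linalg.norm(projected - centroid, axis=1).mean()
    return V, centroid, avg_dist

def classify(test_vec, spaces):
    """Project the test image into each label's latent space and pick
    the label with the smallest normalized distance to the centroid."""
    def score(space):
        V, centroid, avg_dist = space
        return np.linalg.norm(test_vec @ V - centroid) / avg_dist
    return min(spaces, key=lambda label: score(spaces[label]))
```

Here `spaces` would map each label ("dorsal", "palmar") to the tuple returned by `fit_label_space`, and the normalization by `avg_dist` makes the two distances comparable across distributions.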

● Task 2:

○ Inputs:
■ Folder containing labeled images
■ Folder containing unlabeled images

■ Number of clusters, c
○ Initially, we collect the dorsal images and the palmar images from the labeled folder. We apply the Color Moments feature extractor (we also tried Histogram of Oriented Gradients) to extract the features.
○ The dimensionality reduction technique SVD is used to reduce the feature size and retain only the most relevant features.
○ We use the K-Means clustering technique to get the clusters. Cluster centers are updated as described in the introduction section.
○ Based on the input, c clusters of dorsal images and c clusters of palmar images are created using the labeled set of images in the folder. The labels are retrieved from the metadata of the images.
○ The new input test image is mapped into the space, and a voting method is used to select which cluster the test image is closest to.
○ Here, we compare against the c/2 closest clusters and vote to find out which label has the majority. The input test image is assigned that majority label.

○ Output:
■ The labels of the test images as classified by the model
■ Classification accuracy
■ Visualization of the clusters
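The c/2-nearest-cluster vote described above can be sketched as follows; `vote_label` is a hypothetical helper of our own, assuming the 2c cluster centers and their dorsal/palmar labels have already been computed by K-Means:

```python
import numpy as np
from collections import Counter

def vote_label(test_vec, centers, center_labels, c):
    """Rank all 2c cluster centers by distance to the test image and
    take a majority vote among the c/2 closest (at least one)."""
    dists = np.linalg.norm(np.asarray(centers) - test_vec, axis=1)
    closest = np.argsort(dists)[:max(1, c // 2)]
    return Counter(center_labels[i] for i in closest).most_common(1)[0][0]
```

Voting over several nearby clusters rather than just the single nearest one makes the label assignment more robust to a stray cluster center.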

● Task 3:

○ Inputs:
■ The number of edges that should be present from each image, k
■ The number of similar images to be visualized, K
■ Input folder containing images
■ Image ids for restarts

○ The features are extracted using LBP, after which the similarity matrix is created for the set of images in the folder using the cosine similarity metric.
○ For each image, we select the top k closest images from its similarity ordering, and those k become the outgoing edges for that image. This gives an n×n matrix, which we treat as a graph.
○ The columns of this graph are normalized so that they can be interpreted as probabilities (initially the k outgoing probabilities in each column are equal, and all other values are zero).
○ The restart vector is initialized as all zeros; only the restart nodes are given the value 1, and the vector is then normalized.
○ There are two methods to compute the steady-state values. The first is the iterative method, which is run until the convergence criterion is met or the computing constraint is exhausted; the steady-state probability vector is updated as Π ← (1 − β) T Π + β s, where T is the column-normalized transition matrix, s is the restart vector, and β is the restart probability. The second is the inverse method, which is expensive depending on the size of the matrix: Π = [I − (1 − β) T]⁻¹ β s.

○ Output:
■ Top K most dominant images
■ Visualization of the results and their scores
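Both computation methods for the steady state can be sketched directly from the formulas above. This is an illustrative sketch, not the project code; the function names and the β = 0.15 default are our own assumptions:

```python
import numpy as np

def personalized_pagerank(W, restart_idx, beta=0.15, tol=1e-9, max_iter=1000):
    """Iterative method. W: column-normalized n x n transition matrix
    (k equal outgoing probabilities per column). beta: restart
    probability. Iterates Pi <- (1-beta) W Pi + beta s to convergence."""
    n = W.shape[0]
    s = np.zeros(n)
    s[restart_idx] = 1.0 / len(restart_idx)     # normalized restart vector
    pi = s.copy()
    for _ in range(max_iter):
        new_pi = (1 - beta) * W @ pi + beta * s
        if np.abs(new_pi - pi).sum() < tol:
            break
        pi = new_pi
    return pi

def personalized_pagerank_closed(W, restart_idx, beta=0.15):
    """Inverse method: Pi = [I - (1-beta) W]^-1 beta s. The matrix
    inversion makes this expensive for large n."""
    n = W.shape[0]
    s = np.zeros(n)
    s[restart_idx] = 1.0 / len(restart_idx)
    return np.linalg.inv(np.eye(n) - (1 - beta) * W) @ (beta * s)
```

On small graphs the two methods agree to numerical precision, which is a useful sanity check before switching to the iterative method for larger image sets.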

● Task 4:

○ Inputs:
■ Classifier model
■ Folder containing labeled images
■ Folder containing unlabeled images

○ Decision Tree Classifier:
■ The decision tree is trained on the entire dataset by considering the feature with the maximum Gini index at each node.
■ The Gini index is given by Gini = A/(A + B), where A is the number of samples that can be correctly classified by a particular feature. It is therefore a value between zero and one, and it is a commonly accepted measure for deciding a split when training a decision tree, as it describes the imbalance in the split. Higher imbalance is favorable, as we want to separate the classes as cleanly as possible.
■ Information gain is another metric that gives us the increase in certainty obtained by using a particular feature for the split. It essentially tells us how informative a potential split would be if chosen.
○ PPR Classifier:
■ This task builds on Task 3, where the Personalized PageRank algorithm is used to get images similar to given images. We extend the algorithm to accommodate one new (unlabeled) image along with the provided labeled images, to find out which of the known nodes are the outgoing nodes for the unlabeled node.
■ Each image in the unlabeled set is added to the list of labeled images and, after calculating the feature vectors and their similarity matrix, we run Personalized PageRank with only the unlabeled image as the restart node. This gives greater probability to edges originating from the node in question.
■ From these, we get the set of nodes most relevant to the unlabeled node. We perform majority voting among the labels of these k images and predict accordingly.
○ SVM Classifier:

■ A linear SVM is implemented. The idea is to have a line (or hyperplane in higher dimensions) wᵀx + b = 0 as the decision boundary separating the two classes, such that the closest points of either class lie on the gutters wᵀx + b = ±1. This ensures that the classifier finds a decision boundary that maximises the margin between the two classes.
■ Some corners are cut for quick training. The bias is effectively ignored to save training time, and training runs for a fixed number of epochs instead of until convergence.
■ As for the training itself, gradient descent is used to update the weights. The update depends on the prediction, which is calculated as the dot product of the weight vector and the image vector (there is a weight for every feature, so for the 1728 features generated for an image through color moments, there are 1728 weights). If the prediction value is greater than 1, the point is correctly classified and the update is w = w − α·2λw, where α is the learning rate and λ decays the learning rate with the value 1/current_epoch. If the prediction value is less than 1, the point is misclassified, so the update is w = w + α(xy − 2λw), where x is the feature vector and y is the label of the current image/point.
■ We train the classifier for 200 epochs with a learning rate of 0.01, initializing the weights as zero.
■ After training, for each test image the prediction value is generated as the dot product of the trained weights and the image's feature vector. If the prediction value is greater than 1, the image is classified with the positive label (Dorsal is chosen as the positive label); otherwise it is classified with the negative label (Palmar).

○ Output:
■ Class labels of the input test images
■ Test accuracy of the given model
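The SVM update rules above can be sketched as follows. This is an illustrative sketch, not the project code: it folds the label into the margin condition (y·wᵀx ≥ 1, the usual hinge-loss form of the "prediction greater than 1" test), and at test time it thresholds the dot product at zero rather than one:

```python
import numpy as np

def train_linear_svm(X, y, epochs=200, alpha=0.01):
    """Gradient-descent training as described: weights start at zero,
    no bias term, lambda decays as 1/current_epoch. Labels y must be
    +1 (Dorsal) or -1 (Palmar)."""
    w = np.zeros(X.shape[1])
    for epoch in range(1, epochs + 1):
        lam = 1.0 / epoch
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) >= 1:          # correctly classified, outside margin
                w = w - alpha * (2 * lam * w)
            else:                             # inside the margin / misclassified
                w = w + alpha * (x_i * y_i - 2 * lam * w)
    return w

def predict(w, X):
    """Sign of the dot product decides the label (+1 = Dorsal)."""
    return np.where(X @ w > 0, 1, -1)
```

Because the bias is omitted, this sketch can only learn boundaries through the origin, which matches the corner-cutting described above.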

● Task 5:

○ Inputs:
■ The number of layers, L
■ The number of hashes per layer, k
■ Number of similar images to be outputted, t

○ We implemented a version of LSH inspired by the implementation suggested in the class slides.
■ Hash Functions: Each hash function maps the input onto a randomly created line segment. A segment is created by selecting two random k-dimensional points in the solution space (where k is the dimension of the input data); these points form a line, which is normalized to unit length (this will come in use later). The line is divided into 10 uniform segments based on the solution space, and the bin for a point is decided by a dot product. Let p be the point to be indexed, s the start of the line segment, and e the end of the unit line segment. Then the bin is decided by Bin = ⟨(p − s), (e − s)⟩. The combination of the hash function and the bin of the hash creates the hash value.
■ Layers: Each layer of LSH contains multiple hash functions. The number of layers and the number of hash functions per layer are decided by the user.
■ Similarity: The similarity of input points to a query is measured by looking at the hash of the query. For each hash function, the bins associated with the query hash are observed. Within each layer we take the intersection of those bins; this intersection constitutes the results for that layer. Across layers, the union of the per-layer results gives the overall results. If the number of results is too low, adjacent bins are included for each hash function; the number of adjacent bins is increased until we get enough results.

○ Output:
■ The top t similar images, where t is a user-provided variable.
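A simplified index along these lines could look like the sketch below. It is an assumption-laden sketch, not the project code: the per-layer key concatenates all k bin indices (which realizes the within-layer intersection), candidates are unioned across layers, and a `width` parameter stands in for the 10 uniform segments per line; the adjacent-bin expansion step is omitted for brevity:

```python
import numpy as np

class LSHIndex:
    """L layers, k random-line hash functions per layer. Two points
    collide in a layer only if they agree on every one of its k bins;
    candidates are unioned across layers."""

    def __init__(self, dim, L=4, k=3, width=0.5, rng=None):
        rng = np.random.default_rng(rng)
        # each hash: a random start point s and a unit direction (e - s)
        self.starts = rng.normal(size=(L, k, dim))
        dirs = rng.normal(size=(L, k, dim))
        self.dirs = dirs / np.linalg.norm(dirs, axis=2, keepdims=True)
        self.width = width                  # assumed bin width
        self.tables = [{} for _ in range(L)]

    def _key(self, layer, p):
        # project (p - s) onto each unit line and bucket by bin width
        proj = ((p - self.starts[layer]) * self.dirs[layer]).sum(axis=1)
        return tuple(np.floor(proj / self.width).astype(int).tolist())

    def add(self, idx, p):
        for layer, table in enumerate(self.tables):
            table.setdefault(self._key(layer, p), set()).add(idx)

    def query(self, q):
        # union over layers of the per-layer candidate sets
        out = set()
        for layer, table in enumerate(self.tables):
            out |= table.get(self._key(layer, q), set())
        return out
```

Raising k makes each layer stricter (fewer false positives), while raising L recovers recall by giving nearby points more chances to collide.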

● Task 6:

○ Input:
■ Relevance feedback system to be used.
○ From Task 5b, we have computed the 1000 images most similar to Hand_0000674.jpg. We reorder these results based on the feedback (we do not modify the query or compute a new set of similar images).
■ At each iteration, we display the top 20 similar images.
■ Through the command prompt, the user can select the images that are relevant and the images that are irrelevant. (The user does not have to label all 20 images displayed.)
■ This data is added to the data labelled in previous iterations, and a binary classifier is trained.

■ The classifier classifies all 1000 images and ranks each image. Images are sorted based on this ranking.
■ The top 20 images are displayed for the next iteration.
○ SVM-based feedback system:
■ As in Task 4, an SVM is trained on the labelled data (2 classes).
■ On the basis of the trained weights, predictions can be made for images: if the dot product of the trained weights and the image's feature vector is positive, the image can be classified as having a positive label (relevant).
■ In this case, however, instead of calculating class predictions, it is the prediction value (the dot product) that is saved.
■ The prediction value is used to rank the images, which are sorted in descending order by this value; the highest value indicates the image most similar to the query.
○ PPR-based feedback system:
■ This builds on Task 3, where the Personalized PageRank algorithm is used to get images similar to given images. We extend the algorithm to accommodate a dynamically chosen query: the images marked as relevant by the user further retrain the model, which converges to a new solution.
■ The top scorers are chosen for the results displayed subsequently. This method can be used iteratively, with approximate weights stored for long-term, efficient recommendation systems.
○ Decision Tree-based feedback system:
■ As in Task 4, a decision tree is trained on the labelled data (2 classes).
■ Using the trained decision tree, all 1000 samples are classified and grouped into relevant and irrelevant.
■ The distance between the previously labelled relevant images and the newly classified relevant images is measured, and the latter are sorted in ascending order of this distance.
■ Similar ordering is done for the irrelevant images (in descending order).
■ These two lists are concatenated and returned as the final sorted image list.
■ The top 20 of these images are displayed to the user for the next iteration.
○ Probabilistic relevance feedback system:
■ This feedback system is based on "Improving retrieval performance by relevance feedback" by Gerard Salton and Chris Buckley.
■ As the proposed method requires the data to be in binary format, we converted the data to binary in the following way:
● Normalize each feature value so that each feature ranges from 0 to 1.
● For each feature, compute the variance with feature 1.
○ If the variance is positive:
■ 1 is assigned to the feature value if the normalized value is greater than 0.5;
■ 0 is assigned otherwise.
○ If the variance is negative:

■ 1 is assigned to the feature value if the normalized value is less than 0.5;
■ 0 is assigned otherwise.
■ Similarity between the current image and the query is computed by the following equation, where r_i is the number of relevant images in which feature i's value is 1, R is the total number of relevant images, n_i − r_i is the number of irrelevant images in which feature i's value is 1, and N is the total number of relevant and irrelevant images:

sim(D, Q) = Σ_{i=1}^{t} d_i · log[ p_i (1 − u_i) / (u_i (1 − p_i)) ]

where

p_i = (r_i + 0.5) / (R + 1) and u_i = (n_i − r_i + 0.5) / (N − R + 1)

■ At each iteration, images are sorted based on this similarity score (in descending order) and the top 20 are displayed to the user.

○ Outputs:
■ A set of images reordered according to user feedback (relevance as selected by the user) is shown.
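The Salton-Buckley scoring above can be sketched directly from the formula; `probabilistic_scores` is a hypothetical helper of our own, assuming the binarized feature matrix and the feedback index lists are already available:

```python
import numpy as np

def probabilistic_scores(binary_feats, relevant_idx, irrelevant_idx):
    """Score every image as sum_i d_i * log[p_i(1-u_i) / (u_i(1-p_i))],
    with the +0.5-smoothed estimates of p_i and u_i from the feedback."""
    R = len(relevant_idx)
    N = R + len(irrelevant_idx)
    r = binary_feats[list(relevant_idx)].sum(axis=0)        # r_i
    n = r + binary_feats[list(irrelevant_idx)].sum(axis=0)  # n_i
    p = (r + 0.5) / (R + 1)
    u = (n - r + 0.5) / (N - R + 1)
    weights = np.log(p * (1 - u) / (u * (1 - p)))
    return binary_feats @ weights    # sort descending to rank the images
```

Features that appear mostly in relevant images get positive log-weights and pull an image up the ranking, while features concentrated in irrelevant images push it down; the +0.5 terms keep the estimates away from 0 and 1.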

