Finding Euclidean distance using Scikit-Learn in Python
In this article, we will learn to find the Euclidean distance using the Scikit-Learn library in Python. Euclidean distance measures the straight-line distance between two points in space and is widely used in machine learning algorithms, particularly clustering.
What is Euclidean Distance?
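The Euclidean distance formula calculates the straight-line distance between two points in n-dimensional space. For points $p = (p_1, \ldots, p_n)$ and $q = (q_1, \ldots, q_n)$:

$$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$$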
Scikit-Learn provides the euclidean_distances() function to calculate these distances efficiently for arrays of points.
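As a sanity check on the formula, the sketch below evaluates it by hand with NumPy for one pair of points taken from the examples in this article and compares the result with `euclidean_distances()`. The helper name `euclidean()` is hypothetical and exists only for this illustration; it is not part of scikit-learn.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

def euclidean(p, q):
    """Hypothetical helper: straight-line distance between two n-dimensional points."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Square the coordinate differences, sum them, then take the square root
    return np.sqrt(np.sum((p - q) ** 2))

p = [3.5, 1.5, 5]
q = [5, 4, 2]

manual = euclidean(p, q)
# euclidean_distances() expects 2-D inputs, so each point is wrapped in a list;
# [0, 0] picks the single entry of the resulting 1x1 matrix
library = euclidean_distances([p], [q])[0, 0]

print(manual)                        # ≈ 4.1833, i.e. sqrt(1.5² + 2.5² + 3²) = sqrt(17.5)
print(np.isclose(manual, library))   # True
```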
Method 1: Distance from Points to Origin
Calculate the Euclidean distance from multiple points to the origin (0, 0, 0):
```python
# importing euclidean_distances function from scikit-learn module
from sklearn.metrics.pairwise import euclidean_distances
# importing NumPy module with an alias name
import numpy as np

# input NumPy array with 3D points
input_array = np.array([[3.5, 1.5, 5],
                        [1, 4, 2],
                        [6, 3, 10]])

# calculating the euclidean distance between points and origin (0,0,0)
result_distance = euclidean_distances(input_array, [[0, 0, 0]])

# printing the resultant euclidean distance
print("Euclidean distances from origin:")
print(result_distance)
```

Output

```
Euclidean distances from origin:
[[ 6.28490254]
 [ 4.58257569]
 [12.04159458]]
```
Each row shows the distance from the corresponding point to the origin.
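Because the origin is the zero vector, each of these values is simply the length (L2 norm) of the corresponding row. The minimal cross-check below uses NumPy's `np.linalg.norm`, which the original example does not use; note that `euclidean_distances()` returns a (3, 1) column while `np.linalg.norm(..., axis=1)` returns a flat (3,) array.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

input_array = np.array([[3.5, 1.5, 5],
                        [1, 4, 2],
                        [6, 3, 10]])

# Distance to the origin equals the L2 norm of each row
norms = np.linalg.norm(input_array, axis=1)                    # shape (3,)
from_sklearn = euclidean_distances(input_array, [[0, 0, 0]])   # shape (3, 1)

print(norms.shape, from_sklearn.shape)   # (3,) (3, 1)
# ravel() flattens the (3, 1) column so the two results can be compared element-wise
print(np.allclose(norms, from_sklearn.ravel()))   # True
```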
Method 2: Distance Between Two Arrays
Calculate pairwise distances between points in two different arrays:
```python
# importing euclidean_distances function from scikit-learn module
from sklearn.metrics.pairwise import euclidean_distances
# importing numpy library with an alias name
import numpy as np

# input numpy array 1
input_array_1 = np.array([[3.5, 1.5, 5],
                          [1, 4, 2],
                          [6, 3, 10]])
# input numpy array 2
input_array_2 = np.array([[5, 4, 2],
                          [4, 3, 1],
                          [8.5, 2, 6]])

# calculating the euclidean distance between input_array_1 and input_array_2
result_distance = euclidean_distances(input_array_1, input_array_2)

# printing the resultant euclidean distance
print("Pairwise Euclidean distances:")
print(result_distance)
```

Output

```
Pairwise Euclidean distances:
[[4.18330013 4.30116263 5.12347538]
 [4.         3.31662479 8.7321246 ]
 [8.1240384  9.21954446 4.82182538]]
```
The output is a 3×3 matrix where element [i,j] represents the distance between point i from the first array and point j from the second array.
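To make the [i, j] indexing concrete, the sketch below rebuilds the same matrix with plain NumPy broadcasting: inserting a new axis pairs every row of the first array with every row of the second, and the formula is then applied along the coordinate axis. This is an illustrative alternative, not how scikit-learn computes it internally.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

input_array_1 = np.array([[3.5, 1.5, 5],
                          [1, 4, 2],
                          [6, 3, 10]])
input_array_2 = np.array([[5, 4, 2],
                          [4, 3, 1],
                          [8.5, 2, 6]])

# diff[i, j] = input_array_1[i] - input_array_2[j]; shape (3, 3, 3)
diff = input_array_1[:, np.newaxis, :] - input_array_2[np.newaxis, :, :]
# Square, sum over the coordinate axis, take the square root -> shape (3, 3)
pairwise = np.sqrt((diff ** 2).sum(axis=2))

print(np.allclose(pairwise, euclidean_distances(input_array_1, input_array_2)))  # True
# Element [1, 0]: [1, 4, 2] and [5, 4, 2] differ only in the first coordinate, by 4
print(pairwise[1, 0])  # 4.0
```

The broadcasting version allocates an (n, m, d) intermediate array, which grows quickly for large inputs. scikit-learn's documentation describes computing the distance through the expansion sqrt(x·x − 2x·y + y·y) instead, which avoids that intermediate but can lose a little precision.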
Understanding the Output Matrix
| Array 1 Point | Array 2 Point [5,4,2] | Array 2 Point [4,3,1] | Array 2 Point [8.5,2,6] |
|---|---|---|---|
| [3.5,1.5,5] | 4.183 | 4.301 | 5.123 |
| [1,4,2] | 4.000 | 3.317 | 8.732 |
| [6,3,10] | 8.124 | 9.220 | 4.822 |
Applications in Machine Learning
Euclidean distance is fundamental in clustering algorithms such as K-means. In each iteration, K-means assigns every data point to the cluster whose centroid is nearest in Euclidean distance, then recomputes each centroid as the mean of the points assigned to it. A smaller distance means two points are more similar, so nearby points tend to end up in the same cluster.
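The assignment step of K-means maps directly onto `euclidean_distances()`: compute the distance from every point to every centroid, then pick the nearest centroid per row. The minimal sketch below uses small hypothetical points and centroids chosen only for illustration; a full K-means run (for example with `sklearn.cluster.KMeans`) repeats this step together with the centroid update until the assignments stop changing.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

# Hypothetical 2-D data points and two hypothetical cluster centroids
points = np.array([[1.0, 2.0],
                   [1.5, 1.8],
                   [8.0, 8.0],
                   [9.0, 11.0]])
centroids = np.array([[1.0, 1.5],
                      [8.5, 9.5]])

# distances[i, k] = distance from point i to centroid k; shape (4, 2)
distances = euclidean_distances(points, centroids)

# Assignment step: each point joins the cluster of its nearest centroid
labels = distances.argmin(axis=1)
print(labels)  # [0 0 1 1]
```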
Conclusion
Scikit-Learn's euclidean_distances() function efficiently computes the distances between the rows of two arrays and returns them as a matrix. Pass a single reference point, such as the origin, to get one distance per point, or pass two arrays of points to get the full pairwise distance matrix used in machine learning tasks such as clustering.
