- The distance of any point A from itself is zero, i.e. dist(A,A) = 0.
- The distance from A to B is the same as the distance from B to A, i.e. dist(A,B) = dist(B,A) (the symmetry condition).
As usual, it is easiest to visualise this in two dimensions.
Figure Nr.04 The Triangle Inequality
The equality only occurs if Z is the same point as A or B or is on the direct route between them. There are many possible distance measures, but the most popular is almost certainly the Euclidean Distance (Figure Nr.05). This measure is named after the Greek Mathematician Euclid of Alexandria, who lived around 300 bc and is celebrated as the founder of geometry.
We will start by illustrating the formula for Euclidean distance in two dimensions. If we denote an instance in the training set by (a1, a2) and the unseen instance by (b1, b2) the length of the straight line joining the points is (by Pythagoras Theorem)
If there are two points (a1, a2, a3) and (b1, b2, b3) in a three-dimensional space the corresponding formula is
The formula for Euclidean distance between points (a1, a2, . . . , an) and (b1, b2, . . . , bn) in n-dimensional space is a generalisation of these two results. The Euclidean distance is given by the formula
Another measure sometimes used is called Manhattan Distance or City Block Distance. The analogy is with travelling around a city such as Manhattan, where you cannot (usually) go straight from one place to another but only by moving along streets aligned horizontally and vertically.
Figure Nr.05 Example of Euclidean Distance
The City Block distance between the points (4, 2) and (12, 9) in Figure Nr.06 is
(12 − 4) + (9 − 2) = 8 + 7 = 15.
Figure Nr.06 Example of City Block Distance
A third possibility is the maximum dimension distance. This is the largest absolute difference between any pair of corresponding attribute values. (The absolute difference is the difference converted to a positive number if it is negative.) For example the maximum dimension distance between instances
is 12.4 − (−7.1) = 19.5
For many applications, Euclidean distance seems the most natural way of measuring the distance between two instances.
Related Posts :
Nearest Neighbour Classification
Nearest Neighbour: Normalisation
Data Classification
Nearest Neighbour Classification
Nearest Neighbour: Normalisation
Data Classification