Featured

Nearest Neighbour : Distance Measures

There are many possible ways of measuring the distance between two instances with n attribute values, or equivalently between two points in n-dimensional space. We usually impose three requirements on any distance measure we use. We will use the notation dist(X, Y ) to denote the distance between two points X and Y .
  1. The distance of any point A from itself is zero, i.e. dist(A,A) = 0. 
  2. The distance from A to B is the same as the distance from B to A, i.e. dist(A,B) = dist(B,A) (the symmetry condition).
The third condition is called the triangle inequality (Figure Nr.04). It corresponds to the intuitive idea that "the shortest distance between any two points is a straight line". The condition says that for any points A, B and Z: dist(A,B) ≤ dist(A,Z) + dist(Z,B).
As usual, it is easiest to visualise this in two dimensions.
Triangle Inequality
Figure Nr.04 The Triangle Inequality

The equality only occurs if Z is the same point as A or B or is on the direct route between them. There are many possible distance measures, but the most popular is almost certainly the Euclidean Distance (Figure Nr.05). This measure is named after the Greek Mathematician Euclid of Alexandria, who lived around 300 bc and is celebrated as the founder of geometry.
We will start by illustrating the formula for Euclidean distance in two dimensions. If we denote an instance in the training set by (a1, a2) and the unseen instance by (b1, b2) the length of the straight line joining the points is (by Pythagoras Theorem)
nearest-neighbour4

If there are two points (a1, a2, a3) and (b1, b2, b3) in a three-dimensional space the corresponding formula is


The formula for Euclidean distance between points (a1, a2, . . . , an) and (b1, b2, . . . , bn) in n-dimensional space is a generalisation of these two results. The Euclidean distance is given by the formula



Another measure sometimes used is called Manhattan Distance or City Block Distance. The analogy is with travelling around a city such as Manhattan, where you cannot (usually) go straight from one place to another but only by moving along streets aligned horizontally and vertically.

Figure Nr.05 Example of Euclidean Distance

The City Block distance between the points (4, 2) and (12, 9) in Figure Nr.06 is
(12 − 4) + (9 − 2) = 8 + 7 = 15.

Figure Nr.06 Example of City Block Distance

A third possibility is the maximum dimension distance. This is the largest absolute difference between any pair of corresponding attribute values. (The absolute difference is the difference converted to a positive number if it is negative.) For example the maximum dimension distance between instances

is 12.4 − (−7.1) = 19.5
For many applications, Euclidean distance seems the most natural way of measuring the distance between two instances.



www.CodeNirvana.in

Copyright © Computer Science | Blogger Templates | Designed By Code Nirvana