Home Algorithms Similarity Canberra distance 06 | 12 | 2019
Feed Display
 heise online News Nachrichten nicht nur aus der Welt der Computer Latest News
Popular
 Canberra distance   Algorithms - Similarity
Written by Jan Schulz
Thursday, 13 September 2007 08:21 # Canberra distance

## Objective

The Canberra distance is a metric function often used for data scattered around an origin. It was introduced in 1966 (Lance & Williams 1966) and is today mainly used in the form of 1967 (Lance & Williams 1967).

## Equation

The Canberra metric is similar to the Manhattan distance (which itself is a special form of the Minkowski distance). The distinction is that the absolute difference between the variables of the two objects is divided by the sum of the absolute variable values prior to summing. The generalised equation is given in the form: This is a slightly modified form compared to the original form given by Lance & Williams (1966) and was suggested by Adkins (reference in Lance & Williams 1967). In the equation dCAD is the Canberra distance between the two objects i and j, k is the index of a variable and n is the total number of variables y.

## Synonyms

No common synonyms are established for the Canberra metric.

## Usage

In the original form of the Canberra metric data must not be signed. The modified form according to Adkins (in Lance & Williams 1967) has the property that the result becomes unity when the variables are of opposite sign. It is useful in the special case where signs represent differences in kind rather than in degree (Lance & Williams 1967). Anyhow, it is mainly used for values > 0. This metric is easily biased for measures around the origin and very sensitive for values close to 0, where it is more sensitive to proportional than to absolute differences (Lance & Williams 1967). This feature becomes more apparent in higher dimensional space, respectively an increasing number of variables. It is in turn less influenced than the Manhattan distance by variables with high values (Krebs 1989). As a very sensitive measurement it is applicable to identify deviations from normal readings (e.g. Emran & Ye 2001).

## Algorithm

The algorithm controls whether the data input matrix is rectangular or not. If not the function returns FALSE and a defined, but empty output matrix. When the matrix is rectangular the Canberra distance is calculated. Therefore the dimensions of the respective arrays of the output matrix and the titles for the rows and columns set. As the result is a square matrix, which is mirrored along the diagonal only values for one triangular part and the diagonal are computed. When errors occur during computation the function returns FALSE.

## Source

`Function dist_Canberra (InputMatrix : t2dVariantArrayDouble; Var OutputMatrix : t2dVariantArrayDouble) : Boolean;// The function dist_CalcCanberraMatrix calculates the Canberra dissimilarity// matrix between several cases, which are expected in the rows. The variables are// expected in the columns. Function returns FALSE if at least one cell can not be// calculated. The result matrix is returned in OutputMatrix.// (c) Dr. Jan Schulz, 25.April 2007; www.code10.infoVar  InputCols        : Integer;     InputRows        : Integer;     OutputMatrixSize : Integer;     RunnerY          : Integer;     RunnerX          : Integer;     Numerator        : Double;     Denominator      : Double;     i                : Integer;     FirstVal         : Double;     Summed           : Double;     NoOfValues       : Integer;     SecondVal        : Double;     Dissimilarity    : Double;Begin  // if one dimension is zero  If Not mtx_IsRectangular (InputMatrix, InputRows, InputCols) THen  Begin    // create an empty matrix, return FALSE and exit    mtx_Create (OutputMatrix, 1, 1, 0, 'Erroneous Canberra distance matrix');    dist_Canberra := False;    Exit;  end;  // create an output matrix of required size  mtx_Create (OutputMatrix, InputRows, InputRows, NaN, 'Canberra distance matrix');  //copy the respective titles  For RunnerY := Low (InputMatrix.RowTitle) to High (InputMatrix.RowTitle) do  Begin    // names for rows and columns are the same in this triangualary matrix    OutputMatrix.RowTitle [RunnerY] := InputMatrix.RowTitle [RunnerY];    OutputMatrix.ColTitle [RunnerY] := InputMatrix.RowTitle [RunnerY];  end;  // let's expect the best case ...  dist_Canberra := True;  // compare every object  For RunnerY := Low (OutputMatrix.Cells) to High (OutputMatrix.Cells) do  Begin    // with every other    For RunnerX := Low (OutputMatrix.Cells) to RunnerY do    Begin      NoOfValues  := 0;      Summed      := 0;      // include all variables in analysis      For i := 0 to High (InputMatrix.Cells ) do      Begin        FirstVal  := InputMatrix.Cells [RunnerX, i];        SecondVal := InputMatrix.Cells [RunnerY, i];        // no 'Not A Number' at all?        If Not (IsNAN (FirstVal) Or IsNAN (SecondVal)) THen        Begin          Numerator   := Abs (FirstVal - SecondVal);          Denominator := Abs (FirstVal) + Abs (SecondVal);          If Denominator <> 0 THen          Begin            // increase number of variable pairs and add the term            Inc (NoOfValues);            Summed := Summed + (Numerator / Denominator);          end          Else          Begin            // can not calculate as denominator is zero            dist_Canberra := False;          end;        end        Else        Begin          // can not calculate as invalid numbers were found          dist_Canberra := False;        end;      end;      // at least one valid pair of variables found?      If NoOfValues > 0 THen      Begin        Dissimilarity := Summed;      end      Else      Begin        Dissimilarity := NAN;      end;      // set the value on both sides of the diagonal or diagonal itself      OutputMatrix.Cells [RunnerX, RunnerY] := Dissimilarity;      OutputMatrix.Cells [RunnerY, RunnerX] := Dissimilarity;    end;  end;end; `

## Example

For a data matrix aInputMatrix of the type t2dVariantArrayDouble, populated with:

 Data Var1 Var2 Var3 Case1 1 1 1 Case2 1 1 0 Case3 2 2 2 Case4 10 10 10 Case5 11 11 11 Case6 10 5 0

the call of:

aBooleanVar := dist_Canberra (aInputMatrix, aOutputMatrix);

returns the respective Canberra distance matrix in aOutputMatrix:

 Canberra distance Case1 Case2 Case3 Case4 Case5 Case6 Case1 0 1.000 1.000 2.455 2.500 2.485 Case2 1.000 0 1.667 2.636 2.667 1.485 Case3 1.000 1.667 0 2.000 2.077 2.095 Case4 2.455 2.636 2.000 0 0.143 1.333 Case5 2.500 2.667 2.077 0.143 0 1.423 Case6 2.485 1.485 2.095 1.333 1.423 0

Although the Euclidean distance between the objects Case1 and Case3 is the same as between Case4 and Case5, the Canberra distance indicates a higher distance between the objects Case1 and Case3. This is due to the fact that the analysis is more sensitive for values closer to the origin. The distance between Case2 and Case6 is shorter, as the joint zeroes in variable 3 are omitted.

## Literature

Emran S.M., Ye N. (2001): Robustness of Canberra metric in computer intrusion detection. Proceedings of the 2001 IEEE, Workshop on Information Assurance and Security, United States Military Academy, West Point, New York, 5-6 June, 2001.

Krebs C.J. (1989): Ecological Methodology. Harper-Collins, New-York.

Lance G.N., Williams W.T. (1966): Computer programs for hierarchical polythetic classification (“similarity analysis”). Computer Journal 9:60-64.

Lance G.N., Williams W.T. (1967): Mixed-data classificatory programs, I.) Agglomerative Systems. Australian Computer Journal 1:15-20.

Last Updated on Friday, 18 March 2011 18:16 