Hamming distance
Objective
The Hamming distance (Hamming 1950) is a metric expressing the distance between two objects by the number of mismatches among their pairs of variables. It is mainly used for string and bitwise analyses, but can also be useful for numerical variables. Although the basic Hamming distance is a metric, the here presented version allows to define a threshold. Variables having an absolute difference below the threshold are considered as equal. Using values larger than 0 for this threshold the triangle inequality could be violated for some calculated distances. Using 0 as threshold it is the original and metric Hamming distance, thresholds below 0 are not defined.
Equation
In the equation dHAD is the Hamming distance between the objects i and j, k is the index of the respective variable reading y out of the total number of variables n. The Hamming distance itself gives the number of mismatches between the variables paired by k.
Synonyms
There are no common synonyms.
Usage
The Hamming distance is an important measurement for the detection of errors in information transmission (Hamming 1950). Beyond this application its usage is valuable for the investigation of e.g. ranked variables of entities coded by numerical tokens.
Algorithm
The algorithm controls whether the data input matrix is rectangular or not. If not the function returns FALSE and a defined, but empty output matrix. When the matrix is rectangular the Hamming distance is calculated. Therefore the dimensions of the respective arrays of the output matrix and the titles for the rows and columns set. As the result is a square matrix, which is mirrored along the diagonal only values for one triangular half and the diagonal are computed. When errors occur during computation the function returns FALSE.
Source
Function dist_Hamming (InputMatrix : t2dVariantArrayDouble; aMaxDiff: Double; Var OutputMatrix : t2dVariantArrayDouble) : Boolean; // The function CalcHammingDistanceMatrix calculates the Hamming distance between // several cases, which are expected in the rows. The variables are expected in the columns. // Function returns FALSE if at least one cell can not be calculated. The result // matrix is returned in OutputMatrix. // (c) Dr. Jan Schulz, 04.May 2008; www.code10.info Var OutputMatrixSize : Integer; InputCols : Integer; InputRows : Integer; RunnerY : Integer; RunnerX : Integer; i : Integer; UnequalItems : Integer; FirstVal : Double; SecondVal : Double; Begin // if one dimension is zero or matrix not rectangular If Not mtx_IsRectangular (InputMatrix, InputRows, InputCols) THen Begin //create an empty matrix, return FALSE and exit mtx_Create (OutputMatrix, 1, 1, 0, 'Erroneous Hamming distance matrix'); dist_Hamming := False; Exit; end;
// maximum allowed difference between two variables needs to be positive aMaxDiff := Abs (aMaxDiff);
// let's expect the best case ... dist_Hamming := True;
// define and set the row dimension of the result matrix OutputMatrixSize := High (InputMatrix.Cells) + 1; SetLength (OutputMatrix.Cells, OutputMatrixSize);
// create the column dimension of the array For RunnerY := Low (OutputMatrix.Cells) to High (Outputmatrix.Cells) do Begin SetLength (OutputMatrix.Cells [RunnerY], OutputMatrixSize); end;
// define title of matrix OutputMatrix.MatrixName := 'Hamming distance matrix';
// Set Row/Col-Title for the new matrix SetLength (OutputMatrix.RowTitle, OutputMatrixSize); SetLength (OutPutMatrix.ColTitle, OutPutMatrixSize); For RunnerY := Low (InputMatrix.RowTitle) to High (InputMatrix.RowTitle) do Begin // names for rows and columns are the same in this triangualary matrix OutputMatrix.RowTitle [RunnerY] := InputMatrix.RowTitle [RunnerY]; OutputMatrix.ColTitle [RunnerY] := InputMatrix.RowTitle [RunnerY]; end;
// compare every object For RunnerY := Low (OutputMatrix.Cells) to High (OutputMatrix.Cells) do Begin //with every other object For RunnerX := Low (OutputMatrix.Cells) to RunnerY do Begin UnequalItems := 0;
//include all variables in analysis For i := 0 to High (InputMatrix.Cells [0]) do Begin FirstVal := InputMatrix.Cells [RunnerX, i]; SecondVal := InputMatrix.Cells [RunnerY, i];
// are variables equal within a certain range? If Abs (FirstVal - SecondVal) > aMaxDiff THen Begin // calculation impossible as invalid numbers were found UnequalItems := UnequalItems + 1; end; end;
// set the calculated value on both sides of diagonal and diagonal itself OutputMatrix.Cells [RunnerX, RunnerY] := UnequalItems; OutputMatrix.Cells [RunnerY, RunnerX] := UnequalItems; end; end; end;
Example
For a data matrix aInputMatrix of the type t2dVariantArrayDouble, populated with:
Data
|
Var1
|
Var2
|
Var3
|
Case1
|
1
|
1
|
1
|
Case2
|
1
|
1
|
0
|
Case3
|
2
|
2
|
2
|
Case4
|
10
|
10
|
10
|
Case5
|
11
|
11
|
11
|
Case6
|
10
|
5
|
0
|
the call of:
aBooleanVar := dist_Hamming (aInputMatrix, 0, aOutputMatrix);
returns the respective matrix of the original Hamming distance in aOutputMatrix:
Hamming
distance
|
Case1
|
Case2
|
Case3
|
Case4
|
Case5
|
Case6
|
Case1
|
0
|
1
|
3
|
3
|
3
|
3
|
Case2
|
1
|
0
|
3
|
3
|
3
|
2
|
Case3
|
3
|
3
|
0
|
3
|
3
|
3
|
Case4
|
3
|
3
|
3
|
0
|
3
|
2
|
Case5
|
3
|
3
|
3
|
3
|
0
|
3
|
Case6
|
3
|
2
|
3
|
2
|
3
|
0
|
The Hamming distance simply counts the number of differences between the paired variables. Thus the distances are unaffected by the distance of the object from the origin.
Literature
Hamming R.W. (1950): Error detecting and error correcting codes. Bell System Technical Journal 26(2):147-160.
|