Bray-Curtis dissimilarity
Algorithms - Similarity
Written by Jan Schulz   
Wednesday, 12 September 2007 23:15


Objective

The non-metric Bray-Curtis dissimilarity (Bray & Curtis 1957) quantifies how different two objects are, based on the values of their variables (for example species abundances at two sites). It gives robust results for a wide range of applications and is one of the most commonly applied resemblance measures in ecology, environmental sciences and related fields.

 

Equation

Bray-Curtis is a modified Manhattan distance: the summed absolute differences between the variables of two objects are standardised by the summed values of these variables in both objects. The general equation of the Bray-Curtis dissimilarity is:

$$d_{BCD}(i,j) = \frac{\sum_{k=1}^{n} \left| y_{i,k} - y_{j,k} \right|}{\sum_{k=1}^{n} \left( y_{i,k} + y_{j,k} \right)}$$

In the equation, $d_{BCD}(i,j)$ is the Bray-Curtis dissimilarity between the objects $i$ and $j$, $k$ is the index of a variable, $n$ is the total number of variables, and $y_{i,k}$ is the value of variable $k$ for object $i$. The Bray-Curtis similarity $d_{BCS}$ can be calculated directly from the dissimilarity value:

$$d_{BCS}(i,j) = 1 - d_{BCD}(i,j)$$

In contrast to the dissimilarity, a $d_{BCS}$ value of 0 means a complete absence of relationship: for non-negative data, the two objects do not share any variable with a non-zero value.
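Both equations can be illustrated with a minimal Python sketch. Python is used only for illustration, and the function names are hypothetical, not part of the original Pascal library. The sketch assumes two equally long lists of non-negative values and, like the Pascal source further below, returns NaN when the denominator is zero.

```python
import math


def bray_curtis_dissimilarity(y_i, y_j):
    """d_BCD(i, j): Manhattan distance divided by the summed values of both objects."""
    if len(y_i) != len(y_j):
        raise ValueError("both objects need the same number of variables")
    numerator = sum(abs(a - b) for a, b in zip(y_i, y_j))   # sum_k |y_ik - y_jk|
    denominator = sum(a + b for a, b in zip(y_i, y_j))      # sum_k (y_ik + y_jk)
    if denominator == 0:
        return math.nan  # undefined: all variables of both objects are 0
    return numerator / denominator


def bray_curtis_similarity(y_i, y_j):
    """d_BCS(i, j) = 1 - d_BCD(i, j)."""
    return 1.0 - bray_curtis_dissimilarity(y_i, y_j)


print(bray_curtis_dissimilarity([1, 1, 1], [1, 1, 0]))  # 0.2
print(bray_curtis_similarity([1, 1, 1], [1, 1, 0]))     # 0.8
```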

 

Synonyms

Bray-Curtis similarity and dissimilarity values are often multiplied by 100 and given as percentages. The measure is closely related to the Sørensen distance: for presence/absence (0/1) data, the Bray-Curtis dissimilarity equals the Sørensen dissimilarity. Sometimes the term Czekanowski’s coefficient is erroneously used for Bray-Curtis indices.
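For presence/absence data, the equality with the Sørensen dissimilarity can be checked numerically. In this hypothetical Python sketch, the Sørensen dissimilarity is written in its usual count form (b + c) / (2a + b + c), where a counts variables present in both objects and b and c count those present in only one of them; the two 0/1 site vectors are made up for illustration.

```python
def bray_curtis_dissimilarity(y_i, y_j):
    # summed absolute differences / summed values of both objects
    numerator = sum(abs(a - b) for a, b in zip(y_i, y_j))
    denominator = sum(a + b for a, b in zip(y_i, y_j))
    return numerator / denominator


def sorensen_dissimilarity(x_i, x_j):
    # presence/absence counts: a = shared, b = only in i, c = only in j
    a = sum(1 for p, q in zip(x_i, x_j) if p and q)
    b = sum(1 for p, q in zip(x_i, x_j) if p and not q)
    c = sum(1 for p, q in zip(x_i, x_j) if q and not p)
    return (b + c) / (2 * a + b + c)


site_a = [1, 1, 0, 1, 0]
site_b = [1, 0, 1, 1, 0]
d = bray_curtis_dissimilarity(site_a, site_b)
print(d, sorensen_dissimilarity(site_a, site_b))  # both equal 2/6
print(f"{100 * d:.1f} %")                          # the same value as a percentage
```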

 

Usage

When the data cover a wide range of values, a transformation beforehand (for example a square-root or logarithmic transformation) might be useful. Because Bray-Curtis is not a metric (it does not always satisfy the triangle inequality), this must be taken into account when choosing a statistic to evaluate the output matrix. For data ≥ 0, the Bray-Curtis similarity lies within the range 0 to 1. A value of 1 indicates that the two data records match completely, i.e. they have identical values for all n variables.
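That Bray-Curtis is not a metric can be shown with three small objects. The following Python sketch uses hypothetical two-variable objects chosen for illustration; a metric would require the direct dissimilarity between the outer objects to be no larger than the detour through the middle one.

```python
def bray_curtis_dissimilarity(y_i, y_j):
    numerator = sum(abs(a - b) for a, b in zip(y_i, y_j))
    denominator = sum(a + b for a, b in zip(y_i, y_j))
    return numerator / denominator


# three hypothetical objects with two variables each
obj_a, obj_b, obj_c = [1, 0], [1, 1], [0, 1]
d_ab = bray_curtis_dissimilarity(obj_a, obj_b)  # 1/3
d_bc = bray_curtis_dissimilarity(obj_b, obj_c)  # 1/3
d_ac = bray_curtis_dissimilarity(obj_a, obj_c)  # 2/2 = 1
# the triangle inequality would require d_ac <= d_ab + d_bc
print(d_ac <= d_ab + d_bc)  # False
```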

Variables with high values influence the Bray-Curtis result more strongly than variables with low values, so they are the ones most likely to discriminate between objects. The measure is not affected by joint zeros, i.e. variables that are 0 in both objects (Field et al. 1982), but the result is undefined when all variables of both objects are 0. In this case the denominator becomes 0, and Clarke et al. (2006) suggest a zero-adjusted Bray-Curtis coefficient that adds a virtual dummy variable with the value 1 for all objects. In the numerator this variable subtracts to zero, and in the denominator it sums to 2:

$$d_{BCD}^{\mathrm{adj}}(i,j) = \frac{\sum_{k=1}^{n} \left| y_{i,k} - y_{j,k} \right|}{2 + \sum_{k=1}^{n} \left( y_{i,k} + y_{j,k} \right)}$$

As a result, two objects whose variables are all zero now have one variable (the dummy) in common, and a dissimilarity of 0 is returned.
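The behaviour with joint zeros and the zero-adjusted variant can be sketched in Python as follows. Both function names are hypothetical; the standard coefficient returns NaN for two all-zero objects, mirroring the Pascal source, while the zero-adjusted one adds the dummy variable's contribution of 2 to the denominator.

```python
import math


def bray_curtis_dissimilarity(y_i, y_j):
    numerator = sum(abs(a - b) for a, b in zip(y_i, y_j))
    denominator = sum(a + b for a, b in zip(y_i, y_j))
    return numerator / denominator if denominator != 0 else math.nan


def zero_adjusted_bray_curtis(y_i, y_j):
    # the dummy variable is 1 in both objects:
    # |1 - 1| = 0 leaves the numerator unchanged, 1 + 1 = 2 is added to the denominator
    numerator = sum(abs(a - b) for a, b in zip(y_i, y_j))
    denominator = 2 + sum(a + b for a, b in zip(y_i, y_j))
    return numerator / denominator


# a joint zero (last variable) does not change the standard coefficient
print(bray_curtis_dissimilarity([1, 1, 0], [1, 0, 0]) == bray_curtis_dissimilarity([1, 1], [1, 0]))  # True
print(bray_curtis_dissimilarity([0, 0, 0], [0, 0, 0]))  # nan: undefined
print(zero_adjusted_bray_curtis([0, 0, 0], [0, 0, 0]))  # 0.0
```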

 

Algorithm

The algorithm first checks whether the input data matrix is rectangular. If not, the function returns FALSE together with a defined but empty output matrix (a 1 × 1 matrix holding NaN). If the matrix is rectangular, the Bray-Curtis dissimilarity is calculated: the output matrix is dimensioned as a square matrix with one row and one column per case, and the row titles of the input matrix are copied to both the row and the column titles. Because the result is symmetric about the diagonal, only one triangle and the diagonal are computed, and each value is written to both mirrored cells. If an error occurs during the computation, the function returns FALSE: a variable holding NaN in either object is skipped, and a pair of objects with a zero denominator receives NaN.
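The control flow described above can be sketched in Python. This is not a port of the Pascal source: plain nested lists stand in for the T2dVariantArrayDouble type, row and column titles are omitted, and the hypothetical function bray_curtis_matrix returns the success flag and the matrix as a tuple instead of using a Var parameter.

```python
import math


def bray_curtis_matrix(data):
    """data: list of cases (rows), each a list of variable values (columns).

    Returns (ok, matrix); matrix is a symmetric list of lists.
    """
    # non-rectangular or empty input: return False and a 1 x 1 NaN matrix
    if not data or not data[0] or any(len(row) != len(data[0]) for row in data):
        return False, [[math.nan]]

    n = len(data)
    ok = True
    out = [[math.nan] * n for _ in range(n)]
    for y in range(n):
        for x in range(y + 1):  # lower triangle and diagonal only
            numerator = denominator = 0.0
            for a, b in zip(data[x], data[y]):
                if math.isnan(a) or math.isnan(b):
                    ok = False  # skip this variable, flag the result
                    continue
                numerator += abs(a - b)
                denominator += a + b
            if denominator != 0:
                d = numerator / denominator
            else:
                d = math.nan  # undefined for this pair
                ok = False
            out[x][y] = out[y][x] = d  # mirror across the diagonal
    return ok, out


cases = [[1, 1, 1], [1, 1, 0], [2, 2, 2], [10, 10, 10], [11, 11, 11], [10, 5, 0]]
ok, matrix = bray_curtis_matrix(cases)
print(ok)
for row in matrix:
    print(" ".join(f"{v:.3f}" for v in row))
```

Run on the six cases of the example further below, the printed rows should correspond to the dissimilarity matrix given there.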

To calculate the Bray-Curtis similarity, the Bray-Curtis dissimilarity matrix is computed first and then transformed cell by cell into $1 - d_{BCD}$.
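The transformation step can be sketched in Python on a small, made-up dissimilarity matrix. The helper name to_similarity is hypothetical; cells that could not be calculated stay NaN, as in the Pascal function, because 1 - NaN is NaN.

```python
import math


def to_similarity(dissimilarity):
    """Turn a Bray-Curtis dissimilarity matrix into a similarity matrix, cell by cell."""
    # 1 - NaN is NaN, so cells that could not be calculated stay undefined
    return [[1.0 - d for d in row] for row in dissimilarity]


d_matrix = [[0.0, 0.25, math.nan],
            [0.25, 0.0, 0.5],
            [math.nan, 0.5, 0.0]]
for row in to_similarity(d_matrix):
    print(row)
```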

 

Source

Function dist_BrayCurtisDissimilarity (InputMatrix : T2dVariantArrayDouble; Var OutputMatrix : T2dVariantArrayDouble) : Boolean;
// The function dist_BrayCurtisDissimilarity calculates the Bray-Curtis dissimilarity
// matrix between several cases, which are expected in the rows. The variables are
// expected in the columns. The function returns FALSE if at least one cell can not be
// calculated. The result matrix is returned in OutputMatrix.
// (c) Jan Schulz, 24.December 2005; www.code10.info
Var InputCols     : Integer;
    InputRows     : Integer;
    RunnerY       : Integer;
    RunnerX       : Integer;
    Numerator     : Double;
    Denominator   : Double;
    i             : Integer;
    FirstVal      : Double;
    SecondVal     : Double;
    Dissimilarity : Double;
Begin
  // if one dimension is zero or matrix is not rectangular quit
  If Not mtx_IsRectangular (InputMatrix, InputRows, InputCols) Then
  Begin
    // create an empty matrix, return FALSE and exit
    mtx_Create (OutputMatrix, 1, 1, NaN, 'Erroneous Bray-Curtis dissimilarity matrix');
    dist_BrayCurtisDissimilarity := False;
    Exit;
  end;

  // let's expect the best case ...
  dist_BrayCurtisDissimilarity := True;

  // create a square output matrix with one row and one column per case
  mtx_Create (OutputMatrix, InputRows, InputRows, NaN, 'Bray-Curtis dissimilarity matrix');

  // copy the respective titles
  For RunnerY := Low (InputMatrix.RowTitle) to High (InputMatrix.RowTitle) do
  Begin
    // names for rows and columns are the same in this symmetric matrix
    OutputMatrix.RowTitle [RunnerY] := InputMatrix.RowTitle [RunnerY];
    OutputMatrix.ColTitle [RunnerY] := InputMatrix.RowTitle [RunnerY];
  end;

  // compare every object ...
  For RunnerY := Low (OutputMatrix.Cells) to High (OutputMatrix.Cells) do
  Begin
    // ... with every other object up to the diagonal (lower triangle)
    For RunnerX := Low (OutputMatrix.Cells) to RunnerY do
    Begin
      Numerator   := 0;
      Denominator := 0;
      // use all variables of each object under comparison
      For i := 0 to High (InputMatrix.Cells [0]) do
      Begin
        FirstVal  := InputMatrix.Cells [RunnerX, i];
        SecondVal := InputMatrix.Cells [RunnerY, i];

        If Not (IsNaN (FirstVal) Or IsNaN (SecondVal)) Then
        Begin
          Numerator   := Numerator + Abs (FirstVal - SecondVal);
          Denominator := Denominator + (FirstVal + SecondVal);
        end
        Else
        Begin
          // missing value: skip this variable and flag the result
          dist_BrayCurtisDissimilarity := False;
        end;
      end;

      // can we calculate a Bray-Curtis dissimilarity value for these two objects?
      If Denominator <> 0 Then Dissimilarity := Numerator / Denominator
      Else
      Begin
        // can not calculate as denominator is zero
        Dissimilarity := NaN;
        dist_BrayCurtisDissimilarity := False;
      end;

      // set the value on both sides of the diagonal or on the diagonal itself
      OutputMatrix.Cells [RunnerX, RunnerY] := Dissimilarity;
      OutputMatrix.Cells [RunnerY, RunnerX] := Dissimilarity;
    end;
  end;
end;



Function dist_BrayCurtisSimilarity (InputMatrix : T2dVariantArrayDouble; Var OutputMatrix : T2dVariantArrayDouble) : Boolean;
// The function dist_BrayCurtisSimilarity calculates the Bray-Curtis similarity
// matrix between several cases, which are expected in the rows. The variables are
// expected in the columns. The function returns FALSE if at least one cell can not be
// calculated. The result matrix is returned in OutputMatrix. This function depends
// on the function dist_BrayCurtisDissimilarity.
// (c) Dr. Jan Schulz, 24.December 2005; www.code10.info
Var RunnerX : Integer;
    RunnerY : Integer;
Begin
  // calculate the Bray-Curtis dissimilarity matrix
  Result := dist_BrayCurtisDissimilarity (InputMatrix, OutputMatrix);

  // convert the dissimilarity matrix into a similarity matrix (NaN cells stay NaN)
  For RunnerY := Low (OutputMatrix.Cells) to High (OutputMatrix.Cells) do
  Begin
    For RunnerX := Low (OutputMatrix.Cells [RunnerY]) to High (OutputMatrix.Cells [RunnerY]) do
    Begin
      OutputMatrix.Cells [RunnerY, RunnerX] := 1 - OutputMatrix.Cells [RunnerY, RunnerX];
    end;
  end;

  If Result Then OutputMatrix.MatrixName := 'Bray-Curtis similarity matrix'
  Else OutputMatrix.MatrixName := 'Erroneous Bray-Curtis similarity matrix';
end;

Example

For a data matrix aInputMatrix of the type T2dVariantArrayDouble, populated with:

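| Data  | Var1 | Var2 | Var3 |
|-------|-----:|-----:|-----:|
| Case1 | 1    | 1    | 1    |
| Case2 | 1    | 1    | 0    |
| Case3 | 2    | 2    | 2    |
| Case4 | 10   | 10   | 10   |
| Case5 | 11   | 11   | 11   |
| Case6 | 10   | 5    | 0    |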

the call

aBooleanVar := dist_BrayCurtisDissimilarity (aInputMatrix, aOutputMatrix);

returns TRUE and the respective Bray-Curtis dissimilarity matrix in aOutputMatrix:

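| Bray-Curtis dissimilarity | Case1 | Case2 | Case3 | Case4 | Case5 | Case6 |
|-------|------:|------:|------:|------:|------:|------:|
| Case1 | 0     | 0.200 | 0.333 | 0.818 | 0.833 | 0.778 |
| Case2 | 0.200 | 0     | 0.500 | 0.875 | 0.886 | 0.765 |
| Case3 | 0.333 | 0.500 | 0     | 0.667 | 0.692 | 0.619 |
| Case4 | 0.818 | 0.875 | 0.667 | 0     | 0.048 | 0.333 |
| Case5 | 0.833 | 0.886 | 0.692 | 0.048 | 0     | 0.375 |
| Case6 | 0.778 | 0.765 | 0.619 | 0.333 | 0.375 | 0     |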

Although the Euclidean distance between the objects Case1 and Case3 is the same as between Case4 and Case5 (every variable differs by 1, giving √3 in both cases), the Bray-Curtis dissimilarity indicates a much closer relationship between Case4 and Case5 (0.048) than between Case1 and Case3 (0.333). This is because the summed differences are standardised by the summed values of both objects: the same absolute difference counts for less between objects with high values, so variables with higher values carry more weight. The measure is therefore very useful for analyses in which high joint presences are more important than sparse ones. This effect can be weakened by initial transformations.
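The comparison can be reproduced with a short Python sketch using the values of Case1, Case3, Case4 and Case5 from the example table. Both helper names are hypothetical and only serve the illustration.

```python
import math


def euclidean_distance(y_i, y_j):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_i, y_j)))


def bray_curtis_dissimilarity(y_i, y_j):
    numerator = sum(abs(a - b) for a, b in zip(y_i, y_j))
    denominator = sum(a + b for a, b in zip(y_i, y_j))
    return numerator / denominator


case1, case3 = [1, 1, 1], [2, 2, 2]
case4, case5 = [10, 10, 10], [11, 11, 11]

# identical absolute differences (1 per variable) ...
print(euclidean_distance(case1, case3), euclidean_distance(case4, case5))  # both sqrt(3)
# ... but very different differences relative to the summed values
print(bray_curtis_dissimilarity(case1, case3))  # 3 / 9
print(bray_curtis_dissimilarity(case4, case5))  # 3 / 63
```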

 

Literature

Bray J.R., Curtis J.T. (1957): An ordination of the upland forest communities of Southern Wisconsin. Ecological Monographs 27:325-349.

Clarke K.R., Somerfield P.J., Chapman M.G. (2006): On resemblance measures for ecological studies, including taxonomic dissimilarities and a zero-adjusted Bray-Curtis coefficient for denuded assemblages. Journal of Experimental Marine Biology and Ecology 330:55-80.

Field J.G., Clarke K.R., Warwick R.M. (1982): A practical strategy for analysing multispecies distribution patterns. Marine Ecology Progress Series 8:37-52.

Last Updated on Monday, 23 November 2015 17:21
 