|Algorithms - Similarity|
|Written by Jan Schulz|
|Wednesday, 12 September 2007 23:15|
The non-metric Bray-Curtis dissimilarity (Bray & Curtis 1957) delivers robust and reliable dissimilarity results for a wide range of applications. It is one of the most commonly applied measures for expressing relationships in ecology, environmental sciences and related fields.
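Bray-Curtis is a modified Manhattan measure, in which the summed differences between the variables are standardised by the summed variables of the objects. The general equation of the Bray-Curtis dissimilarity is:

$$d_{BCD}(i,j) = \frac{\sum_{k=1}^{n} \left| y_{i,k} - y_{j,k} \right|}{\sum_{k=1}^{n} \left( y_{i,k} + y_{j,k} \right)}$$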
In the equation, dBCD is the Bray-Curtis dissimilarity between the objects i and j, k is the index of a variable and n is the total number of variables y. The Bray-Curtis similarity dBCS is obtained from a slightly modified equation and can be calculated directly from the dissimilarity value:
dBCS = 1 - dBCD
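The two definitions above can be sketched in a few lines of Python. This is only a minimal illustration, not the library's Pascal implementation; the function names and the use of plain Python sequences are assumptions made for the example.

```python
from typing import Sequence


def bray_curtis_dissimilarity(y_i: Sequence[float], y_j: Sequence[float]) -> float:
    """Summed absolute differences divided by the summed values of both objects."""
    if len(y_i) != len(y_j):
        raise ValueError("both objects need the same number of variables")
    numerator = sum(abs(a - b) for a, b in zip(y_i, y_j))
    denominator = sum(a + b for a, b in zip(y_i, y_j))
    # A zero denominator (all values zero) raises ZeroDivisionError here.
    return numerator / denominator


def bray_curtis_similarity(y_i: Sequence[float], y_j: Sequence[float]) -> float:
    """dBCS = 1 - dBCD."""
    return 1.0 - bray_curtis_dissimilarity(y_i, y_j)


# Hypothetical objects with two variables each.
print(bray_curtis_dissimilarity([1, 2], [2, 3]))  # (1 + 1) / (3 + 5)
print(bray_curtis_similarity([1, 2], [2, 3]))
```

For the two hypothetical objects, the numerator is 2 and the denominator is 8, so the dissimilarity is 0.25 and the similarity 0.75.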
In contrast to the dissimilarity, a dBCS value of 0 indicates a complete absence of any relationship.
Bray-Curtis similarity and dissimilarity values are often multiplied by 100 and given as percentages. The Bray-Curtis dissimilarity is very similar in definition to the Sørensen distance. Sometimes the term Czekanowski’s coefficient is erroneously used for Bray-Curtis indices.
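The closeness to the Sørensen distance can be checked numerically: on presence/absence data (values of 0 and 1 only) the Bray-Curtis dissimilarity coincides with the Sørensen dissimilarity 1 - 2|A∩B| / (|A| + |B|). The following Python sketch uses hypothetical records and function names that are assumptions of this example.

```python
import math


def bray_curtis_dissimilarity(y_i, y_j):
    """Summed absolute differences divided by the summed values."""
    numerator = sum(abs(a - b) for a, b in zip(y_i, y_j))
    denominator = sum(a + b for a, b in zip(y_i, y_j))
    return numerator / denominator


def sorensen_dissimilarity(present_a, present_b):
    """1 - 2|A ∩ B| / (|A| + |B|) for two sets of present variables."""
    shared = len(present_a & present_b)
    return 1.0 - 2.0 * shared / (len(present_a) + len(present_b))


# Hypothetical presence/absence records over five variables.
site_a = [1, 1, 0, 1, 0]
site_b = [1, 0, 0, 1, 1]

# Indices of the variables that are present in each record.
present_a = {k for k, value in enumerate(site_a) if value}
present_b = {k for k, value in enumerate(site_b) if value}

bcd = bray_curtis_dissimilarity(site_a, site_b)
sor = sorensen_dissimilarity(present_a, present_b)
print(math.isclose(bcd, sor))       # both are 2/6 for these records
print(f"{100 * bcd:.1f} %")         # the same value expressed as a percentage
```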
When the data cover a wide range of values, it may be useful to transform them beforehand. Because Bray-Curtis is not a metric, this must be taken into account when choosing a statistic for evaluating the output matrix. When all data are ≥ 0, the Bray-Curtis similarity lies within the range of 0 to 1. A value of 1 indicates that the two data records match completely in the n-dimensional space.
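As an illustration of a prior transformation, the sketch below applies a square-root transformation before computing the dissimilarity. The choice of the square root, the data and the helper names are assumptions made for this example; the text does not prescribe a particular transformation.

```python
import math


def bray_curtis_dissimilarity(y_i, y_j):
    """Summed absolute differences divided by the summed values."""
    numerator = sum(abs(a - b) for a, b in zip(y_i, y_j))
    denominator = sum(a + b for a, b in zip(y_i, y_j))
    return numerator / denominator


def sqrt_transform(values):
    """Square-root transformation; only valid for non-negative data."""
    return [math.sqrt(v) for v in values]


# Hypothetical non-negative records covering a wide range of values.
record_a = [100.0, 1.0, 0.0]
record_b = [64.0, 4.0, 0.0]

raw = bray_curtis_dissimilarity(record_a, record_b)
transformed = bray_curtis_dissimilarity(
    sqrt_transform(record_a), sqrt_transform(record_b)
)
print(f"raw: {raw:.3f}, square-root transformed: {transformed:.3f}")
```

In the raw data the first variable contributes 36 of the 39 units in the numerator; after the transformation it contributes 2 of 3, so the large values dominate the result less strongly.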
Variables with higher values have a more dominant influence on the Bray-Curtis similarity, which implies that these variables are likely to discriminate between objects. The measure is not affected by joint zeros (Field et al. 1982), but the result is undefined when all variables of both objects are 0. In this case the denominator becomes 0, and Clarke et al. (2006) suggest using a zero-adjusted Bray-Curtis coefficient that includes a virtual dummy variable with a value of 1 for all objects. In the numerator this variable cancels to zero, and in the denominator it adds 2:

$$d_{BCD,adj}(i,j) = \frac{\sum_{k=1}^{n} \left| y_{i,k} - y_{j,k} \right|}{2 + \sum_{k=1}^{n} \left( y_{i,k} + y_{j,k} \right)}$$
The effect is that objects whose variables are entirely zero now have one variable in common, so a dissimilarity of zero is returned.
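A minimal Python sketch of the zero-adjusted coefficient described above follows. The function name is an assumption; the only change compared with the plain dissimilarity is the constant 2 contributed to the denominator by the dummy variable.

```python
def zero_adjusted_bray_curtis(y_i, y_j):
    """Bray-Curtis dissimilarity with a dummy variable of 1 for both objects."""
    # The dummy variable contributes |1 - 1| = 0 to the numerator ...
    numerator = sum(abs(a - b) for a, b in zip(y_i, y_j))
    # ... and 1 + 1 = 2 to the denominator.
    denominator = 2.0 + sum(a + b for a, b in zip(y_i, y_j))
    return numerator / denominator


# Two hypothetical empty records: the plain coefficient would divide by zero.
print(zero_adjusted_bray_curtis([0, 0, 0], [0, 0, 0]))

# An empty record against a populated one: 4 / (2 + 4).
print(zero_adjusted_bray_curtis([0, 0], [3, 1]))
```

Note that the adjustment changes all results slightly, not only those of empty records: the empty-versus-populated pair above yields 4/6 instead of the unadjusted value of 1.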
The algorithm checks whether the input data matrix is rectangular. If it is not, the function returns FALSE and a defined but empty output matrix. If the matrix is rectangular, the Bray-Curtis dissimilarity is calculated: the dimensions of the respective arrays of the output matrix are set, and the titles for the rows and columns are assigned. Because the result is a square matrix that is mirrored along the diagonal, only the values of one triangular part and the diagonal are computed. If an error occurs during computation, the function returns FALSE.
To calculate the Bray-Curtis similarity the Bray-Curtis dissimilarity matrix is computed first and thereafter transformed.
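The control flow described above can be sketched in Python as follows. This is not the library's Pascal code: nested lists stand in for the matrix type, rows are assumed to be objects and columns variables, row and column titles are omitted, and treating a zero denominator as a computation error (with an empty matrix returned) is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def is_rectangular(data: Sequence[Sequence[float]]) -> bool:
    """True if there is at least one row and all rows have the same length."""
    return len(data) > 0 and all(len(row) == len(data[0]) for row in data)


def dissimilarity_matrix(data: Sequence[Sequence[float]]) -> Tuple[bool, Matrix]:
    """Return (success flag, square dissimilarity matrix)."""
    if not is_rectangular(data):
        return False, []  # defined but empty output
    n = len(data)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # one triangle plus the diagonal
            num = sum(abs(a - b) for a, b in zip(data[i], data[j]))
            den = sum(a + b for a, b in zip(data[i], data[j]))
            if den == 0:
                return False, []  # assumption: undefined value counts as error
            out[i][j] = out[j][i] = num / den  # mirror along the diagonal
    return True, out


def similarity_matrix(data: Sequence[Sequence[float]]) -> Tuple[bool, Matrix]:
    """Compute the dissimilarity matrix first, then transform it with 1 - d."""
    ok, dissimilarity = dissimilarity_matrix(data)
    if not ok:
        return False, []
    return True, [[1.0 - value for value in row] for row in dissimilarity]


# Hypothetical input: three objects (rows) with two variables (columns).
ok, matrix = dissimilarity_matrix([[1, 2], [2, 3], [10, 20]])
print(ok, matrix[0][1], matrix[1][0])
```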
Function dist_BrayCurtisDissimilarity (InputMatrix : T2dVariantArrayDouble; Var OutputMatrix : T2dVariantArrayDouble) : Boolean;
For a data matrix aInputMatrix of the type t2dVariantArrayDouble, populated with:
aBooleanVar := dist_BrayCurtisDissimilarity (aInputMatrix, aOutputMatrix);
returns the respective Bray-Curtis dissimilarity matrix in aOutputMatrix:
Although the Euclidean distance between the objects Case1 and Case3 is the same as between Case4 and Case5, the Bray-Curtis dissimilarity indicates a closer relationship between the objects Case4 and Case5. This is because the analysis gives more weight to variables with higher values. The measure is therefore very useful in analyses where high joint presences are more important than sparse ones. This effect can be weakened by an initial transformation.
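The same kind of contrast can be reproduced with a small Python sketch. The values below are hypothetical and are not the data of the example matrix above; they are merely chosen so that two pairs of objects have identical Euclidean distances but different magnitudes.

```python
import math


def bray_curtis_dissimilarity(y_i, y_j):
    """Summed absolute differences divided by the summed values."""
    numerator = sum(abs(a - b) for a, b in zip(y_i, y_j))
    denominator = sum(a + b for a, b in zip(y_i, y_j))
    return numerator / denominator


# Hypothetical pairs: both differ by exactly 1 in each variable.
low_a, low_b = [1, 2], [2, 3]
high_a, high_b = [10, 20], [11, 21]

euclid_low = math.dist(low_a, low_b)      # math.dist requires Python 3.8+
euclid_high = math.dist(high_a, high_b)

bcd_low = bray_curtis_dissimilarity(low_a, low_b)      # 2 / 8
bcd_high = bray_curtis_dissimilarity(high_a, high_b)   # 2 / 62

print(euclid_low == euclid_high)   # identical Euclidean distances
print(bcd_low > bcd_high)          # the high-valued pair is less dissimilar
```

The absolute differences are the same in both pairs, but they are standardised by much larger sums in the high-valued pair, which therefore appears more closely related.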
Bray J.R., Curtis J.T. (1957): An ordination of the upland forest communities of Southern Wisconsin. Ecological Monographs 27:325-349.
Clarke K.R., Somerfield P.J., Chapman M.G. (2006): On resemblance measures for ecological studies, including taxonomic dissimilarities and a zero-adjusted Bray-Curtis coefficient for denuded assemblages. Journal of Experimental Marine Biology and Ecology 330:55-80.
Field J.G., Clarke K.R., Warwick R.M. (1982): A practical strategy for analysing multispecies distribution patterns. Marine Ecology Progress Series 8:37-52.
|Last Updated on Monday, 23 November 2015 17:21|