Home Algorithms Dot-plots Identity dot-plots 05 | 12 | 2023
Feed Display Latest News
Popular
 Identity dot-plots   Algorithms - Dot-plots
Written by Jan Schulz
Tuesday, 10 June 2008 11:23 # Identity dot-plots

For the further references, tables and images please refer to the 'Introduction to dot-plots' article on this site.

## Algorithm

A robust approach to enhance the method of the initial example is the use of a frame with a window size of more than one symbol (McLachlan 1971). Inside the sliding window pairwise matching characters are counted. This allows comparing substrings of defined lengths from the sequences (Figure 3), which act as templates. Figure 3: Example of dot plot creation with a window size of w = 4. In the window, two out of four symbols match (space and ‘E’) and result in a grey level of 50% for the respective substring comparison.

Formally expressed it means sequence T1 and T2 are data words from a finite alphabet. Length of T1 is m and length of T2 is n. Symbols within T1 are addressed by index i, in T2 by index j. The term T1[i] refers to the ith symbol in the sequence T1. Identity dot plots compare all possible substrings with the length of the window size w with each other. The number of comparable substrings of length w from T1 is k=m-w+1 and the respective number of substrings from T2 is l=n-w+1. Results of the comparisons are stored in a l×k matrix M. For every element of the matrix M(i, j), with 1 i l and 1 j k, we compare the substrings ST1,j and ST2,i. Substrings ST1,j and ST2,i are defined as the w consecutive characters from T2[i] to T2[i+w-1] and T1[j] to T1[j+w-1]. In the substrings the pairwise matching symbols are counted. This is similar to the calculation of a Hamming distance (Hamming 1950), except that matches instead of mismatches contribute to the result. The resulting M is a characteristic presentation of the recurrent patterns in the included sequences T1 and T2 with respect to window size w. This means it is unique, although larger patterns remain when w is modified. Applying the method again to the DNA sequence EU127468 (Table 1), the difference becomes obvious (Figure 2, lower panel).

## Source

`Type t2dArrayInt = Array of Array of Integer;                   Function CreateDot plotIdentity (aSequence1, aSequence2 : String; MatchWindowSize : Integer; Var ResultMatrix : t2dArrayInt) : Boolean;// The function CreateDot plotIdentity creates a matrix representing// the identity dot plot result for comparison among two sequences// with a given window size. (c) Dr. Jan Schulz, 21-25. May 2007// www.code10.infoConst aScore        = 1;Var AlignsSeq1      : Integer;    AlignsSeq2      : Integer;    aPosInSeq1      : Integer;    aPosInSeq2      : Integer;    aTempResult     : Integer;    aCounter        : Integer;Begin  // hoping for the best we expect no errors  Result := True;  //calculate number of aligns to render  AlignsSeq1 := Length (aSequence1) - MatchWindowSize + 1;  AlignsSeq2 := Length (aSequence2) - MatchWindowSize + 1;  // number of both aligns must be > 0  If (AlignsSeq1 <= 0) Or (AlignsSeq2 <= 0) THen  Begin    // Return empty but defined ResultMatrix, notify error & terminate    SetLength (ResultMatrix, 0, 0);    Result := False;    Exit;  end;  //allocate ResultMatrix  SetLength (Resultmatrix, AlignsSeq2, AlignsSeq2);  // in sequence 1 a Matchwindow will be applied back off to every position ...  aPosInSeq1 := AlignsSeq1;  Repeat    // same with sequence 2    aPosInSeq2 := AlignsSeq2;    Repeat      //prior to every new comparison reset the variables      aCounter    := MatchWindowSize - 1;      aTempResult := 0;      // compare all symbols within the Matchwindow      Repeat        // when two characters match they contribute to the results        If aSequence1 [aPosInSeq1 + aCounter] = aSequence2 [aPosInSeq2 + aCounter] THen        Begin          aTempResult := aTempResult + aScore;        end;        // decrease the number of remaining characters to compare        aCounter := aCounter - 1;      // and find out whether finished with this sliding window      Until aCounter < 0;      // after comparison store result in ResultMatrix      ResultMatrix [aPosInSeq2 - 1, aPosInSeq1-1] := aTempResult;      // decrease remaining comparisons for sequence 2 with the resp. pos in the first      aPosinSeq2 := aPosInSeq2 - 1;    Until aPosInSeq2 <= 0;    //decrease the number of comparisons to be made for sequence 1    aPosinSeq1 := aPosInSeq1 - 1;  //exit when finished  Until aPosInSeq1 <= 0;end; `

Last Updated on Friday, 18 March 2011 17:58 