Dot-plots with reverse analysis
For further references, tables and images please refer to the 'Introduction to dot-plots' article on this page.
Algorithm
To identify palindromic or reverted structures it is necessary to compare the source sequences T1 and T2 and in addition the complement strand of one source sequence with one of their reverted strands R1 and R2. In case of a genetic analysis R1 and R2 need to be the reverse complement of T1 and T2. Although reverse accessing of symbols for literal analysis could be performed directly on source sequence T, we introduce a method using R. The advantage is found in genetic analysis, where complementary bases are initially determined once and not each time the sliding window accesses a symbol in the substring.
As sequence length is unaffected by reversion the length of T1 and R1 is m and length of T2 and R2 is n. The number of pairwise matches at an alignment position (i, j) with a given window size w is evaluated between T1 and T2 as in identity dot plots. The number of possible alignments for T1 is k=m-w+1 and the respective number for T2 is l=n-w+1. Results are stored in a matrix M of size l×k. For every cell (i, j) of the matrix, with 1 ≤i ≤ l and 1 ≤ j ≤ k, substrings ST1,j and ST2,i are compared. Substrings ST1,j and ST2,i range from T1[j] to T1[j+w-1] and T2[i] to T2[i+w-1]. Within the window pairwise compared symbols at position p with 1 ≤ p ≤ w are addressed by T1[j+p-1] and T2[i+p-1]. Additionally, the number of matches between R1 and T2 is counted. For R1 alignments start from the rear end with an offset of w (Figure 5). Thus the symbol at position R1[m-j-w+p+1] is compared with T2[i+p-1] and T1[j+p-1] with R2[n-i-w+p+1] and pairwise matches for respective substrings are counted within the range for p. The opposite tails of R1 and T2 overlap with at least w symbols, comprising the outmost meaningful comparison for the respective window size .
|
Figure 5: Example of dot plot creation with a window size w=4 and identification of palindromic motifs. The two source strands are found in the middle and only strands next to each other are compared. In the window, two of the four symbols match when comparing one source sequence with the reverted sequence of the second. |
In M the higher value of the two evaluated substring comparisons T1/T2 and R1/T2 is stored. Although different symbols are evaluated match counts at (i, j) are equal between T1/R2 and R1/T2. Thus, it is just required to calculate the paired matches among substrings between the two source strands and one arbitrarily chosen source and the reverted of the other.
Source
Type t2dArrayInt = Array of Array of Integer;
Function CreateDot plotWithRev (aSequence1, aSequence2, aRevOfSeq1: String; MatchWindowSize : Integer; Var ResultMatrix : t2dArrayInt): Boolean; // The function CreateDot plotWithRev creates a matrix representing // the dot plot result of the comparison among two sequences with a // given window size including palindromic search. // (c) Dr. Jan Schulz, 10-14. December 2007, www.code10.info Const aScore = 1;
Var AlignsSeq1 : Integer; AlignsSeq2 : Integer; aPosInSeq1 : Integer; LengthSeq1 : Integer; LengthSeq2 : Integer; aPosInSeq2 : Integer; aScoreCounterSeq : Integer; aScoreCounterRev : Integer; aCounter : Integer; aNormPos1 : Integer; aNormPos2 : Integer; aRevPos1 : Integer; Begin // hoping for the best we expect no errors Result := True;
// get sequence lengths LengthSeq1 := Length (aSequence1); LengthSeq2 := Length (aSequence2);
//calculate number of aligns to render AlignsSeq1 := LengthSeq1 - MatchWindowSize + 1; AlignsSeq2 := LengthSeq2 - MatchWindowSize + 1;
// number of both aligns must be > 0 If (AlignsSeq1 <= 0) Or (AlignsSeq2 <= 0) THen Begin // Return empty but defined ResultMatrix, notify error & terminate SetLength (ResultMatrix, 0, 0); Result := False; Exit; end;
//allocate ResultMatrix SetLength (Resultmatrix, AlignsSeq2, AlignsSeq2);
// in sequence 1 a Matchwindow will be applied back off to every position ... aPosInSeq1 := AlignsSeq1; Repeat
// same with sequence 2 aPosInSeq2 := AlignsSeq2; Repeat
//prior to every new comparison reset the variables aCounter := MatchWindowSize-1; aScoreCounterSeq := 0; aScoreCounterRev := 0;
// compare all characters within the Matchwindow Repeat
// when two characters match they contribute to the results aNormPos1 := aPosInSeq1 + aCounter; aNormPos2 := aPosInSeq2 + aCounter; aRevPos1 := LengthSeq1 - (aPosInSeq1 + MatchWindowSize) + aCounter + 2;
If (aSequence1 [aNormPos1] = aSequence2 [aNormPos2]) THen aScoreCounterSeq := aScoreCounterSeq + aScore; If (aRevOfSeq1 [aRevPos1] = aSequence2 [aNormPos2]) THen aScoreCounterRev := aScoreCounterRev + aScore;
// decrease the number of remaining characters to compare aCounter := aCounter - 1;
Until aCounter < 0;
// after comparison we store the highest result in the array If aScoreCounterRev > aScoreCounterSeq THen aScoreCounterSeq := aScoreCounterRev;
// after comparison store result in ResultMatrix ResultMatrix [aPosInSeq2 - 1, aPosInSeq1-1] := aScoreCounterSeq;
// decrease remaining comparisons for sequence 2 with the resp. pos in the first aPosinSeq2 := aPosInSeq2 - 1; Until aPosInSeq2 <= 0;
//decrease the number of comparisons to be made for sequence 1 aPosinSeq1 := aPosInSeq1 - 1; Until aPosInSeq1 <= 0;
end;
|