Well, having discussed sequence alignment for the last couple of weeks, I thought this would be a good time to talk about how finding a perfect match works. That is, how to find a sequence like one of those below inside another sequence (or a database of other sequences):
TTAGCTACTTACTAGCGTCATATCACTTATCTAGCATTCATCTGTACGTATCTAC
TTAGCTACTTACTAGCGTCATATCCCTTATCTAGCATTCATCTGTACGTATCTAC
TTATCGACTATCATGACTATAGCTATCTGAGGTCAGTCGTTACTATATTATTATCTGCGCGCGATTACGTAGCTACTTACTAGCGTCATATCCCTTATCTAGCATTCATCTGTACGTATCTAC
… and of course you know you can just use Command-F. But how do we find a matching string that’s 25 bases long in a list of 100,000,000,000 bases? And then, how do we find all of the matches? That’s the question we have to be able to answer if we want to, say, do BLAST.
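To make the "find all of them" question concrete, here is a minimal Python sketch of the Command-F approach: repeatedly search for the next exact occurrence and record where it starts. The function name `find_all` and the toy sequences are made up for illustration; this is just the obvious baseline, not the method BLAST uses.

```python
def find_all(pattern, text):
    """Return the start position of every exact occurrence of pattern in text.

    Overlapping occurrences are reported too, because each new search
    starts just one position after the previous hit.
    """
    positions = []
    start = text.find(pattern)          # -1 means "not found"
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


# Toy example (hypothetical sequences, 0-based positions)
print(find_all("ATA", "GATATATA"))  # → [1, 3, 5]
```

This works nicely on a short sequence, but every query has to scan the whole text again, which is the part that hurts when the text is 100,000,000,000 bases long.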
Given that sequence alignment is a bit slow (it’s O(n^2): for two sequences of length n it takes time roughly proportional to n^2, or more generally to the product of the two sequence lengths), we need a fast method for finding matching strings that doesn’t have to use alignment.
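One way to get that speed, sketched here under some simplifying assumptions, is hashing: record once where every short word of length k (a "k-mer") occurs in the text, then look up the first k bases of each query in that table and check only the candidate positions. The names `build_kmer_index` and `lookup`, the choice k = 3, and the toy sequence are all illustrative assumptions. This shows the general idea of word lookup only; it is not BLAST's actual implementation.

```python
from collections import defaultdict


def build_kmer_index(text, k):
    """Map every length-k substring of text to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def lookup(index, query, text, k):
    """Find all exact occurrences of query (assumed to have length >= k).

    The index supplies candidate positions for the query's first k-mer;
    each candidate is then verified by a direct string comparison.
    """
    hits = []
    for pos in index.get(query[:k], []):
        if text[pos:pos + len(query)] == query:
            hits.append(pos)
    return hits


# Toy example (hypothetical sequence, 0-based positions)
text = "GATATATACGATA"
index = build_kmer_index(text, 3)
print(index["ATA"])                     # → [1, 3, 5, 10]
print(lookup(index, "ATAC", text, 3))   # → [5]
```

Building the table takes one pass over the text, and after that each query costs a dictionary lookup plus a handful of comparisons, rather than a fresh scan of everything.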
I’ll talk about these things on Thursday of this week:
- finding the perfect match
- hashing
- BLAST
See you then! (I hope…)
Cheers
Mike
Michael Charleston
Associate Professor in Bioinformatics
School of Physical Sciences
University of Tasmania
AUSTRALIA
phone: +61 3 6226 2444