Are you troubled by not finding the perfect match?
Well, having discussed sequence alignment for the last couple of weeks, I thought this would be a good time to talk about how finding a perfect match works. That is, how to find a sequence like this one:
TATCACT
… inside another sequence, or a database of other sequences:
TTAGCTACTTACTAGCGTCATATCACTTATCTAGCATTCATCTGTACGTATCTAC
TTAGCTACTTACTAGCGTCATATCCCTTATCTAGCATTCATCTGTACGTATCTAC
TTATCGACTATCATGACTATAGCTATCTGAGGTCAGTCGTTACTATATTATTATCTGCGCGCGATTACGTAGCTACTTACTAGCGTCATATCCCTTATCTAGCATTCATCTGTACGTATCTAC
… and of course you know you can just use Command-F. But how do we find a matching string that’s 25 bases long in a list of 100,000,000,000 bases? And then, how do we find all of the matches? That’s the question we have to be able to answer if we want to, say, do BLAST. Sequence alignment is rather slow (it’s O(n^2): for two sequences of length about n, it takes an amount of time roughly proportional to n squared), so we need a fast method of finding matching strings that doesn’t rely on alignment.
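To give a rough feel for the idea, here is a minimal Python sketch of one way to find every exact match without alignment: build a hash table of all the short words (k-mers) in the database once, then look the query up in it. The function names build_kmer_index and find_exact, and the word length k = 4, are my own choices for illustration; this is not a description of how BLAST itself is implemented.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map every length-k word to the (sequence number, offset) pairs where it starts."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for start in range(len(seq) - k + 1):
            index[seq[start:start + k]].append((seq_id, start))
    return index


def find_exact(query, sequences, index, k):
    """Return every (sequence number, offset) at which query occurs exactly."""
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    hits = []
    # Look up the first k bases in the hash table, then confirm the rest of the query.
    for seq_id, start in index.get(query[:k], []):
        if sequences[seq_id].startswith(query, start):
            hits.append((seq_id, start))
    return hits


database = [
    "TTAGCTACTTACTAGCGTCATATCACTTATCTAGCATTCATCTGTACGTATCTAC",
    "TTAGCTACTTACTAGCGTCATATCCCTTATCTAGCATTCATCTGTACGTATCTAC",
]
index = build_kmer_index(database, 4)
print(find_exact("TATCACT", database, index, 4))
```

Building the table takes time proportional to the size of the database, but it is done only once. After that, each query costs one table lookup plus a quick check of each candidate position, instead of a scan through every base.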
I’ll talk about these things on Thursday of this week:
* finding the perfect match
* hashing
* BLAST
See you then! (I hope…)
Cheers,
Mike

Michael Charleston
Associate Professor in Bioinformatics
School of Physical Sciences
University of Tasmania
AUSTRALIA
phone: +61 3 6226 2444