How to find the perfect match

20 Sep 2016


Are you troubled by not finding the perfect match?
Well, having discussed sequence alignment for the last couple of weeks, I thought this would be a good time to talk about how finding a perfect match works. That is, how to find a sequence, like this one
TATCACT
… inside another sequence, or inside a database of other sequences:
TTAGCTACTTACTAGCGTCATATCACTTATCTAGCATTCATCTGTACGTATCTAC
TTAGCTACTTACTAGCGTCATATCCCTTATCTAGCATTCATCTGTACGTATCTAC
TTATCGACTATCATGACTATAGCTATCTGAGGTCAGTCGTTACTATATTATTATCTGCGCGCGATTACGTAGCTACTTACTAGCGTCATATCCCTTATCTAGCATTCATCTGTACGTATCTAC
… and of course you know you just use Command-F. But how do we find a matching string that’s 25 bases long, in a list of 100,000,000,000 bases? And then, how do you find all of them? That’s the question that we have to be able to answer, if we want to, say, do BLAST.
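To make the question concrete, here is a minimal sketch of the "Command-F" approach: scan each sequence from left to right and record every place the query starts. The function name `find_all` and the variable names are just illustrative choices, and the database holds only the first two example sequences from above. This is fine for short strings, but it has to look at every position in the database for every query, which is exactly what becomes painful at 100,000,000,000 bases.

```python
def find_all(query, text):
    """Return every 0-based start position of query in text (overlaps included)."""
    positions = []
    start = text.find(query)
    while start != -1:
        positions.append(start)
        # Resume one base further along so overlapping matches are not missed.
        start = text.find(query, start + 1)
    return positions


query = "TATCACT"
database = [
    "TTAGCTACTTACTAGCGTCATATCACTTATCTAGCATTCATCTGTACGTATCTAC",
    "TTAGCTACTTACTAGCGTCATATCCCTTATCTAGCATTCATCTGTACGTATCTAC",
]

for seq_id, seq in enumerate(database):
    print(seq_id, find_all(query, seq))
```

Only the first sequence contains the query exactly; the second differs from it by a single base in that region, so it is not a perfect match.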
Given that sequence alignment is rather slow (it’s O(n^2): aligning two sequences of length n takes time roughly proportional to n squared, or more generally to the product of the two sequence lengths), we need a fast method for finding matching strings that doesn’t rely on alignment.
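One common way to avoid alignment for exact matches is to hash every short word (k-mer) in the database once, and then look queries up directly. The following is a rough sketch of that idea only, assuming the query is exactly k bases long; the names `build_kmer_index` and `lookup` are made up for illustration, and this is not a description of how BLAST itself is implemented.

```python
from collections import defaultdict


def build_kmer_index(sequences, k):
    """Map every length-k substring to the (sequence id, position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def lookup(index, query):
    """Return all exact occurrences of query; assumes len(query) equals the k used for the index."""
    return index.get(query, [])


# Tiny toy database, chosen so the result is easy to check by eye.
toy_database = ["ATATA", "GATA"]
toy_index = build_kmer_index(toy_database, k=3)
print(lookup(toy_index, "ATA"))
```

Building the index takes one pass over the database, but after that each lookup is a single dictionary access rather than a scan, which is the trade-off that makes searching very large databases practical. The cost is memory: the index stores an entry for every position in every sequence.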
I’ll talk about these things on Thursday of this week:
*   finding the perfect match
*   hashing
*   BLAST
See you then! (I hope…)
Cheers
Mike
Michael Charleston
Associate Professor in Bioinformatics
School of Physical Sciences
University of Tasmania
AUSTRALIA
phone: +61 3 6226 2444
