The most important application of sequence matching is finding closely related genes or proteins for a nucleotide or amino acid sequence of interest to biochemists. The most complete repository of DNA and protein sequences can be found at NCBI, the National Center for Biotechnology Information. NCBI maintains a web site at http://www.ncbi.nlm.nih.gov which, among other things, contains a listing of known DNA and protein sequences from a number of international public databases, complete with descriptions of their organisms and information about their discovery.
Along with the DNA and protein databases, there are several publicly available search mechanisms for finding similar genes or proteins for any sequence of interest. Our elementary sequence matching procedure has been implemented on a large scale in BLAST, the Basic Local Alignment Search Tool, available at the NCBI homepage.
BLAST comes in several forms, and in particular has separate versions for nucleotide and protein matches. Here are two that would be appropriate for you to use for your assignments:
DNA BLAST webpage
protein BLAST webpage
To use BLAST, simply copy and paste your sequence of interest into the Search box, and then press the BLAST! button. This button will change to Format!, and when the matching sequences have been found, pressing the Format! button will give you a set of similar sequences. (Hint: The search may take a minute or so, despite what they tell you. If you click Format! early, you will get an occasionally updated page that will eventually give you the matches. You may close this page and click Format! again if you become impatient with the page updates.)
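Sequences copied from a database record often carry position numbers, spaces, and line breaks. As a minimal sketch (the function names clean_sequence and unexpected_letters are made up for this illustration and are not part of BLAST), the following Python strips everything but letters and reports any letters outside the plain a, c, g, t, n alphabet, so you can check a DNA sequence before pasting it into the Search box.

```python
# Hypothetical helpers for tidying a sequence before pasting it into BLAST.
# The allowed alphabet below is an assumption for plain DNA; protein
# sequences would need a different set of letters.
DNA_LETTERS = set("acgtn")


def clean_sequence(raw):
    """Drop digits, spaces, and line breaks; return lowercase letters only."""
    return "".join(ch for ch in raw if ch.isalpha()).lower()


def unexpected_letters(seq, alphabet=DNA_LETTERS):
    """Return the sorted letters in seq that are not in the given alphabet."""
    return sorted(set(seq) - alphabet)


if __name__ == "__main__":
    pasted = """
        1 ggatcacct gcttcggcct
       61 tgtgcacaca c
    """
    seq = clean_sequence(pasted)
    print(seq)
    # Letters such as 'v' are IUPAC ambiguity codes (v means a, c, or g),
    # so a hit here is a prompt to double-check the sequence, not an error.
    print(unexpected_letters("gatvttagat"))
```

An empty list from unexpected_letters means the sequence contains only the expected letters.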
BLAST will give you a list of the sequences or parts of sequences that match some subset of your sequence. The graph above this list shows where each of these sequences is located relative to your sequence. To see exactly how close your match is to any of the listed sequences, scroll down below this list, where the precise alignment is given. Clicking on the identifier for the sequence will give you more information about that sequence and where it comes from.
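One simple way to put a number on how close a match is: count the identical positions in the alignment. The sketch below assumes you have copied the two rows of an alignment as equal-length strings with "-" marking gaps. The function name percent_identity and the choice to divide by the full alignment length (gaps included) are assumptions made for this illustration.

```python
def percent_identity(query_row, subject_row):
    """Percentage of alignment columns where both rows show the same letter.

    Both rows must have the same length; '-' marks a gap and never counts
    as identical. The denominator is the full alignment length.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("alignment rows must have the same length")
    if not query_row:
        raise ValueError("alignment is empty")
    identical = sum(
        1
        for q, s in zip(query_row, subject_row)
        if q != "-" and q.lower() == s.lower()
    )
    return 100.0 * identical / len(query_row)


if __name__ == "__main__":
    # 7 identical columns out of 9: one gap and one mismatch.
    print(round(percent_identity("ggatcacct", "ggat-acgt"), 1))
```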
You can narrow your search to a particular organism by choosing from the Limit by entrez query - or select from: list of organisms.
Other examples:
ggatcacct gcttcggcct cctcaaagtgc tgggcttaca ggcgcgccac tcgcggccggc
ctacatatc atactgtcta aatagcactta tttttctgg cagctttttg tagcttctgt
taaatctttt gccataaag ataactcttca cttcctcctt ttctctcgg atcccgagtg
tgtgtgtgtg tgtgtgtgtg tgtgtgtgtc tgtgtgtcac ccctattgaac tagtagaat
tcccagtacaa acttgaatag aagtgtgag agcacacatc ttttgtctaat gatgaactc
tgactgcttt ctcccccaaag ataacgtga tgttcgctgt aggacttttcg tagctgccct
ttgtcaggcc aaggcagttg ccacctactc atagcttgct gagagttctt attcgaata
gatvttagat tttttvttttc agatgtttt tttgtgtgctc aacgcagctt acatgttat
tttcattagg gtaacattaca ttgatgttt tctttactcg atcaaatactg acttttatt
agtctccact ccactcactca acacaatac gaccatcgag atcgccatgac cccaggctg
tgtgcacaca c
What is this? Does it correspond to a similar gene in rodents?
protein sequence
skavkyytle eiqkhnnsks twlilhykvy dltkfleehp ggeevlreqa ggdatenfed
vghstdarel sktfiigelh pddrskitkp ses
What is this? Does it correspond to a similar gene in rodents?
How about bacteria?
Appendix
DNA BLAST penalty scores:
match: -1 (reward)
mismatch: 3
first gap: 5
each additional gap: 2
protein BLAST penalty score matrix: