Topic 3: Sequence similarity searches I

Methods - BLAST, FASTA. Theoretical principles. Scoring matrices (PAM, BLOSUM).


Are there any sequences related to my sequence?


BLAST


FASTA


Tasks

3.1

Using knowledge from the previous lesson, find the protein sequence of human neurofibromin 1. Choose one of the RefSeq sequences found and examine the HomoloGene link (found among "Related information").


3.2

The sequence below corresponds to an evolutionarily conserved yeast gene whose plant homologues you are seeking:

>Scsec19
GGGATTGTAGATGTAGTTTCAACACGTCGGCTGATTTATCCCGATTTTGTTAGTAGAAAAGGTTCTACTT
CATTCTTGCTTGAGACGTCGTCCCATCAAATTTCTAACATAGTCTTTTTTCAAGGAAGGATATTTTTCAA
AGCAGGACTGCAATTAGTCTTTTCCTTTTCTTTACTCCCCTTCCATCATAACTGTTAGTGAATAACCACT
TATATAGCATAACACAATGGATCAAGAAACAATAGACACTGACTACGACGTGATTGTCTTAGGTACCGGT
ATTACCGAATGTATCTTATCTGGTTTACTCTCTGTAGATGGAAAAAAGGTATTACATATTGACAAGCAAG
ACCATTATGGTGGCGAAGCTGCTTCTGTGACCTTATCTCAATTGTATGAAAAATTTAAACAAAATCCGAT
CAGTAAAGAGGAACGGGAGTCCAAGTTTGGTAAAGATAGAGATTGGAATGTCGACTTAATTCCTAAATTC
CTGATGGCCAATGGTGAGCTGACAAATATTTTAATACATACCGATGTGACCAGATATGTCGATTTCAAGC
AAGTTTCTGGCTCCTACGTTTTTAAGCAAGGCAAAATTTACAAAGTGCCAGCTAATGAAATAGAAGCCAT
TTCATCGCCATTGATGGGTATTTTTGAAAAACGTAGAATGAAGAAATTTTTAGAATGGATTAGCTCTTAC
AAAGAAGATGACTTGTCCACTCATCAAGGATTAGACTTAGACAAGAATACCATGGATGAAGTGTATTATA
AATTTGGGTTAGGCAATTCTACCAAAGAATTCATCGGTCATGCAATGGCTTTATGGACCAATGATGACTA
CTTACAACAACCTGCTAGGCCATCGTTTGAGAGGATTTTGTTATATTGCCAAAGTGTTGCCCGTTACGGT
AAATCACCTTATTTGTATCCTATGTATGGGTTAGGCGAACTTCCACAAGGATTTGCTCGTTTGTCGGCTA
TTTACGGTGGTACTTACATGCTAGACACTCCAATTGATGAAGTATTGTATAAAAAAGACACAGGAAAATT
TGAAGGGGTCAAGACTAAGCTGGGAACTTTCAAGGCCCCATTGGTTATTGCTGATCCAACTTATTTTCCC
GAAAAATGTAAATCTACTGGTCAAAGAGTTATTAGAGCCATCTGTATTCTTAACCATCCAGTTCCGAACA
CCAGTAACGCGGATTCTTTACAAATTATTATCCCACAAAGCCAACTGGGAAGGAAAAGCGATATATACGT
TGCGATTGTTTCAGATGCGCATAACGTTTGCTCCAAGGGTCACTATTTAGCAATTATTTCTACAATCATT
GAAACTGATAAACCACATATAGAATTAGAGCCTGCTTTCAAACTTCTGGGACCAATCGAAGAAAAATTCA
TGGGAATTGCCGAATTATTTGAACCAAGAGAAGACGGCTCTAAGGATAACATTTACTTATCCAGATCATA
CGACGCATCCTCTCATTTCGAATCCATGACTGACGATGTTAAAGATATTTACTTCAGAGTAACAGGCCAC
CCATTAGTTCTAAAACAAAGACAAGAACAAGAAAAGCAGTAAATTCATACCTTTACGACTAAAGCAGCAA
TTGGAGGGTAAACTTATTTTTTCC

Try to locate the closest plant homologue of this sequence by searching for

  1. closest plant nucleotide relatives by NCBI blastn (database nr, organism Viridiplantae)
  2. closest previously annotated plant homologues by NCBI blastx (database nr, organism Viridiplantae)
Compare the results of both searches and evaluate their significance. Provide the output files and identify a list of significant relatives of the query sequence from both methods. Which search method is better?
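To see why the two searches can behave differently, it may help to recall that blastx compares a conceptual translation of the nucleotide query, in all six reading frames, against protein sequences. The following is only a minimal sketch of that translation step in Python, not the BLAST implementation itself; the function names are made up for illustration, and the standard genetic code (NCBI translation table 1) is assumed.

```python
# Illustrative sketch only: six-frame conceptual translation of a DNA query.
# Assumes the standard genetic code; function names are hypothetical.

BASES = "TCAG"
# Amino acids for all 64 codons, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Map frame number (+1..+3, -1..-3) to its conceptual translation."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # A short stretch taken from the Scsec19 query above.
    for frame, protein in sorted(six_frames("ATGGATCAAGAAACAATAGACACTGAC").items()):
        print(frame, protein)
```

Stop codons are shown as `*`. Because several codons encode the same amino acid, protein-level matches can survive nucleotide divergence that a direct nucleotide comparison would not tolerate, which is worth keeping in mind when judging the two result lists.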


3.3

Use the clean sequence from Task 1.1 to find the closest Arabidopsis thaliana ESTs.

Keep a multi-FASTA file of the best matches from megablast for future use.

Then try to find the best matches in the Arabidopsis thaliana TAIR10 database on the Phytozome server and inspect the results.
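A multi-FASTA file, as requested in this task, is simply several FASTA records (a ">" header line followed by sequence lines) stored one after another in a single file. The sketch below shows one possible way to read and write such a file in plain Python; the function names and the 70-character line width are assumptions chosen for illustration, not a required format.

```python
# Illustrative sketch only: reading and writing multi-FASTA text.
# Function names and the default line width are hypothetical choices.

def parse_fasta(text):
    """Return a list of (header, sequence) tuples from multi-FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_fasta(records, width=70):
    """Format (header, sequence) tuples as multi-FASTA text."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example records, not real search results.
    example = ">hit1 example EST\nACGTACGT\nACGT\n>hit2 another EST\nGGCC\n"
    records = parse_fasta(example)
    print(len(records), "records")
    print(format_fasta(records), end="")
```

Keeping each hit's full header line makes it easier to trace the sequences back to their database entries in later tasks.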


3.4

Take the protein sequence obtained in Task 1.2 and use it to search the non-redundant Arabidopsis database:
  1. by blastp (at NCBI or AtGDB, use the non-redundant Arabidopsis database or Atpep-TAIR10; at Phytozome, use target type Proteome)
  2. by tblastn on NCBI
  3. by FASTA on EBI (searching the UniProt Viridiplantae database, as an Arabidopsis option is not available there)
  4. by SSEARCH on EBI (demo only)

Compare and evaluate the results from at least two methods (i.e. provide the output files). What was your clone?
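When comparing methods, it can be convenient to reduce each result to a ranked list of significant hits. The sketch below assumes the hits were saved as a tab-separated table with the 12 standard columns of BLAST tabular output (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score); other tools or download options may use a different layout, so check the header of your file first. The function name and the E-value cutoff of 1e-5 are arbitrary choices for illustration, not a recommended standard.

```python
# Illustrative sketch only: list significant hits from a 12-column
# tab-separated hit table. Column layout and cutoff are assumptions.

def significant_hits(table_text, max_evalue=1e-5):
    """Return (subject, evalue, bitscore) tuples, best hit per subject,
    sorted by increasing E-value and then decreasing bit score."""
    best = {}
    for line in table_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard hit line
        subject = fields[1]
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if evalue > max_evalue:
            continue
        # Keep only the best-scoring alignment for each subject sequence.
        if subject not in best or evalue < best[subject][0]:
            best[subject] = (evalue, bitscore)
    return sorted(
        ((s, e, b) for s, (e, b) in best.items()),
        key=lambda hit: (hit[1], -hit[2]),
    )


if __name__ == "__main__":
    # Made-up example table, not real search output.
    example = (
        "# example hit table\n"
        "query\tsubjA\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200\n"
        "query\tsubjB\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t30\n"
        "query\tsubjC\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-20\t90\n"
    )
    for subject, evalue, bitscore in significant_hits(example):
        print(subject, evalue, bitscore)
```

Note that E-values depend on database size, and hit identifiers differ between databases, so lists obtained from different servers are best compared by inspecting what the hits actually are rather than by matching identifiers automatically.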