Topic 3: Sequence similarity searches I
Methods - BLAST, FASTA. Theoretical principles. Scoring matrices (PAM, BLOSUM).
Are there any sequences related to my sequence?
- Did anyone find them already? Try Entrez related sequences!
- Sequence-to-database comparisons: the most frequently used methods are BLAST and FASTA. Both are heuristic, so they can miss some true matches; if that matters, try SSEARCH, which performs a rigorous Smith-Waterman search.
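To illustrate what a rigorous (non-heuristic) search computes, here is a minimal sketch of Smith-Waterman local alignment scoring, the algorithm SSEARCH is based on. The function name and the match, mismatch and gap values are illustrative assumptions. Real programs use substitution matrices and affine gap penalties, so treat this as a toy model rather than a description of SSEARCH itself.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b.

    Assumptions for illustration: simple match/mismatch scores and a
    linear gap penalty (one fixed cost per gapped position).
    """
    prev = [0] * (len(b) + 1)   # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                    # start a new local alignment here
                prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                prev[j] + gap,        # gap in b
                curr[j - 1] + gap,    # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared core "ACG" gives three matches.
    print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Every cell of the matrix is filled, so the cost grows with the product of the two sequence lengths. That is why database searches usually fall back on the BLAST and FASTA heuristics.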
BLAST
- theory (incl. the scoring matrices), manuals at the NCBI site
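As a rough illustration of the idea behind scoring matrices such as BLOSUM, an entry can be read as a scaled, rounded log-odds ratio: how often two residues are seen aligned, compared with how often they would pair by chance. The sketch below assumes half-bit units (as in BLOSUM62) and treats (i, j) as an ordered pair. The function name and the frequencies in the example are made up for illustration and are not taken from any real matrix.

```python
import math


def log_odds_score(q_ij, p_i, p_j, units_per_bit=2):
    """Integer log-odds score for aligning residue i with residue j.

    q_ij      : observed frequency of the (ordered) pair in trusted alignments
    p_i, p_j  : background frequencies of the two residues
    units_per_bit=2 gives half-bit units (an assumption, as in BLOSUM62).
    """
    return round(units_per_bit * math.log2(q_ij / (p_i * p_j)))


if __name__ == "__main__":
    # Hypothetical numbers: a pair seen 4x more often than chance scores +4,
    # a pair seen 4x less often than chance scores -4.
    print(log_odds_score(0.01, 0.05, 0.05))
    print(log_odds_score(0.000625, 0.05, 0.05))
```

Positive scores therefore mark substitutions seen more often than expected by chance, and negative scores mark substitutions seen less often.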
FASTA
- description of FASTA and related programs (incl. SSEARCH)
- help incl. sample data at EBI
Tasks
3.1
Using knowledge from the previous lesson, find a protein sequence of human neurofibromin 1. Choose one of the RefSeq sequences found and examine the HomoloGene link (listed under "Related information").
3.2
The sequence below corresponds to an evolutionarily conserved yeast gene whose plant homologues you are seeking:
>Scsec19
GGGATTGTAGATGTAGTTTCAACACGTCGGCTGATTTATCCCGATTTTGTTAGTAGAAAAGGTTCTACTT
CATTCTTGCTTGAGACGTCGTCCCATCAAATTTCTAACATAGTCTTTTTTCAAGGAAGGATATTTTTCAA
AGCAGGACTGCAATTAGTCTTTTCCTTTTCTTTACTCCCCTTCCATCATAACTGTTAGTGAATAACCACT
TATATAGCATAACACAATGGATCAAGAAACAATAGACACTGACTACGACGTGATTGTCTTAGGTACCGGT
ATTACCGAATGTATCTTATCTGGTTTACTCTCTGTAGATGGAAAAAAGGTATTACATATTGACAAGCAAG
ACCATTATGGTGGCGAAGCTGCTTCTGTGACCTTATCTCAATTGTATGAAAAATTTAAACAAAATCCGAT
CAGTAAAGAGGAACGGGAGTCCAAGTTTGGTAAAGATAGAGATTGGAATGTCGACTTAATTCCTAAATTC
CTGATGGCCAATGGTGAGCTGACAAATATTTTAATACATACCGATGTGACCAGATATGTCGATTTCAAGC
AAGTTTCTGGCTCCTACGTTTTTAAGCAAGGCAAAATTTACAAAGTGCCAGCTAATGAAATAGAAGCCAT
TTCATCGCCATTGATGGGTATTTTTGAAAAACGTAGAATGAAGAAATTTTTAGAATGGATTAGCTCTTAC
AAAGAAGATGACTTGTCCACTCATCAAGGATTAGACTTAGACAAGAATACCATGGATGAAGTGTATTATA
AATTTGGGTTAGGCAATTCTACCAAAGAATTCATCGGTCATGCAATGGCTTTATGGACCAATGATGACTA
CTTACAACAACCTGCTAGGCCATCGTTTGAGAGGATTTTGTTATATTGCCAAAGTGTTGCCCGTTACGGT
AAATCACCTTATTTGTATCCTATGTATGGGTTAGGCGAACTTCCACAAGGATTTGCTCGTTTGTCGGCTA
TTTACGGTGGTACTTACATGCTAGACACTCCAATTGATGAAGTATTGTATAAAAAAGACACAGGAAAATT
TGAAGGGGTCAAGACTAAGCTGGGAACTTTCAAGGCCCCATTGGTTATTGCTGATCCAACTTATTTTCCC
GAAAAATGTAAATCTACTGGTCAAAGAGTTATTAGAGCCATCTGTATTCTTAACCATCCAGTTCCGAACA
CCAGTAACGCGGATTCTTTACAAATTATTATCCCACAAAGCCAACTGGGAAGGAAAAGCGATATATACGT
TGCGATTGTTTCAGATGCGCATAACGTTTGCTCCAAGGGTCACTATTTAGCAATTATTTCTACAATCATT
GAAACTGATAAACCACATATAGAATTAGAGCCTGCTTTCAAACTTCTGGGACCAATCGAAGAAAAATTCA
TGGGAATTGCCGAATTATTTGAACCAAGAGAAGACGGCTCTAAGGATAACATTTACTTATCCAGATCATA
CGACGCATCCTCTCATTTCGAATCCATGACTGACGATGTTAAAGATATTTACTTCAGAGTAACAGGCCAC
CCATTAGTTCTAAAACAAAGACAAGAACAAGAAAAGCAGTAAATTCATACCTTTACGACTAAAGCAGCAA
TTGGAGGGTAAACTTATTTTTTCC
Try to locate the closest plant homologue of this sequence by searching for
- closest plant nucleotide relatives by NCBI blastn (database nr, orgn viridiplantae)
- closest previously annotated plant homologues by NCBI blastx (database nr, orgn viridiplantae)
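The difference between the two searches is that blastx compares conceptual translations of the nucleotide query, in all six reading frames, against a protein database. A minimal sketch of six-frame translation with the standard genetic code follows. The helper names are illustrative assumptions, and the example fragment is a short stretch copied from the Scsec19 query above.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', '*' is stop."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return translations keyed '+1'..'+3' and '-1'..'-3'."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


if __name__ == "__main__":
    fragment = "ATGGATCAAGAAACAATAGAC"  # short stretch of the query above
    for name, protein in six_frames(fragment).items():
        print(name, protein)
```

Because protein sequences diverge more slowly than the underlying DNA, a translated search of this kind can often detect more distant homologues than a nucleotide-to-nucleotide comparison.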
3.3
Use the clean sequence from Task 1.1 to find the closest Arabidopsis thaliana ESTs
- by NCBI blastn
- by NCBI nucleotide megablast.
Keep a multi-FASTA file of the best matches from megablast for future use.
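A multi-FASTA file is simply several FASTA records, each a ">" header line followed by sequence lines, concatenated in one text file. If you want to assemble or check such a file by script, a minimal sketch could look like this. The function names, record identifiers and sequences are hypothetical placeholders, not real search hits.

```python
def format_multifasta(records, width=70):
    """Turn (header, sequence) pairs into multi-FASTA text."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        # Wrap the sequence into fixed-width lines.
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_multifasta(text):
    """Read multi-FASTA text back into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Placeholder records for illustration only.
    hits = [("est_hit_1", "ACGT" * 30), ("est_hit_2", "GGCCTTAA")]
    text = format_multifasta(hits)
    with open("best_matches.fasta", "w") as handle:  # hypothetical file name
        handle.write(text)
    print(parse_multifasta(text) == hits)
```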
Then try to find the best matches in the Arabidopsis thaliana TAIR10 database at the Phytozome server, and inspect the results.