Topic 4: Sequence similarity searches II

Special features and implementations of BLAST. A closer look at the pairwise comparison/pairwise alignment problem.


There are quite a few things BLAST can do besides simple searching.

And some things BLAST cannot do ...


Tasks

4.1

Pick one of the longer and visibly repetitive extensin sequences collected in Task 2.2 and use it to perform the following searches on the non-redundant (nr) protein database at the NCBI BLAST site

Compare the computation time of "normal" and "quick" BLASTP, and compare the results of all three methods.


4.2

Pick one of the Arabidopsis RabGDI homologues found in Task 2.3 and use it to perform an NCBI blastp search on the non-redundant (nr) database, keeping all parameters at their default values.

4.3

Compare the following sequences using the BLAST 2 sequences option on NCBI. Examine the effects of changing the settings: note how the alignment and the E-values change with each modification.

Sequences:

>NP_564369.1 zinc finger (C2H2 type) family protein [Arabidopsis thaliana]
MGKKKKRATEKVWCYYCDREFDDEKILVQHQKAKHFKCHVCHKKLSTASGMVIHVLQVHKENVTKVPNAK
DGRDSTDIEIYGMQGIPPHVLTAHYGEEEDEPPAKVAKVEIPSAPLGGVVPRPYGMVYPPQQVPGAVPAR
PMYYPGPPMRHPAPVWQMPPPRPQQWYPQNPALSVPPAAHLGYRPQPLFPVQNMGMTPTPTSAPAIQPSP
VTGVTPPGIPTSSPAMPVPQPLFPVVNNSIPSQAPPFSAPLPVGGAQQPSHADALGSADAYPPNNSIPGG
TNAHSYASGPNTSGPSIGPPPVIANKAPSNQPNEVYLVWDDEAMSMEERRMSLPKYKVHDETSQMNSINA
AIDRRISESRLAGRMAF
>NP_001080324.1 BUB3-interacting and GLEBS motif-containing protein ZNF207 [Xenopus laevis]
MGRKKKKQLKPWCWYCNRDFDDEKILIQHQKAKHFKCHICHKKLYTGPGLAIHCMQVHKETIDAVPNAIP
GRTDIELEIYGMEGIPEKDMEERRRILEQKTQVDGQKKKTNQDDSDYDDDDDTAPSTSFQQMQTQQAFMP
TMGQPGIPGLPGAPGMPPGITSLMPAVPPLISGIPHVMAGMHPHGMMSMGGMMHPHRPGIPPMMAGLPPG
VPPPGLRPGIPPVTQAQPALSQAVVSRLPVPSTSAPALQSVPKPLFPSAGQAQAHISGPVGTDFKPLNNI
PATTAEHPKPTFPAYTQSTMSTTSTTNSTASKPSTSITSKPATLTTTSATSKLVHPDEDISLEEKRAQLP
KYQRNLPRPGQAPISNMGSTAVGPLGAMMAPRPGLPPQQHGMRHPLPPHGQYGAPLQGMAGYHPGTMPPF
GQGPPMVPPFQGGPPRPLMGIRPPVMSQGGRY
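
When interpreting how the E-values move in this task, it may help to recall the standard relation between score and E-value, E = K * m * n * exp(-lambda * S). The following is only a minimal sketch of that formula. The function names are hypothetical, the lambda and K values are illustrative placeholders rather than the parameters NCBI uses for any particular matrix and gap setting, and real BLAST works with corrected "effective" lengths instead of the raw lengths used here.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score: (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, subject_len, lam, k):
    """Expected number of chance hits scoring at least raw_score.

    Uses raw lengths for the search space (an assumption of this sketch;
    BLAST applies an edge-effect correction to both lengths).
    """
    return k * query_len * subject_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    LAM, K = 0.27, 0.04  # placeholder statistical parameters, not NCBI's
    for n in (500, 5_000, 50_000_000):
        e = expect_value(100, 400, n, LAM, K)
        print(f"subject/database length {n:>10}: E = {e:.3g}")
```

The same raw score gives a larger E-value as the search space grows, which is one reason why an alignment of two sequences and a database hit with the same score are not reported with the same E-value.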


4.4

Dotplots, such as those seen in the previous lesson, are a good way to get a quick impression of sequence homology, especially in nucleotide sequences.

Here is a DNA sequence that may contain repeats. Download it to your disk and generate the reverse-complemented sequence using SMS Reverse Complement. Save the reverse-complemented sequence in a FASTA file.
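
If you want to check what the web tool returns, the reverse complement is easy to reproduce locally. This is a minimal sketch, not the code used by SMS: the function names are hypothetical, only the bases A, C, G, T and N are handled, and the header and sequence in the usage example are made up.

```python
# Complement table for unambiguous DNA bases plus N; letter case is preserved.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def format_fasta(header, seq, width=60):
    """Format one record as FASTA text with lines of at most `width` residues."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    dna = "ATGCCGTTAGN"  # made-up example sequence
    print(format_fasta("example_revcomp", reverse_complement(dna)))
```

Writing the string returned by format_fasta to a file gives a FASTA file that can be uploaded to the dotplot tool in the next step.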

Use the EMBOSS Dotmatcher online tool to produce a dotplot (both for the direct and the complementary strand). Examine the effects of altering the window size and threshold values (hint: first try default vs. threshold = 50).
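
To see why window size and threshold matter, here is a simplified windowed dotplot. It is a sketch under stated assumptions and not Dotmatcher's actual algorithm: it counts identical positions in each window instead of summing scores from a substitution matrix, so its threshold values are not comparable to Dotmatcher's. The function names and the test sequence are made up.

```python
def dotplot(seq_a, seq_b, window=3, threshold=3):
    """Return the set of (i, j) window starts that count as a dot.

    A dot is placed when the windows starting at seq_a[i] and seq_b[j]
    have at least `threshold` identical positions out of `window`.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(dots, rows, cols):
    """Draw the dot set as text: one row per window start in seq_a."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(cols))
        for i in range(rows)
    )


if __name__ == "__main__":
    seq = "ACGTACGT"  # made-up sequence containing a direct repeat
    window = 4
    n = len(seq) - window + 1
    for threshold in (4, 2):
        print(f"window={window}, threshold={threshold}")
        print(render(dotplot(seq, seq, window, threshold), n, n))
```

With the strict threshold only the main diagonal and the repeat show up; lowering the threshold adds background dots. Direct repeats appear as extra diagonals parallel to the main one, while inverted repeats become visible only when the sequence is compared with its reverse complement.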


4.5

Use PSI-BLAST to identify eukaryotic homologs of Staphylococcus aureus exfoliative toxin A, using the WP_001065781.1 sequence as a query. Run at least one iteration, preferably two, and document your results with the output of the last iteration you ran. What eukaryotic protein family is related to the toxin?