Topic 5: Sequence motif searches and protein domain structure analysis I

Extraction of sequence characteristics and searching for known domains - SMART, PROSITE and similar resources.


What can we tell from a sequence (almost) at first glance?

DNA:

Protein:


Tasks:

5.1.

Compute the molecular mass and pI of a selected extensin from Task 2.2 using SMS or the Expasy "Compute pI/mw" tool.
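
To see what the mass part of such a tool does, here is a minimal sketch that sums average residue masses and adds one water molecule for the free termini. The function name `molecular_mass` is hypothetical, the residue masses are the commonly tabulated average values, and the sketch ignores post-translational modifications (relevant for extensins, which are heavily hydroxylated and glycosylated). It does not compute the pI, which depends on the pKa set chosen by each tool.

```python
# Average residue masses in Da (amino acid minus one water), standard tabulated values.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one H2O accounts for the free N- and C-termini


def molecular_mass(sequence):
    """Return the average molecular mass (Da) of an unmodified peptide."""
    residues = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    if not residues:
        return 0.0
    # A KeyError here means the sequence contains a non-standard letter (X, B, Z, ...).
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER_MASS


if __name__ == "__main__":
    print(round(molecular_mass("GG"), 2))  # glycylglycine, about 132.12 Da
```

The value obtained this way should be close to the one reported by the web tools for the unmodified sequence; small differences usually come from the mass table used.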


5.2.

Analyse the sequence below for secretory and transmembrane localisation signals using
  1. SignalP 6.0 (gives a comparison of two methods with a nice graphical output)
  2. TMHMM
>gi|12643323|sp|Q9SYQ8|CLV1_ARATH Receptor protein kinase CLAVATA1 precursor
MAMRLLKTHLLFLHLYLFFSPCFAYTDMEVLLNLKSSMIGPKGHGLHDWIHSSSPDAHCSFSGVSCDDDA
RVISLNVSFTPLFGTISPEIGMLTHLVNLTLAANNFTGELPLEMKSLTSLKVLNISNNGNLTGTFPGEIL
KAMVDLEVLDTYNNNFNGKLPPEMSELKKLKYLSFGGNFFSGEIPESYGDIQSLEYLGLNGAGLSGKSPA
FLSRLKNLREMYIGYYNSYTGGVPREFGGLTKLEILDMASCTLTGEIPTSLSNLKHLHTLFLHINNLTGH
IPPELSGLVSLKSLDLSINQLTGEIPQSFINLGNITLINLFRNNLYGQIPEAIGELPKLEVFEVWENNFT
LQLPANLGRNGNLIKLDVSDNHLTGLIPKDLCRGEKLEMLILSNNFFFGPIPEELGKCKSLTKIRIVKNL
LNGTVPAGLFNLPLVTIIELTDNFFSGELPVTMSGDVLDQIYLSNNWFSGEIPPAIGNFPNLQTLFLDRN
RFRGNIPREIFELKHLSRINTSANNITGGIPDSISRCSTLISVDLSRNRINGEIPKGINNVKNLGTLNIS
GNQLTGSIPTGIGNMTSLTTLDLSFNDLSGRVPLGGQFLVFNETSFAGNTYLCLPHRVSCPTRPGQTSDH
NHTALFSPSRIVITVIAAITGLILISVAIRQMNKKKNQKSLAWKLTAFQKLDFKSEDVLECLKEENIIGK
GGAGIVYRGSMPNNVDVAIKRLVGRGTGRSDHGFTAEIQTLGRIRHRHIVRLLGYVANKDTNLLLYEYMP
NGSLGELLHGSKGGHLQWETRHRVAVEAAKGLCYLHHDCSPLILHRDVKSNNILLDSDFEAHVADFGLAK
FLVDGAASECMSSIAGSYGYIAPEYAYTLKVDEKSDVYSFGVVLLELIAGKKPVGEFGEGVDIVRWVRNT
EEEITQPSDAAIVVAIVDPRLTGYPLTSVIHVFKIAMMCVEEEAAARPTMREVVHMLTNPPKSVANLIAF
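
As a rough cross-check of the predictor output, a sliding-window hydropathy profile can highlight strongly hydrophobic stretches. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a cutoff of 1.6, which is the conventional setting for spotting candidate membrane-spanning segments. It is not the method used by SignalP or TMHMM, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in "".join(sequence.split()).upper()]
    if len(scores) < window:
        return []
    total = sum(scores[:window])
    profile = [total / window]
    for i in range(window, len(scores)):
        total += scores[i] - scores[i - window]  # slide the window by one residue
        profile.append(total / window)
    return profile


def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """Return 1-based start positions of windows whose mean reaches the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i + 1 for i, value in enumerate(profile) if value >= threshold]


if __name__ == "__main__":
    # Stretch taken from the CLAVATA1 sequence above (residues following "SPSR").
    print(hydrophobic_windows("IVITVIAAITGLILISVAI"))
```

Positions flagged near the N-terminus are candidates for a signal peptide core, and internal ones for transmembrane helices; the dedicated tools should be trusted over this simple profile.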


5.3.

Analyse the sequence above using the Expasy ScanProsite search and SMART, and examine the results.
Then take the region of the protein sequence containing LRRs (leucine-rich repeats) and extract the sequences of at least ten randomly selected LRRs identified by SMART. Keep them in a "multi-FASTA" file. Then search for repeats using the RADAR tool at the EBI and keep the results (HTML file) for future use.
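
The extraction step can be scripted once the repeat coordinates have been copied from the SMART output. The sketch below assumes 1-based, inclusive start and end positions; the function name `repeats_to_multifasta`, the record naming scheme and the coordinates in the usage example are hypothetical placeholders, not real SMART results.

```python
import random


def repeats_to_multifasta(sequence, regions, count=10, seed=0):
    """Pick `count` regions at random and return them as multi-FASTA text.

    `regions` is a list of (start, end) tuples, 1-based and inclusive.
    """
    protein = "".join(sequence.split()).upper()
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    chosen = sorted(rng.sample(regions, min(count, len(regions))))
    records = []
    for start, end in chosen:
        fragment = protein[start - 1:end]  # convert to 0-based slicing
        records.append(f">LRR_{start}-{end}\n{fragment}")
    return "\n".join(records) + "\n"


if __name__ == "__main__":
    # Placeholder sequence and coordinates, for illustration only.
    demo_sequence = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    demo_regions = [(1, 5), (6, 10), (11, 15)]
    print(repeats_to_multifasta(demo_sequence, demo_regions, count=2), end="")
```

The returned text can be written to a file and pasted into the RADAR form as is.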

Prediction of secondary structure elements from protein sequence. Methods, pros and cons.


Protein secondary structure prediction

Online tools:


Golden rules:

Avoid the traditional Chou and Fasman algorithm.
Note the accuracy of the algorithms on standard benchmarks and in "real-life" situations.
Use methods based on multiple alignments. Check the alignment carefully and avoid redundancy.
Use several independent methods of similar accuracy.
In case of disagreement, trust PHD (PredictProtein), Jnet (Jpred) and PsiPred.
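
One simple way to compare several independent predictions is a per-residue majority vote that flags positions where the methods disagree. The sketch below assumes each prediction has been reduced to a string of the same length over the states H (helix), E (strand) and C (coil); the function name `consensus_prediction` and the tie marker are hypothetical choices, and the example strings are made up.

```python
from collections import Counter


def consensus_prediction(predictions, tie_marker="-"):
    """Return a per-residue majority vote; ties are marked for manual inspection."""
    if not predictions:
        return ""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        # A tie between the two most frequent states means no clear majority.
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            result.append(tie_marker)
        else:
            result.append(ranked[0][0])
    return "".join(result)


if __name__ == "__main__":
    # Made-up predictions from three methods for a five-residue peptide.
    print(consensus_prediction(["HHHCC", "HHECC", "HEECC"]))  # HHECC
```

Positions marked as ties are the ones where the last rule applies and the more reliable methods should decide.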


Presentation by M. Potocký (from 2004 TIPNET course, addresses may be outdated)


Tasks:

5.4.

>gi|22331122|ref|NP_188302.2| phospholipase D zeta1 / PLDzeta1 (PLDP1) [Arabidopsis thaliana]
MASEQLMSPASGGGRYFQMQPEQFPSMVSSLFSFAPAPTQETNRIFEELPKAVIVSVSRPDAGDISPVLL
SYTIECQYKQFKWQLVKKASQVFYLHFALKKRAFIEEIHEKQEQVKEWLQNLGIGDHPPVVQDEDADEVP
LHQDESAKNRDVPSSAALPVIRPLGRQQSISVRGKHAMQEYLNHFLGNLDIVNSREVCRFLEVSMLSFSP
EYGPKLKEDYIMVKHLPKFSKSDDDSNRCCGCCWFCCCNDNWQKVWGVLKPGFLALLEDPFDAKLLDIIV
FDVLPVSNGNDGVDISLAVELKDHNPLRHAFKVTSGNRSIRIRAKNSAKVKDWVASINDAALRPPEGWCH
PHRFGSYAPPRGLTDDGSQAQWFVDGGAAFAAIAAAIENAKSEIFICGWWVCPELYLRRPFDPHTSSRLD
NLLENKAKQGVQIYILIYKEVALALKINSVYSKRRLLGIHENVRVLRYPDHFSSGVYLWSHHEKLVIVDN
QVCFIGGLDLCFGRYDTFEHKVGDNPSVTWPGKDYYNPRESEPNTWEDALKDELERKKHPRMPWHDVHCA
LWGPPCRDVARHFVQRWNYAKRNKAPYEDSIPLLMPQHHMVIPHYMGRQEESDIESKKEEDSIRGIRRDD
SFSSRSSLQDIPLLLPHEPVDQDGSSGGHKENGTNNRNGPFSFRKSKIEPVDGDTPMRGFVDDRNGLDLP
VAKRGSNAIDSEWWETQDHDYQVGSPDETGQVGPRTSCRCQIIRSVSQWSAGTSQVEESIHSAYRSLIDK
AEHFIYIENQFFISGLSGDDTVKNRVLEALYKRILRAHNEKKIFRVVVVIPLLPGFQGGIDDSGAASVRA
IMHWQYRTIYRGHNSILTNLYNTIGVKAHDYISFYGLRAYGKLSEDGPVATSQVYVHSKIMIVDDRAALI
GSANINDRSLLGSRDSEIGVLIEDTELVDSRMAGKPWKAGKFSSSLRLSLWSEHLGLRTGEIDQIIDPVS
DSTYKEIWMATAKTNTMIYQDVFSCVPNDLIHSRMAFRQSLSYWKEKLGHTTIDLGIAPEKLESYHNGDI
KRSDPMDRLKAIKGHLVSFPLDFMCKEDLRPVFNESEYYASPQVFH


Prediction of RNA secondary structure


Tasks:

5.5.

Predict the 2D structure of the hop latent viroid RNA.

>gi|13872751|emb|AJ290412.1|HLA290412 Hop latent viroid sequence of 'thermomutant' T229
CTGGGGAATACACTACGTGACTTACCTGTATGATGGCAAGGGTTCGAAGAGGGATCCCCGGGGAAACCTA
CTCGAGCGAGGCGGAGATCGAGCGCCAGTTCGTGCGCGGCGACCTGAAGTTGCTTCGGCTTCTTCTTGTT
CGCGTCCTGCGTGGAACGGCTCCTTCTCCACACCAGCCGGAGTTGGAAACTACCCGGTGGATACAACTCT
TGAGCGCCGAGCTTTACCTGCAGAAGTTCACATAAAAAGTGCCCAT
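
To illustrate the idea behind RNA secondary structure prediction, here is a minimal sketch of Nussinov-style dynamic programming, which maximises the number of nested base pairs. This is a deliberate simplification: real folding servers minimise free energy with experimentally derived parameters, so their output will differ. The sketch also treats the molecule as linear, although viroids are circular, and allows G-U wobble pairs. The function name `fold_max_pairs` and the `min_loop` parameter are hypothetical.

```python
def fold_max_pairs(sequence, min_loop=3):
    """Return (pair_count, dot_bracket) for a maximum nested base-pairing."""
    rna = "".join(sequence.split()).upper().replace("T", "U")  # accept DNA letters
    allowed = {"AU", "UA", "GC", "CG", "GU", "UG"}
    n = len(rna)
    if n == 0:
        return 0, ""

    # best[i][j] holds the maximum number of pairs within rna[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if rna[k] + rna[j] in allowed:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal structure in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if rna[k] + rna[j] in allowed:
                left = best[i][k - 1] if k > i else 0
                if best[i][j] == left + 1 + best[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(fold_max_pairs("GGGAAACCC"))  # (3, '(((...)))')
```

Running it on the viroid sequence above takes a few seconds in plain Python; the result is only useful as a contrast to the energy-based prediction from the web tools.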