Topic 6: Sequence motif searches and protein domain structure analysis II
Multiple alignments:
- by hand
- using a standalone editor such as MACAW (the Multiple Alignment Construction and Analysis Workbench) or BioEdit
- using one of the many automated or semi-automated tools (see Lesson 8)
Tasks
6.1
Construction of a leucine-rich repeat (LRR) profile. Take the file of LRR repeats produced in Task 5.3 and select all sequences of equal length. Display them using the SMS Color Align Conservation utility (lower the identity/similarity threshold to 60% to see something interesting). Then go back to your original repeats file (i.e. all the repeats), open it in BioEdit and try to introduce gaps into the shorter sequences so that all sequences reach the same length while the conserved pattern stays aligned.
Deduce a consensus pattern and write it in the PROSITE format. Then compare this pattern to the "official" LRR profile kept at the SMART site.
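As a rough cross-check of a hand-made pattern, the column-by-column logic can be sketched in Python. This is only an illustration under stated assumptions: the helper names (most_common_length, prosite_consensus), the max_alternatives cut-off and the toy repeats are all invented here, and the sketch handles only ungapped sequences of equal length, not the gapped BioEdit alignment.

```python
from collections import Counter


def most_common_length(seqs):
    """Return only the sequences that share the most frequent length."""
    target, _ = Counter(len(s) for s in seqs).most_common(1)[0]
    return [s for s in seqs if len(s) == target]


def prosite_consensus(seqs, max_alternatives=2):
    """Build a PROSITE-style pattern from equal-length, ungapped sequences.

    A column becomes a single letter if it is fully conserved, a [..] group
    if it holds at most `max_alternatives` distinct residues, and x otherwise.
    The cut-off is an arbitrary assumption of this sketch.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must all have the same length")
    elements = []
    for column in zip(*seqs):
        residues = sorted(set(column))
        if len(residues) == 1:
            elements.append(residues[0])
        elif len(residues) <= max_alternatives:
            elements.append("[" + "".join(residues) + "]")
        else:
            elements.append("x")
    # Collapse runs of x into x(n), which PROSITE syntax allows.
    collapsed = []
    run = 0
    for element in elements + [None]:  # None flushes a trailing run of x
        if element == "x":
            run += 1
            continue
        if run:
            collapsed.append("x" if run == 1 else f"x({run})")
            run = 0
        if element is not None:
            collapsed.append(element)
    return "-".join(collapsed) + "."


# Made-up toy repeats for illustration only, NOT real LRR data.
toy_repeats = ["LANLTSL", "LKNLQEL", "IGNLRHL", "LSNLL"]
equal_length = most_common_length(toy_repeats)
print(prosite_consensus(equal_length))  # prints [IL]-x-N-L-x(2)-L.
```

A pattern derived this way is only as good as the input set: with few sequences it will be too strict, so treat it as a starting point for comparison with the SMART profile rather than a final answer.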
6.2
Below is a collection of a few GDIs and related sequences from yeast, plants and metazoa. Produce an alignment using MACAW (keep the file for future use). Try to define sequence motifs characteristic for all the sequences, as well as significant differences between the GDIs and the Rab escort proteins.
>ScGDI1 gi_6320983_ref_NP_011062.1 Regulates vesicle traffic in secretory pathway... ; Gdi1p [Saccharomyces cerevisiae]
MDQETIDTDYDVIVLGTGITECILSGLLSVDGKKVLHIDKQDHYGGEAASVTLSQLYEKFKQNPISKEER
ESKFGKDRDWNVDLIPKFLMANGELTNILIHTDVTRYVDFKQVSGSYVFKQGKIYKVPANEIEAISSPLM
GIFEKRRMKKFLEWISSYKEDDLSTHQGLDLDKNTMDEVYYKFGLGNSTKEFIGHAMALWTNDDYLQQPA
RPSFERILLYCQSVARYGKSPYLYPMYGLGELPQGFARLSAIYGGTYMLDTPIDEVLYKKDTGKFEGVKT
KLGTFKAPLVIADPTYFPEKCKSTGQRVIRAICILNHPVPNTSNADSLQIIIPQSQLGRKSDIYVAIVSD
AHNVCSKGHYLAIISTIIETDKPHIELEPAFKLLGPIEEKFMGIAELFEPREDGSKDNIYLSRSYDASSH
FESMTDDVKDIYFRVTGHPLVLKQRQEQEKQ
>ScMRS6 gi_6324946_ref_NP_015015.1 protein of the TCD/MRS6 family... (Rab escort protein); Mrs6p [Saccharomyces cerevisiae]
MLSPERRPSMAERRPSFFSFTQNPSPLVVPHLAGIEDPLPATTPDKVDVLIAGTGMVESVLAAALAWQGS
NVLHIDKNDYYGDTSATLTVDQIKRWVNEVNEGSVSCYKNAKLYVSTLIGSGKYSSRDFGIDLSPKILFA
KSDLLSILIKSRVHQYLEFQSLSNFHTYENDCFEKLTNTKQEIFTDQNLPLMTKRNLMKFIKFVLNWEAQ
TEIWQPYAERTMSDFLGEKFKLEKPQVFELIFSIGLCYDLNVKVPEALQRIRRYLTSFDVYGPFPALCSK
YGGPGELSQGFCRSAAVGGATYKLNEKLVSFNPTTKVATFQDGSKVEVSEKVIISPTQAPKDSKHVPQQQ
YQVHRLTCIVENPCTEWFNEGESAAMVVFPPGSLKSGNKEVVQAFILGAGSEICPEGTIVWYLSTTEQGP
RAEMDIDAALEAMEMALLRESSSGLENDEEIVQLTGNGHTIVNSVKLGQSFKEYVPRERLQFLFKLYYTQ
YTSTPPFGVVNSSFFDVNQDLEKKYIPGASDNGVIYTTMPSAEISYDEVVTAAKVLYEKIVGSDDDFFDL
DFEDEDEIQASGVANAEQFENAIDDDDDVNMEGSGEFVGEMEI
>DmGDI gi_480358_pir_S36746 GDP dissociation inhibitor - fruit fly (Drosophila melanogaster)
MDEEYDVDVLGTGLKECILSGIMLSVSGKKVLHIDRNKYYGGESASITPLEELFQRYRTGAARPRFGRGR
DWNVDLIPKFLMANGQLVKLLIHTGVTRYLEFKSIEGSYVYKGGKIAKVPVDQKEALASDLMGMFEKRRF
RNFLIYVQDFREDDPKTWKDFDPTKANMQGLYDKFGLDKNTQDFTGHALALFRDDEYLNEPAVNTIRRIK
LYSDSLARYGKSPYLYPMYGLGELPQGFARLSAIYGGTYMLDKPIDEIVLGEGGKVVGVRSGEEVAKCKQ
VYCDPSYVPRRLRKRGKVIRCICIQDHPGASTKDGLSTQIIIPQKQVGRKSDIYVSLVSSTHQVAAKGWF
VGMVSTTVETENPEVEIKPGLDLLEPIAQKFVTISDYLEPIDDGSESQIFISESYDATTHFETTCWDVLN
IFKRGTGETFDFSKDQGTSWVTRSSKRE
>DmRepP1 gi_17137652_ref_NP_477420.1 Rep-P1; rab escort protein [Drosophila melanogaster]
MLDDLPEQFDLVVIGTGFTESCIAAAGSRIGKSVLHLDSNEYYGDVWSSFSMDALCARLDQEVEPHSALR
NARYTWHSMEKESETDAQSWNRDSVLAKSRRFSLDLCPRILYAAGELVQLLIKSNICRYAEFRAVDHVCM
RHNGEIVSVPCSRSDVFNTKTLTIVEKRLLMKFLTACNDYGEDKCNEDSLEFRGRTFLEYLQAQRVTEKI
SSCVMQAIAMCGPSTSFEEGMQRTQRFLGSLGRYGNTPFLFPMYGCGELPQCFCRLCAVYGGIYCLKRAV
DDIALDSNSNEFLLSSAGKTLRAKNVVSAPGYTPVSKGIELKPHISRGLFISSSPLGNEELNKGGGGVNL
LRLLDNEGGREAFLIQLSHYTGACPEGLYIFHLTTPALSEDPASDLAIFTSQLFDQSDAQIIFSSYFTIA
AQSSKSPAAEHIYYTDPPTYELDYDAAIANARDIFGKMFPDADFLPRAPDPEEIVVDGEDPSALNEHTLP
EDLRAQLHDMQQATQEMDIQE
>HsGDI1 gi_4503971_ref_NP_001484.1 GDP dissociation inhibitor 1; mental retardation, X-linked... [Homo sapiens]
MDEEYDVIVLGTGLTECILSGIMSVNGKKVLHMDRNPYYGGESSSITPLEELYKRFQLLEGPPESMGRGR
DWNVDLIPKFLMANGQLVKMLLYTEVTRYLDFKVVEGSFVYKGGKIYKVPSTETEALASNLMGMFEKRRF
RKFLVFVANFDENDPKTFEGVDPQTTSMRDVYRKFDLGQDVIDFTGHALALYRTDDYLDQPCLETVNRIK
LYSESLARYGKSPYLYPLYGLGELPQGFARLSAIYGGTYMLNKPVDDIIMENGKVVGVKSEGEVARCKQL
ICDPSYIPDRVRKAGQVIRIICILSHPIKNTNDANSCQIIIPQNQVNRKSDIYVCMISYAHNVAAQGKYI
AIASTTVETTDPEKEVEPALELLEPIDQKFVAISDLYEPIDDGCESQVFCSCSYDATTHFETTCNDIKDI
YKRMAGTAFDFENMKRKQNDVFGEAEQ
>HsGDI2 gi_6598323_ref_NP_001485.2 GDP dissociation inhibitor 2; rab GDP-dissociation inhibitor, beta [Homo sapiens]
MNEEYDVIVLGTGLTECILSGIMSVNGKKVLHMDRNPYYGGESASITPLEDLYKRFKIPGSPPESMGRGR
DWNVDLIPKFLMANGQLVKMLLYTEVTRYLDFKVTEGSFVYKGGKIYKVPSTEAEALASSLMGLFEKRRF
RKFLVYVANFDEKDPRTFEGIDPKKTTMRDVYKKFDLGQDVIDFTGHALALYRTDDYLDQPCYETINRIK
LYSESLARYGKSPYLYPLYGLGELPQGFARLSAIYGGTYMLNKPIEEIIVQNGKVIGVKSEGEIARCKQL
ICDPSYVKDRVEKVGQVIRVICILSHPIKNTNDANSCQIIIPQNQVNRKSDIYVCMISFAHNVAAQGKYI
AIVSTTVETKEPEKEIRPALELLEPIEQKFVSISDLLVPKDLGTESQIFISRTYDATTHFETTCDDIKNI
YKRMTGSEFDFEEMKRKKNDIYGED
>HsREP2 gi_4502811_ref_NP_001812.1 choroideremia-like Rab escort protein 2; REP-2 ... [Homo sapiens]
MADNLPTEFDVVIIGTGLPESILAAACSRSGQRVLHIDSRSYYGGNWASFSFSGLLSWLKEYQQNNDIGE
ESTVVWQDLIHETEEAITLRKKDETIQHTEAFPYASQDMEDNVEEIGALQKNPSLGVSNTFTEVLDSALP
EESQLSYFNSDEMPAKHTQKSDTEISLEVTDVEESVEKEKYCGDKTCMHTVSDKDGDKDESKSTVEDKAD
EPIRNRITYSQIVKEGRRFNIDLVSKLLYSQGLLIDLLIKSDVSRYVEFKNVTRILAFREGKVEQVPCSR
ADVFNSKELTMVEKRMLMKFLTFCLEYEQHPDEYQAFRQCSFSEYLKTKKLTPNLQHFVLHSIAMTSESS
CTTIDGLNATKNFLQCLGRFGNTPFLFPLYGQGEIPQGFCRMCAVFGGIYCLRHKVQCFVVDKESGRCKA
IIDHFGQRINAKYFIVEDSYLSEETCSNVQYKQISRAVLITDQSILKTDLDQQTSILIVPPAEPGACAVR
VTELCSSTMTCMKDTYLVHLTCSSSKTAREDLESVVKKLFTPYTETEINEEELTKPRLLWALYFNMRDSS
GISRSSYNGLPSNVYVCSGPDCGLGNEHAVKQAETLFQEIFPTEEFCPPPPNPEDIIFDGDDKQPEAPGT
NNVVMAKLESSEESKNLESPEKHLQN
>CeGDI1 gi_17540906_ref_NP_502788.1 GDI-1 GDP dissociation inhibitor [Caenorhabditis elegans]
MDEEYDAIVLGTGLKECIISGMLSVSGKKVLHIDRNNYYGGESASLTPLEQLYEKFHGPQAKPQQEMGRG
RDWNVDLIPKFLMANGPLVKLLIHTGVTRYLEFKSIEASFVVKGGKIYKVPADEMEALATSLMGMFEKRR
FKKFLVWVQQFDENKEDTWQGLDPHNSTMQQVYEKFGLDENTADFTGHALALYRDDEHKNQPYAPAVEKI
RLYSDSLARYGKSPYLYPLYGLGELPQGFARLSAIYGGTYMLDKPVDEIVMENGKAIGVKCGDEIVRGKQ
IYCDPSYAKDRVKKTGQVVRAICLLNHPIPNTNDAQSCQIIIPQKQVGRHYDIYISCCSNTNMVTPKGWY
LAMVSTTVETANPEAEVLPGLQLLGAIAEKFIQISDVYEPSDLGSESQIFISQSYDATTHFETTCKDVLN
MFERGTTKEFDFTNITHLSLNDQE
>CeY67D2 gi_17556376_ref_NP_497423.1 Y67D2.1.p [Caenorhabditis elegans]
MDEKLPESVDVVVLGTGLPEAILASACARAGLSVLHLDRNEYYGGDWSSFTMSMVHEVTENQVKKLDSSE
ISKLSELLTENEQLIELGNREIVENIEMTWIPRGKDEEKPMKTQLEEASQMRRFSIDLVPKILLSKGAMV
QTLCDSQVSHYAEFKLVNRQLCPTETPEAGITLNPVPCSKGEIFQSNALSILEKRALMKFITFCTQWSTK
DTEEGRKLLAEHADRPFSEFLEQMGVGKTLQSFIINTIGILQQRPTAMTGMLASCQFMDSVGHFGPSPFL
FPLYGCGELSQCFCRLAAVFGSLYCLGRPVQAIVKKDGKITAVIANGDRVNCRYIVMSPRFVPETVPASS
TLKIERIVYATDKSIKEAEKEQLTLLNLASLRPDAAVSRLVEVGFEACTAPKGHFLVHATGTQEGETSVK
TIAEKIFEKNEVEPYWKMSFTANSMKFDTAGAENVVVAPPVDANLHYASVVEECRQLFCTTWPELDFLPR
AMKKEEEEEEEPETEEIAEN
>OsGDI2 AAB69871.1 GDP dissociation inhibitor protein OsGDI2 [Oryza sativa]
MDEEYDLIVLGTGLKECILSGLLSVDGLKVLHMDRNDYYGGDSTSLNLNQLWKRFRGEDKPPAHLGSSKD
YNVDMVPKFMMANGTLVRTLIHTDVTKYLSFKAVDGSYVFSKGKIHKVPATDMEALKSPLMGLFEKRRAR
NFFIYVQDYNEADPKTHQGLDLTTMTTRELIAKYGLSDDTVDFIGHALALHKDDRYLNEPAIDTVKRMKV
YAESLAPFQGGSPSIYPLYGLGELPQGHARLSAVYGGTYILNKPDCKVEFDMEGKVCGVTSEGETAKCKK
VVCDPSYLPNKVRKDRKVARAIAIMSHPIASTNDSHSVQIILPQKQLGRKSDMYVFCCSYTHNVAPKGKF
IAFVSTEAETDNPQSELKPGIDLLGQVDELFFDIYDRYEPVNEPSLDNCFVSTSYDATTHFETTVTDVLN
MYTLITGKAVDLSVDLSAASAAEEY
>OsGDI1 AAB69870.1 GDP dissociation inhibitor protein OsGDI1 [Oryza sativa]
MDEEYDVIVLGTGLKECILSGLLSVDGLKVLHMDRNDYYGGDSTSLNLNQLWKRFRGEDKPPAHLGASRD
YNVDMVPKFMMANGTLVRTLIHTDVTKYLSFKAVDGSYVFSKGKIHKVPATDMEALKSPLMGLFEKRRAR
NFFIYVQDYDEADPKTHQGLDLTTMTTRELIAKYGLSDDTVDFIGHALALHRDDRYLNEPAIDTVKRMKL
YAESLPRFQGGSPSIYPLYGLGELPQGFARLRAVYGGTYMLNKPDCKVEFDMEGKVCGVTSEGESAKCKK
VVCDPSYLPNKVRKIGKVARAIAIMSHPIANTNDSHSVQIILPQKQLGRKSDMYVFGCSYTHNVAPKGKF
IAFVSTEAETDHPESELKPGIDLLGQVDELFFDIYDRYEPVNEPSLDNCFVSTSYDATTHFETTVTDVLN
MYTLITGKTVDLSVDLSAASAAEKY
>OsREP NP_001042697.1 Os01g0269100 [Oryza sativa Japonica Group]
MADAPATGGGFPAQDYPTIDPTSFDVVLCGTGLPESVLAAACAAAGKTVLHVDPNPFYGSLFSSLPLPSL
PSFLSPSPSDDPAPSPSPSSAAAVDLRRRSPYSEVETSGAVPEPSRRFTADLVGPRLLYCADEAVDLLLR
SGGSHHVEFKSVEGGTLLYWDGDLYPVPDSRQAIFKDTTLQLREKNLLFRFFKLVQAHIAASAAGAAAAG
EGEASGRLPDEDLDLPFVEFLKRQNLSPKMRAVVLYAIAMADYDQDGVESCERLLTTREGVKTIALYSSS
IGRFANAEGAFIYPMYGHGELPQAFCRCAAVKGIANASHSTSC
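Once an alignment exists, the two questions in Task 6.2 (what is shared by all sequences, and what separates GDIs from Rab escort proteins) come down to comparing residues column by column. The Python sketch below illustrates that idea. The function name classify_columns is hypothetical, and the input is not a MACAW alignment: it is four short N-terminal fragments copied from the sequences above and assumed, for illustration only, to line up without gaps.

```python
def classify_columns(alignment, groups):
    """Sort alignment columns into shared and group-discriminating ones.

    alignment: dict mapping sequence name -> aligned sequence ('-' = gap)
    groups:    dict mapping sequence name -> group label
    Returns (shared, discriminating) with 0-based column indices.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("aligned sequences must have equal length")
    labels = sorted({groups[name] for name in names})
    shared, discriminating = [], []
    for i in range(length):
        per_group = {
            label: {alignment[n][i] for n in names if groups[n] == label}
            for label in labels
        }
        all_residues = set().union(*per_group.values())
        if "-" in all_residues:
            continue  # skip gapped columns in this simple sketch
        if len(all_residues) == 1:
            # Same residue in every sequence.
            shared.append((i, next(iter(all_residues))))
        elif all(len(res) == 1 for res in per_group.values()):
            # Conserved within each group, but not identical across groups.
            discriminating.append(
                (i, {label: next(iter(res)) for label, res in per_group.items()})
            )
    return shared, discriminating


# Fragments copied from the sequences above; treating them as an ungapped
# alignment is an assumption made for this example.
fragments = {
    "ScGDI1": "DVIVLGTGITECILSGLL",
    "HsGDI1": "DVIVLGTGLTECILSGIM",
    "HsREP2": "DVVIIGTGLPESILAAAC",
    "DmRepP1": "DLVVIGTGFTESCIAAAG",
}
groups = {"ScGDI1": "GDI", "HsGDI1": "GDI", "HsREP2": "REP", "DmRepP1": "REP"}

shared, discriminating = classify_columns(fragments, groups)
print("shared:", shared)
print("discriminating:", discriminating)
```

With only two sequences per group, many columns will look discriminating by chance. The full set of sequences, properly aligned, is needed before drawing conclusions.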
Searching databases of known patterns - a few entry points
Tasks
6.3
Take at least one of the motifs produced in Tasks 6.1 or 6.2 and use it to search Swiss-Prot/UniProt with one of the tools listed above. Restrict the search taxonomically to a group as close to Arabidopsis thaliana as possible. Watch out for pattern formats!
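The warning about pattern formats matters because different tools expect different syntax: a PROSITE pattern is not a regular expression, although the two map onto each other fairly directly. As a hedged illustration, the Python sketch below translates the common PROSITE elements into a regular expression. The function name prosite_to_regex and the example pattern are invented for this sketch. It covers single residues, x, [..], {..}, repeat counts (n) and (n,m), and the < and > anchors, but not rarer constructs such as '>' inside square brackets.

```python
import re

# One PROSITE element: a residue, x, [..] or {..}, optionally followed
# by a repeat count (n) or (n,m).
_ELEMENT = re.compile(r"([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression string."""
    pattern = pattern.strip().rstrip(".")  # the trailing period is optional
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # C-terminal anchor
            suffix, element = "$", element[:-1]
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # excluded residues
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


# Example pattern and test sequence are made up for illustration.
regex = prosite_to_regex("[IL]-x-N-L-x(2)-L.")
print(regex)  # prints [IL].NL.{2}L
print(re.search(regex, "MKLANLTSLQ").group())  # prints LANLTSL
```

Before submitting a pattern to a search tool, check its documentation for the expected format instead of assuming it accepts either form.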