Topic 7: Playing with DNA: gene identification, gene building and PCR primer design

Algorithmic search for coding sequences and intron/exon structures in chromosomal DNA. Use of EST alignments to confirm the splicing pattern. PCR primer design.


How do I find the coding sequences?


Splicing rules


How to do it


PCR primer design

Recommended sites:


Tasks

7.1.

Below is a fragment of an A. thaliana chromosome sequence.

>7_1_genomic
ATTACCATAATTTAATTTGAACTTAATTTTCTCTAGGAATGGTGATGATCCACTACCACTATCATTGATT
TCATTCCATATTCCTTTGACCGACTGAAATTACGTTGGAAATAGTATATTTTGATGAATAATTTATTTAC
TCGGAAAAAAGAGGTCAAGTTATTAATAGTAAGTACATATACATTATCAATTAAGAATTCAATTGAGTTT
TAAGGAAAATCCTATTAATTTGTTTGGTATTCGGTATTTGTTAGTTCTAAGGAATTGAATTTCCCGATTA
TACATCATTATAACGTTCTCAAGTTCCAAACTTGCAACCCACATTTTGTCGATATTCTCAAATGTGAATT
CATTCAATTTCCCATAGAAAACATAAATTTGCACTTAAAGTTAACAATTGAAATCGTATCTAAATGGGAA
TGTTTTTGGCTTTTAGTGTTAGACTTCCAAAGCGTCAAAAATATTTCTAGAAAGAGCACAAAAAATAAGC
AACGCCACTACTTTTGGACAAAGTCAACGATAACACACATCAACCGCACCAGCTCCATAAAAGTCCATCT
CACGAAAACGATTCTAGTCAAACTACCTAAAACACCCTTATATTTACATACAACCCAATCCCACTAACAA
GGGTATTTTCGTCAATCACAAAATTTATCACCGACCCGGGAAGAAGAAGAAGAACAGATCAACTAATTTC
TGCTTTCAACTCCACATTAAACCAAAACCTCCAAAAAGAATCATTTATTTAAATTATCTTCCCGTTTTAA
GTTCCTGAGATTTTTGGGAATTGTAAATTTGAAGAAAATTAAACAAAGACGTGTTTTCATTTTTTTTTTT
GTTTCCTTTATTGATCTCTCTCTATCTCTCTAAATGAGCTAAATCGTTAATGGCTGCCATGTTTAATCAT
CCATGGCCTAATTTAACCCTAATTTACTTCTTCTTCATCGTCGTTTTACCATTCCAATCACTTTCTCAAT
TTGATTCTCCTCAAAATATCGAAACTTTCTTCCCCATCTCTTCACTCTCCCCTGTTCCACCACCGCTTCT
TCCACCTTCGTCAAACCCATCTCCGCCGTCGAATAATTCATCATCTTCGGATAAAAAAACAATCACCAAA
GCTGTCCTTATAACAGCAGCAAGTACTTTACTTGTAGCTGGAGTTTTCTTCTTCTGCCTCCAAAGATGTA
TCATCGCACGGAGACGGAGAGACAGAGTTGGACCAGTCAGAGTCGAAAACACTTTACCTCCGTATCCTCC
TCCTCCGATGACGTCGGCGGCGGTGACTACGACTACTTTGGCTAGAGAAGGATTCACGAGGTTTGGTGGT
GTGAAAGGTTTGATTCTTGATGAGAATGGTCTTGATGTGTTGTATTGGAGAAAGCTACAGAGTCAGAGAG
AAAGAAGTGGGAGTTTCAGGAAACAGATCGTCACCGGAGAAGAAGAAGACGAGAAAGAAGTTATTTATTA
CAAGAACAAGAAGAAAACAGAGCCCGTTACAGAGATTCCTCTTCTTAGAGGAAGATCATCTACTTCTCAC
AGTGTTATCCATAACGAAGATCATCAGCCGCCACCGCAGGTGAAACAGAGTGAACCAACACCACCACCGC
CACCACCGTCAATTGCGGTGAAACAGAGTGCACCAACGCCATCGCCACCTCCTCCGATTAAGAAGGGTTC
TTCACCATCGCCACCGCCACCTCCACCGGTGAAAAAGGTTGGAGCTTTATCATCATCAGCTTCGAAACCA
CCACCTGCGCCGGTTAGAGGAGCAAGTGGAGGAGAGACTTCGAAACAAGTAAAGTTGAAGCCTTTACATT
GGGATAAAGTAAACCCTGATTCCGATCATTCAATGGTTTGGGACAAAATCGATCGTGGATCATTCAGGTA
TATATTTATTTCGAAAGTTAGGGCTTTTGCTTCAATCAATTGAAAAAACCCTAATTTGTTTTTGTTTCTT
CTCAGTTTCGATGGCGATTTAATGGAAGCTCTGTTTGGATACGTTGCCGTGGGGAAGAAATCACCAGAAC
AAGGCGATGAGAAAAACCCTAAATCAACGCAAATATTCATACTTGATCCGAGAAAGTCTCAAAACACAGC
GATTGTGCTCAAATCATTAGGTATGACACGTGAAGAGCTTGTTGAATCACTCATAGAAGGAAACGATTTC
GTGCCAGACACTCTTGAGAGGTTAGCTAGAATAGCTCCAACGAAAGAAGAACAATCAGCCATTCTTGAAT
TCGACGGTGACACGGCAAAGCTTGCTGATGCGGAGACGTTTCTGTTTCATCTTCTTAAATCCGTGCCAAC
CGCGTTTACGAGACTAAACGCGTTTCTCTTTAGGGCTAATTATTATCCAGAGATGGCTCATCATAGCAAA
TGTTTACAAACGTTGGATTTAGCTTGTAAAGAGCTGAGATCTCGTGGCTTGTTTGTGAAGCTTTTGGAGG
CAATACTTAAAGCTGGAAACAGAATGAACGCGGGTACCGCGAGAGGAAACGCTCAAGCGTTTAATCTAAC
CGCGCTTTTGAAGCTTTCGGATGTTAAAAGCGTTGATGGGAAGACTTCTTTGCTTAACTTTGTAGTGGAG
GAAGTTGTTAGATCGGAAGGAAAACGTTGTGTTATGAATAGAAGAAGCCATAGCTTAACACGAAGCGGTA
GTAGTAACTACAATGGTGGTAATAGTAGTCTTCAGGTTATGTCGAAAGAAGAGCAAGAGAAAGAGTACTT
GAAGCTTGGTTTACCAGTTGTTGGTGGATTGAGCTCTGAGTTTTCAAACGTGAAGAAAGCTGCTTGTGTG
GACTATGAAACGGTTGTTGCAACTTGTTCTGCTCTTGCGGTTAGAGCGAAAGATGCGAAAACGGTGATTG
GAGAATGTGAAGATGGAGAAGGAGGGAGGTTTGTGAAAACGATGATGACGTTTCTTGATTCGGTAGAGGA
AGAGGTGAAAATAGCGAAAGGTGAAGAGAGGAAAGTGATGGAGCTTGTGAAACGTACAACGGATTATTAT
CAAGCAGGAGCTGTTACAAAGGGGAAGAATCCACTTCATTTGTTTGTTATCGTTAGAGATTTTCTTGCCA
TGGTTGATAAAGTTTGCTTAGATATTATGAGAAATATGCAGAGGAGGAAGGTTGGTAGTCCGATATCGCC
TTCTTCGCAGCGGAATGCGGTGAAATTCCCGGTTTTGCCTCCGAATTTCATGTCGGACAGAGCTTGGAGT
GATTCTGGTGGGTCGGATTCTGATATGTGAGAGTCAAGATTTGTTATATGTAAATACTAAATAGTAGAAG
CATTTTGGGTATTGATTAGCATTGAAAGATGTTGAATTGTTTATAGATTTATCAGTCCAAAGCATTGGAC
TTGAGTATAATTTGTTCCTTGTATAAATAAACAATTTTGCTTTAAGACCTTTCCATGTTTATGAACATGT
CTTCTTTAACTTCACATAGACCTTTTGTTTACGTAAGAACTAATAATACTAAATTGTTTGATAATTCTAA
ATGTGAAAGTGAACCACTATATAGTGTGAACTTGGCTTTATTGAATTCTTTTTAAAAAAATTTCTCCAGA
GCTTTAGATGTAGGAGTTAATATTTTCACCTAACATAGCCTCTTTTTTATGTTTCTCTATCAACTAACAC
TAAATTTGTGGATGAAGACTAAATTAACATAAGTTTATCTATTAACTAACAACCTACCAGTTTGATGCTT
GTAAATATGAAACTTCAACGTTATAAAGACTATATGGTGTGAACTTTTTATCCATCTTTATTGACTTTTA
AAATTTTCTTAATTTGAGTAAACAAAAGCAGAAGCTTTTTAAAGGATGCAGGAGTTGATTTTTGTATATG
AACAAAACATATACTTCTCCCTTAGACGAATTTGGAGCTATCATTCTTGGTTTCAAACTTTTTAATAATT
TGAGCTTTAAAGCAAAATGGCAACTTTATATTGATCACTAGTCCACAACACTTTCTCTGCCTTTTCCTCA
ATAGCAACGCGTAGTCAAGAAGAAGAACGTGTTTAACATGGACCAATCTTGATTAAGATAATAGTATGAT
CAAATGCTTATATAAACACACTAAAAAGGAATCAAATTTAA

Use GenScan to identify the open reading frame(s) within this sequence and save the predicted CDS (as DNA, not protein!) as a FASTA file.
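
Before moving on, it can help to sanity-check the saved CDS. The minimal Python sketch below is an illustrative helper (not part of GenScan; all function names are made up for this example). It parses FASTA text, translates the CDS with the standard genetic code and flags a length that is not a multiple of three, a missing ATG start, or internal stop codons. A partial gene prediction may legitimately fail the ATG check.

```python
# Minimal sanity check for a predicted CDS (illustrative helpers, not part of GenScan).

# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def read_fasta(text):
    """Parse FASTA text into {header: sequence}; '>' is dropped, sequences upper-cased."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks).upper()
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks).upper()
    return records


def translate(cds):
    """Translate codon by codon; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def check_cds(cds):
    """Return a list of problems; an empty list means the CDS looks plausible."""
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length %d is not a multiple of 3" % len(cds))
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    protein = translate(cds)
    # A single terminal stop is fine; any other '*' is an in-frame stop codon.
    body = protein[:-1] if protein.endswith("*") else protein
    if "*" in body:
        problems.append("internal stop codon at codon %d" % (body.index("*") + 1))
    return problems


if __name__ == "__main__":
    demo = ">toy_cds\nATGGCT\nTAA\n"  # toy record, not the real prediction
    for name, seq in read_fasta(demo).items():
        print(name, translate(seq), check_cds(seq) or "OK")
```

To check your real prediction, pass the contents of your saved FASTA file to read_fasta() instead of the toy record.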


7.2.

Use at least one (preferably more) of the other prediction servers demonstrated to obtain an independent splicing prediction for the same sequence. Again, save the prediction as a DNA FASTA file.

HINT: to locate splice sites given as coordinates (such as those produced by NetGene2 or GeneBuilder), use SMS GroupDNA to number your sequence.
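
Some servers report exons or splice sites only as coordinates. The minimal Python sketch below (illustrative helpers, not part of SMS, NetGene2 or GeneBuilder) shows the bookkeeping involved: a numbered view of the sequence in blocks of ten, similar in spirit to GroupDNA output; the first and last two bases of each implied intron, so you can check the canonical GT...AG rule; and the spliced CDS written out as FASTA. It assumes 1-based, inclusive (start, end) exon coordinates on the plus strand; check how each server actually reports its positions. The toy sequence and coordinates are made up.

```python
# Bookkeeping for coordinate-based predictions (illustrative helpers).
# Assumes 1-based, inclusive (start, end) exon coordinates on the plus strand.


def reverse_complement(seq):
    """Reverse complement of a DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def number_sequence(seq, group=10, per_line=60):
    """Show seq in blocks of `group` bases, each line prefixed with the
    1-based position of its first base."""
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        blocks = " ".join(chunk[i:i + group] for i in range(0, len(chunk), group))
        lines.append("%6d %s" % (start + 1, blocks))
    return "\n".join(lines)


def intron_ends(genomic, exons):
    """Plus-strand genes only: for each intron between consecutive exons,
    return (start, end, first 2 nt, last 2 nt)."""
    exons = sorted(exons)
    result = []
    for (_, end1), (start2, _) in zip(exons, exons[1:]):
        intron = genomic[end1:start2 - 1]  # 1-based positions end1+1 .. start2-1
        result.append((end1 + 1, start2 - 1, intron[:2], intron[-2:]))
    return result


def splice(genomic, exons, strand="+"):
    """Join exons in genomic order; reverse-complement the result for '-' strand genes."""
    seq = "".join(genomic[start - 1:end] for start, end in sorted(exons))
    return reverse_complement(seq) if strand == "-" else seq


def to_fasta(header, seq, width=60):
    """Format one FASTA record with `width` bases per line."""
    lines = [">" + header] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    genomic = "CCATGGCCGTAAGTTTCAGGCTTAACC"  # toy sequence, not the task sequence
    exons = [(3, 8), (20, 25)]               # made-up exon coordinates
    print(number_sequence(genomic))
    for start, end, donor, acceptor in intron_ends(genomic, exons):
        rule = "GT...AG" if (donor, acceptor) == ("GT", "AG") else "non-canonical"
        print("intron %d-%d: %s...%s (%s)" % (start, end, donor, acceptor, rule))
    print(to_fasta("toy_prediction", splice(genomic, exons)), end="")
```

An intron that does not start with GT and end with AG is not impossible, but it is a good reason to look more closely at that particular prediction.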


7.3.

Align the two predictions to the genomic sequence using Macaw and compare the results. Then validate your prediction against the experimentally determined mRNA sequence below. Also include in your alignment the best-matching Arabidopsis ESTs from Task 3.3, which should come from the same gene (you may have to reverse-complement some of these sequences first). Finally, produce an experimentally supported ORF prediction and the corresponding predicted protein sequence.

>mRNA gi_23270371 Arabidopsis thaliana At1g70140 mRNA sequence
AACTCCACATTAAACCAAAACCTCCAAAAAGAATCATTTATTTAAATTATCTTCCCGTTTTAAGTTCCTG
AGATTTTTGGGAATTGTAAATTTGAAGAAAATTAAACAAAGACGTGTTTTCATTTTTTTTTTTGTTTCCT
TTATTGATCTCTCTCTATCTCTCTAAATGAGCTAAATCGTTAATGGCTGCCATGTTTAATCATCCATGGC
CTAATTTAACCCTAATTTACTTCTTCTTCATCGTCGTTTTACCATTCCAATCACTTTCTCAATTTGATTC
TCCTCAAAATATCGAAACTTTCTTCCCCATCTCTTCACTCTCCCCTGTTCCACCACCGCTTCTTCCACCT
TCGTCAAACCCATCTCCGCCGTCGAATAATTCATCATCTTCGGATAAAAAAACAATCACCAAAGCTGTCC
TTATAACAGCAGCAAGTACTTTACTTGTAGCTGGAGTTTTCTTCTTCTGCCTCCAAAGATGTATCATCGC
ACGGAGACGGAGAGACAGAGTTGGACCAGTCAGAGTCGAAAACACTTTACCTCCGTATCCTCCTCCTCCG
ATGACGTCGGCGGCGGTGACTACGACTACTTTGGCTAGAGAAGGATTCACGAGGTTTGGTGGTGTGAAAG
GTTTGATTCTTGATGAGAATGGTCTTGATGTGTTGTATTGGAGAAAGCTACAGAGTCAGAGAGAAAGAAG
TGGGAGTTTCAGGAAACAGATCGTCACCGGAGAAGAAGAAGACGAGAAAGAAGTTATTTATTACAAGAAC
AAGAAGAAAACAGAGCCCGTTACAGAGATTCCTCTTCTTAGAGGAAGATCATCTACTTCTCACAGTGTTA
TCCATAACGAAGATCATCAGCCGCCACCGCAGGTGAAACAGAGTGAACCAACACCACCACCGCCACCACC
GTCAATTGCGGTGAAACAGAGTGCACCAACGCCATCGCCACCTCCTCCGATTAAGAAGGGTTCTTCACCA
TCGCCACCGCCACCTCCACCGGTGAAAAAGGTTGGAGCTTTATCATCATCAGCTTCGAAACCACCACCTG
CGCCGGTTAGAGGAGCAAGTGGAGGAGAGACTTCGAAACAAGTAAAGTTGAAGCCTTTACATTGGGATAA
AGTAAACCCTGATTCCGATCATTCAATGGTTTGGGACAAAATCGATCGTGGATCATTCAGTTTCGATGGC
GATTTAATGGAAGCTCTGTTTGGATACGTTGCCGTGGGGAAGAAATCACCAGAACAAGGCGATGAGAAAA
ACCCTAAATCAACGCAAATATTCATACTTGATCCGAGAAAGTCTCAAAACACAGCGATTGTGCTCAAATC
ATTAGGTATGACACGTGAAGAGCTTGTTGAATCACTCATAGAAGGAAACGATTTCGTGCCAGACACTCTT
GAGAGGTTAGCTAGAATAGCTCCAACGAAAGAAGAACAATCAGCCATTCTTGAATTCGACGGTGACACGG
CAAAGCTTGCTGATGCGGAGACGTTTCTGTTTCATCTTCTTAAATCCGTGCCAACCGCGTTTACGAGACT
AAACGCGTTTCTCTTTAGGGCTAATTATTATCCAGAGATGGCTCATCATAGCAAATGTTTACAAACGTTG
GATTTAGCTTGTAAAGAGCTGAGATCTCGTGGCTTGTTTGTGAAGCTTTTGGAGGCAATACTTAAAGCTG
GAAACAGAATGAACGCGGGTACCGCGAGAGGAAACGCTCAAGCGTTTAATCTAACCGCGCTTTTGAAGCT
TTCGGATGTTAAAAGCGTTGATGGGAAGACTTCTTTGCTTAACTTTGTAGTGGAGGAAGTTGTTAGATCG
GAAGGAAAACGTTGTGTTATGAATAGAAGAAGCCATAGCTTAACACGAAGCGGTAGTAGTAACTACAATG
GTGGTAATAGTAGTCTTCAGGTTATGTCGAAAGAAGAGCAAGAGAAAGAGTACTTGAAGCTTGGTTTACC
AGTTGTTGGTGGATTGAGCTCTGAGTTTTCAAACGTGAAGAAAGCTGCTTGTGTGGACTATGAAACGGTT
GTTGCAACTTGTTCTGCTCTTGCGGTTAGAGCGAAAGATGCGAAAACGGTGATTGGAGAATGTGAAGATG
GAGAAGGAGGGAGGTTTGTGAAAACGATGATGACGTTTCTTGATTCGGTAGAGGAAGAGGTGAAAATAGC
GAAAAAAAAAAAAAAAA 
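
Two small operations recur in this task: reverse-complementing ESTs that read the opposite strand, and locating the ORF in the mRNA so it can be compared with your prediction. The minimal Python sketch below assumes the standard genetic code and defines the ORF naively as the longest ATG-to-stop stretch in the three forward frames; an ORF without a stop codon before the end of the sequence is ignored. The helper names and the demo read are made up; paste the mRNA above (or an EST) to use it on real data.

```python
# Reverse-complement ESTs and find the longest ORF in an mRNA (naive definition).

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"  # NCBI table 1
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of a DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def translate(cds):
    """Translate codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def longest_orf(seq):
    """Return (start, end, protein) of the longest ATG...stop ORF in the three
    forward frames; start/end are 0-based, end-exclusive, stop codon included."""
    seq = seq.upper()
    best = (0, 0, "")
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # first ATG opens an ORF in this frame
            elif CODON_TABLE.get(codon) == "*":
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3, translate(seq[start:i + 3]))
                start = None  # look for the next ATG downstream
    return best


if __name__ == "__main__":
    est = reverse_complement("GGCTAAAATTTCATGG")  # toy antisense read
    print(est, longest_orf(est))
```

Running longest_orf() on the mRNA and on each (possibly reverse-complemented) EST shows which strand carries the ORF. If your predicted CDS is correct and the mRNA is complete over that region, the CDS string should also occur verbatim in the mRNA (`cds in mrna`).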

 


7.4.

Design PCR primers for detecting a 400 to 700 bp diagnostic fragment of the cDNA assembled in Task 7.3, using NCBI Primer-BLAST or Primer3. Try to choose primers that distinguish a product amplified from the cDNA from one amplified from contaminating genomic DNA in an RT-PCR experiment (hint: the Macaw alignment comes in handy here). Present a graphical map of the locus showing the primer positions and the cDNA exons mapped onto the genomic locus.
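
Before ordering primers suggested by Primer-BLAST or Primer3, a quick local check of where they bind can confirm that they discriminate cDNA from genomic DNA. Primers in different exons give a longer product from genomic DNA (the intron is included), while a primer spanning an exon-exon junction should not match the genomic sequence exactly at all. The sketch below is an illustration with made-up helper names. It assumes exact-match binding only and writes the reverse primer 5'->3' as the tools report it. It uses the simple GC-based formula Tm = 64.9 + 41(G+C-16.4)/N as a rough guide; Primer3 (and hence Primer-BLAST) uses nearest-neighbour thermodynamics by default, so their Tm values will differ.

```python
# Rough primer checks: GC content, basic Tm, and expected product size
# on cDNA vs genomic DNA. Exact string matching only; partial binding is not modelled.


def reverse_complement(seq):
    """Reverse complement of a DNA string (N stays N)."""
    return seq.translate(str.maketrans("ACGTNacgtn", "TGCANtgcan"))[::-1]


def gc_content(primer):
    """Fraction of G + C in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def tm_basic(primer):
    """Basic GC formula, Tm = 64.9 + 41 * (G + C - 16.4) / N, in degrees C; rough guide only."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(primer)


def amplicon_length(template, fwd, rev):
    """Product length for fwd (sense) and rev (antisense, given 5'->3'),
    or None if either site is missing or the sites are in the wrong order."""
    template = template.upper()
    start = template.find(fwd.upper())
    site = template.find(reverse_complement(rev.upper()))
    if start < 0 or site < start:
        return None
    return site + len(rev) - start


if __name__ == "__main__":
    fwd, rev = "GATTACA", "GCATGCA"          # toy primers, far shorter than real ones
    cdna = "GATTACA" + "CCCCC" + "TGCATGC"   # spliced toy transcript
    genomic = "GATTACA" + "CCCCC" + "A" * 100 + "TGCATGC"  # 100-nt insert standing in for an intron
    print(amplicon_length(cdna, fwd, rev), amplicon_length(genomic, fwd, rev))
    print(round(gc_content("ATGCATGCATGCATGCATGC"), 2), round(tm_basic("ATGCATGCATGCATGCATGC"), 1))
```

With the real data, use the cDNA assembled in Task 7.3 and the genomic fragment from Task 7.1 as templates: a product size that differs between the two, or no genomic product at all, is what makes the primer pair diagnostic.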