Topic 10: Phylogenetic trees
Construction and critical interpretation of trees based on a pre-existing alignment.
Problem of meaningful data selection. Construction of trees using ClustalX and PHYLIP.
Considerations
- A tree is only as good as the alignment you started with
- Gaps are not data: throw them out!
- A tree without statistics is worthless: do bootstrapping (500 replicates are better than 100) and plot the bootstrap values onto the branches of the tree calculated from the original data.
- Always present your tree together with the method used to build it (neighbor-joining, NJ, is OK for a first look at sequence data).
- There is a difference between a rooted tree and an unrooted one: only a rooted tree shows where the common ancestor lies, and hence the direction of evolution.
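What bootstrapping actually does: each replicate is a pseudo-alignment of the same length, built by drawing alignment columns at random with replacement; a tree is built from every replicate, and the bootstrap value of a branch is the percentage of replicate trees that contain the same group. The Python sketch below shows only the resampling and the final percentage (not the tree building). The function names and toy sequences are made up for illustration; this is not the code of SEQBOOT or ClustalX.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one pseudo-alignment built from columns sampled with replacement.

    alignment: dict mapping sequence name -> aligned sequence (all equal length).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    # Draw `length` column indices with replacement: some columns repeat, some vanish
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in cols) for name in names}


def support_percent(clade, replicate_clades):
    """Percentage of replicate trees whose set of clades contains `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


if __name__ == "__main__":
    rng = random.Random(1)
    aln = {"seqA": "MKTAYIAK", "seqB": "MKTAHIAK", "seqC": "MRSAYLAK"}
    replicates = [bootstrap_replicate(aln, rng) for _ in range(100)]
    print(replicates[0])
    # Hypothetical clade sets, as if read from 4 replicate trees
    clades = [{frozenset({"seqA", "seqB"})}] * 3 + [{frozenset({"seqA", "seqC"})}]
    print(support_percent(frozenset({"seqA", "seqB"}), clades))  # 75.0
```

The same percentage is what CONSENSE reports for each group, and why 500 replicates give a steadier estimate than 100.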
Methods
Read the documentation of the programs described below for the theory. The Treecon manual (available from the entry page) is a good starting point.
- Slow, step-by-step approach: PHYLIP
- Making and plotting a tree
- Prepare your alignment in PHYLIP format and name the file infile (PHYLIP programs read their input from a file of this name by default).
- Calculate pairwise distances between the sequences using PROTDIST. You can choose your substitution matrix (e.g. PAM).
- Rename the old infile (e.g. to infile0), rename outfile to infile and feed it into the program NEIGHBOR, which calculates the tree topology. Here you can decide whether your tree should be rooted. There will be TWO output files: outfile and outtree.
- Rename outfile to get it out of the way (I suggest outfile1). Use outtree to plot your tree in a graphical format. To do this, rename outtree to intree and then:
- If the tree was unrooted, use DRAWTREE.
- If it was rooted, use DRAWGRAM.
- Save the output (plotfile) in an appropriate graphics format and rename it to something meaningful. Rename the tree file (now called intree) to get it out of the way (or to keep it).
- Bootstrapping the tree
- Return to the original alignment file and use it as infile for SEQBOOT to prepare a bootstrapped data set (only 100 replicates in the course). Then rename this infile to get it out of the way and rename outfile to infile.
- Run PROTDIST (this is the most time-consuming step) and NEIGHBOR as above, but select the multiple data sets option (100 sets) in each program.
- Rename outtree to intree and calculate the consensus tree using CONSENSE.
- The outfile of this step contains your tree in text format, including the bootstrap values.
- Manually add the bootstrap values to the nodes of the tree produced from the original data (conventionally only those over 50 %).
- Let the machine do it (almost no user-defined options, very fast but maybe less accurate): ClustalX
- Import your alignment in *.aln format into ClustalX (unlike in other cases, you do not have to worry about gaps ... you'll soon see why).
- Exclude positions with gaps (Tree menu) if you want to (ADVISABLE).
- Run Bootstrap N-J TREE (and notice how much faster it is: you can even afford 1000 replicates!). Examine the output file (*.phb).
- Use NJPLOT to visualize the tree.
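Both NEIGHBOR (in its default mode) and the ClustalX N-J option turn a distance matrix into a tree by neighbor-joining. To show what happens to the PROTDIST matrix, here is a minimal, unoptimized Python sketch of the textbook neighbor-joining algorithm. It is not PHYLIP's or ClustalX's code, it ignores their options (outgroup rooting, input order, UPGMA), and the function name and example matrix are made up for illustration.

```python
def neighbor_joining(names, dist):
    """Textbook neighbor-joining; returns an unrooted tree as a Newick string.

    names: list of taxon labels; dist: symmetric distance matrix (list of lists).
    """
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # d[i][j]: current distances; labels[i]: Newick text of node i
    d = {i: {j: dist[i][j] for j in range(len(names))} for i in range(len(names))}
    labels = {i: name for i, name in enumerate(names)}
    active = list(range(len(names)))
    next_id = len(names)
    while len(active) > 3:
        n = len(active)
        r = {i: sum(d[i][k] for k in active if k != i) for i in active}
        # Q-criterion: join the pair minimising (n - 2) * d(i,j) - r(i) - r(j)
        best = None
        for pos, i in enumerate(active):
            for j in active[pos + 1:]:
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node u to i and to j
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = next_id
        next_id += 1
        labels[u] = f"({labels[i]}:{li:.4f},{labels[j]}:{lj:.4f})"
        # Distances from u to every remaining node
        d[u] = {u: 0.0}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    # Three nodes left: connect them to one central node
    a, b, c = active
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = d[a][b] - la
    lc = d[a][c] - la
    return f"({labels[a]}:{la:.4f},{labels[b]}:{lb:.4f},{labels[c]}:{lc:.4f});"


if __name__ == "__main__":
    # Small made-up additive matrix for five taxa a-e
    taxa = ["a", "b", "c", "d", "e"]
    matrix = [[0, 5, 9, 9, 8],
              [5, 0, 10, 10, 9],
              [9, 10, 0, 8, 7],
              [9, 10, 8, 0, 3],
              [8, 9, 7, 3, 0]]
    print(neighbor_joining(taxa, matrix))
    # (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

Because this toy matrix is perfectly additive, the sketch recovers the branch lengths exactly; real PROTDIST distances are not additive, so the lengths in your tree are only estimates. Note that the output is unrooted, which is why DRAWTREE rather than DRAWGRAM is the right plotting program for it.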
GOOD LUCK!
Tasks
From the previous lessons (8 and 9), you should have at least two alignments obtained by two different methods, preferably one of them including some manual work (e.g. a MACAW-based alignment or the output of an automated tool fine-tuned in BioEdit). Each of these two alignments should be available in 3 formats:
1. FASTA (*.fst or *.txt) or CLUSTAL (*.ALN)
2. Phylip interleaved/Phylip 4 (*.phy)
3. Phylip interleaved/Phylip 4 with all columns containing gaps removed (*.phy)
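Formats 2 and 3 can also be produced from an aligned FASTA file with a few lines of Python instead of by hand. A minimal sketch, assuming unique sequence names and equal-length aligned sequences, treating only "-" and "." as gaps, and cutting names to the 10 characters that strict PHYLIP format allows; the function names are hypothetical.

```python
def read_fasta(text):
    """Parse aligned FASTA text into {name: sequence}; name = first word of header."""
    seqs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = []
        elif name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def drop_gap_columns(alignment, gap_chars="-."):
    """Remove every column in which at least one sequence has a gap."""
    names = list(alignment)
    length = len(alignment[names[0]])
    keep = [i for i in range(length)
            if not any(alignment[n][i] in gap_chars for n in names)]
    return {n: "".join(alignment[n][i] for i in keep) for n in names}


def to_phylip_interleaved(alignment, width=60):
    """Format an alignment as interleaved PHYLIP text with 10-character names."""
    names = list(alignment)
    length = len(alignment[names[0]])
    lines = [f"{len(names)} {length}"]
    for start in range(0, length, width):
        if start > 0:
            lines.append("")  # blank line between blocks
        for n in names:
            # Only the first block carries names, padded or truncated to 10 characters
            prefix = n[:10].ljust(10) if start == 0 else ""
            lines.append(prefix + alignment[n][start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    fasta = ">seqA\nMK-TAYIAK\n>seqB\nMRQTA-LAK\n"
    ungapped = drop_gap_columns(read_fasta(fasta))
    print(to_phylip_interleaved(ungapped), end="")
    # 2 7
    # seqA      MKTAIAK
    # seqB      MRTALAK
```

If two of your names share the same first 10 characters, rename them before conversion, otherwise the PHYLIP programs cannot tell them apart.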
If you do not have these data from your own work, use the model data set.
NOTE: you do not need to do all the tasks, but you should try each demonstrated method at least once.
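Tasks 10.1 to 10.3 all ask you to compare trees. Besides comparing them by eye, one simple objective measure (optional, not part of the course protocol) is to list the splits of each unrooted tree, i.e. the two groups of taxa separated by each internal branch, and count the splits found in only one of the two trees (the Robinson-Foulds distance). A minimal sketch, assuming you have transcribed each topology by hand into nested Python tuples (branch lengths and bootstrap values ignored); the function names are hypothetical.

```python
def leaves(tree):
    """All taxon names below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial splits of an unrooted tree given as nested tuples."""
    taxa = leaves(tree)
    ref = min(taxa)  # always store the side WITHOUT this taxon, so splits compare equal
    result = set()

    def walk(node):
        if isinstance(node, str):
            return
        for child in node:
            walk(child)
        clade = leaves(node)
        side = clade if ref not in clade else taxa - clade
        # Skip the root and splits that cut off a single taxon (trivial)
        if 1 < len(side) < len(taxa) - 1:
            result.add(side)

    walk(tree)
    return result


def robinson_foulds(tree1, tree2):
    """Number of splits present in one tree but not in the other."""
    if leaves(tree1) != leaves(tree2):
        raise ValueError("trees must contain the same taxa")
    return len(splits(tree1) ^ splits(tree2))


if __name__ == "__main__":
    manual = (("A", "B"), "C", ("D", "E"))
    automated = (("A", "C"), "B", ("D", "E"))
    rerooted = ("A", ("B", ("C", ("D", "E"))))
    print(robinson_foulds(manual, automated))  # 2
    print(robinson_foulds(manual, rerooted))   # 0
```

A distance of 0 means the same unrooted topology even if the trees are drawn or rooted differently, which is easy to miss when comparing plots side by side.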
10.1
Using the same method, compare the trees produced from a manual alignment and an automated one (use ungapped data or ungapped tree calculation in ClustalX).
10.2
Using the same method, compare the trees produced from a gapped and an ungapped alignment (use data produced by the same approach).
10.3
Compare the trees produced from the same alignment by two different methods (use ungapped data or ungapped tree calculation in ClustalX).
10.4 Bonus: how to really construct a phylogenetic tree using MEGA
... but MEGA is also a bit dangerous...
A good detailed protocol can be found here.
A quick and dirty minimalistic way to a reasonable maximum likelihood tree from your existing alignment in FASTA format:
- Open MEGA and open a file session with your (preferably gapless) file. Choose to "Analyze" rather than "Align" the sequences and specify what kind of sequences they are (e.g. nucleotide or protein).
- Go to Analysis - Phylogeny - Construct maximum likelihood tree. In the resulting menu, choose the method, the number of bootstrap replicates (500 in real life, 100 in the course) and the treatment of gaps if they are present in your data (if your data are gapless, use all sites), then run the analysis.
- In the resulting tree, explore alternative views, topologies and rooting if appropriate. Save the image you want to keep as a PDF (from the Image menu).
MEGA can do much more ... this is just a start! If you wish to continue working on the project, save your session (but note that you will then have to reopen it on the same system).
- Finalization of anything left behind from previous lessons.
- Evaluation of results.
- General discussion.