PhyML-20130708

DESCRIPTION
  "A simple, fast, and accurate algorithm to estimate
  large phylogenies by maximum likelihood"

  Stephane Guindon and Olivier Gascuel,
  Systematic Biology 52(5):696-704, 2003.

  Please cite this paper if you use this software in your publications.


PARAMETERS

-i (or --input) seq_file_name
  seq_file_name is the name of the nucleotide or amino-acid sequence file in PHYLIP format.


-d (or --datatype) data_type
  data_type is 'nt' for nucleotide sequences (default), 'aa' for amino-acid sequences, or 'generic'
  (the sequence file must then be in NEXUS format, with the 'symbols' parameter listing the character states).


-q (or --sequential)
  Reads the sequence file in sequential format instead of the default interleaved format.

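The difference between the default interleaved layout and the sequential layout selected by -q can be sketched with a small Python writer. This is a minimal sketch, assuming a relaxed PHYLIP layout in which a header line gives the number of taxa and sites and each taxon name is separated from its sequence by whitespace; the function to_phylip and its width parameter are hypothetical, and strict PHYLIP (fixed 10-character names) is not handled.

```python
def to_phylip(seqs: dict[str, str], interleaved: bool = True, width: int = 60) -> str:
    """Return seqs as PHYLIP text (hypothetical helper; relaxed names assumed)."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    n_sites = lengths.pop()
    lines = [f"{len(seqs)} {n_sites}"]  # header: number of taxa, number of sites
    if not interleaved:
        # Sequential layout (-q): each full sequence follows its name.
        lines += [f"{name}  {seq}" for name, seq in seqs.items()]
        return "\n".join(lines) + "\n"
    # Interleaved layout (default): names only in the first block,
    # later blocks continue each sequence in the same taxon order.
    for start in range(0, n_sites, width):
        if start > 0:
            lines.append("")  # blank line separating blocks
        for name, seq in seqs.items():
            chunk = seq[start:start + width]
            lines.append(f"{name}  {chunk}" if start == 0 else chunk)
    return "\n".join(lines) + "\n"
```

For example, to_phylip({"t1": "ACGTAC", "t2": "ACGTTC"}, interleaved=False) produces text to be read with -q, while the interleaved output is read without it.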

-n (or --multiple) nb_data_sets
  nb_data_sets is an integer corresponding to the number of data sets to analyse.


-p (or --pars)
  Use a minimum parsimony starting tree. This option only takes effect when no starting tree
  is given with '-u' and the tree topology is to be optimised.

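The text above does not say how PhyML builds its parsimony starting tree. As a hedged illustration of what "minimum parsimony" measures, the sketch below computes Fitch's parsimony score (the minimum number of state changes) of a fixed binary tree. Encoding the tree as nested tuples and the names fitch and parsimony_score are assumptions for illustration only.

```python
def fitch(tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Fitch's algorithm for one site: returns (candidate state set, number of changes)."""
    if isinstance(tree, str):  # leaf: its observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:  # children can agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change required


def parsimony_score(tree, seqs: dict[str, str]) -> int:
    """Sum of Fitch scores over all alignment columns."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch(tree, {name: s[i] for name, s in seqs.items()})[1]
               for i in range(n_sites))
```

The Fitch score of an unrooted tree does not depend on where it is rooted, so rooting it arbitrarily as nested pairs is harmless; a parsimony starting tree is one that keeps this score low.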

-b (or --bootstrap) int
  int > 0: int is the number of bootstrap replicates.
  int = 0: neither approximate likelihood ratio test nor bootstrap values are computed.
  int = -1: approximate likelihood ratio test returning aLRT statistics.
  int = -2: approximate likelihood ratio test returning Chi2-based parametric branch supports.
  int = -4: SH-like branch supports alone.
  int = -5: (default) approximate Bayes branch supports.

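For int > 0, each replicate in the standard nonparametric bootstrap is a pseudo-alignment built by drawing alignment columns with replacement; the tree is re-estimated on every replicate, and a branch's support is the fraction of replicate trees containing it. The Python sketch below shows only the resampling step under that assumption; bootstrap_replicate is a hypothetical helper, not PhyML code.

```python
import random


def bootstrap_replicate(seqs: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Draw alignment columns with replacement (nonparametric bootstrap)."""
    n_sites = len(next(iter(seqs.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]  # same length as original
    # The same column indices are used for every taxon, so columns stay intact.
    return {name: "".join(seq[c] for c in cols) for name, seq in seqs.items()}
```

Passing a seeded random.Random makes a replicate reproducible.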

-m (or --model) model
  model : substitution model name.
  - Nucleotide-based models : HKY85 (default) | JC69 | K80 | F81 | F84 | TN93 | GTR | custom (*)

    (*) : for the custom option, a string of six digits identifies the model. Positions holding
    the same digit share the same relative substitution rate. For instance, 000000 corresponds
    to F81 (or JC69 provided the distribution of nucleotide frequencies is uniform), and 012345
    corresponds to GTR. This option can be used to encode any model that is nested within GTR.

  - Amino-acid based models : LG (default) | WAG | JTT | MtREV | Dayhoff | DCMut | RtREV | CpREV | VT |
    Blosum62 | MtMam | MtArt | HIVw | HIVb | custom

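A custom string can be decoded by grouping the nucleotide pairs that share a digit. The sketch below assumes that the six positions correspond, in order, to the A-C, A-G, A-T, C-G, C-T and G-T exchange rates; that order is an assumption, since the text above does not state it. Under that order, 010010 separates the two transitions (A-G and C-T) from the four transversions, which is the rate structure of K80 and HKY85. Both function names are hypothetical.

```python
# Assumed order of the six exchange rates encoded by the custom string.
PAIRS = ("A-C", "A-G", "A-T", "C-G", "C-T", "G-T")


def parse_custom_model(code: str) -> dict[str, list[str]]:
    """Group nucleotide pairs that share a relative rate (same digit)."""
    if len(code) != 6 or any(ch not in "0123456789" for ch in code):
        raise ValueError("a custom model is a string of six digits")
    groups: dict[str, list[str]] = {}
    for digit, pair in zip(code, PAIRS):
        groups.setdefault(digit, []).append(pair)
    return groups


def n_free_rates(code: str) -> int:
    """Rate classes minus one: one class serves as the fixed reference rate."""
    return len(parse_custom_model(code)) - 1
```

n_free_rates counts the rate parameters left to estimate once one rate class is fixed as the reference: 0 for 000000 and 5 for 012345.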

--aa_rate_file filename
  filename is the name of the file that provides the amino-acid substitution rate matrix in PAML format.
  This option is compulsory when analysing amino-acid sequences with the 'custom' model.

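The PAML format is not spelled out here. As a hedged sketch, assuming the common PAML layout (the 190 exchangeabilities of the lower triangle, row by row, followed by the 20 equilibrium frequencies, with amino acids in the order A R N D C Q E G H I L K M F P S T W Y V), a rate file could be read as follows. read_paml_matrix is a hypothetical helper, and anything after the 210th number (such as free-text notes) is ignored.

```python
AA_ORDER = "ARNDCQEGHILKMFPSTWYV"  # assumed PAML amino-acid order


def read_paml_matrix(text: str) -> tuple[list[list[float]], list[float]]:
    """Return (symmetric 20x20 exchangeability matrix, 20 frequencies)."""
    n = len(AA_ORDER)
    n_tri = n * (n - 1) // 2  # 190 lower-triangle entries
    values: list[float] = []
    for tok in text.split():
        if len(values) == n_tri + n:
            break  # ignore trailing notes after the frequencies
        values.append(float(tok))
    if len(values) < n_tri + n:
        raise ValueError(f"expected {n_tri + n} numbers, found {len(values)}")
    rates = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(1, n):  # row i of the lower triangle holds i values
        for j in range(i):
            rates[i][j] = rates[j][i] = values[k]
            k += 1
    freqs = values[n_tri:]
    return rates, freqs
```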

-f e, m, or fA,fC,fG,fT
  - e : the character frequencies are determined as follows :
    - Nucleotide sequences: (Empirical) the equilibrium base frequencies are estimated by counting
      the occurrence of the different bases in the alignment.
    - Amino-acid sequences: (Empirical) the equilibrium amino-acid frequencies are estimated by counting
      the occurrence of the different amino-acids in the alignment.
  - m : the character frequencies are determined as follows :
    - Nucleotide sequences: (ML) the equilibrium base frequencies are estimated using maximum likelihood.
    - Amino-acid sequences: (Model) the equilibrium amino-acid frequencies are set to
      the frequencies defined by the substitution model.
  - fA,fC,fG,fT : only valid for nucleotide-based models.
    fA, fC, fG and fT are floating-point numbers that correspond to the frequencies of A, C, G and T
    respectively (WARNING: do not use any blank space between your values of nucleotide frequencies, only commas!)

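A minimal sketch of what -f e computes for nucleotides, plus a helper that formats fixed frequencies in the fA,fC,fG,fT form without blank spaces. Both function names are hypothetical, and the sketch assumes that gaps and ambiguity codes are simply skipped when counting, which may differ from how PhyML handles them.

```python
from collections import Counter


def empirical_base_freqs(seqs: list[str]) -> dict[str, float]:
    """Count A, C, G, T over the whole alignment (other symbols skipped)."""
    counts = Counter(ch for seq in seqs for ch in seq.upper() if ch in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A, C, G or T found")
    return {base: counts[base] / total for base in "ACGT"}


def f_option(freqs: dict[str, float]) -> str:
    """Format frequencies as fA,fC,fG,fT: commas only, no blank spaces."""
    return ",".join(f"{freqs[base]:.4f}" for base in "ACGT")
```

Rounding to four decimals, as f_option does, may leave values that do not sum exactly to 1.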

-t (or --ts/tv) ts/tv_ratio
  ts/tv_ratio : transition/transversion ratio. DNA sequences only.
  Can be a fixed positive value (e.g., 4.0) or e to get the maximum likelihood estimate.

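Transitions are substitutions within purines (A, G) or within pyrimidines (C, T); all other substitutions are transversions. The sketch below simply counts both kinds between two aligned sequences to illustrate the distinction. It is not the ML estimate requested with e, and a raw count ratio is not the same quantity as the model's ts/tv parameter, because multiple substitutions at one site are invisible in a pairwise comparison. count_ts_tv is a hypothetical helper.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1: str, seq2: str) -> tuple[int, int]:
    """Count observed transitions and transversions between two aligned sequences."""
    ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical site, gap or ambiguity code
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    return ts, tv
```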

-v (or --pinv) prop_invar
  prop_invar : proportion of invariable sites.
  Can be a fixed value in the [0,1] range or e to get the maximum likelihood estimate.

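As a rough point of reference, the observed fraction of constant columns in a nucleotide alignment can be computed as below. Under a model with invariable sites, this fraction is expected to be at least as large as the proportion of invariable sites, because slowly evolving variable sites can also appear constant. The helper is hypothetical and, as an assumption, ignores gaps and any character other than A, C, G, T within a column.

```python
def constant_site_fraction(seqs: list[str]) -> float:
    """Fraction of columns showing a single nucleotide (gaps and ambiguity codes ignored)."""
    columns = list(zip(*seqs))
    constant = sum(1 for col in columns if len({ch for ch in col if ch in "ACGT"}) == 1)
    return constant / len(columns)
```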

-c (or --nclasses) nb_subst_cat
  nb_subst_cat : number of relative substitution rate categories. Default : nb_subst_cat=4.
  Must be a positive integer.


-a (or --alpha) gamma
  gamma : value of the gamma distribution shape parameter.
  Can be a fixed positive value or e to get the maximum likelihood estimate.

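Options -c and -a together define a discrete gamma model of rate variation across sites. The sketch below assumes the common equal-probability discretisation: the gamma distribution with shape alpha and mean 1 is cut into nb_subst_cat categories of equal probability, and each category is represented by its mean rate. Whether PhyML uses category means or medians is not stated above, so treat that choice as an assumption. The function names are hypothetical, and the power series used for the incomplete gamma function is only adequate for moderate alpha.

```python
import math


def lower_reg_gamma(s: float, x: float) -> float:
    """Regularised lower incomplete gamma P(s, x), by its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / s
    n = 0
    while term > 1e-16 * total and n < 100_000:
        n += 1
        term *= x / (s + n)
        total += term
    return total * math.exp(s * math.log(x) - x - math.lgamma(s))


def gamma_quantile(alpha: float, p: float) -> float:
    """p-quantile of a gamma law with shape alpha and rate alpha (mean 1), by bisection."""
    lo, hi = 0.0, 1.0
    while lower_reg_gamma(alpha, alpha * hi) < p:
        hi *= 2.0  # grow the bracket until it contains the quantile
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if lower_reg_gamma(alpha, alpha * mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def discrete_gamma_rates(alpha: float, n_cat: int) -> list[float]:
    """Mean rate of each of n_cat equal-probability categories."""
    cuts = [gamma_quantile(alpha, i / n_cat) for i in range(1, n_cat)]
    # For X ~ Gamma(shape=alpha, rate=alpha), E[X; X < c] = P(alpha + 1, alpha * c).
    partial = [0.0] + [lower_reg_gamma(alpha + 1.0, alpha * c) for c in cuts] + [1.0]
    return [n_cat * (partial[k + 1] - partial[k]) for k in range(n_cat)]
```

For example, discrete_gamma_rates(0.5, 4) returns four increasing rates whose average is 1; small alpha values spread the rates very unevenly across categories.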

-s (or --search) move
  Tree topology search operation option.
  Can be either NNI (default, fast), SPR (a bit slower than NNI) or BEST (best of NNI and SPR search).

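A nearest neighbour interchange (NNI) acts on one internal branch. If the branch separates subtrees a and b from subtrees c and d, an NNI swaps one subtree across it, which gives exactly two alternative arrangements. The sketch below enumerates them, storing splits as unordered pairs of unordered sets so that equivalent arrangements compare equal; the function names are hypothetical. SPR moves, which prune a subtree and regraft it elsewhere, explore a larger neighbourhood and are not shown.

```python
def split(side1, side2) -> frozenset:
    """Unordered bipartition of the four subtrees around an internal branch."""
    return frozenset({frozenset(side1), frozenset(side2)})


def nni_neighbours(a, b, c, d) -> list[frozenset]:
    """Two NNI rearrangements of the branch separating (a, b) from (c, d)."""
    return [split((a, c), (b, d)),   # swap b and c across the branch
            split((a, d), (c, b))]   # swap b and d across the branch
```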

-u (or --inputtree) user_tree_file
  user_tree_file : starting tree filename. The tree must be in Newick format.


-o params
  This option restricts optimisation to specific parameters.
  params=tlr : tree topology (t), branch lengths (l) and rate parameters (r) are optimised.
  params=tl : tree topology and branch lengths are optimised.
  params=lr : branch lengths and rate parameters are optimised.
  params=l : branch lengths are optimised.
  params=r : rate parameters are optimised.
  params=n : no parameter is optimised.

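The -o values listed above can be decoded letter by letter. This hypothetical helper accepts only the six documented values and returns the components that would be optimised.

```python
COMPONENTS = {"t": "tree topology", "l": "branch lengths", "r": "rate parameters"}
DOCUMENTED = {"tlr", "tl", "lr", "l", "r", "n"}  # the values listed for -o


def optimised_components(params: str) -> list[str]:
    """Decode an -o value into the parts of the model that are optimised."""
    if params not in DOCUMENTED:
        raise ValueError(f"-o value not in the documented list: {params!r}")
    return [COMPONENTS[letter] for letter in params if letter != "n"]
```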

--rand_start
  This option sets the initial tree to random.
  It is only valid if SPR searches are to be performed.


--n_rand_starts num
  num is the number of initial random trees to be used.
  It is only valid if SPR searches are to be performed.


--r_seed num
  num is the seed used to initiate the random number generator.
  Must be an integer.

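Random starting trees become reproducible once the random number generator is seeded. The sketch below illustrates this with Python's random.Random: it builds a random unrooted topology in Newick format by repeatedly joining two randomly chosen clusters. The joining procedure is an assumption for illustration; it is not claimed to be PhyML's method, and it does not sample topologies uniformly. random_newick is hypothetical.

```python
import random


def random_newick(taxa: list[str], seed: int) -> str:
    """Random unrooted topology in Newick format; same seed, same tree."""
    if len(taxa) < 3:
        raise ValueError("need at least three taxa")
    rng = random.Random(seed)  # seeded, independent generator
    clusters = list(taxa)
    while len(clusters) > 3:
        # pick two distinct clusters, remove the higher index first
        i, j = sorted(rng.sample(range(len(clusters)), 2), reverse=True)
        right, left = clusters.pop(i), clusters.pop(j)
        clusters.append(f"({left},{right})")
    return "(" + ",".join(clusters) + ");"  # trifurcating root: unrooted tree
```

Calling random_newick(taxa, seed) with several seeds yields several starting trees, in the spirit of --n_rand_starts.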

--print_site_lnl
  Print the likelihood for each site in file *_phyml_lk.txt.


--print_trace
  Print each phylogeny explored during the tree search process
  in file *_phyml_trace.txt.


--run_id ID_string
  Append the string ID_string at the end of each PhyML output file name.
  This option may be useful when running simulations involving PhyML.

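For simulation studies, --run_id, --r_seed and --quiet combine naturally. The Python sketch below builds one command line per replicate using only options documented here. The executable name phyml and the choice of GTR with SPR searches are assumptions for illustration, and phyml_batch is a hypothetical helper.

```python
import shlex


def phyml_batch(seq_file: str, n_runs: int, base_seed: int = 1) -> list[str]:
    """One PhyML command line per replicate, each with its own seed and run ID."""
    commands = []
    for k in range(n_runs):
        argv = ["phyml", "-i", seq_file, "-d", "nt", "-m", "GTR",
                "-s", "SPR", "--rand_start",
                "--r_seed", str(base_seed + k),  # distinct, reproducible seed
                "--run_id", f"rep{k}",           # keeps output files apart
                "--quiet"]                       # no interactive questions
        commands.append(shlex.join(argv))
    return commands
```

Each string can be run by a shell; alternatively, the argument list can be passed directly to subprocess.run without going through shlex.join.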

--quiet
  No interactive questions (for running in batch mode) and quiet output.


--no_memory_check
  No interactive question about memory usage (for running in batch mode). Normal output otherwise.


--alias_subpatt
  Site aliasing is generalized at the subtree level. This sometimes leads to faster calculations.
  See Kosakovsky Pond SL, Muse SV, Systematic Biology (2004) for an example.

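Site aliasing exploits repeated data. Identical alignment columns have identical likelihoods, so each distinct pattern needs to be computed only once and weighted by its count. --alias_subpatt extends the idea to subtrees: columns that are identical when restricted to the taxa below a node share the same partial likelihoods at that node. The sketch below counts patterns at both levels. It illustrates the idea only, is not PhyML's implementation, and both function names are hypothetical.

```python
from collections import Counter


def site_patterns(seqs: dict[str, str]) -> Counter:
    """Distinct alignment columns and how often each occurs."""
    return Counter("".join(col) for col in zip(*seqs.values()))


def subtree_patterns(seqs: dict[str, str], clade: list[str]) -> Counter:
    """Distinct columns restricted to the taxa of one subtree (clade)."""
    return Counter("".join(col) for col in zip(*(seqs[name] for name in clade)))
```

With these weights, the log-likelihood is the sum over patterns of count times the pattern's log-likelihood. For {"a": "ACA", "b": "ACA", "c": "GTT"}, the three full columns are all distinct, but the subtree grouping a and b sees only two distinct sub-patterns, so its partial likelihoods need only be computed twice.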

--boot_progress_display num (default=20)
  num is the frequency at which the bootstrap progress bar will be updated.
  Must be an integer.