GDE2.2 rev1


Genetic Data Environment
        version 2.2

Table of Contents
Introduction
What's New for this Release
System Requirements
Note to Motif users
Installing the GDE
Using the GDE
Data Types
Menu Functions
        File menu
        Edit menu
        DNA/RNA menu
External Functions
Bug reports/extensions
Acknowledgments
Appendix A, File Formats
Appendix B, Adding Functions
Appendix C, External functions

Introduction

The Genetic Data Environment is part of a growing set of programs
for manipulating and analyzing "genetic" data.  It differs in design
from other analysis programs in that it is intended to be an
expandable and customizable system, while still being easy to use.

There are a tremendous number of publicly available programs for
sequence analysis.  Many of these programs have found their way into
commercial packages which incorporate them into integrated, easy to
use systems.  The goal of the GDE is to minimize the amount of effort
required to integrate sequence analysis functions into a common
environment.  The GDE takes care of the user interface issues, and
allows the programmer to concentrate on the analysis itself.  Existing
programs can be tied into the GDE in a matter of hours (or minutes)
as opposed to days or weeks.  Programs may be written in any
language, and still be seamlessly incorporated into the GDE.

These programs are, and will continue to be, available at no charge.
It is hoped that this system will grow in functionality as more and
more people see the benefits of a modular analysis environment.
Users are encouraged to make modifications to the system, and to
forward all changes and additions to Steven Smith at
smith@bioimage.millipore.com.

What's New for this Release

GDE 2.2 is a maintenance release.  Several small bugs have been
fixed, and new editing features and user interface elements have been
added.  I have also tried to update all of the contributed external
programs to their latest release.  Updated programs include:

Phylip
Treetool
LoopTool
Readseq
Blast
Fasta

Improved versions of printing and translate are included as well.
As for new editing features, a useful "yanking" feature has been
added by Scott Ferguson from Exxon Research, along with the
capability to export the colormap for a sequence (see Appendices A
and C).  Among the bugs fixed in this release are:

Selection mask problems when exporting to Genbank (fixed in 2.1)
Memory leaks (fixed in 2.1)
Correct handling of circular sequences
More liberal interpretation of Genbank formatted files (not column dependent)

System Requirements

GDE 2.2 currently runs on the Sun family of workstations.  This
includes the Sun3 and Sun4 (SPARCstation) systems.  It was written in
XView, and runs on Suns using OpenWindows 3.0 or MIT's X Windows.  It
runs in both monochrome and color, and can be run remotely on any
system capable of running X Windows Release 4.  You should have at
least 15 MB of free disk space available.  The binary release for
SPARCstations was compiled under SunOS 4.1.2 and OpenWindows 3.0.

We also support a DECstation version of GDE, which runs under
XView 3.0/X11R5.  We encourage interested people to port the programs
to their favorite Unix platform.  There are informal ports to the SGI
line of Unix machines.

Note to Motif users

GDE 2.2 can be run using different window managers.  The most common
alternative to olwm is the Motif window manager (mwm).  The only
problem in using another window manager is that the status line is
not displayed.  We have added a "Message panel" as an option under
"File->Properties" which displays all of the information contained
on the status line.

People using other window managers may also prefer using xterm and
xedit as the default terminal and file editor.  This can be
accomplished by replacing all occurrences of 'shelltool' and
'textedit' with 'xterm -e' and 'xedit' in the
$GDE_HELP_DIR/.GDEmenus file.

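The substitution described above can be scripted rather than done by
hand.  The following is only a minimal sketch: the function name and
the sample menu lines are made up for illustration and are not taken
from a real .GDEmenus file.  It rewrites the text in memory and leaves
reading and writing the file to the caller.

```python
def use_generic_x_tools(menu_text):
    """Swap the OpenWindows helper programs for generic X11 ones.

    Mirrors the manual's advice: 'shelltool' becomes 'xterm -e' and
    'textedit' becomes 'xedit'.  Every occurrence is replaced.
    """
    return (menu_text
            .replace("shelltool", "xterm -e")
            .replace("textedit", "xedit"))


# Hypothetical menu lines, used only to exercise the function.
sample = "shelltool some_program input_file\ntextedit input_file\n"
print(use_generic_x_tools(sample))
```

It is a good idea to keep a backup copy of the original .GDEmenus
file before writing the modified text back.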
Installing the GDE

Instructions for the source code release are included in the
README.install file.

A binary installation consists of creating a GDE directory, such as
/usr/local/GDE, and untarring the installation tar file into that
directory.  If you are installing the GDE for your own use, then you
can simply make a GDE subdirectory.  There is no need to be superuser
(root) to do the installation in your own directory.  For example:

demo% mkdir /usr/local/GDE
demo% cp GDE2.2.tar /usr/local/GDE
demo% cd /usr/local/GDE
demo% tar -xf GDE2.2.tar

After this, each new user will need to add two lines to their .cshrc
file so that they can find the GDE programs and files.

demo% cat >> ~/.cshrc
set path = ($path /usr/local/GDE/bin)
setenv GDE_HELP_DIR /usr/local/GDE/help/
^D

You may wish to copy the .GDEmenus file from the help directory into
your home directory.  This is only necessary if you wish to modify
your menus.  Copy the demo files from /usr/local/GDE/demo into your
local directory, and you are now ready to use the GDE.

FastA and Blast need to have the properly formatted databases
installed in the $GDE_HELP_DIR under the directories FASTA/PIR,
FASTA/GENBANK, BLAST/pir and BLAST/genbank.  For FASTA, simply copy a
version of PIR and Genbank into the proper directory.  Alternatively,
the PIR and GENBANK files can be symbolic links to copies of the
databases held elsewhere on your system.  You may need to look at the
.GDEmenus file in $GDE_HELP_DIR to verify that you are using the same
divisions for these databases.

Blast installation involves converting PIR and GENBANK to a temporary
FASTA format (using pir2fasta and gb2fasta) and then using pressdb
for nucleic acid, and setdb for amino acid, to reformat the databases
again into blast format.  The .GDEmenus file is currently set up to
search with blast using the following databases: pir, genpept,
genupdate, and genbank.  If you wish to divide these into
subdivisions, then the .GDEmenus file will have to be edited.

The most up to date release of blast can be obtained via anonymous
ftp to ncbi.nlm.nih.gov.  The most recent release of FASTA can be
obtained via anonymous ftp to uvaarpa.virginia.edu.  It is strongly
recommended that you retrieve these copies, and become familiar with
their setup.

Using the GDE

It is assumed that the user is familiar with the Unix and
OpenWindows/X Windows environments.  It is also assumed that people
running standard MIT X Windows will be using the OpenLook window
manager (olwm).  Other window managers work with varied success.  If
you are not certain as to how your system is set up, please contact
your systems administrator.

Once the window system has started and a terminal window (xterm,
shelltool etc.) is open, you can start up the GDE by typing:

demo% gde tRNAs

This should load the sample data set tRNAs into GDE, and the
following window should appear:

[Figure: the GDE sequence alignment editor window]

This is the sequence alignment editor.  It consists of a color
alignment display, a set of command menus, horizontal and vertical
scroll bars to navigate the alignment, a list of short sequence names
(usually the LOCUS of a Genbank entry), and a status line.  The
cursor is located in the upper left corner.

Using the Mouse

The mouse follows OpenLook standards for operation.  The functions
of each button are as follows.

The left mouse button is used for placing the cursor, selecting
sequences by their short names, scrolling/paging, performing split
screens, and resizing.  The right button is used for pop up menus
and scrollbar menus.  The middle button is used for extending a text
selection.

Cursor Movement

The cursor can be moved using the arrow keys, or by clicking the
mouse within a sequence.  The cursor's position is displayed on the
status line in both sequence position and alignment column number.
The right hand side of the status line shows the left and right
column positions of the currently active display.

Scrolling is controlled by the scrollbar elevator.  Clicking (left
mouse button) on one of the elevator arrows scrolls the screen one
character in that direction.  Dragging the elevator center moves the
screen directly to any location.  Clicking directly to one side of
the elevator scrolls one full screen in that direction.  Clicking on
a scrollbar anchor moves the elevator to that anchor.  Scrollbars
also have menus associated with them giving other scroll options.
Use the right mouse button to activate the menu.

Selecting Sequences

Sequence selection is necessary before most functions can be
performed.  Selecting sequences is accomplished by clicking or
dragging (left button) over the short name associated with the
sequence(s).  The name of the sequence should become highlighted on
the release of the mouse button.  By holding down the shift key, you
can toggle the selection on or off for any set of sequences.  By
clicking just to the right of any sequence short name, you will
deselect all of them.

Selecting Text

Selecting text is accomplished in much the same way as selecting
entire sequences.  In the editing window, you can drag the mouse
pointer over a rectangular region to select a block of text.  By
using the shift key (or the middle mouse button) you can adjust the
selection to include other sequences, or other columns of text.  If
groups are enabled, GDE will automatically select all sequences in a
group if any one sequence in the group is selected (see Sequence
Grouping).

Sequence Protection

All sequences can be individually protected against accidental
modification.  This is accomplished by selecting the set of sequences
that you are interested in editing, and choosing the "Set
protections" menu item under the File menu.  Your choices are:

Unambiguous modification
        Changing/adding/deleting regular characters
Ambiguous changes
        Changing ambiguous codes ('N', 'X'...)
Alignment modifications
        Changing alignment gaps ('-', '~')

Sequence Editing

Sequences can be edited by simply typing to insert, and using the
delete or backspace key to delete characters.  Sequences must have
the proper protections set to allow the type of modifications that
you are attempting.  The default protection level only allows
modification to the alignment, but not to the sequences themselves.
The Sun function keys Cut, Copy and Paste are used to edit selected
text.  Text selections work in rectangular (possibly disjoint)
regions.  You can cut or copy a block of sequence text, and paste it
to a new cursor location using these three keys.

Sequence Yanking

Yanking refers to the "pulling" of a base to fill a gapped position,
like beads on an abacus.  Place the cursor over a gap character, and
type <ctrl> k to yank the character from the left into the current
position.  Type <ctrl> l to pull the character from the right.
Repeat counts are honored ("20 <ctrl> l" will yank 20 characters from
the right).

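The idea behind yanking can be sketched in a few lines of Python.
This is an illustration only, not the GDE implementation: it assumes
that the nearest non-gap character on the chosen side is swapped into
the gap under the cursor, and it does not model repeat counts or
sequence protections.  The function name and arguments are
hypothetical.

```python
GAPS = "-~"  # the two alignment gap characters used by the GDE


def yank(seq, pos, direction):
    """Pull the nearest base on one side into the gap at index pos.

    direction is -1 to pull from the left and +1 to pull from the
    right.  The sequence is returned unchanged if pos is not a gap or
    if there is no base to pull on that side.
    """
    if seq[pos] not in GAPS:
        return seq
    i = pos + direction
    while 0 <= i < len(seq) and seq[i] in GAPS:
        i += direction
    if not 0 <= i < len(seq):
        return seq
    chars = list(seq)
    chars[pos], chars[i] = chars[i], chars[pos]
    return "".join(chars)


print(yank("AC--G", 3, -1))  # pull from the left
print(yank("AC--G", 2, +1))  # pull from the right
```

Because the base and the gap simply trade places, the overall length
of the sequence, and therefore the rest of the alignment, is left
unchanged.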
Repeat Counts

By typing a numeric value before an editing function you can insert,
delete or move a number of characters at a time.  The current repeat
count is displayed on the status line, and can be cleared by clicking
the left mouse button in the alignment window.  In order to insert
twenty gaps into a sequence, one would type "20-".  In order to move
down five sequences, one would type "5" followed by the down arrow
key.  This works with all sequence types; however, the meta (diamond)
key must be held down when the cursor is in a text or mask sequence.
This is because numbers are valid characters in these sequences, and
would otherwise be confused with repeat counts.

Split Screen

Split screen editing allows viewing one region while editing another.
This is very useful for aligning "downstream" regions by editing
"upstream".

The alignment window can be split horizontally into two or more views
of the alignment.  These views scroll independently of each other
both horizontally and vertically.  The short names displayed to the
left of the alignment correspond to the view that was last scrolled
or edited.  Care should be taken with any modifications done in this
mode so that edits are performed on the correct sequence.  To avoid
confusion during split screen operations, the vertical scroll bars
may be locked so that all views scroll together.

In order to split a window into two views, grab (left button) the
left or right anchor (small rectangle) at either end of the
horizontal scrollbar and drag to the middle of the window.  This
should split the window into two views.  To join two views, place the
mouse pointer on the horizontal scroll bar and use the menu (right
button).

The views are NOT two copies of the alignment.  Changes made in one
view are reflected in the other.

Sequence Grouping

Sequences can be grouped for editing functions.  This is very helpful
when trying to adjust several sub alignments.  When grouped, all
sequences within a group will be affected by editing in any member of
the group.  All sequences within a group must have protections set to
allow modification before any one will be modified.

In order to group sequences, select the names of the sequences that
should fall within a group, and select Group under the Edit menu.  A
number will be placed at the left of each sequence representing its
assigned group number.  To ungroup any sequence or sequences, select
those sequences and use the Ungroup command under the Edit menu.

Special keys

There are also a few special function keys used in the GDE.  Some
functions have meta key equivalents so that they can be called from
the keyboard, instead of by the menu system.  The "meta" key is a
standard property of X Windows, and may be remapped to a different
key symbol for different keyboards.  For example, meta on Sun
workstations is marked with a diamond, whereas on a Macintosh running
MacX it might be the "apple" key.  The key is used in the same way as
the control or shift key: it is held down while pressing the second
key in the sequence.

Cut text, copy text and paste text are mapped to the OpenLook
equivalent keys (L10, L6, and L8 on Sun keyboards).  Other meta keys
are defined in the .GDEmenus file, and may be changed to suit your
preferences.

Data Types

The GDE supports several data types.  The data types supported in 2.2
are DNA, RNA, protein (single letter codes), mask sequence, and text.

DNA and RNA

Nucleic acid sequences are tightly type cast, and can contain any
IUPAC code (ACGTUMRSVWYHKDBN) as well as two alignment gap characters
('~' and '-').  Some keys are remapped to fit IUPAC codes.  For
example, 'X' is mapped to 'N'.  All nonstandard characters get mapped
to the alignment gap '-'.  Upper and lower case are both supported,
and the T/U characters are mapped based on whether you are working
with DNA or RNA.  The color coding for DNA and RNA is identical.  The
color for ambiguous characters and for alignment gaps is grey.

Amino Acid Sequence

Amino acid sequences are loosely type cast, and can contain any valid
ASCII character.  The results of analysis on nonstandard characters
are not guaranteed.  The color for nonstandard amino acid characters
and for alignment gaps is grey.

Text Sequence

Any valid ASCII printable character can be entered into a text
sequence.  Care should be taken with using space characters, as these
will only be saved properly in Genbank format, and not in flat file
format.  The characters @#% and " should be avoided as well, as these
can confuse the reading of flat files if saved in that format.

Mask Sequence

Mask sequence is identical to text sequence with the following
exception.  A mask sequence can be used (function dependent) to mask
out positions in an alignment for analysis.  If a mask sequence is
selected along with some other sequence(s) for an analysis function
that permits masking, then all columns that contain a '0' in the mask
sequence will be ignored by the function.  The mask itself is not
passed to the analysis function either.  Some functions allow
masking, some do not.  Refer to the instruction page for each
function to see whether or not it supports sequence masking.

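The effect of a mask on the data seen by an analysis function can be
sketched as follows.  This is a minimal illustration rather than the
GDE's own code.  The function name is hypothetical, and the handling
of columns that lie beyond the end of the mask (they are kept here) is
an assumption, since the manual does not say what happens in that
case.

```python
def apply_mask(mask, sequences):
    """Drop every alignment column where the mask holds a '0'.

    Columns beyond the end of the mask are kept (an assumption).  The
    mask itself is not included in the returned list.
    """
    def kept(col):
        return col >= len(mask) or mask[col] != "0"

    return ["".join(c for col, c in enumerate(seq) if kept(col))
            for seq in sequences]


aligned = ["ACGT", "TG-A"]
print(apply_mask("1100", aligned))  # only the first two columns remain
```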
Color Masks

Color masks give color to a sequence on a position by position basis.
Individual sequences can have color masks attached to them, or one
color mask can be used for an entire alignment.  Color masks are
generated externally by some analysis functions, and are then passed
back to the GDE.  The file format for a colormask is described in
Appendix A.

Menu Functions: File menu

The GDE has several built-in menu functions under the File and Edit
menus.  These functions are unique in that they are part of the
primary display editor, and are not described in the .GDEmenus file.

Open...
Selecting this will bring up the open file dialog box.  Users can
scroll through a list of files in the current directory, move up and
down the directory tree, and open any individual data file.  The
sequence data in that file is loaded into the current editing window
below any existing sequences.  The open command will open any Genbank
formatted file, or a GDE flat file.

Save as...
This function will save the entire alignment to a specified file in
either Genbank or flat file format.  The file will be saved in the
local directory unless a relative or absolute path is specified.

Properties...
Properties controls the display settings.  Those settings include
character size, color type, and insert direction.  The screen can
also be inverted, and vertical scroll lock and keyboard clicks
(tactile feedback) can be turned on or off.  Vertical scrollbar lock
will cause all split views to scroll together in the vertical
direction.

Protections...
This will display, and then set, the default protections for all
selected sequences.  If two or more of the sequences differ in their
current protection settings, a warning message will appear in the
protection dialog box.  The protections currently available are
alignment gap protection, ambiguous character protection, unambiguous
character protection, and translation protection.

Get info...
This option allows the viewing and setting of attributes associated
with each individual sequence.  These attributes include short name,
full name, description, author, comments, and the sequence type.  The
attributes loosely correspond to fields in a Genbank entry.  Comments
can be included for each sequence in the comments field.

Edit menu

Select All
Selects all sequences.  This is helpful when you have several dozen
sequences.

Select by name...
Select all sequences containing a given string in their short name
field.  No wild cards are allowed, and only selecting is allowed, not
de-selecting.  The search is started when the Return key is pressed,
and multiple searches can be accumulated.  Press Done when finished.

Cut/Copy/Paste sequences
Cut, copy and paste are primarily useful for reordering sequences,
and for making duplicate copies of a given sequence.  They do not
pass information to other programs.  This capability will be
implemented in a later release.  Cut and copy will place the selected
sequences on an internal clipboard.  They can then be pasted back
into the top of the editing window (default) or under the last
selected sequence.

Group/Ungroup
Assign a group number to the selected sequences.  Edit operations in
any one sequence within the group will be propagated to all sequences
within the group.  Sequence protections on any one member of a group
are also imposed upon all other sequences in that group.  If a given
operation is illegal in one sequence in a group (e.g. alignment
modification) then it will not work in any of the sequences in that
group.  Ungroup will remove the selected sequences from their group.

Compress
Compress will remove gap characters from the selected sequences.  The
user has the option of removing all gaps, or simply all columns
containing nothing but gaps.  This is useful for minimizing the
length of a subalignment.

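The two Compress options can be sketched as two small Python
functions.  These are illustrations under stated assumptions, not the
GDE's implementation: the function names are hypothetical, and
sequences shorter than the longest one are padded with '-' before the
columns are examined.

```python
GAPS = "-~"  # the two alignment gap characters used by the GDE


def compress_all(seqs):
    """Remove every gap character from each sequence."""
    return ["".join(c for c in s if c not in GAPS) for s in seqs]


def compress_columns(seqs):
    """Remove only the columns that contain nothing but gaps."""
    if not seqs:
        return []
    width = max(len(s) for s in seqs)
    padded = [s.ljust(width, "-") for s in seqs]  # padding is an assumption
    keep = [i for i in range(width)
            if any(p[i] not in GAPS for p in padded)]
    return ["".join(p[i] for i in keep) for p in padded]


print(compress_all(["A-C~G"]))
print(compress_columns(["A--C", "A-GC"]))
```

Note that removing all gaps destroys the alignment between the
selected sequences, while removing only all-gap columns preserves it.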
Reverse Sequence
Reverses the selected sequences.  Alignment gaps are reversed as
well.  The selected sequences will remain aligned after reversal.

DNA/RNA menu

Complement Sequence
Converts DNA/RNA into its complement strand (keeping full IUPAC
ambiguity).  This function has no effect on text, protein, or mask
sequence.  Note that this function does not produce the reverse
strand of DNA but merely converts A<->T and G<->C.  If the reverse
strand is needed, remember to both Complement and Reverse the
sequence (Reverse Sequence is under the Edit menu).

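The difference between complementing and reverse complementing can be
made concrete with a short sketch.  The translation table below
follows the standard IUPAC nucleotide ambiguity codes and handles DNA
only (for RNA, A would map to U instead of T); the function names are
hypothetical and the code is not taken from the GDE.

```python
# Standard IUPAC complements: A<->T, C<->G, M<->K, R<->Y, V<->B, H<->D.
# W, S and N are their own complements.  Gap characters are not in the
# table and therefore pass through unchanged.
DNA_COMPLEMENT = str.maketrans(
    "ACGTMRWSYKVHDBNacgtmrwsykvhdbn",
    "TGCAKYWSRMBDHVNtgcakywsrmbdhvn",
)


def complement(seq):
    """Complement each base in place, as Complement Sequence does."""
    return seq.translate(DNA_COMPLEMENT)


def reverse_complement(seq):
    """Complement, then reverse, to obtain the opposite strand."""
    return complement(seq)[::-1]


print(complement("ATGC-N"))          # TACG-N
print(reverse_complement("ATGC-N"))  # N-GCAT
```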
External Functions

See Appendix C for a full description of the functions supported in
GDE 2.2.  All external functions are described in the configuration
file .GDEmenus.  Here is a brief description of some of the basic
functions included.

File menu

New Sequence <meta n>
        Create a new sequence.  Prompts for sequence type and short
        name.

Import foreign format
Export foreign format
        Load and save sequences using Readseq by Don Gilbert (see
        Appendix C).

Save Selection
        Save the currently selected sequences in a specified file.

Pretty print
        Print using the sequence formatter supplied by Readseq.

Print Selection
        Print the selected sequences to the chosen printer.  This
        function supports the Unix command enscript as well as lpr.
        The .GDEmenus file may need to be modified to add the names
        of local printers to the printer list.

Edit menu

sort...
        Sort the selected sequences by a primary and secondary key.
        Pass the new order to a new GDE window.

Extract
        Extract the selected sequences into a new window.

DNA/RNA Menu

Translate
        Translate the selected sequences from DNA/RNA to amino acid.
        The user can specify the desired reading frame, and the
        minimum open reading frame (stop codon to stop codon) to
        translate.  The user can also choose between single letter
        and triple letter codes.  There is also an option to allow
        each ORF to be entered as a separate sequence.

Dot plot
        Display a dotplot identity matrix for the selected
        sequence(s).  If only one sequence is selected, then the
        dotplot is a self comparison.  If two or more sequences are
        selected, then the first two sequences are compared.

Clustal Align
        Align the selected sequences using the clustalv algorithm by
        Des Higgins.  (See Appendix C)

Find All <meta f>
        Search the selected sequences for a given substring and
        highlight the matches.  A specified percentage of mismatches
        can also be allowed.

Variable Positions
        The selected sequences are scored column by column for
        conservation.  The result is returned as a grey scale
        alignment color mask.  This can be useful in selecting PCR
        primers.

Sequence Consensus
        Return the consensus for the selected sequences.  This can
        be either a majority consensus, or an ambiguity consensus
        using IUPAC coding.

Distance Matrix
        Calculate a distance matrix for the selected sequences.
        (See Appendix C)

MFOLD
        Fold the selected sequences using MFOLD by Michael Zuker.
        The resulting structure is returned as a nested bracket
        ('[]') representation of the secondary structure.  (See
        Appendix C)

Draw Secondary Struct
        Draw the selected sequence using the proposed secondary
        structure.  Both the secondary structure prediction and the
        RNA sequence should be selected before calling this function.
        The drawing program is LoopTool.  (See Appendix C)

Highlight Helix
        Show all violations to a proposed RNA secondary structure.
        The secondary structure representation must be selected, as
        well as the aligned sequences to be tested.  The selected
        sequences will then be colored according to whether or not
        they support the proposed secondary structure.  Standard
        Watson/Crick pairing will be colored dark blue, G-U pairing
        will be colored light blue, mismatches will be colored gold,
        and pairing to gaps will be red.

Blastn/BlastX
        Search the selected sequence (select only one) against a
        given database with the BLAST searching tool written by
        Altschul, Gish, Miller, Myers, and Lipman.  Blastn searches
        DNA against DNA databases; blastx searches DNA against AA
        databases by translating the sequence in all six reading
        frames.  (See Appendix C)

FastA
        Search the selected sequence (select only one) against a
        given database using the FASTA similarity search program
        written by Pearson and Lipman.  (See Appendix C)

Protein Menu

Clustal Align
        Align the selected amino acid sequences using the clustal
        algorithm.  (See Appendix C)

Blastp, Tblastn, Blast3
        Search the selected sequence (select only one) against a
        given database with the BLAST searching tool written by
        Altschul, Gish, Miller, Myers, and Lipman.  Blastp searches
        AA against AA databases; tblastn searches AA against DNA
        databases by translating the database in all six reading
        frames.  Blast3 finds three way alignments that could not be
        found with only pairwise comparisons.  (See Appendix C)

Sequence Management Menu

Assemble contigs
        Assemble the selected sequences into contigs using the
        program CAP (Contig Assembly Program) written by Xiaoqiu
        Huang.  The resulting sequences are returned to the current
        GDE window, and they are grouped into contigs.  The user can
        then sort the sequences by group and offset to produce an
        ordered list of the contigs.  (See Appendix C)

Strategy view
        Pass the selected sequences to StratView.  This program will
        display contigs in a greatly reduced line drawing.  This is
        very useful for large contigs.

Restriction sites
        Search the selected sequences for the restriction enzymes
        specified in the given enzyme file.  The restriction sites
        are then colored by enzyme.

Phylogeny menu

DeSoete Tree fit
        Calculate a phylogenetic tree using a least squares fitting
        algorithm on a distance matrix calculated from the selected
        sequences.  The results can then be passed on to treetool
        for display and manipulation.  (See Appendix C)

Phylip 3.5
        Pass the selected data to one of the treeing programs in
        Phylip, written by Joe Felsenstein.  The chosen Phylip
        program is started in its own window, with the selected
        sequences already loaded.  (See Appendix C)

Citation of work

We ask that any published work using any of the external functions in
GDE cite the appropriate authors.  Please see Appendix C for
references.

Bug reports/extensions

Any bug reports, requests for enhancement, and useful extensions to
the GDE should be forwarded by electronic mail to:

smith@bioimage.millipore.com

Please include as much detail as possible in bug reports so that the
bug can be reproduced.  Correspondence should be addressed to:

Steven Smith
Millipore Imaging Systems
777 E. Eisenhower Pkwy
Ann Arbor, MI   48108

Acknowledgments

I would like to thank the following people for their input,
assistance, and code used in the development of the GDE:

Carl Woese, Gary Olsen and Mike Maciukenas at University of Illinois
Dept of Microbiology, Ross Overbeek at Argonne National Laboratories,
Walter Gilbert, Patrick Gillevet, Chunwei Wang, Susan Russo and Erik
Bunce at the Harvard Genome Laboratory.  I would also like to
personally thank the following people for their permission to include
their software with this release of GDE.

Tim Littlejohn
Scott Ferguson
Brian Fristensky
Des Higgins
David Lipman and the group at NCBI
William Pearson
Don Gilbert
Xiaoqiu Huang
Joe Felsenstein
Michael Zuker
Geert DeSoete

Many thanks to all the people who have directly and indirectly helped
with the ongoing support of GDE.  It is only by the generosity of
these people that GDE has been successful.

895.c.Appendix A, File Formats
896
897The currently supported file formats include GDE
898data files, Genbank formatted files (with type
899extensions), a generic flat file format, and a color
900mask file.
901
902GDE format
903GDE format is a tagged field format used for storing
904all available information about a sequence.  The
905format matches very closely the GDE internal
906structures for sequence data.  The format consists of
907text records starting and ending with braces ('{}'). 
908Between the open and close braces are several tagged
909field lines specifying different pieces of information
910about a given sequence.  The tag values can be
911wrapped with double quote characters ('""') as
912needed.  If quotes are not used, the first whitespace
913delimited string is taken as the value. The allowable
914fields are:
915
{
name            "Short name for sequence"
longname        "Long (more descriptive) name for sequence"
sequence-ID     "Unique ID number"
creation-date   "mm/dd/yy hh:mm:ss"
direction       [-1|1]
strandedness    [1|2]
type            [DNA|RNA|PROTEIN|TEXT|MASK]
offset          (-999999,999999)
group-ID        (0,999)
creator         "Author's name"
descrip         "Verbose description"
comments        "Lines of comments that can be fairly arbitrary
text about a sequence.  Return characters are allowed, but no
internal double quotes or brace characters.  Remember to close
with a double quote"
sequence        "gctagctagctagctagctcttagctgtagtcgtagctgatgctagct
gatgctagctagctagctagctgatcgatgctagctgatcgtagctgacg
gactgatgctagctagctagctagctgtctagtgtcgtagtgcttattgc"
}
943
944Any fields that are not specified are assumed to be
945the default values.  Offsets can be negative as well
946as positive.  Genbank entries written out in this
947format will have all (") converted to ('), and all ({})
948converted to ([]) to avoid confusion in the parser. 
949Leading and trailing gaps are removed prior to
950writing each sequence.  This format is deliberately
951verbose in order to be simple to duplicate.
952
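The record layout described above can be read with a few lines of code.  The following Python sketch is illustrative only and is not taken from the GDE source: the function name parse_gde and the set of fields treated as integers are assumptions, and it relies on the stated rule that quoted values contain no internal double quotes or braces.

```python
import re

# A field is a tag followed by either a double-quoted value (which may
# span several lines) or a single whitespace-delimited token.
FIELD = re.compile(r'([A-Za-z][\w-]*)\s+(?:"([^"]*)"|(\S+))')

# Assumption: these fields hold integers, per the ranges shown above.
INT_FIELDS = {"direction", "strandedness", "offset", "group-ID"}


def parse_gde(text):
    """Return one dict per {...} record found in text."""
    records = []
    for body in re.findall(r"\{([^{}]*)\}", text):
        record = {}
        for tag, quoted, bare in FIELD.findall(body):
            value = bare if bare else quoted
            if tag == "sequence":
                value = "".join(value.split())  # drop line breaks
            elif tag in INT_FIELDS:
                value = int(value)
            record[tag] = value
        records.append(record)
    return records


example = '''{
name            "ECOTRQ1"
type            RNA
offset          -3
sequence        "UGGGGUAUCG
CCAAGCGGUA"
}'''
print(parse_gde(example))
```

Fields missing from a record are simply absent from the resulting dict; a caller would supply the defaults.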
953
954Genbank format:
955GDE can read a concatenated list of Genbank entries,
956and extract certain fields from such files.  The
957default method for storing nucleic acid, amino acid,
958masking sequences or text is in Genbank format. 
959The following fields are recognized:
960
LOCUS:          Short name for this sequence
                (maximum of 32 characters)
DEFINITION:     Definition of sequence
                (maximum of 80 characters)
ORGANISM:       Full name of organism
                (maximum of 80 characters)
AUTHORS:        Authors of this sequence
                (maximum of 80 characters)
ACCESSION:      ID number for this sequence
                (maximum of 80 characters)
ORIGIN:         Beginning of sequence data
//              End of sequence data
973
974All other lines are retained as comments.  The
975LOCUS line also specifies what type of sequence
976follows.  The form of this line is:
977
LOCUS       name                size bp type    date

where name is the Genbank Locus name, size is total
base count, type is one of DNA, RNA, PROTEIN,
MASK, or TEXT, and date is of the form dd-MON-yyyy.
In this way, the standard Genbank format is
extended to store all text, mask and protein data.
The Genbank character set has also been extended in
order to support these other data types.  Valid
characters are:
989
DNA/RNA:        Full IUPAC coding as well as '-' and '~'
                characters for alignment gaps.
Protein:        All valid single letter codes plus '-' and
                '~'.  Other ASCII characters may be
                inserted; however, external functions may
                be confused by such characters.
Mask:           All legal printable ASCII characters.  If
                used as a selection mask, all columns
                containing a '0' will be removed from any
                analysis.
Text:           All valid ASCII characters.
1004
Here is a valid Genbank entry for two E. coli tRNAs:

LOCUS         ECOTRNT4      76 bp     RNA               28-JAN-1991
DEFINITION  E. coli (T4 infected) vulnerable tRNA (A).
  ORGANISM  Escherichia coli
   AUTHORS  Amitsur,M., Levitz,R. and Kaufmann,G.
FEATURES       From  To/Span     Description
    tRNA          1       76     vulnerable tRNA(A)
BASE COUNT  ?
ORIGIN
        1 GGGUCGUUAG CUCAGUUGGU AGAGCAGUUG ACUUUUAAUC AAUUGGNCGC AGGUUCGAAU
       61 CCUGCACGAC CCACCA
//
LOCUS          ECOTRQ1      75 bp     RNA               28-JAN-1991
DEFINITION  E.coli Gln-tRNA-1.
  ORGANISM  Escherichia coli
   AUTHORS  Yaniv,M. and Folk,W.R.
SOURCE      -REFERENCE   [1]  JOURNAL   J. Biol. Chem. 250, 3243-3253 (1975)
FEATURES       From  To/Span     Description
    tRNA          1       75     Gln-tRNA-1 (NAR: 0510)
    refnumbr      1        1     sequence not numbered in [1]
BASE COUNT  ?
ORIGIN
        1 UGGGGUAUCG CCAAGCGGUA AGGCACCGGU UUUUGAUACC GGCAUUCCCU GGUUCGAAUC
       61 CAGGUACCCC AGCCA
//
1040
1041
1042Flat file format:
This is a simplified format for importing sequence
data and passing it out to analysis functions.  Very
little information is actually retained in this
format, so it should be used carefully so as not to
lose attribute information.  It is defined as follows:
1048
1049type_character short_name
1050sequence_data
1051sequence_data
1052sequence_data
1053...
1054
The type character is # for DNA/RNA, % for protein
sequence, @ for mask sequence, and " for text.  The
short name is the same as the name on the LOCUS
line in Genbank.  This is followed by lines of
sequence, each ending with a return character.  These
lines are read until the next type character is
encountered, or until the end of the file is reached.
Care should be taken in using this format with text,
as space characters are stripped automatically.  As of
release 2.0, flat file format allows for an optional
offset to be specified in parentheses after the
sequence name.  An offset represents how many
leading gap characters should be placed before the
start of a sequence.  If this offset does not exist,
then it is defined to be 0.
1070
Here is a sample flat file for two E. coli tRNAs:

#ECOTRNT4
GGGUCGUUAGCUCAGUUGGUAGAGCAGUUGACUUUUAAUCAAUUGGNCGCAGGUUCGAAU
CCUGCACGACCCACCA
#ECOTRQ1
UGGGGUAUCGCCAAGCGGUAAGGCACCGGUUUUUGAUACCGGCAUUCCCUGGUUCGAAUC
CAGGUACCCCAGCCA
1081
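Because the flat file format is so small, a reader for it is short.  The Python sketch below is an illustration under stated assumptions, not GDE's own reader: the name parse_flat is hypothetical, and the exact spelling of the optional offset (here "name(offset)", with optional spaces before the parenthesis) is a guess based on the description above.

```python
import re

TYPE_CHARS = {"#": "DNA/RNA", "%": "PROTEIN", "@": "MASK", '"': "TEXT"}

# Name, then an optional "(offset)"; the exact spacing is an assumption.
HEADER = re.compile(r"^(\S+?)\s*(?:\((-?\d+)\))?\s*$")


def parse_flat(text):
    """Return a list of entries with type, name, offset and sequence."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in TYPE_CHARS:
            # A type character starts a new entry.
            rest = line[1:].strip()
            match = HEADER.match(rest)
            name = match.group(1) if match else rest
            offset = int(match.group(2)) if match and match.group(2) else 0
            entries.append({"type": TYPE_CHARS[line[0]], "name": name,
                            "offset": offset, "sequence": ""})
        elif entries:
            # Sequence data: spaces are stripped, as the text warns.
            entries[-1]["sequence"] += "".join(line.split())
    return entries


sample = "#ECOTRNT4\nGGGUCGUUAG\nCCUGCACGAC\n%PEP1(3)\nMKV LS\n"
for entry in parse_flat(sample):
    print(entry)
```

Note that a data line which happens to begin with one of the four type characters would be taken as a new entry; this ambiguity is inherent in the format as described.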
1082
1083Color mask:
1084The format for a color mask has been kept simple to
1085make implementation of color functions easy.  The
1086format optionally defines which sequence to color,
1087whether or not to color alignment gaps in the
1088existing sequence, and how long the following mask
1089will be.  It is then followed by a list of decimal
1090color codes (range 0 to 15) for each position in the
1091sequence.  There are four keywords used in the color
1092mask file.  Those keywords are:
1093
name:short name         If short name matches a currently
                        loaded sequence, then impose this
                        color mask on that sequence.  If this
                        line is omitted, then color all
                        sequences this color, and the color
                        mask is expected to start at the
                        leftmost column on the screen.

length:length           The following list is length long.

nodash:                 Skip over dash characters when
                        imposing this color mask on the named
                        sequence.  This allows an unaligned
                        color mask to be placed over aligned
                        sequence.

start:                  Begin reading the color mask on the
                        next line.
1115
Here is a sample color mask file:

name:test_sequence
length:10
nodash:
start:
3
3
3
6
5
3
3
3
2
7
1132
The colors in the default color lookup table are:

0       White                   8       Black
1       Yellow                  9       Grey 1
2       Violet                  10      Grey 2
3       Red                     11      Grey 3
4       Aqua                    12      Grey 4
5       Lime Green              13      Grey 5
6       Blue                    14      Grey 6
7       Purple                  15      White
1150
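An external function that returns a color mask only has to print the keywords followed by one color code per line.  The Python sketch below shows one way to do that; the function name format_color_mask and its arguments are hypothetical, and the range check simply follows the 0 to 15 range given above.

```python
def format_color_mask(codes, name=None, nodash=False):
    """Build the text of a color mask file from a list of color codes."""
    for code in codes:
        if not 0 <= code <= 15:
            raise ValueError("color codes must be in the range 0 to 15")
    lines = []
    if name is not None:
        lines.append("name:" + name)      # omit to color all sequences
    lines.append("length:%d" % len(codes))
    if nodash:
        lines.append("nodash:")           # skip dashes in the sequence
    lines.append("start:")
    lines.extend(str(code) for code in codes)
    return "\n".join(lines) + "\n"


mask = format_color_mask([3, 3, 3, 6, 5, 3, 3, 3, 2, 7],
                         name="test_sequence", nodash=True)
print(mask, end="")
```

Called as shown, this reproduces the sample color mask file given above.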
1151
1152
1153.c.Appendix B, Adding Functions
1154
The GDE uses a menu description language to
define what external programs it can call, and what
parameters and data to pass to each function.  This
language allows users to customize their own
environment to suit individual needs.
1160
1161The following is how the GDE handles external
1162programs when selected from a menu:
1163
[Figure not reproduced in this text version]
1166
1167
1168Each step in this process is described in a file
1169.GDEmenus in the user's current or home directory.
1170
1171The language used in this file describes three phases
1172to an external function call.  The first phase
1173describes the menu item as it will appear, and the
1174Unix command line that is actually run when it is
1175selected.  The second phase describes how to prompt
1176for the parameters needed by the function.  The third
1177phase describes what data needs to be passed as
1178input to the external function, and what data (if any)
1179needs to be read back from its output.
1180
1181The form of the language is a simple keyword/value
1182list delimited by the colon (:) character.  The
1183language retains old values until new ones are set. 
1184For example, setting the menu name is done once for
1185all items in that menu, and is only reset when the
1186next menu is reached.
1187
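The "retain old values" rule can be illustrated with a small reader for keyword:value lines.  This Python sketch is not the GDE parser; parse_menus is a hypothetical name, the menu name "Utilities" is invented for the example, and skipping lines without a colon is an assumption.  It only shows how a menu: value stays in force for every item: that follows until the next menu: line.

```python
def parse_menus(text):
    """Group keyword:value lines into items, carrying the menu name forward."""
    items = []
    menu = None       # retained until the next menu: line
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # assumption: lines without a colon are ignored
        # Split at the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "menu":
            menu = value
        elif key == "item":
            current = {"menu": menu, "item": value, "fields": []}
            items.append(current)
        elif current is not None:
            current["fields"].append((key, value))
    return items


example = """menu:Utilities
item:All caps
itemmethod:tr '[a-z]' '[A-Z]' <INPUT > OUTPUT
in:INPUT
informat:flat
out:OUTPUT
outformat:flat
item:Second item
menu:Other
item:Third item
"""
for entry in parse_menus(example):
    print(entry["menu"], "/", entry["item"])
```

Both of the first two items report the menu "Utilities", even though the menu: line appears only once.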
1188The keywords for phase one are:
1189
menu:menu name          Name of current menu
item:item name          Name of current menu item
itemmeta:meta_key       Meta key equivalence (quick keys)
itemhelp:help_file      Help file (either full path, or in
                        GDE_HELP_DIR)
itemmethod:Unix command
1201
The itemmethod command is a bit more involved.  It
is the Unix command that will actually run the
external program intended.  It is one line long, and
can be up to 256 characters in length.  It can have
embedded variable names (starting with a '$') that
will be replaced with appropriate values later on.  It
can consist of multiple Unix commands separated by
semi-colons (;), and may contain shell scripts and
background processes as well as simple command
names.  Examples will be given later.
1212
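The substitution step can be pictured as follows.  This Python sketch is an assumption about the behaviour, not the GDE implementation: expand_itemmethod, the program name myprog, the variable KTUP and the temporary name .gde_002 are all hypothetical.  It replaces each $variable with its dialog value, then replaces the input and output file names with temporary file names.

```python
import re


def expand_itemmethod(template, args, files):
    """Fill in $variables, then swap file placeholders for temp names."""

    def replace_arg(match):
        name = match.group(1)
        # Unknown variables are left untouched for the shell to handle.
        return str(args[name]) if name in args else match.group(0)

    command = re.sub(r"\$(\w+)", replace_arg, template)
    for placeholder, temp_name in files.items():
        # Whole-word match so that INPUT does not also hit INPUT2.
        pattern = r"\b" + re.escape(placeholder) + r"\b"
        command = re.sub(pattern, lambda m, t=temp_name: t, command)
    return command


command = expand_itemmethod(
    "myprog -k $KTUP INPUT > OUTPUT",        # hypothetical itemmethod
    {"KTUP": 2},                              # value chosen in the dialog
    {"INPUT": ".gde_001", "OUTPUT": ".gde_002"},
)
print(command)
```

Matching whole words only is a design choice of this sketch; the manual does not say how GDE itself delimits the names.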
1213The keywords for phase two are:
1214
arg:argument_variable_name
                Name of this variable.  It will appear in
                the itemmethod: line with a dollar sign ($)
                in front of it.

argtype:slider,chooser,choice_menu or text
                The type of graphic object representing
                this argument.

arglabel:descriptive label
                A short description of what this argument
                represents.

argmin:minimum_value (integer)
                Used for sliders.

argmax:maximum_value (integer)
                Used for sliders.

argvalue:default_value (integer)
                The numeric value associated with sliders,
                or the default choice in choosers,
                choice_menus, and choice_lists (the first
                choice is 0, the second is 1, etc.)

argtext:default value
                Used for text fields.

argchoice:displayed value:passed value
                Used for choosers and choice_menus.  The
                first value is displayed on screen, and the
                second value is passed to the itemmethod
                line.
1265
1266The keywords for phase three are as follows:
1267
in:input_file
                GDE will replace this name with a randomly
                generated temporary file name.  It will
                then write the selected data out to this
                file.

informat:file_format
                Write data to this file for input to this
                function.  Currently supported values are
                Genbank and flat.

inmask:
                This data can be controlled by a selection
                mask.

insave:
                Do not remove this file after running the
                external function.  This is useful for
                functions put in the background.

out:output_file
                GDE will replace this name with a randomly
                generated temporary file name.  It is up to
                the external function to fill this file
                with any results that might be read back
                into the GDE.

outformat:file_format
                The data in the output file will be in this
                format.  Currently supported values are
                colormask, Genbank, and flat.

outsave:
                Do not remove this file after reading.
                This is useful for background tasks.

outoverwrite:
                Overwrite existing sequences in the current
                GDE window.  Currently supported with "gde"
                format only.
1338
1339
1340
Here is a sample dialog box, and its entry in the
.GDEmenus file:

[Figure not reproduced in this text version]
1346
Using the default parameters given in the dialog
box, the executed Unix command line would be:

(tr '[a-z]' '[A-Z]' < .gde_001 >.gde_001.tmp ; mv .gde_001.tmp CAPS ; gde CAPS -Wx medium ; rm .gde_001 ) &

where .gde_001 is the name of the temporary file
generated by the GDE which contains the selected
sequences in flat file format.  Since the GDE runs
this command in the background ('&' at the end) it
is necessary to specify the insave: line, and to
remove all temporary files manually.  There is no
output file specified because the data is not loaded
back into the current GDE window, but rather a new
GDE window is opened on the file.  A simpler
command that reloads the data after conversion
might be:
1365
item:All caps
itemmethod:tr '[a-z]' '[A-Z]' <INPUT > OUTPUT

in:INPUT
informat:flat

out:OUTPUT
outformat:flat
1374
1375In this example, no arguments are specified, and so
1376no dialog box will appear.  The command is not run
1377in the background, so the GDE can clean up after
1378itself automatically.  The converted sequence is
1379automatically loaded back into the current GDE
1380window.
1381
In general, the easiest type of program to integrate
into the GDE is a program completely driven from a
Unix command line.  Interactive programs can be
tied in (MFOLD for example); however, shell scripts
must be used to drive the parameter entry for these
programs.  Programs of the form:

program_name -a1 argument1 -a2 argument2 -f inputfile -er errorfile > outputfile

can be specified in the .GDEmenus file directly.  As
this is the general form of most Unix commands,
these tend to be simpler to implement under the
GDE.

As functions grow in complexity, they may begin to
need a user interface of their own.  In these cases, the
command line calling arguments are still necessary
in order to allow the GDE to hand them the
appropriate data, and possibly retrieve results after
some external manipulation.
1403
1404
1405.c.Appendix C, External functions
1406
ClustalV - Cluster multiple sequence alignment

Author: Des Higgins.

Reference:      Higgins,D.G. Bleasby,A.J. and Fuchs,R. (1991)
                CLUSTAL V: improved software for multiple
                sequence alignment.  ms. submitted to CABIOS

Parameters:
                k-tuple pairwise search
                        Word size for pairwise comparisons
                Window size
                        Smaller values give faster alignments,
                        larger values are more sensitive.
                Transitions weighted
                        Can weight transitions twice as high as
                        transversions (DNA only).
                Fixed gap penalty
                        Gap insertion penalty; lower value,
                        more gaps
                Floating gap penalty
                        Gap extension penalty; lower value,
                        longer gaps

Comments:
                ClustalV is a directed multiple sequence
                alignment algorithm that aligns a set of
                sequences based on their level of similarity.
                It first uses a Lipman-Pearson pairwise
                similarity scoring to find "clusters" of
                similar sequences, and pre-aligns those
                sequences.  It then adds other sequences to
                the alignment in the order of their similarity
                so as to produce the cleanest alignment.

                Warning:  ClustalV only uses unambiguous
                character codes.  It will also convert all
                sequences to upper case in the process of
                aligning.  Clustal does not pass back
                comments, author etc.  Be sure to keep copies
                of your sequences if you do not wish to lose
                this information.
1455
1456
MFOLD - RNA secondary structure prediction

Author: Michael Zuker

Reference:      M. Zuker
                On Finding All Suboptimal Foldings of an RNA
                Molecule.
                Science, 244, 48-52, (1989)

                J. A. Jaeger, D. H. Turner and M. Zuker
                Improved Predictions of Secondary Structures
                for RNA.
                Proc. Natl. Acad. Sci. USA, BIOCHEMISTRY, 86,
                7706-7710, (1989)

                J. A. Jaeger, D. H. Turner and M. Zuker
                Predicting Optimal and Suboptimal Secondary
                Structure for RNA.
                in "Molecular Evolution: Computer Analysis of
                Protein and Nucleic Acid Sequences", R. F.
                Doolittle ed.
                Methods in Enzymology, 183, 281-306 (1989)

Parameters:
                Linear/circular RNA fold
                ct File to save results

Comments:
                MFOLD passes its output to a program,
                Zuk_to_gen, that translates the secondary
                structure prediction to a nested bracket ([])
                notation.  This notation can then be used in
                the Highlight Helix and Draw Secondary
                structure (LoopTool) functions.

                MFOLD currently does not support much in the
                way of additional parameters.  We hope to
                have all additional parameters available soon.
1501
1502
Blast - Basic Local Alignment Search Tool

Reference:
                Karlin, Samuel and Stephen F. Altschul (1990).
                Methods for assessing the statistical
                significance of molecular sequence features by
                using general scoring schemes, Proc. Natl.
                Acad. Sci. USA 87:2264-2268.

                Altschul, Stephen F., Warren Gish, Webb
                Miller, Eugene W. Myers, and David J. Lipman
                (1990).  Basic local alignment search tool,
                J. Mol. Biol. 215:403-410.

                Altschul, Stephen F. (1991).  Amino acid
                substitution matrices from an information
                theoretic perspective. J. Mol. Biol.
                219:555-565.

Parameters:
                Which Database
                        Which nucleic or amino acid database
                        to search.

                Word Size
                        Length of initial hit.  After locating
                        a match of this length, alignment
                        extension is attempted.
        Blastn
                Match score
                        Score for matches in secondary
                        alignment extension
                Mismatch score
                        Score for mismatches in secondary
                        alignment extension

        Blastx, tblastn, blastp, blast3
                Substitution Matrix
                        PAM120 or PAM250

Comments:
                The report is loaded into a text editor.  This
                should be saved as a new file, as the default
                file is removed after execution.  The latest
                version of blast can be obtained via anonymous
                ftp to ncbi.nlm.nih.gov.
1557
1558
1559
1560
FastA - Similarity search

Reference:
                W. R. Pearson and D. J. Lipman (1988),
                "Improved Tools for Biological Sequence
                Analysis", PNAS 85:2444-2448

                W. R. Pearson (1990) "Rapid and Sensitive
                Sequence Comparison with FASTP and FASTA"
                Methods in Enzymology 183:63-98

Parameters:
                Database
                        Which database to search
                Number of alignments to report
                SMATRIX
                        Which similarity matrix to use

Comments:
                The FastA package includes several additional
                programs for pairwise alignment.  We have only
                included a bare bones link to FastA.  We hope
                to include a more complete setup for the
                actual 2.2 release.
1589
1590
1591
1592
Assemble Contigs - CAP Contig Assembly Program

Author:         Xiaoqiu Huang
                Department of Computer Science
                Michigan Technological University
                Houghton, MI 49931
                E-mail: huang@cs.mtu.edu

                Minor modifications for I/O by S. Smith

Reference:
                "A Contig Assembly Program Based on Sensitive
                Detection of Fragment Overlaps" (submitted to
                Genomics, 1991)

Parameters:
                Minimum overlap
                        Number of bases required for overlap
                Percent match within overlap
                        Percentage match required in the
                        overlap region before merge is allowed.

Comments:
                CAP returns the aligned sequences to the
                current editor window.  The sequences are
                placed into contigs by setting the groupid.
                CAP does not change the order of the
                sequences, and so the results should be sorted
                by group and offset (see sort under the Edit
                menu).
1629
1630
Lsadt - Least squares additive tree analysis

Author: Geert De Soete, 'C' implementation by Mike
Maciukenas, University of Illinois

Reference:      LSADT, 1983 Psychometrika, 1984 Quality and
                Quantity

Parameters:
                Distance correction to use in distance matrix
                calculations (see Count below).
                What should be used for initial parameter
                estimates
                Random number seed
                Display method (See TreeTool below)

Comments:
                The program has been rewritten in 'C' and will
                be included with the rRNA Database
                phylogenetic package being written at the
                University of Illinois Department of
                Microbiology.

                Count is a short program to calculate a
                distance matrix from a sequence alignment
                (see below).
1658
1659
1660
Count - Distance matrix calculator

Author: Steven Smith

Parameters:
                Correction method
                        Currently Jukes-Cantor or none
                Include dashed columns
                Match upper case to lower

Comments:
                Passes back a distance matrix in a format
                readable by LSADT.
1675
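For readers unfamiliar with the Jukes-Cantor correction named above, the standard formula is d = -(3/4) ln(1 - (4/3) p), where p is the fraction of compared positions that differ.  The Python sketch below illustrates that formula only; it is not the Count source.  The function name and the two options are hypothetical stand-ins for the "Include dashed columns" and "Match upper case to lower" parameters, whose exact behaviour in Count is not documented here.

```python
import math


def jukes_cantor(seq_a, seq_b, include_gaps=False, ignore_case=True):
    """Jukes-Cantor corrected distance between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = mismatches = 0
    for a, b in zip(seq_a, seq_b):
        # Assumption: columns with a gap in either sequence are skipped
        # unless include_gaps is set.
        if not include_gaps and (a in "-~" or b in "-~"):
            continue
        if ignore_case:
            a, b = a.upper(), b.upper()
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    p = mismatches / compared
    if p >= 0.75:
        return math.inf   # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(jukes_cantor("ACGU", "ACGA"))
```

With the correction set to "none", the uncorrected value p would be reported instead.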
1676
1677
1678
Treetool - Tree drawing/manipulation

Author: Michael Maciukenas, University of Illinois

Comments:
                See included documentation for TreeTool usage.
1686
1687
1688
Readseq - format conversion program

Author:         Don Gilbert

Parameters:     Many, but can easily be run in interactive
                mode.

Comments:
                Readseq is a very useful program for format
                conversion.  The latest version supports over
                a dozen different file formats, as well as
                formatting capabilities for publication.  GDE
                makes use of Readseq for importing and
                exporting sequences, and as a filtering tool
                for some external functions.
1707
1708
1709
1755
1756
1757
1758Copyright Notice
1759
The Genetic Data Environment (GDE) software and
documentation are not in the public domain.
Portions of this code are owned and copyrighted by
The Board of Trustees of the University of
Illinois and by Steven Smith.  External functions
used by GDE are the property of their respective
authors.  This release of the GDE program and
documentation may not be sold, or incorporated into
a commercial product, in whole or in part, without
the expressed written consent of the University of
Illinois and of its author, Steven Smith.

All interested parties may redistribute the GDE as
long as all copies are accompanied by this
documentation, and all copyright notices remain
intact.  Parties interested in redistribution must do
so on a non-profit basis, charging only for cost of
media.  Modifications to the GDE core editor should
be forwarded to the author, Steven Smith.  External
programs used by the GDE are copyrighted by, and are
the property of, their respective authors unless
otherwise stated.
1782
1783
While all attempts have been made to ensure the
integrity of these programs:
1786
1787Disclaimer
1788
1789THE UNIVERSITY OF ILLINOIS, HARVARD
1790UNIVERSITY AND THE AUTHOR, STEVEN
1791SMITH GIVE NO WARRANTIES, EXPRESSED
1792OR IMPLIED FOR THE SOFTWARE AND
1793DOCUMENTATION PROVIDED, INCLUDING,
1794BUT NOT LIMITED TO WARRANTY OF
1795MERCHANTABILITY AND WARRANTY OF
1796FITNESS FOR A PARTICULAR PURPOSE. 
1797User understands the software is a research tool for
1798which no warranties as to capabilities or accuracy are
1799made, and user accepts the software "as is."  User
1800assumes the entire risk as to the results and
1801performance of the software and documentation.  The
1802above parties cannot be held liable for any direct,
1803indirect, consequential or incidental damages with
1804respect to any claim by user or any third party on
1805account of, or arising from the use of software and
1806associated materials.  This disclaimer covers both the
1807GDE core editor and all external programs used by
1808the GDE.
1809
1810  Required field
1811
1812
1813