source: branches/lib/HELP_SOURCE/source/aci.hlp

Last change on this file was 19575, checked in by westram, 3 weeks ago
1#       main topics:
2UP      arb.hlp
3UP      glossary.hlp
4
5#       sub topics:
6SUB     exec_bug.hlp
7
8# format described in ../help.readme
9
10
11
12TITLE           ARB Command Interpreter (ACI)
13
14OCCURRENCE      NDS
15                [ export db ]
16                [ ARB_NT/Species/search/parse_fields ]
17
18DESCRIPTION     ACI is a simple command interpreter that uses streams of data as its central concept.
19
20                Many ACI commands take parameters, which are specified in
21                parentheses after the command.
22
23                All ACI commands
24                 * take the data from (one or multiple) input streams,
25                 * modify that data and
26                 * write that data to (one or multiple) output streams.
27
28                     e.g. the command 'count("a")' counts every 'a' for each input stream and
29                     generates one output stream (containing the char count) for every input stream.
30
31                The first input stream always is a single stream,
32                often the value of a database field (e.g. when ACI is used in LINK{props_nds.hlp}).
33
34                The number of output streams depends on the used command:
35                 * most commands produce one output stream for each input stream (as in the count example above)
36                 * some commands combine two input streams into one output stream (e.g. see binary operators below)
37                 * some commands ignore all input streams and create one output stream (e.g. 'readdb(fieldname)')
38                 * Note: special stream related commands are documented in section 'STREAM HANDLING'
39
40                Multiple commands can be separated by two operator symbols: ';' and '|'.
41                  * ';' binds stronger than '|'
42                  * commands separated by ';' form a command-list and operate independently from each other:
43                    - all(!) commands use all(!) input streams
44                    - each command generates its own output streams
45                  * the '|' operator acts as a processing sequence point, i.e.
46                    all output streams generated by the command-list on the left side of the '|' are passed
47                    as input streams to the command-list on the right side of the '|'.
48
49                Finally (at the end of the overall ACI expression) all generated output streams get concatenated.
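
                The data flow described above can be modelled in a few lines of Python.
                This is only an illustrative sketch, not ARB's implementation: the names
                'run', 'count', 'echo' and 'per_cent' are hypothetical stand-ins, each
                command is modelled as a function that maps a list of input streams to a
                list of output streams, and the integer arithmetic in 'per_cent' is an
                assumption.

```python
# Illustrative model of ACI data flow (not ARB's implementation).
# A command maps a list of input streams to a list of output streams.

def count(chars):
    # one output stream per input stream: how many characters are in 'chars'
    return lambda streams: [str(sum(1 for ch in s if ch in chars)) for s in streams]

def echo(text):
    # ignores all input streams and creates one output stream
    return lambda streams: [text]

def per_cent(streams):
    # used without arguments: combines each pair of input streams
    # (assumed integer arithmetic; yields 0 if the second value is zero)
    out = []
    for a, b in zip(streams[0::2], streams[1::2]):
        a, b = int(a), int(b)
        out.append(str(a * 100 // b if b else 0))
    return out

def run(expression, first_input):
    # expression: list of command-lists ('|' separates the command-lists,
    # ';' separates the commands inside one command-list)
    streams = [first_input]
    for command_list in expression:
        # every command sees all input streams and adds its own output streams
        streams = [out for command in command_list for out in command(streams)]
    return "".join(streams)  # finally all output streams get concatenated

# models: count("A");count("G")|"a/g = ";per_cent
expression = [[count("A"), count("G")], [echo("a/g = "), per_cent]]
result = run(expression, "AGG")
print(result)
```

                With the input "AGG" the model reproduces the flow of the expression
                count("A");count("G")|"a/g = ";per_cent.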
50
51                Typical uses are to
52                  * show text at the tips of the tree (LINK{props_nds.hlp})
53                  * write information into database fields (LINK{mod_field_list.hlp})
54
55                Instead of using ACI commands (as described in this document) you may always use any of the other
56                integrated data processing languages. Simply prefix the command
57
58                  * with a ':' to use LINK{srt.hlp}
59                  * with a '/' to use LINK{reg.hlp}
60
61                Both are also available inside ACI via the commands 'srt' and 'command', see below.
62
63SECTION Examples
64
65# PREFORMATTED 1
66                        count("A");count("AG")
67
68                                creates two streams:
69
70                                        1. how many A's
71                                        2. and how many A's and G's
72
73# PREFORMATTED 1
74                        count("A");count("G")|per_cent
75
76                                per_cent is a command that divides two numbers
77                                (number of 'A's / number of 'G's) and returns the result
78                                as percent.
79
80SECTION Simple example to illustrate the data flow
81
82# PREFORMATTED 1
83        count("A");count("G")|"a/g = "; per_cent
84
85        input                                                         concatenate output
86        "AGG" ----> count("A") -->| -----> "a/g = " --> | --> "a/g = " ---> 'a/g = 50'
87              \                   | \ /                 |               /
88               \                  |  \                  |              /
89                \                 | / \                 |             /
90                 -> count("G") -->| -----> per_cent --> | --> "50" ---
91
92
93SECTION PARAMETERS
94
95        Several commands expect or accept additional parameters in
96        parentheses (e.g. 'remove(aA)').
97
98        Multiple parameters have to be separated by ',' or ';'.
99
100        There are two distinct ways to specify such a parameter:
101        - unquoted
102
103          Unquoted parameters are taken as specified, with the following exceptions:
104           - any character in ',;"|\)' needs to be escaped by prefixing one '\'.
105           - spaces will get removed if not prefixed by '\'.
106
107        - quoted
108
109          Quoted parameters begin and end with a '"'. You can use any character,
110          but you need to escape '\' and '"' by prefixing each with a '\'.
111
112          Examples:
113
114          remove("\"")                will remove all double quotes from input.
115          remove("\\")                will remove all backslashes from input.
116
117        [@@@ behavior currently not strictly implemented]
118
119SECTION COMMANDLIST
120
121        Unless explicitly mentioned otherwise, every command
122        creates one output stream for each input stream.
123
124        STREAM HANDLING
125
126                echo(x1;x2;x3...)       creates one output stream from each specified parameter
127                                        (parameters are separated by ';').
128
129                "text"                  same as 'echo("text")'
130
131                dd                      copies all input streams to output streams
132
133                cut(N1,N2,N3)           copies the Nth input stream(s)
134
135                drop(N1,N2)             copies all but the Nth input stream(s)
136
137                dropempty               drops all empty input streams
138
139                dropzero                drops all non-numeric or zero input streams
140
141                swap(N1,N2)             swaps two input streams
142                                        (w/o parameters: swaps last two streams)
143
144                toback(X)               moves the Xth input stream
145                                        to the end of output streams
146
147                tofront(X)              moves the Xth input stream
148                                        to the start of output streams
149
150                merge([sep])            merges all input streams into one output stream.
151                                        If 'sep' is specified, it's inserted between them.
152                                        If no input streams are given, it returns 1 empty
153                                        output stream.
154
155                split([sep[,mode]])     splits all input streams at separator string 'sep'
156                                        (default: split at linefeed).
157
158                                        Modes:
159
160                                        0               remove found separators (default)
161                                        1               split before separator
162                                        2               split after separator
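
                The three split modes can be illustrated with a small Python sketch.
                The function name 'split_stream' is hypothetical, and the handling of
                edge cases (e.g. empty parts at the start or end) is an assumption
                that may differ from ARB.

```python
def split_stream(text, sep="\n", mode=0):
    """Model of split([sep[,mode]]) for one input stream (sketch only)."""
    parts = text.split(sep)
    if mode == 0:    # remove found separators
        return parts
    if mode == 1:    # split before separator: it starts the following part
        return parts[:1] + [sep + p for p in parts[1:]]
    if mode == 2:    # split after separator: it ends the preceding part
        return [p + sep for p in parts[:-1]] + parts[-1:]
    raise ValueError("mode must be 0, 1 or 2")

for mode in (0, 1, 2):
    print(mode, split_stream("a,b,c", ",", mode))
```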
163
164                colsplit([width])       splits each input stream into multiple streams of the
165                                        specified width (or shorter for last output stream).
166                                        The default width is 1.
167
168                streams                 returns the number of input streams
169
170        STRING
171
172                head(n)                 the first n characters
173                left(n)                 the first n characters
174
175                tail(n)                 the last n characters
176                right(n)                the last n characters
177
178                                        the above functions return an empty string for n<=0
179
180                len                     the length of the input
181
182                len("chr")              the length of the input excluding the
183                                        characters in 'chr'
184
185                mid(x,y)                the substring from position x to y
186
187                    Allowed positions are
188                    - [1..N] for mid()
189                    - [0..N-1] for mid0()
190
191                    A position below that range is relative to the end of the string,
192                    i.e. mid(-2,0) and mid0(-3,-1) are equivalent to tail(3)
193
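                The position rules for mid() and mid0() can be sketched in Python as
                follows. This is an illustration of the rules stated above, not ARB
                code; the helper '_resolve' is hypothetical, and clamping positions
                that are still out of range after wrapping is an assumption.

```python
def _resolve(pos, lowest, n):
    # a position below the allowed range counts from the end of the string
    return pos + n if pos < lowest else pos

def mid(s, x, y):
    # positions 1..N
    x, y = _resolve(x, 1, len(s)), _resolve(y, 1, len(s))
    return s[max(x, 1) - 1:max(y, 0)]

def mid0(s, x, y):
    # positions 0..N-1
    x, y = _resolve(x, 0, len(s)), _resolve(y, 0, len(s))
    return s[max(x, 0):max(y + 1, 0)]

def tail(s, n):
    # the last n characters, empty for n<=0
    return s[-n:] if n > 0 else ""

text = "abcdefgh"
print(mid(text, 2, 4), mid0(text, 1, 3))
print(mid(text, -2, 0), mid0(text, -3, -1), tail(text, 3))
```
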
194                crop("str")             removes characters of 'str' from
195                                        both ends of the input
196
197                remove("str")           removes all characters of 'str'
198                                        e.g. remove(" ") removes all blanks
199
200                keep("str")             keep is the opposite of remove:
201                                        remove all chars that are not a member of 'str'
202
203                isEmpty                 return '1' for each empty input stream, '0' for others
204
205                srt("orig=dest",...)    replace command, invokes SRT
206                                        (see LINK{srt.hlp})
207
208# PREFORMATTED 1
209                translate("old","new"[,"other"])
210
211                        translates all characters from input that occur in the
212                        first argument ("old") by the corresponding character of the
213                        second argument ("new").
214
215                        An optional third argument (one character only) means:
216                        replace all other characters with the third argument.
217
218                        Example:
219
220                                Input:                        "--AabBCcxXy--"
221                                translate("abc-","xyz-")      "--AxyBCzxXy--"
222                                translate("abc-","xyz-",".")  "--.xy..z...--"
223
224                        This can be used to replace illegal characters in sequence data
225                        (see predefined expressions in 'Modify fields of listed species').
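
                        The translate example above can be reproduced with a short
                        Python sketch. It models only the behavior described here; the
                        function is a hypothetical stand-in, not ARB's implementation.

```python
def translate(text, old, new, other=None):
    """Replace each character found in 'old' by the corresponding character
    of 'new'. If 'other' is given, every remaining character becomes 'other'."""
    table = dict(zip(old, new))
    return "".join(
        table.get(ch, ch if other is None else other)
        for ch in text
    )

print(translate("--AabBCcxXy--", "abc-", "xyz-"))
print(translate("--AabBCcxXy--", "abc-", "xyz-", "."))
```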
226
227
228                tab(n)                  append n-len(input) spaces
229
230                pretab(n)               prepend n-len(input) spaces
231
232                upper                   converts string to upper case
233                lower                   converts string to lower case
234                caps                    capitalizes string
235
236# PREFORMATTED 1
237                format(options)
238
239                    takes a long string and breaks it into several lines
240
241                        option       (default)     description
242                        ==========================================================
243                        width=#      (50)          line width
244                        firsttab=#   (10)          first line left indent
245                        tab=#        (10)          left indent (not first line)
246                        "nl=chrs"    (" ")         list of characters that specify
247                                                   a possible point for a line break;
248                                                   the line break characters get removed!
249                        "forcenl=chrs" ("\n")      Force a newline at these characters.
250
251                    (see also format_sequence below)
252
253# PREFORMATTED 1
254                extract_words("chars",val)
255
256                    Search for all words (separated by ',' ';' ':' ' ' or 'tab') that
257                    contain more characters of type chars than val, sort them
258                    alphabetically and write them separated by ' ' to the output
259
260        ESCAPING AND QUOTING
261
262                escape         escapes all occurrences of '\' and '"' by prefixing a '\'
263                quote          quotes the input in '"'
264
265                unescape       inverse of escape
266                unquote        removes quotes (if present); otherwise returns the input unchanged
267
268
269        STRING COMPARISON
270
271                compare(a,b)            return -1 if a<b, 0 if a=b, 1 if a>b
272                equals(a,b)             return 1 if a=b, 0 otherwise
273                contains(a,b)           if a contains b, this returns the position of
274                                        b inside a (1..N) and 0 otherwise.
275                                        Always returns 0 if b is empty.
276                partof(a,b)             if a is part of b, this returns the position of
277                                        a inside b (1..N) and 0 otherwise.
278
279                For each of these functions a case-insensitive alternative
280                exists (icompare, iequals, ...).
281
282                Note: all these functions may be used as binary operators
283                (see section 'BINARY OPERATORS' below for concept).
284
285
286        NUMERIC COMPARISON
287
288                All functions here operate with floating-point numbers.
289
290                isBelow(a,b)            return 1 if a<b, 0 otherwise
291                isAbove(a,b)            return 1 if a>b, 0 otherwise
292                isEqual(a,b)            return 1 if a=b, 0 otherwise
293
294                Note: all functions above may be used as binary operators
295                (see section 'BINARY OPERATORS' below for concept).
296
297# PREFORMATTED 1
298                inRange(low,high)
299
300                    For the values of all input streams, this returns
301                    * 1 if low <= value <= high,
302                    * 0 otherwise.
303
304        CALCULATOR
305
306                plus                    add arguments
307                minus                   subtract arguments
308                mult                    multiply arguments
309                div                     divide arguments
310                per_cent                divide arguments * 100
311                                        (not rounded; use "fper_cent|round(0)")
312                rest                    divide arguments, take rest
313
314                The above functions perform calculation with integer numbers.
315
316                For most of these functions there also exists a floating-point variant:
317                    * fplus
318                    * fminus
319                    * fmult
320                    * fdiv
321                    * fper_cent
322
323                To avoid 'division by zero'-errors, the operators 'div', 'per_cent' and 'rest'
324                (and their f-variants) return 0, if the second argument is zero.
325
326                Note: all functions above may be used as binary operators
327                (see section 'BINARY OPERATORS' below for concept).
328
329# PREFORMATTED 1
330                round(digits)
331
332                        rounds a floating-point input to the specified number of digits
333                        after the decimal point.
334                        Specify zero to round to an integer number.
335                        Specify negative digits to round to multiples of 10, 100, 1000, ...
336
337
338        BOOLEAN OPERATORS
339
340                All input streams are converted to boolean
341                values (i.e. 0 or 1) as follows:
342
343                    "0"               -> 0
344                    any other number  -> 1
345                    any text          -> 0 (even empty text!)
346
347                Operators:
348
349                    Not     invert values of all input streams (0<->1)
350                    And     return 1 if all input streams are 1, 0 otherwise
351                    Or      return 1 if at least one input stream is 1, 0 otherwise
352
353                    Use "|or|not" or "|and|not" to execute NOR or NAND.
354
355        BINARY OPERATORS
356
357               Several operators work as so-called 'binary operators'.
358               These operators may be used in various ways, which are
359               shown using the operator 'plus':
360
361                     ACI                OUTPUT                  STREAMS
362                     plus(a,b)          a+b                     input:0 output:1
363                     a;b|plus           a+b                     input:2 output:1
364                     a;b;c;d|plus       a+b;c+d                 input:4 output:2
365                     a;b;c|plus(x)      a+x;b+x;c+x             input:3 output:3
366
367               That means, if the binary operator
368
369                    - has no arguments, it expects an even number of input streams. The operator is
370                      applied to the first 2 streams, then to the next 2 streams and so on.
371                      The number of output streams is half the number of input streams.
372                    - has 1 argument, it accepts one to many input streams. The operator
373                      is applied to each input stream together with the argument.
374                      For each input stream one output stream is generated.
375                    - has 2 arguments, it is applied to these. The arguments are interpreted as
376                      ACI commands and are applied for each input stream. The results of
377                      the commands are passed as arguments to the binary operator. For each input
378                      stream one output stream is generated.
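
               The three calling conventions can be modelled in Python as shown below.
               This is a sketch under assumptions, not ARB's implementation: the names
               'binary_operator' and 'plus' are hypothetical, and the two arguments of
               the third form are modelled as Python functions that each take one input
               stream and return a value.

```python
def plus(a, b):
    # integer addition on stream contents
    return str(int(a) + int(b))

def binary_operator(op, streams, args=()):
    if len(args) == 0:
        # no arguments: consume the input streams in pairs
        if len(streams) % 2:
            raise ValueError("expects an even number of input streams")
        return [op(a, b) for a, b in zip(streams[0::2], streams[1::2])]
    if len(args) == 1:
        # one argument: combine each input stream with the argument
        return [op(s, args[0]) for s in streams]
    if len(args) == 2:
        # two arguments: both are commands, applied for each input stream
        left, right = args
        return [op(left(s), right(s)) for s in streams]
    raise ValueError("expects 0, 1 or 2 arguments")

print(binary_operator(plus, ["1", "2", "3", "4"]))      # like a;b;c;d|plus
print(binary_operator(plus, ["1", "2", "3"], ("10",)))  # like a;b;c|plus(x)
```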
379
380        CONDITIONAL
381
382                select(a,b,c,...)       each input stream is converted into a number
383                                        (non-numeric text converts to zero). That number is
384                                        used to select one of the given arguments:
385                                             0 selects 'a',
386                                             1 selects 'b',
387                                             2 selects 'c' and so on.
388                                        The selected argument is interpreted as ACI command
389                                        and is applied to an empty input stream.
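
                The selection rule can be sketched in Python. The sketch is not ARB
                code: the function and the 'echo' helper are hypothetical stand-ins,
                the actions are modelled as functions from a list of streams to a list
                of streams, and raising an error for an out-of-range number is an
                assumption (this help text does not say what ARB does in that case).

```python
def echo(text):
    # ignores its input and creates one output stream
    return lambda streams: [text]

def select(streams, *actions):
    out = []
    for s in streams:
        try:
            index = int(s)
        except ValueError:
            index = 0  # non-numeric text converts to zero
        if not 0 <= index < len(actions):
            raise IndexError("no argument for value %d" % index)
        # the selected action is applied to one empty input stream
        out.extend(actions[index]([""]))
    return out

print(select(["0", "1", "text"], echo("no"), echo("yes")))
```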
390
391        DEBUGGING
392
393                trace(onoff)            toggle tracing of ACI actions to standard output.
394                                        Parameter: 0 or 1 (switch off or on)
395
396                                        All streams are copied (like 'dd').
397
398                                        Example:
399# PREFORMATTED 1
400                                          cmd1 | cmd2 | trace(1) | tracedCmd1 | tracedCmd2 | trace(0) | untracedCmd
401
402                                        To see the output from trace, either
403                                          * start arb from a terminal or
404                                          * use LINK{console.hlp}
405
406
407        DATABASE AND SEQUENCE
408
409                readdb(field_name)      the contents of the field 'field_name'
410
411                sequence                the sequence in the current alignment.
412
413                                        Note: older ARB versions returned 'no sequence'
414                                        if the current alignment contained no sequence.
415                                        Now it returns an empty string.
416
417                                        For genes it returns only the corresponding part
418                                        of the sequence. If the field complement = 1 then the
419                                        result is the reverse-complement.
420
421                sequence_type           the sequence type of the selected alignment:
422                                        'rna', 'dna' or 'ami'
423                ali_name                the name of the selected alignment (e.g. 'ali_16s')
424
425                Note: Because they ignore all input streams, the commands above make more
426                sense at the beginning of an ACI expression (or subexpression).
427
428                checksum(options)       calculates a CRC checksum
429                                        options:
430                                        "exclude=chrs"    remove 'chrs' before calculation
431                                        "toupper"         make everything uppercase first
432
433                gcgchecksum             a gcg compatible checksum
434
435# PREFORMATTED 1
436                format_sequence(options)
437
438                        takes a long string (e.g. sequence) and breaks it into several lines.
439
440                        option       (default)  description
441                        =============================================================
442                        width=#      (50)       sequence line width
443                        firsttab=#   (10)       first line left indent
444                        tab=#        (10)       left indent (not first line)
445                        numleft      (NO)       numbers on the left side
446                        numright=#   (NO)       numbers on the right side (#=width)
447                        gap=#        (10)       insert a gap every # seq. characters.
448
449                        (see also 'format' above)
450
451# PREFORMATTED 1
452                extract_sequence("chars",rel_len)
453
454                        like extract_words, but does not sort the words; rel_len is the minimum
455                        percentage of characters of a word that must match a character in 'chars'
456                        before the word is taken. All words will be separated by white space.
457
458# PREFORMATTED 1
459                taxonomy([treename,] depth)
460
461                        Returns the taxonomy of the current species or group as defined by a tree.
462
463                        If 'treename' is specified, it is used as tree, otherwise the 'default tree'
464                        is used (which in most cases is the tree displayed in the ARB_NT main window).
465
466                        'depth' specifies how many "levels" of the taxonomy are used.
467
468        FILTERING
469
470                There are several functions to filter sequential data:
471
472                      - filter
473                      - diff
474                      - change
475
476                All these functions use the following COMMON OPTIONS to define
477                what is used as filter sequence:
478
479                    - species=name
480
481                      Use species 'name' as filter.
482
483                    - SAI=name
484
485                      Use SAI 'name' as filter.
486
487                    - first=1
488
489                      Use 1st input stream as filter for all other input streams.
490
491                    - pairwise=1
492
493                      Use 1st input stream as filter for 2nd stream,
494                      3rd stream as filter for 4th stream, and so on.
495
496                    - align=ali_name
497
498                      Use alignment 'ali_name' instead of current default
499                      alignment (only meaningful together with 'species' or 'SAI').
500
501                    Note: Only one of the parameters 'species', 'SAI', 'first' or 'pairwise' may be used.
502
503# PREFORMATTED 1
504                diff(options)
505
506                        Calculates the difference between the filter (see common options above) and the input stream(s) and
507                        writes the result to output stream(s).
508
509                        Additional options:
510
511                        - equal=x
512
513                          Character written to output if filter and stream are equal at
514                          a position (defaults to '.'). To copy the stream contents for
515                          equal columns, specify 'equal=' (directly followed by ',' or ')')
516
517                        - differ=y
518
519                          Character written to output if filter and stream don't match at one column position.
520                          Default is to copy the character from the stream.
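
                        The column-wise comparison can be sketched in Python. This
                        models only the options 'equal' and 'differ' as described
                        above; comparing only up to the shorter of both strings is
                        an assumption, and the function is a hypothetical stand-in
                        for the ACI command.

```python
def diff(filter_seq, stream, equal=".", differ=None):
    out = []
    for f, s in zip(filter_seq, stream):
        if f == s:
            # 'equal=' (empty) copies the stream content for equal columns
            out.append(s if equal == "" else equal)
        else:
            # default: copy the character from the stream
            out.append(s if differ is None else differ)
    return "".join(out)

print(diff("ACGT", "ACCT"))
print(diff("ACGT", "ACCT", equal="", differ="x"))
```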
521
522# PREFORMATTED 1
523                filter(options)
524
525                        Filters only specified columns out of the input stream(s). You need to
526                        specify either
527
528                        - exclude=xyz
529
530                          to use all columns, where the filter (see common options above) has none
531                          of the characters 'xyz'
532
533                        or
534
535                        - include=xyz
536
537                          to use only columns, where the filter has one of the characters 'xyz'
538
539                        All used columns are concatenated and written to the output stream(s).
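
                        A minimal Python sketch of the column selection follows.
                        The name 'filter_columns' is hypothetical, and ignoring
                        columns beyond the shorter of both strings is an assumption.

```python
def filter_columns(filter_seq, stream, include=None, exclude=None):
    if (include is None) == (exclude is None):
        raise ValueError("specify either include or exclude")
    if include is not None:
        keep = lambda f: f in include       # filter has one of the characters
    else:
        keep = lambda f: f not in exclude   # filter has none of the characters
    # all used columns are concatenated
    return "".join(s for f, s in zip(filter_seq, stream) if keep(f))

print(filter_columns("x-x-", "ACGT", include="x"))
print(filter_columns("x-x-", "ACGT", exclude="x"))
```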
540
541
542# PREFORMATTED 1
543                change(options)
544
545                        Randomly modifies the content of columns selected
546                        by the filter (see common options above).
547                        Only columns containing letters will be modified.
548
549                        The options 'include=xyz' and 'exclude=xyz' work like
550                        with 'filter()', but here they select the columns to modify - all other
551                        columns get copied unmodified.
552
553                        How the selected columns are modified, is specified by the following
554                        parameters:
555
556                                - change=percent
557
558                                  percentage of changed columns (default: silently change nothing, to make
559                                  it more difficult for you to ignore this helpfile)
560
561                                - to=xy
562
563                                  randomly change to one of the characters 'xy'.
564
565                                  Hints:
566
567                                        - Use 'xyy' to produce 33% 'x' and 66% 'y'
568                                        - Use 'xxxxxxxxxy' to produce 90% 'x' and 10% 'y'
569                                        - Use 'x' to replace all matching columns by 'x'
570
571                        I think the intention for this (long undocumented) command is to easily generate
572                        artificial sequences with different GC-content, in order to test treeing-software.
573
574        SPECIALS
575
576# PREFORMATTED 1
577                exec(command[,param1,param2,...])
578
579                    Execute external (unix) command.
580
581                    Given params will be single-quoted and passed to the command.
582
583                    All input streams will be concatenated and piped into the command.
584
585                    When the command itself is a pipe, put it in parentheses (e.g. "(sort|uniq)").
586                    Note: This won't work together with params.
587
588                    The result is the output of the command.
589
590                    WARNING!!!
591
592                        You had better not use this command for NDS,
593                        because any slow command will disable all editing, so you
594                        can never remove this command from the NDS. Even arb_panic will not
595                        easily help you.
596
597# PREFORMATTED 1
598                command(action)
599
600                        applies 'action' to all input streams using
601
602                                 - ACI,
603                                 - SRT (if starts with ':') (see LINK{srt.hlp})
604                                 - or as REG (if starts with '/') (see LINK{reg.hlp}).
605
606                        If you nest calls (i.e. if 'action' contains further calls to 'command') you have to apply
607                        escaping multiple times (e.g. inside an export filter - which is in fact an
608                        SRT expression - you'll have to use double escapes).
609
610# PREFORMATTED 1
611                eval(exprEvalToAction)
612
613                        the 'exprEvalToAction' is evaluated (using an empty string as input)
614                        and the result is interpreted as action and gets applied to all
615                        input streams (as in 'command' above).
616
617                        Example: Say you have two numeric positions stored in database
618                        fields 'pos1' and 'pos2' for each species. Then the following
619                        command extracts the sequence data from pos1 to pos2:
620
621# PREFORMATTED 1
622                             sequence|eval(" \"mid(\";readdb(pos1);\";\";readdb(pos2);\")\" ")
623
624                        How the example works:
625
626                            The argument is the escaped version of the
627                            command
628# PREFORMATTED 1
629                                "mid(" ; readdb(pos1) ; ";" ; readdb(pos2) ; ")"
630
631                            If pos1 contains '10' and pos2 contains '20' that command will
632                            evaluate to 'mid(10;20)'.
633
634                            For these positions the executed ACI behaves like 'sequence|mid(10;20)'.
635
636# PREFORMATTED 1
637                define(name,escapedCommand)
638
639                        defines an ACI-macro 'name'. 'escapedCommand' contains an escaped
640                        ACI command sequence. This command sequence can be executed with
641                        do(name).
642
643# PREFORMATTED 1
644                do(name)
645
646                        applies a previously defined ACI-macro to all input streams (see 'define').
647
648                        'define(a,action)' followed by 'do(a)' works similarly to 'command(action)'.
649
650                        See embl.eft for an example using 'define' and 'do'.
651
652# PREFORMATTED 1
653                findspec(action)
654
655                        Each input stream is interpreted as species 'name' (ID) and a species
656                        with that 'name' is searched (aborts with error if species could not be found;
657                        silently ignores empty streams).
658
659                        Otherwise 'action' is applied (to one empty stream).
660                        Instead of the current item, all database commands inside 'action' use the found species.
661
662# PREFORMATTED 1
663                findacc(action)
664
665                        like findspec, but search for 'acc' instead of 'name'.
666
667# PREFORMATTED 1
668                findgene(action)
669
670                        like findspec, but searches for genes (starting at organism or
671                        at other gene of same organism).
672
673# PREFORMATTED WIDTH DEFAULT
674                origin_organism(action)
675                origin_gene(action)
676# PREFORMATTED RESET
677
678                        like command() but readdb() etc. reads all data from the
679                        origin organism/gene of a gene-species (not from the gene-species itself).
680
681                        This function applies only to gene-species!
682
683SECTION         Future features
684
685# PREFORMATTED 1
686                statistic
687
688                        creates a character statistic of the sequence
689                        (not implemented yet)
690
691EXAMPLES
692
693                Some random ACI expression examples:
694
695# PREFORMATTED 1
696                sequence|format_sequence(firsttab=0;tab=10)|"SEQUENCE_";dd
697
698                                fetches the default sequence, formats it,
699                                and prepends 'SEQUENCE_'.
700
701# PREFORMATTED 1
702                sequence|remove(".-")|format_sequence
703
704                                get the default sequence, remove all '.-' and
705                                format it
706
707# PREFORMATTED 1
708                sequence|remove(".-")|len
709
710                                the number of non '.-' symbols (sequence length)
711
712# PREFORMATTED 1
713                "[";taxonomy(tree_other,3);" -> ";taxonomy(3);"]"
714
715                                shows for each species how their taxonomy
716                                changed between "tree_other" and current tree
717
718# PREFORMATTED 1
719                equals(readdb(tmp),readdb(acc))|select(echo("tmp and acc differ"),)
720
721                                returns 'tmp and acc differ' if the content of
722                                the database fields 'tmp' and 'acc' differs. Empty result
723                                otherwise.
724
725# PREFORMATTED 1
726                readdb(full_name)|icontains(bacillus)|compare(0)|select(echo(..),readdb(full_name))
727
728                                returns the content of the 'full_name' database entry if it contains
729                                the substring 'bacillus'. Otherwise returns '..'
730
731
732BUGS            The output of taxonomy() is not always instantly refreshed.