source: branches/tree/HELP_SOURCE/source/aci.hlp

Last change on this file was 18769, checked in by westram, 3 years ago
  • move all helpfiles to new source location
  • Property svn:eol-style set to native
  • Property svn:keywords set to Author Date Id Revision
1#Please insert up references in the next lines (line starts with keyword UP)
2UP      arb.hlp
3UP      glossary.hlp
4
5#Please insert subtopic references  (line starts with keyword SUB)
6SUB     exec_bug.hlp
7
8# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}
9
10#************* Title of helpfile !! and start of real helpfile ********
11
12TITLE           ARB Command Interpreter (ACI)
13
14OCCURRENCE      NDS
15                [ export db ]
16                [ ARB_NT/Species/search/parse_fields ]
17
18DESCRIPTION     ACI is a simple command interpreter, which uses streams of data as central concept.
19
20                Many ACI commands take parameters, which are specified after
21                the command in parentheses.
22
23                All ACI commands
24                 * take the data from (one or multiple) input streams,
25                 * modify that data and
26                 * write that data to (one or multiple) output streams.
27
28                     e.g. the command 'count("a")' counts every 'a' for each input stream and
29                     generates one output stream (containing the char count) for every input stream.
30
31                The first input stream always is a single stream,
32                often the value of a database field (e.g. when ACI is used in LINK{props_nds.hlp}).
33
34                The number of output streams depends on the used command:
35                 * most commands produce one output stream for each input stream (as the count-example above)
36                 * some commands combine two input streams into one output stream (e.g. see binary operators below)
37                 * some commands ignore all input streams and create one output stream (e.g. 'readdb(fieldname)')
38                 * Note: special stream related commands are documented in section 'STREAM HANDLING'
39
40                Multiple commands can be separated by two operator symbols: ';' and '|'.
41                  * ';' binds stronger than '|'
42                  * commands separated by ';' form a command-list and operate independently from each other:
43                    - all(!) commands use all(!) input streams
44                    - each command generates its own output streams
45                  * the '|' operator acts as processing sequence point, i.e.
46                    - all output streams generated by the command-list on the left side of the '|' will be passed
47                      as input streams to the command-list on the right side of the '|'.
48
49                Finally (at the end of the overall ACI expression) all generated output streams get concatenated.
50
51                Typical uses are to
52                  * show text at the tips of the tree (LINK{props_nds.hlp})
53                  * write information into database fields (LINK{mod_field_list.hlp})
54
55                Instead of using ACI commands (as described in this document) you may always use any of the other
56                integrated data processing languages. Simply prefix the command
57
58                  * with a ':' to use LINK{srt.hlp}
59                  * with a '/' to use LINK{reg.hlp}
60
61                Both are also available inside ACI via the commands 'srt' and 'command' (see below).
62
63SECTION Examples
64
65                        count("A");count("AG")
66
67                                creates two streams:
68
69                                        1. how many A's
70                                        2. and how many A's and G's
71
72                        count("A");count("G")|per_cent
73
74                                per_cent is a command that divides two numbers
75                                (number of 'A's / number of 'G's) and returns the result
76                                as percent.
77
78SECTION Example data flow
79
80        eg: count("A");count("G")|"a/g = "; per_cent
81
82        input                                                         concatenate output
83        "AGG" ----> count("A") -->| -----> "a/g = " --> | --> "a/g = " ---> 'a/g = 50'
84              \                   | \ /                 |               /
85               \                  |  \                  |              /
86                \                 | / \                 |             /
87                 -> count("G") -->| -----> per_cent --> | --> "50" ---
88
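The data flow above can be sketched in Python. This is only an illustrative model of the stream concept, not ARB's implementation: commands are modelled as functions from a list of streams to a list of streams, and the helper names (command_list, pipe, evaluate) are made up for this sketch.

```python
from typing import Callable, List

Streams = List[str]
Command = Callable[[Streams], Streams]

def count(chars: str) -> Command:
    """Models count("..."): one output stream per input stream."""
    return lambda streams: [str(sum(1 for ch in s if ch in chars)) for s in streams]

def echo(*texts: str) -> Command:
    """Models echo(...) / "text": ignores the input, one output stream per parameter."""
    return lambda streams: list(texts)

def per_cent(streams: Streams) -> Streams:
    """Models per_cent without arguments: consumes the input streams pairwise."""
    out = []
    for a, b in zip(streams[0::2], streams[1::2]):
        # Integer arithmetic; a zero divisor yields 0 (see CALCULATOR).
        out.append("0" if int(b) == 0 else str(int(a) * 100 // int(b)))
    return out

def command_list(*cmds: Command) -> Command:
    """';': every command sees all input streams; outputs are collected in order."""
    return lambda streams: [o for cmd in cmds for o in cmd(streams)]

def pipe(*stages: Command) -> Command:
    """'|': output streams of the left side become input streams of the right side."""
    def run(streams: Streams) -> Streams:
        for stage in stages:
            streams = stage(streams)
        return streams
    return run

def evaluate(expr: Command, field_value: str) -> str:
    """Start with one input stream and concatenate all final output streams."""
    return "".join(expr([field_value]))

# count("A");count("G")|"a/g = ";per_cent
expr = pipe(
    command_list(count("A"), count("G")),
    command_list(echo("a/g = "), per_cent),
)
print(evaluate(expr, "AGG"))  # a/g = 50
```

Because ';' binds stronger than '|', each argument of pipe() is a complete command-list.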
89
90SECTION PARAMETERS
91
92        Several commands expect or accept additional parameters in
93        parentheses (e.g. 'remove(aA)').
94
95        Multiple parameters have to be separated by ',' or ';'.
96
97        There are two distinct ways to specify such a parameter:
98        - unquoted
99
100          Unquoted parameters are taken as specified, with the following exceptions:
101           - ',;"|\)' need to be escaped by prefixing one '\'
102           - spaces will get removed if unprefixed by '\'
103
104        - quoted
105
106          Quoted parameters begin and end with a '"'. You can use any character,
107          but you need to escape '\' and '"' by preceding them with a '\'.
108
109          Example: 'remove("\"")' will remove all double quotes from input.
110                   'remove("\\")' will remove all backslashes from input.
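The quoting rule for quoted parameters can be sketched as follows. This is a minimal model of the rule as written here (a '\' makes the next character literal); the function name is hypothetical and ARB's actual parser may differ in detail.

```python
def unquote_parameter(param: str) -> str:
    """Turn a quoted parameter (including its surrounding '"') into its value."""
    if len(param) < 2 or param[0] != '"' or param[-1] != '"':
        raise ValueError("not a quoted parameter")
    out = []
    chars = iter(param[1:-1])
    for ch in chars:
        if ch == "\\":
            # A backslash makes the following character literal.
            ch = next(chars, "\\")
        out.append(ch)
    return "".join(out)

print(unquote_parameter(r'"\""'))  # parameter of remove("\""): a double quote
print(unquote_parameter(r'"\\"'))  # parameter of remove("\\"): a backslash
```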
111
112        [@@@ behavior currently not strictly implemented]
113
114SECTION COMMANDLIST
115
116        If not explicitly mentioned otherwise, every command
117        creates one output stream for each input stream.
118
119        STREAM HANDLING
120
121                echo(x1;x2;x3...)       creates one output stream from each specified parameter
122                                        (parameters are separated by ';').
123
124                "text"                  same as 'echo("text")'
125
126                dd                      copies all input streams to output streams
127
128                cut(N1,N2,N3)           copies the Nth input stream(s)
129
130                drop(N1,N2)             copies all but the Nth input stream(s)
131
132                dropempty               drops all empty input streams
133
134                dropzero                drops all non-numeric or zero input streams
135
136                swap(N1,N2)             swaps two input streams
137                                        (w/o parameters: swaps last two streams)
138
139                toback(X)               moves the Xth input stream
140                                        to the end of output streams
141
142                tofront(X)              moves the Xth input stream
143                                        to the start of output streams
144
145                merge([sep])            merges all input streams into one output stream.
146                                        If 'sep' is specified, it's inserted between them.
147                                        If no input streams are given, it returns 1 empty
148                                        output stream.
149
150                split([sep[,mode]])     splits all input streams at separator string 'sep'
151                                        (default: split at linefeed).
152
153                                        Modes:
154
155                                        0               remove found separators (default)
156                                        1               split before separator
157                                        2               split after separator
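The three split modes can be illustrated with a small Python sketch. It models a single input stream; how ARB treats empty pieces or an empty separator is not stated here, so that part is an assumption of the sketch.

```python
from typing import List

def split_stream(text: str, sep: str = "\n", mode: int = 0) -> List[str]:
    """Model of split([sep[,mode]]) for one input stream."""
    parts = text.split(sep)
    if mode == 0:   # remove found separators
        return parts
    if mode == 1:   # split before separator: separator starts the next stream
        return [parts[0]] + [sep + p for p in parts[1:]]
    if mode == 2:   # split after separator: separator ends the current stream
        return [p + sep for p in parts[:-1]] + [parts[-1]]
    raise ValueError("mode must be 0, 1 or 2")

print(split_stream("a,b,c", ",", 0))  # ['a', 'b', 'c']
print(split_stream("a,b,c", ",", 1))  # ['a', ',b', ',c']
print(split_stream("a,b,c", ",", 2))  # ['a,', 'b,', 'c']
```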
158
159                colsplit([width])       splits each input stream into multiple streams of the specified
160                                        width (or shorter for last output stream).
161                                        The default width is 1.
162
163                streams                 returns the number of input streams
164
165        STRING
166
167                head(n)                 the first n characters
168                left(n)                 the first n characters
169
170                tail(n)                 the last n characters
171                right(n)                the last n characters
172
173                                        the above functions return an empty string for n<=0
174
175                len                     the length of the input
176
177                len("chr")              the length of the input excluding the
178                                        characters in 'chr'
179
180                mid(x,y)                the substring from position x to y
181
182                                        Allowed positions are
183                                        - [1..N] for mid()
184                                        - [0..N-1] for mid0()
185
186                                        A position below that range is relative to the end of the string,
187                                        i.e. mid(-2,0) and mid0(-3,-1) are equivalent to tail(3)
188
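The position rules for mid() and mid0() can be modelled like this. The sketch assumes that both positions are inclusive and does not model positions above the allowed range; it is meant to show the end-relative positions, not ARB's exact code.

```python
def _resolve(pos: int, first: int, length: int) -> int:
    """Map a position to a 0-based index; positions below 'first' count from the end."""
    if pos < first:
        pos += length  # e.g. 0 -> N for mid(), -1 -> N-1 for mid0()
    return pos - first

def mid(text: str, x: int, y: int) -> str:
    """1-based substring from position x to y (inclusive)."""
    start, end = _resolve(x, 1, len(text)), _resolve(y, 1, len(text))
    return text[max(start, 0):end + 1]

def mid0(text: str, x: int, y: int) -> str:
    """0-based substring from position x to y (inclusive)."""
    start, end = _resolve(x, 0, len(text)), _resolve(y, 0, len(text))
    return text[max(start, 0):end + 1]

def tail(text: str, n: int) -> str:
    """The last n characters; empty for n<=0."""
    return text[-n:] if n > 0 else ""

print(mid("abcdefg", 2, 4))     # bcd
print(mid("abcdefg", -2, 0))    # efg
print(mid0("abcdefg", -3, -1))  # efg
```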
189                crop("str")             removes characters of 'str' from
190                                        both ends of the input
191
192                remove("str")           removes all characters of 'str'
193                                        e.g. remove(" ") removes all blanks
194
195                keep("str")             the opposite of remove:
196                                        remove all chars that are not a member
197                                        of 'str'
198
199                isEmpty                 return '1' for each empty input stream, '0' for others
200
201                srt("orig=dest",...)    replace command, invokes SRT
202                                        (see LINK{srt.hlp})
203
204                translate("old","new"[,"other"])
205
206                        translates all characters from input that occur in the
207                        first argument ("old") by the corresponding character of the
208                        second argument ("new").
209
210                        An optional third argument (one character only) means:
211                        replace all other characters with the third argument.
212
213                        Example:
214
215                                Input:                        "--AabBCcxXy--"
216                                translate("abc-","xyz-")      "--AxyBCzxXy--"
217                                translate("abc-","xyz-",".")  "--.xy..z...--"
218
219                        This can be used to replace illegal characters in sequence data
220                        (see predefined expressions in 'Modify fields of listed species').
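The translate example can be reproduced with a few lines of Python. This is a sketch of the described behaviour (character-by-character mapping, optional replacement for all other characters), not the ARB source.

```python
def translate(text: str, old: str, new: str, other: str = "") -> str:
    """Replace each character found in 'old' by the corresponding one in 'new'."""
    table = dict(zip(old, new))
    out = []
    for ch in text:
        if ch in table:
            out.append(table[ch])
        elif other:
            out.append(other[0])  # third argument: one character only
        else:
            out.append(ch)
    return "".join(out)

print(translate("--AabBCcxXy--", "abc-", "xyz-"))       # --AxyBCzxXy--
print(translate("--AabBCcxXy--", "abc-", "xyz-", "."))  # --.xy..z...--
```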
221
222
223                tab(n)                  append n-len(input) spaces
224
225                pretab(n)               prepend n-len(input) spaces
226
227                upper                   converts string to upper case
228                lower                   converts string to lower case
229                caps                    capitalizes string
230
231                format(options)
232
233                    takes a long string and breaks it into several lines
234
235                        option       (default)     description
236                        ==========================================================
237                        width=#      (50)          line width
238                        firsttab=#   (10)          first line left indent
239                        tab=#        (10)          left indent (not first line)
240                        "nl=chrs"    (" ")         list of characters that specify
241                                                   a possible point of a line break;
242                                                   the line break characters get removed!
243                        "forcenl=chrs" ("\n")      Force a newline at these characters.
244
245                    (see also format_sequence below)
246
247                extract_words("chars",val)
248
249                    Search for all words (separated by ',' ';' ':' ' ' or 'tab') that
250                    contain more characters of type chars than val, sort them
251                    alphabetically and write them separated by ' ' to the output
252
253        ESCAPING AND QUOTING
254
255                 escape         escapes all occurrences of '\' and '"' by preceding them with a '\'
256                 quote          quotes the input in '"'
257
258                 unescape       inverse of escape
259                 unquote        removes quotes (if present), otherwise returns the input unchanged
260
261
262        STRING COMPARISON
263
264               compare(a,b)             return -1 if a<b, 0 if a=b, 1 if a>b
265               equals(a,b)              return 1 if a=b, 0 otherwise
266               contains(a,b)            if a contains b, this returns the position of
267                                        b inside a (1..N) and 0 otherwise.
268                                        Always returns 0 if b is empty.
269               partof(a,b)              if a is part of b, this returns the position of
270                                        a inside b (1..N) and 0 otherwise.
271
272               The above functions are binary operators (see below).
273               For each of them a case-insensitive alternative exists (icompare, iequals, ...).
274
275        NUMERIC COMPARISON
276
277                All functions here operate with floating-point numbers.
278
279                isBelow(a,b)            return 1 if a<b, 0 otherwise
280                isAbove(a,b)            return 1 if a>b, 0 otherwise
281                isEqual(a,b)            return 1 if a=b, 0 otherwise
282
283                The above functions are binary operators (see below).
284
285                inRange(low,high)
286
287                        for values of each input stream:
288                        return 1 if low <= value <= high, 0 otherwise
289
290        CALCULATOR
291
292                plus                    add arguments
293                minus                   subtract arguments
294                mult                    multiply arguments
295                div                     divide arguments
296                per_cent                divide arguments * 100 (not rounded; use "fper_cent|round(0)")
297                rest                    divide arguments, return the remainder
298
299                Calculation is performed with integer numbers.
300                For most of these functions a floating-point variant exists:
301                    * fplus
302                    * fminus
303                    * fmult
304                    * fdiv
305                    * fper_cent
306
307                round(digits)
308
309                        rounds a floating-point input to the given number of digits
310                        after the decimal point.
311                        Specify zero to round to an integer number.
312                        Specify negative digits to round to multiples of 10, 100, 1000, ...
313
314
315                To avoid 'division by zero'-errors, the operators 'div', 'per_cent' and 'rest'
316                return 0 if the second argument is zero.
317
318                The above functions work as binary operators (see below).
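The difference between the integer operators and their floating-point variants can be sketched as follows. Assumptions of this sketch: operands are non-negative, the integer result is truncated, and Python's round() stands in for 'round' (its tie-breaking may differ from ARB's). How the floating-point variants react to a zero divisor is not stated here, so the sketch does not model it.

```python
def per_cent(a: int, b: int) -> int:
    """Integer variant: not rounded; returns 0 if the second argument is zero."""
    return 0 if b == 0 else a * 100 // b

def rest(a: int, b: int) -> int:
    """Remainder of the division; returns 0 if the second argument is zero."""
    return 0 if b == 0 else a % b

def fper_cent(a: float, b: float) -> float:
    """Floating-point variant (zero divisor not modelled)."""
    return a * 100.0 / b

def round_digits(value: float, digits: int) -> float:
    """Model of round(digits); negative digits round to multiples of 10, 100, ..."""
    return round(value, digits)

print(per_cent(2, 3))                    # 66
print(round_digits(fper_cent(2, 3), 0))  # 67.0
print(round_digits(1234.0, -2))          # 1200.0
```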
319
320        BOOLEAN OPERATORS
321
322                All input streams are converted to boolean
323                values (i.e. 0 or 1) as follows:
324
325                    "0"         -> 0
326                    any number  -> 1
327                    any text    -> 0 (even empty text!)
328
329                Operators:
330
331                    Not     invert values of all input streams (0<->1)
332                    And     return 1 if all input streams are 1, 0 otherwise
333                    Or      return 1 if at least one input stream is 1, 0 otherwise
334
335                    Use "|or|not" or "|and|not" to execute NOR or NAND.
336
337        BINARY OPERATORS
338
339               Several operators work as so called 'binary operators'.
340               These operators may be used in various ways, which are
341               shown using the operator 'plus':
342
343                     ACI                OUTPUT                  STREAMS
344                     plus(a,b)          a+b                     input:0 output:1
345                     a;b|plus           a+b                     input:2 output:1
346                     a;b;c;d|plus       a+b;c+d                 input:4 output:2
347                     a;b;c|plus(x)      a+x;b+x;c+x             input:3 output:3
348
349               That means, if the binary operator
350
351                    - has no arguments, it expects an even number of input streams. The operator is
352                      applied to the first 2 streams, then to the next 2 streams and so on.
353                      The number of output streams is half the number of input streams.
354                    - has 1 argument, it accepts one to many input streams. The operator
355                      is applied to each input stream together with the argument.
356                      For each input stream one output stream is generated.
357                    - has 2 arguments, it is applied to these. The arguments are interpreted as
358                      ACI commands and are applied for each input stream. The results of
359                      the commands are passed as arguments to the binary operator. For each input
360                      stream one output stream is generated.
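The first two calling conventions of a binary operator can be modelled in Python. The helper binary_operator is hypothetical; the two-argument form (where both arguments are ACI commands) is left out of this sketch.

```python
from typing import Callable, List, Optional

def plus(a: str, b: str) -> str:
    """Integer addition on stream contents."""
    return str(int(a) + int(b))

def binary_operator(op: Callable[[str, str], str],
                    streams: List[str],
                    arg: Optional[str] = None) -> List[str]:
    if arg is None:
        # No argument: apply the operator to consecutive pairs of input streams.
        if len(streams) % 2 != 0:
            raise ValueError("expects an even number of input streams")
        return [op(a, b) for a, b in zip(streams[0::2], streams[1::2])]
    # One argument: combine each input stream with the argument.
    return [op(s, arg) for s in streams]

print(binary_operator(plus, ["1", "2", "3", "4"]))    # ['3', '7']
print(binary_operator(plus, ["1", "2", "3"], "10"))   # ['11', '12', '13']
```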
361
362        CONDITIONAL
363
364                select(a,b,c,...)       each input stream is converted into a number
365                                        (non-numeric text converts to zero). That number is
366                                        used to select one of the given arguments:
367                                             0 selects 'a',
368                                             1 selects 'b',
369                                             2 selects 'c' and so on.
370                                        The selected argument is interpreted as ACI command
371                                        and is applied to an empty input stream.
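A small Python model of select. Arguments are represented as functions that receive the empty input stream. What ARB does when the number is outside the range of given arguments is not described here; raising an error is an assumption of this sketch.

```python
from typing import Callable, List

def to_number(text: str) -> int:
    """Non-numeric text converts to zero."""
    try:
        return int(text)
    except ValueError:
        return 0

def select(streams: List[str], *choices: Callable[[str], str]) -> List[str]:
    out = []
    for s in streams:
        index = to_number(s)
        if not 0 <= index < len(choices):
            raise IndexError("no argument for %d" % index)  # assumption, see above
        out.append(choices[index](""))  # applied to an empty input stream
    return out

differ = lambda _: "tmp and acc differ"  # echo("tmp and acc differ")
empty = lambda _: ""                     # empty argument

print(select(["0"], differ, empty))  # ['tmp and acc differ']
print(select(["1"], differ, empty))  # ['']
```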
372
373        DEBUGGING
374
375                trace(onoff)            toggle tracing of ACI actions to standard output.
376                                        Parameter: 0 or 1 (switch off or on)
377
378                                        All streams are copied (like 'dd').
379
380                                        Example: "cmd1 | cmd2 | trace(1) | tracedCmd1 | tracedCmd2 | trace(0) | untracedCmd "
381
382                                        To see the output from trace, either
383                                          * start arb from a terminal or
384                                          * use LINK{console.hlp}
385
386
387        DATABASE AND SEQUENCE
388
389                readdb(field_name)      the contents of the field 'field_name'
390
391                sequence                the sequence in the current alignment.
392
393                                        Note: older ARB versions returned 'no sequence'
394                                        if the current alignment contained no sequence.
395                                        Now it returns an empty string.
396
397                                        For genes it returns only the corresponding part
398                                        of the sequence. If the field complement = 1 then the
399                                        result is the reverse-complement.
400
401                sequence_type           the sequence type of the selected alignment ('rna','dna',..)
402                ali_name                the name of the selected alignment (e.g. 'ali_16s')
403
404                Note: The commands above only work at the beginning of the ACI expression.
405
406                checksum(options)       calculates a CRC checksum
407                                        options:
408                                        "exclude=chrs"    remove 'chrs' before calculation
409                                        "toupper"         make everything uppercase first
410
411                gcgchecksum             a gcg compatible checksum
412
413                format_sequence(options)
414
415                        takes a long string ( sequence ) and breaks it into several lines
416
417                        option       (default)  description
418                        =============================================================
419                        width=#      (50)       sequence line width
420                        firsttab=#   (10)       first line left indent
421                        tab=#        (10)       left indent (not first line)
422                        numleft      (NO)       numbers on the left side
423                        numright=#   (NO)       numbers on the right side (#=width)
424                        gap=#        (10)       insert a gap every # seq. characters.
425
426                    (see also 'format' above)
427
428                extract_sequence("chars",rel_len)
429
430                        like extract_words, but does not sort the words. 'rel_len' is the minimum
431                        percentage of characters of a word that must match a character in 'chars'
432                        for the word to be taken. All words will be separated by white space.
433
434                taxonomy([treename,] depth)
435
436                        Returns the taxonomy of the current species or group as defined by a tree.
437
438                        If 'treename' is specified, it is used as tree, otherwise the 'default tree'
439                        is used (which in most cases is the tree displayed in the ARB_NT main window).
440
441                        'depth' specifies how many "levels" of the taxonomy are used.
442
443        FILTERING
444
445                There are several functions to filter sequential data:
446
447                      - filter
448                      - diff
449                      - gc
450
451                All these functions use the following COMMON OPTIONS to define
452                what is used as filter sequence:
453
454                    - species=name
455
456                      Use species 'name' as filter.
457
458                    - SAI=name
459
460                      Use SAI 'name' as filter.
461
462                    - first=1
463
464                      Use 1st input stream as filter for all other input streams.
465
466                    - pairwise=1
467
468                      Use 1st input stream as filter for 2nd stream,
469                      3rd stream as filter for 4th stream, and so on.
470
471                    - align=ali_name
472
473                      Use alignment 'ali_name' instead of current default
474                      alignment (only meaningful together with 'species' or 'SAI').
475
476                    Note: Only one of the parameters 'species', 'SAI', 'first' or 'pairwise' may be used.
477
478                diff(options)
479
480                        Calculates the difference between the filter (see common options above) and the input stream(s) and
481                        writes the result to the output stream(s).
482
483                        Additional options:
484
485                        - equal=x
486
487                          Character written to output if filter and stream are equal at
488                          a position (defaults to '.'). To copy the stream contents for
489                          equal columns, specify 'equal=' (directly followed by ',' or ')')
490
491                        - differ=y
492
493                          Character written to output if filter and stream don't match at one column position.
494                          Default is to copy the character from the stream.
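The column-wise comparison done by diff can be sketched like this. The sketch assumes filter and stream have the same length and passes the filter sequence directly instead of looking it up via 'species=', 'SAI=' etc.

```python
def diff(filter_seq: str, stream: str, equal: str = ".", differ: str = "") -> str:
    """Model of diff(): an empty 'equal' or 'differ' copies the stream character."""
    out = []
    for f, s in zip(filter_seq, stream):
        if f == s:
            out.append(equal if equal else s)
        else:
            out.append(differ if differ else s)
    return "".join(out)

print(diff("ACGU-", "ACCU-"))              # ..C..
print(diff("ACGU-", "ACCU-", equal=""))    # ACCU-
print(diff("ACGU-", "ACCU-", differ="x"))  # ..x..
```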
495
496                filter(options)
497
498                        Filters only specified columns out of the input stream(s). You need to
499                        specify either
500
501                        - exclude=xyz
502
503                          to use all columns, where the filter (see common options above) has none
504                          of the characters 'xyz'
505
506                        or
507
508                        - include=xyz
509
510                          to use only columns, where the filter has one of the characters 'xyz'
511
512                        All used columns are concatenated and written to the output stream(s).
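A Python sketch of the column selection done by filter. As an assumption of this sketch, the filter sequence is passed directly and has the same length as the stream; the function name is made up.

```python
from typing import Optional

def filter_columns(filter_seq: str, stream: str,
                   include: Optional[str] = None,
                   exclude: Optional[str] = None) -> str:
    """Keep only the columns of 'stream' selected via the filter sequence."""
    if (include is None) == (exclude is None):
        raise ValueError("specify either include or exclude")
    if include is not None:
        keep = [f in include for f in filter_seq]
    else:
        keep = [f not in exclude for f in filter_seq]
    # All used columns are concatenated.
    return "".join(s for s, k in zip(stream, keep) if k)

print(filter_columns("x-x-xx", "ACGUAC", exclude="-"))  # AGAC
print(filter_columns("x-x-xx", "ACGUAC", include="-"))  # CU
```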
513
514
515                change(options)
516
517                        Randomly modifies the content of columns selected
518                        by the filter (see common options above).
519                        Only columns containing letters will be modified.
520
521                        The options 'include=xyz' and 'exclude=xyz' work like
522                        with 'filter()', but here they select the columns to modify - all other
523                        columns get copied unmodified.
524
525                        How the selected columns are modified, is specified by the following
526                        parameters:
527
528                                - change=percent
529
530                                  percentage of changed columns (default: silently change nothing, to make
531                                  it more difficult for you to ignore this helpfile)
532
533                                - to=xy
534
535                                  randomly change to one of the characters 'xy'.
536
537                                  Hints:
538
539                                        - Use 'xyy' to produce 33% 'x' and 66% 'y'
540                                        - Use 'xxxxxxxxxy' to produce 90% 'x' and 10% 'y'
541                                        - Use 'x' to replace all matching columns by 'x'
542
543                        I think the intention for this (long undocumented) command is to easily generate
544                        artificial sequences with different GC-content, in order to test treeing-software.
545
546        SPECIALS
547
548                exec(command[,param1,param2,...])
549
550                    Execute external (unix) command.
551
552                    Given params will be single-quoted and passed to the command.
553
554                    All input streams will be concatenated and piped into the command.
555
556                    When the command itself is a pipe, put it in parentheses (e.g. "(sort|uniq)").
557                    Note: This won't work together with params.
558
559                    The result is the output of the command.
560
561                    WARNING!!!
562
563                        You should not use this command for NDS,
564                        because any slow command will disable all editing, so you can never
565                        remove this command from the NDS again. Even arb_panic will not
566                        easily help you.
567
568                command(action)
569
570                        applies 'action' to all input streams using
571
572                                 - ACI,
573                                 - SRT (if starts with ':') (see LINK{srt.hlp})
574                                 - or as REG (if starts with '/') (see LINK{reg.hlp}).
575
576                        If you nest calls (i.e. if 'action' contains further calls to 'command') you have to apply
577                        escaping multiple times (e.g. inside an export filter - which is in fact an
578                        SRT expression - you'll have to use double escapes).
579
580                eval(exprEvalToAction)
581
582                        the 'exprEvalToAction' is evaluated (using an empty string as input)
583                        and the result is interpreted as action and gets applied to all
584                        input streams (as in 'command' above).
585
586                        Example: Suppose you have two numeric positions stored in database fields
587                                 'pos1' and 'pos2' for each species. Then the following command
588                                 extracts the sequence data from pos1 to pos2:
589
590                                 'sequence|eval(" \"mid(\";readdb(pos1);\";\";readdb(pos2);\")\" ")'
591
592                        How the example works:
593
594                            The argument is the escaped version of the
595                            command '"mid(" ; readdb(pos1) ; ";" ; readdb(pos2) ; ")"'.
596
597                            If pos1 contains '10' and pos2 contains '20' that command will
598                            evaluate to 'mid(10;20)'.
599
600                            For these positions the executed ACI behaves like 'sequence|mid(10;20)'.
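The first evaluation step of the example is plain concatenation of output streams. A tiny Python illustration, with the database fields modelled as a dict (an assumption of this sketch):

```python
def build_action(fields: dict) -> str:
    # Output streams of: "mid(" ; readdb(pos1) ; ";" ; readdb(pos2) ; ")"
    streams = ["mid(", fields["pos1"], ";", fields["pos2"], ")"]
    # At the end of the expression all output streams get concatenated.
    return "".join(streams)

print(build_action({"pos1": "10", "pos2": "20"}))  # mid(10;20)
```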
601
602                define(name,escapedCommand)
603
604                        defines an ACI-macro 'name'. 'escapedCommand' contains an escaped
605                        ACI command sequence. This command sequence can be executed with
606                        do(name).
607
608                do(name)
609
610                        applies a previously defined ACI-macro to all input streams (see 'define').
611
612                        'define(a,action)' followed by 'do(a)' works similarly to 'command(action)'.
613
614                        See embl.eft for an example using define and 'do'
615
616                findspec(action)
617
618                        Each input stream is interpreted as species 'name' (ID) and a species
619                        with that 'name' is searched (aborts with error if species could not be found;
620                        silently ignores empty streams).
621
622                        Otherwise 'action' is applied (to one empty stream).
623                        Instead of the current item, all database commands inside 'action' use the found species.
624
625                findacc(action)
626
627                        like findspec, but search for 'acc' instead of 'name'.
628
629                findgene(action)
630
631                        like findspec, but searches for genes (starting at organism or
632                        at other gene of same organism).
633
634                origin_organism(action)
635                origin_gene(action)
636
637                        like command() but readdb() etc. reads all data from the
638                        origin organism/gene of a gene-species (not from the gene-species itself).
639
640                        This function applies only to gene-species!
641
642SECTION         Future features
643
644                statistic
645
646                        creates a character statistic of the sequence
647                        (not implemented yet)
648
649EXAMPLES        sequence|format_sequence(firsttab=0;tab=10)|"SEQUENCE_";dd
650
651                                fetches the default sequence, formats it,
652                                and prepends 'SEQUENCE_'.
653
654                sequence|remove(".-")|format_sequence
655
656                                get the default sequence, remove all '.-' and
657                                format it
658
659                sequence|remove(".-")|len
660
661                                the number of non '.-' symbols (sequence length)
662
663                "[";taxonomy(tree_other,3);" -> ";taxonomy(3);"]"
664
665                                shows for each species how their taxonomy
666                                changed between "tree_other" and current tree
667
668                equals(readdb(tmp),readdb(acc))|select(echo("tmp and acc differ"),)
669
670                                returns 'tmp and acc differ' if the content of
671                                the database fields 'tmp' and 'acc' differs. empty result
672                                otherwise.
673
674                readdb(full_name)|icontains(bacillus)|compare(0)|select(echo(..),readdb(full_name))
675
676                                returns the content of the 'full_name' database entry if it contains
677                                the substring 'bacillus'. Otherwise returns '..'
678
679
680BUGS            The output of taxonomy() is not always instantly refreshed.