source: branches/nameserver/HELP_SOURCE/oldhelp/aci.hlp

Last change on this file was 16425, checked in by westram, 8 years ago
  • reintegrates 'aci' into 'trunk'
    • extends ACI language (implementing #707)
      • boolean operators: And,Or,Not
      • numeric comparison: isAbove,isBelow,isEqual
      • floating point arithmetic: fplus,fminus,fmult,fdiv
      • misc: round,inRange,isEmpty
    • enforce parameter checks in ACI function code
    • case of commands completely ignored
  • adds: log:branches/aci@16385,16391:16424
  • Property svn:eol-style set to native
  • Property svn:keywords set to Author Date Id Revision
1#Please insert up references in the next lines (line starts with keyword UP)
2UP      arb.hlp
3UP      glossary.hlp
4
5#Please insert subtopic references  (line starts with keyword SUB)
6SUB     exec_bug.hlp
7
8# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}
9
10#************* Title of helpfile !! and start of real helpfile ********
11
12TITLE           ARB Command Interpreter (ACI)
13
14OCCURRENCE      NDS
15                [ export db ]
16                [ ARB_NT/Species/search/parse_fields ]
17
18DESCRIPTION     ACI is a simple command interpreter that uses streams of data as its central concept.
19
20                Many ACI commands take parameters, which are specified after
21                the command in parentheses.
22
23                All ACI commands
24                 * take the data from (one or multiple) input streams,
25                 * modify that data and
26                 * write that data to (one or multiple) output streams.
27
28                     e.g. the command 'count("a")' counts every 'a' for each input stream and
29                     generates one output stream (containing the char count) for every input stream.
30
31                The first input stream always is a single stream,
32                often the value of a database field (e.g. when ACI is used in LINK{props_nds.hlp}).
33
34                The number of output streams depends on the used command:
35                 * most commands produce one output stream for each input stream (as the count-example above)
36                 * some commands combine two input streams into one output stream (e.g. see binary operators below)
37                 * some commands ignore all input streams and create one output stream (e.g. 'readdb(fieldname)')
38                 * Note: special stream related commands are documented in section 'STREAM HANDLING'
39
40                Multiple commands can be separated by two operator symbols: ';' and '|'.
41                  * ';' binds stronger than '|'
42                  * commands separated by ';' form a command-list and operate independently from each other:
43                    - all(!) commands use all(!) input streams
44                    - each command generates its own output streams
45                  * the '|' operator acts as processing sequence point, i.e.
46                    - all output streams generated by the command-list on the left side of the '|' will be passed
47                      as input streams to the command-list on the right side of the '|'.
48
49                Finally (at the end of the overall ACI expression) all generated output streams get concatenated.
50
51                Typical uses are to
52                  * show text at the tips of the tree (LINK{props_nds.hlp})
53                  * write information into database fields (LINK{mod_field_list.hlp})
54
55                Instead of using ACI commands (as described in this document) you may always use any of the other
56                integrated data processing languages. Simply prefix the command
57
58                  * with a ':' to use LINK{srt.hlp}
59                  * with a '/' to use LINK{reg.hlp}
60
61                Both are also available inside ACI via the commands 'srt' and 'command', see below.
62
63SECTION Examples
64
65                        count("A");count("AG")
66
67                                creates two streams:
68
69                                        1. how many A's
70                                        2. and how many A's and G's
71
72                        count("A");count("G")|per_cent
73
74                                per_cent is a command that divides two numbers
75                                (number of 'A's / number of 'G's) and returns the result
76                                as percent.
77
78SECTION Example data flow
79
80        eg: count("A");count("G")|"a/g = "; per_cent
81
82        input                                                         concatenate output
83        "AGG" ----> count("A") -->| -----> "a/g = " --> | --> "a/g = " ---> 'a/g = 50'
84              \                   | \ /                 |               /
85               \                  |  \                  |              /
86                \                 | / \                 |             /
87                 -> count("G") -->| -----> per_cent --> | --> "50" ---
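
The data flow above can be modelled in a few lines of Python. This is only a sketch of the stream concept described in this help file, not ARB code: the helper names (command_list, pipe, evaluate) are made up for illustration, and per_cent is assumed to use truncating integer arithmetic on non-negative numbers.

```python
from typing import Callable, List

Streams = List[str]
Command = Callable[[Streams], Streams]


def count(chars: str) -> Command:
    """count("AG"): one output stream per input stream (number of matching chars)."""
    return lambda streams: [str(sum(c in chars for c in s)) for s in streams]


def echo(*texts: str) -> Command:
    """echo(x1;x2) or "text": ignores the input, one output stream per parameter."""
    return lambda streams: list(texts)


def per_cent(streams: Streams) -> Streams:
    """Binary operator without arguments: consumes the input streams in pairs."""
    out = []
    for a, b in zip(streams[0::2], streams[1::2]):
        x, y = int(a), int(b)
        # Assumption: integer result; a zero divisor yields 0 as documented.
        out.append(str(0 if y == 0 else x * 100 // y))
    return out


def command_list(*commands: Command) -> Command:
    """';': every command sees all input streams; outputs are collected in order."""
    return lambda streams: [s for cmd in commands for s in cmd(streams)]


def pipe(*stages: Command) -> Command:
    """'|': output streams of the left side become input streams of the right side."""
    def run(streams: Streams) -> Streams:
        for stage in stages:
            streams = stage(streams)
        return streams
    return run


def evaluate(expression: Command, field_value: str) -> str:
    """Start with a single input stream and concatenate all final output streams."""
    return "".join(expression([field_value]))


# count("A");count("G")|"a/g = ";per_cent
expr = pipe(command_list(count("A"), count("G")),
            command_list(echo("a/g = "), per_cent))
print(evaluate(expr, "AGG"))  # a/g = 50
```

Because ';' binds stronger than '|', the expression is grouped as two command-lists joined by one pipe, which is exactly how 'expr' is built.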
88
89
90SECTION PARAMETERS
91
92        Several commands expect or accept additional parameters in
93        parentheses (e.g. 'remove(aA)').
94
95        Multiple parameters have to be separated by ',' or ';'.
96
97        There are two distinct ways to specify such a parameter:
98        - unquoted
99
100          Unquoted parameters are taken as specified, with the following exceptions:
101           - ',;"|\)' need to be escaped by prefixing one '\'
102           - spaces will get removed if unprefixed by '\'
103
104        - quoted
105
106          Quoted parameters begin and end with a '"'. You can use any character,
107          but you need to escape '\' and '"' by preceding them with a '\'.
108
109          Example: 'remove("\"")' will remove all double quotes from input.
110                   'remove("\\")' will remove all backslashes from input.
111
112        [@@@ behavior currently not strictly implemented]
113
114SECTION COMMANDLIST
115
116        If not explicitly stated otherwise, every command
117        creates one output stream for each input stream.
118
119        STREAM HANDLING
120
121                echo(x1;x2;x3...)       creates one output stream from each specified parameter
122                                        (parameters are separated by ';').
123
124                "text"                  same as 'echo("text")'
125
126                dd                      copies all input streams to output streams
127
128                cut(N1,N2,N3)           copies the Nth input stream(s)
129
130                drop(N1,N2)             copies all but the Nth input stream(s)
131
132                dropempty               drops all empty input streams
133
134                dropzero                drops all non-numeric or zero input streams
135
136                swap(N1,N2)             swaps two input streams
137                                        (w/o parameters: swaps last two streams)
138
139                toback(X)               moves the Xth input stream
140                                        to the end of output streams
141
142                tofront(X)              moves the Xth input stream
143                                        to the start of output streams
144
145                merge([sep])            merges all input streams into one output stream.
146                                        If 'sep' is specified, it's inserted between them.
147                                        If no input streams are given, it returns 1 empty
148                                        output stream.
149
150                split([sep[,mode]])     splits all input streams at separator string 'sep'
151                                        (default: split at linefeed).
152
153                                        Modes:
154
155                                        0               remove found separators (default)
156                                        1               split before separator
157                                        2               split after separator
158
159                streams                 returns the number of input streams
160
161        STRING
162
163                head(n)                 the first n characters
164                left(n)                 the first n characters
165
166                tail(n)                 the last n characters
167                right(n)                the last n characters
168
169                                        the above functions return an empty string for n<=0
170
171                len                     the length of the input
172
173                len("chr")              the length of the input excluding the
174                                        characters in 'chr'
175
176                mid(x,y)                the substring from position x to y
177
178                                        Allowed positions are
179                                        - [1..N] for mid()
180                                        - [0..N-1] for mid0()
181
182                                        A position below that range is relative to the end of the string,
183                                        i.e. mid(-2,0) and mid0(-3,-1) are equivalent to tail(3)
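
The position rules for mid() and mid0() can be illustrated with a small Python model. It is a sketch of the rules stated above only; how ARB treats positions above the allowed range is not documented here, so the clamping done by Python slicing is an assumption.

```python
def _to_index(pos: int, lowest: int, length: int) -> int:
    """Convert an ACI position to a 0-based index.

    'lowest' is the first allowed position: 1 for mid(), 0 for mid0().
    A position below the allowed range counts relative to the end.
    """
    if pos < lowest:
        pos += length
    return pos - lowest


def _substring(text: str, x: int, y: int, lowest: int) -> str:
    start = max(_to_index(x, lowest, len(text)), 0)
    end = max(_to_index(y, lowest, len(text)), -1)
    # Assumption: positions beyond the end are clamped (Python slicing does this).
    return text[start:end + 1]


def mid(text: str, x: int, y: int) -> str:
    """Substring from position x to y, positions counted from 1."""
    return _substring(text, x, y, lowest=1)


def mid0(text: str, x: int, y: int) -> str:
    """Substring from position x to y, positions counted from 0."""
    return _substring(text, x, y, lowest=0)


def tail(text: str, n: int) -> str:
    """The last n characters; empty for n <= 0."""
    return text[-n:] if n > 0 else ""


print(mid("abcdefgh", 2, 4))     # bcd
print(mid("abcdefgh", -2, 0))    # fgh
print(mid0("abcdefgh", -3, -1))  # fgh
```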
184
185                crop("str")             removes characters of 'str' from
186                                        both ends of the input
187
188                remove("str")           removes all characters of 'str'
189                                        e.g. remove(" ") removes all blanks
190
191                keep("str")             the opposite of remove:
192                                        remove all chars that are not a member
193                                        of 'str'
194
195                isEmpty                 return '1' for each empty input stream, '0' for others
196
197                srt("orig=dest",...)    replace command, invokes SRT
198                                        (see LINK{srt.hlp})
199
200                translate("old","new"[,"other"])
201
202                        translates all characters from input that occur in the
203                        first argument ("old") by the corresponding character of the
204                        second argument ("new").
205
206                        An optional third argument (one character only) means:
207                        replace all other characters with the third argument.
208
209                        Example:
210
211                                Input:                        "--AabBCcxXy--"
212                                translate("abc-","xyz-")      "--AxyBCzxXy--"
213                                translate("abc-","xyz-",".")  "--.xy..z...--"
214
215                        This can be used to replace illegal characters in sequence data
216                        (see predefined expressions in 'Modify fields of listed species').
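
The behaviour of translate() can be reproduced with a short Python function. This is a sketch based only on the description and the example above; the handling of an "old" argument that is longer than "new" is an assumption (surplus characters are treated like any other unmapped character).

```python
from typing import Optional


def translate(text: str, old: str, new: str, other: Optional[str] = None) -> str:
    """Map each character of 'old' to the character at the same index in 'new'.

    If 'other' is given, every character that is not in 'old' is replaced by it;
    otherwise such characters are copied unchanged.
    """
    mapping = dict(zip(old, new))  # assumption: surplus chars of 'old' stay unmapped
    result = []
    for ch in text:
        if ch in mapping:
            result.append(mapping[ch])
        elif other is not None:
            result.append(other)
        else:
            result.append(ch)
    return "".join(result)


print(translate("--AabBCcxXy--", "abc-", "xyz-"))       # --AxyBCzxXy--
print(translate("--AabBCcxXy--", "abc-", "xyz-", "."))  # --.xy..z...--
```

Listing '-' in both arguments, as in the example, is what keeps the gap characters intact when a third argument is given.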
217
218
219                tab(n)                  append n-len(input) spaces
220
221                pretab(n)               prepend n-len(input) spaces
222
223                upper                   converts string to upper case
224                lower                   converts string to lower case
225                caps                    capitalizes string
226
227                format(options)
228
229                    takes a long string and breaks it into several lines
230
231                        option       (default)     description
232                        ==========================================================
233                        width=#      (50)          line width
234                        firsttab=#   (10)          first line left indent
235                        tab=#        (10)          left indent (not first line)
236                        "nl=chrs"    (" ")         list of characters that specify
237                                                   a possible point for a line break;
238                                                   the line break characters get removed!
239                        "forcenl=chrs" ("\n")      Force a newline at these characters.
240
241                    (see also format_sequence below)
242
243                extract_words("chars",val)
244
245                    Search for all words (separated by ',' ';' ':' ' ' or 'tab') that
246                    contain more characters of type chars than val, sort them
247                    alphabetically and write them separated by ' ' to the output
248
249        ESCAPING AND QUOTING
250
251                 escape         escapes all occurrences of '\' and '"' by preceding them with a '\'
252                 quote          quotes the input in '"'
253
254                 unescape       inverse of escape
255                 unquote        removes quotes (if present), otherwise returns the input unchanged
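
The four commands above can be sketched in Python as follows. The functions only model the one-line descriptions given here; how ARB handles a lone trailing backslash in 'unescape' is not documented, so keeping it unchanged is an assumption.

```python
def escape(text: str) -> str:
    """Prefix every backslash and every double quote with a backslash."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def unescape(text: str) -> str:
    """Inverse of escape: drop a backslash and keep the character after it."""
    result = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # Assumption: a lone trailing backslash is kept as it is.
            ch = next(chars, "\\")
        result.append(ch)
    return "".join(result)


def quote(text: str) -> str:
    """Wrap the input in double quotes."""
    return '"' + text + '"'


def unquote(text: str) -> str:
    """Remove surrounding double quotes if present, otherwise return the input."""
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return text[1:-1]
    return text


sample = 'say "hi" \\ bye'
print(quote(escape(sample)))
print(unescape(unquote(quote(escape(sample)))) == sample)  # True
```

Escaping before quoting matters: the result can then be passed as a quoted parameter without the embedded quotes ending the parameter early.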
256
257
258        STRING COMPARISON
259
260               compare(a,b)             return -1 if a<b, 0 if a=b, 1 if a>b
261               equals(a,b)              return 1 if a=b, 0 otherwise
262               contains(a,b)            if a contains b, this returns the position of
263                                        b inside a (1..N) and 0 otherwise.
264                                        Always returns 0 if b is empty.
265               partof(a,b)              if a is part of b, this returns the position of
266                                        a inside b (1..N) and 0 otherwise.
267
268               The above functions are binary operators (see below).
269               For each of them a case-insensitive alternative exists (icompare, iequals, ...).
270
271        NUMERIC COMPARISON
272
273                All functions here operate with floating-point numbers.
274
275                isBelow(a,b)            return 1 if a<b, 0 otherwise
276                isAbove(a,b)            return 1 if a>b, 0 otherwise
277                isEqual(a,b)            return 1 if a=b, 0 otherwise
278
279                The above functions are binary operators (see below).
280
281                inRange(low,high)
282
283                        for values of each input stream:
284                        return 1 if low <= value <= high, 0 otherwise
285
286        CALCULATOR
287
288                plus                    add arguments
289                minus                   subtract arguments
290                mult                    multiply arguments
291                div                     divide arguments
292                per_cent                divide arguments * 100 (not rounded; use "fper_cent|round(0)")
293                rest                    divide arguments, take the remainder
294
295                Calculation is performed with integer numbers.
296                For most of these functions a floating-point variant exists:
297                    * fplus
298                    * fminus
299                    * fmult
300                    * fdiv
301                    * fper_cent
302
303                round(digits)
304
305                        rounds a floating-point input to the given number of digits
306                        after the decimal point.
307                        Specify zero to round to an integer number.
308                        Specify negative digits to round to multiples of 10, 100, 1000, ...
309
310
311                To avoid 'division by zero'-errors, the operators 'div', 'per_cent' and 'rest'
312                return 0 if the second argument is zero.
313
314                The above functions work as binary operators (see below).
315
316        BOOLEAN OPERATORS
317
318                All input streams are converted to boolean
319                values (i.e. 0 or 1) as follows:
320
321                    "0"         -> 0
322                    any number  -> 1
323                    any text    -> 0 (even empty text!)
324
325                Operators:
326
327                    Not     invert values of all input streams (0<->1)
328                    And     return 1 if all input streams are 1, 0 otherwise
329                    Or      return 1 if at least one input stream is 1, 0 otherwise
330
331                    Use "|or|not" or "|and|not" to execute NOR or NAND.
332
333        BINARY OPERATORS
334
335               Several operators work as so-called 'binary operators'.
336               These operators may be used in various ways, which are
337               shown using the operator 'plus':
338
339                     ACI                OUTPUT                  STREAMS
340                     plus(a,b)          a+b                     input:0 output:1
341                     a;b|plus           a+b                     input:2 output:1
342                     a;b;c;d|plus       a+b;c+d                 input:4 output:2
343                     a;b;c|plus(x)      a+x;b+x;c+x             input:3 output:3
344
345               That means, if the binary operator
346
347                    - has no arguments, it expects an even number of input streams. The operator is
348                      applied to the first 2 streams, then to the next 2 streams and so on.
349                      The number of output streams is half the number of input streams.
350                    - has 1 argument, it accepts one to many input streams. The operator
351                      is applied to each input stream together with the argument.
352                      For each input stream one output stream is generated.
353                    - has 2 arguments, it is applied to these. The arguments are interpreted as
354                      ACI commands and are applied for each input stream. The results of
355                      the commands are passed as arguments to the binary operator. For each input
356                      stream one output stream is generated.
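
The first two calling conventions can be sketched in Python like this. The function name 'binary' and the use of Python integers are assumptions made for illustration; the two-argument form (arguments evaluated as ACI commands) is left out on purpose, because modelling it would need a full ACI evaluator.

```python
import operator
from typing import Callable, List


def binary(op: Callable[[int, int], int], streams: List[str], *args: str) -> List[str]:
    """Apply a binary operator to input streams, ACI style.

    no argument : streams are consumed in pairs, one output per pair
    one argument: each stream is combined with the argument, one output per stream
    """
    if len(args) == 0:
        if len(streams) % 2 != 0:
            raise ValueError("an even number of input streams is expected")
        pairs = zip(streams[0::2], streams[1::2])
        return [str(op(int(a), int(b))) for a, b in pairs]
    if len(args) == 1:
        return [str(op(int(s), int(args[0]))) for s in streams]
    raise NotImplementedError("two-argument form is not modelled in this sketch")


# a;b;c;d|plus    -> a+b;c+d
print(binary(operator.add, ["1", "2", "3", "4"]))  # ['3', '7']
# a;b;c|plus(10)  -> a+10;b+10;c+10
print(binary(operator.add, ["1", "2", "3"], "10"))  # ['11', '12', '13']
```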
357
358        CONDITIONAL
359
360                select(a,b,c,...)       each input stream is converted into a number
361                                        (non-numeric text converts to zero). That number is
362                                        used to select one of the given arguments:
363                                             0 selects 'a',
364                                             1 selects 'b',
365                                             2 selects 'c' and so on.
366                                        The selected argument is interpreted as ACI command
367                                        and is applied to an empty input stream.
368
369        DEBUGGING
370
371                trace(onoff)            toggle tracing of ACI actions to standard output.
372                                        Parameter: 0 or 1 (switch off or on)
373
374                                        All streams are copied (like 'dd').
375
376                                        Example: "cmd1 | cmd2 | trace(1) | tracedCmd1 | tracedCmd2 | trace(0) | untracedCmd "
377
378                                        To see the output from trace, either
379                                          * start arb from a terminal or
380                                          * use LINK{console.hlp}
381
382
383        DATABASE AND SEQUENCE
384
385                readdb(field_name)      the contents of the field 'field_name'
386
387                sequence                the sequence in the current alignment.
388
389                                        Note: older ARB versions returned 'no sequence'
390                                        if the current alignment contained no sequence.
391                                        Now it returns an empty string.
392
393                                        For genes it returns only the corresponding part
394                                        of the sequence. If the field complement = 1 then the
395                                        result is the reverse-complement.
396
397                sequence_type           the sequence type of the selected alignment ('rna','dna',..)
398                ali_name                the name of the selected alignment (e.g. 'ali_16s')
399
400                Note: The commands above only work at the beginning of the ACI expression.
401
402                checksum(options)       calculates a CRC checksum
403                                        options:
404                                        "exclude=chrs"    remove 'chrs' before calculation
405                                        "toupper"         make everything uppercase first
406
407                gcgchecksum             a gcg compatible checksum
408
409                format_sequence(options)
410
411                        takes a long string ( sequence ) and breaks it into several lines
412
413                        option       (default)  description
414                        =============================================================
415                        width=#      (50)       sequence line width
416                        firsttab=#   (10)       first line left indent
417                        tab=#        (10)       left indent (not first line)
418                        numleft      (NO)       numbers on the left side
419                        numright=#   (NO)       numbers on the right side (#=width)
420                        gap=#        (10)       insert a gap every # seq. characters.
421
422                    (see also 'format' above)
423
424                extract_sequence("chars",rel_len)
425
426                        like extract_words, but does not sort the words; rel_len is the minimum
427                        percentage of characters of a word that must match a character in 'chars'
428                        before the word is taken. All words will be separated by white space.
429
430                taxonomy([treename,] depth)
431
432                        Returns the taxonomy of the current species or group as defined by a tree.
433
434                        If 'treename' is specified, its used as tree, otherwise the 'default tree'
435                        is used (which in most cases is the tree displayed in the ARB_NT main window).
436
437                        'depth' specifies how many "levels" of the taxonomy are used.
438
439        FILTERING
440
441                There are several functions to filter sequential data:
442
443                      - filter
444                      - diff
445                      - gc
446
447                All these functions use the following COMMON OPTIONS to define
448                what is used as filter sequence:
449
450                    - species=name
451
452                      Use species 'name' as filter.
453
454                    - SAI=name
455
456                      Use SAI 'name' as filter.
457
458                    - first=1
459
460                      Use 1st input stream as filter for all other input streams.
461
462                    - pairwise=1
463
464                      Use 1st input stream as filter for 2nd stream,
465                      3rd stream as filter for 4th stream, and so on.
466
467                    - align=ali_name
468
469                      Use alignment 'ali_name' instead of current default
470                      alignment (only meaningful together with 'species' or 'SAI').
471
472                    Note: Only one of the parameters 'species', 'SAI', 'first' or 'pairwise' may be used.
473
474                diff(options)
475
476                        Calculates the difference between the filter (see common options above) and the input stream(s) and
477                        writes the result to output stream(s).
478
479                        Additional options:
480
481                        - equal=x
482
483                          Character written to output if filter and stream are equal at
484                          a position (defaults to '.'). To copy the stream contents for
485                          equal columns, specify 'equal=' (directly followed by ',' or ')')
486
487                        - differ=y
488
489                          Character written to output if filter and stream don't match at one column position.
490                          Default is to copy the character from the stream.
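
A column-wise model of diff() in Python, as a sketch of the description above. The function name and parameter order are made up for illustration, and stopping at the shorter of the two sequences is an assumption.

```python
from typing import Optional


def diff(filter_seq: str, stream: str, equal: str = ".",
         differ: Optional[str] = None) -> str:
    """Compare 'stream' with 'filter_seq' column by column.

    equal : written where both agree; an empty string copies the stream character
    differ: written where they differ; None (default) copies the stream character
    """
    result = []
    # Assumption: comparison stops at the end of the shorter sequence.
    for f, s in zip(filter_seq, stream):
        if f == s:
            result.append(equal if equal else s)
        else:
            result.append(differ if differ is not None else s)
    return "".join(result)


print(diff("ACGUACGU", "ACGAACCU"))              # ...A..C.
print(diff("ACGUACGU", "ACGAACCU", differ="#"))  # ...#..#.
print(diff("ACGUACGU", "ACGAACCU", equal=""))    # ACGAACCU
```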
491
492                filter(options)
493
494                        Filters only specified columns out of the input stream(s). You need to
495                        specify either
496
497                        - exclude=xyz
498
499                          to use all columns, where the filter (see common options above) has none
500                          of the characters 'xyz'
501
502                        or
503
504                        - include=xyz
505
506                          to use only columns, where the filter has one of the characters 'xyz'
507
508                        All used columns are concatenated and written to the output stream(s).
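
The column selection done by filter() can be sketched in Python as follows. The function name 'filter_columns' is hypothetical, and requiring exactly one of 'include' and 'exclude' follows the "either ... or" wording above.

```python
from typing import Optional


def filter_columns(filter_seq: str, stream: str,
                   include: Optional[str] = None,
                   exclude: Optional[str] = None) -> str:
    """Keep only the columns of 'stream' that the filter sequence selects.

    include: keep columns where the filter has one of these characters
    exclude: keep columns where the filter has none of these characters
    """
    if (include is None) == (exclude is None):
        raise ValueError("specify either include or exclude")

    def selected(filter_char: str) -> bool:
        if include is not None:
            return filter_char in include
        return filter_char not in exclude

    # All selected columns are concatenated into one output string.
    return "".join(s for f, s in zip(filter_seq, stream) if selected(f))


print(filter_columns("xx--xx..", "ACGUACGU", exclude="-."))  # ACAC
print(filter_columns("xx--xx..", "ACGUACGU", include="-"))   # GU
```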
509
510
511                change(options)
512
513                        Randomly modifies the content of columns selected
514                        by the filter (see common options above).
515                        Only columns containing letters will be modified.
516
517                        The options 'include=xyz' and 'exclude=xyz' work like
518                        with 'filter()', but here they select the columns to modify - all other
519                        columns get copied unmodified.
520
521                        How the selected columns are modified, is specified by the following
522                        parameters:
523
524                                - change=percent
525
526                                  percentage of changed columns (default: silently change nothing, to make
527                                  it more difficult for you to ignore this helpfile)
528
529                                - to=xy
530
531                                  randomly change to one of the characters 'xy'.
532
533                                  Hints:
534
535                                        - Use 'xyy' to produce 33% 'x' and 66% 'y'
536                                        - Use 'xxxxxxxxxy' to produce 90% 'x' and 10% 'y'
537                                        - Use 'x' to replace all matching columns by 'x'
538
539                        I think the intention for this (long undocumented) command is to easily generate
540                        artificial sequences with different GC-content, in order to test treeing-software.
541
542        SPECIALS
543
544                exec(command[,param1,param2,...])
545
546                    Execute external (unix) command.
547
548                    Given params will be single-quoted and passed to the command.
549
550                    All input streams will be concatenated and piped into the command.
551
552                    When the command itself is a pipe, put it in parenthesis (e.g. "(sort|uniq)").
553                    Note: This won't work together with params.
554
555                    The result is the output of the command.
556
557                    WARNING!!!
558
559                        You had better not use this command for NDS,
560                        because any slow command will disable all editing, so you can
561                        never remove this command from the NDS. Even arb_panic will not
562                        easily help you.
563
564                command(action)
565
566                        applies 'action' to all input streams using
567
568                                 - ACI,
569                                 - SRT (if starts with ':') (see LINK{srt.hlp})
570                                 - or as REG (if starts with '/') (see LINK{reg.hlp}).
571
572                        If you nest calls (i.e. if 'action' contains further calls to 'command') you have to apply
573                        escaping multiple times (e.g. inside an export filter - which is in fact an
574                        SRT expression - you'll have to use double escapes).
575
576                eval(exprEvalToAction)
577
578                        the 'exprEvalToAction' is evaluated (using an empty string as input)
579                        and the result is interpreted as action and gets applied to all
580                        input streams (as in 'command' above).
581
582                        Example: Say you have two numeric positions stored in database fields
583                                 'pos1' and 'pos2' for each species. Then the following command
584                                 extracts the sequence data from pos1 to pos2:
585
586                                 'sequence|eval(" \"mid(\";readdb(pos1);\";\";readdb(pos2);\")\" ")'
587
588                        How the example works:
589
590                            The argument is the escaped version of the
591                            command '"mid(" ; readdb(pos1) ; ";" ; readdb(pos2) ; ")"'.
592
593                            If pos1 contains '10' and pos2 contains '20' that command will
594                            evaluate to 'mid(10;20)'.
595
596                            For these positions the executed ACI behaves like 'sequence|mid(10;20)'.
597
598                define(name,escapedCommand)
599
600                        defines an ACI-macro 'name'. 'escapedCommand' contains an escaped
601                        ACI command sequence. This command sequence can be executed with
602                        do(name).
603
604                do(name)
605
606                        applies a previously defined ACI-macro to all input streams (see 'define').
607
608                        'define(a,action)' followed by 'do(a)' works similarly to 'command(action)'.
609
610                        See embl.eft for an example using 'define' and 'do'.
611
612                origin_organism(action)
613                origin_gene(action)
614
615                        like command() but readdb() etc. reads all data from the
616                        origin organism/gene of a gene-species (not from the gene-species itself).
617
618                        This function applies only to gene-species!
619
620SECTION         Future features
621
622                statistic
623
624                        creates a character statistic of the sequence
625                        (not implemented yet)
626
627EXAMPLES        sequence|format_sequence(firsttab=0;tab=10)|"SEQUENCE_";dd
628
629                                fetches the default sequence, formats it,
630                                and prepends 'SEQUENCE_'.
631
632                sequence|remove(".-")|format_sequence
633
634                                get the default sequence, remove all '.-' and
635                                format it
636
637                sequence|remove(".-")|len
638
639                                the number of non '.-' symbols (sequence length)
640
641                "[";taxonomy(tree_other,3);" -> ";taxonomy(3);"]"
642
643                                shows for each species how their taxonomy
644                                changed between "tree_other" and current tree
645
646                equals(readdb(tmp),readdb(acc))|select(echo("tmp and acc differ"),)
647
648                                returns 'tmp and acc differ' if the content of
649                                the database fields 'tmp' and 'acc' differs. empty result
650                                otherwise.
651
652                readdb(full_name)|icontains(bacillus)|compare(0)|select(echo(..),readdb(full_name))
653
654                                returns the content of the 'full_name' database entry if it contains
655                                the substring 'bacillus'. Otherwise returns '..'
656
657
658BUGS            The output of taxonomy() is not always instantly refreshed.