#       main topics:
UP      arb.hlp
UP      glossary.hlp

#       sub topics:
SUB     exec_bug.hlp

# format described in ../help.readme



TITLE           ARB Command Interpreter (ACI)

OCCURRENCE      ARB wide

DESCRIPTION     ACI is a simple command interpreter whose central concept is streams of data.

                Many ACI commands take parameters, which are specified in
                parentheses after the command.

                All ACI commands
                 * take data from (one or multiple) input streams,
                 * modify that data and
                 * write the result to (one or multiple) output streams.

                     e.g. the command 'count("a")' counts every 'a' in each input stream and
                     generates one output stream (containing the char count) for every input stream.

                The initial input is always a single stream,
                often the value of a database field (e.g. when ACI is used in LINK{props_nds.hlp}).

                The number of output streams depends on the command used:
                 * most commands produce one output stream for each input stream (like the count example above)
                 * some commands combine two input streams into one output stream (e.g. see binary operators below)
                 * some commands ignore all input streams and create one output stream (e.g. 'readdb(fieldname)')
                 * Note: special stream-related commands are documented in section 'STREAM HANDLING'

                Multiple commands can be separated by two operator symbols: ';' and '|'.
                  * ';' binds more tightly than '|'
                  * commands separated by ';' form a command-list and operate independently of each other:
                    - all(!) commands use all(!) input streams
                    - each command generates its own output streams
                  * the '|' operator acts as processing sequence point, i.e.
                    all output streams generated by the command-list on the left side of the '|' will be passed
                    as input streams to the command-list on the right side of the '|'.

                Finally (at the end of the overall ACI expression) all generated output streams get concatenated.

                Typical uses are to
                  * show text at the tips of the tree (LINK{props_nds.hlp})
                  * write information into database fields (LINK{mod_field_list.hlp})

                Instead of using ACI commands (as described in this document) you may always use any of the other
                integrated data processing languages. Simply prefix the command

                  * with a ':' to use LINK{srt.hlp}
                  * with a '/' to use LINK{reg.hlp}

                Both are also available inside ACI via the commands 'srt' and 'command', see below.

SECTION Examples

# PREFORMATTED 1
                        count("A");count("AG")

                                creates two streams:

                                        1. how many A's
                                        2. and how many A's and G's

# PREFORMATTED 1
                        count("A");count("G")|per_cent

                                per_cent is a command that divides two numbers
                                (number of 'A's / number of 'G's) and returns the result
                                as a percentage.

SECTION Simple example to illustrate the data flow

# PREFORMATTED 1
        count("A");count("G")|"a/g = "; per_cent

        input                                                         concatenate output
        "AGG" ----> count("A") -->| -----> "a/g = " --> | --> "a/g = " ---> 'a/g = 50'
              \                   | \ /                 |               /
               \                  |  \                  |              /
                \                 | / \                 |             /
                 -> count("G") -->| -----> per_cent --> | --> "50" ---


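        The data flow shown above can be imitated with a small Python model. This is
        only an illustrative sketch, not ARB's implementation: all function names are
        made up for this example, and per_cent is assumed to compute an integer
        percentage that is 0 when the divisor is 0.

```python
# Toy model of the ACI data flow (illustration only, not ARB code).
# A "command" maps a list of input streams (strings) to a list of output streams.

def count(chars):
    """One output stream per input stream: how many characters of `chars` occur."""
    return lambda streams: [str(sum(c in chars for c in s)) for s in streams]

def echo(text):
    """Ignores all input streams and creates one output stream."""
    return lambda streams: [text]

def per_cent(streams):
    """Combines pairs of streams into a/b as integer percentage (assumed: 0 if b == 0)."""
    nums = [int(s) for s in streams]
    return [str(a * 100 // b if b else 0) for a, b in zip(nums[0::2], nums[1::2])]

def command_list(*commands):
    """';' : every command sees all input streams; outputs are collected in order."""
    return lambda streams: [out for cmd in commands for out in cmd(streams)]

def pipe(*command_lists):
    """'|' : outputs of the left command-list become inputs of the right one."""
    def run(streams):
        for cl in command_lists:
            streams = cl(streams)
        return streams
    return run

def evaluate(expression, initial_input):
    """Start with a single input stream, concatenate all final output streams."""
    return "".join(expression([initial_input]))

# count("A");count("G")|"a/g = "; per_cent
expr = pipe(command_list(count("A"), count("G")),
            command_list(echo("a/g = "), per_cent))
print(evaluate(expr, "AGG"))  # a/g = 50
```

        The model keeps ';' and '|' as two separate combinators, which mirrors the rule
        that ';' groups commands into a command-list before '|' passes streams along.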
SECTION PARAMETERS

        Several commands expect or accept additional parameters in
        parentheses (e.g. 'remove(aA)').

        Multiple parameters have to be separated by ',' or ';'.

        There are two distinct ways to specify such a parameter:
        - unquoted

          Unquoted parameters are taken as specified, with the following exceptions:
           - any character in ',;"|\)' needs to be escaped by prefixing one '\'.
           - spaces will get removed if not prefixed by '\'.

        - quoted

          Quoted parameters begin and end with a '"'. You can use any character,
          but you need to escape '\' and '"' by prefixing a '\'.

          Examples:

          remove("\"")                will remove all double quotes from input.
          remove("\\")                will remove all backslashes from input.

        [@@@ behavior currently not strictly implemented]

SECTION COMMANDLIST

        Unless explicitly stated otherwise, every command
        creates one output stream for each input stream.

        STREAM HANDLING

                echo(x1;x2;x3...)       creates one output stream from each specified parameter
                                        (parameters are separated by ';').

                "text"                  same as 'echo("text")'

                dd                      copies all input streams to output streams

                cut(N1,N2,N3)           copies only the specified input streams (N1, N2, ...)

                drop(N1,N2)             copies all but the specified input streams (N1, N2, ...)

                dropempty               drops all empty input streams

                dropzero                drops all non-numeric or zero input streams

                swap(N1,N2)             swaps two input streams
                                        (w/o parameters: swaps last two streams)

                toback(X)               moves the Xth input stream
                                        to the end of output streams

                tofront(X)              moves the Xth input stream
                                        to the start of output streams

                merge([sep])            merges all input streams into one output stream.
                                        If 'sep' is specified, it's inserted between them.
                                        If no input streams are given, it returns 1 empty
                                        output stream.

                split([sep[,mode]])     splits all input streams at separator string 'sep'
                                        (default: split at linefeed).

                                        Modes:

                                        0               remove found separators (default)
                                        1               split before separator
                                        2               split after separator

                colsplit([width])       splits each input stream into multiple streams of the
                                        specified width (or shorter for last output stream).
                                        The default width is 1.

                streams                 returns the number of input streams

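                The three split modes, colsplit and merge described above can be
                sketched in Python as follows. This is a hedged illustration, not
                ARB code: the function names simply mirror the ACI commands, and
                the treatment of empty pieces and empty input is an assumption.

```python
# Sketch of merge/split/colsplit on lists of streams (illustration only).

def merge(streams, sep=""):
    """All input streams joined into one; no input yields one empty stream."""
    return [sep.join(streams)]

def split(streams, sep="\n", mode=0):
    """mode 0: drop separators, 1: split before them, 2: split after them."""
    out = []
    for s in streams:
        pieces = s.split(sep)          # note: Python rejects an empty 'sep'
        last = len(pieces) - 1
        for i, piece in enumerate(pieces):
            if mode == 1 and i > 0:
                piece = sep + piece    # separator starts the following piece
            elif mode == 2 and i < last:
                piece = piece + sep    # separator ends the preceding piece
            out.append(piece)
    return out

def colsplit(streams, width=1):
    """Cut each stream into chunks of 'width' characters (last may be shorter)."""
    return [s[i:i + width] for s in streams for i in range(0, len(s), width)]

print(split(["a,b,c"], ",", 1))     # ['a', ',b', ',c']
print(colsplit(["abcdefg"], 3))     # ['abc', 'def', 'g']
```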
        STRING

                head(n)                 the first n characters
                left(n)                 the first n characters

                tail(n)                 the last n characters
                right(n)                the last n characters

                                        the above functions return an empty string for n<=0

                len                     the length of the input

                len("chr")              the length of the input excluding the
                                        characters in 'chr'

                mid(x,y)                the substring from position x to y
                mid0(x,y)               like mid(), but positions start at 0

                    Allowed positions are
                    - [1..N] for mid()
                    - [0..N-1] for mid0()

                    A position below that range is relative to the end of the string,
                    i.e. mid(-2,0) and mid0(-3,-1) are equivalent to tail(3)

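                The position rules for mid() and mid0() can be written down as a
                short Python sketch. It only covers what is stated above; how ARB
                treats positions that are still out of range after the adjustment
                is not modelled here.

```python
# Sketch of the mid/mid0 position rules (illustration only, not ARB code).

def mid0(s, x, y):
    """Zero-based, inclusive substring; negative positions count from the end."""
    n = len(s)
    if x < 0:
        x += n
    if y < 0:
        y += n
    return s[x:y + 1]

def mid(s, x, y):
    """One-based, inclusive substring; positions below 1 count from the end."""
    n = len(s)
    if x < 1:
        x += n
    if y < 1:
        y += n
    return s[x - 1:y]

def tail(s, n):
    """The last n characters; empty string for n <= 0."""
    return s[-n:] if n > 0 else ""

print(mid("abcdef", 2, 4))      # bcd
print(mid("abcdef", -2, 0))     # def
print(mid0("abcdef", -3, -1))   # def
```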
                crop("str")             removes characters of 'str' from
                                        both ends of the input

                remove("str")           removes all characters of 'str'
                                        e.g. remove(" ") removes all blanks

                keep("str")             keep is the opposite of remove:
                                        removes all chars that are not a member of 'str'

                isEmpty                 returns '1' for each empty input stream, '0' for others

                srt("orig=dest",...)    replace command, invokes SRT
                                        (see LINK{srt.hlp})

# PREFORMATTED 1
                translate("old","new"[,"other"])

                        replaces each input character that occurs in the
                        first argument ("old") with the corresponding character of the
                        second argument ("new").

                        An optional third argument (one character only) means:
                        replace all other characters with the third argument.

                        Example:

                                Input:                        "--AabBCcxXy--"
                                translate("abc-","xyz-")      "--AxyBCzxXy--"
                                translate("abc-","xyz-",".")  "--.xy..z...--"

                        This can be used to replace illegal characters in sequence data
                        (see predefined expressions in 'Modify fields of listed species').


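                The behaviour of translate can be reproduced with a few lines of
                Python. This is a sketch of the rule as documented above, not ARB's
                code; it assumes "old" and "new" have the same length (extra
                characters in the longer one are ignored here).

```python
# Sketch of translate("old","new"[,"other"]) (illustration only).

def translate(text, old, new, other=None):
    """Map characters of 'old' to 'new'; optionally replace all others by 'other'."""
    mapping = dict(zip(old, new))
    if other is None:
        return "".join(mapping.get(c, c) for c in text)   # keep unmapped chars
    return "".join(mapping.get(c, other) for c in text)   # replace unmapped chars

print(translate("--AabBCcxXy--", "abc-", "xyz-"))        # --AxyBCzxXy--
print(translate("--AabBCcxXy--", "abc-", "xyz-", "."))   # --.xy..z...--
```

                Note that '-' is mapped to itself in the example, which is what
                protects the gaps from being replaced by the third argument.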
                tab(n)                  append n-len(input) spaces

                pretab(n)               prepend n-len(input) spaces

                upper                   converts string to upper case
                lower                   converts string to lower case
                caps                    capitalizes string

# PREFORMATTED 1
                format(options)

                    takes a long string and breaks it into several lines

                        option       (default)     description
                        ==========================================================
                        width=#      (50)          line width
                        firsttab=#   (10)          first line left indent
                        tab=#        (10)          left indent (not first line)
                        "nl=chrs"    (" ")         list of characters that specify
                                                   a possible point for a line break;
                                                   the line break characters get removed!
                        "forcenl=chrs" ("\n")      Force a newline at these characters.

                    (see also format_sequence below)

# PREFORMATTED 1
                extract_words("chars",val)

                    Searches for all words (separated by ',' ';' ':' ' ' or 'tab') that
                    contain more than 'val' characters from 'chars', sorts them
                    alphabetically and writes them to the output, separated by ' '.

        ESCAPING AND QUOTING

                escape         escapes all occurrences of '\' and '"' by prefixing a '\'
                quote          quotes the input in '"'

                unescape       inverse of escape
                unquote        removes quotes (if present), otherwise returns the input unchanged


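                The four commands above are simple string transformations. The
                following Python sketch shows one plausible reading of them; it is
                not ARB code, and it assumes that unescape simply drops each
                escaping backslash without interpreting sequences such as '\n'.

```python
# Sketch of escape/unescape/quote/unquote (illustration only).

def escape(s):
    """Prefix every backslash and double quote with a backslash."""
    return s.replace("\\", "\\\\").replace('"', '\\"')

def unescape(s):
    """Inverse of escape: drop a backslash and keep the character after it."""
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            i += 1
        out.append(s[i])
        i += 1
    return "".join(out)

def quote(s):
    """Wrap the input in double quotes."""
    return '"' + s + '"'

def unquote(s):
    """Remove surrounding double quotes if present, else return the input."""
    if len(s) >= 2 and s[0] == '"' and s[-1] == '"':
        return s[1:-1]
    return s

original = 'say "hi" \\ bye'
print(quote(escape(original)))
print(unescape(unquote(quote(escape(original)))) == original)   # True
```

                Escaping before quoting is what makes a value safe to pass as a
                quoted parameter, e.g. to 'command', 'eval' or 'define'.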
        STRING COMPARISON

                compare(a,b)            returns -1 if a<b, 0 if a=b, 1 if a>b
                equals(a,b)             returns 1 if a=b, 0 otherwise
                contains(a,b)           if a contains b, this returns the position of
                                        b inside a (1..N) and 0 otherwise.
                                        Always returns 0 if b is empty.
                partof(a,b)             if a is part of b, this returns the position of
                                        a inside b (1..N) and 0 otherwise.

                For each of these functions a case-insensitive alternative
                exists (icompare, iequals, ...).

                Note: all these functions may be used as binary operators
                (see section 'BINARY OPERATORS' below for concept).


        NUMERIC COMPARISON

                All functions here operate with floating-point numbers.

                isBelow(a,b)            returns 1 if a<b, 0 otherwise
                isAbove(a,b)            returns 1 if a>b, 0 otherwise
                isEqual(a,b)            returns 1 if a=b, 0 otherwise

                Note: all functions above may be used as binary operators
                (see section 'BINARY OPERATORS' below for concept).

# PREFORMATTED 1
                inRange(low,high)

                    For the values of all input streams, this returns
                    * 1 if low <= value <= high,
                    * 0 otherwise.

        CALCULATOR

                plus                    add arguments
                minus                   subtract arguments
                mult                    multiply arguments
                div                     divide arguments
                per_cent                divide arguments * 100
                                        (not rounded; use "fper_cent|round(0)")
                rest                    divide arguments, take remainder

                The above functions perform calculation with integer numbers.

                For most of these functions there also exists a floating-point variant:
                    * fplus
                    * fminus
                    * fmult
                    * fdiv
                    * fper_cent

                To avoid 'division by zero'-errors, the operators 'div', 'per_cent' and 'rest'
                (and their f-variants) return 0, if the second argument is zero.

                Note: all functions above may be used as binary operators
                (see section 'BINARY OPERATORS' below for concept).

# PREFORMATTED 1
                round(digits)

                        rounds a floating-point input to the specified number of digits
                        after the decimal point.
                        Specify zero to round to an integer number.
                        Specify negative digits to round to multiples of 10, 100, 1000, ...


        BOOLEAN OPERATORS

                All input streams are converted to boolean
                values (i.e. 0 or 1) as follows:

                    "0"               -> 0
                    any other number  -> 1
                    any text          -> 0 (even empty text!)

                Operators:

                    Not     inverts the values of all input streams (0<->1)
                    And     returns 1 if all input streams are 1, 0 otherwise
                    Or      returns 1 if at least one input stream is 1, 0 otherwise

                    Use "|or|not" or "|and|not" to execute NOR or NAND.

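                The conversion table and the three operators can be modelled as
                below. This is a sketch under assumptions, not ARB code: what
                exactly counts as a "number" is not specified above, so the regular
                expression used here (optional sign, digits, optional fraction) is
                a guess, and numeric zero values such as "0.0" are treated like "0".

```python
# Sketch of the boolean conversion and Not/And/Or (illustration only).
import re

NUMBER = re.compile(r"[+-]?\d+(\.\d+)?")   # assumed notion of "a number"

def to_bool(stream):
    """Numbers other than zero give 1; zero and any text (even empty) give 0."""
    if NUMBER.fullmatch(stream) is None:
        return 0
    return 1 if float(stream) != 0 else 0

def aci_not(streams):
    """One output per input stream, with the boolean value inverted."""
    return [str(1 - to_bool(s)) for s in streams]

def aci_and(streams):
    """A single output stream: 1 if all inputs are true."""
    return [str(int(all(to_bool(s) for s in streams)))]

def aci_or(streams):
    """A single output stream: 1 if at least one input is true."""
    return [str(int(any(to_bool(s) for s in streams)))]

print(aci_not(["0", "5", "text", ""]))   # ['1', '0', '1', '1']
print(aci_not(aci_or(["0", "text"])))    # NOR via "|or|not": ['1']
```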
        BINARY OPERATORS

               Several operators work as so-called 'binary operators'.
               These operators may be used in various ways, which are
               shown using the operator 'plus':

                     ACI                OUTPUT                  STREAMS
                     plus(a,b)          a+b                     input:0 output:1
                     a;b|plus           a+b                     input:2 output:1
                     a;b;c;d|plus       a+b;c+d                 input:4 output:2
                     a;b;c|plus(x)      a+x;b+x;c+x             input:3 output:3

               That means, if the binary operator

                    - has no arguments, it expects an even number of input streams. The operator is
                      applied to the first 2 streams, then to the next 2 streams and so on.
                      The number of output streams is half the number of input streams.
                    - has 1 argument, it accepts one to many input streams. The operator
                      is applied to each input stream together with the argument.
                      For each input stream one output stream is generated.
                    - has 2 arguments, it is applied to these. The arguments are interpreted as
                      ACI commands and are applied for each input stream. The results of
                      the commands are passed as arguments to the binary operator. For each input
                      stream one output stream is generated.

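               The three calling conventions listed above can be expressed as one
               Python helper. This is an illustrative sketch, not ARB code: the
               helper name 'binary_operator' is made up, and in the two-argument
               form the ACI sub-commands are represented by plain Python callables
               that receive one input stream each.

```python
# Sketch of the three ways a binary operator can be used (illustration only).

def binary_operator(op):
    """Wrap a two-value function so it follows the stream rules described above."""
    def apply(streams, *args):
        if len(args) == 0:
            # pairwise: (1st,2nd), (3rd,4th), ... ; halves the number of streams
            if len(streams) % 2:
                raise ValueError("expected an even number of input streams")
            return [op(a, b) for a, b in zip(streams[0::2], streams[1::2])]
        if len(args) == 1:
            # each input stream is combined with the argument
            return [op(s, args[0]) for s in streams]
        if len(args) == 2:
            # both arguments are commands, evaluated once per input stream
            left, right = args
            return [op(left(s), right(s)) for s in streams]
        raise ValueError("a binary operator takes at most 2 arguments")
    return apply

plus = binary_operator(lambda a, b: str(int(a) + int(b)))

print(plus(["1", "2", "3", "4"]))      # a;b;c;d|plus   -> ['3', '7']
print(plus(["1", "2", "3"], "10"))     # a;b;c|plus(x)  -> ['11', '12', '13']
print(plus(["ignored"], lambda s: "2", lambda s: "5"))   # plus(a,b) -> ['7']
```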
        CONDITIONAL

                select(a,b,c,...)       each input stream is converted into a number
                                        (non-numeric text converts to zero). That number is
                                        used to select one of the given arguments:
                                             0 selects 'a',
                                             1 selects 'b',
                                             2 selects 'c' and so on.
                                        The selected argument is interpreted as ACI command
                                        and is applied to an empty input stream.

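                A minimal Python sketch of select follows. It is not ARB code: the
                arguments are modelled as Python callables that receive an empty
                input stream, only integer input is treated as numeric, and the
                behaviour for an index without a matching argument is not
                documented above, so the sketch simply raises an error.

```python
# Sketch of select(a,b,c,...) (illustration only).

def to_number(stream):
    """Non-numeric text converts to zero (assumption: only integers count)."""
    try:
        return int(stream)
    except ValueError:
        return 0

def select(streams, *actions):
    """For each input stream, run the action chosen by its numeric value."""
    out = []
    for s in streams:
        index = to_number(s)
        if not 0 <= index < len(actions):
            raise IndexError("no argument for value %r" % s)
        out.append(actions[index](""))   # the action gets an empty input stream
    return out

differ = lambda _empty: "tmp and acc differ"
nothing = lambda _empty: ""

print(select(["0"], differ, nothing))   # ['tmp and acc differ']
print(select(["1"], differ, nothing))   # ['']
```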
        DEBUGGING

                trace(onoff)            toggle tracing of ACI actions to standard output.
                                        Parameter: 0 or 1 (switch off or on)

                                        All streams are copied (like 'dd').

                                        Example:
# PREFORMATTED 1
                                          cmd1 | cmd2 | trace(1) | tracedCmd1 | tracedCmd2 | trace(0) | untracedCmd

                                        To see the output from trace, either
                                          * start arb from a terminal or
                                          * use LINK{console.hlp}


        DATABASE AND SEQUENCE

                readdb(field_name)      the contents of the field 'field_name'

                sequence                the sequence in the current alignment.

                                        Note: older ARB versions returned 'no sequence'
                                        if the current alignment contained no sequence.
                                        Now it returns an empty string.

                                        For genes it returns only the corresponding part
                                        of the sequence. If the field complement = 1 then the
                                        result is the reverse-complement.

                sequence_type           the sequence type of the selected alignment:
                                        'rna', 'dna' or 'ami'
                ali_name                the name of the selected alignment (e.g. 'ali_16s')

                Note: Because they ignore all input streams, the commands above make more
                sense at the beginning of an ACI expression (or subexpression).

                checksum(options)       calculates a CRC checksum
                                        options:
                                        "exclude=chrs"    remove 'chrs' before calculation
                                        "toupper"         make everything uppercase first

                gcgchecksum             a gcg compatible checksum

# PREFORMATTED 1
                format_sequence(options)

                        takes a long string (e.g. sequence) and breaks it into several lines.

                        option       (default)  description
                        =============================================================
                        width=#      (50)       sequence line width
                        firsttab=#   (10)       first line left indent
                        tab=#        (10)       left indent (not first line)
                        numleft      (NO)       numbers on the left side
                        numright=#   (NO)       numbers on the right side (#=width)
                        gap=#        (10)       insert a gap every # seq. characters.

                        (see also 'format' above)

# PREFORMATTED 1
                extract_sequence("chars",rel_len)

                        like extract_words, but does not sort the words; rel_len is the minimum
                        percentage of characters of a word that must match a character in 'chars'
                        before the word is taken. All words will be separated by white space.

# PREFORMATTED 1
                taxonomy([treename,] depth)

                        Returns the taxonomy of the current species or group as defined by a tree.

                        If 'treename' is specified, it is used as tree, otherwise the 'default tree'
                        is used (which in most cases is the tree displayed in the ARB main window).

                        'depth' specifies how many "levels" of the taxonomy are used.

        FILTERING

                There are several functions to filter sequential data:

                      - filter
                      - diff
                      - change

                All these functions use the following COMMON OPTIONS to define
                what is used as filter sequence:

                    - species=name

                      Use species 'name' as filter.

                    - SAI=name

                      Use SAI 'name' as filter.

                    - first=1

                      Use 1st input stream as filter for all other input streams.

                    - pairwise=1

                      Use 1st input stream as filter for 2nd stream,
                      3rd stream as filter for 4th stream, and so on.

                    - align=ali_name

                      Use alignment 'ali_name' instead of current default
                      alignment (only meaningful together with 'species' or 'SAI').

                    Note: Only one of the parameters 'species', 'SAI', 'first' or 'pairwise' may be used.

# PREFORMATTED 1
                diff(options)

                        Calculates the difference between the filter (see common options above) and the input stream(s) and
                        writes the result to output stream(s).

                        Additional options:

                        - equal=x

                          Character written to output if filter and stream are equal at
                          a position (defaults to '.'). To copy the stream contents for
                          equal columns, specify 'equal=' (directly followed by ',' or ')')

                        - differ=y

                          Character written to output if filter and stream don't match at one column position.
                          Default is to copy the character from the stream.

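                        The column-wise comparison done by diff can be sketched in
                        Python as follows. This is an illustration of the options
                        described above, not ARB code; the filter sequence is passed
                        in directly, and columns beyond the shorter of the two
                        strings are simply dropped (an assumption).

```python
# Sketch of diff(equal=x, differ=y) for one filter and one stream (illustration only).

def diff(filter_seq, stream, equal=".", differ=None):
    """Compare column by column and build the output string."""
    out = []
    for f, c in zip(filter_seq, stream):
        if f == c:
            out.append(equal if equal else c)     # empty 'equal=' copies the stream
        else:
            out.append(differ if differ else c)   # default: copy from the stream
    return "".join(out)

print(diff("ACGT-A", "ACCTGA"))                        # ..C.G.
print(diff("ACGT-A", "ACCTGA", equal="", differ="x"))  # ACxTxA
```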
# PREFORMATTED 1
                filter(options)

                        Extracts only the specified columns from the input stream(s). You need to
                        specify either

                        - exclude=xyz

                          to use all columns, where the filter (see common options above) has none
                          of the characters 'xyz'

                        or

                        - include=xyz

                          to use only columns, where the filter has one of the characters 'xyz'

                        All used columns are concatenated and written to the output stream(s).


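                        The include/exclude rule of filter can be modelled in a few
                        lines of Python. This is a hedged sketch, not ARB code: the
                        function is called 'column_filter' only to avoid shadowing
                        Python's built-in filter, and the filter sequence is passed
                        in directly instead of being looked up via the common options.

```python
# Sketch of filter(include=xyz) / filter(exclude=xyz) (illustration only).

def column_filter(filter_seq, stream, include=None, exclude=None):
    """Keep the columns of 'stream' selected by the characters of 'filter_seq'."""
    if (include is None) == (exclude is None):
        raise ValueError("specify either include or exclude")
    if include is not None:
        keep = lambda f: f in include        # filter has one of the characters
    else:
        keep = lambda f: f not in exclude    # filter has none of the characters
    return "".join(c for f, c in zip(filter_seq, stream) if keep(f))

print(column_filter("xx--xx", "ACGTAC", exclude="-"))   # ACAC
print(column_filter("xx--xx", "ACGTAC", include="-"))   # GT
```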
# PREFORMATTED 1
                change(options)

                        Randomly modifies the content of columns selected
                        by the filter (see common options above).
                        Only columns containing letters will be modified.

                        The options 'include=xyz' and 'exclude=xyz' work like
                        with 'filter()', but here they select the columns to modify - all other
                        columns get copied unmodified.

                        How the selected columns are modified, is specified by the following
                        parameters:

                                - change=percent

                                  percentage of changed columns (default: silently change nothing, to make
                                  it more difficult for you to ignore this helpfile)

                                - to=xy

                                  randomly change to one of the characters 'xy'.

                                  Hints:

                                        - Use 'xyy' to produce 33% 'x' and 66% 'y'
                                        - Use 'xxxxxxxxxy' to produce 90% 'x' and 10% 'y'
                                        - Use 'x' to replace all matching columns by 'x'

                        I think the intention for this (long undocumented) command is to easily generate
                        artificial sequences with different GC-content, in order to test treeing-software.

        SPECIALS

# PREFORMATTED 1
                exec(command[,param1,param2,...])

                    Execute external (unix) command.

                    Given params will be single-quoted and passed to the command.

                    All input streams will be concatenated and piped into the command.

                    When the command itself is a pipe, put it in parentheses (e.g. "(sort|uniq)").
                    Note: This won't work together with params.

                    The result is the output of the command.

                    WARNING!!!

                        You had better not use this command for NDS:
                        any slow command will disable all editing, so you may never
                        be able to remove this command from the NDS. Even arb_panic will not
                        easily help you.

# PREFORMATTED 1
                command(action)

                        applies 'action' to all input streams using

                                 - ACI,
                                 - SRT (if starts with ':') (see LINK{srt.hlp})
                                 - or as REG (if starts with '/') (see LINK{reg.hlp}).

                        If you nest calls (i.e. if 'action' contains further calls to 'command') you have to apply
                        escaping multiple times (e.g. inside an export filter - which is in fact an
                        SRT expression - you'll have to use double escapes).

# PREFORMATTED 1
                eval(exprEvalToAction)

                        the 'exprEvalToAction' is evaluated (using an empty string as input)
                        and the result is interpreted as action and gets applied to all
                        input streams (as in 'command' above).

                        Example: Say you have two numeric positions stored in database
                        fields 'pos1' and 'pos2' for each species. Then the following
                        command extracts the sequence data from pos1 to pos2:

# PREFORMATTED 1
                             sequence|eval(" \"mid(\";readdb(pos1);\";\";readdb(pos2);\")\" ")

                        How the example works:

                            The argument is the escaped version of the
                            command
# PREFORMATTED 1
                                "mid(" ; readdb(pos1) ; ";" ; readdb(pos2) ; ")"

                            If pos1 contains '10' and pos2 contains '20' that command will
                            evaluate to 'mid(10;20)'.

                            For these positions the executed ACI behaves like 'sequence|mid(10;20)'.

# PREFORMATTED 1
                define(name,escapedCommand)

                        defines an ACI macro 'name'. 'escapedCommand' contains an escaped
                        ACI command sequence. This command sequence can be executed with
                        do(name).

# PREFORMATTED 1
                do(name)

                        applies a previously defined ACI macro to all input streams (see 'define').

                        'define(a,action)' followed by 'do(a)' works similarly to 'command(action)'.

                        See embl.eft for an example using 'define' and 'do'.

# PREFORMATTED 1
                findspec(action)

                        Each input stream is interpreted as species 'name' (ID) and a species
                        with that 'name' is searched for. Empty streams are silently ignored;
                        if the species cannot be found, the command aborts with an error.

                        Otherwise 'action' is applied (to one empty stream).
                        Instead of the current item, all database commands inside 'action' use the found species.

# PREFORMATTED 1
                findacc(action)

                        like findspec, but searches for 'acc' instead of 'name'.

# PREFORMATTED 1
                findgene(action)

                        like findspec, but searches for genes (starting at organism or
                        at other gene of same organism).

# PREFORMATTED WIDTH DEFAULT
                origin_organism(action)
                origin_gene(action)
# PREFORMATTED RESET

                        like command() but readdb() etc. reads all data from the
                        origin organism/gene of a gene-species (not from the gene-species itself).

                        This function applies only to gene-species!

SECTION         Future features

# PREFORMATTED 1
                statistic

                        creates a character statistic of the sequence
                        (not implemented yet)

EXAMPLES

                Some random ACI expression examples:

# PREFORMATTED 1
                sequence|format_sequence(firsttab=0;tab=10)|"SEQUENCE_";dd

                                fetches the default sequence, formats it,
                                and prepends 'SEQUENCE_'.

# PREFORMATTED 1
                sequence|remove(".-")|format_sequence

                                gets the default sequence, removes all '.' and '-'
                                characters and formats it

# PREFORMATTED 1
                sequence|remove(".-")|len

                                the number of symbols other than '.' and '-' (sequence length)

# PREFORMATTED 1
                "[";taxonomy(tree_other,3);" -> ";taxonomy(3);"]"

                                shows for each species how its taxonomy
                                changed between "tree_other" and the current tree

# PREFORMATTED 1
                equals(readdb(tmp),readdb(acc))|select(echo("tmp and acc differ"),)

                                returns 'tmp and acc differ' if the content of
                                the database fields 'tmp' and 'acc' differs;
                                otherwise the result is empty.

# PREFORMATTED 1
                readdb(full_name)|icontains(bacillus)|compare(0)|select(echo(..),readdb(full_name))

                                returns the content of the 'full_name' database entry if it contains
                                the substring 'bacillus'. Otherwise returns '..'


BUGS            The output of taxonomy() is not always instantly refreshed.