aci.hlp

Last change on this file was 18769, checked in by westram, 4 years ago
move all helpfiles to new source location

#Please insert up references in the next lines (line starts with keyword UP)
UP arb.hlp
UP glossary.hlp

#Please insert subtopic references (line starts with keyword SUB)
SUB exec_bug.hlp

# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}

#*********** Title of helpfile !! and start of real helpfile ******

TITLE ARB Command Interpreter (ACI)

OCCURRENCE NDS
[ export db ]
[ ARB_NT/Species/search/parse_fields ]

DESCRIPTION ACI is a simple command interpreter that uses streams of data as its central concept.

Many ACI commands have parameters, which are specified
in parentheses after the command.

All ACI commands
* take the data from (one or multiple) input streams,
* modify that data and
* write that data to (one or multiple) output streams.

E.g. the command 'count("a")' counts every 'a' in each input stream and
generates one output stream (containing the character count) for every input stream.

The first input stream is always a single stream,
often the value of a database field (e.g. when ACI is used in LINK{props_nds.hlp}).

The number of output streams depends on the command used:
* most commands produce one output stream for each input stream (like the count example above)
* some commands combine two input streams into one output stream (e.g. see binary operators below)
* some commands ignore all input streams and create one output stream (e.g. 'readdb(fieldname)')
* Note: special stream-related commands are documented in section 'STREAM HANDLING'

Multiple commands can be separated by two operator symbols: ';' and '|'.
* ';' binds more strongly than '|'
* commands separated by ';' form a command list and operate independently of each other:
- all(!) commands use all(!) input streams
- each command generates its own output streams
* the '|' operator acts as a processing sequence point, i.e.
- all output streams generated by the command list on the left side of the '|' are passed as input streams to the command list on the right side of the '|'.

Finally (at the end of the overall ACI expression) all generated output streams get concatenated.
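The rules for ';' and '|' can be modeled in a few lines of Python. This is a minimal sketch, not ARB's implementation. It assumes each command can be treated as a function that maps a list of input streams to a list of output streams.

- `command_list`, `pipe` and `evaluate` are hypothetical helpers.
- `count`, `echo` and `per_cent` are simplified stand-ins for the ACI commands of the same names.

The expression evaluated at the end is the one used in the data flow example further down.

```python
from typing import Callable, List

# Hypothetical model: a command maps a list of input streams to a list of output streams.
Command = Callable[[List[str]], List[str]]


def command_list(*commands: Command) -> Command:
    """';' operator: every command sees ALL input streams; outputs are collected in order."""
    def run(inputs: List[str]) -> List[str]:
        outputs: List[str] = []
        for cmd in commands:
            outputs.extend(cmd(list(inputs)))
        return outputs
    return run


def pipe(*stages: Command) -> Command:
    """'|' operator: the output streams of one stage become the input streams of the next."""
    def run(inputs: List[str]) -> List[str]:
        streams = inputs
        for stage in stages:
            streams = stage(streams)
        return streams
    return run


def evaluate(expr: Command, first_input: str) -> str:
    """Start with a single input stream; concatenate all final output streams."""
    return "".join(expr([first_input]))


def count(chars: str) -> Command:
    """Simplified count(): one output per input, number of characters found in 'chars'."""
    return lambda inputs: [str(sum(s.count(c) for c in chars)) for s in inputs]


def echo(text: str) -> Command:
    """Simplified "text" / echo(): ignores its inputs and creates one output stream."""
    return lambda inputs: [text]


def per_cent(inputs: List[str]) -> List[str]:
    """Simplified per_cent: consumes streams pairwise, integer 100*a/b, 0 if b is zero."""
    result = []
    for a, b in zip(inputs[0::2], inputs[1::2]):
        numerator, denominator = int(a), int(b)
        result.append(str(numerator * 100 // denominator) if denominator else "0")
    return result


# count("A");count("G")|"a/g = ";per_cent
expr = pipe(command_list(count("A"), count("G")),
            command_list(echo("a/g = "), per_cent))
print(evaluate(expr, "AGG"))  # a/g = 50
```

How the model follows the rules above:

- Both commands of a ';' list receive the same input list.
- A '|' simply feeds one list's outputs to the next stage.
- Concatenation happens only once, at the very end.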

Typical uses are to
* show text at the tips of the tree (LINK{props_nds.hlp})
* write information into database fields (LINK{mod_field_list.hlp})

Instead of using ACI commands (as described in this document), you may always use any of the other
integrated data processing languages. Simply prefix the command

* with a ':' to use LINK{srt.hlp}
* with a '/' to use LINK{reg.hlp}

Both are also available inside ACI via the commands 'srt' and 'command' (see below).

SECTION Examples

count("A");count("AG")

creates two streams:

1. how many A's
2. how many A's and G's (together)

count("A");count("G")|per_cent

per_cent is a command that divides two numbers
(number of 'A's / number of 'G's) and returns the result
as a percentage.

SECTION Example data flow

e.g.: count("A");count("G")|"a/g = "; per_cent

input                                       concatenate              output
"AGG" ----> count("A") -->| -----> "a/g = "  --> | --> "a/g = " ---> 'a/g = 50'
      \                   |  \   /               |               /
       \                  |    X                 |              /
        \                 |  /   \               |             /
         -> count("G") -->| -----> per_cent  --> | --> "50" ---

SECTION PARAMETERS

Several commands expect or accept additional parameters in
parentheses (e.g. 'remove(aA)').

Multiple parameters have to be separated by ',' or ';'.

There are two distinct ways to specify such a parameter:
- unquoted

Unquoted parameters are taken as specified, with the following exceptions:
- the characters ',;"|\)' need to be escaped by prefixing one '\'
- spaces are removed unless prefixed by '\'

- quoted

Quoted parameters begin and end with a '"'. You can use any character,
but you need to escape '\' and '"' by preceding them with a '\'.

Example: 'remove("\"")' will remove all double quotes from the input.
'remove("\\")' will remove all backslashes from the input.

[@@@ behavior currently not strictly implemented]
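The quoting rules above can be read as a small tokenizer. The Python sketch below is one possible reading of those rules, written under assumptions; as the note above says, ARB does not strictly implement them this way.

- `split_params` is a hypothetical helper, not an ARB function.
- It handles only the text between the parentheses.

```python
from typing import List

SEPARATORS = ",;"


def split_params(text: str) -> List[str]:
    """Split the text between '(' and ')' into parameters (hypothetical sketch).

    - parameters are separated by ',' or ';'
    - '\\' makes the next character literal (inside and outside quotes)
    - '"' starts/ends a quoted part; separators and spaces inside it are kept
    - unescaped spaces outside quotes are dropped
    """
    params: List[str] = []
    current: List[str] = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i + 1])  # escaped character is taken literally
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes    # quotes delimit the value, they are not part of it
        elif in_quotes:
            current.append(ch)
        elif ch in SEPARATORS:
            params.append("".join(current))
            current = []
        elif ch != " ":                  # unescaped blanks outside quotes are removed
            current.append(ch)
        i += 1
    params.append("".join(current))
    return params


print(split_params(r'"\""'))  # ['"']
```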

SECTION COMMANDLIST

Unless explicitly mentioned otherwise, every command
creates one output stream for each input stream.

STREAM HANDLING

echo(x1;x2;x3...) creates one output stream from each specified parameter
(parameters are separated by ';').

"text" same as 'echo("text")'

dd copies all input streams to output streams

cut(N1,N2,N3) copies the Nth input stream(s)

drop(N1,N2) copies all but the Nth input stream(s)

dropempty drops all empty input streams

dropzero drops all non-numeric or zero input streams

swap(N1,N2) swaps two input streams
(without parameters: swaps the last two streams)

toback(X) moves the Xth input stream
to the end of the output streams

tofront(X) moves the Xth input stream
to the start of the output streams

merge([sep]) merges all input streams into one output stream.
If 'sep' is specified, it is inserted between them.
If there are no input streams, it returns one empty
output stream.

split([sep[,mode]]) splits all input streams at separator string 'sep'
(default: split at linefeed).

Modes:

0 remove found separators (default)
1 split before separator
2 split after separator
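Below is a Python sketch of the three split modes for a single input stream. It makes two assumptions:

- "split before separator" keeps the separator at the start of the following piece.
- "split after separator" keeps it at the end of the preceding piece.

`aci_split` is a hypothetical name. How ARB treats empty pieces (e.g. after a trailing separator) is not specified here.

```python
from typing import List


def aci_split(stream: str, sep: str = "\n", mode: int = 0) -> List[str]:
    """Sketch of split([sep[,mode]]) applied to one input stream (assumed semantics).

    mode 0: separators are removed
    mode 1: split before each separator (it starts the next piece)
    mode 2: split after each separator (it ends the previous piece)
    """
    parts = stream.split(sep)
    if mode == 0:
        return parts
    if mode == 1:
        return [parts[0]] + [sep + p for p in parts[1:]]
    if mode == 2:
        return [p + sep for p in parts[:-1]] + [parts[-1]]
    raise ValueError("mode must be 0, 1 or 2")


print(aci_split("a,b,c", ",", 1))  # ['a', ',b', ',c']
```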

colsplit([width]) splits each input stream into multiple streams of the specified
width (the last output stream may be shorter).
The default width is 1.

streams returns the number of input streams

STRING

head(n) the first n characters
left(n) the first n characters

tail(n) the last n characters
right(n) the last n characters

The above functions return an empty string for n <= 0.

len the length of the input

len("chr") the length of the input excluding the
characters in 'chr'

mid(x,y) the substring from position x to position y (inclusive)
mid0(x,y) like mid(x,y), but positions start at 0

Allowed positions are
- [1..N] for mid()
- [0..N-1] for mid0()

A position below that range is relative to the end of the string,
i.e. mid(-2,0) and mid0(-3,-1) are equivalent to tail(3).
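The position rules for mid() and mid0() are easy to get wrong, so here is a Python sketch. It reproduces the documented equivalence with tail(3), under these assumptions:

- Positions are inclusive.
- A position that is still out of range after shifting is clamped.

The function names mirror the ACI commands, but the code is illustrative only.

```python
def head(s: str, n: int) -> str:
    """First n characters; empty for n <= 0."""
    return s[:n] if n > 0 else ""


def tail(s: str, n: int) -> str:
    """Last n characters; empty for n <= 0."""
    return s[-n:] if n > 0 else ""


def mid(s: str, x: int, y: int) -> str:
    """Substring from position x to y (1-based, inclusive).

    Positions below 1 are relative to the end of the string (0 = last character).
    """
    n = len(s)
    if x < 1:
        x += n
    if y < 1:
        y += n
    x = max(x, 1)  # assumption: clamp a start position that is still out of range
    if y < x:
        return ""
    return s[x - 1:y]


def mid0(s: str, x: int, y: int) -> str:
    """Like mid(), but 0-based: positions below 0 are relative to the end (-1 = last)."""
    return mid(s, x + 1, y + 1)


print(mid("ABCDEFG", -2, 0), mid0("ABCDEFG", -3, -1), tail("ABCDEFG", 3))  # EFG EFG EFG
```

Defining mid0() through mid() keeps the two commands consistent: shifting every position by one maps the 0-based range [0..N-1] onto [1..N], and the "relative to the end" rule shifts along with it.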

crop("str") removes characters of 'str' from
both ends of the input

remove("str") removes all characters of 'str'
e.g. remove(" ") removes all blanks

keep("str") the opposite of remove:
removes all characters that are not
contained in 'str'

isEmpty returns '1' for each empty input stream, '0' for others

srt("orig=dest",...) replace command, invokes SRT
(see LINK{srt.hlp})

translate("old","new"[,"other"])

replaces all characters of the input that occur in the
first argument ("old") by the corresponding character of the
second argument ("new").

An optional third argument (one character only) means:
replace all other characters with the third argument.

Example:

Input: "--AabBCcxXy--"
translate("abc-","xyz-")      -> "--AxyBCzxXy--"
translate("abc-","xyz-",".")  -> "--.xy..z...--"

This can be used to replace illegal characters in sequence data
(see predefined expressions in 'Modify fields of listed species').
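A character-by-character Python sketch of translate(), checked against the example above. Rejecting "old"/"new" arguments of different length is an assumption of this sketch, not documented ARB behavior.

```python
from typing import Optional


def translate(text: str, old: str, new: str, other: Optional[str] = None) -> str:
    """Replace each character found in 'old' by the character at the same index in 'new'.

    If 'other' (a single character) is given, every character NOT found in 'old'
    is replaced by it; otherwise such characters are copied unchanged.
    """
    if len(old) != len(new):
        # assumption: mismatched lengths are treated as an error in this sketch
        raise ValueError("'old' and 'new' must have the same length")
    table = dict(zip(old, new))
    result = []
    for ch in text:
        if ch in table:
            result.append(table[ch])
        elif other is not None:
            result.append(other)
        else:
            result.append(ch)
    return "".join(result)


print(translate("--AabBCcxXy--", "abc-", "xyz-"))       # --AxyBCzxXy--
print(translate("--AabBCcxXy--", "abc-", "xyz-", "."))  # --.xy..z...--
```

The third argument is what makes translate() useful for cleaning sequence data. Every character that is not explicitly listed is mapped to one placeholder, so unexpected symbols cannot slip through.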


tab(n) appends n-len(input) spaces

pretab(n) prepends n-len(input) spaces

upper converts string to upper case
lower converts string to lower case
caps capitalizes string

format(options)

takes a long string and breaks it into several lines

option          (default)   description
==========================================================
width=#         (50)        line width
firsttab=#      (10)        first line left indent
tab=#           (10)        left indent (not first line)
"nl=chrs"       (" ")       list of characters that specify
                            possible points of a line break;
                            the line break characters get removed!
"forcenl=chrs"  ("\n")      force a newline at these characters

(see also format_sequence below)

extract_words("chars",val)

Searches for all words (separated by ',' ';' ':' ' ' or 'tab') that
contain more than 'val' characters from 'chars', sorts them
alphabetically and writes them, separated by ' ', to the output.

ESCAPING AND QUOTING

escape escapes all occurrences of '\' and '"' by preceding them with a '\'
quote encloses the input in '"'

unescape inverse of escape
unquote removes enclosing quotes (if present), otherwise returns the input unchanged


STRING COMPARISON

compare(a,b) returns -1 if a<b, 0 if a=b, 1 if a>b
equals(a,b) returns 1 if a=b, 0 otherwise
contains(a,b) if a contains b, this returns the position of
b inside a (1..N) and 0 otherwise.
Always returns 0 if b is empty.
partof(a,b) if a is part of b, this returns the position of
a inside b (1..N) and 0 otherwise.

The above functions are binary operators (see below).
For each of them a case-insensitive alternative exists (icompare, iequals, ...).
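A Python sketch of the comparison functions, showing the 1-based positions and the "0 means not found" convention. The 'icontains ... compare(0)' example at the end of this file relies on that convention. The sketch rests on three assumptions:

- Ordering follows plain character codes.
- Case folding uses `str.lower()`.
- contains() returns the position of the first occurrence.

```python
def compare(a: str, b: str) -> int:
    """-1 if a < b, 0 if a == b, 1 if a > b (character-code ordering assumed)."""
    return (a > b) - (a < b)


def equals(a: str, b: str) -> int:
    return 1 if a == b else 0


def contains(a: str, b: str) -> int:
    """1-based position of the first occurrence of b in a; 0 if absent or b is empty."""
    if not b:
        return 0
    return a.find(b) + 1  # str.find returns -1 when b is absent, giving 0


def partof(a: str, b: str) -> int:
    """1-based position of a inside b, 0 otherwise (contains with swapped arguments)."""
    return contains(b, a)


def icontains(a: str, b: str) -> int:
    """Case-insensitive variant (lower() is an assumption about case folding)."""
    return contains(a.lower(), b.lower())
```

Because a found position is always at least 1, piping the result through compare(0) turns it into 1 (found) or 0 (not found). That is the form select() expects.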

NUMERIC COMPARISON

All functions here operate on floating-point numbers.

isBelow(a,b) returns 1 if a<b, 0 otherwise
isAbove(a,b) returns 1 if a>b, 0 otherwise
isEqual(a,b) returns 1 if a=b, 0 otherwise

The above functions are binary operators (see below).

inRange(low,high)

for the value of each input stream:
returns 1 if low <= value <= high, 0 otherwise

CALCULATOR

plus adds arguments
minus subtracts arguments
mult multiplies arguments
div divides arguments
per_cent divides arguments and returns the result as a percentage
(not rounded; use "fper_cent|round(0)" for a rounded percentage)
rest divides arguments and returns the remainder

Calculation is performed with integer numbers.
For most of these functions a floating-point variant exists:
* fplus
* fminus
* fmult
* fdiv
* fper_cent

round(digits)

rounds a floating-point input to the given number of digits
after the decimal point.
Specify zero to round to an integer number.
Specify negative digits to round to multiples of 10, 100, 1000, ...


To avoid 'division by zero' errors, the operators 'div', 'per_cent' and 'rest'
return 0 if the second argument is zero.

The above functions work as binary operators (see below).
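The Python sketch below contrasts the integer operators with their floating-point variants and shows the 'return 0 on division by zero' rule. It rests on several assumptions:

- Integer division truncates toward zero, as in C.
- The remainder takes the sign of the first argument.
- round() rounds halves up.

`aci_round` is a hypothetical name chosen to avoid Python's built-in `round`.

```python
import math


def div(a: int, b: int) -> int:
    """Integer division truncating toward zero (C-like, assumed); 0 if b == 0."""
    if b == 0:
        return 0
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def rest(a: int, b: int) -> int:
    """Remainder consistent with div() above (sign follows a); 0 if b == 0."""
    if b == 0:
        return 0
    return a - b * div(a, b)


def per_cent(a: int, b: int) -> int:
    """Integer percentage, truncated rather than rounded; 0 if b == 0."""
    return div(a * 100, b)


def fper_cent(a: float, b: float) -> float:
    """Floating-point percentage (behavior for b == 0 is not documented here)."""
    return a * 100.0 / b


def aci_round(x: float, digits: int) -> float:
    """Round to 'digits' decimals; negative digits round to multiples of 10, 100, ...

    Round-half-up is an assumption of this sketch.
    """
    if digits >= 0:
        factor = 10 ** digits
        return math.floor(x * factor + 0.5) / factor
    step = 10 ** (-digits)
    return float(math.floor(x / step + 0.5) * step)
```

The difference the help text points out shows up with 2 and 3:

- per_cent(2, 3) truncates to 66.
- aci_round(fper_cent(2, 3), 0) gives 67.0, which corresponds to the recommended "fper_cent|round(0)".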

BOOLEAN OPERATORS

All input streams are converted to boolean
values (i.e. 0 or 1) as follows:

"0"              -> 0
any other number -> 1
any text         -> 0 (even empty text!)

Operators:

Not inverts the values of all input streams (0<->1)
And returns 1 if all input streams are 1, 0 otherwise
Or returns 1 if at least one input stream is 1, 0 otherwise

Use "|or|not" or "|and|not" to compute NOR or NAND.
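A Python sketch of the boolean conversion and operators, including the NOR idiom from the line above. It makes three assumptions:

- Any text that parses as a non-zero number counts as true.
- Not yields one output per input.
- And/Or reduce all inputs to a single output stream.

The `aci_*` names are hypothetical.

```python
from typing import List


def to_bool(stream: str) -> int:
    """Assumed conversion: non-zero number -> 1; "0", non-numeric or empty text -> 0."""
    try:
        return 1 if float(stream) != 0 else 0
    except ValueError:
        return 0


def aci_not(streams: List[str]) -> List[str]:
    """Inverts every input stream (one output per input)."""
    return [str(1 - to_bool(s)) for s in streams]


def aci_and(streams: List[str]) -> List[str]:
    """One output stream: 1 if all inputs are true, 0 otherwise."""
    return [str(int(all(to_bool(s) for s in streams)))]


def aci_or(streams: List[str]) -> List[str]:
    """One output stream: 1 if at least one input is true, 0 otherwise."""
    return [str(int(any(to_bool(s) for s in streams)))]


# "|or|not" computes NOR: or(0, 3) is 1, so NOR is 0
print(aci_not(aci_or(["0", "3"])))  # ['0']
```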

BINARY OPERATORS

Several operators work as so-called 'binary operators'.
These operators may be used in various ways, which are
shown here using the operator 'plus':

ACI               OUTPUT STREAMS
plus(a,b)         a+b              input:0 output:1
a;b|plus          a+b              input:2 output:1
a;b;c;d|plus      a+b;c+d          input:4 output:2
a;b;c|plus(x)     a+x;b+x;c+x      input:3 output:3

That means, if the binary operator

- has no arguments, it expects an even number of input streams. The operator is
applied to the first 2 streams, then to the next 2 streams and so on.
The number of output streams is half the number of input streams.
- has 1 argument, it accepts one or more input streams. The operator
is applied to each input stream together with the argument.
For each input stream one output stream is generated.
- has 2 arguments, it is applied to these. The arguments are interpreted as
ACI commands and are applied to each input stream. The results of
the commands are passed as arguments to the binary operator. For each input
stream one output stream is generated.
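The three calling conventions can be expressed as one dispatch on the number of arguments. In this hypothetical Python sketch:

- `apply_binary` is an illustrative helper, not ARB code.
- A single argument is used as a literal value.
- Two arguments are modeled as functions standing in for ACI commands such as 'readdb(tmp)'.
- `plus` uses plain integer addition.

```python
from typing import Callable, List, Union

BinOp = Callable[[str, str], str]
StreamCmd = Callable[[str], str]  # hypothetical stand-in for an ACI command on one stream


def plus(a: str, b: str) -> str:
    return str(int(a) + int(b))


def apply_binary(op: BinOp, inputs: List[str], *args: Union[str, StreamCmd]) -> List[str]:
    if len(args) == 0:
        # no arguments: consume the input streams pairwise
        if len(inputs) % 2 != 0:
            raise ValueError("expected an even number of input streams")
        return [op(a, b) for a, b in zip(inputs[0::2], inputs[1::2])]
    if len(args) == 1:
        # one argument: combine every input stream with the (literal) argument
        (arg,) = args
        if not isinstance(arg, str):
            raise TypeError("a single argument is used as a literal value")
        return [op(s, arg) for s in inputs]
    if len(args) == 2:
        # two arguments: run both commands on each input stream, combine the results
        left, right = args
        if not (callable(left) and callable(right)):
            raise TypeError("two arguments are evaluated as commands")
        return [op(left(s), right(s)) for s in inputs]
    raise ValueError("binary operators take 0, 1 or 2 arguments")


print(apply_binary(plus, ["1", "2", "3", "4"]))  # ['3', '7']    (a;b;c;d|plus)
print(apply_binary(plus, ["1", "2", "3"], "10"))  # ['11', '12', '13']   (a;b;c|plus(x))
```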

CONDITIONAL

select(a,b,c,...) each input stream is converted into a number
(non-numeric text converts to zero). That number is
used to select one of the given arguments:
0 selects 'a',
1 selects 'b',
2 selects 'c' and so on.
The selected argument is interpreted as an ACI command
and is applied to an empty input stream.
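A Python sketch of select(), using the 'equals(...)|select(echo("tmp and acc differ"),)' example from the end of this file. It rests on three assumptions:

- Branches are modeled as functions applied to an empty stream.
- Selectors are parsed as integers.
- An out-of-range selector is treated as an error; that case is undocumented.

```python
from typing import Callable, List

# Hypothetical stand-in for an ACI command applied to a single stream.
StreamCmd = Callable[[str], str]


def to_number(stream: str) -> int:
    """Non-numeric text converts to zero (integer parsing is a simplification)."""
    try:
        return int(stream)
    except ValueError:
        return 0


def select(inputs: List[str], *branches: StreamCmd) -> List[str]:
    """For each input stream, run the branch chosen by its value on an empty stream."""
    outputs = []
    for stream in inputs:
        index = to_number(stream)
        if not 0 <= index < len(branches):
            # out-of-range selectors are not documented; this sketch treats them as errors
            raise IndexError(f"no argument for selector {index}")
        outputs.append(branches[index](""))
    return outputs


# equals(...) yields "0" (differ) or "1" (equal)
print(select(["0", "1"], lambda _: "tmp and acc differ", lambda _: ""))
# ['tmp and acc differ', '']
```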

DEBUGGING

trace(onoff) toggles tracing of ACI actions to standard output.
Parameter: 0 or 1 (switch off or on)

All streams are copied (like 'dd').

Example: "cmd1 | cmd2 | trace(1) | tracedCmd1 | tracedCmd2 | trace(0) | untracedCmd "

To see the output of trace, either
* start arb from a terminal or
* use LINK{console.hlp}


DATABASE AND SEQUENCE

readdb(field_name) the contents of the field 'field_name'

sequence the sequence in the current alignment.

Note: older ARB versions returned 'no sequence'
if the current alignment contained no sequence.
Now it returns an empty string.

For genes it returns only the corresponding part
of the sequence. If the field 'complement' is 1, the
result is the reverse complement.

sequence_type the sequence type of the selected alignment ('rna','dna',...)
ali_name the name of the selected alignment (e.g. 'ali_16s')

Note: The commands above only work at the beginning of the ACI expression.

checksum(options) calculates a CRC checksum
options:
"exclude=chrs"  remove 'chrs' before calculation
"toupper"       make everything uppercase first

gcgchecksum a GCG-compatible checksum

format_sequence(options)

takes a long string (sequence) and breaks it into several lines

option          (default)   description
=============================================================
width=#         (50)        sequence line width
firsttab=#      (10)        first line left indent
tab=#           (10)        left indent (not first line)
numleft         (NO)        numbers on the left side
numright=#      (NO)        numbers on the right side (#=width)
gap=#           (10)        insert a gap every # sequence characters

(see also 'format' above)

extract_sequence("chars",rel_len)

like extract_words, but does not sort the words; 'rel_len' is the minimum
percentage of characters of a word that must match a character in 'chars'
for the word to be taken. All words will be separated by white space.

taxonomy([treename,] depth)

Returns the taxonomy of the current species or group as defined by a tree.

If 'treename' is specified, it is used as the tree, otherwise the 'default tree'
is used (which in most cases is the tree displayed in the ARB_NT main window).

'depth' specifies how many "levels" of the taxonomy are used.

FILTERING

There are several functions to filter sequential data:

- filter
- diff
- gc

All these functions use the following COMMON OPTIONS to define
what is used as the filter sequence:

- species=name

Use species 'name' as filter.

- SAI=name

Use SAI 'name' as filter.

- first=1

Use the 1st input stream as filter for all other input streams.

- pairwise=1

Use the 1st input stream as filter for the 2nd stream,
the 3rd stream as filter for the 4th stream, and so on.

- align=ali_name

Use alignment 'ali_name' instead of the current default
alignment (only meaningful together with 'species' or 'SAI').

Note: Only one of the parameters 'species', 'SAI', 'first' or 'pairwise' may be used.

diff(options)

Calculates the difference between the filter (see common options above) and the input stream(s) and
writes the result to the output stream(s).

Additional options:

- equal=x

Character written to the output if filter and stream are equal at
a position (defaults to '.'). To copy the stream contents for
equal columns, specify 'equal=' (directly followed by ',' or ')').

- differ=y

Character written to the output if filter and stream differ at a column position.
Default is to copy the character from the stream.
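A column-by-column Python sketch of diff() for one filter/stream pair.

- An empty `equal` models 'equal=' (copy the stream character).
- `differ=None` models the default of copying the stream character.
- Columns beyond the end of the filter are treated as differing; this is an assumption.

Obtaining the filter (species, SAI, first or pairwise) is not modeled.

```python
from typing import Optional


def diff(filter_seq: str, stream: str, equal: str = ".", differ: Optional[str] = None) -> str:
    """Compare 'stream' column by column against 'filter_seq'.

    equal:  written where both agree; "" models 'equal=' (copy the stream character)
    differ: written where they disagree; None (default) copies the stream character
    Columns beyond the end of the filter count as differing (assumption).
    """
    out = []
    for i, ch in enumerate(stream):
        same = i < len(filter_seq) and filter_seq[i] == ch
        if same:
            out.append(equal if equal != "" else ch)
        else:
            out.append(differ if differ is not None else ch)
    return "".join(out)


print(diff("ACGT", "ACCT"))  # ..C.
```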

filter(options)

Filters only the specified columns out of the input stream(s). You need to
specify either

- exclude=xyz

to use all columns where the filter (see common options above) has none
of the characters 'xyz'

or

- include=xyz

to use only columns where the filter has one of the characters 'xyz'

All used columns are concatenated and written to the output stream(s).
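A Python sketch of how include= and exclude= select columns. `filter_columns` is a hypothetical name (it avoids Python's built-in `filter`). Columns beyond the end of the filter are treated as if the filter had no character there, which is an assumption: exclude= keeps them and include= drops them.

```python
from typing import Optional


def filter_columns(filter_seq: str, stream: str,
                   include: Optional[str] = None, exclude: Optional[str] = None) -> str:
    """Keep the columns of 'stream' selected by the filter sequence.

    include: keep columns where the filter has one of these characters
    exclude: keep columns where the filter has none of these characters
    Exactly one of the two must be given.
    """
    if (include is None) == (exclude is None):
        raise ValueError("specify exactly one of 'include' or 'exclude'")
    selecting = include is not None
    chars = include if include is not None else exclude
    assert chars is not None
    kept = []
    for i, ch in enumerate(stream):
        f = filter_seq[i] if i < len(filter_seq) else None  # None: no filter character
        hit = f is not None and f in chars
        if hit == selecting:  # include keeps hits, exclude keeps non-hits
            kept.append(ch)
    return "".join(kept)


print(filter_columns("xx--x", "ACGTA", include="x"))  # ACA
```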


change(options)

Randomly modifies the content of columns selected
by the filter (see common options above).
Only columns containing letters will be modified.

The options 'include=xyz' and 'exclude=xyz' work as
with 'filter()', but here they select the columns to modify; all other
columns get copied unmodified.

How the selected columns are modified is specified by the following
parameters:

- change=percent

percentage of changed columns (default: silently change nothing, to make
it more difficult for you to ignore this helpfile)

- to=xy

randomly change to one of the characters 'xy'.

Hints:

- Use 'xyy' to produce 33% 'x' and 67% 'y'
- Use 'xxxxxxxxxy' to produce 90% 'x' and 10% 'y'
- Use 'x' to replace all matching columns by 'x'

I think the intention of this (long undocumented) command is to make it easy to generate
artificial sequences with different GC content, in order to test treeing software.

SPECIALS

exec(command[,param1,param2,...])

Executes an external (unix) command.

The given params will be single-quoted and passed to the command.

All input streams will be concatenated and piped into the command.

If the command itself is a pipe, put it in parentheses (e.g. "(sort|uniq)").
Note: This won't work together with params.

The result is the output of the command.
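The Python sketch below illustrates the data flow of exec(): all input streams are concatenated and piped into a shell command, and the command's standard output becomes the result. It is an illustration only, under these assumptions:

- The command runs through `/bin/sh`.
- `single_quote` and `aci_exec` are hypothetical helpers, not ARB functions.
- Escaping single quotes embedded in a parameter is an addition of this sketch.

```python
import subprocess
from typing import List


def single_quote(param: str) -> str:
    """Wrap a parameter in single quotes for the shell (embedded quotes escaped: assumption)."""
    return "'" + param.replace("'", "'\\''") + "'"


def aci_exec(inputs: List[str], command: str, *params: str) -> str:
    """Concatenate all input streams, pipe them into 'command' and return its stdout."""
    cmdline = " ".join([command] + [single_quote(p) for p in params])
    completed = subprocess.run(
        ["/bin/sh", "-c", cmdline],
        input="".join(inputs),
        capture_output=True,
        text=True,
        check=True,  # raise if the command fails (ARB's error handling may differ)
    )
    return completed.stdout


print(aci_exec(["b\n", "a\n", "b\n"], "(sort|uniq)"), end="")  # prints "a" and "b" on two lines
```

Even in this tiny model, the command blocks until the external process finishes. That is why a slow command is dangerous in NDS, as the warning below explains.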

WARNING!!!

You had better not use this command for NDS:
any slow command will disable all editing, so you may never
be able to remove this command from the NDS again. Even arb_panic will not
easily help you.

command(action)

applies 'action' to all input streams, interpreting it as

- ACI,
- SRT (if it starts with ':') (see LINK{srt.hlp})
- or REG (if it starts with '/') (see LINK{reg.hlp}).

If you nest calls (i.e. if 'action' contains further calls to 'command'), you have to apply
escaping multiple times (e.g. inside an export filter, which is in fact an
SRT expression, you'll have to use double escapes).

eval(exprEvalToAction)

'exprEvalToAction' is evaluated (using an empty string as input)
and the result is interpreted as an action and applied to all
input streams (as in 'command' above).

Example: Say you have two numeric positions stored in the database fields
'pos1' and 'pos2' for each species. Then the following command
extracts the sequence data from pos1 to pos2:

'sequence|eval(" \"mid(\";readdb(pos1);\";\";readdb(pos2);\")\" ")'

How the example works:

The argument is the escaped version of the
command '"mid(" ; readdb(pos1) ; ";" ; readdb(pos2) ; ")"'.

If pos1 contains '10' and pos2 contains '20', that command will
evaluate to 'mid(10;20)'.

For these positions the executed ACI behaves like 'sequence|mid(10;20)'.

define(name,escapedCommand)

defines an ACI macro 'name'. 'escapedCommand' contains an escaped
ACI command sequence. This command sequence can be executed with
do(name).

do(name)

applies a previously defined ACI macro to all input streams (see 'define').

'define(a,action)' followed by 'do(a)' works similarly to 'command(action)'.

See embl.eft for an example using 'define' and 'do'.

findspec(action)

Each input stream is interpreted as a species 'name' (ID) and the species
with that 'name' is searched (aborts with an error if the species could not be found;
silently ignores empty streams).

If the species is found, 'action' is applied (to one empty stream).
Instead of the current item, all database commands inside 'action' use the found species.

findacc(action)

like findspec, but searches for 'acc' instead of 'name'.

findgene(action)

like findspec, but searches for genes (starting at an organism or
at another gene of the same organism).

origin_organism(action)
origin_gene(action)

like command(), but readdb() etc. read all data from the
origin organism/gene of a gene-species (not from the gene-species itself).

This function applies only to gene-species!

SECTION Future features

statistic

creates a character statistic of the sequence
(not implemented yet)

EXAMPLES sequence|format_sequence(firsttab=0;tab=10)|"SEQUENCE_";dd

fetches the default sequence, formats it,
and prepends 'SEQUENCE_'.

sequence|remove(".-")|format_sequence

gets the default sequence, removes all '.' and '-' characters and
formats it

sequence|remove(".-")|len

the number of symbols other than '.' and '-' (sequence length)

"[";taxonomy(tree_other,3);" -> ";taxonomy(3);"]"

shows for each species how its taxonomy
changed between "tree_other" and the current tree

equals(readdb(tmp),readdb(acc))|select(echo("tmp and acc differ"),)

returns 'tmp and acc differ' if the contents of
the database fields 'tmp' and 'acc' differ; empty result
otherwise.

readdb(full_name)|icontains(bacillus)|compare(0)|select(echo(..),readdb(full_name))

returns the content of the 'full_name' database entry if it contains
the substring 'bacillus' (case-insensitive). Otherwise returns '..'.


BUGS The output of taxonomy() is not always instantly refreshed.
