reg.hlp

Last change on this file was 18769, checked in by westram, 3 years ago
move all helpfiles to new source location
Property svn:eol-style set to `native` Property svn:keywords set to `Author Date Id Revision`
File size: 7.8 KB

1	#Please insert up references in the next lines (line starts with keyword UP)
2	UP arb.hlp
3	UP glossary.hlp
4
5	#Please insert subtopic references (line starts with keyword SUB)
6	SUB srt.hlp
7	SUB aci.hlp
8
9	# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}
10
11	#*********** Title of helpfile !! and start of real helpfile ******
12	TITLE Regular Expressions (REG)
13
14	OCCURRENCE Many places
15
16	SECTION Ways to use regular expressions
17
18	There are two ways to use regular expressions:
19
20	[1] /Search Regexpr/Replace String/
21	[2] /Search Regexpr/
22
23	[1] searches the input for occurrences of 'Search Regexpr' and
24	replaces every occurrence with 'Replace String'.
25
26	[2] searches the input for the FIRST occurrence of 'Search
27	Regexpr' and returns the found match.
28	If nothing matches, it returns an empty string.
29
30	Notes:
31
32	* You can use regular expressions wherever you can use
33	ACI and SRT expressions.
34	* In some places only [2] is available (e.g. in Search&Query).
35	* By default regular expressions are case-sensitive. To make them
36	case-insensitive, append an 'i' to the
37	expression (i.e. '/expr/i' or '/expr/repl/i')
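
The two forms can be sketched in Python as follows. This is only an illustration of the behavior described above, not ARB's implementation: the helper names `split_expression` and `apply_expression` are hypothetical, and Python's `re` module is not a POSIX engine, although it agrees with the description for simple patterns like these.

```python
import re

def split_expression(expr):
    """Split '/search/' or '/search/replace/' (optional trailing 'i') into parts.

    Hypothetical helper. Simplification: a '/' preceded by a backslash is
    treated as escaped, so a pattern ending in an escaped backslash is not handled.
    """
    if not expr.startswith("/"):
        raise ValueError("expression must start with '/'")
    parts = re.split(r"(?<!\\)/", expr[1:])
    flags = parts.pop()  # text after the final '/': '' or 'i'
    if flags not in ("", "i") or len(parts) not in (1, 2):
        raise ValueError("expected '/search/' or '/search/replace/'")
    search = parts[0]
    replace = parts[1] if len(parts) == 2 else None
    return search, replace, flags == "i"

def apply_expression(expr, text):
    """Form [1] replaces every occurrence; form [2] returns the first match or ''."""
    search, replace, ignore_case = split_expression(expr)
    flags = re.IGNORECASE if ignore_case else 0
    if replace is None:
        match = re.search(search, text, flags)
        return match.group(0) if match else ""
    return re.sub(search, replace, text, flags=flags)

print(apply_expression("/o/0/", "foo boo"))      # form [1]
print(apply_expression("/PSEU/i", "Pseudomonas"))  # form [2], case insensitive
print(repr(apply_expression("/xyz/", "Pseudomonas")))  # form [2], no match
```

Returning an empty string when nothing matches mirrors the rule stated for form [2].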
38
39	SECTION Syntax of POSIX extended regular expressions as used in ARB
40
41	A regular expression specifies a set of character strings,
42	e.g. the expression '/pseu/i' specifies all strings containing
43	"pseu", "Pseu" or "pSeu" and so on. We say the expression "matches"
44	(a part of) these strings.
45
46	Several characters have special meanings in regular expressions.
47	All other characters just match against themselves.
48
49	Special characters:
50
51	'.' matches any character (e.g. '/h.s/' matches "has" and "his")
52	'[xyz]' matches 'x', 'y' or 'z'
53	'[a-z]' matches all lower case letters
54	'^' matches the beginning of the string
55	(e.g. '/^pseu/i' matches all strings starting with "pseu")
56	'$' matches the end of the string
57	(e.g. '/cens$/i' matches all strings ending in "cens")
58
59	'*' matches the preceding element zero or more times
60	(e.g. '/th*is/' matches "tis", "this", "thhhhhhis", ..)
61	'?' matches the preceding element zero or one time
62	(e.g. '/th?is/' matches "tis" or "this", but not "thhis")
63	'+' matches the preceding element one or more times
64	(e.g. '/th+is/' matches "this" or "thhhis", but not "tis")
65	'{mi,ma}' matches the preceding element at least mi and at most ma times
66	(e.g. '/th{2,4}is/' matches "thhis", "thhhis" or "thhhhis")
67
68	'|' marks an alternative
69	(e.g. '/bacter|spiri/i' matches all strings containing "bacter" or "spiri")
70
71	'()' marks a subexpression. Subexpressions can be used to separate alternatives
72	or to mark parts for use in the replace expression (see below).
73
74	(e.g. '/bact|spiri.*cens/' matches '/bact/' or '/spiri.*cens/',
75	whereas '/(bact|spiri).*cens/' matches '/bact.*cens/' or '/spiri.*cens/')
76
77	To match against special characters themselves, escape them
78	using a '\' (e.g. '/\*/' matches the character "*", '/\\/' matches "\")
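
The special characters listed above can be tried out with a short Python sketch. Python's `re` module is used here only as a stand-in: it is not a POSIX extended engine, but it treats these particular constructs the same way. The helper `matches` and the test strings are assumptions made for illustration.

```python
import re

def matches(pattern, text):
    """True if the pattern matches (a part of) the text."""
    return re.search(pattern, text) is not None

# (pattern, text, expected result)
CHECKS = [
    (r"h.s", "has", True),
    (r"h.s", "his", True),
    (r"^pseu", "pseudomonas", True),
    (r"^pseu", "a pseudonym", False),
    (r"cens$", "fluorescens", True),
    (r"th*is", "tis", True),
    (r"th?is", "thhis", False),
    (r"th+is", "tis", False),
    (r"th{2,4}is", "thhhis", True),
    (r"th{2,4}is", "this", False),
    (r"bacter|spiri", "Azospirillum", True),
    # Without parentheses the alternative splits the whole expression:
    (r"bact|spiri.*cens", "bacteria", True),
    (r"(bact|spiri).*cens", "bacteria", False),
    # An escaped special character matches itself:
    (r"\*", "a*b", True),
]

results = [matches(pattern, text) for pattern, text, _ in CHECKS]
for (pattern, text, expected), got in zip(CHECKS, results):
    print(f"{pattern!r:24} {text!r:16} expected={expected} got={got}")
```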
79
80
81	Character classes:
82
83	[...] is called a character class. It matches against any of the characters
84	listed in between the brackets.
85	[^...] If the character class starts with '^' it matches against any character
86	NOT listed (e.g. '[^78]' matches all but '7' or '8')
87	[5-9] If the character class contains a '-' it is interpreted as "range of characters".
88	Here '5-9' is equivalent to '56789'.
89	You may mix ranges and single characters, e.g. '14-79' is same as '145679',
90	'7-91-3' is same as '789123'.
91
92	To add special characters to a character class, escape them using '\'.
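
A minimal sketch of the bracket rules above, again using Python's `re` module as a stand-in for the POSIX engine (the sample strings are made up for illustration):

```python
import re

digits = "0123456789"

# [^...] matches every character NOT listed.
not_7_or_8 = re.findall(r"[^78]", "6789")

# Ranges and single characters may be mixed.
mixed = re.findall(r"[14-79]", digits)      # same as [145679]
two_ranges = re.findall(r"[7-91-3]", digits)  # same as [789123]

# Special characters inside a class are escaped with a backslash.
escaped = re.findall(r"[\]\-]", "a]b-c")

print(not_7_or_8, mixed, two_ranges, escaped)
```

Note that `findall` returns the matches in the order they occur in the input, so '[7-91-3]' yields the digits 1, 2, 3 before 7, 8, 9.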
93
94	There are several special predefined character classes like
95	* [:alpha:] = [a-zA-Z]
96	* [:digit:] = [0-9]
97	* [:alnum:] = [[:alpha:][:digit:]]
98	* [:punct:] = Punctuation characters
99	* [:print:] = Visible characters and the space character
100	* [:blank:] = Space and tab
101	* [:space:] = Whitespace characters (including newlines)
102	* [:cntrl:] = Control characters
103
104	Use these inside brackets (e.g. '/[[:cntrl:]]//' will remove all control characters).
105	See links below for details.
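
Python's `re` module does not understand the '[:name:]' notation, so the following sketch expands a few of the classes into explicit ASCII ranges before matching. The table `POSIX_CLASSES` and the helper `expand_posix_classes` are hypothetical, cover only part of the list above, and ignore locale-dependent characters.

```python
import re

# ASCII-only expansions for a subset of the predefined classes (assumption).
POSIX_CLASSES = {
    "alpha": "a-zA-Z",
    "digit": "0-9",
    "alnum": "a-zA-Z0-9",
    "blank": r" \t",
    "cntrl": r"\x00-\x1f\x7f",
}

def expand_posix_classes(pattern):
    """Replace each '[:name:]' by its explicit range; the outer brackets stay."""
    return re.sub(r"\[:([a-z]+):\]",
                  lambda m: POSIX_CLASSES[m.group(1)],
                  pattern)

# Equivalent of '/[[:cntrl:]]//': remove all control characters.
cleaned = re.sub(expand_posix_classes("[[:cntrl:]]"), "", "a\x07b\tc")
print(expand_posix_classes("[[:alpha:][:digit:]]"), repr(cleaned))
```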
106
107
108	Links:
109
110	* A more in-depth explanation of POSIX extended regular expressions can be
111	found at LINK{http://en.wikipedia.org/wiki/Regular_expression#POSIX}.
112	* Many examples are given in this guide: LINK{http://www.digitalamit.com/article/regular_expression.phtml}
113
114	Notes:
115
116	* If an expression can match a string in several places, the leftmost match
117	is used, and at that position the longest one (e.g. '/a+e+/' matches 'aaeee' at position 3 of the
118	string 'bbaaeeeffaegg', not 'ae' at position 10).
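
The leftmost-longest rule can be checked with the pattern 'a+e+' (one or more 'a' followed by one or more 'e'). The sketch below uses Python's `re` module, which is an assumption of convenience: it is a backtracking engine rather than a POSIX one, and the two only agree when greedy repetition already yields the longest match, as it does here.

```python
import re

match = re.search(r"a+e+", "bbaaeeeffaegg")
found = match.group(0)
position = match.start() + 1  # the helptext counts positions from 1

print(found, position)

# Where Python differs from POSIX: alternatives are tried in order, so the
# first one that succeeds wins even if a later one would be longer.
# A POSIX engine would report 'ab' for the same input.
first_alternative = re.search(r"a|ab", "ab").group(0)
print(first_alternative)
```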
119
120
121	SECTION Special syntax for search and replace
122
123	Syntax: '/regexp/replace/'
124
125	The part of the input string matched by 'regexp' gets replaced by 'replace'.
126
127	Simple example:
128
129	Input string: 'The quick brown fox jumps over the lazy dog'
130	Search&replace: '/fox|dog/cat/'
131	Result: 'The quick brown cat jumps over the lazy cat'
132
133	Additionally the match (or parts of it) can be referenced in the replace string:
134
135	\0 refers to the whole match
136	\1 refers to the first subexpression
137	\2 refers to the second subexpression
138	...
139	\9 refers to the ninth subexpression
140
141	Example using refs:
142
143	Input string: 'The quick brown fox jumps over the lazy dog'
144	Search&replace: '/(brown|lazy)\s+(fox|dog)/\2 \1/'
145	Result: 'The quick fox brown jumps over the dog lazy'
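
The same example can be reproduced in Python as a rough sketch. One difference is worth noting: in Python's replacement strings the whole match is written '\g<0>', not '\0'.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# \1 and \2 refer to the first and second subexpression.
swapped = re.sub(r"(brown|lazy)\s+(fox|dog)", r"\2 \1", text)

# Python spells the whole-match reference \g<0> instead of \0.
bracketed = re.sub(r"fox|dog", r"[\g<0>]", text)

print(swapped)
print(bracketed)
```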
146
147	WARNINGS Because '*' also accepts zero repetitions, an expression
148	like '_*' normally matches an empty string (if used w/o context).
149
150	This makes some replacements difficult, e.g. if your data contains
151	runs of consecutive underscores and you'd like to replace each run with a single '_'.
152	The expression "/_*/_/" does not work as expected and reports
153	an error: "regular expression '_*' matched an empty string".
154
155	A workaround is the following expression:
156	"/(_+)([^_]|$)/_\2/"
157
158	Other, simpler workarounds do use the BOL/EOL operators ('^'/'$'),
159	e.g. to remove all trailing underscores:
160	"/_*$//"
161
162	Or all leading underscores:
163	"/^_*//"
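
The empty-match problem and the workarounds can be observed with Python's `re` module. This is a sketch under the assumption that Python's behavior is close enough for illustration: unlike ARB, Python does not report an error when an expression matches an empty string, and the sample strings are invented.

```python
import re

# '_*' succeeds at the very first position by matching nothing at all.
empty = re.search(r"_*", "ab__c")
print(repr(empty.group(0)), empty.start())

# Workaround: require at least one '_' and keep the following character (\2).
collapsed = re.sub(r"(_+)([^_]|$)", r"_\2", "a__b___c__")

# Anchored variants: the anchor gives '_*' the context it needs.
no_trailing = re.sub(r"_*$", "", "abc___")
no_leading = re.sub(r"^_*", "", "___abc")

print(collapsed, no_trailing, no_leading)
```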
164
165	BUGS No bugs known
166
