source: branches/stable/HELP_SOURCE/oldhelp/reg.hlp

Last change on this file was 18642, checked in by westram, 4 years ago
  • tweak replacement via regular expressions:
    • allow replacement of whole line regexpr which matches empty source string.
    • test to replace whole string by nothing.
    • always fail BOL/EOL operator starting from second replace-operation. fixes some failing replacements.
  • doc: describe other simpler, now working replacements.
1#Please insert up references in the next lines (line starts with keyword UP)
2UP      arb.hlp
3UP      glossary.hlp
4
5#Please insert subtopic references  (line starts with keyword SUB)
6SUB     srt.hlp
7SUB     aci.hlp
8
9# Hypertext links in helptext can be added like this: LINK{ref.hlp|http://add|bla@domain}
10
11#************* Title of helpfile !! and start of real helpfile ********
12TITLE           Regular Expressions (REG)
13
14OCCURRENCE      Many places
15
16SECTION         Ways to use regular expressions
17
18                There are two ways to use regular expressions:
19
20                [1]     /Search Regexpr/Replace String/
21                [2]     /Search Regexpr/
22
23                [1] searches the input for occurrences of 'Search Regexpr' and
24                replaces every occurrence with 'Replace String'.
25
26                [2] searches the input for the FIRST occurrence of 'Search
27                Regexpr' and returns the found match.
28                If nothing matches, it returns an empty string.
29
30                Notes:
31
32                * You can use regular expressions wherever you can use
33                  ACI and SRT expressions.
34                * In some places only [2] is available (e.g. in Search&Query).
35                * By default regular expressions are case sensitive. To make them
36                  case insensitive, append an 'i' to the
37                  expression (i.e. '/expr/i' or '/expr/repl/i').
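
                The two forms and the 'i' flag can be tried out outside of ARB.
                The following is a minimal sketch in Python. The helper names
                apply_search and apply_replace are hypothetical, and Python's 're'
                module is not the POSIX engine used by ARB, so this only
                approximates ARB's behaviour.

```python
import re

def apply_search(regex, text, ignore_case=False):
    """Form [2]: return the FIRST match, or '' if nothing matches."""
    flags = re.IGNORECASE if ignore_case else 0
    match = re.search(regex, text, flags)
    return match.group(0) if match else ""

def apply_replace(regex, replacement, text, ignore_case=False):
    """Form [1]: replace every occurrence of regex with a literal string."""
    flags = re.IGNORECASE if ignore_case else 0
    # A callable replacement inserts the text as-is (no backreference handling).
    return re.sub(regex, lambda m: replacement, text, flags=flags)

name = "Pseudomonas fluorescens"
first = apply_search("pseu", name, ignore_case=True)   # like '/pseu/i'
missing = apply_search("pseu", name)                    # like '/pseu/' (case sensitive)
replaced = apply_replace("s", "S", name)                # like '/s/S/'
```

                Here 'first' holds the matched text "Pseu", while 'missing' is an
                empty string because the case sensitive search finds nothing.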
38
39SECTION         Syntax of POSIX extended regular expressions as used in ARB
40
41                A regular expression specifies a set of character strings,
42                e.g. the expression '/pseu/i' specifies all strings containing
43                "pseu", "Pseu" or "pSeu" and so on. We say the expression "matches"
44                (a part of) these strings.
45
46                Several characters have special meanings in regular expressions.
47                All other characters just match against themselves.
48
49                Special characters:
50
51                  '.'          matches any single character (e.g. '/h.s/' matches "has" and "his")
52                  '[xyz]'      matches 'x', 'y' or 'z'
53                  '[a-z]'      matches any single lower case letter
54                  '^'          matches the beginning of the string
55                               (e.g. '/^pseu/i' matches all strings starting with "pseu")
56                  '$'          matches the end of the string
57                               (e.g. '/cens$/i' matches all strings ending in "cens")
58
59                  '*'          matches the preceding element zero or more times
60                               (e.g. '/th*is/' matches "tis", "this", "thhhhhhis", ..)
61                  '?'          matches the preceding element zero or one time
62                               (e.g. '/th?is/' matches "tis" or "this", but not "thhis")
63                  '+'          matches the preceding element one or more times
64                               (e.g. '/th+is/' matches "this" or "thhhis", but not "tis")
65                  '{mi,ma}'    matches the preceding element at least 'mi' and at most 'ma' times
66                               (e.g. '/th{2,4}is/' matches "thhis", "thhhis" or "thhhhis")
67
68                  '|'          marks an alternative
69                               (e.g. '/bacter|spiri/i' matches all strings containing "bacter" or "spiri")
70
71                  '()'         marks a subexpression. Subexpressions can be used to separate alternatives
72                               or to mark parts for use in the replace expression (see below).
73
74                               (e.g.    '/bact|spiri.*cens/'   matches like '/bact/'       or '/spiri.*cens/',
75                                whereas '/(bact|spiri).*cens/' matches like '/bact.*cens/' or '/spiri.*cens/')
76
77                  To match against special characters themselves, escape them
78                  using a '\' (e.g. '/\*/' matches the character "*", '/\\/' matches "\")
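
                  The examples from the table above can be checked with a short
                  script. This is a sketch in Python, whose 're' module accepts the
                  same syntax for these particular operators but is not the POSIX
                  engine used by ARB. The helper name 'matches' is hypothetical.

```python
import re

def matches(regex, text):
    """True if regex matches (a part of) text, like ARB's '/regex/'."""
    return re.search(regex, text) is not None

checks = {
    "dot":      matches(r"h.s", "has") and matches(r"h.s", "his"),
    "star":     matches(r"th*is", "tis") and matches(r"th*is", "thhhis"),
    "optional": matches(r"th?is", "this") and not matches(r"th?is", "thhis"),
    "plus":     matches(r"th+is", "thhhis") and not matches(r"th+is", "tis"),
    "range":    matches(r"th{2,4}is", "thhis") and not matches(r"th{2,4}is", "this"),
    "escape":   matches(r"\*", "a*b") and not matches(r"\*", "ab"),
}

# Grouping changes what the alternative applies to.
ungrouped = matches(r"bact|spiri.*cens", "bacteria")    # 'bact' alone is enough
grouped = matches(r"(bact|spiri).*cens", "bacteria")    # 'cens' must follow as well
```

                  Every entry in 'checks' evaluates to True. 'ungrouped' is True and
                  'grouped' is False, because "bacteria" contains no "cens".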
79
80
81                Character classes:
82
83                  [...]      is called a character class. It matches against any of the characters
84                             listed in between the brackets.
85                  [^...]     If the character class starts with '^', it matches any character
86                             NOT listed (e.g. '[^78]' matches any character except '7' and '8').
87                  [5-9]      A '-' between two characters denotes a range of characters.
88                             Here '5-9' is equivalent to '56789'.
89                             You may mix ranges and single characters, e.g. '14-79' is the same as '145679'
90                             and '7-91-3' is the same as '789123'.
91
92                  To add special characters to a character class, escape them using '\'.
93
94                  There are several predefined character classes like '[:word:]' or
95                  '[:punct:]'. They are written inside a character class (e.g. '[[:punct:]]').
95                  See link below for details.
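
                  The range rules can be verified with a small sketch in Python.
                  The helper name 'find_all' is hypothetical. Note that Python's 're'
                  module does not support the predefined '[:...:]' classes, so only
                  lists, negation and ranges are shown.

```python
import re

def find_all(char_class, text):
    """Return every single character of text matched by the character class."""
    return re.findall(char_class, text)

sample = "a1b4c7d8e9"

negated = find_all(r"[^78]", "6789")        # everything except '7' and '8'
plain_range = find_all(r"[5-9]", sample)    # same as [56789]
mixed = find_all(r"[14-79]", sample)        # same as [145679]
two_ranges = find_all(r"[7-91-3]", sample)  # same as [789123]

# The expanded spellings select exactly the same characters.
same_mixed = mixed == find_all(r"[145679]", sample)
same_two_ranges = two_ranges == find_all(r"[789123]", sample)
```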
96
97
98                Links:
99
100                * A more in-depth explanation of POSIX extended regular expressions can be
101                  found at LINK{http://en.wikipedia.org/wiki/Regular_expression#POSIX}.
102                * Many examples are given in this guide: LINK{http://www.digitalamit.com/article/regular_expression.phtml}
103
104                Notes:
105
106                * If an expression can match a string in several places, the leftmost
107                  match is used, and at that position the longest one (e.g. '/a+e+/' matches
108                  'aaeee' at position 3 of the string 'bbaaeeeffaegg', not 'ae' at position 10).
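
                The leftmost rule can be illustrated with the string from the note,
                using the expression 'a+e+', which cannot match an empty string.
                This is a sketch in Python. Python's 're' module picks the leftmost
                match as well, but for alternatives it takes the first one that fits
                instead of the longest, so results may differ from ARB for
                expressions using '|'.

```python
import re

text = "bbaaeeeffaegg"

match = re.search(r"a+e+", text)
found = match.group(0)
position = match.start() + 1   # the note counts positions starting at 1

# All non-overlapping matches, in order of occurrence.
all_matches = re.findall(r"a+e+", text)
```

                The first match is "aaeee" at position 3. The later "ae" is only
                found when searching for further occurrences.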
109
110
111SECTION         Special syntax for search and replace
112
113                Syntax: '/regexp/replace/'
114
115                       The part of the input string matched by 'regexp' gets replaced by 'replace'.
116
117                Simple example:
118
119                       Input string:    'The quick brown fox jumps over the lazy dog'
120                       Search&replace:  '/fox|dog/cat/'
121                       Result:          'The quick brown cat jumps over the lazy cat'
122
123                Additionally the match (or parts of it) can be referenced in the replace string:
124
125                             \0        refers to the whole match
126                             \1        refers to the first subexpression
127                             \2        refers to the second subexpression
128                             ...
129                             \9        refers to the ninth subexpression
130
131                Example using refs:
132
133                       Input string:    'The quick brown fox jumps over the lazy dog'
134                       Search&replace:  '/(brown|lazy)\s+(fox|dog)/\2 \1/'
135                       Result:          'The quick fox brown jumps over the dog lazy'
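
                Both examples can be reproduced with a sketch in Python. The
                replace syntax is similar but not identical: Python spells the
                whole match '\g<0>', whereas ARB uses '\0'. Python's 're' module is
                not the engine used by ARB.

```python
import re

sentence = "The quick brown fox jumps over the lazy dog"

# Simple example: like '/fox|dog/cat/'
simple = re.sub(r"fox|dog", "cat", sentence)

# Example using refs: like '/(brown|lazy)\s+(fox|dog)/\2 \1/'
swapped = re.sub(r"(brown|lazy)\s+(fox|dog)", r"\2 \1", sentence)

# Reference to the whole match (ARB: \0, Python: \g<0>)
bracketed = re.sub(r"fox|dog", r"[\g<0>]", sentence)
```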
136
137WARNINGS        An expression like '_*' can match an empty string, and it does so
138                whenever no '_' is found at the position where the match is tried.
139
140                This makes some replacements difficult, e.g. if you have data containing
141                runs of consecutive underscores and you'd like to replace each run.
142                The expression "/_*/_/" does not work as expected and reports
143                an error: "regular expression '_*' matched an empty string".
144
145                A workaround is the following expression:
146                             "/(_+)([^_]|$)/_\2/"
147
148                Other, simpler replacements work when combined with the BOL/EOL operators ('^'/'$'),
149                e.g. to remove all trailing underscores:
150                             "/_*$//"
151
152                Or all leading underscores:
153                             "/^_*//"
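
                The expressions above can be tried in a sketch in Python. The
                function names are hypothetical. Python's 're' module does not
                report an error for empty matches the way ARB does, so the sketch
                only shows that '_*' matches an empty string and what the three
                working expressions produce.

```python
import re

def collapse_underscores(text):
    """Like "/(_+)([^_]|$)/_\\2/": turn each run of '_' into a single '_'."""
    return re.sub(r"(_+)([^_]|$)", r"_\2", text)

def strip_trailing(text):
    """Like "/_*$//": remove all trailing underscores."""
    return re.sub(r"_*$", "", text)

def strip_leading(text):
    """Like "/^_*//": remove all leading underscores."""
    return re.sub(r"^_*", "", text)

# '_*' on its own matches an empty string at the very start of the input.
empty = re.search(r"_*", "a__b")
empty_text = empty.group(0)
empty_start = empty.start()

collapsed = collapse_underscores("a___b__c")
```

                'empty_text' is an empty string found at offset 0, although the
                input contains underscores further to the right.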
154
155BUGS            No bugs known
156