source: branches/lib/HELP_SOURCE/source/reg.hlp

Last change on this file was 19575, checked in by westram, 3 weeks ago
#       main topics:
UP      arb.hlp
UP      glossary.hlp

#       sub topics:
SUB     srt.hlp
SUB     aci.hlp

# format described in ../help.readme


TITLE           Regular Expressions (REG)

OCCURRENCE      Many places

SECTION         Ways to use regular expressions

                There are two ways to use regular expressions:

                [1]     /Search Regexpr/Replace String/
                [2]     /Search Regexpr/

                [1] searches the input for occurrences of 'Search Regexpr' and
                replaces every occurrence with 'Replace String'.

                [2] searches the input for the FIRST occurrence of 'Search
                Regexpr' and returns the found match.
                If nothing matches, it returns an empty string.

                Notes:

                * You can use regular expressions wherever you can use
                  ACI and SRT expressions.
                * In some places only [2] is available (e.g. in Search&Query).
                * Regular expressions are case sensitive by default. To make them
                  case insensitive, append an 'i' to the
                  expression (i.e. '/expr/i' or '/expr/repl/i').

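                The two forms above can be imitated outside ARB. The following is a
                minimal sketch in Python, not ARB's implementation: the helper names
                'split_arb_regex' and 'apply_arb_regex' are hypothetical, and Python's
                're' module is used as a stand-in for the POSIX engine, so results may
                differ for more complex expressions. Back references in the replace
                string are not handled here.

```python
import re

def split_arb_regex(expr):
    """Split '/search/', '/search/i', '/search/repl/' or '/search/repl/i'.

    Returns (search, replacement_or_None, ignore_case).
    Hypothetical helper; a backslash-escaped character stays inside its part.
    """
    if not expr.startswith("/"):
        raise ValueError("expression must start with '/'")
    parts, current, i = [], [], 1
    while i < len(expr):
        ch = expr[i]
        if ch == "\\" and i + 1 < len(expr):
            # keep the escape sequence untouched (e.g. '\/' or '\*')
            current.append(expr[i:i + 2])
            i += 2
            continue
        if ch == "/":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current)  # whatever follows the final '/'
    if tail not in ("", "i") or len(parts) not in (1, 2):
        raise ValueError("expected /search/ or /search/replace/, optionally followed by 'i'")
    replacement = parts[1] if len(parts) == 2 else None
    return parts[0], replacement, tail == "i"

def apply_arb_regex(expr, text):
    """Apply form [1] (replace all) or form [2] (return first match)."""
    search, replacement, ignore_case = split_arb_regex(expr)
    flags = re.IGNORECASE if ignore_case else 0
    if replacement is None:
        # form [2]: first match, or an empty string if nothing matches
        match = re.search(search, text, flags)
        return match.group(0) if match else ""
    # form [1]: replace every occurrence with the plain replacement text
    return re.sub(search, lambda m: replacement, text, flags=flags)
```

                For example, apply_arb_regex('/pseu/i', 'Pseudomonas') returns 'Pseu',
                while the case sensitive '/pseu/' returns an empty string for the same
                input.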
SECTION         Syntax of POSIX extended regular expressions as used in ARB

                A regular expression specifies a set of character strings,
                e.g. the expression '/pseu/i' specifies all strings containing
                "pseu", "Pseu", "pSeu" and so on. We say the expression "matches"
                (a part of) these strings.

                Several characters have special meanings in regular expressions.
                All other characters just match themselves.

                Special characters:

                  '.'          matches any character (e.g. '/h.s/' matches "has" and "his")
                  '[xyz]'      matches 'x', 'y' or 'z'
                  '[a-z]'      matches any lower case letter
                  '^'          matches the beginning of the string
                               (e.g. '/^pseu/i' matches all strings starting with "pseu")
                  '$'          matches the end of the string
                               (e.g. '/cens$/i' matches all strings ending in "cens")

                  '*'          matches the preceding element zero or more times
                               (e.g. '/th*is/' matches "tis", "this", "thhhhhhis", ..)
                  '?'          matches the preceding element zero or one time
                               (e.g. '/th?is/' matches "tis" or "this", but not "thhis")
                  '+'          matches the preceding element one or more times
                               (e.g. '/th+is/' matches "this" or "thhhis", but not "tis")
                  '{mi,ma}'    matches the preceding element 'mi' to 'ma' times
                               (e.g. '/th{2,4}is/' matches "thhis", "thhhis" or "thhhhis")

                  '|'          marks an alternative

                               Example: '/bacter|spiri/i' matches all strings containing
                               either "bacter" or "spiri".

                  '()'         marks a subexpression.

                               Subexpressions can be used to group alternatives or to mark parts
                               for reference in the replace string (see the section about
                               search and replace below).

                               Examples:
                               * '/bact|spiri.*cens/'

                                 matches '/bact/' or '/spiri.*cens/'.

                               * whereas '/(bact|spiri).*cens/'

                                 matches '/bact.*cens/' or '/spiri.*cens/'.

                  To match special characters themselves, escape them
                  using a '\' (e.g. '/\*/' matches the character "*", '/\\/' matches "\")

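                The special characters listed above can be tried out with a few lines
                of Python. This is only a sketch: Python's 're' module is not the POSIX
                engine used by ARB, although it agrees with it for these simple
                patterns. The table 'EXAMPLES' and the function 'check_examples' are
                illustrative names, and the test strings for the alternation patterns
                are made up.

```python
import re

# (pattern, strings that should match, strings that should not match)
EXAMPLES = [
    (r"h.s",                ["has", "his"],                 ["hs"]),
    (r"th*is",              ["tis", "this", "thhhhhhis"],   ["the"]),
    (r"th?is",              ["tis", "this"],                ["thhis"]),
    (r"th+is",              ["this", "thhhis"],             ["tis"]),
    (r"th{2,4}is",          ["thhis", "thhhis", "thhhhis"], ["this", "thhhhhis"]),
    (r"bact|spiri.*cens",   ["bact", "spiri cens"],         ["spiri"]),
    (r"(bact|spiri).*cens", ["bact cens", "spiri cens"],    ["bact", "spiri"]),
    (r"\*",                 ["a*b"],                        ["ab"]),
]

def check_examples(examples):
    """Return the (pattern, text) pairs that do not behave as listed."""
    failures = []
    for pattern, should_match, should_not_match in examples:
        for text in should_match:
            # a pattern "matches" if it is found anywhere in the text
            if re.search(pattern, text) is None:
                failures.append((pattern, text))
        for text in should_not_match:
            if re.search(pattern, text) is not None:
                failures.append((pattern, text))
    return failures
```

                An empty list returned by check_examples(EXAMPLES) means every pattern
                behaved as listed in the table.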

                Character classes:

                  [...]      is called a character class. It matches any one of the characters
                             listed between the brackets.
                  [^...]     If the character class starts with '^', it matches any character
                             NOT listed (e.g. '[^78]' matches any character except '7' and '8')
                  [5-9]      When the character class contains a '-', it is interpreted as a
                             "range of characters". Here '5-9' is equivalent to '56789'.
                             You may mix ranges and single characters,
                             e.g. '14-79' is the same as '145679', '7-91-3' is the same as '789123'.

                  To add special characters to a character class, escape them using '\'.

                  There are several special predefined character classes like
                    * [:alpha:] = [a-zA-Z]
                    * [:digit:] = [0-9]
                    * [:alnum:] = [[:alpha:][:digit:]]
                    * [:punct:] = Punctuation characters
                    * [:print:] = Visible characters and the space character
                    * [:blank:] = Space and tab
                    * [:space:] = Whitespace characters (including newlines)
                    * [:cntrl:] = Control characters

                  Use these inside brackets (e.g. '/[[:cntrl:]]//' will remove all control characters).
                  See links below for details.

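                Python's 're' module does not understand the predefined class names,
                so a sketch that wants to try the character classes above has to
                spell them out. The mapping below is an assumption that covers ASCII
                input only and a subset of the names; 'POSIX_CLASSES' and
                'expand_posix_classes' are hypothetical names, and the plain text
                substitution does not check that a name really sits inside brackets.

```python
import re

# Explicit ASCII equivalents for some predefined class names (assumption).
POSIX_CLASSES = {
    "[:alpha:]": "a-zA-Z",
    "[:digit:]": "0-9",
    "[:alnum:]": "a-zA-Z0-9",
    "[:blank:]": r" \t",
    "[:space:]": r" \t\n\r\f\v",
    "[:cntrl:]": r"\x00-\x1f\x7f",
}

def expand_posix_classes(pattern):
    """Replace predefined class names by explicit ranges for Python's re."""
    for name, ranges in POSIX_CLASSES.items():
        pattern = pattern.replace(name, ranges)
    return pattern

def remove_control_characters(text):
    """Python counterpart of the expression '/[[:cntrl:]]//'."""
    return re.sub(expand_posix_classes("[[:cntrl:]]"), "", text)
```

                Ranges and negation need no translation: re.findall('[14-79]',
                '0123456789') yields the characters 1, 4, 5, 6, 7 and 9, matching the
                equivalence '14-79' = '145679' stated above.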

                Links:

                * A more in-depth explanation of POSIX extended regular expressions can be
                  found at LINK{http://en.wikipedia.org/wiki/Regular_expression#POSIX}.
                * Many examples are given in this guide: LINK{http://www.digitalamit.com/article/regular_expression.phtml}

                Notes:

                * If an expression matches a string in several places, the leftmost
                  match is used, and at that position the longest one
                  (e.g. '/a+e+/' matches 'aaeee' at position 3 of the
                  string 'bbaaeeeffaegg', not 'ae' at position 10).

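                The leftmost-match rule can be observed with a short Python sketch.
                The function 'first_match' is a hypothetical helper that reports
                positions counted from 1, as in the note above. Python's engine is a
                stand-in here, not the POSIX engine used by ARB.

```python
import re

def first_match(pattern, text):
    """Return (1-based start position, matched text) of the first match, or None."""
    match = re.search(pattern, text)
    if match is None:
        return None
    # match.start() counts from 0, the note above counts from 1
    return match.start() + 1, match.group(0)
```

                first_match('a+e+', 'bbaaeeeffaegg') returns (3, 'aaeee'); the later
                'ae' at position 10 is never reported. One caveat of this stand-in:
                for alternatives Python takes the first branch that succeeds (the
                pattern 'a|ab' yields 'a' on the input 'ab'), whereas POSIX rules
                prefer the longest match.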

SECTION         Special syntax for search and replace

                Syntax: '/regexp/replace/'

                       The part of the input string matched by 'regexp' gets replaced by 'replace'.

                Simple example:

                       Input string:    'The quick brown fox jumps over the lazy dog'
                       Search&replace:  '/fox|dog/cat/'
                       Result:          'The quick brown cat jumps over the lazy cat'

                Additionally the match (or parts of it) can be referenced in the replace string:

                             \0        refers to the whole match
                             \1        refers to the first subexpression
                             \2        refers to the second subexpression
                             ...
                             \9        refers to the ninth subexpression

                Example using refs:

                       Input string:    'The quick brown fox jumps over the lazy dog'
                       Search&replace:  '/(brown|lazy)\s+(fox|dog)/\2 \1/'
                       Result:          'The quick fox brown jumps over the dog lazy'

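                The references \0 to \9 can be reproduced in Python with a small
                wrapper. This is a sketch under assumptions, not ARB's code: the
                function name 'arb_replace' is hypothetical, Python's 're' module
                stands in for the POSIX engine, and a subexpression that took no part
                in the match is assumed to expand to an empty string.

```python
import re

def arb_replace(search, replacement, text, ignore_case=False):
    r"""Replace every match of `search`, expanding \0..\9 in `replacement`.

    \0 stands for the whole match, \1..\9 for the subexpressions.
    """
    flags = re.IGNORECASE if ignore_case else 0

    def expand(match):
        def reference(ref):
            # group(0) is the whole match; unmatched groups become ''
            return match.group(int(ref.group(1))) or ""
        # substitute each backslash-digit pair in the replace string
        return re.sub(r"\\([0-9])", reference, replacement)

    return re.sub(search, expand, text, flags=flags)
```

                Called with the search expression '(brown|lazy)\s+(fox|dog)' and the
                replace string '\2 \1', it turns the example sentence into
                'The quick fox brown jumps over the dog lazy'.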
WARNINGS        With POSIX extended regular expressions the leftmost match is used,
                even if it is empty, i.e. an expression like '_*' normally
                matches an empty string (if used w/o context).

                This makes some replacements difficult, e.g. if your data contains
                runs of consecutive identical characters and you'd like to replace each run.
                The expression "/_*/_/" does not work as expected and reports
                an error: "regular expression '_*' matched an empty string".

                A workaround is the following expression:
                             "/(_+)([^_]|$)/_\2/"

                Other, simpler workarounds use the BOL/EOL operators ('^'/'$'),
                e.g. to remove all trailing underscores:
                             "/_*$//"

                Or all leading underscores:
                             "/^_*//"

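                The effect of the workarounds above can be checked with a Python
                sketch. The function names are hypothetical and Python's 're' module
                stands in for ARB's engine; unlike ARB, Python does not report an
                error when an expression matches an empty string.

```python
import re

def squeeze_underscores(text):
    """Counterpart of "/(_+)([^_]|$)/_\\2/": collapse each run of '_' to one '_'."""
    return re.sub(r"(_+)([^_]|$)", lambda m: "_" + m.group(2), text)

def strip_trailing_underscores(text):
    """Counterpart of "/_*$//"."""
    return re.sub(r"_*$", "", text)

def strip_leading_underscores(text):
    """Counterpart of "/^_*//"."""
    return re.sub(r"^_*", "", text)

def matches_empty(pattern, text):
    """True if the first match of `pattern` in `text` is the empty string."""
    match = re.search(pattern, text)
    return match is not None and match.group(0) == ""
```

                squeeze_underscores('a___b__c_') returns 'a_b_c_', and
                matches_empty('_*', 'a__b') is true because the first match is the
                empty string in front of the 'a'.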
BUGS            No bugs known
