To receive notifications about scheduled maintenance, please subscribe to the mailing-list gitlab-operations@sympa.ethz.ch. You can subscribe to the mailing-list at https://sympa.ethz.ch

Commit 8f5be178 authored by vermeul's avatar vermeul
Browse files

added substitution example

parent ce0421b5
......@@ -34,6 +34,40 @@ Anyone accustomed to Perl, grep, or sed regular expression matching is mislead b
There is actually a reason why `re.match` exists at all: it is **speed**. When no matching is possible, it takes a considerable amount [more time](https://stackoverflow.com/questions/29007197/why-have-re-match) for `re.search()` than `re.match()` to find this out. I am inclined to say: Python has an implementation problem here. I think `re.match()` should either be *deprecated*, or its current behaviour should be *fixed*. The speed gain in special cases should be implemented in the `re` module itself.
## always use `re.X` in long expressions
**Regular expressions are not easy to fix if your intention is not clear**
The re.X flag allows you to define regular expressions over multiple lines. More importantly, it allows you to add comments to every part, so the original intention is preserved. If something is wrong with the regular expression, such a commented regular expression is much easier to debug.
Who would like to debug this regular expression?
```python
regex = re.compile('^(?P<alias_alternative>(?P<requested_entity>experiment|collection)(\.(?P<attribute>\w+))?)(\s+(?i)AS\s+(?P<alias>\w+))?\s*$')
```
Split the regular expression on multiple lines and add comments. Of course, you need now to specify every blank space with `\s`, but this good practice anyway:
<strong>
```python
regex = re.compile(
r"""^ # beginning of the string
(?P<alias_alternative> # use first part as alias, if no alias is defined
(?P<requested_entity>sample|object) # string starts with sample or object
(\.(?P<attribute>\w+))? # capture an optional .attribute
)
( # capture an optional alias: entity.attribute AS alias
\s+(?i)AS\s+ # ignore case of 'AS'
(?P<alias>\w+) # capture the alias
)? #
\s* # ignore any trailing whitespace
$ # end of string
""", re.X
)
```
</strong>
## Make use of **named capture groups**
A very common practice is to group elements in a regular expression:
......@@ -54,14 +88,22 @@ match.groups()
However, this leads to the problem that the parameters fetched are positional. If you have nested group captures, you have to count the number of the opening round brackets `(` to get the position of every parameter right. And if you decide to remove a grouping later, you will have to check every position again.
Instead, you would rather give your groups a name so you can easily rearrange your groupings without having to worry about their positions:
Instead, you would rather give your groups a name so you can easily rearrange your groupings without having to worry about their positions. And we also make use of the `re.X` flag again.
<strong>
```python
import re
url = '/some/url/our_first_parameter/our_second_parameter'
match = re.search(r'^/some/url/(?P<the_whole_thing>(?P<param1>.*?)/(?P<param2>.*?))$', url)
match = re.search(r'''
^
/some/url/ # the beginning of our url
(?P<the_whole_thing>
(?P<param1>.*?)
/
(?P<param2>.*?)
)
$''', url, flags=re.X)
match.groupdict()
# returns
......@@ -73,42 +115,31 @@ match.groupdict()
This leads to much more robust regular expressions.
In **substitutions** or within regular expressions, named capture groups are back-referenced by
### Group capture in substitutions or back-references
The group captures can be accessed in substitutions or in back-references, like this:
<strong>
```
\g<the_name_of_the_captured_group>
```
## always use `re.X`
**Regular expressions are not easy to fix if your intention is not clear**
The re.X flag allows you to define regular expressions over multiple lines. More importantly, it allows you to add comments to every part, so the original intention is preserved. If something is wrong with the regular expression, such a commented regular expression is much easier to debug.
Who would like to debug this regular expression?
**Example**
```python
regex = re.compile('^(?P<alias_alternative>(?P<requested_entity>experiment|collection)(\.(?P<attribute>\w+))?)(\s+(?i)AS\s+(?P<alias>\w+))?\s*$')
```
Split the regular expression on multiple lines and add comments. Of course, you need now to specify every blank space with `\s`, but this good practice anyway:
<strong>
import re
```python
regex = re.compile(
r"""^ # beginning of the string
(?P<alias_alternative> # use first part as alias, if no alias is defined
(?P<requested_entity>sample|object) # string starts with sample or object
(\.(?P<attribute>\w+))? # capture an optional .attribute
)
( # capture an optional alias: entity.attribute AS alias
\s+(?i)AS\s+ # ignore case of 'AS'
(?P<alias>\w+) # capture the alias
)? #
\s* # ignore any trailing whitespace
$ # end of string
""", re.X
)
url = '/some/url/our_first_parameter/our_second_parameter'
new_url = re.sub(r'''
^
/some/url/
(?P<the_whole_thing>
(?P<param1>.*?)
/
(?P<param2>.*?)
)
$''',
r'/some/other/url/\g<param2>/\g<param1>',
url, flags=re.X)
new_url
```
</strong>
......
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment