
Commit 3bf80a62 authored by vermeul

minor changes

parent d09d77ed
@@ -2,28 +2,36 @@
## Do not use `re.match`, always use `re.search` instead
This regular expression below is **not matching anything**:
The regular expression below **does not match anything**:
```python
import re
line = "Cats are smarter than dogs"
re.match("dogs$", line)
re.match(r'dogs', line)
```
But this is:
But this does:
<strong>
```python
import re
line = "Cats are smarter than dogs"
re.search("dogs$", line)
re.search(r'dogs', line)
```
</strong>
The difference between `re.match()` and `re.search()` is that `re.match()` behaves as if every pattern has `\A` prepended (or `^` if you don't use multiline). Anyone accustomed to Perl, grep, or sed regular expression matching is misled by `re.match()`.
The difference between `re.match()` and `re.search()` is that `re.match()` behaves as if every regex pattern is prepended with `^`:
```python
re.match(r'dogs', line)
re.search(r'^dogs', line) # same as above
```
Anyone accustomed to Perl, grep, or sed regular expression matching is misled by `re.match()`. This method is error-prone and should be avoided.
There is actually a reason why `re.match` exists at all: it is **speed**. When `re.search()` is used and no match is possible, it takes considerably [more time](https://stackoverflow.com/questions/29007197/why-have-re-match) than `re.match()` until the matching fails. I am inclined to say: Python has an implementation problem here. I think `re.match()` should be *deprecated*, because it leads to unnecessary problems, despite the speed gain one might observe.
There is actually a reason why `re.match` exists at all: it is **speed**. When no match is possible, `re.search()` takes considerably [more time](https://stackoverflow.com/questions/29007197/why-have-re-match) than `re.match()` to find this out. I am inclined to say: Python has an implementation problem here. I think `re.match()` should either be *deprecated*, or its current behaviour should be *fixed*. The speed gain in special cases should be implemented in the `re` module itself.
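To make the difference concrete, here is a minimal sketch that runs both functions on the same input and also checks the multiline case. The second string `text` and all variable names are made up for illustration; only `re.match`, `re.search` and `re.MULTILINE` come from the standard `re` module.

```python
import re

line = "Cats are smarter than dogs"

# re.match() only tries the very start of the string, so it finds nothing.
match_result = re.match(r'dogs', line)

# re.search() scans the whole string and finds "dogs" at the end.
search_result = re.search(r'dogs', line)

# Anchoring the search with \A reproduces what re.match() does.
anchored_result = re.search(r'\Adogs', line)

# With re.MULTILINE, ^ also matches right after a newline,
# but re.match() still only looks at the start of the string.
text = "Cats are smart\ndogs are loyal"  # hypothetical two-line input
multiline_search = re.search(r'^dogs', text, re.MULTILINE)
multiline_match = re.match(r'dogs', text, re.MULTILINE)

print(match_result)              # None
print(search_result.group())     # dogs
print(anchored_result)           # None
print(multiline_search.group())  # dogs
print(multiline_match)           # None
```

The last two lines are the reason `\A` is the more precise description of what `re.match()` prepends: in multiline mode `^` and `re.match()` no longer agree.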
## Make use of **named capture groups**
@@ -34,7 +42,7 @@ A very common practice is to group elements in a regular expression:
import re
url = '/some/url/our_first_parameter/our_second_parameter'
match = re.search("^/some/url/((.*?)/(.*?))$", url)
match = re.search(r'^/some/url/((.*?)/(.*?))$', url)
match.groups()
# returns
@@ -53,7 +61,7 @@ Instead, you would rather give your groups a name so you can easily rearrange yo
import re
url = '/some/url/our_first_parameter/our_second_parameter'
match = re.search("^/some/url/(?P<the_whole_thing>(?P<param1>.*?)/(?P<param2>.*?))$", url)
match = re.search(r'^/some/url/(?P<the_whole_thing>(?P<param1>.*?)/(?P<param2>.*?))$', url)
match.groupdict()
# returns
......