Commit e4b4a5c9 authored by vermeul

Update 07-Built-in_Functions.md

parent 513b6762
@@ -32,3 +32,92 @@ print(string[:1].upper() + string[1:])
</strong>
## Sorting and filtering
**input:** a file list which needs to be filtered and sorted:
```
20_Ms_229_7.xml
20_Ms_229_37.xml~
20_Ms_229_6.xml
20_Ms_229_29.xml
20_Ms_229_15.xml
229.xpr
20_Ms_229_17.xml
20_Ms_229_4.xml
20_Ms_229_5.xml
20_Ms_229_16.xml
semper_edition_schema_prov.rng
schema_semper_with_mathml.rng
20_Ms_229_38_verso.xml
...
```
**desired output**
* files that do not match the pattern `20_Ms_<collection>_<page_number>` should be filtered out
* files should be sorted by page number, i.e. the last number in the filename
* the sort must be numeric, not alphabetical: page 10 should come after page 9
```
20_Ms_229_1.xml
20_Ms_229_2.xml
20_Ms_229_3.xml
20_Ms_229_4.xml
20_Ms_229_5.xml
20_Ms_229_6.xml
20_Ms_229_7.xml
20_Ms_229_8.xml
20_Ms_229_9.xml
20_Ms_229_10.xml
20_Ms_229_11.xml
20_Ms_229_12.xml
20_Ms_229_13.xml
...
20_Ms_229_40.xml
```
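A plain `sorted()` call is not enough here, because strings are compared character by character, so `..._10.xml` ends up before `..._9.xml`. The following minimal sketch illustrates the difference; the three-entry list and the helper name `page_number` are assumptions made for illustration only:

```python
import re

# Hypothetical sample; the real list comes from the directory listing above
files = ["20_Ms_229_9.xml", "20_Ms_229_10.xml", "20_Ms_229_2.xml"]

# Default string comparison: "1" < "2" < "9", so page 10 comes first
lexicographic = sorted(files)


def page_number(name):
    """Return the last run of digits before '.xml' as an int (0 if absent)."""
    match = re.search(r"_(\d+)\.xml$", name)
    return int(match.group(1)) if match else 0


# Numeric key: page numbers are compared as integers
numeric = sorted(files, key=page_number)

print(lexicographic)
print(numeric)
```

The `key` argument is what makes the numeric order possible: `sorted()` calls it once per element and compares the returned integers instead of the strings.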
**Solution**:
* define a filter function
* define a sorting function
* each of them returns a specialised function that does the actual filtering or sorting for a given collection
* use the `sorted()` function, which returns a new list and leaves the file list untouched
* inside the `sorted()` call, place the `filter()` call
* both `sorted()` and `filter()` accept the functions returned by `my_sort` and `my_filter` as arguments.
```
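import os
import re


def my_sort(coll):
    """Return a sort key function that extracts the page number for collection `coll`."""
    def my_coll_sort(val):
        match = re.search(r'\d+_Ms_{}_(?P<page>\d+)'.format(coll), val)
        if match:
            # compare page numbers as integers so that page 10 sorts after page 9
            return int(match.group('page'))
        return 0
    return my_coll_sort


def my_filter(coll):
    """Return a predicate that keeps only the XML files of collection `coll`."""
    def my_coll_filter(val):
        # filter() only checks truthiness: a match object is truthy, None is falsy
        return re.search(r'^\d+_Ms_{}.*?xml$'.format(coll), val)
    return my_coll_filter


# later in the program
collection = '229'
path = os.path.join('/Users/vermeul/semper-tei', collection)
for root, dirs, files in os.walk(path):
    # the factories only need the collection; filter() and sorted() pass in each filename
    for filename in sorted(filter(my_filter(coll=collection), files),
                           key=my_sort(coll=collection)):
        print(filename)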
```
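To try the technique without access to the directory, the same closure pattern can be run on an in-memory list. This is only a sketch: `make_filter` and `make_sort_key` are hypothetical stand-ins for `my_filter` and `my_sort`, the list is a subset of the input shown above, and the filter regex is slightly stricter than the one in the solution (it requires a literal `.xml` ending).

```python
import re


def make_filter(coll):
    """Return a predicate that accepts only XML files of collection `coll`."""
    pattern = re.compile(r"^\d+_Ms_{}_.*\.xml$".format(re.escape(coll)))

    def coll_filter(name):
        return pattern.search(name) is not None
    return coll_filter


def make_sort_key(coll):
    """Return a key function that yields the page number as an int."""
    pattern = re.compile(r"\d+_Ms_{}_(?P<page>\d+)".format(re.escape(coll)))

    def coll_sort_key(name):
        match = pattern.search(name)
        return int(match.group("page")) if match else 0
    return coll_sort_key


# Subset of the file list from the input section
files = [
    "20_Ms_229_7.xml",
    "20_Ms_229_37.xml~",
    "20_Ms_229_6.xml",
    "20_Ms_229_29.xml",
    "229.xpr",
    "semper_edition_schema_prov.rng",
    "20_Ms_229_38_verso.xml",
]

# filter() runs first, then sorted() builds a new list; `files` is not modified
result = sorted(filter(make_filter("229"), files), key=make_sort_key("229"))
for name in result:
    print(name)
```

Note that `20_Ms_229_38_verso.xml` passes this filter because it starts with the expected prefix; if such files should be excluded, the pattern would have to be tightened accordingly.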