
Commit 4de5bf97 authored by scmalte

Merge branch 'master' of gitlab.ethz.ch:scmalte/mossutils

parents 6791cb1d 45672f49
...@@ -4,8 +4,10 @@ A collection for useful scripts for working with Moss in the context of ETH (eDo ...@@ -4,8 +4,10 @@ A collection for useful scripts for working with Moss in the context of ETH (eDo
## Installation ## Installation
``` ```shell
$ pip install --upgrade git+https://gitlab.ethz.ch/scmalte/mossutils.git` $ pip install --upgrade git+https://gitlab.ethz.ch/scmalte/mossutils.git
...
Successfully installed [...] mossutils [...]
``` ```
Argument `--upgrade` (re)installs the package, even if the version number hasn't changed. Argument `--upgrade` (re)installs the package, even if the version number hasn't changed.
...@@ -14,11 +16,11 @@ Argument `--upgrade` (re)installs the package, even if the version number hasn't ...@@ -14,11 +16,11 @@ Argument `--upgrade` (re)installs the package, even if the version number hasn't
**TODO:** Add instructions for how to obtain and prepare the involved data. See also `preprocessing/README.md`. **TODO:** Add instructions for how to obtain and prepare the involved data. See also `preprocessing/README.md`.
1. Obtain and prepare data 1. Obtain and prepare data, see `preprocessing/README.md`.
1. Run `mu-moss --help` for arguments that can/must be configured. Afterwards, run `mu-moss` as desired. 1. Run `mu-moss --help` for arguments that can/must be configured. Afterwards, run `mu-moss` as desired, e.g. `mu-moss -u 1234 -n 300 "./ex1/*/main.cpp"`.
`mu-moss` connects to the Moss service, uploads submissions and downloads the generated report. `mu-moss` connects to the Moss service, uploads submissions and downloads the generated report. This may take a while, and will probably not work for large `-n` values (3000 worked for me, 10,000 didn't).
1. Run `mu-revise`. 1. Run `mu-revise`.
......
import os
import logging import logging
import csv import csv
import jinja2 import jinja2
...@@ -13,6 +14,12 @@ def main( ...@@ -13,6 +14,12 @@ def main(
logutils.configure_level_and_format() logutils.configure_level_and_format()
if not os.path.isfile(clusters_summary_csv_file):
raise RuntimeError("Cluster summary CSV file {} doesn't exist. Should have been created by mu-cluster.".format(clusters_summary_csv_file))
if not os.path.isfile(cx_course_students_csv_file):
raise RuntimeError("Code Expert course data CSV file {} doesn't exist. Download it from Code Expert as follows: My Courses -> Students -> Export to CSV.".format(cx_course_students_csv_file))
clusters_csv: pd.DataFrame = pd.read_csv(clusters_summary_csv_file) clusters_csv: pd.DataFrame = pd.read_csv(clusters_summary_csv_file)
# Read CX course data, reduce to relevant columns, truncate TotalScore (which are floats), set index column # Read CX course data, reduce to relevant columns, truncate TotalScore (which are floats), set index column
...@@ -23,6 +30,11 @@ def main( ...@@ -23,6 +30,11 @@ def main(
course_csv.set_index("Legi", inplace=True) course_csv.set_index("Legi", inplace=True)
## TODO: Remove staff from course_csv ## TODO: Remove staff from course_csv
## TODO: Make eDoz files configurable
## TODO: Make eDoz files optional
## TODO: Could integrate eDoz data "Leistungskontrollen" to get information whether
## or not a student is a repeater
# Analogous for eDoz course data # Analogous for eDoz course data
relevant_edoz_columns = ["Nummer", "Departement"] relevant_edoz_columns = ["Nummer", "Departement"]
edoz1_csv: pd.DataFrame = pd.read_csv("edoz-252083200L.csv", sep="\t") edoz1_csv: pd.DataFrame = pd.read_csv("edoz-252083200L.csv", sep="\t")
...@@ -39,11 +51,6 @@ def main( ...@@ -39,11 +51,6 @@ def main(
# print(edoz2_csv.index) # print(edoz2_csv.index)
# print("edoz2_csv.index.is_unique = {}".format(edoz2_csv.index.is_unique)) # print("edoz2_csv.index.is_unique = {}".format(edoz2_csv.index.is_unique))
## TODO: Could integrate eDoz data "Leistungskontrollen" to get information whether
## or not a student is a repeater
# Vertically concat eDoz data. Since students may be enrolled into multiple # Vertically concat eDoz data. Since students may be enrolled into multiple
# courses, duplicated rows are afterwards dropped. # courses, duplicated rows are afterwards dropped.
edoz_csv: pd.DataFrame = pd.concat([edoz1_csv, edoz2_csv]) edoz_csv: pd.DataFrame = pd.concat([edoz1_csv, edoz2_csv])
......
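The existence checks added in this hunk fail fast and tell the user how to produce the missing file. A minimal sketch of the same idea as a reusable helper follows; the name `require_file` and its `hint` parameter are hypothetical and not part of mossutils.

```python
import os


def require_file(path, hint):
    """Return path if it is an existing file, otherwise raise RuntimeError.

    Hypothetical helper (not part of mossutils). `hint` explains how the
    missing file can be created or obtained.
    """
    if not os.path.isfile(path):
        raise RuntimeError("File {} doesn't exist. {}".format(path, hint))
    return path
```

It could be called once per input, e.g. `require_file(clusters_summary_csv_file, "Should have been created by mu-cluster.")`, which keeps the messages next to the file names they refer to.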
...@@ -10,6 +10,15 @@ import networkx as nx ...@@ -10,6 +10,15 @@ import networkx as nx
from dataclass_csv import DataclassReader from dataclass_csv import DataclassReader
from .utils import logging as logutils from .utils import logging as logutils
## TODO: cluster.py could create a first, less detailed version of the
## clusters.html report, by extracting the strictly necessary information
## (student name and e-mail address) from the details.json file located
## in the CX export. This information would already be enough to generate
## e-mails afterwards.
## aggr.py would then be optional, if a more detailed cluster report is desired.
##
## TODO: Generate DOT, SVG and CSV files in a subdirectory, e.g. "_clusters"
DEFAULT_RESULTS_CSV_FILE="moss-report.csv" DEFAULT_RESULTS_CSV_FILE="moss-report.csv"
DEFAULT_TOTAL_GRAPH_DOT_FILE="moss-report.dot" DEFAULT_TOTAL_GRAPH_DOT_FILE="moss-report.dot"
DEFAULT_CLUSTERS_DOT_FILE="clusters.dot" DEFAULT_CLUSTERS_DOT_FILE="clusters.dot"
...@@ -123,7 +132,7 @@ def get_results_graph(results, percentage_threshold, lines_threshold): ...@@ -123,7 +132,7 @@ def get_results_graph(results, percentage_threshold, lines_threshold):
return graph return graph
def create_cluster_dot_and_svg_files(subgraph, index, cluster_dot_file, cluster_svg_file=None): def create_cluster_dot_and_svg_files(subgraph, index, cluster_dot_file, cluster_svg_file=None):
logging.debug( logging.debug(
"Writing cluster {} with {}/{} nodes/edge to file {}".format( "Writing cluster {} with {}/{} nodes/edge to file {}".format(
index, index,
subgraph.number_of_nodes(), subgraph.number_of_nodes(),
...@@ -133,8 +142,12 @@ def create_cluster_dot_and_svg_files(subgraph, index, cluster_dot_file, cluster_ ...@@ -133,8 +142,12 @@ def create_cluster_dot_and_svg_files(subgraph, index, cluster_dot_file, cluster_
nx.drawing.nx_pydot.write_dot(subgraph, cluster_dot_file) nx.drawing.nx_pydot.write_dot(subgraph, cluster_dot_file)
if cluster_svg_file: if cluster_svg_file:
dot_command = ["dot", "-Tsvg", "-o{}".format(cluster_svg_file), cluster_dot_file]
logging.debug("Calling dot to create SVG {} file from {}".format(cluster_svg_file, cluster_dot_file)) logging.debug("Calling dot to create SVG {} file from {}".format(cluster_svg_file, cluster_dot_file))
subprocess.run(["dot", "-Tsvg", "-o{}".format(cluster_svg_file), cluster_dot_file]) logging.debug("Command: {}".format(" ".join(dot_command)))
subprocess.run(dot_command)
def create_clusters(graph, cluster_file_pattern, create_svg_files): def create_clusters(graph, cluster_file_pattern, create_svg_files):
logging.info("Computing connected component (CC) clusters") logging.info("Computing connected component (CC) clusters")
......
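The change above builds the `dot` argument list once, logs it, and then runs it. A sketch of that pattern, assuming Python 3.8+ for `shlex.join`; the function names `build_dot_command` and `render_svg` are hypothetical, and `check=True` is an addition that the commit itself does not make.

```python
import logging
import shlex
import shutil
import subprocess


def build_dot_command(dot_file, svg_file):
    """Return the argument list that renders dot_file to svg_file with Graphviz."""
    return ["dot", "-Tsvg", "-o{}".format(svg_file), dot_file]


def render_svg(dot_file, svg_file):
    """Hypothetical wrapper: log the exact command, then run it."""
    command = build_dot_command(dot_file, svg_file)
    # shlex.join quotes each argument, so the logged line can be pasted into a shell
    logging.debug("Command: %s", shlex.join(command))
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' was not found on PATH")
    # check=True raises CalledProcessError if dot exits with a non-zero status
    subprocess.run(command, check=True)
```

Unlike `" ".join(...)`, `shlex.join` keeps the logged command unambiguous when a file name contains spaces.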
...@@ -62,8 +62,6 @@ def run_moss( ...@@ -62,8 +62,6 @@ def run_moss(
moss.addFilesByWildcard(file_pattern) moss.addFilesByWildcard(file_pattern)
exit(0)
logging.info("Sending files to Moss") logging.info("Sending files to Moss")
url = moss.send() # Submission Report URL url = moss.send() # Submission Report URL
...@@ -128,7 +126,7 @@ def configure_cli_parser(parser): ...@@ -128,7 +126,7 @@ def configure_cli_parser(parser):
"pattern", "pattern",
type=str, type=str,
default=DEFAULT_FILE_PATTERN, default=DEFAULT_FILE_PATTERN,
help="Pattern for files to send to Moss (e.g.: {})".format(DEFAULT_FILE_PATTERN)) help="Pattern for files to send to Moss (e.g.: '{}'). Must be in quotes!".format(DEFAULT_FILE_PATTERN))
def main(): def main():
parser = argparse.ArgumentParser() parser = argparse.ArgumentParser()
......
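The new help text insists on quoting the pattern. Without quotes, the shell expands the wildcard before the program starts, so the parser receives many paths instead of one pattern. A minimal sketch of the quoted case, where `glob.glob` merely stands in for whatever expansion the program performs internally (an assumption for illustration):

```python
import argparse
import glob

parser = argparse.ArgumentParser()
parser.add_argument(
    "pattern",
    type=str,
    help="Pattern for files to send to Moss. Must be in quotes!")

# Quoted on the command line, the pattern arrives as a single, unexpanded string
args = parser.parse_args(["./ex1/*/main.cpp"])

# The wildcard is expanded inside the program, not by the shell
files = sorted(glob.glob(args.pattern))
```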
...@@ -81,7 +81,8 @@ def localize_match_links(doc, input_report_subdir): ...@@ -81,7 +81,8 @@ def localize_match_links(doc, input_report_subdir):
url_pattern = r"http://moss\.stanford\.edu/results/\d+/\d+/(match.*\.html)" url_pattern = r"http://moss\.stanford\.edu/results/\d+/\d+/(match.*\.html)"
# E.g. ./12-345-678/main.cpp (77%) # E.g. ./12-345-678/main.cpp (77%)
text_pattern = r"\./([\d-]+)/.* (\(\d+%\))" # ./some/dir/12-345-678/main.cpp (77%)
text_pattern = r".*?/([\d-]+)/.* (\(\d+%\))"
logging.info("Localising links to match files") logging.info("Localising links to match files")
...@@ -92,14 +93,22 @@ def localize_match_links(doc, input_report_subdir): ...@@ -92,14 +93,22 @@ def localize_match_links(doc, input_report_subdir):
for a in row.find_all("a"): for a in row.find_all("a"):
# Change remote URLs to local ones # Change remote URLs to local ones
url_match = re.search(url_pattern, a["href"]) url_match = re.search(url_pattern, a["href"])
if not url_match:
raise RuntimeError("Failure while localising match links in the Moss report. Failed to match link '{}' against regex '{}'".format(a["href"], url_pattern))
a["href"] = "./{}/{}".format(input_report_subdir, url_match.group(1)) a["href"] = "./{}/{}".format(input_report_subdir, url_match.group(1))
# Open links in a new tab/window # Open links in a new tab/window
a["target"] = "_blank" a["target"] = "_blank"
# Strip away unnecessary link text # Strip away unnecessary link text
# print(a.get_text().strip()) link_text = a.get_text().strip()
text_match = re.search(text_pattern, a.get_text().strip())#.group(1) text_match = re.search(text_pattern, link_text)
if not text_match:
raise RuntimeError("Failure while localising match links in the Moss report. Failed to match link text '{}' against regex '{}'".format(link_text, text_pattern))
a.string = "{} {}".format(text_match.group(1), text_match.group(2)) a.string = "{} {}".format(text_match.group(1), text_match.group(2))
def get_match_percentage(match_text): def get_match_percentage(match_text):
......
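The effect of the `text_pattern` change can be checked in isolation. The sketch below copies both regular expressions from the diff; the helper name `shorten` is hypothetical.

```python
import re

OLD_TEXT_PATTERN = r"\./([\d-]+)/.* (\(\d+%\))"
NEW_TEXT_PATTERN = r".*?/([\d-]+)/.* (\(\d+%\))"


def shorten(link_text, pattern=NEW_TEXT_PATTERN):
    """Reduce a Moss link text to '<directory> (<percentage>)'."""
    match = re.search(pattern, link_text)
    if not match:
        raise RuntimeError(
            "Failed to match link text '{}' against regex '{}'".format(link_text, pattern))
    return "{} {}".format(match.group(1), match.group(2))


flat = "./12-345-678/main.cpp (77%)"
nested = "./some/dir/12-345-678/main.cpp (77%)"

print(shorten(flat))    # 12-345-678 (77%)
print(shorten(nested))  # 12-345-678 (77%)
print(re.search(OLD_TEXT_PATTERN, nested))  # None: the old pattern requires the Legi directly after "./"
```

One caveat: the lazy prefix `.*?` stops at the first directory consisting only of digits and hyphens, so for `./2020/12-345-678/main.cpp (77%)` the captured group is `2020`, not the Legi number.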
...@@ -2,25 +2,16 @@ ...@@ -2,25 +2,16 @@
## Prerequisites ## Prerequisites
* Tested with Python 3.7
* Script `run_-_moss.py`
* https://pypi.org/project/mosspy/
* `pip install mosspy`
* Script `rename_to_legi.sh`: * Script `rename_to_legi.sh`:
* https://stedolan.github.io/jq/ * https://stedolan.github.io/jq/
* Download `jq` and add to path * Download `jq` and add to path
* Script `visualize.py` * TODO: Re-implement shell script in Python
* https://github.com/hjalti/mossum
* `pip3 install git+https://github.com/hjalti/mossum@master`
* Replace `<python3>/Lib/site-packages/mossum/mossum.py` with `MODIFIED-mossum.py`
* http://networkx.github.io/
* `pip install networkx`
## Tidying up files and directories ## Tidying up files and directories
* `cx-dump_bonus-exercise-1_2020-04-17.zip` contains the latest submission from each user * `cx-dump_bonus-exercise-1_2020-04-17.zip` contains the latest submission from each user
* Execute next commands on the level of the user directories, e.g. in `./bonus_exercise_1`, so that, e.g. `./bonus_exercise_1/<user>` are the individual user directories * Execute commands on the level of the user directories, e.g. in `./bonus_exercise_1`, so that, e.g. `./bonus_exercise_1/<user>` are the individual user directories
* Assumption: relevant files per submission are `<user>/details.json` and `<user>/files/main.cpp`, whereas all other files and directories can be deleted: * Assumption: relevant files per submission are `<user>/details.json` and `<user>/files/main.cpp`, whereas all other files and directories can be deleted:
...@@ -29,9 +20,9 @@ ...@@ -29,9 +20,9 @@
```plain ```plain
$ cd ./bonus_exercise_1 $ cd ./bonus_exercise_1
$ find -type f ! \( -iname details.json -or -iname main.cpp \) -delete $ find -type f ! \( -iname details.json -or -iname main.cpp \) -delete -print
$ find . -type d -iname cx_data -delete $ find . -type d -iname cx_data -delete -print
``` ```
* Move `<user>/files/main.cpp` to `<user>/main.cpp` and delete the (now empty) directory `<user>/files`: * Move `<user>/files/main.cpp` to `<user>/main.cpp` and delete the (now empty) directory `<user>/files`:
...@@ -39,11 +30,13 @@ ...@@ -39,11 +30,13 @@
```plain ```plain
$ cd ./bonus_exercise_1 $ cd ./bonus_exercise_1
$ find . -type d -iname files -execdir mv ./files/main.cpp . \; $ find . -type d -iname files -execdir mv ./files/main.cpp . \; -print
$ find . -type d -iname files -delete $ find . -type d -iname files -delete -print
``` ```
* Now, each `<user>` directory should only have two files in it: `<user>/main.cpp` and `<user>/details.json`
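In the spirit of the TODO about re-implementing the shell steps in Python, the move-and-delete step could be sketched as below. This is an illustration, not part of mossutils: the function name `flatten_submissions` is hypothetical, and unlike `find -iname` it matches the names `files` and `main.cpp` case-sensitively.

```python
import shutil
from pathlib import Path


def flatten_submissions(root):
    """Move <user>/files/main.cpp to <user>/main.cpp for every user directory.

    Returns the list of new main.cpp paths. Hypothetical helper.
    """
    moved = []
    for files_dir in sorted(Path(root).glob("*/files")):
        source = files_dir / "main.cpp"
        if not (files_dir.is_dir() and source.is_file()):
            continue
        target = files_dir.parent / "main.cpp"
        shutil.move(str(source), str(target))
        moved.append(target)
        try:
            # rmdir only removes empty directories, so leftover files are kept
            files_dir.rmdir()
        except OSError:
            pass
    return moved
```

Calling `flatten_submissions("./bonus_exercise_1")` would correspond to the two `find` commands above; the returned paths play the role of `-print`.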
## Renaming user directories ## Renaming user directories
* Rename directories from user names to Legi numbers before submitting data to Moss, e.g. rename `scmalte` to `01-234-567`. The file `<user>/details.json` provides the Legi number. * Rename directories from user names to Legi numbers before submitting data to Moss, e.g. rename `scmalte` to `01-234-567`. The file `<user>/details.json` provides the Legi number.
...@@ -53,37 +46,7 @@ ...@@ -53,37 +46,7 @@
```plain ```plain
$ cd ./bonus_exercise_1 $ cd ./bonus_exercise_1
$ ../rename_to_legi.sh $ <path-to-mossutils>/rename_to_legi.sh
``` ```
The script prompts for confirmation before the first renaming is executed. The script prompts for confirmation before the first renaming is executed.
## Moss
### moss.py
* Edit `moss.py` and check configuration
* Execute `moss.py` from e.g. `./bonus_exercise_1/`:
```plain
$ cd ./bonus_exercise_1/
$ python ../moss.py
```
* If not configured otherwise, open `./bonus_exercise_1/moss-report.html` in your browser
### clusters.py
* Edit `clusters.py` and check configuration
* Execute `clusters.py` from e.g. `./bonus_exercise_1/`:
```plain
$ cd ./bonus_exercise_1/
$ python ../clusters.py
```
* If not configured otherwise, open `./bonus_exercise_1/clusters.html` in your browser