feat: add dvc pipeline

cbaker requested to merge feat/dvc into main

Add CI and tooling infrastructure to the template meta-repository, and extend the data_project cookiecutter variant with a DVC pipeline scaffold.

Template meta-repository (root-level) changes:

  • Add root .gitlab-ci.yml that delegates to the blueprint's CI config via include: local, with pytest and mypy_check jobs disabled (no runnable Python at this level)
  • Add root .pre-commit-config.yaml with ruff lint/format and standard file hygiene hooks, excluding the blueprint directory and hooks/post_gen_project.py from checks that cannot handle Jinja2 syntax
  • Add root .gitignore
  • Fix post_gen_project.py: add missing type annotations to satisfy strict mypy; exclude it from ruff and mypy since it contains Jinja2 placeholders
  • Change the prefix of auto-generated branch-preservation tags to bak/ to make their purpose explicit

data_project cookiecutter variant:

  • Add pipeline/ scaffold: dvc.yaml with a placeholder stage, scripts/process.py, config/, data/raw/, and outputs/ directories
  • Add pipeline/.gitignore with anchored patterns (/outputs/, /data/raw/*.csv) and un-ignore *.dvc
  • Add numpy, pandas, pyyaml as runtime dependencies and dvc as a dev dependency
  • Post-gen hook now runs dvc init --subdir and dvc config core.autostage true inside pipeline/ after uv sync; removes pipeline/ entirely for standard projects
  • Exclude pipeline/dvc.lock from pre-commit whitespace/YAML hooks (auto-generated by DVC)
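
The post-gen behaviour described above could look roughly like the sketch below. It is an illustration under assumptions, not the code in hooks/post_gen_project.py. The function names, the project_type argument, and the injectable run callable are all hypothetical. The real hook presumably gets the variant from a Jinja2 placeholder and may invoke dvc differently (for example through uv run). The uv sync step is assumed to have already completed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Sequence

# The two DVC commands named in this MR, run in order inside pipeline/.
DVC_COMMANDS: tuple[tuple[str, ...], ...] = (
    ("dvc", "init", "--subdir"),
    ("dvc", "config", "core.autostage", "true"),
)


def run_in(cwd: Path, cmd: Sequence[str]) -> None:
    """Run a command in cwd and raise if it exits with a non-zero status."""
    subprocess.run(list(cmd), cwd=cwd, check=True)


def finalize_pipeline(
    project_root: Path,
    project_type: str,  # hypothetical: the rendered cookiecutter variant name
    run: Callable[[Path, Sequence[str]], None] = run_in,
) -> list[tuple[str, ...]]:
    """Initialise DVC for data projects; drop pipeline/ for everything else."""
    pipeline_dir = project_root / "pipeline"
    if project_type != "data_project":
        # Standard projects do not ship the DVC scaffold at all.
        shutil.rmtree(pipeline_dir, ignore_errors=True)
        return []
    executed: list[tuple[str, ...]] = []
    for cmd in DVC_COMMANDS:
        run(pipeline_dir, cmd)
        executed.append(cmd)
    return executed
```

Passing the runner as a parameter keeps the branch logic testable without a DVC installation; a caller that omits run gets the subprocess-based default.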
