swh/provenance/tests/data/README.md
# Provenance Index Test Dataset

This directory contains the datasets used by the `test_provenance_heuristics`
tests of the provenance index database.
## Datasets

There are currently three datasets:

- cmdbts2: original dataset
- out-of-order: with unsorted revisions
- with-merges: with merge revisions

Each dataset `xxx` consists of several parts:

- a description of a git repository as a YAML file named `xxx_repo.yaml`,
- a msgpack file containing the storage objects for the given repository, from
  which the storage is filled before each test, and
- a set of synthetic files, named `synthetic_xxx_(lower|upper)_<mindepth>.txt`,
  describing the expected content of the provenance database after ingestion
  with the flag `lower` set or not set, and with the given `mindepth` value
  (an integer, most often `1` or `2`).
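
The naming scheme for synthetic files can be sketched in Python as follows. This is only an illustration: the helper names `synthetic_filename` and `parse_synthetic_filename` and the `SyntheticFile` tuple are hypothetical, and the sketch assumes that `lower` in the file name means the flag is set and `upper` means it is not.

```python
import re
from typing import NamedTuple

# Pattern derived from the naming scheme described above:
#   synthetic_<dataset>_(lower|upper)_<mindepth>.txt
SYNTHETIC_RE = re.compile(
    r"^synthetic_(?P<dataset>.+)_(?P<flag>lower|upper)_(?P<mindepth>\d+)\.txt$"
)


class SyntheticFile(NamedTuple):
    """Hypothetical container for the fields encoded in a file name."""

    dataset: str
    lower: bool
    mindepth: int


def synthetic_filename(dataset: str, lower: bool, mindepth: int) -> str:
    """Build the synthetic file name for a dataset and ingestion settings."""
    # Assumption: "lower" marks the flag as set, "upper" as not set.
    flag = "lower" if lower else "upper"
    return f"synthetic_{dataset}_{flag}_{mindepth}.txt"


def parse_synthetic_filename(name: str) -> SyntheticFile:
    """Recover the dataset name, flag and mindepth from a file name."""
    match = SYNTHETIC_RE.match(name)
    if match is None:
        raise ValueError(f"not a synthetic file name: {name!r}")
    return SyntheticFile(
        dataset=match["dataset"],
        lower=match["flag"] == "lower",
        mindepth=int(match["mindepth"]),
    )
```

Template files such as `synthetic_xxx_template.txt` do not match the pattern, so the parser rejects them rather than mistaking them for expected results.
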
### Generate dataset files

Run the following commands for each dataset `xxx`:
```
for dataset in cmdbts2 out-of-order with-merges; do
  python generate_repo.py -C "${dataset}_repo.yaml" "$dataset" > "synthetic_${dataset}_template.txt"
  # you may want to edit/update synthetic files from this template, see below
  python generate_storage_from_git.py "$dataset"
done
```
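
The same steps can be listed from Python, for example to review the commands before running them. This is a dry-run sketch only: the `generation_commands` helper is hypothetical, and the script names, the `-C` option and the output file names are copied from the shell loop above, not from the scripts' own documentation.

```python
import shlex

# Dataset names as used in the shell loop above.
DATASETS = ["cmdbts2", "out-of-order", "with-merges"]


def generation_commands(dataset):
    """Return (argv, stdout_file) pairs for one dataset.

    stdout_file is the file the command's standard output is redirected
    to, or None when the output is not redirected.
    """
    return [
        (
            ["python", "generate_repo.py", "-C", f"{dataset}_repo.yaml", dataset],
            f"synthetic_{dataset}_template.txt",
        ),
        (["python", "generate_storage_from_git.py", dataset], None),
    ]


# Print the commands instead of executing them.
for dataset in DATASETS:
    for argv, stdout_file in generation_commands(dataset):
        line = shlex.join(argv)
        if stdout_file is not None:
            line += f" > {shlex.quote(stdout_file)}"
        print(line)
```
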
## Git repos description file

The description of a git repository is a YAML file containing a list of dicts,
each one representing a git revision to add (linearly) to the git repo used as
the base for the dataset. Each dict has a structure like:
``` yaml