# Provenance Index Test Dataset

This directory contains datasets used by the `test_provenance_heuristics` tests
of the provenance index database.

Each dataset `xxx` consists of several parts:

- a description of a git repository as a yaml file named `xxx_repo.yaml`,
- a msgpack file containing storage objects for the given repository, used to
  fill the storage before each test, and
- a set of synthetic files, named `synthetic_xxx_(lower|upper)_<mindepth>.txt`,
  describing the expected result in the provenance database if ingested with
  the flag `lower` set or not set, and the `mindepth` value (integer, most
  often `1` or `2`).

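The naming scheme above can be summarized in a few lines of Python. This is only a sketch, not code from the test suite: the helper names `synthetic_filename` and `dataset_files` are hypothetical, the `xxx.msgpack` name is an assumption based on the `repo2.msgpack` example output shown later in this file, and mapping the `lower` flag being unset to the `upper` file name is one reading of the pattern.

```python
from itertools import product


def synthetic_filename(name, lower, mindepth):
    """Build the name of the synthetic file for one heuristics configuration."""
    flag = "lower" if lower else "upper"  # assumption: flag unset maps to "upper"
    return f"synthetic_{name}_{flag}_{mindepth}.txt"


def dataset_files(name, mindepths=(1, 2)):
    """List the files expected for dataset `name` (hypothetical helper)."""
    files = [f"{name}_repo.yaml", f"{name}.msgpack"]  # .msgpack name is assumed
    files += [
        synthetic_filename(name, lower, depth)
        for lower, depth in product((True, False), mindepths)
    ]
    return files


print(dataset_files("repo2"))
```
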
## Git repos description file

The description of a git repository is a yaml file which contains a list of
dicts, each one representing a git revision to add (linearly) to the git repo
used as a base for the dataset. Each dict has a structure like:

```yaml
- msg: R00
  date: 1000000000
  content:
    A/B/C/a: "content a"
```

This example will generate a git commit with the commit message "R00", the
author and committer date 1000000000 (given as a unix timestamp), and a single
file whose path is `A/B/C/a` and whose content is "content a".

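To make the mapping from a description entry to a working tree concrete, the sketch below writes the files listed under `content` into a directory. It assumes the entry has already been loaded into a Python dict by a YAML parser; `write_tree` is a hypothetical helper, it does not create the commit itself, and `generate_repo.py` may well proceed differently.

```python
import tempfile
from pathlib import Path

# The example entry above, as the Python structure a YAML parser would produce.
revision = {
    "msg": "R00",
    "date": 1000000000,
    "content": {"A/B/C/a": "content a"},
}


def write_tree(root, content):
    """Write each `path: text` pair of `content` below `root` (hypothetical helper)."""
    written = []
    for relpath, text in content.items():
        target = root / relpath
        target.parent.mkdir(parents=True, exist_ok=True)  # create A/B/C as needed
        target.write_text(text)
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as tmp:
    for path in write_tree(Path(tmp), revision["content"]):
        print(path.relative_to(tmp), "->", path.read_text())
```
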
The file is parsed to create git revisions in a temporary git repository, in
order of appearance in the yaml file (so one may create a git repository with
'out-of-order' commits). There is no way of creating branches and merges for
now.

The tool to generate this git repo is `generate_repo.py`:

```
python generate_repo.py --help
Usage: generate_repo.py [OPTIONS] INPUT_FILE OUTPUT_DIR

Options:
  -C, --clean-output / --no-clean-output
  --help                          Show this message and exit.
```

It generates a git repository in the `OUTPUT_DIR` and also writes a template
`synthetic` file to its standard output, which can be used to ease writing the
expected `synthetic` files. Typical usage will be:

```
python generate_repo.py repo2_repo.yaml repo2 > synthetic_repo2_template.txt
```

Note that the hashes (for revisions, directories and contents) of the git
objects depend only on the content of the input yaml file. Calling the tool
twice on the same input file should generate the exact same git repo twice.

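This reproducibility follows from how git names objects: an object id is a SHA-1 over a short header plus the object's bytes, so identical input gives identical ids. The sketch below illustrates this for file contents (blobs) only, using the standard library; `git_blob_id` is a hypothetical name and this is not necessarily how the tool computes or checks anything.

```python
import hashlib


def git_blob_id(data):
    """Return the id git assigns to a blob holding `data` (bytes)."""
    header = b"blob %d\x00" % len(data)  # object type, size, NUL separator
    return hashlib.sha1(header + data).hexdigest()


# Same bytes in, same id out, regardless of when or where this runs.
assert git_blob_id(b"content a") == git_blob_id(b"content a")

# The empty blob has a well-known id.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Commit ids are built the same way but also cover the message, dates, author and parent, which is why fixing the date in the yaml file matters.
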
Also note that the tool will add a branch at each revision (using the commit
message as branch name), to make it easier to reference any point in the git
history.

## Msgpack dump of the storage

This file contains a set of storage objects (`Revision`, `Content` and
`Directory`) and is usually generated from a local git repository (typically
the one generated by the previous command) using the
`generate_storage_from_git.py` tool:

```
python generate_storage_from_git.py --help
Usage: generate_storage_from_git.py [OPTIONS] GIT_REPO

  simple tool to generate the CMDBTS.msgpack dataset filed used in tests

Options:
  -r, --head TEXT    head revision to start from
  -o, --output TEXT  output file
  --help             Show this message and exit.
```

Typical usage would be, using the git repository `repo2` created previously:

```
python generate_storage_from_git.py repo2
Revision hash for master is 8363e8e98751dc9f264d2fedd6b829ad4b1218b0
Wrote 86 objects in repo2.msgpack
```

### Adding extra visits/snapshots

It is also possible to generate a storage from a git repo with extra origin
visits, using the `--visit` option of the `generate_storage_from_git` tool.
This option expects a yaml file as argument. This file contains a description
of the extra visits (and snapshots) you want to add to the storage.

The format is simple, for example:

```
# a visit pattern scenario for the 'repo_with_merges' repo
- origin: http://repo_with_merges/1/
  date: 1000000015
  branches:
    - R01
```

This will create an `OriginVisit` (at the given date) for the given origin URL
(the `Origin` will be created as well), with a `Snapshot` including the listed
branches.

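As a rough illustration of what one visit entry carries, the sketch below turns an already-parsed entry into a small Python object. The `Visit` dataclass and `load_visit` function are hypothetical and unrelated to the actual storage model classes; treating `date` as a unix timestamp is an assumption based on the repository description format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Visit:
    """Hypothetical in-memory view of one entry of the visits file."""
    origin: str
    date: datetime
    branches: tuple


def load_visit(entry):
    """Convert one parsed yaml entry into a Visit (date assumed to be a unix timestamp)."""
    return Visit(
        origin=entry["origin"],
        date=datetime.fromtimestamp(entry["date"], tz=timezone.utc),
        branches=tuple(entry["branches"]),
    )


visit = load_visit(
    {"origin": "http://repo_with_merges/1/", "date": 1000000015, "branches": ["R01"]}
)
print(visit.origin, visit.date.isoformat(), visit.branches)
```
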
## Synthetic files

These files describe the expected content of the provenance database for each
revision (in order of ingestion).

The `generate_repo.py` tool will produce a template synthetic file like:

```
1000000000.0 b582a17b3fc37f72fc57877616f85c3f0abed064 R00
R00 |  |         | R b582a17b3fc37f72fc57877616f85c3f0abed064 | 1000000000.0
    |  | .       | D a4cb5e6b2831f7e8eef0e6e08e43d642c97303a1 | 0.0
    |  | A       | D 1c8d9fd9afa7e5a2cf52a3db6f05dc5c3a1ca86b | 0.0
    |  | A/B     | D 36876d475197b5ad86ad592e8e28818171455f16 | 0.0
    |  | A/B/C   | D 98f7a4a23d8df1fb1a5055facae2aff9b2d0a8b3 | 0.0
    |  | A/B/C/a | C 20329687bb9c1231a7e05afe86160343ad49b494 | 0.0
1000000010.0 8259eeae2ff5046f0bb4393d6e894fe6d7e01bfe R01
R01 |  |         | R 8259eeae2ff5046f0bb4393d6e894fe6d7e01bfe | 1000000010.0
    |  | .       | D b3cf11b22c9f93c3c494cf90ab072f394155072d | 0.0
    |  | A       | D baca735bf8b8720131b4bfdb47c51631a9260348 | 0.0
    |  | A/B     | D 4b28979d88ed209a09c272bcc80f69d9b18339c2 | 0.0
    |  | A/B/C   | D c9cabe7f49012e3fdef6ac6b929efb5654f583cf | 0.0
    |  | A/B/C/a | C 20329687bb9c1231a7e05afe86160343ad49b494 | 0.0
    |  | A/B/C/b | C 50e9cdb03f9719261dd39d7f2920b906db3711a3 | 0.0
[...]
```

where all the contents and directories of each revision are listed; it is then
the responsibility of the user to create the expected synthetic file for a
given heuristics configuration. For example, the 2 revisions above are to be
adapted, for the `(lower=True, mindepth=1)` case, as:

```
1000000000 c0d8929936631ecbcf9147be6b8aa13b13b014e4 R00
R00 |       |         | R c0d8929936631ecbcf9147be6b8aa13b13b014e4 | 1000000000
    | R---C | A/B/C/a | C 20329687bb9c1231a7e05afe86160343ad49b494 | 0
1000000010 1444db96cbd8cd791abe83527becee73d3c64e86 R01
R01 |       |         | R 1444db96cbd8cd791abe83527becee73d3c64e86 | 1000000010
    | R---C | A/B/C/a | C 20329687bb9c1231a7e05afe86160343ad49b494 | -10
    | R---C | A/B/C/b | C 50e9cdb03f9719261dd39d7f2920b906db3711a3 | 0
```
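
When editing a synthetic file by hand, it can help to read it back into a structure and check it. The sketch below is a hypothetical reader written for this README, not the parser used by the tests. It assumes each row has five `|`-separated columns, and the field names `relation`, `path`, `type`, `sha` and `ts` are labels chosen here for illustration.

```python
SAMPLE = """\
1000000000 c0d8929936631ecbcf9147be6b8aa13b13b014e4 R00
R00 |       |         | R c0d8929936631ecbcf9147be6b8aa13b13b014e4 | 1000000000
    | R---C | A/B/C/a | C 20329687bb9c1231a7e05afe86160343ad49b494 | 0
1000000010 1444db96cbd8cd791abe83527becee73d3c64e86 R01
R01 |       |         | R 1444db96cbd8cd791abe83527becee73d3c64e86 | 1000000010
    | R---C | A/B/C/a | C 20329687bb9c1231a7e05afe86160343ad49b494 | -10
    | R---C | A/B/C/b | C 50e9cdb03f9719261dd39d7f2920b906db3711a3 | 0
"""


def parse_synthetic(text):
    """Group the '|'-separated rows of a synthetic file by revision message."""
    revisions = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if "|" not in line:
            # Header line: "<date> <revision hash> <message>"
            date, sha, msg = line.split(maxsplit=2)
            current = revisions[msg] = {"date": float(date), "sha": sha, "rows": []}
            continue
        _msg, relation, path, obj, ts = (col.strip() for col in line.split("|"))
        kind, sha = obj.split()
        if kind == "R":
            continue  # the revision row repeats the header information
        current["rows"].append(
            {"relation": relation, "path": path, "type": kind, "sha": sha, "ts": float(ts)}
        )
    return revisions


for msg, rev in parse_synthetic(SAMPLE).items():
    for row in rev["rows"]:
        print(msg, row["relation"], row["path"], row["ts"])
```
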