git2graph
=========

`git2graph` crawls a Git repository and outputs it as a graph, i.e., as a pair
of textual files <nodes, edges>. The nodes file will contain a list of graph
nodes as [Software Heritage](https://www.softwareheritage.org/)
{ref}`identifiers (SWHIDs) <persistent-identifiers>`; the edges file a list of
graph edges as <from, to> SWHID pairs.

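
As a rough illustration of the two files, the sketch below builds a tiny pair by hand and summarizes it. The file names `nodes.csv` and `edges.csv`, the placeholder hashes, and the single-space separator between the two SWHIDs of an edge are assumptions made for the example; nothing here is actual `git2graph` output.

```shell
# Scratch directory, so that no real output files are overwritten.
workdir=$(mktemp -d)

# Placeholder SWHIDs: "swh:1:<type>:" followed by 40 hex digits.
rev="swh:1:rev:$(printf '%040d' 0)"
dir="swh:1:dir:$(printf '%040d' 1)"
cnt="swh:1:cnt:$(printf '%040d' 2)"

# Nodes file: one SWHID per line.
printf '%s\n' "$rev" "$dir" "$cnt" > "$workdir/nodes.csv"

# Edges file: one <from, to> pair per line (space separator is an assumption).
printf '%s %s\n' "$rev" "$dir" "$dir" "$cnt" > "$workdir/edges.csv"

# Count nodes per object type (third colon-separated field of each SWHID).
cut -d: -f3 "$workdir/nodes.csv" | sort | uniq -c

# Count edge endpoints that are missing from the nodes file.
dangling=$(tr ' ' '\n' < "$workdir/edges.csv" | sort -u |
    grep -vxFf "$workdir/nodes.csv" | wc -l)
echo "dangling endpoints: $dangling"
```

The same `cut` and `grep` pipelines can be pointed at real output, for instance after decompressing it with `zstdcat`.
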
Dependencies
------------

Build time dependencies:

- [glib](https://developer.gnome.org/glib/)
- [libgit2](https://libgit2.org/)

Test dependencies:

- [bats](https://github.com/bats-core/bats-core)

Micro benchmark
---------------

    $ time ./git2graph -n >(zstdmt > nodes.csv.zst) -e >(zstdmt -c > edges.csv.zst) /srv/src/linux
    160,38s user 12,72s system 98% cpu 2:55,02 total

    $ zstdcat nodes.csv.zst | wc -l
    6503403
    $ zstdcat edges.csv.zst | wc -l
    305096029

Parallel use
------------

`git2graph` writes fixed-length lines of either 51 bytes (nodes) or 102 bytes
(edges). Writes of at most `PIPE_BUF` bytes to a FIFO are atomic (`PIPE_BUF` is
4096 bytes on Linux, and guaranteed to be at least 512 bytes by POSIX). Hence
it is possible to mass analyze many repositories in parallel with something
like:

    $ mkfifo nodes.fifo edges.fifo
    $ sort -u < nodes.fifo | zstdmt > nodes.csv.zst &
    $ sort -u < edges.fifo | zstdmt > edges.csv.zst &
    $ parallel git2graph -n nodes.fifo -e edges.fifo -- repo_dir_1 repo_dir_2 ...
    $ rm nodes.fifo edges.fifo

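
The line sizes behind the atomicity argument can be rechecked with a few lines of shell. This is only a sketch: the all-zero hash is a placeholder, the single-space separator in the edge line is an assumption, and atomicity additionally assumes that each line reaches the FIFO in a single write.

```shell
# A SWHID is "swh:1:<type>:" (10 characters) followed by 40 hex digits.
swhid="swh:1:rev:$(printf '%040d' 0)"   # placeholder hash, not a real object

# Node line: one SWHID plus a newline.
node_bytes=$(printf '%s\n' "$swhid" | wc -c)

# Edge line: two SWHIDs, one separator character, plus a newline.
edge_bytes=$(printf '%s %s\n' "$swhid" "$swhid" | wc -c)

# PIPE_BUF as reported for the root filesystem.
pipe_buf=$(getconf PIPE_BUF /)

echo "node line: $node_bytes bytes, edge line: $edge_bytes bytes, PIPE_BUF: $pipe_buf"

# Both line sizes must fit within PIPE_BUF for single writes to stay atomic.
if [ "$node_bytes" -le "$pipe_buf" ] && [ "$edge_bytes" -le "$pipe_buf" ]; then
    echo "line sizes fit within PIPE_BUF"
fi
```
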
Note that you most likely want to tune `sort` to run in parallel
(`--parallel`), use a large buffer size (`-S`), and use a temporary directory
with enough available space (`-T`).

(The `git2graph` example above uses `parallel` from
[moreutils](https://joeyh.name/code/moreutils/), but it could trivially be
adapted to use [GNU parallel](https://www.gnu.org/software/parallel/) or
similar parallelization tools.)

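
A tuned `sort` invocation might look like the sketch below, which deduplicates a few sample lines instead of reading from a FIFO. The thread count, buffer size, and scratch directory are hypothetical values chosen for illustration, and `--parallel` assumes GNU `sort`.

```shell
# Scratch directory standing in for a temporary directory with enough space.
sort_tmp=$(mktemp -d)

# Hypothetical tuning values; adjust them to the machine at hand.
sort_threads=4
sort_buffer=64M

# Duplicate lines stand in for nodes emitted by several repositories.
deduped=$(printf '%s\n' b a b c a |
    sort -u --parallel="$sort_threads" -S "$sort_buffer" -T "$sort_tmp")

echo "$deduped"
rm -r "$sort_tmp"
```

In the FIFO pipeline, the same three options would be added to each `sort -u` command.
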
Limitations
-----------

SWHID calculation for snapshots is not fully compatible with the
{py:func}`specification <swh.model.identifiers.snapshot_identifier>`, because
currently only HEAD is considered as a symbolic reference. Other symbolic refs,
if present, will be ignored, potentially leading to a different snapshot SWHID
than the one Software Heritage will obtain. This is due to a limitation of
libgit2, which at the time of writing does not allow listing all symbolic
references. The graph structure is not affected, but looking up the obtained
snapshots by SWHID on the main Software Heritage archive might fail.