Paste P1176

Patching dulwich to decrease memory footprint

Authored by ardumont on Sep 27 2021, 10:01 AM.
#+title: Patching Dulwich to decrease memory footprint
#+author: ardumont
In the following analysis, we run multiple ingestions with and without the patched [1]
dulwich version.
The goal of this analysis is to make sure that the patch reduces the memory footprint
enough for the ingestion to complete on our standard workers, without altering the
standard swh hash computations. For the latter check, we ensure that the snapshot hashes
are the same at the end of the ingestions (on standard and patched workers). As the swh
model is a Merkle DAG and the snapshot is the top-level object, that is enough: a
difference in any object reachable from the snapshot would change the snapshot hash.
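To illustrate why comparing snapshot identifiers is enough, here is a toy sketch in
shell, assuming GNU coreutils (=sha1sum=). =blob_id= follows git's blob hashing, i.e.
=sha1("blob <size>\0" + content)=, which is also how swh computes the =sha1_git= of a
content. =node_id= is a deliberately simplified, hypothetical parent hash (sorted child
ids, one per line); it is neither git's tree format nor swh's snapshot format. It only
shows the property the analysis relies on: barring hash collisions, changing any
reachable leaf changes the top-level identifier.

#+begin_src sh
#!/bin/sh
# Toy illustration of the Merkle DAG property behind the snapshot comparison.
# Assumes GNU coreutils (sha1sum).

# blob_id CONTENT: git-style blob identifier, i.e. sha1 of "blob <size>\0"
# followed by the raw content (no trailing newline is added).
blob_id() {
  size=$(printf '%s' "$1" | wc -c | tr -d ' ')
  { printf 'blob %s\000' "$size"; printf '%s' "$1"; } | sha1sum | cut -d' ' -f1
}

# node_id CHILD_ID...: hypothetical parent hash over the sorted child ids.
# Simplified on purpose: this is NOT git's tree nor swh's snapshot format.
node_id() {
  printf '%s\n' "$@" | sort | sha1sum | cut -d' ' -f1
}

readme=$(blob_id 'hello')
main=$(blob_id 'int main(void) { return 0; }')
root_before=$(node_id "$readme" "$main")

# Change a single byte in one leaf and recompute the root.
main_edited=$(blob_id 'int main(void) { return 1; }')
root_after=$(node_id "$readme" "$main_edited")

if [ "$root_before" != "$root_after" ]; then
  echo "one modified leaf changes the root identifier"
fi
#+end_src

In the real model the same reasoning applies level by level (contents, directories,
revisions, releases, snapshot), which is why equal snapshot ids on both databases imply
equal archived graphs.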
tl;dr: the patch fixes the problem without any hash divergence.
|-----------------------+---------------------------+---------------+--------------------------+--------------------------------------------|
| origin (large)        | run (standard)            | run (patched) | snapshot hash comparison | snapshot hash                              |
|-----------------------+---------------------------+---------------+--------------------------+--------------------------------------------|
| keybase/client        | X (worker0, staging)      | ok            | ok                       | \xcddaccc0a2d452098701dec921731e8c96630e2b |
| keybase/client        | ok (worker17, production) |               |                          |                                            |
|-----------------------+---------------------------+---------------+--------------------------+--------------------------------------------|
| torvalds/linux        | ok                        | ok            | ok                       | \xde499fdc325524ee0e7c3f57c6c2ae6a09091845 |
|-----------------------+---------------------------+---------------+--------------------------+--------------------------------------------|
| kubernetes/kubernetes | ok                        | ok            | ok                       | \xa2a6299e3527bbba548eec0f0ef80cca9e80f545 |
|-----------------------+---------------------------+---------------+--------------------------+--------------------------------------------|
| NixOS/nixpkgs         | ok                        | ok            | ok                       | \xda0e3e4a3eff6fb6370259fd2bdfcf932fa6ac69 |
|-----------------------+---------------------------+---------------+--------------------------+--------------------------------------------|
| CocoaPods/Specs       | ongoing                   | ongoing       | ongoing                  |                                            |
|-----------------------+---------------------------+---------------+--------------------------+--------------------------------------------|
| origin (medium)       | run (standard)            | run (patched) | snapshot hash comparison | snapshot hash                              |
|-----------------------+---------------------------+---------------+--------------------------+--------------------------------------------|
| rdicosmo/parmap       | ok                        | ok            | ok                       | \x2d869aa00591d2ac8ec8e7abacdda563d413189d |
|-----------------------+---------------------------+---------------+--------------------------+--------------------------------------------|
| hylang/hy             | ok                        | ok            | ok                       | \x821f28af45edaedc6f70b84c9bc4d407e7436452 |
|-----------------------+---------------------------+---------------+--------------------------+--------------------------------------------|
| hylang/hyrule         | ok                        | ok            | ok                       | \x882db61b629bd9f2c7ef3492924e3ff73382d3a6 |
|-----------------------+---------------------------+---------------+--------------------------+--------------------------------------------|
Legend: ok = the visit finished; X = the visit did not finish (the loader was killed). In
the comparison column, ok = same snapshot hash with the standard and patched versions.
If you are not interested in the details, you can stop reading here. Otherwise, read on.
The following nodes are used:
- worker17 (production): a large machine (64 GiB RAM, 20 CPUs) able to ingest the
  current large repositories without being killed. Ingestion is expected to work as is,
  given enough time. The snapshots resulting from these loadings serve as references.
- worker0, worker1 and worker2 (staging): smaller nodes (12 GiB RAM, 4 CPUs). Without a
  patched dulwich, ingesting large repositories on them fails (the loader gets
  OOM-killed). With the patch applied, the loading is expected to succeed, so the
  snapshots can be compared between runs. The resulting snapshots should be the same as
  the ones generated on worker17 [2].
Note that the ingestion timing does not matter for this analysis. The staging workers are
expected to be slower since the machines do not have the same specs. Moreover, the
underlying databases do not hold the same information: the production one is more
complete than the staging one (although the latter is less loaded). Timings are only
included to give a rough idea of the order of magnitude of the ingestion time. Again, the
most important criterion is that the ingestion finishes with the same snapshot.
[1] git pack walking in DFS instead of BFS order https://github.com/dulwich/dulwich/pull/903
[2] provided the ingested origins were the same at the time of ingestion (the snapshot
also depends on the data being the same).
* TODO worker17 (production, standard version) [7/8]
** DONE keybase/client run
CLOSED: [2021-09-26 Sun 07:30]
#+begin_src sh
swhworker@worker17:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_oneshot.yml swh loader run git https://github.com/keybase/client
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/keybase/client' with type 'git'
Enumerating objects: 431414, done.
Counting objects: 100% (76/76), done.
Compressing objects: 100% (45/45), done.
Total 431414 (delta 33), reused 67 (delta 29), pack-reused 431338
INFO:swh.loader.git.loader.GitLoader:Listed 19852 refs for repo https://github.com/keybase/client
{'status': 'eventful'}
real 64m21.848s
user 51m58.363s
sys 6m38.612s
swhworker@worker17:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_oneshot.yml swh loader run git https://github.com/keybase/client
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/keybase/client' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 19852 refs for repo https://github.com/keybase/client
{'status': 'uneventful'}
real 0m29.048s
user 0m24.279s
sys 0m0.742s
#+end_src
Snapshot: \xcddaccc0a2d452098701dec921731e8c96630e2b
#+begin_src sql
07:13:24 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/keybase/client' and ovs.type='git' order by date desc limit 4;
+-------------------------------+----------+-----------------------------------+----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| now | id | url | origin | visit | date | status | metadata | snapshot | type |
+-------------------------------+----------+-----------------------------------+----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 05:14:23.035696+00 | 34438017 | https://github.com/keybase/client | 34438017 | 113 | 2021-09-26 05:13:14.294663+00 | full | (null) | \xcddaccc0a2d452098701dec921731e8c96630e2b | git |
| 2021-09-26 05:14:23.035696+00 | 34438017 | https://github.com/keybase/client | 34438017 | 113 | 2021-09-26 05:12:47.255969+00 | created | (null) | (null) | git |
| 2021-09-26 05:14:23.035696+00 | 34438017 | https://github.com/keybase/client | 34438017 | 112 | 2021-09-26 04:28:44.02918+00 | full | (null) | \xcddaccc0a2d452098701dec921731e8c96630e2b | git |
| 2021-09-26 05:14:23.035696+00 | 34438017 | https://github.com/keybase/client | 34438017 | 112 | 2021-09-26 03:24:25.041466+00 | created | (null) | (null) | git |
+-------------------------------+----------+-----------------------------------+----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(4 rows)
Time: 8.674 ms
#+end_src
** DONE torvalds/linux
CLOSED: [2021-09-26 Sun 11:42]
#+begin_src sh
swhworker@worker17:~$ time swh loader -C /etc/softwareheritage/loader_oneshot.yml run git https://github.com/torvalds/linux
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/torvalds/linux' with type 'git'
Enumerating objects: 8350856, done.
Total 8350856 (delta 0), reused 0 (delta 0), pack-reused 8350856
INFO:swh.loader.git.loader.GitLoader:Listed 1495 refs for repo https://github.com/torvalds/linux
{'status': 'eventful'}
real 1095m42.647s
user 891m49.090s
sys 34m46.475s
#+end_src
snapshot: \xde499fdc325524ee0e7c3f57c6c2ae6a09091845
#+begin_src sql
11:02:07 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/torvalds/linux' and ovs.type='git' order by date desc limit 2;
+-------------------------------+----+-----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| now | id | url | origin | visit | date | status | metadata | snapshot | type |
+-------------------------------+----+-----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 09:02:11.975793+00 | 2 | https://github.com/torvalds/linux | 2 | 74 | 2021-09-26 08:11:27.373415+00 | full | (null) | \xde499fdc325524ee0e7c3f57c6c2ae6a09091845 | git |
| 2021-09-26 09:02:11.975793+00 | 2 | https://github.com/torvalds/linux | 2 | 74 | 2021-09-25 13:55:47.466862+00 | created | (null) | (null) | git |
+-------------------------------+----+-----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
#+end_src
** DONE rdicosmo/parmap
CLOSED: [2021-09-26 Sun 11:43]
ingestion:
#+begin_src sh
swhworker@worker17:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_oneshot.yml swh loader run git https://github.com/rdicosmo/parmap
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/rdicosmo/parmap' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 108 refs for repo https://github.com/rdicosmo/parmap
{'status': 'uneventful'}
real 0m3.547s
user 0m2.014s
sys 0m0.444s
#+end_src
snapshot: \x2d869aa00591d2ac8ec8e7abacdda563d413189d
#+begin_src sql
11:28:34 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/rdicosmo/parmap' and ovs.type='git' order by date desc limit 2;
+-------------------------------+---------+------------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| now | id | url | origin | visit | date | status | metadata | snapshot | type |
+-------------------------------+---------+------------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 09:42:22.148212+00 | 2695882 | https://github.com/rdicosmo/parmap | 2695882 | 220 | 2021-09-26 09:41:47.376616+00 | full | (null) | \x2d869aa00591d2ac8ec8e7abacdda563d413189d | git |
| 2021-09-26 09:42:22.148212+00 | 2695882 | https://github.com/rdicosmo/parmap | 2695882 | 220 | 2021-09-26 09:41:46.242228+00 | created | (null) | (null) | git |
+-------------------------------+---------+------------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
#+end_src
** DONE hylang/hy
CLOSED: [2021-09-26 Sun 11:58]
ingestion:
#+begin_src sh
swhworker@worker17:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_oneshot.yml swh loader run git https://github.com/hylang/hy
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/hylang/hy' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 1140 refs for repo https://github.com/hylang/hy
{'status': 'uneventful'}
real 0m3.149s
user 0m1.933s
sys 0m0.406s
#+end_src
snapshot: \x821f28af45edaedc6f70b84c9bc4d407e7436452
#+begin_src sql
11:42:22 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/hylang/hy' and ovs.type='git' order by date desc limit 2;
+------------------------------+----+------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| now | id | url | origin | visit | date | status | metadata | snapshot | type |
+------------------------------+----+------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 09:51:36.06779+00 | 1 | https://github.com/hylang/hy | 1 | 37 | 2021-09-26 09:49:55.773545+00 | full | (null) | \x821f28af45edaedc6f70b84c9bc4d407e7436452 | git |
| 2021-09-26 09:51:36.06779+00 | 1 | https://github.com/hylang/hy | 1 | 37 | 2021-09-26 09:49:54.686042+00 | created | (null) | (null) | git |
+------------------------------+----+------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
Time: 6.993 ms
#+end_src
** DONE hylang/hyrule
CLOSED: [2021-09-26 Sun 13:19]
ingestion:
#+begin_src sh
swhworker@worker17:~$ time swh loader -C /etc/softwareheritage/loader_oneshot.yml run git https://github.com/hylang/hyrule
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/hylang/hyrule' with type 'git'
Enumerating objects: 220, done.
Counting objects: 100% (208/208), done.
Compressing objects: 100% (121/121), done.
Total 220 (delta 97), reused 164 (delta 83), pack-reused 12
INFO:swh.loader.git.loader.GitLoader:Listed 5 refs for repo https://github.com/hylang/hyrule
{'status': 'eventful'}
real 0m14.348s
user 0m1.861s
sys 0m0.492s
#+end_src
snapshot: \x882db61b629bd9f2c7ef3492924e3ff73382d3a6
#+begin_src sql
13:06:38 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/hylang/hyrule' and ovs.type='git' order by date desc limit 2;
+-------------------------------+-----------+----------------------------------+-----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| now | id | url | origin | visit | date | status | metadata | snapshot | type |
+-------------------------------+-----------+----------------------------------+-----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 11:15:44.138959+00 | 164246965 | https://github.com/hylang/hyrule | 164246965 | 1 | 2021-09-26 11:14:46.964276+00 | full | (null) | \x882db61b629bd9f2c7ef3492924e3ff73382d3a6 | git |
| 2021-09-26 11:15:44.138959+00 | 164246965 | https://github.com/hylang/hyrule | 164246965 | 1 | 2021-09-26 11:14:34.557485+00 | created | (null) | (null) | git |
+-------------------------------+-----------+----------------------------------+-----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
Time: 5.845 ms
#+end_src
** DONE kubernetes/kubernetes
CLOSED: [2021-09-26 Sun 13:19]
ingestion:
#+begin_src sh
swhworker@worker17:~$ time swh loader -C /etc/softwareheritage/loader_oneshot.yml run git https://github.com/kubernetes/kubernetes
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/kubernetes/kubernetes' with type 'git'
Enumerating objects: 1177718, done.
Counting objects: 100% (458/458), done.
Compressing objects: 100% (234/234), done.
Total 1177718 (delta 256), reused 276 (delta 220), pack-reused 1177260
INFO:swh.loader.git.loader.GitLoader:Listed 87799 refs for repo https://github.com/kubernetes/kubernetes
{'status': 'eventful'}
real 88m19.138s
user 65m20.258s
sys 3m33.093s
#+end_src
snapshot: \xa2a6299e3527bbba548eec0f0ef80cca9e80f545
#+begin_src sql
11:59:20 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/kubernetes/kubernetes' and ovs.type='git' order by date desc limit 2;
+-------------------------------+----------+------------------------------------------+----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| now | id | url | origin | visit | date | status | metadata | snapshot | type |
+-------------------------------+----------+------------------------------------------+----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 11:06:38.084217+00 | 25254667 | https://github.com/kubernetes/kubernetes | 25254667 | 85 | 2021-09-26 10:39:37.910535+00 | full | (null) | \xa2a6299e3527bbba548eec0f0ef80cca9e80f545 | git |
| 2021-09-26 11:06:38.084217+00 | 25254667 | https://github.com/kubernetes/kubernetes | 25254667 | 85 | 2021-09-26 09:11:21.008026+00 | created | (null) | (null) | git |
+-------------------------------+----------+------------------------------------------+----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
Time: 27.147 ms
#+end_src
** DONE NixOS/nixpkgs
CLOSED: [2021-09-26 Sun 17:17]
ingestion:
#+begin_src sh
swhworker@worker17:~$ time swh loader -C /etc/softwareheritage/loader_oneshot.yml run git https://github.com/NixOS/nixpkgs
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/NixOS/nixpkgs' with type 'git'
Enumerating objects: 2562950, done.
Counting objects: 100% (272/272), done.
Compressing objects: 100% (148/148), done.
Total 2562950 (delta 162), reused 193 (delta 113), pack-reused 2562678
INFO:swh.loader.git.loader.GitLoader:Listed 117486 refs for repo https://github.com/NixOS/nixpkgs
{'status': 'eventful'}
real 462m33.180s
user 397m36.670s
sys 11m31.401s
#+end_src
snapshot: \xda0e3e4a3eff6fb6370259fd2bdfcf932fa6ac69
#+begin_src sql
13:15:44 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/NixOS/nixpkgs' and ovs.type='git' order by date desc limit 2;
+-------------------------------+---------+----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| now | id | url | origin | visit | date | status | metadata | snapshot | type |
+-------------------------------+---------+----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 17:17:55.707596+00 | 1927035 | https://github.com/NixOS/nixpkgs | 1927035 | 129 | 2021-09-26 17:01:22.199577+00 | full | (null) | \xda0e3e4a3eff6fb6370259fd2bdfcf932fa6ac69 | git |
| 2021-09-26 17:17:55.707596+00 | 1927035 | https://github.com/NixOS/nixpkgs | 1927035 | 129 | 2021-09-26 09:18:51.295111+00 | created | (null) | (null) | git |
+-------------------------------+---------+----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
Time: 30.475 ms
#+end_src
** IN-PROGRESS CocoaPods/Specs
* TODO workers (staging, patched) [7/8]
** DONE keybase/client tryout
CLOSED: [2021-09-26 Sun 13:01]
This node loads the keybase/client repository with the current loader.git (which has the
memory issues), then with dulwich patched with our proposed PR [1].
The first ingestion, without any patch, does not finish; the loader gets killed:
#+begin_src sh
swhworker@worker0:~/ve$ dpkg -l python3-swh.loader.git python3-dulwich | grep ii
ii python3-dulwich 0.19.11-2 amd64 Python Git library - Python3 module
ii python3-swh.loader.git 1.0.1-1~swh1~bpo10+1 all Software Heritage Git loader
swhworker@worker0:~$ time swh loader -C /etc/softwareheritage/loader_git.yml run git https://github.com/keybase/client
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/keybase/client' with type 'git'
Enumerating objects: 868404, done.
Counting objects: 100% (169/169), done.
Compressing objects: 100% (74/74), done.
Total 868404 (delta 104), reused 133 (delta 93), pack-reused 868235
INFO:swh.loader.git.loader.GitLoader:Listed 19852 refs for repo https://github.com/keybase/client
Killed
real 36m26.230s
user 4m29.100s
sys 0m34.254s
#+end_src
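The bare "Killed" line is the shell reporting that the loader process received SIGKILL,
which is the signal the Linux OOM killer sends. A minimal sketch to classify a run from
its exit status, assuming a POSIX shell such as bash or dash (which report a child
terminated by signal N as status 128 + N); =classify_exit= is a hypothetical helper, not
part of the swh tooling. To confirm the kernel was responsible, the kernel log (e.g.
=journalctl -k= or =dmesg=, which may require privileges) normally contains an
"Out of memory" entry for the killed process.

#+begin_src sh
#!/bin/sh
# classify_exit STATUS: hypothetical helper describing how a command ended.
# bash and dash report a child terminated by signal N as status 128 + N,
# so SIGKILL (signal 9) shows up as 137, like the OOM-killed run above.
classify_exit() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "ok"
  elif [ "$code" -eq 137 ]; then
    echo "killed by SIGKILL (possibly the OOM killer)"
  elif [ "$code" -gt 128 ]; then
    echo "killed by signal $((code - 128))"
  else
    echo "failed with exit status $code"
  fi
}

# Simulate a process that receives SIGKILL, as the OOM killer would send.
status=0
sh -c 'kill -9 $$' || status=$?
classify_exit "$status"
#+end_src

For a real run, the simulated command would be replaced by the loader invocation above,
keeping the =|| status=$?= pattern so that the script does not stop on the failure.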
Next, we create a venv, install swh.loader.git in it and patch dulwich with [1].
This time, the ingestion finishes:
#+begin_src sh
(ve) swhworker@worker0:~$ pip install swh.loader.git
...
(ve) swhworker@worker0:~$ pip list | grep dulwich
dulwich 0.20.25 # differs from the packaged 0.19.11, assumed to be acceptable
(ve) swhworker@worker0:~$ pip list | grep swh.loader.git
swh.loader.git 1.0.1
(ve) swhworker@worker0:~/ve$ time swh loader -C /etc/softwareheritage/loader_git.yml run git https://github.com/keybase/client
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/keybase/client' with type 'git'
Enumerating objects: 868404, done.
Counting objects: 100% (169/169), done.
Compressing objects: 100% (74/74), done.
Total 868404 (delta 104), reused 133 (delta 93), pack-reused 868235
INFO:swh.loader.git.loader.GitLoader:Listed 19852 refs for repo https://github.com/keybase/client
{'status': 'eventful'}
real 183m33.441s
user 56m13.315s
sys 3m4.313s
#+end_src
Then a second visit finishes with the same snapshot:
#+begin_src sh
(ve) swhworker@worker0:~/ve$ time swh loader -C /etc/softwareheritage/loader_git.yml run git https://github.com/keybase/client
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/keybase/client' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 19852 refs for repo https://github.com/keybase/client
{'status': 'uneventful'}
real 0m18.420s
user 0m11.367s
sys 0m0.146s
#+end_src
The resulting snapshot is: \xcddaccc0a2d452098701dec921731e8c96630e2b
#+begin_src sql
07:15:14 swh@db1:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/keybase/client' and ovs.type='git' order by date desc limit 4;
+-------------------------------+---------+-----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| now | id | url | origin | visit | date | status | metadata | snapshot | type |
+-------------------------------+---------+-----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 05:15:23.337205+00 | 1314456 | https://github.com/keybase/client | 1314456 | 4 | 2021-09-26 05:13:05.650233+00 | full | (null) | \xcddaccc0a2d452098701dec921731e8c96630e2b | git |
| 2021-09-26 05:15:23.337205+00 | 1314456 | https://github.com/keybase/client | 1314456 | 4 | 2021-09-26 05:12:49.659434+00 | created | (null) | (null) | git |
| 2021-09-26 05:15:23.337205+00 | 1314456 | https://github.com/keybase/client | 1314456 | 3 | 2021-09-26 03:22:06.031157+00 | full | (null) | \xcddaccc0a2d452098701dec921731e8c96630e2b | git |
| 2021-09-26 05:15:23.337205+00 | 1314456 | https://github.com/keybase/client | 1314456 | 3 | 2021-09-26 03:21:49.699763+00 | created | (null) | (null) | git |
+-------------------------------+---------+-----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(4 rows)
Time: 42.198 ms
#+end_src
This snapshot is the same as the reference one obtained on production (worker17), so the
patch introduces no regression.
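The production/staging comparison in this document is done by reading the psql outputs
side by side. A minimal sketch of how it could be scripted, assuming =psql= access to
both databases; =PROD_DB= and =STAGING_DB= are hypothetical placeholder connection
strings, and the query is a trimmed-down version of the one used throughout this
document. =latest_full_snapshot= reads =status|snapshot= rows, newest first (as printed
by =psql -At -F '|'=), and keeps the snapshot of the first =full= visit status.

#+begin_src sh
#!/bin/sh
# latest_full_snapshot: read "status|snapshot" rows (newest first) on stdin
# and print the snapshot of the first visit status that is 'full'.
latest_full_snapshot() {
  awk -F'|' '$1 == "full" { print $2; exit }'
}

# compare_snapshots REF OTHER: "ok" when both ids are non-empty and equal,
# "X" otherwise (same legend as the summary table).
compare_snapshots() {
  if [ -n "$1" ] && [ "$1" = "$2" ]; then
    echo "ok"
  else
    echo "X"
  fi
}

# Hypothetical usage against both databases (placeholder connection strings):
#   query="select ovs.status, ovs.snapshot
#          from origin o join origin_visit_status ovs on o.id = ovs.origin
#          where o.url = 'https://github.com/keybase/client' and ovs.type = 'git'
#          order by ovs.date desc limit 4"
#   ref=$(psql "$PROD_DB" -At -F '|' -c "$query" | latest_full_snapshot)
#   new=$(psql "$STAGING_DB" -At -F '|' -c "$query" | latest_full_snapshot)
#   compare_snapshots "$ref" "$new"

# Offline example with rows copied from the keybase/client results above.
prod_rows='full|\xcddaccc0a2d452098701dec921731e8c96630e2b
created|'
staging_rows='full|\xcddaccc0a2d452098701dec921731e8c96630e2b
created|'
ref=$(printf '%s\n' "$prod_rows" | latest_full_snapshot)
new=$(printf '%s\n' "$staging_rows" | latest_full_snapshot)
compare_snapshots "$ref" "$new"
#+end_src

Skipping non-=full= rows means a visit still in progress on one side does not produce a
spurious mismatch; its latest completed visit is compared instead.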
** DONE torvalds/linux
CLOSED: [2021-09-26 Sun 11:59]
ingestion (patched):
#+begin_src sh
(ve) swhworker@worker1:~/ve$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_git.yml swh loader run git https://github.com/torvalds/linux
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/torvalds/linux' with type 'git'
Enumerating objects: 8364066, done.
Total 8364066 (delta 0), reused 0 (delta 0), pack-reused 8364066
INFO:swh.loader.git.loader.GitLoader:Listed 1495 refs for repo https://github.com/torvalds/linux
{'status': 'eventful'}
real 1197m38.257s
user 331m18.023s
sys 7m5.545s
#+end_src
snapshot: \xde499fdc325524ee0e7c3f57c6c2ae6a09091845
#+begin_src sql
11:59:26 swh@db1:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/torvalds/linux' and ovs.type='git' order by date desc limit 2;
+-------------------------------+----+-----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| now | id | url | origin | visit | date | status | metadata | snapshot | type |
+-------------------------------+----+-----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 09:59:29.659689+00 | 13 | https://github.com/torvalds/linux | 13 | 9 | 2021-09-26 09:54:04.520031+00 | full | (null) | \xde499fdc325524ee0e7c3f57c6c2ae6a09091845 | git |
| 2021-09-26 09:59:29.659689+00 | 13 | https://github.com/torvalds/linux | 13 | 9 | 2021-09-25 13:56:28.279472+00 | created | (null) | (null) | git |
+-------------------------------+----+-----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
Time: 7.659 ms
#+end_src
** DONE rdicosmo/parmap
CLOSED: [2021-09-26 Sun 11:45]
ingestion (patched):
#+begin_src sh
(ve) swhworker@worker0:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_git.yml swh loader run git https://github.com/rdicosmo/parmap
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/rdicosmo/parmap' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 108 refs for repo https://github.com/rdicosmo/parmap
{'status': 'uneventful'}
real 0m6.287s
user 0m1.332s
sys 0m0.081s
#+end_src
snapshot: \x2d869aa00591d2ac8ec8e7abacdda563d413189d
#+begin_src sql
11:42:11 swh@db1:5432=> select * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/rdicosmo/parmap' and ovs.type='git' order by date desc limit 2;
+----+------------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| id | url | origin | visit | date | status | metadata | snapshot | type |
+----+------------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 14 | https://github.com/rdicosmo/parmap | 14 | 170 | 2021-09-26 09:41:48.690862+00 | full | (null) | \x2d869aa00591d2ac8ec8e7abacdda563d413189d | git |
| 14 | https://github.com/rdicosmo/parmap | 14 | 170 | 2021-09-26 09:41:44.382767+00 | created | (null) | (null) | git |
+----+------------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
Time: 7.702 ms
#+end_src
** DONE hylang/hy
CLOSED: [2021-09-26 Sun 11:57]
ingestion (patched):
#+begin_src sh
(ve) swhworker@worker0:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_git.yml swh loader run git https://github.com/hylang/hy
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/hylang/hy' with type 'git'
Enumerating objects: 19814, done.
Counting objects: 100% (2898/2898), done.
Compressing objects: 100% (881/881), done.
Total 19814 (delta 2119), reused 2669 (delta 2009), pack-reused 16916
INFO:swh.loader.git.loader.GitLoader:Listed 1140 refs for repo https://github.com/hylang/hy
{'status': 'eventful'}
real 1m17.035s
user 0m17.007s
sys 0m0.217s
#+end_src
snapshot: \x821f28af45edaedc6f70b84c9bc4d407e7436452
#+begin_src sql
11:42:14 swh@db1:5432=> select * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/hylang/hy' and ovs.type='git' order by date desc limit 2;
+----+------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| id | url | origin | visit | date | status | metadata | snapshot | type |
+----+------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 15 | https://github.com/hylang/hy | 15 | 2 | 2021-09-26 09:51:06.405326+00 | full | (null) | \x821f28af45edaedc6f70b84c9bc4d407e7436452 | git |
| 15 | https://github.com/hylang/hy | 15 | 2 | 2021-09-26 09:49:50.879993+00 | created | (null) | (null) | git |
+----+------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
Time: 8.428 ms
#+end_src
** DONE hylang/hyrule
CLOSED: [2021-09-26 Sun 13:19]
ingestion:
#+begin_src sh
(ve) swhworker@worker1:~$ SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_git.yml swh loader run git https://github.com/hylang/hyrule
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/hylang/hyrule' with type 'git'
Enumerating objects: 220, done.
Counting objects: 100% (208/208), done.
Compressing objects: 100% (121/121), done.
Total 220 (delta 97), reused 164 (delta 83), pack-reused 12
INFO:swh.loader.git.loader.GitLoader:Listed 5 refs for repo https://github.com/hylang/hyrule
#+end_src
(not timed, but it was almost immediate, as on the production worker)
snapshot: \x882db61b629bd9f2c7ef3492924e3ff73382d3a6
#+begin_src sql
13:03:26 swh@db1:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/hylang/hyrule' and ovs.type='git' order by date desc limit 2;
+-------------------------------+---------+----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| now | id | url | origin | visit | date | status | metadata | snapshot | type |
+-------------------------------+---------+----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 11:16:16.318671+00 | 1358887 | https://github.com/hylang/hyrule | 1358887 | 1 | 2021-09-26 11:14:39.858552+00 | full | (null) | \x882db61b629bd9f2c7ef3492924e3ff73382d3a6 | git |
| 2021-09-26 11:16:16.318671+00 | 1358887 | https://github.com/hylang/hyrule | 1358887 | 1 | 2021-09-26 11:14:33.31341+00 | created | (null) | (null) | git |
+-------------------------------+---------+----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
Time: 7.015 ms
#+end_src
** DONE kubernetes/kubernetes
CLOSED: [2021-09-26 Sun 13:19]
ingestion:
#+begin_src sh
(ve) swhworker@worker0:~$ time swh loader -C /etc/softwareheritage/loader_git.yml run git https://github.com/kubernetes/kubernetes
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/kubernetes/kubernetes' with type 'git'
Enumerating objects: 1965974, done.
Counting objects: 100% (2460/2460), done.
Compressing objects: 100% (1377/1377), done.
Total 1965974 (delta 1373), reused 1136 (delta 1079), pack-reused 1963514
INFO:swh.loader.git.loader.GitLoader:Listed 87799 refs for repo https://github.com/kubernetes/kubernetes
{'status': 'eventful'}
real 95m21.555s
user 41m1.660s
sys 1m31.310s
#+end_src
snapshot: \xa2a6299e3527bbba548eec0f0ef80cca9e80f545
#+begin_src sql
11:59:29 swh@db1:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/kubernetes/kubernetes' and ovs.type='git' order by date desc limit 2;
+-------------------------------+---------+------------------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| now | id | url | origin | visit | date | status | metadata | snapshot | type |
+-------------------------------+---------+------------------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 11:03:26.822212+00 | 1356011 | https://github.com/kubernetes/kubernetes | 1356011 | 1 | 2021-09-26 10:46:41.377103+00 | full | (null) | \xa2a6299e3527bbba548eec0f0ef80cca9e80f545 | git |
| 2021-09-26 11:03:26.822212+00 | 1356011 | https://github.com/kubernetes/kubernetes | 1356011 | 1 | 2021-09-26 09:11:21.746195+00 | created | (null) | (null) | git |
+-------------------------------+---------+------------------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
Time: 6.178 ms
#+end_src
** DONE NixOS/nixpkgs
CLOSED: [2021-09-27 Mon 09:58]
ingestion:
#+begin_src sh
(ve) swhworker@worker2:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_git.yml swh loader run git https://github.com/NixOS/nixpkgs
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/NixOS/nixpkgs' with type 'git'
Enumerating objects: 2829935, done.
Counting objects: 100% (3480/3480), done.
Compressing objects: 100% (235/235), done.
Total 2829935 (delta 3339), reused 3317 (delta 3234), pack-reused 2826455
INFO:swh.loader.git.loader.GitLoader:Listed 117486 refs for repo https://github.com/NixOS/nixpkgs
{'status': 'eventful'}
real 498m20.101s
user 220m28.927s
sys 3m47.302s
#+end_src
snapshot: \xda0e3e4a3eff6fb6370259fd2bdfcf932fa6ac69
#+begin_src sql
09:57:29 swh@db1:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/NixOS/nixpkgs' and ovs.type='git' order by date desc limit 2;
+------------------------------+----+----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| now | id | url | origin | visit | date | status | metadata | snapshot | type |
+------------------------------+----+----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-27 07:57:31.15406+00 | 18 | https://github.com/NixOS/nixpkgs | 18 | 6 | 2021-09-26 17:37:11.095706+00 | full | (null) | \xda0e3e4a3eff6fb6370259fd2bdfcf932fa6ac69 | git |
| 2021-09-27 07:57:31.15406+00 | 18 | https://github.com/NixOS/nixpkgs | 18 | 6 | 2021-09-26 09:18:52.800631+00 | created | (null) | (null) | git |
+------------------------------+----+----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
Time: 20.222 ms
#+end_src
** IN-PROGRESS CocoaPods/Specs