#+title: Patching Dulwich to decrease memory footprint
#+author: ardumont

In the following analysis, we execute multiple ingestions with and without the
patched [1] dulwich version.

The goal is to ensure that the patch reduces the memory footprint enough for the
ingestion to run to completion on our standard workers, without altering the
standard swh hash computations. For this last check, we ensure the snapshot
hashes are the same at the end of the ingestions (with standard and patched
workers). As the swh model is a Merkle DAG and the snapshot is the top-level
object of that model, comparing snapshot hashes is enough.

tl;dr: the patch fixes the problem without hash divergences.

|-----------------------+-------------------------+---------------+--------------------------+--------------------------------------------|
| origin (large)        | run (standard)          | run (patched) | snapshot hash comparison | snapshot hash                              |
|                       | X: failed               |               | ok: same hash both with  |                                            |
|                       | ok: visit finished      |               | standard/patched version |                                            |
|                       | X: visit did not finish |               |                          |                                            |
|-----------------------+-------------------------+---------------+--------------------------+--------------------------------------------|
| keybase/client        | X (worker0.staging)     | ok            | ok                       | \xcddaccc0a2d452098701dec921731e8c96630e2b |
| keybase/client        | ok                      |               |                          |                                            |
|-----------------------+-------------------------+---------------+--------------------------+--------------------------------------------|
| torvalds/linux        | ok                      | ok            | ok                       | \xde499fdc325524ee0e7c3f57c6c2ae6a09091845 |
|-----------------------+-------------------------+---------------+--------------------------+--------------------------------------------|
| kubernetes/kubernetes | ok                      | ok            | ok                       | \xa2a6299e3527bbba548eec0f0ef80cca9e80f545 |
|-----------------------+-------------------------+---------------+--------------------------+--------------------------------------------|
| NixOS/nixpkgs         | ok                      | ok            | ok                       | \xda0e3e4a3eff6fb6370259fd2bdfcf932fa6ac69 |
|-----------------------+-------------------------+---------------+--------------------------+--------------------------------------------|
| CocoaPods/Specs       | ongoing                 | ongoing       | ongoing                  |                                            |
|-----------------------+-------------------------+---------------+--------------------------+--------------------------------------------|
| origin (medium)       | run (standard)          | run (patched) | snapshot hash comparison | snapshot                                   |
|-----------------------+-------------------------+---------------+--------------------------+--------------------------------------------|
| rdicosmo/parmap       | ok                      | ok            | ok                       | \x2d869aa00591d2ac8ec8e7abacdda563d413189d |
|-----------------------+-------------------------+---------------+--------------------------+--------------------------------------------|
| hylang/hy             | ok                      | ok            | ok                       | \x821f28af45edaedc6f70b84c9bc4d407e7436452 |
|-----------------------+-------------------------+---------------+--------------------------+--------------------------------------------|
| hylang/hyrule         | ok                      | ok            | ok                       | \x882db61b629bd9f2c7ef3492924e3ff73382d3a6 |
|-----------------------+-------------------------+---------------+--------------------------+--------------------------------------------|

If you are not interested in the details, you can stop reading here.

The following nodes are being used:

- worker17 (production): a large machine (64 GiB RAM, 20 CPUs) able to handle
  the ingestion of current large repositories without being killed. Ingestion
  is expected to work as is, given enough time. The snapshots resulting from
  these loadings become the references.

- worker[0:2] (staging): these nodes are smaller (12 GiB RAM, 4 CPUs). Without
  a patched dulwich version, the ingestion of large repositories fails on them
  (OOM kill). With the patch applied, the loading is expected to work. We shall
  then be able to compare the snapshots between the runs. The resulting
  snapshots should be the same as the ones generated on worker17 [2].

Note that the ingestion timing is not important for the analysis. The staging
workers are expected to be slower since the machines do not have the same
specs. Plus, the underlying databases do not hold the same information: the
production one is more complete than the staging one (although the latter is
less loaded). Timings are only reported to give a rough idea of the order of
magnitude of the time it takes to ingest those origins. Again, the most
important criterion is that the ingestion finishes with the same snapshot.

[1] git pack walking in DFS instead of BFS order
https://github.com/dulwich/dulwich/pull/903

[2] provided the origins were in the same state at the time of ingestion (the
snapshot also depends on the data being the same).

* TODO worker17 (production, standard version) [7/8]
** DONE keybase/client run
CLOSED: [2021-09-26 Sun 07:30]

#+begin_src sh
swhworker@worker17:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_oneshot.yml swh loader run git https://github.com/keybase/client
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/keybase/client' with type 'git'
Enumerating objects: 431414, done.
Counting objects: 100% (76/76), done.
Compressing objects: 100% (45/45), done.
Total 431414 (delta 33), reused 67 (delta 29), pack-reused 431338
INFO:swh.loader.git.loader.GitLoader:Listed 19852 refs for repo https://github.com/keybase/client
{'status': 'eventful'}

real    64m21.848s
user    51m58.363s
sys     6m38.612s
swhworker@worker17:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_oneshot.yml swh loader run git https://github.com/keybase/client
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/keybase/client' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 19852 refs for repo https://github.com/keybase/client
{'status': 'uneventful'}

real    0m29.048s
user    0m24.279s
sys     0m0.742s
#+end_src

Snapshot: \xcddaccc0a2d452098701dec921731e8c96630e2b

#+begin_src sql
07:13:24 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/keybase/client' and ovs.type='git' order by date desc limit 4;
+-------------------------------+----------+-----------------------------------+----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
|              now              |    id    |                url                |  origin  | visit |             date              | status  | metadata |                 snapshot                 | type |
+-------------------------------+----------+-----------------------------------+----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 05:14:23.035696+00 | 34438017 | https://github.com/keybase/client | 34438017 |   113 | 2021-09-26 05:13:14.294663+00 | full    | (null)   | \xcddaccc0a2d452098701dec921731e8c96630e2b | git  |
| 2021-09-26 05:14:23.035696+00 | 34438017 | https://github.com/keybase/client | 34438017 |   113 | 2021-09-26 05:12:47.255969+00 | created | (null)   | (null)                                     | git  |
| 2021-09-26 05:14:23.035696+00 | 34438017 | https://github.com/keybase/client | 34438017 |   112 | 2021-09-26 04:28:44.02918+00  | full    | (null)   | \xcddaccc0a2d452098701dec921731e8c96630e2b | git  |
| 2021-09-26 05:14:23.035696+00 | 34438017 | https://github.com/keybase/client | 34438017 |   112 | 2021-09-26 03:24:25.041466+00 | created | (null)   | (null)                                     | git  |
+-------------------------------+----------+-----------------------------------+----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(4 rows)

Time: 8.674 ms
#+end_src

** DONE torvalds/linux
CLOSED: [2021-09-26 Sun 11:42]

#+begin_src sh
swhworker@worker17:~$ time swh loader -C /etc/softwareheritage/loader_oneshot.yml run git https://github.com/torvalds/linux
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/torvalds/linux' with type 'git'
Enumerating objects: 8350856, done.
Total 8350856 (delta 0), reused 0 (delta 0), pack-reused 8350856
INFO:swh.loader.git.loader.GitLoader:Listed 1495 refs for repo https://github.com/torvalds/linux
{'status': 'eventful'}

real    1095m42.647s
user    891m49.090s
sys     34m46.475s
#+end_src

snapshot: \xde499fdc325524ee0e7c3f57c6c2ae6a09091845

#+begin_src sql
11:02:07 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/torvalds/linux' and ovs.type='git' order by date desc limit 2;
+-------------------------------+----+-----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
|              now              | id |                url                | origin | visit |             date              | status  | metadata |                 snapshot                 | type |
+-------------------------------+----+-----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 09:02:11.975793+00 |  2 | https://github.com/torvalds/linux |      2 |    74 | 2021-09-26 08:11:27.373415+00 | full    | (null)   | \xde499fdc325524ee0e7c3f57c6c2ae6a09091845 | git  |
| 2021-09-26 09:02:11.975793+00 |  2 | https://github.com/torvalds/linux |      2 |    74 | 2021-09-25 13:55:47.466862+00 | created | (null)   | (null)                                     | git  |
+-------------------------------+----+-----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
#+end_src

** DONE rdicosmo/parmap
CLOSED: [2021-09-26 Sun 11:43]

ingestion:

#+begin_src sh
swhworker@worker17:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_oneshot.yml swh loader run git https://github.com/rdicosmo/parmap
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/rdicosmo/parmap' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 108 refs for repo https://github.com/rdicosmo/parmap
{'status': 'uneventful'}

real    0m3.547s
user    0m2.014s
sys     0m0.444s
#+end_src

snapshot: \x2d869aa00591d2ac8ec8e7abacdda563d413189d

#+begin_src sql
11:28:34 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/rdicosmo/parmap' and ovs.type='git' order by date desc limit 2;
+-------------------------------+---------+------------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
|              now              |   id    |                url                 | origin  | visit |             date              | status  | metadata |                 snapshot                 | type |
+-------------------------------+---------+------------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 09:42:22.148212+00 | 2695882 | https://github.com/rdicosmo/parmap | 2695882 |   220 | 2021-09-26 09:41:47.376616+00 | full    | (null)   | \x2d869aa00591d2ac8ec8e7abacdda563d413189d | git  |
| 2021-09-26 09:42:22.148212+00 | 2695882 | https://github.com/rdicosmo/parmap | 2695882 |   220 | 2021-09-26 09:41:46.242228+00 | created | (null)   | (null)                                     | git  |
+-------------------------------+---------+------------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)
#+end_src

** DONE hylang/hy
CLOSED: [2021-09-26 Sun 11:58]

ingestion:

#+begin_src sh
swhworker@worker17:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_oneshot.yml swh loader run git https://github.com/hylang/hy
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/hylang/hy' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 1140 refs for repo https://github.com/hylang/hy
{'status': 'uneventful'}

real    0m3.149s
user    0m1.933s
sys     0m0.406s
#+end_src

snapshot: \x821f28af45edaedc6f70b84c9bc4d407e7436452

#+begin_src sql
11:42:22 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/hylang/hy' and ovs.type='git' order by date desc limit 2;
+------------------------------+----+------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
|             now              | id |             url              | origin | visit |             date              | status  | metadata |                 snapshot                 | type |
+------------------------------+----+------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 09:51:36.06779+00 |  1 | https://github.com/hylang/hy |      1 |    37 | 2021-09-26 09:49:55.773545+00 | full    | (null)   | \x821f28af45edaedc6f70b84c9bc4d407e7436452 | git  |
| 2021-09-26 09:51:36.06779+00 |  1 | https://github.com/hylang/hy |      1 |    37 | 2021-09-26 09:49:54.686042+00 | created | (null)   | (null)                                     | git  |
+------------------------------+----+------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)

Time: 6.993 ms
#+end_src

** DONE hylang/hyrule
CLOSED: [2021-09-26 Sun 13:19]

ingestion:

#+begin_src sh
swhworker@worker17:~$ time swh loader -C /etc/softwareheritage/loader_oneshot.yml run git https://github.com/hylang/hyrule
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/hylang/hyrule' with type 'git'
Enumerating objects: 220, done.
Counting objects: 100% (208/208), done.
Compressing objects: 100% (121/121), done.
Total 220 (delta 97), reused 164 (delta 83), pack-reused 12
INFO:swh.loader.git.loader.GitLoader:Listed 5 refs for repo https://github.com/hylang/hyrule
{'status': 'eventful'}

real    0m14.348s
user    0m1.861s
sys     0m0.492s
#+end_src

snapshot: \x882db61b629bd9f2c7ef3492924e3ff73382d3a6

#+begin_src sql
13:06:38 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/hylang/hyrule' and ovs.type='git' order by date desc limit 2;
+-------------------------------+-----------+----------------------------------+-----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
|              now              |    id     |               url                |  origin   | visit |             date              | status  | metadata |                 snapshot                 | type |
+-------------------------------+-----------+----------------------------------+-----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 11:15:44.138959+00 | 164246965 | https://github.com/hylang/hyrule | 164246965 |     1 | 2021-09-26 11:14:46.964276+00 | full    | (null)   | \x882db61b629bd9f2c7ef3492924e3ff73382d3a6 | git  |
| 2021-09-26 11:15:44.138959+00 | 164246965 | https://github.com/hylang/hyrule | 164246965 |     1 | 2021-09-26 11:14:34.557485+00 | created | (null)   | (null)                                     | git  |
+-------------------------------+-----------+----------------------------------+-----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)

Time: 5.845 ms
#+end_src

** DONE kubernetes/kubernetes
CLOSED: [2021-09-26 Sun 13:19]

ingestion:

#+begin_src sh
swhworker@worker17:~$ time swh loader -C /etc/softwareheritage/loader_oneshot.yml run git https://github.com/kubernetes/kubernetes
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/kubernetes/kubernetes' with type 'git'
Enumerating objects: 1177718, done.
Counting objects: 100% (458/458), done.
Compressing objects: 100% (234/234), done.
Total 1177718 (delta 256), reused 276 (delta 220), pack-reused 1177260
INFO:swh.loader.git.loader.GitLoader:Listed 87799 refs for repo https://github.com/kubernetes/kubernetes
{'status': 'eventful'}

real    88m19.138s
user    65m20.258s
sys     3m33.093s
#+end_src

snapshot: \xa2a6299e3527bbba548eec0f0ef80cca9e80f545

#+begin_src sql
11:59:20 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/kubernetes/kubernetes' and ovs.type='git' order by date desc limit 2;
+-------------------------------+----------+------------------------------------------+----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
|              now              |    id    |                   url                    |  origin  | visit |             date              | status  | metadata |                 snapshot                 | type |
+-------------------------------+----------+------------------------------------------+----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 11:06:38.084217+00 | 25254667 | https://github.com/kubernetes/kubernetes | 25254667 |    85 | 2021-09-26 10:39:37.910535+00 | full    | (null)   | \xa2a6299e3527bbba548eec0f0ef80cca9e80f545 | git  |
| 2021-09-26 11:06:38.084217+00 | 25254667 | https://github.com/kubernetes/kubernetes | 25254667 |    85 | 2021-09-26 09:11:21.008026+00 | created | (null)   | (null)                                     | git  |
+-------------------------------+----------+------------------------------------------+----------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)

Time: 27.147 ms
#+end_src

** DONE NixOS/nixpkgs
CLOSED: [2021-09-26 Sun 17:17]

ingestion:

#+begin_src sh
swhworker@worker17:~$ time swh loader -C /etc/softwareheritage/loader_oneshot.yml run git https://github.com/NixOS/nixpkgs
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/NixOS/nixpkgs' with type 'git'
Enumerating objects: 2562950, done.
Counting objects: 100% (272/272), done.
Compressing objects: 100% (148/148), done.
Total 2562950 (delta 162), reused 193 (delta 113), pack-reused 2562678
INFO:swh.loader.git.loader.GitLoader:Listed 117486 refs for repo https://github.com/NixOS/nixpkgs
{'status': 'eventful'}

real    462m33.180s
user    397m36.670s
sys     11m31.401s
#+end_src

snapshot: \xda0e3e4a3eff6fb6370259fd2bdfcf932fa6ac69

#+begin_src sql
13:15:44 softwareheritage@belvedere:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/NixOS/nixpkgs' and ovs.type='git' order by date desc limit 2;
+-------------------------------+---------+----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
|              now              |   id    |               url                | origin  | visit |             date              | status  | metadata |                 snapshot                 | type |
+-------------------------------+---------+----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 17:17:55.707596+00 | 1927035 | https://github.com/NixOS/nixpkgs | 1927035 |   129 | 2021-09-26 17:01:22.199577+00 | full    | (null)   | \xda0e3e4a3eff6fb6370259fd2bdfcf932fa6ac69 | git  |
| 2021-09-26 17:17:55.707596+00 | 1927035 | https://github.com/NixOS/nixpkgs | 1927035 |   129 | 2021-09-26 09:18:51.295111+00 | created | (null)   | (null)                                     | git  |
+-------------------------------+---------+----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)

Time: 30.475 ms
#+end_src

** IN-PROGRESS CocoaPods/Specs

* TODO workers (staging, patched) [7/8]
** DONE keybase/client tryout
CLOSED: [2021-09-26 Sun 13:01]

This node loads the keybase/client repository twice: first with the current
loader.git (which has the memory issue), then with dulwich patched with our
proposed PR [1].

First ingestion, without any patch: the loading does not finish, it gets killed:

#+begin_src sh
swhworker@worker0:~/ve$ dpkg -l python3-swh.loader.git python3-dulwich | grep ii
ii  python3-dulwich        0.19.11-2            amd64 Python Git library - Python3 module
ii  python3-swh.loader.git 1.0.1-1~swh1~bpo10+1 all   Software Heritage Git loader
swhworker@worker0:~$ time swh loader -C /etc/softwareheritage/loader_git.yml run git https://github.com/keybase/client
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/keybase/client' with type 'git'
Enumerating objects: 868404, done.
Counting objects: 100% (169/169), done.
Compressing objects: 100% (74/74), done.
Total 868404 (delta 104), reused 133 (delta 93), pack-reused 868235
INFO:swh.loader.git.loader.GitLoader:Listed 19852 refs for repo https://github.com/keybase/client
Killed

real    36m26.230s
user    4m29.100s
sys     0m34.254s
#+end_src

Second ingestion, after creating a venv, installing swh.loader.git and patching
dulwich with [1]: the ingestion finishes:

#+begin_src sh
(ve) swhworker@worker0:~$ pip install swh.loader.git
...
(ve) swhworker@worker0:~$ pip list | grep dulwich
dulwich 0.20.25  # ----------------> assuming it's ok to diverge a bit
(ve) swhworker@worker0:~$ pip list | grep swh.loader.git
swh.loader.git 1.0.1
(ve) swhworker@worker0:~/ve$ time swh loader -C /etc/softwareheritage/loader_git.yml run git https://github.com/keybase/client
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/keybase/client' with type 'git'
Enumerating objects: 868404, done.
Counting objects: 100% (169/169), done.
Compressing objects: 100% (74/74), done.
Total 868404 (delta 104), reused 133 (delta 93), pack-reused 868235
INFO:swh.loader.git.loader.GitLoader:Listed 19852 refs for repo https://github.com/keybase/client
{'status': 'eventful'}

real    183m33.441s
user    56m13.315s
sys     3m4.313s
#+end_src

Then another visit, which finishes with the same snapshot:

#+begin_src sh
(ve) swhworker@worker0:~/ve$ time swh loader -C /etc/softwareheritage/loader_git.yml run git https://github.com/keybase/client
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/keybase/client' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 19852 refs for repo https://github.com/keybase/client
{'status': 'uneventful'}

real    0m18.420s
user    0m11.367s
sys     0m0.146s
#+end_src

The resulting snapshot is: \xcddaccc0a2d452098701dec921731e8c96630e2b

#+begin_src sql
07:15:14 swh@db1:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/keybase/client' and ovs.type='git' order by date desc limit 4;
+-------------------------------+---------+-----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
|              now              |   id    |                url                | origin  | visit |             date              | status  | metadata |                 snapshot                 | type |
+-------------------------------+---------+-----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 05:15:23.337205+00 | 1314456 | https://github.com/keybase/client | 1314456 |     4 | 2021-09-26 05:13:05.650233+00 | full    | (null)   | \xcddaccc0a2d452098701dec921731e8c96630e2b | git  |
| 2021-09-26 05:15:23.337205+00 | 1314456 | https://github.com/keybase/client | 1314456 |     4 | 2021-09-26 05:12:49.659434+00 | created | (null)   | (null)                                     | git  |
| 2021-09-26 05:15:23.337205+00 | 1314456 | https://github.com/keybase/client | 1314456 |     3 | 2021-09-26 03:22:06.031157+00 | full    | (null)   | \xcddaccc0a2d452098701dec921731e8c96630e2b | git  |
| 2021-09-26 05:15:23.337205+00 | 1314456 | https://github.com/keybase/client | 1314456 |     3 | 2021-09-26 03:21:49.699763+00 | created | (null)   | (null)                                     | git  |
+-------------------------------+---------+-----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(4 rows)

Time: 42.198 ms
#+end_src

Let's check that this snapshot matches the production one: if it does, the
patch is correct. It does (\xcddaccc0a2d452098701dec921731e8c96630e2b on both
sides), so no regression is introduced.

** DONE torvalds/linux
CLOSED: [2021-09-26 Sun 11:59]

ingestion (patched):

#+begin_src sh
(ve) swhworker@worker1:~/ve$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_git.yml swh loader run git https://github.com/torvalds/linux
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/torvalds/linux' with type 'git'
Enumerating objects: 8364066, done.
Total 8364066 (delta 0), reused 0 (delta 0), pack-reused 8364066
INFO:swh.loader.git.loader.GitLoader:Listed 1495 refs for repo https://github.com/torvalds/linux
{'status': 'eventful'}

real    1197m38.257s
user    331m18.023s
sys     7m5.545s
#+end_src

snapshot: \xde499fdc325524ee0e7c3f57c6c2ae6a09091845

#+begin_src sql
11:59:26 swh@db1:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/torvalds/linux' and ovs.type='git' order by date desc limit 2;
+-------------------------------+----+-----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
|              now              | id |                url                | origin | visit |             date              | status  | metadata |                 snapshot                 | type |
+-------------------------------+----+-----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 09:59:29.659689+00 | 13 | https://github.com/torvalds/linux |     13 |     9 | 2021-09-26 09:54:04.520031+00 | full    | (null)   | \xde499fdc325524ee0e7c3f57c6c2ae6a09091845 | git  |
| 2021-09-26 09:59:29.659689+00 | 13 | https://github.com/torvalds/linux |     13 |     9 | 2021-09-25 13:56:28.279472+00 | created | (null)   | (null)                                     | git  |
+-------------------------------+----+-----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)

Time: 7.659 ms
#+end_src

** DONE rdicosmo/parmap
CLOSED: [2021-09-26 Sun 11:45]

ingestion (patched):

#+begin_src sh
(ve) swhworker@worker0:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_git.yml swh loader run git https://github.com/rdicosmo/parmap
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/rdicosmo/parmap' with type 'git'
INFO:swh.loader.git.loader.GitLoader:Listed 108 refs for repo https://github.com/rdicosmo/parmap
{'status': 'uneventful'}

real    0m6.287s
user    0m1.332s
sys     0m0.081s
#+end_src

snapshot: \x2d869aa00591d2ac8ec8e7abacdda563d413189d

#+begin_src sql
11:42:11 swh@db1:5432=> select * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/rdicosmo/parmap' and ovs.type='git' order by date desc limit 2;
+----+------------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| id |                url                 | origin | visit |             date              | status  | metadata |                 snapshot                 | type |
+----+------------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 14 | https://github.com/rdicosmo/parmap |     14 |   170 | 2021-09-26 09:41:48.690862+00 | full    | (null)   | \x2d869aa00591d2ac8ec8e7abacdda563d413189d | git  |
| 14 | https://github.com/rdicosmo/parmap |     14 |   170 | 2021-09-26 09:41:44.382767+00 | created | (null)   | (null)                                     | git  |
+----+------------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)

Time: 7.702 ms
#+end_src

** DONE hylang/hy
CLOSED: [2021-09-26 Sun 11:57]

ingestion (patched):

#+begin_src sh
(ve) swhworker@worker0:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_git.yml swh loader run git https://github.com/hylang/hy
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/hylang/hy' with type 'git'
Enumerating objects: 19814, done.
Counting objects: 100% (2898/2898), done.
Compressing objects: 100% (881/881), done.
Total 19814 (delta 2119), reused 2669 (delta 2009), pack-reused 16916
INFO:swh.loader.git.loader.GitLoader:Listed 1140 refs for repo https://github.com/hylang/hy
{'status': 'eventful'}

real    1m17.035s
user    0m17.007s
sys     0m0.217s
#+end_src

snapshot: \x821f28af45edaedc6f70b84c9bc4d407e7436452

#+begin_src sql
11:42:14 swh@db1:5432=> select * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/hylang/hy' and ovs.type='git' order by date desc limit 2;
+----+------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| id |             url              | origin | visit |             date              | status  | metadata |                 snapshot                 | type |
+----+------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 15 | https://github.com/hylang/hy |     15 |     2 | 2021-09-26 09:51:06.405326+00 | full    | (null)   | \x821f28af45edaedc6f70b84c9bc4d407e7436452 | git  |
| 15 | https://github.com/hylang/hy |     15 |     2 | 2021-09-26 09:49:50.879993+00 | created | (null)   | (null)                                     | git  |
+----+------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)

Time: 8.428 ms
#+end_src

** DONE hylang/hyrule
CLOSED: [2021-09-26 Sun 13:19]

ingestion:

#+begin_src sh
(ve) swhworker@worker1:~$ SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_git.yml swh loader run git https://github.com/hylang/hyrule
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/hylang/hyrule' with type 'git'
Enumerating objects: 220, done.
Counting objects: 100% (208/208), done.
Compressing objects: 100% (121/121), done.
Total 220 (delta 97), reused 164 (delta 83), pack-reused 12
INFO:swh.loader.git.loader.GitLoader:Listed 5 refs for repo https://github.com/hylang/hyrule
#+end_src

(I forgot to time it, but it was almost immediate, as on the production worker.)

snapshot: \x882db61b629bd9f2c7ef3492924e3ff73382d3a6

#+begin_src sql
13:03:26 swh@db1:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/hylang/hyrule' and ovs.type='git' order by date desc limit 2;
+-------------------------------+---------+----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
|              now              |   id    |               url                | origin  | visit |             date              | status  | metadata |                 snapshot                 | type |
+-------------------------------+---------+----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 11:16:16.318671+00 | 1358887 | https://github.com/hylang/hyrule | 1358887 |     1 | 2021-09-26 11:14:39.858552+00 | full    | (null)   | \x882db61b629bd9f2c7ef3492924e3ff73382d3a6 | git  |
| 2021-09-26 11:16:16.318671+00 | 1358887 | https://github.com/hylang/hyrule | 1358887 |     1 | 2021-09-26 11:14:33.31341+00  | created | (null)   | (null)                                     | git  |
+-------------------------------+---------+----------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)

Time: 7.015 ms
#+end_src

** DONE kubernetes/kubernetes
CLOSED: [2021-09-26 Sun 13:19]

ingestion:

#+begin_src sh
(ve) swhworker@worker0:~$ time swh loader -C /etc/softwareheritage/loader_git.yml run git https://github.com/kubernetes/kubernetes
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/kubernetes/kubernetes' with type 'git'
Enumerating objects: 1965974, done.
Counting objects: 100% (2460/2460), done.
Compressing objects: 100% (1377/1377), done.
Total 1965974 (delta 1373), reused 1136 (delta 1079), pack-reused 1963514
INFO:swh.loader.git.loader.GitLoader:Listed 87799 refs for repo https://github.com/kubernetes/kubernetes
{'status': 'eventful'}

real    95m21.555s
user    41m1.660s
sys     1m31.310s
#+end_src

snapshot: \xa2a6299e3527bbba548eec0f0ef80cca9e80f545

#+begin_src sql
11:59:29 swh@db1:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/kubernetes/kubernetes' and ovs.type='git' order by date desc limit 2;
+-------------------------------+---------+------------------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
|              now              |   id    |                   url                    | origin  | visit |             date              | status  | metadata |                 snapshot                 | type |
+-------------------------------+---------+------------------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-26 11:03:26.822212+00 | 1356011 | https://github.com/kubernetes/kubernetes | 1356011 |     1 | 2021-09-26 10:46:41.377103+00 | full    | (null)   | \xa2a6299e3527bbba548eec0f0ef80cca9e80f545 | git  |
| 2021-09-26 11:03:26.822212+00 | 1356011 | https://github.com/kubernetes/kubernetes | 1356011 |     1 | 2021-09-26 09:11:21.746195+00 | created | (null)   | (null)                                     | git  |
+-------------------------------+---------+------------------------------------------+---------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)

Time: 6.178 ms
#+end_src

** DONE NixOS/nixpkgs
CLOSED: [2021-09-27 Mon 09:58]

ingestion:

#+begin_src sh
(ve) swhworker@worker2:~$ time SWH_CONFIG_FILENAME=/etc/softwareheritage/loader_git.yml swh loader run git https://github.com/NixOS/nixpkgs
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/NixOS/nixpkgs' with type 'git'
Enumerating objects: 2829935, done.
Counting objects: 100% (3480/3480), done.
Compressing objects: 100% (235/235), done.
Total 2829935 (delta 3339), reused 3317 (delta 3234), pack-reused 2826455
INFO:swh.loader.git.loader.GitLoader:Listed 117486 refs for repo https://github.com/NixOS/nixpkgs
{'status': 'eventful'}

real    498m20.101s
user    220m28.927s
sys     3m47.302s
#+end_src

snapshot: \xda0e3e4a3eff6fb6370259fd2bdfcf932fa6ac69

#+begin_src sql
09:57:29 swh@db1:5432=> select now(), * from origin o inner join origin_visit_status ovs on o.id=ovs.origin where o.url = 'https://github.com/NixOS/nixpkgs' and ovs.type='git' order by date desc limit 2;
+------------------------------+----+----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
|             now              | id |               url                | origin | visit |             date              | status  | metadata |                 snapshot                 | type |
+------------------------------+----+----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
| 2021-09-27 07:57:31.15406+00 | 18 | https://github.com/NixOS/nixpkgs |     18 |     6 | 2021-09-26 17:37:11.095706+00 | full    | (null)   | \xda0e3e4a3eff6fb6370259fd2bdfcf932fa6ac69 | git  |
| 2021-09-27 07:57:31.15406+00 | 18 | https://github.com/NixOS/nixpkgs |     18 |     6 | 2021-09-26 09:18:52.800631+00 | created | (null)   | (null)                                     | git  |
+------------------------------+----+----------------------------------+--------+-------+-------------------------------+---------+----------+--------------------------------------------+------+
(2 rows)

Time: 20.222 ms
#+end_src

** IN-PROGRESS CocoaPods/Specs
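
* Sketch: why comparing snapshot hashes is enough

The analysis relies on the swh model being a Merkle DAG: since the snapshot is
the top-level object, getting the same snapshot hash from the standard and the
patched runs implies the objects below it are the same too. The toy example
below illustrates that reasoning only. The =node_id= and =snapshot_id= helpers
are hypothetical and use a plain SHA1 over concatenated bytes, which is an
assumption made for illustration and not the actual swh identifier scheme.

#+begin_src python
import hashlib


def node_id(payload, children):
    """Toy identifier: hash of a node's payload plus its children's ids."""
    h = hashlib.sha1()
    h.update(payload)
    for child in sorted(children):
        h.update(child)
    return h.digest()


def snapshot_id(branches):
    """Toy snapshot: each branch points at a revision holding file contents.

    `branches` maps a branch name to a list of file contents (bytes).
    """
    revision_ids = []
    for name, files in sorted(branches.items()):
        content_ids = [node_id(data, []) for data in files]
        directory_id = node_id(b"directory", content_ids)
        revision_ids.append(node_id(name.encode(), [directory_id]))
    return node_id(b"snapshot", revision_ids).hex()


# Two independent "runs" over the same data, and one with a differing leaf.
standard = snapshot_id({"refs/heads/master": [b"a", b"b"]})
patched = snapshot_id({"refs/heads/master": [b"a", b"b"]})
diverging = snapshot_id({"refs/heads/master": [b"a", b"B"]})

print("standard == patched:  ", standard == patched)
print("standard == diverging:", standard == diverging)
#+end_src

In this toy model, a single differing file content changes every identifier up
to the snapshot, which is why checking the snapshot hash alone is a reasonable
regression check for the patched loader.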