#+title: About dropping ingestion of PR-like branches in the loader git

* synthesis

The git loader was adapted so that it actually retrieves the full packfile (D6550), then compares its size with the limit and raises if it is too big.

Runs were then done both without the patch (the standard git loader as it is deployed today) and with patch D6401, which filters out PR branches.

Both origins are too big and the ingestion does not happen. Filtering out PR branches only shrinks the packfile by about 0.4% (chromium) and 0.25% (WebKit), so both stay far above the 4294967296-byte (4 GiB) limit.

Here is the summary:

|--------------------------------------+-------------+-----------------------------------+--------------------------------|
| origin                               | load status | packfile size in bytes (no patch) | packfile size in bytes (patch) |
|--------------------------------------+-------------+-----------------------------------+--------------------------------|
| https://github.com/chromium/chromium | raise       | 25747762082                       | 25638636164                    |
| https://github.com/WebKit/WebKit     | raise       | 8117017915                        | 8096817703                     |
|--------------------------------------+-------------+-----------------------------------+--------------------------------|

* https://github.com/chromium/chromium
** second run (computes full packfile size then crashes)
*** no patch

Version:
#+begin_src sh
swh-loader_1 | Successfully installed swh.loader.git-1.1.4.dev3+g31a0c77
#+end_src

Load fails:
#+begin_src sh
/usr/bin/time -v swh-doco exec swh-loader swh loader run git https://github.com/chromium/chromium
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml exec swh-loader swh loader run git https://github.com/chromium/chromium
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/chromium/chromium' with type 'git'
Enumerating objects: 17065959, done.
Counting objects: 100% (39817/39817), done.
Compressing objects: 100% (17935/17935), done.
Total 17065959 (delta 22959), reused 36082 (delta 20313), pack-reused 17026142
INFO:swh.loader.git.loader:fetched_pack_size=25747762082
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 338, in load
    more_data_to_fetch = self.fetch_data()
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 262, in fetch_data
    self.origin.url, base_repo, do_progress
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 206, in fetch_pack_from_origin
    f"Pack file too big for repository {origin_url}, "
OSError: Pack file too big for repository https://github.com/chromium/chromium, limit is 4294967296 bytes, current size is 25747762082
{'status': 'failed'}
Command being timed: "swh-doco exec swh-loader swh loader run git https://github.com/chromium/chromium"
User time (seconds): 0.71
System time (seconds): 0.68
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 15:42.52
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 56380
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 23982
Voluntary context switches: 2697
Involuntary context switches: 175
Swaps: 0
File system inputs: 120
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
#+end_src

*** patch

Version:
#+begin_src sh
swh-loader_1 | Successfully installed swh.loader.git-1.1.4.dev4+g3254ad7
#+end_src

Load fails:
#+begin_src sh
/usr/bin/time -v swh-doco exec swh-loader swh loader run git https://github.com/chromium/chromium
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml exec swh-loader swh loader run git https://github.com/chromium/chromium
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/chromium/chromium' with type 'git'
Enumerating objects: 17060977, done.
Counting objects: 100% (39812/39812), done.
Compressing objects: 100% (17931/17931), done.
Total 17060977 (delta 22956), reused 36079 (delta 20312), pack-reused 17021165
INFO:swh.loader.git.loader:fetched_pack_size=25638636164
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 338, in load
    more_data_to_fetch = self.fetch_data()
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 262, in fetch_data
    self.origin.url, base_repo, do_progress
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 206, in fetch_pack_from_origin
    f"Pack file too big for repository {origin_url}, "
OSError: Pack file too big for repository https://github.com/chromium/chromium, limit is 4294967296 bytes, current size is 25638636164
{'status': 'failed'}
Command being timed: "swh-doco exec swh-loader swh loader run git https://github.com/chromium/chromium"
User time (seconds): 0.64
System time (seconds): 0.72
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 14:22.21
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 66196
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 24103
Voluntary context switches: 2446
Involuntary context switches: 121
Swaps: 0
File system inputs: 152
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
#+end_src

** first run (standard version, stops as soon as packfile size exceeds threshold)
*** no patch

Version:
#+begin_src sh
swh-loader_1 | swh.loader.git 1.1.4.dev2+ga40f1e0 /src/swh-loader-git
#+end_src

Load fails:
#+begin_src sh
/usr/bin/time -v swh-doco exec swh-loader swh loader run git https://github.com/chromium/chromium
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml exec swh-loader swh loader run git https://github.com/chromium/chromium
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/chromium/chromium' with type 'git'
Enumerating objects: 17061309, done.
Counting objects: 100% (35164/35164), done.
Compressing objects: 100% (14586/14586), done.
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 338, in load
    more_data_to_fetch = self.fetch_data()
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 266, in fetch_data
    self.origin.url, base_repo, do_progress
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 202, in fetch_pack_from_origin
    progress=do_activity,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 1060, in fetch_pack
    progress,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 845, in _handle_upload_pack_tail
    SIDE_BAND_CHANNEL_PROGRESS: progress,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 604, in _read_side_band64k_data
    cb(pkt)
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 190, in do_pack
    f"Pack file too big for repository {origin_url}, "
OSError: Pack file too big for repository https://github.com/chromium/chromium, limit is 4294967296 bytes, current size is 4294959115, would write 8192
{'status': 'failed'}
Command being timed: "swh-doco exec swh-loader swh loader run git https://github.com/chromium/chromium"
User time (seconds): 0.73
System time (seconds): 0.67
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 22:31.06
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 59480
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 2
Minor (reclaiming a frame) page faults: 23850
Voluntary context switches: 3137
Involuntary context switches: 172
Swaps: 0
File system inputs: 312
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
#+end_src

*** patch

#+begin_src sh
swh-loader_1 | Successfully installed swh.loader.git-1.1.4.dev3+g1e26dec
#+end_src

Load fails:
#+begin_src sh
$ /usr/bin/time -v swh-doco exec swh-loader swh loader run git https://github.com/chromium/chromium
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml exec swh-loader swh loader run git https://github.com/chromium/chromium
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/chromium/chromium' with type 'git'
Enumerating objects: 17056494, done.
Counting objects: 100% (35329/35329), done.
Compressing objects: 100% (14567/14567), done.
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 338, in load
    more_data_to_fetch = self.fetch_data()
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 266, in fetch_data
    self.origin.url, base_repo, do_progress
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 202, in fetch_pack_from_origin
    progress=do_activity,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 1060, in fetch_pack
    progress,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 845, in _handle_upload_pack_tail
    SIDE_BAND_CHANNEL_PROGRESS: progress,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 604, in _read_side_band64k_data
    cb(pkt)
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 190, in do_pack
    f"Pack file too big for repository {origin_url}, "
OSError: Pack file too big for repository https://github.com/chromium/chromium, limit is 4294967296 bytes, current size is 4294959115, would write 8192
{'status': 'failed'}
Command being timed: "swh-doco exec swh-loader swh loader run git https://github.com/chromium/chromium"
User time (seconds): 0.69
System time (seconds): 0.71
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 19:02.45
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 58588
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 23775
Voluntary context switches: 2775
Involuntary context switches: 245
Swaps: 0
File system inputs: 120
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
#+end_src

* https://github.com/WebKit/WebKit
** second run
*** no patch

Version:
#+begin_src sh
swh-loader_1 | Successfully installed swh.loader.git-1.1.4.dev3+g31a0c77
#+end_src

Load fails:
#+begin_src sh
/usr/bin/time -v swh-doco exec swh-loader swh loader run git https://github.com/WebKit/WebKit
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml exec swh-loader swh loader run git https://github.com/WebKit/WebKit
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/WebKit/WebKit' with type 'git'
Enumerating objects: 4749671, done.
Counting objects: 100% (2256/2256), done.
Compressing objects: 100% (1640/1640), done.
Total 4749671 (delta 973), reused 1206 (delta 598), pack-reused 4747415
INFO:swh.loader.git.loader:fetched_pack_size=8117017915
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 338, in load
    more_data_to_fetch = self.fetch_data()
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 262, in fetch_data
    self.origin.url, base_repo, do_progress
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 206, in fetch_pack_from_origin
    f"Pack file too big for repository {origin_url}, "
OSError: Pack file too big for repository https://github.com/WebKit/WebKit, limit is 4294967296 bytes, current size is 8117017915
{'status': 'failed'}
Command being timed: "swh-doco exec swh-loader swh loader run git https://github.com/WebKit/WebKit"
User time (seconds): 0.66
System time (seconds): 0.59
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:48.67
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 55764
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 25242
Voluntary context switches: 1433
Involuntary context switches: 95142
Swaps: 0
File system inputs: 216
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
#+end_src

*** patch

Version:
#+begin_src sh
swh-loader_1 | Successfully installed swh.loader.git-1.1.4.dev4+g3254ad7
#+end_src

Load fails:
#+begin_src sh
/usr/bin/time -v swh-doco exec swh-loader swh loader run git https://github.com/WebKit/WebKit
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml exec swh-loader swh loader run git https://github.com/WebKit/WebKit
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/WebKit/WebKit' with type 'git'
Enumerating objects: 4746000, done.
Counting objects: 100% (2256/2256), done.
Compressing objects: 100% (1644/1644), done.
Total 4746000 (delta 971), reused 1202 (delta 594), pack-reused 4743744
INFO:swh.loader.git.loader:fetched_pack_size=8096817703
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 338, in load
    more_data_to_fetch = self.fetch_data()
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 262, in fetch_data
    self.origin.url, base_repo, do_progress
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 206, in fetch_pack_from_origin
    f"Pack file too big for repository {origin_url}, "
OSError: Pack file too big for repository https://github.com/WebKit/WebKit, limit is 4294967296 bytes, current size is 8096817703
{'status': 'failed'}
Command being timed: "swh-doco exec swh-loader swh loader run git https://github.com/WebKit/WebKit"
User time (seconds): 0.67
System time (seconds): 0.67
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 6:12.08
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 56408
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 24254
Voluntary context switches: 1591
Involuntary context switches: 133
Swaps: 0
File system inputs: 120
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
#+end_src

** first run
*** no patch

Version:
#+begin_src sh
swh-loader_1 | swh.loader.git 1.1.4.dev2+ga40f1e0 /src/swh-loader-git
#+end_src

Load fails:
#+begin_src sh
$ /usr/bin/time -v swh-doco exec swh-loader swh loader run git https://github.com/WebKit/WebKit
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml exec swh-loader swh loader run git https://github.com/WebKit/WebKit
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/WebKit/WebKit' with type 'git'
Enumerating objects: 4748109, done.
Counting objects: 100% (557/557), done.
Compressing objects: 100% (372/372), done.
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 338, in load
    more_data_to_fetch = self.fetch_data()
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 266, in fetch_data
    self.origin.url, base_repo, do_progress
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 202, in fetch_pack_from_origin
    progress=do_activity,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 1060, in fetch_pack
    progress,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 845, in _handle_upload_pack_tail
    SIDE_BAND_CHANNEL_PROGRESS: progress,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 604, in _read_side_band64k_data
    cb(pkt)
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 190, in do_pack
    f"Pack file too big for repository {origin_url}, "
OSError: Pack file too big for repository https://github.com/WebKit/WebKit, limit is 4294967296 bytes, current size is 4294959115, would write 8192
{'status': 'failed'}
Command being timed: "swh-doco exec swh-loader swh loader run git https://github.com/WebKit/WebKit"
User time (seconds): 0.67
System time (seconds): 0.63
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 22:06.72
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 58984
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 23838
Voluntary context switches: 2797
Involuntary context switches: 155896
Swaps: 0
File system inputs: 120
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
#+end_src

*** patch

Version:
#+begin_src sh
swh-loader_1 | Successfully installed swh.loader.git-1.1.4.dev3+g1e26dec
#+end_src

Load fails:
#+begin_src sh
$ /usr/bin/time -v swh-doco exec swh-loader swh loader run git https://github.com/WebKit/WebKit
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml exec swh-loader swh loader run git https://github.com/WebKit/WebKit
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/WebKit/WebKit' with type 'git'
Enumerating objects: 4744437, done.
Counting objects: 100% (563/563), done.
Compressing objects: 100% (376/376), done.
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 338, in load
    more_data_to_fetch = self.fetch_data()
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 266, in fetch_data
    self.origin.url, base_repo, do_progress
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 202, in fetch_pack_from_origin
    progress=do_activity,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 1060, in fetch_pack
    progress,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 845, in _handle_upload_pack_tail
    SIDE_BAND_CHANNEL_PROGRESS: progress,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 604, in _read_side_band64k_data
    cb(pkt)
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 190, in do_pack
    f"Pack file too big for repository {origin_url}, "
OSError: Pack file too big for repository https://github.com/WebKit/WebKit, limit is 4294967296 bytes, current size is 4294959115, would write 8192
{'status': 'failed'}
Command being timed: "swh-doco exec swh-loader swh loader run git https://github.com/WebKit/WebKit"
User time (seconds): 0.74
System time (seconds): 0.66
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 18:43.45
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 61056
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 23765
Voluntary context switches: 2838
Involuntary context switches: 137
Swaps: 0
File system inputs: 120
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
#+end_src

* garbage repo

origin: https://github.com/naifdos/PPAppsIPARepo

#+begin_src sh
$ /usr/bin/time -v swh-doco exec swh-loader swh loader run git https://github.com/naifdos/PPAppsIPARepo
+ cd /home/tony/work/inria/repo/swh/swh-environment/docker
+ docker-compose -f docker-compose.yml -f docker-compose.override.yml exec swh-loader swh loader run git https://github.com/naifdos/PPAppsIPARepo
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/naifdos/PPAppsIPARepo' with type 'git'
Enumerating objects: 2748, done.
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/swh/loader/core/loader.py", line 338, in load
    more_data_to_fetch = self.fetch_data()
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 266, in fetch_data
    self.origin.url, base_repo, do_progress
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 202, in fetch_pack_from_origin
    progress=do_activity,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 1060, in fetch_pack
    progress,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 845, in _handle_upload_pack_tail
    SIDE_BAND_CHANNEL_PROGRESS: progress,
  File "/srv/softwareheritage/venv/lib/python3.7/site-packages/dulwich/client.py", line 604, in _read_side_band64k_data
    cb(pkt)
  File "/src/swh-loader-git/swh/loader/git/loader.py", line 190, in do_pack
    f"Pack file too big for repository {origin_url}, "
OSError: Pack file too big for repository https://github.com/naifdos/PPAppsIPARepo, limit is 4294967296 bytes, current size is 4294959115, would write 8192
{'status': 'failed'}
Command being timed: "swh-doco exec swh-loader swh loader run git https://github.com/naifdos/PPAppsIPARepo"
User time (seconds): 0.68
System time (seconds): 0.70
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 16:49.11
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 58616
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 24191
Voluntary context switches: 2440
Involuntary context switches: 131
Swaps: 0
File system inputs: 120
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
#+end_src
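
In the first-run tracebacks, including the one just above, the error is raised from the =do_pack= callback while the pack is still streaming in: the check happens before each chunk is written, and 4294959115 + 8192 = 4294967307 exceeds the 4294967296-byte limit. A minimal sketch of that kind of streaming check follows. It is not the loader's actual code: the =SizeLimitedPack= class and its attributes are hypothetical names used for illustration, and only the error message format is taken from the logs.

```python
import io

# 4 GiB, the limit reported in the error messages above.
PACK_SIZE_LIMIT = 4 * 1024 ** 3


class SizeLimitedPack:
    """Hypothetical helper: buffer pack chunks, refusing to exceed a size limit."""

    def __init__(self, origin_url: str, limit: int = PACK_SIZE_LIMIT) -> None:
        self.origin_url = origin_url
        self.limit = limit
        self.size = 0
        self.buffer = io.BytesIO()

    def write(self, data: bytes) -> None:
        # Check before writing, so the buffer never grows past the limit.
        if self.size + len(data) > self.limit:
            raise OSError(
                f"Pack file too big for repository {self.origin_url}, "
                f"limit is {self.limit} bytes, current size is {self.size}, "
                f"would write {len(data)}"
            )
        self.buffer.write(data)
        self.size += len(data)


if __name__ == "__main__":
    # Tiny limit so the behaviour is visible without downloading 4 GiB.
    pack = SizeLimitedPack("https://example.org/repo.git", limit=10)
    pack.write(b"12345678")  # 8 bytes, fits
    try:
        pack.write(b"abc")  # 8 + 3 > 10, rejected
    except OSError as exc:
        print(exc)
```

Checking per chunk lets the load abort as soon as the threshold is crossed, at the cost of never learning the real packfile size. The second runs instead measure the whole packfile first, which is what makes the patch/no-patch comparison in the synthesis possible.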