Fill in the gap with scanoss tool
Closed, MigratedEdits Locked

Description

After discussing with the upstream of the scanoss tool, Roberto compiled a list of (GitHub)
repositories (large [1] and normal [2]) that we are currently missing. Let's try to ingest
those using what we did for the chromium repository [3].

fwiw, we have a huge number of those reported by sentry [6].

Plan:

  • Clean up the setup of the large workers (worker17 and worker18) and keep them out of the standard consumption loop [4]
  • Schedule large repositories on dedicated queue oneshot:swh.loader.git.tasks.UpdateGitRepository
  • Schedule normal repositories on dedicated queue oneshot2:swh.loader.git.tasks.UpdateGitRepository
  • Keep parallelism low as well (large repo queue: 1, normal repo queue: 5)
  • Babysit processes (grafana dashboard [5])
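
As a rough illustration of the plan above, the split between the two dedicated queues could be described as follows. The queue names and concurrency values are the ones from the plan; the `QueuePlan` type, the `build_plan` helper and the assignment of the example URLs to each list are hypothetical and only meant as a sketch. This is not the swh scheduler API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

TASK_NAME = "swh.loader.git.tasks.UpdateGitRepository"


@dataclass
class QueuePlan:
    """Hypothetical description of one dedicated queue (not an swh type)."""

    queue: str
    concurrency: int
    origins: List[str] = field(default_factory=list)


def build_plan(large: Iterable[str], normal: Iterable[str]) -> Dict[str, QueuePlan]:
    """Assign each origin to exactly one queue, dropping duplicate URLs."""

    def unique(urls: Iterable[str]) -> List[str]:
        # dict.fromkeys keeps the first occurrence and preserves order.
        return list(dict.fromkeys(u.strip() for u in urls if u.strip()))

    large_urls = unique(large)
    large_set = set(large_urls)
    # An origin already in the large list is not scheduled again as normal.
    normal_urls = [u for u in unique(normal) if u not in large_set]
    return {
        "large": QueuePlan(f"oneshot:{TASK_NAME}", 1, large_urls),
        "normal": QueuePlan(f"oneshot2:{TASK_NAME}", 5, normal_urls),
    }


# Example input: which list each URL belongs to is assumed for illustration.
plan = build_plan(
    large=["https://github.com/blueboxd/chromium-legacy"],
    normal=["https://github.com/ppy/osu", "https://github.com/ppy/osu"],
)
for name, queue_plan in plan.items():
    print(name, queue_plan.queue, queue_plan.concurrency, len(queue_plan.origins))
```

Deduplicating before scheduling keeps the same origin from being queued several times.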

[1] big:

[2] normal:

[3] T4283

[4] Recent tryouts on the chromium and liferay-portal repositories failed, possibly
because the standard consumption was happening in parallel. If large repositories are
consumed at the same time, the machine might become unable to finish both of them...

[5] https://grafana.softwareheritage.org/goto/6HwEWEgVk?orgId=1

[6] https://sentry.softwareheritage.org/share/issue/bbcb3aef5b974dac9a3194f7bf8ede87/

Event Timeline

ardumont created this task.
ardumont updated the task description.
ardumont changed the task status from Open to Work in Progress. (Jul 19 2022, 1:50 PM)
ardumont moved this task from Backlog to in-progress on the System administration board.
ardumont updated the task description.

It's currently ingesting [1].

Note that I may have scheduled some origins multiple times during my initial tryouts.

[1]

root@pergamon:~# clush -b -w 'worker[17-18]' "journalctl -xe -u 'swh-worker@loader_*' | grep 'status\|Listed'"
---------------
worker17
---------------
Jul 19 10:51:37 worker17 python3[3195396]: [2022-07-19 10:51:37,610: INFO/ForkPoolWorker-59960] Task swh.loader.git.tasks.UpdateGitRepository[dbc997f2-36f6-4d99-847e-b5d85ac9031c] succeeded in 1754.2839108598419s: {'status': 'eventful'}
Jul 19 11:23:37 worker17 python3[3193346]: [2022-07-19 11:23:37,755: INFO/ForkPoolWorker-59735] Task swh.loader.git.tasks.UpdateGitRepository[dafe47c1-70f2-43fa-905a-c10b0ce57eca] succeeded in 5420.659149193205s: {'status': 'eventful'}
Jul 19 13:20:29 worker17 python3[3200962]: [2022-07-19 13:20:29,305: INFO/ForkPoolWorker-4] Listed 2496 refs for repo https://github.com/CitizenLabDotCo/citizenlab
Jul 19 13:20:29 worker17 python3[3200961]: [2022-07-19 13:20:29,605: INFO/ForkPoolWorker-3] Listed 9851 refs for repo https://github.com/ppy/osu
Jul 19 13:21:13 worker17 python3[3200958]: [2022-07-19 13:21:13,410: INFO/ForkPoolWorker-1] Listed 40 refs for repo https://github.com/Project-Wildflower/Wildflower
Jul 19 13:21:16 worker17 python3[3200963]: [2022-07-19 13:21:16,607: INFO/ForkPoolWorker-5] Listed 1650 refs for repo https://github.com/BoHBranch/BoH-Bay
Jul 19 13:21:39 worker17 python3[3200960]: [2022-07-19 13:21:39,471: INFO/ForkPoolWorker-2] Listed 26 refs for repo https://github.com/Koboboldic/Baystation12
Jul 19 13:43:27 worker17 python3[3200961]: [2022-07-19 13:43:27,313: INFO/ForkPoolWorker-3] Task swh.loader.git.tasks.UpdateGitRepository[4a9e07de-98b9-43dc-868e-982c2c408b26] succeeded in 1400.2739224974066s: {'status': 'eventful'}
Jul 19 13:43:54 worker17 python3[3202094]: [2022-07-19 13:43:54,155: INFO/ForkPoolWorker-6] Listed 2497 refs for repo https://github.com/CitizenLabDotCo/citizenlab
Jul 19 14:10:33 worker17 python3[3200959]: [2022-07-19 14:10:33,409: INFO/ForkPoolWorker-1] Listed 1844 refs for repo https://github.com/blueboxd/chromium-legacy
Jul 19 14:11:14 worker17 python3[3200958]: [2022-07-19 14:11:14,578: INFO/ForkPoolWorker-1] Task swh.loader.git.tasks.UpdateGitRepository[2521ecc3-1916-4243-a120-bcc453db1d6d] succeeded in 3067.57541832095s: {'status': 'eventful'}
Jul 19 14:11:16 worker17 python3[3202886]: [2022-07-19 14:11:16,340: INFO/ForkPoolWorker-7] Listed 5 refs for repo https://github.com/PSTMRTM/nebula-Infernum
Jul 19 14:11:16 worker17 python3[3202886]: [2022-07-19 14:11:16,463: INFO/ForkPoolWorker-7] Task swh.loader.git.tasks.UpdateGitRepository[687157c5-bd7c-4c0e-9b06-475fc97b63da] succeeded in 1.5067564840428531s: {'status': 'uneventful'}
Jul 19 14:11:19 worker17 python3[3202887]: [2022-07-19 14:11:19,010: INFO/ForkPoolWorker-8] Listed 2896 refs for repo https://github.com/InsightSoftwareConsortium/ITK
Jul 19 14:11:20 worker17 python3[3202887]: [2022-07-19 14:11:20,347: INFO/ForkPoolWorker-8] Task swh.loader.git.tasks.UpdateGitRepository[c2ba06e8-41e0-4217-9e6a-a1dfe094797c] succeeded in 3.549511077813804s: {'status': 'uneventful'}
Jul 19 14:11:22 worker17 python3[3202888]: [2022-07-19 14:11:22,261: INFO/ForkPoolWorker-9] Listed 99 refs for repo https://github.com/Foxterosa/Manaos
Jul 19 14:11:22 worker17 python3[3202888]: [2022-07-19 14:11:22,433: INFO/ForkPoolWorker-9] Task swh.loader.git.tasks.UpdateGitRepository[d6ea9f44-4f61-4ba0-9f04-6fd156efc494] succeeded in 1.7707305937074125s: {'status': 'uneventful'}
Jul 19 14:11:26 worker17 python3[3202889]: [2022-07-19 14:11:26,272: INFO/ForkPoolWorker-10] Listed 428 refs for repo https://github.com/PercyDan54/osu
Jul 19 14:11:30 worker17 python3[3202889]: [2022-07-19 14:11:30,471: INFO/ForkPoolWorker-10] Task swh.loader.git.tasks.UpdateGitRepository[2a5e55d3-6277-45bf-9838-d4c01ec2f5fd] succeeded in 7.698113488033414s: {'status': 'uneventful'}
Jul 19 14:13:12 worker17 python3[3200960]: [2022-07-19 14:13:12,276: INFO/ForkPoolWorker-2] Task swh.loader.git.tasks.UpdateGitRepository[cc1c5145-0431-416f-a288-84870d3f80c0] succeeded in 3185.2454502852634s: {'status': 'eventful'}
Jul 19 14:13:14 worker17 python3[3202892]: [2022-07-19 14:13:14,621: INFO/ForkPoolWorker-11] Listed 6422 refs for repo https://github.com/ome/openmicroscopy
Jul 19 14:13:59 worker17 python3[3202941]: [2022-07-19 14:13:59,472: INFO/ForkPoolWorker-12] Listed 182 refs for repo https://github.com/Haven-13/Haven-Urist
Jul 19 14:26:31 worker17 python3[3200963]: [2022-07-19 14:26:31,617: INFO/ForkPoolWorker-5] Task swh.loader.git.tasks.UpdateGitRepository[8ff91019-b0c4-4cfa-b852-a65867afd27a] succeeded in 3984.6104614078067s: {'status': 'eventful'}
Jul 19 14:26:48 worker17 python3[3203281]: [2022-07-19 14:26:48,081: INFO/ForkPoolWorker-13] Listed 3532 refs for repo https://github.com/xwiki/xwiki-platform
Jul 19 15:12:31 worker17 python3[3202892]: [2022-07-19 15:12:31,895: INFO/ForkPoolWorker-11] Task swh.loader.git.tasks.UpdateGitRepository[a5b77c1d-1bb1-48b4-920f-6fe3001cef89] succeeded in 3658.965365310665s: {'status': 'eventful'}
Jul 19 15:12:45 worker17 python3[3204461]: [2022-07-19 15:12:45,132: INFO/ForkPoolWorker-14] Listed 385 refs for repo https://github.com/tine20/tine20
Jul 19 15:22:47 worker17 python3[3203281]: [2022-07-19 15:22:47,838: INFO/ForkPoolWorker-13] Task swh.loader.git.tasks.UpdateGitRepository[b6532eb2-467d-4d2f-bb4a-c729faf222de] succeeded in 3375.793684532866s: {'status': 'eventful'}
Jul 19 15:23:22 worker17 python3[3204731]: [2022-07-19 15:23:22,499: INFO/ForkPoolWorker-15] Listed 32823 refs for repo https://github.com/getsentry/sentry
Jul 19 15:28:39 worker17 python3[3202941]: [2022-07-19 15:28:39,522: INFO/ForkPoolWorker-12] Task swh.loader.git.tasks.UpdateGitRepository[b229745e-bf6e-4094-b638-bace550e78c2] succeeded in 4526.834132967982s: {'status': 'eventful'}
Jul 19 15:29:57 worker17 python3[3204874]: [2022-07-19 15:29:57,630: INFO/ForkPoolWorker-16] Listed 3367 refs for repo https://github.com/ChaoticOnyx/OnyxBay
Jul 19 15:43:03 worker17 python3[3204461]: [2022-07-19 15:43:03,636: INFO/ForkPoolWorker-14] Task swh.loader.git.tasks.UpdateGitRepository[8cce91cc-e865-4e13-bf34-4b2c0d299abf] succeeded in 1831.345026913099s: {'status': 'eventful'}
Jul 19 15:43:12 worker17 python3[3205247]: [2022-07-19 15:43:12,501: INFO/ForkPoolWorker-17] Listed 118 refs for repo https://github.com/AdaCore/gnatstudio
---------------
worker18
---------------
Jul 19 10:27:12 worker18 python3[3470358]: [2022-07-19 10:27:12,361: INFO/ForkPoolWorker-27699] Listed 2 refs for repo https://github.com/gentilfp/ZedGraphApp
Jul 19 10:27:12 worker18 python3[3470358]: [2022-07-19 10:27:12,738: INFO/ForkPoolWorker-27699] Task swh.loader.git.tasks.UpdateGitRepository[52274dd0-f434-4477-8d7d-d0496423c6c8] succeeded in 1.7503184489905834s: {'status': 'eventful'}
Jul 19 10:27:13 worker18 python3[3470360]: [2022-07-19 10:27:13,306: ERROR/ForkPoolWorker-27700] Loading failure, updating to `not_found` status
Jul 19 10:27:13 worker18 python3[3470360]: [2022-07-19 10:27:13,465: INFO/ForkPoolWorker-27700] Task swh.loader.git.tasks.UpdateGitRepository[0404384b-805f-42b9-8ce2-08b11ccd6d5a] succeeded in 0.8653513696044683s: {'status': 'uneventful'}
Jul 19 10:27:14 worker18 python3[3470359]: [2022-07-19 10:27:14,373: INFO/ForkPoolWorker-33980] Listed 3 refs for repo https://github.com/akachukwudee/django_project_boilerplate
Jul 19 10:27:15 worker18 python3[3470359]: [2022-07-19 10:27:15,304: INFO/ForkPoolWorker-33980] Task swh.loader.git.tasks.UpdateGitRepository[7e52cf33-f5ad-40e7-a83a-dbe870237ba7] succeeded in 3.081051674671471s: {'status': 'eventful'}
Jul 19 10:27:15 worker18 python3[3470363]: [2022-07-19 10:27:15,632: INFO/ForkPoolWorker-27702] Listed 4 refs for repo https://github.com/NazmusShakib/php-practical-test
Jul 19 10:27:15 worker18 python3[3470362]: [2022-07-19 10:27:15,953: INFO/ForkPoolWorker-27701] Listed 2 refs for repo https://github.com/caviare/caviare
Jul 19 10:27:17 worker18 python3[3470362]: [2022-07-19 10:27:17,053: INFO/ForkPoolWorker-27701] Task swh.loader.git.tasks.UpdateGitRepository[0491bbed-21ec-4faf-9a3d-9e679d2272ca] succeeded in 3.116828029975295s: {'status': 'eventful'}
Jul 19 10:27:18 worker18 python3[3470389]: [2022-07-19 10:27:18,260: ERROR/ForkPoolWorker-27703] Loading failure, updating to `not_found` status
Jul 19 10:27:18 worker18 python3[3470389]: [2022-07-19 10:27:18,431: INFO/ForkPoolWorker-27703] Task swh.loader.git.tasks.UpdateGitRepository[c75ab57b-3afe-48e6-be9d-342e92dfc94f] succeeded in 1.0723764449357986s: {'status': 'uneventful'}
Jul 19 10:27:20 worker18 python3[3470391]: [2022-07-19 10:27:20,140: INFO/ForkPoolWorker-27704] Listed 2 refs for repo https://github.com/ManjitShortcut/FlutterListView
Jul 19 10:27:24 worker18 python3[3470398]: [2022-07-19 10:27:24,307: INFO/ForkPoolWorker-33981] Listed 24 refs for repo https://github.com/morriedig/rails_personal_pages
Jul 19 10:27:24 worker18 python3[3470363]: [2022-07-19 10:27:24,424: INFO/ForkPoolWorker-27702] Task swh.loader.git.tasks.UpdateGitRepository[2f2ed6cc-387a-4be4-8cf9-f06e320f5d46] succeeded in 10.447246747091413s: {'status': 'eventful'}
Jul 19 10:27:24 worker18 python3[3470391]: [2022-07-19 10:27:24,756: INFO/ForkPoolWorker-27704] Task swh.loader.git.tasks.UpdateGitRepository[ade9cda5-46db-49ca-be40-d901cd3e4f88] succeeded in 6.011202989146113s: {'status': 'eventful'}
Jul 19 10:27:25 worker18 python3[3470410]: [2022-07-19 10:27:25,648: ERROR/ForkPoolWorker-27705] Loading failure, updating to `not_found` status
Jul 19 10:27:25 worker18 python3[3470410]: [2022-07-19 10:27:25,895: INFO/ForkPoolWorker-27705] Task swh.loader.git.tasks.UpdateGitRepository[d8fb0d16-a163-4608-b062-af71392556b6] succeeded in 1.0759181082248688s: {'status': 'uneventful'}
Jul 19 10:27:27 worker18 python3[3470398]: [2022-07-19 10:27:27,474: INFO/ForkPoolWorker-33981] Task swh.loader.git.tasks.UpdateGitRepository[4c3b09c6-400b-4966-9a74-308ab4cf688c] succeeded in 6.077941656112671s: {'status': 'eventful'}
Jul 19 10:27:27 worker18 python3[3470413]: [2022-07-19 10:27:27,696: INFO/ForkPoolWorker-27706] Listed 3 refs for repo https://github.com/QuintenDeBruyne/TestRepo
Jul 19 10:27:29 worker18 python3[3470413]: [2022-07-19 10:27:29,938: INFO/ForkPoolWorker-27706] Task swh.loader.git.tasks.UpdateGitRepository[742d2424-7e5e-4ab7-a405-be682052e7b4] succeeded in 3.7812292836606503s: {'status': 'eventful'}
Jul 19 10:27:30 worker18 python3[3470420]: [2022-07-19 10:27:30,605: INFO/ForkPoolWorker-27707] Listed 2 refs for repo https://github.com/shivanshu3241/c-21-bullets-and-walls
Jul 19 10:27:31 worker18 python3[3470421]: [2022-07-19 10:27:31,900: INFO/ForkPoolWorker-27708] Listed 3 refs for repo https://github.com/mdziya47/may22
Jul 19 10:27:31 worker18 python3[3470420]: [2022-07-19 10:27:31,922: INFO/ForkPoolWorker-27707] Task swh.loader.git.tasks.UpdateGitRepository[79c9840c-5f9d-45be-aea8-2c16264f83b6] succeeded in 2.7514815209433436s: {'status': 'eventful'}
Jul 19 10:27:33 worker18 python3[3470423]: [2022-07-19 10:27:33,210: ERROR/ForkPoolWorker-27709] Loading failure, updating to `not_found` status
Jul 19 10:27:33 worker18 python3[3470423]: [2022-07-19 10:27:33,477: INFO/ForkPoolWorker-27709] Task swh.loader.git.tasks.UpdateGitRepository[b6e418dc-4718-4d8d-818e-493223daad98] succeeded in 1.1940287835896015s: {'status': 'uneventful'}
Jul 19 10:27:34 worker18 python3[3470422]: [2022-07-19 10:27:34,322: INFO/ForkPoolWorker-33982] Listed 2 refs for repo https://github.com/thomasgeissl/ofxAsap
Jul 19 10:27:34 worker18 python3[3470425]: [2022-07-19 10:27:34,599: ERROR/ForkPoolWorker-27710] Loading failure, updating to `not_found` status
Jul 19 10:27:34 worker18 python3[3470425]: [2022-07-19 10:27:34,785: INFO/ForkPoolWorker-27710] Task swh.loader.git.tasks.UpdateGitRepository[9f634cfe-17e6-4b60-bf5e-c72a1480d48c] succeeded in 1.062545725144446s: {'status': 'uneventful'}
Jul 19 10:27:35 worker18 python3[3470421]: [2022-07-19 10:27:35,244: INFO/ForkPoolWorker-27708] Task swh.loader.git.tasks.UpdateGitRepository[02a2a1bc-18dd-4522-9dcb-dce631dd7dcc] succeeded in 4.864033135585487s: {'status': 'eventful'}
Jul 19 10:27:37 worker18 python3[3470430]: [2022-07-19 10:27:37,337: INFO/ForkPoolWorker-27712] Listed 2 refs for repo https://github.com/rasoolgit257/Transfer-Learning-Resnet50
Jul 19 10:27:37 worker18 python3[3470422]: [2022-07-19 10:27:37,693: INFO/ForkPoolWorker-33982] Task swh.loader.git.tasks.UpdateGitRepository[94f6df58-9f6a-44ba-bab3-c62728461a49] succeeded in 6.297500366345048s: {'status': 'eventful'}
Jul 19 10:27:38 worker18 python3[3470429]: [2022-07-19 10:27:38,591: INFO/ForkPoolWorker-27711] Listed 220 refs for repo https://github.com/ngs117zqw/game-of-life
Jul 19 10:27:38 worker18 python3[3470430]: [2022-07-19 10:27:38,879: INFO/ForkPoolWorker-27712] Task swh.loader.git.tasks.UpdateGitRepository[122fac8c-f77e-4d8e-8e8b-39517725d8b1] succeeded in 3.191435880959034s: {'status': 'eventful'}
Jul 19 10:27:42 worker18 python3[3470433]: [2022-07-19 10:27:42,287: INFO/ForkPoolWorker-27713] Listed 250 refs for repo https://github.com/Metatavu/oioi-management
Jul 19 10:27:44 worker18 python3[3470434]: [2022-07-19 10:27:44,728: INFO/ForkPoolWorker-33983] Listed 58 refs for repo https://github.com/legacyUT/android_system_security
Jul 19 10:27:56 worker18 python3[3470434]: [2022-07-19 10:27:56,478: INFO/ForkPoolWorker-33983] Task swh.loader.git.tasks.UpdateGitRepository[0b7c025e-564a-4e2e-aa92-d9712b0121d1] succeeded in 14.07545832823962s: {'status': 'eventful'}
Jul 19 10:28:02 worker18 python3[3470465]: [2022-07-19 10:28:02,770: INFO/ForkPoolWorker-33984] Listed 2 refs for repo https://github.com/FehStella/ApiGit
Jul 19 10:28:03 worker18 python3[3470465]: [2022-07-19 10:28:03,939: INFO/ForkPoolWorker-33984] Task swh.loader.git.tasks.UpdateGitRepository[96eadb86-c17c-4eb7-a476-77904f2a07fb] succeeded in 2.6069162217900157s: {'status': 'eventful'}
Jul 19 10:28:08 worker18 python3[3470433]: [2022-07-19 10:28:08,770: INFO/ForkPoolWorker-27713] Task swh.loader.git.tasks.UpdateGitRepository[91420972-e6d9-485b-8291-2374d428e5cd] succeeded in 29.63042890187353s: {'status': 'eventful'}
Jul 19 10:28:11 worker18 python3[3470429]: [2022-07-19 10:28:11,055: INFO/ForkPoolWorker-27711] Task swh.loader.git.tasks.UpdateGitRepository[2fd42ae2-5c50-47ed-bfd7-d5eab493b6b1] succeeded in 35.920224393717945s: {'status': 'eventful'}
Jul 19 10:33:09 worker18 python3[3470060]: [2022-07-19 10:33:09,964: INFO/ForkPoolWorker-27630] Task swh.loader.git.tasks.UpdateGitRepository[81b62a5f-bf8a-40d0-abc8-8716f16bdb3d] succeeded in 826.5972496075556s: {'status': 'eventful'}
Jul 19 10:43:02 worker18 systemd[1]: swh-worker@loader_git.service: Main process exited, code=killed, status=9/KILL
░░ The process' exit code is 'killed' and its exit status is 9.
Jul 19 10:43:02 worker18 systemd[1]: swh-worker@loader_oneshot.service: Main process exited, code=killed, status=9/KILL
░░ The process' exit code is 'killed' and its exit status is 9.
Jul 19 10:44:18 worker18 python3[3449259]: [2022-07-19 10:44:18,308: INFO/ForkPoolWorker-26304] Task swh.loader.git.tasks.UpdateGitRepository[e74e0d9b-4484-494a-81f9-b45365493b00] succeeded in 22469.82053990569s: {'status': 'eventful'}
Jul 19 13:20:09 worker18 python3[3475060]: [2022-07-19 13:20:09,045: INFO/ForkPoolWorker-3] Listed 370 refs for repo https://github.com/Alanii/NewHestia
Jul 19 13:20:09 worker18 python3[3475060]: [2022-07-19 13:20:09,503: INFO/ForkPoolWorker-3] Task swh.loader.git.tasks.UpdateGitRepository[889b3f47-173f-4d2e-954b-e715258d02a0] succeeded in 2.482829655520618s: {'status': 'uneventful'}
Jul 19 13:20:17 worker18 python3[3475057]: [2022-07-19 13:20:17,197: INFO/ForkPoolWorker-1] Listed 2896 refs for repo https://github.com/InsightSoftwareConsortium/ITK
Jul 19 13:20:22 worker18 python3[3475059]: [2022-07-19 13:20:22,254: INFO/ForkPoolWorker-2] Listed 428 refs for repo https://github.com/PercyDan54/osu
Jul 19 13:20:51 worker18 python3[3475061]: [2022-07-19 13:20:51,754: INFO/ForkPoolWorker-4] Listed 5 refs for repo https://github.com/PSTMRTM/nebula-Infernum
Jul 19 13:20:57 worker18 python3[3475062]: [2022-07-19 13:20:57,323: INFO/ForkPoolWorker-5] Listed 99 refs for repo https://github.com/Foxterosa/Manaos
Jul 19 13:43:10 worker18 python3[3475516]: [2022-07-19 13:43:10,187: INFO/ForkPoolWorker-6] Listed 2497 refs for repo https://github.com/CitizenLabDotCo/citizenlab
Jul 19 13:43:11 worker18 python3[3475059]: [2022-07-19 13:43:11,765: INFO/ForkPoolWorker-2] Task swh.loader.git.tasks.UpdateGitRepository[e8bfbf91-d4dc-487f-8927-cbf9ff1d16d8] succeeded in 1384.7495882995427s: {'status': 'eventful'}
Jul 19 13:43:52 worker18 python3[3476187]: [2022-07-19 13:43:52,326: INFO/ForkPoolWorker-7] Listed 40 refs for repo https://github.com/Project-Wildflower/Wildflower
Jul 19 13:48:35 worker18 python3[3475058]: [2022-07-19 13:48:35,200: INFO/ForkPoolWorker-1] Listed 8839 refs for repo https://github.com/archlinux/svntogit-community
Jul 19 13:58:01 worker18 python3[3475057]: [2022-07-19 13:58:01,540: INFO/ForkPoolWorker-1] Task swh.loader.git.tasks.UpdateGitRepository[d026d0bc-992b-42ba-b0c9-5c8dee0c0645] succeeded in 2274.5673629501835s: {'status': 'eventful'}
Jul 19 13:58:45 worker18 python3[3476608]: [2022-07-19 13:58:45,300: INFO/ForkPoolWorker-8] Listed 1650 refs for repo https://github.com/BoHBranch/BoH-Bay
Jul 19 14:07:08 worker18 python3[3475061]: [2022-07-19 14:07:08,380: INFO/ForkPoolWorker-4] Task swh.loader.git.tasks.UpdateGitRepository[107710aa-0d21-49d3-91ee-1a9490dcc27d] succeeded in 2821.3996562585235s: {'status': 'eventful'}
Jul 19 14:08:04 worker18 python3[3475062]: [2022-07-19 14:08:04,313: INFO/ForkPoolWorker-5] Task swh.loader.git.tasks.UpdateGitRepository[7c362d94-29a1-4535-81db-23cc0010aec8] succeeded in 2877.311755471863s: {'status': 'eventful'}
Jul 19 14:08:18 worker18 python3[3476843]: [2022-07-19 14:08:18,924: INFO/ForkPoolWorker-9] Listed 26 refs for repo https://github.com/Koboboldic/Baystation12
Jul 19 14:08:19 worker18 python3[3476867]: [2022-07-19 14:08:19,050: INFO/ForkPoolWorker-10] Listed 9853 refs for repo https://github.com/ppy/osu
Jul 19 14:29:45 worker18 python3[3476187]: [2022-07-19 14:29:45,921: INFO/ForkPoolWorker-7] Task swh.loader.git.tasks.UpdateGitRepository[11e03db4-c49f-496c-a907-3dd17aef9ccc] succeeded in 2793.6964187435806s: {'status': 'eventful'}
Jul 19 14:29:50 worker18 python3[3477420]: [2022-07-19 14:29:50,508: INFO/ForkPoolWorker-11] Listed 370 refs for repo https://github.com/Alanii/NewHestia
Jul 19 14:29:50 worker18 python3[3477420]: [2022-07-19 14:29:50,724: INFO/ForkPoolWorker-11] Task swh.loader.git.tasks.UpdateGitRepository[b48064af-313f-407f-a1b3-2da4acd7b373] succeeded in 1.4327953085303307s: {'status': 'uneventful'}
Jul 19 14:29:52 worker18 python3[3476867]: [2022-07-19 14:29:52,497: INFO/ForkPoolWorker-10] Task swh.loader.git.tasks.UpdateGitRepository[e02d47a7-bcad-43db-98ba-87d3e1adedb0] succeeded in 1304.5309353396297s: {'status': 'eventful'}
Jul 19 14:30:28 worker18 python3[3477427]: [2022-07-19 14:30:28,207: INFO/ForkPoolWorker-13] Listed 24056 refs for repo https://github.com/joomla/joomla-cms
Jul 19 14:32:16 worker18 python3[3477426]: [2022-07-19 14:32:16,491: INFO/ForkPoolWorker-12] Listed 6422 refs for repo https://github.com/ome/openmicroscopy
Jul 19 14:44:58 worker18 python3[3476608]: [2022-07-19 14:44:58,568: INFO/ForkPoolWorker-8] Task swh.loader.git.tasks.UpdateGitRepository[e8d17e29-466d-448d-a8e3-4846347cf171] succeeded in 2816.370258978568s: {'status': 'eventful'}
Jul 19 14:45:44 worker18 python3[3477802]: [2022-07-19 14:45:44,246: INFO/ForkPoolWorker-14] Listed 805 refs for repo https://github.com/UristMcStation/UristMcStation
Jul 19 14:57:55 worker18 python3[3476843]: [2022-07-19 14:57:55,197: INFO/ForkPoolWorker-9] Task swh.loader.git.tasks.UpdateGitRepository[096baf3e-57c8-4b95-9e77-a12eb0858cc3] succeeded in 3046.3838498126715s: {'status': 'eventful'}
Jul 19 14:58:34 worker18 python3[3478125]: [2022-07-19 14:58:34,031: INFO/ForkPoolWorker-15] Listed 342 refs for repo https://github.com/yfisyak/TFG
Jul 19 15:16:02 worker18 python3[3477427]: [2022-07-19 15:16:02,380: INFO/ForkPoolWorker-13] Task swh.loader.git.tasks.UpdateGitRepository[4bc56c25-e47f-4616-8064-43883820ed10] succeeded in 2769.582609018311s: {'status': 'eventful'}
Jul 19 15:16:56 worker18 python3[3478573]: [2022-07-19 15:16:56,336: INFO/ForkPoolWorker-16] Listed 460 refs for repo https://github.com/FortunaSS13/Fortuna
Jul 19 15:22:53 worker18 python3[3477426]: [2022-07-19 15:22:53,976: INFO/ForkPoolWorker-12] Task swh.loader.git.tasks.UpdateGitRepository[6551315a-651e-4e7d-b6ea-71e4285d6f5f] succeeded in 3182.959041844122s: {'status': 'eventful'}
Jul 19 15:23:59 worker18 python3[3478751]: [2022-07-19 15:23:59,210: INFO/ForkPoolWorker-17] Listed 705 refs for repo https://github.com/fortune13-ss13/Fortune13
Jul 19 15:26:05 worker18 python3[3477802]: [2022-07-19 15:26:05,350: INFO/ForkPoolWorker-14] Task swh.loader.git.tasks.UpdateGitRepository[cceb444c-b106-4d07-8d47-4b488cc6da89] succeeded in 2466.373187897727s: {'status': 'eventful'}
Jul 19 15:27:37 worker18 python3[3478826]: [2022-07-19 15:27:37,843: INFO/ForkPoolWorker-18] Listed 471 refs for repo https://github.com/axis-project/axis
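
Regarding the note that some origins may have been scheduled multiple times: a quick way to spot them in output like the above is to count how often each repository shows up in a "Listed N refs" line. The following is a minimal sketch, assuming the log format shown here; the `listed_counts` helper is hypothetical, and the sample lines are copied from the worker17 output.

```python
import re
from collections import Counter

# Matches the "Listed N refs for repo URL" lines emitted by the git loader.
LISTED_RE = re.compile(r"Listed (\d+) refs for repo (\S+)")


def listed_counts(log_text: str) -> Counter:
    """Count how many times each repository URL was listed in the log text."""
    counts: Counter = Counter()
    for match in LISTED_RE.finditer(log_text):
        counts[match.group(2)] += 1
    return counts


sample = """\
Jul 19 13:20:29 worker17 python3[3200962]: [2022-07-19 13:20:29,305: INFO/ForkPoolWorker-4] Listed 2496 refs for repo https://github.com/CitizenLabDotCo/citizenlab
Jul 19 13:21:13 worker17 python3[3200958]: [2022-07-19 13:21:13,410: INFO/ForkPoolWorker-1] Listed 40 refs for repo https://github.com/Project-Wildflower/Wildflower
Jul 19 13:43:54 worker17 python3[3202094]: [2022-07-19 13:43:54,155: INFO/ForkPoolWorker-6] Listed 2497 refs for repo https://github.com/CitizenLabDotCo/citizenlab
"""

# Repositories listed more than once were most likely scheduled more than once.
repeated = {url: n for url, n in listed_counts(sample).items() if n > 1}
print(repeated)
```

A repository listed twice is only a hint of a duplicate scheduling: a legitimate later visit of the same origin would produce the same pattern.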

fwiw, the large repositories are taking their sweet time, but they are on their way:

Jul 19 14:10:33 worker17 python3[3200959]: [2022-07-19 14:10:33,409: INFO/ForkPoolWorker-1] Listed 1844 refs for repo https://github.com/blueboxd/chromium-legacy
Jul 21 07:08:27 worker17 python3[3200959]: [2022-07-21 07:08:27,864: INFO/ForkPoolWorker-1] Task swh.loader.git.tasks.UpdateGitRepository[69499115-15c6-49fd-a7b2-fdbd7b85c58e] succeeded in 148867.974679037s: {'status': 'eventful'}
Jul 21 07:30:26 worker17 python3[3267083]: [2022-07-21 07:30:26,080: INFO/ForkPoolWorker-2] Listed 17053 refs for repo https://github.com/otcshare/chromium-src
Jul 22 16:38:14 worker17 python3[3267083]: [2022-07-22 16:38:14,904: INFO/ForkPoolWorker-2] Task swh.loader.git.tasks.UpdateGitRepository[e336ec48-949b-4082-9f7c-bf3f0276b49c] succeeded in 120586.60165330442s: {'status': 'eventful'}
Jul 22 16:59:29 worker17 python3[3320170]: [2022-07-22 16:59:29,094: INFO/ForkPoolWorker-3] Listed 16990 refs for repo https://github.com/huningxin/chromium-src
Jul 24 02:33:06 worker17 python3[3320170]: [2022-07-24 02:33:06,151: INFO/ForkPoolWorker-3] Task swh.loader.git.tasks.UpdateGitRepository[74c7b78b-8ddc-400d-94df-ae765f05a0bc] succeeded in 122090.91188609181s: {'status': 'eventful'}
Jul 19 13:48:35 worker18 python3[3475058]: [2022-07-19 13:48:35,200: INFO/ForkPoolWorker-1] Listed 8839 refs for repo https://github.com/archlinux/svntogit-community
Jul 24 17:59:14 worker18 python3[3475058]: [2022-07-24 17:59:14,900: INFO/ForkPoolWorker-1] Task swh.loader.git.tasks.UpdateGitRepository[1028a8ba-f895-4635-9484-61289d303586] succeeded in 447115.0368490154s: {'status': 'eventful'}
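
To put the "succeeded in ...s" figures above into perspective, the durations can be pulled out of the log lines and converted to hours. This is a small sketch, assuming the celery log format shown here; the `task_duration` helper is hypothetical.

```python
import re
from datetime import timedelta
from typing import Optional

# Captures the number of seconds in "... succeeded in 148867.974679037s: ...".
SUCCEEDED_RE = re.compile(r"succeeded in (\d+(?:\.\d+)?)s")


def task_duration(line: str) -> Optional[timedelta]:
    """Return the task duration found in a log line, or None if there is none."""
    match = SUCCEEDED_RE.search(line)
    if match is None:
        return None
    return timedelta(seconds=float(match.group(1)))


line = (
    "Jul 21 07:08:27 worker17 python3[3200959]: [2022-07-21 07:08:27,864: "
    "INFO/ForkPoolWorker-1] Task swh.loader.git.tasks.UpdateGitRepository"
    "[69499115-15c6-49fd-a7b2-fdbd7b85c58e] succeeded in 148867.974679037s: "
    "{'status': 'eventful'}"
)

duration = task_duration(line)
hours = duration.total_seconds() / 3600
print(f"{hours:.1f} hours")
```

For the chromium-legacy line used here, that is a bit more than 41 hours for a single origin.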

At this point in time:

  • 1 "normal" origin
  • 22 "large" origins

remain to be ingested to fill in the actual gap with the scanoss tool.

@rdicosmo jsyk ^


I took the opportunity to retrieve large origins [7] out of the sentry issue listing [6] (cf. description) [8],
and scheduled those in the large repositories queue after the ones scheduled out of the scanoss exchange.

If it's considered useless at some point, feel free to dismiss them (by purging the queue).

[7] 28310 unique origins ->

[8] command used to create the listing out of sentry, in a venv (snippets repository) in worker1.staging:

(sentry-U52ipwI-) ardumont@worker1:~/snippets/ardumont/sentry% python -m list-urls-from-issue --project-name swh-loader-git --event-id 5823 | tee loader-git.pack-file-too-big-issue-5823.urls.txt
...

Since the normal ingestion is mostly done (1 last normal ingestion ongoing), I've now made worker17-18 consume 1 more task from the large repositories queue as well (vs. letting them twiddle their thumbs ;).

Thanks for bringing this up. I'm filtering the list to get the relevant repositories to prioritize.

@ardumont here is the subset of the repositories on GitHub, ordered by number of stars, that are:

  • still on GitHub
  • not a fork

Could you purge the current queue and reinsert this list instead?
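
The selection criteria above (still on GitHub, not a fork, ordered by number of stars) can be sketched as below. This is not the tool that was actually used to build the list. It assumes the metadata for each repository has already been fetched and has the `fork` and `stargazers_count` fields of the GitHub REST API repository object, with `None` standing for a repository that no longer exists. The `prioritize` helper and the example URLs are hypothetical.

```python
from typing import Dict, List, Optional

# Per-repository metadata, or None when the repository is gone from GitHub.
RepoInfo = Optional[Dict[str, object]]


def prioritize(metadata: Dict[str, RepoInfo]) -> List[str]:
    """Keep existing, non-fork repositories, most starred first."""
    kept = [
        (url, info)
        for url, info in metadata.items()
        if info is not None and not info.get("fork", False)
    ]
    # Sort by star count, highest first; a missing count is treated as 0.
    kept.sort(key=lambda item: item[1].get("stargazers_count", 0), reverse=True)
    return [url for url, _ in kept]


# Made-up example data, for illustration only.
metadata = {
    "https://github.com/example/small": {"fork": False, "stargazers_count": 3},
    "https://github.com/example/forked": {"fork": True, "stargazers_count": 5000},
    "https://github.com/example/gone": None,
    "https://github.com/example/popular": {"fork": False, "stargazers_count": 1200},
}

ordered = prioritize(metadata)
print("\n".join(ordered))
```

Keeping the filtering separate from the fetching makes it possible to rerun the ordering without querying GitHub again.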

done

Polished up and shared the tool built to produce the refined list priority.list.github.
It is now available at https://github.com/rdicosmo/swh-check-repositories