
Sep 22 2021

ardumont changed the status of T3588: Deploy swh.loader.git v1.0 from Open to Work in Progress.
Sep 22 2021, 2:00 PM · System administration, Git loader

Sep 21 2021

ardumont added a revision to T3588: Deploy swh.loader.git v1.0: D6314: Fix tests for Dulwich < 0.20.22.
Sep 21 2021, 11:38 AM · System administration, Git loader
ardumont added a comment to T3588: Deploy swh.loader.git v1.0.

Fix [1] first

[1] https://jenkins.softwareheritage.org/job/debian/job/packages/job/DLDG/job/gbp-buildpackage/77/console

Sep 21 2021, 11:38 AM · System administration, Git loader

Sep 20 2021

ardumont moved T3588: Deploy swh.loader.git v1.0 from Backlog to Weekly backlog on the System administration board.
Sep 20 2021, 11:59 AM · System administration, Git loader
ardumont added a project to T3588: Deploy swh.loader.git v1.0: System administration.
Sep 20 2021, 11:59 AM · System administration, Git loader
ardumont added a comment to T3588: Deploy swh.loader.git v1.0.

Fix [1] first

Sep 20 2021, 11:58 AM · System administration, Git loader

Sep 17 2021

ardumont triaged T3588: Deploy swh.loader.git v1.0 as Normal priority.
Sep 17 2021, 3:50 PM · System administration, Git loader
anlambert updated the task description for T2489: Git origin without smart transfer protocol support cannot be loaded.
Sep 17 2021, 10:44 AM · Git loader

Sep 1 2021

zack added a comment to T3544: Deal with GitHub removing support for git:// URLs.
In T3544#69746, @olasd wrote:

I can see a few alternatives to using git:// over tcp:

  • Give our swh bot accounts SSH keys, and use that to clone from GitHub over ssh.
Sep 1 2021, 10:06 PM · Origin-GitHub, Git loader
olasd added a comment to T3544: Deal with GitHub removing support for git:// URLs.

The dulwich HTTP(s) support is implemented on top of urllib(3?).

Sep 1 2021, 9:18 PM · Origin-GitHub, Git loader
vlorentz triaged T3544: Deal with GitHub removing support for git:// URLs as High priority.
Sep 1 2021, 9:11 PM · Origin-GitHub, Git loader

Aug 10 2021

zack raised the priority of T3457: Some git repositories are failing to be ingested because of MemoryError from Normal to High.
Aug 10 2021, 12:10 PM · Git loader
vsellier added a comment to T3457: Some git repositories are failing to be ingested because of MemoryError.

Another example in production: during the stop phase of a worker, the loader was alone on the server (with 12 GB of RAM) and was OOM-killed:

Aug 10 08:53:24 worker05 python3[871]: [2021-08-10 08:53:24,745: INFO/ForkPoolWorker-1] Load origin 'https://github.com/evands/Specs' with type 'git'
Aug 10 08:54:17 worker05 python3[871]: [62B blob data]
Aug 10 08:54:17 worker05 python3[871]: [586B blob data]
Aug 10 08:54:17 worker05 python3[871]: [473B blob data]
Aug 10 08:54:29 worker05 python3[871]: Total 782419 (delta 6), reused 5 (delta 5), pack-reused 782401                                         
Aug 10 08:54:29 worker05 python3[871]: [2021-08-10 08:54:29,044: INFO/ForkPoolWorker-1] Listed 6 refs for repo https://github.com/evands/Specs
Aug 10 08:59:21 worker05 kernel: [    871]  1004   871   247194   161634  1826816    46260             0 python3                              
Aug 10 09:08:29 worker05 systemd[1]: swh-worker@loader_git.service: Unit process 871 (python3) remains running after unit stopped.            
Aug 10 09:15:29 worker05 kernel: [    871]  1004   871   412057   372785  3145728        0             0 python3                              
Aug 10 09:16:57 worker05 kernel: [    871]  1004   871   823648   784496  6443008        0             0 python3                              
Aug 10 09:24:44 worker05 kernel: CPU: 2 PID: 871 Comm: python3 Not tainted 5.10.0-0.bpo.7-amd64 #1 Debian 5.10.40-1~bpo10+1                   
Aug 10 09:24:44 worker05 kernel: [    871]  1004   871  2800000  2760713 22286336        0             0 python3                              
Aug 10 09:24:44 worker05 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0-2,oom_memcg=/system.slice/system-swh\x2dworker.slice,task_memcg=/system.slice/system-swh\x2dworker.slice/swh-worker@loader_git.service,task=python3,pid=871,uid=1004           
Aug 10 09:24:44 worker05 kernel: Memory cgroup out of memory: Killed process 871 (python3) total-vm:11200000kB, anon-rss:11038844kB, file-rss:4008kB, shmem-rss:0kB, UID:1004 pgtables:21764kB oom_score_adj:0
Aug 10 09:24:45 worker05 kernel: oom_reaper: reaped process 871 (python3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Aug 10 2021, 11:32 AM · Git loader

Aug 9 2021

ardumont added a comment to T3457: Some git repositories are failing to be ingested because of MemoryError.

[3] possibly T2373

Aug 9 2021, 4:19 PM · Git loader
ardumont triaged T3472: Git loader implementations divergence point of attention as Normal priority.
Aug 9 2021, 2:26 PM · Git loader

Aug 5 2021

vlorentz renamed T3457: Some git repositories are failing to be ingested because of MemoryError from Big git repositories are failing to be ingested to Some git repositories are failing to be ingested because of MemoryError.
Aug 5 2021, 2:13 PM · Git loader
vlorentz added a comment to T3457: Some git repositories are failing to be ingested because of MemoryError.

It's exactly the same issue AFAIK

Aug 5 2021, 1:37 PM · Git loader
ardumont updated subscribers of T3457: Some git repositories are failing to be ingested because of MemoryError.

For information, @vlorentz opened a related issue in dulwich [1].

Aug 5 2021, 1:05 PM · Git loader
ardumont updated the task description for T3457: Some git repositories are failing to be ingested because of MemoryError.
Aug 5 2021, 1:04 PM · Git loader

Aug 4 2021

ardumont updated the task description for T3457: Some git repositories are failing to be ingested because of MemoryError.
Aug 4 2021, 12:21 PM · Git loader
ardumont updated the task description for T3457: Some git repositories are failing to be ingested because of MemoryError.
Aug 4 2021, 12:08 PM · Git loader
ardumont updated the task description for T3457: Some git repositories are failing to be ingested because of MemoryError.
Aug 4 2021, 12:05 PM · Git loader
ardumont updated the task description for T3457: Some git repositories are failing to be ingested because of MemoryError.
Aug 4 2021, 10:30 AM · Git loader
ardumont triaged T3457: Some git repositories are failing to be ingested because of MemoryError as Normal priority.
Aug 4 2021, 10:28 AM · Git loader

Jul 13 2021

vlorentz added a comment to T3311: Use .gitmodules to discover origins.

if it would be worth submitting these recursive origins with "save code now" so we can try to get submodule updates close to the update of the main repository

Jul 13 2021, 12:10 PM · Archive coverage, Git loader

Jul 12 2021

olasd added a comment to T3311: Use .gitmodules to discover origins.

I also wonder if we have a somewhat common approach to handle the SVN externals as well.

Jul 12 2021, 3:48 PM · Archive coverage, Git loader
olasd added a comment to T3311: Use .gitmodules to discover origins.

I think this is worthwhile in general, at least for repositories that are still live.

Jul 12 2021, 3:47 PM · Archive coverage, Git loader

May 6 2021

zack added a comment to T3311: Use .gitmodules to discover origins.

I think the only issue with (3) is not being retroactive

May 6 2021, 6:49 PM · Archive coverage, Git loader
vlorentz added a project to T3311: Use .gitmodules to discover origins: Archive coverage.
May 6 2021, 6:34 PM · Archive coverage, Git loader
vlorentz added a comment to T3311: Use .gitmodules to discover origins.

I think the only issue with (3) is not being retroactive

May 6 2021, 6:30 PM · Archive coverage, Git loader
zack added a comment to T3311: Use .gitmodules to discover origins.

This is a good idea, thanks for raising it.

May 6 2021, 6:06 PM · Archive coverage, Git loader
vlorentz triaged T3311: Use .gitmodules to discover origins as Low priority.
May 6 2021, 5:07 PM · Archive coverage, Git loader

Apr 14 2021

KShivendu closed T3132: loader-git: Bad formatting of the "Pack file too big" error message as Resolved.
Apr 14 2021, 5:44 PM · Easy hack, Git loader

Apr 5 2021

aastha1999 added a revision to T3132: loader-git: Bad formatting of the "Pack file too big" error message: D5418: Fix Pack File too big error formatting.
Apr 5 2021, 9:22 AM · Easy hack, Git loader

Apr 4 2021

KShivendu added a comment to T3132: loader-git: Bad formatting of the "Pack file too big" error message.

I am here to just say: swh-loader-git doesn't have a CONTRIBUTORS file. You may ask the contributor to add it as well :)

Apr 4 2021, 10:56 AM · Easy hack, Git loader

Mar 15 2021

vlorentz triaged T3132: loader-git: Bad formatting of the "Pack file too big" error message as Low priority.
Mar 15 2021, 1:18 PM · Easy hack, Git loader

Mar 5 2021

vlorentz added a subtask for T2059: Generate (swh) releases from all git tags: T3089: Remove the 'metadata' column of the 'revision' table.
Mar 5 2021, 12:30 PM · Git loader

Mar 1 2021

anlambert lowered the priority of T2926: Failed ingestion of a GitHub repository from High to Normal.

Lowering task priority to normal, nothing critical here.

Mar 1 2021, 2:57 PM · Web app, Git loader

Feb 3 2021

olasd updated subscribers of T3025: git loaders are getting oom-killed repeatedly in prod.

After mulling this over with @zack, and looking at the starved worker logs for a while, I suspect that we're also being bitten by our (early, early) choice of using celery acks_late, which only acknowledges tasks when they're done: when a worker is OOM-killed, it will never send task acknowledgements to rabbitmq, which will keep re-sending it the tasks.
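
The redelivery loop described in this comment can be illustrated with a toy model. This is a minimal sketch in plain Python, not Celery or RabbitMQ code: `ToyBroker`, `run_worker`, the `"huge-repo"` task name, and the `max_deliveries` cap are all hypothetical names introduced for illustration, and the `acks_late` flag only mimics the behaviour the comment describes.

```python
from collections import deque


class ToyBroker:
    """Hypothetical in-memory stand-in for a message broker."""

    def __init__(self, tasks):
        self.queue = deque(tasks)
        self.deliveries = 0

    def deliver(self):
        # The task stays at the head of the queue until it is acknowledged.
        self.deliveries += 1
        return self.queue[0]

    def ack(self):
        self.queue.popleft()


def run_worker(broker, acks_late, max_deliveries=5):
    """Process tasks; the task named 'huge-repo' always kills the worker."""
    completed = []
    while broker.queue and broker.deliveries < max_deliveries:
        task = broker.deliver()
        if not acks_late:
            broker.ack()  # early ack: the broker forgets the task right away
        if task == "huge-repo":
            continue  # simulated OOM kill: the worker restarts, no ack is sent
        completed.append(task)
        if acks_late:
            broker.ack()  # late ack: only sent once the task is done
    return completed, broker.deliveries


late = run_worker(ToyBroker(["huge-repo", "small-repo"]), acks_late=True)
early = run_worker(ToyBroker(["huge-repo", "small-repo"]), acks_late=False)
```

In this model, late acknowledgement keeps handing the same failing task back to the worker until the delivery cap is reached, while early acknowledgement drops the failing task after one attempt and lets the next task run.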

Feb 3 2021, 8:16 PM · Git loader, System administration
olasd added a revision to T3025: git loaders are getting oom-killed repeatedly in prod: D5003: celery: acknowledge tasks as soon as they're received.
Feb 3 2021, 8:11 PM · Git loader, System administration
olasd added a comment to T3025: git loaders are getting oom-killed repeatedly in prod.

My current workaround attempt is switching pack fetches from https://github.com/* to git://github.com/*, transparently in the git loader; dulwich's git over TCP transport doesn't have to do the same "double-buffering" as the https transport, so it should allow us to fail earlier (hopefully without involving the oom killer).
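
A transparent scheme rewrite of this kind might look like the sketch below. It is an illustration only, not the git loader's actual code: the function name `to_git_transport` is hypothetical, and the rewrite is assumed to apply to the exact host `github.com` over `https` and to nothing else.

```python
from urllib.parse import urlsplit, urlunsplit


def to_git_transport(origin_url: str) -> str:
    """Rewrite https://github.com/... to git://github.com/...

    Any other URL is returned unchanged (assumption made for this sketch).
    """
    parts = urlsplit(origin_url)
    if parts.scheme == "https" and parts.netloc == "github.com":
        # Keep host and path, swap the scheme, drop query and fragment.
        return urlunsplit(("git", parts.netloc, parts.path, "", ""))
    return origin_url


rewritten = to_git_transport("https://github.com/evands/Specs")
untouched = to_git_transport("https://example.org/some/repo.git")
```

Note that T3544, listed earlier in this feed, tracks GitHub removing support for git:// URLs, so a rewrite like this can only be a temporary workaround.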

Feb 3 2021, 5:36 PM · Git loader, System administration
olasd added a comment to T3025: git loaders are getting oom-killed repeatedly in prod.

Attempts at mitigating the issue:

Feb 3 2021, 5:28 PM · Git loader, System administration
olasd changed the status of T3025: git loaders are getting oom-killed repeatedly in prod from Open to Work in Progress.
Feb 3 2021, 3:38 PM · Git loader, System administration

Jan 13 2021

moranegg updated the task description for T1101: fetch release note from github to keep in release_metadata table.
Jan 13 2021, 10:53 AM · Git loader

Jan 7 2021

anlambert added a comment to T2926: Failed ingestion of a GitHub repository.

Thanks Antoine. Is there any way to have this kind of error also reported in the admin dashboard for "save code now"?

Jan 7 2021, 11:44 AM · Web app, Git loader
rdicosmo added a comment to T2926: Failed ingestion of a GitHub repository.

Thanks Antoine. Is there any way to have this kind of error also reported in the admin dashboard for "save code now"?

Jan 7 2021, 11:41 AM · Web app, Git loader
anlambert added a comment to T2926: Failed ingestion of a GitHub repository.

For the record, the load failure on 2021-01-04T17:05:11Z was due to a network error (found via Kibana):

Jan 7 2021, 11:34 AM · Web app, Git loader

Jan 6 2021

anlambert added a comment to T2926: Failed ingestion of a GitHub repository.

The repository was correctly ingested on 05 January 2021, 11:56 UTC.

Jan 6 2021, 1:53 PM · Web app, Git loader
rdicosmo updated subscribers of T2926: Failed ingestion of a GitHub repository.
Jan 6 2021, 12:34 PM · Web app, Git loader

Jan 4 2021

rdicosmo triaged T2926: Failed ingestion of a GitHub repository as High priority.
Jan 4 2021, 7:29 PM · Web app, Git loader

Oct 16 2020

vlorentz added projects to T2666: GitHub releases not available in record: Data Model, Git loader.
Oct 16 2020, 2:28 PM · Git loader, Data Model
vlorentz merged T2666: GitHub releases not available in record into T2059: Generate (swh) releases from all git tags.
Oct 16 2020, 2:26 PM · Git loader

Sep 24 2020

vlorentz added a comment to T340: add missing "archive_type" property to revision.metadata JSON for all imported dsc.

I don't think so; the loader is storing the data elsewhere, but still doesn't write the archive type in each of these entries

Sep 24 2020, 11:10 AM · Git loader

Sep 22 2020

olasd closed T340: add missing "archive_type" property to revision.metadata JSON for all imported dsc as Wontfix.

I suspect that this is superseded by work done by @vlorentz for the extrinsic metadata store.

Sep 22 2020, 6:23 PM · Git loader
olasd placed T996: Load git origins with missing revisions again up for grabs.
Sep 22 2020, 4:43 PM · Git loader
ardumont added a comment to T2373: git loader OOM when loading huge repository.

running some of the sources on production. I have submitted the guix and
nixpkgs repositories through "save code now"; I could also add the Linux kernel
(if the visit is old enough).

Sep 22 2020, 9:45 AM · Git loader

Sep 21 2020

ardumont added a comment to T2616: Analyze the launchpad repository failures.

I have opened a "fresher" dashboard on Kibana with the errors (grouped by error message as Kibana filters; they need toggling on/off to actually see them) [1]
I think we need to cross those filtering messages with Sentry to actually have some context though... (as we don't really have any with that board...).

Sep 21 2020, 7:27 PM · Git loader
ardumont added a comment to T2373: git loader OOM when loading huge repository.

fwiw, loader-core v0.11.0 deployed in production.

Sep 21 2020, 3:57 PM · Git loader
vlorentz closed T2373: git loader OOM when loading huge repository as Resolved.
Sep 21 2020, 3:35 PM · Git loader
zack added a comment to T2373: git loader OOM when loading huge repository.

fwiw, loader-core v0.11.0 deployed in production.

Sep 21 2020, 2:43 PM · Git loader
ardumont added a comment to T2373: git loader OOM when loading huge repository.

fwiw, loader-core v0.11.0 deployed in production.

Sep 21 2020, 2:38 PM · Git loader
ardumont renamed T2616: Analyze the launchpad repository failures from Analyze the gitea repository (codeberg) failures to Analyze the launchpad repository failures.
Sep 21 2020, 1:44 PM · Git loader
ardumont triaged T2616: Analyze the launchpad repository failures as Normal priority.
Sep 21 2020, 1:44 PM · Git loader

Sep 20 2020

zack added a comment to T2373: git loader OOM when loading huge repository.

I can confirm that with the current master HEAD of swh-loader-core (452fa224f9ca635a979cf1a8e98c88bb560ca98a), loading the Linux kernel repo no longer triggers an OOM.
(It failed after ~24 hours, but apparently for unrelated reasons.)

Sep 20 2020, 2:31 PM · Git loader

Sep 18 2020

ardumont changed the status of T2373: git loader OOM when loading huge repository from Open to Work in Progress.
Sep 18 2020, 3:42 PM · Git loader
ardumont added a revision to T2373: git loader OOM when loading huge repository: D3988: loaders: Move the proxy storage filter after the buffer proxy.
Sep 18 2020, 3:15 PM · Git loader
ardumont added a revision to T2373: git loader OOM when loading huge repository: D3986: loaders: Move the proxy storage filter after the buffer proxy.
Sep 18 2020, 3:11 PM · Git loader
ardumont added a comment to T2373: git loader OOM when loading huge repository.

Status on this. Loader-core has been tagged 0.11.0 which includes D3976.

Sep 18 2020, 2:57 PM · Git loader
swh-public-ci added a comment to D3978: tests: Don't check the number of created 'person' objects..

Build is green

Sep 18 2020, 11:19 AM · Git loader
vlorentz closed D3978: tests: Don't check the number of created 'person' objects..
Sep 18 2020, 11:18 AM · Git loader
vlorentz updated the diff for D3978: tests: Don't check the number of created 'person' objects..

rebase

Sep 18 2020, 11:18 AM · Git loader
swh-public-ci added a comment to D3978: tests: Don't check the number of created 'person' objects..

Build is green

Sep 18 2020, 11:16 AM · Git loader
ardumont updated the summary of D3978: tests: Don't check the number of created 'person' objects..
Sep 18 2020, 11:15 AM · Git loader

Sep 17 2020

vlorentz added a revision to T2373: git loader OOM when loading huge repository: D3976: loader: Stop materializing full lists of objects to be stored..
Sep 17 2020, 2:32 PM · Git loader
vlorentz added a comment to T2373: git loader OOM when loading huge repository.

Adding pagination to these endpoints seems quite overkill.

Sep 17 2020, 2:31 PM · Git loader
olasd added a comment to T2373: git loader OOM when loading huge repository.

So content_missing call explodes mid-air client side (`"POST /content/missing
HTTP/1.1" 200 9475383` so client received the data).

It so happens that the content_missing api takes an unlimited number of
bytes ids as input [1], and then "tries" to stream the results to the client
(the rpc layer in the middle makes that moot).
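
One way to keep such a call bounded on the client side is to send the ids in fixed-size batches and consume the results lazily. The sketch below only illustrates that idea: it is not the swh.storage API and not necessarily what D3976 does. `batched`, `missing_in_batches`, `lookup_missing`, and the default batch size are hypothetical names and values chosen for the example.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(ids: Iterable[bytes], size: int) -> Iterator[List[bytes]]:
    """Yield successive lists of at most `size` ids without copying the input."""
    it = iter(ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def missing_in_batches(
    ids: Iterable[bytes],
    lookup_missing: Callable[[List[bytes]], Iterable[bytes]],
    size: int = 1000,
) -> Iterator[bytes]:
    """Call `lookup_missing` once per batch and stream the results."""
    for batch in batched(ids, size):
        yield from lookup_missing(batch)


# Hypothetical stand-in for a remote "which of these ids are missing?" call.
known = {b"a", b"c"}


def fake_lookup(batch):
    return [i for i in batch if i not in known]


result = list(missing_in_batches([b"a", b"b", b"c", b"d", b"e"], fake_lookup, size=2))
```

Because both functions are generators, at most one batch of ids and its response are held in memory at a time, whatever the total number of ids.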

Sep 17 2020, 2:03 PM · Git loader
zack merged task T2607: git loader OOM when loading the linux kernel repo into T2373: git loader OOM when loading huge repository.
Sep 17 2020, 9:53 AM · Git loader
zack merged T2607: git loader OOM when loading the linux kernel repo into T2373: git loader OOM when loading huge repository.
Sep 17 2020, 9:53 AM · Git loader
zack renamed T2373: git loader OOM when loading huge repository from staging: git loader: failure to ingest huge repository (e.g. nixpkgs) to git loader OOM when loading huge repository.
Sep 17 2020, 9:53 AM · Git loader
ardumont added a comment to T2373: git loader OOM when loading huge repository.

So content_missing call explodes mid-air client side (`"POST /content/missing
HTTP/1.1" 200 9475383` so client received the data).

Sep 17 2020, 9:48 AM · Git loader
douardda added a comment to T2373: git loader OOM when loading huge repository.

FTR, in a test setup I made a few days ago on docker, I had a git loader crunching ~28GB of RES mem (on 32 available on that machine). Not sure which repo it was ingesting, but it was on codeberg.

Sep 17 2020, 9:10 AM · Git loader
zack renamed T2607: git loader OOM when loading the linux kernel repo from git loader OOM when loading the linux kernel repo (at least in the docker dev environment) to git loader OOM when loading the linux kernel repo.
Sep 17 2020, 9:03 AM · Git loader
zack raised the priority of T2607: git loader OOM when loading the linux kernel repo from Normal to High.

Very likely the same issue, thanks @ardumont !
Given what @olasd said in that issue (the ingestion logic having remained pretty much the same from the start), and that I can confirm linux.git was loading just fine on my laptop no more than a year ago, the increased memory usage probably comes from elsewhere.
Anyway, it looks like a potentially important issue, so I'm raising priority and also removing the association with the docker env (as you could also reproduce this on staging).

Sep 17 2020, 9:03 AM · Git loader
ardumont added a comment to T2607: git loader OOM when loading the linux kernel repo.

possibly related to T2373.

Sep 17 2020, 8:51 AM · Git loader

Sep 16 2020

zack updated the task description for T2607: git loader OOM when loading the linux kernel repo.
Sep 16 2020, 8:28 PM · Git loader
zack triaged T2607: git loader OOM when loading the linux kernel repo as Normal priority.
Sep 16 2020, 8:26 PM · Git loader

Sep 11 2020

douardda closed T1342: Handle annotated tag with no tagger, a subtask of T1280: git origins: latest failure reports, as Resolved.
Sep 11 2020, 2:30 PM · Git loader
douardda closed T1342: Handle annotated tag with no tagger as Resolved.

Let's call it fixed (until further notice).

Sep 11 2020, 2:30 PM · Git loader
olasd placed T1280: git origins: latest failure reports up for grabs.
Sep 11 2020, 2:25 PM · Git loader

Jul 27 2020

ardumont closed T2481: Migrate dvcs loader tests code to pytest as Resolved.
Jul 27 2020, 3:21 PM · SVN Loader, Mercurial loader, Git loader, Core Loader
ardumont added a parent task for T2481: Migrate dvcs loader tests code to pytest: T2221: Development workflow & code quality.
Jul 27 2020, 3:20 PM · SVN Loader, Mercurial loader, Git loader, Core Loader

Jul 20 2020

ardumont closed T2483: tests: Make check-snapshot utility test function recursively check targetted object exists, a subtask of T2481: Migrate dvcs loader tests code to pytest, as Resolved.
Jul 20 2020, 9:17 AM · SVN Loader, Mercurial loader, Git loader, Core Loader
ardumont closed T2483: tests: Make check-snapshot utility test function recursively check targetted object exists as Resolved.
Jul 20 2020, 9:17 AM · SVN Loader, Mercurial loader, Git loader, Core Loader
ardumont closed T2484: Move sharable fixtures out of conftest into a dedicated pytest plugin, a subtask of T2481: Migrate dvcs loader tests code to pytest, as Resolved.
Jul 20 2020, 9:16 AM · SVN Loader, Mercurial loader, Git loader, Core Loader
ardumont closed T2484: Move sharable fixtures out of conftest into a dedicated pytest plugin as Resolved.
Jul 20 2020, 9:16 AM · SVN Loader, Mercurial loader, Git loader, Core Loader

Jul 17 2020

ardumont added a revision to T2484: Move sharable fixtures out of conftest into a dedicated pytest plugin: D3551: tests: Reuse pytest fixtures from swh.loader.core.
Jul 17 2020, 12:12 PM · SVN Loader, Mercurial loader, Git loader, Core Loader
ardumont added a revision to T2484: Move sharable fixtures out of conftest into a dedicated pytest plugin: D3550: tests: Reuse pytest fixtures from swh.loader.core.
Jul 17 2020, 12:04 PM · SVN Loader, Mercurial loader, Git loader, Core Loader
ardumont added a revision to T2484: Move sharable fixtures out of conftest into a dedicated pytest plugin: D3549: tests: Reuse pytest fixtures from swh.loader.core.
Jul 17 2020, 12:04 PM · SVN Loader, Mercurial loader, Git loader, Core Loader

Jul 16 2020

ardumont closed T2488: Drop loader.core BaseLoaderTest and BaseLoaderStorageTest, a subtask of T2481: Migrate dvcs loader tests code to pytest, as Resolved.
Jul 16 2020, 3:18 PM · SVN Loader, Mercurial loader, Git loader, Core Loader
ardumont closed T2488: Drop loader.core BaseLoaderTest and BaseLoaderStorageTest as Resolved.
Jul 16 2020, 3:18 PM · SVN Loader, Mercurial loader, Git loader, Core Loader