Software Heritage Feed

Jan 8 2023

gitlab-migration changed the status of T102: Add synthetic flag to false for swh-loader-git from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:16 PM · Git loader
gitlab-migration changed the status of T76: Reload repositories whose import failed due to connection issues from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:16 PM · Git loader
gitlab-migration changed the status of T73: Reload repositories with null tag names from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:16 PM · Git loader
gitlab-migration changed the status of T68: support for git tags that point to arbitrary git objects, instead of revisions from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:16 PM · Git loader
gitlab-migration changed the status of T64: Support tags with empty or non-utf8 messages from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:16 PM · Git loader
gitlab-migration changed the status of T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have from Resolved to Migrated.

This task has been migrated to GitLab.

Jan 8 2023, 4:16 PM · Git loader

Jan 5 2023

vlorentz placed T4744: TypeError: argument of type 'ssl.SSLError' is not iterable up for grabs.
Jan 5 2023, 1:05 PM · Easy hack, Git loader
swh-sentry-integration assigned T4744: TypeError: argument of type 'ssl.SSLError' is not iterable to vlorentz.
Jan 5 2023, 1:05 PM · Easy hack, Git loader

Dec 13 2022

vlorentz closed T4724: UnicodeDecodeError on branch names in git loader as Resolved.
Dec 13 2022, 1:35 PM · Git loader
vlorentz triaged T4724: UnicodeDecodeError on branch names in git loader as Normal priority.
Dec 13 2022, 1:24 PM · Git loader
vlorentz added a revision to T4724: UnicodeDecodeError on branch names in git loader: D8956: Fix crash on non-UTF8 branch names.
Dec 13 2022, 1:23 PM · Git loader
swh-sentry-integration assigned T4724: UnicodeDecodeError on branch names in git loader to vlorentz.
Dec 13 2022, 1:22 PM · Git loader

Dec 1 2022

vlorentz closed T3273: Use "fork" relationships to speed-up initial load of large repositories as Resolved.
Dec 1 2022, 4:18 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
vlorentz closed T3273: Use "fork" relationships to speed-up initial load of large repositories, a subtask of T4283: Load https://github.com/chromium/chromium with a higher packfile size limit, as Resolved.
Dec 1 2022, 4:18 PM · System administration, Git loader
vlorentz closed T4219: Investigate why GitHub fork detection did not bring a speed-up, a subtask of T3273: Use "fork" relationships to speed-up initial load of large repositories, as Resolved.
Dec 1 2022, 4:18 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
vlorentz closed T4219: Investigate why GitHub fork detection did not bring a speed-up as Resolved.
Dec 1 2022, 4:18 PM · Origin-GitHub, Git loader

Nov 4 2022

olasd added a comment to T4219: Investigate why GitHub fork detection did not bring a speed-up.

swh.loader.git 2.1.0 has now been deployed on all workers.

Nov 4 2022, 9:25 PM · Origin-GitHub, Git loader

Nov 3 2022

olasd added a revision to T4219: Investigate why GitHub fork detection did not bring a speed-up: D8808: Eagerly populate the set of local heads in RepoRepresentation.__init__.
Nov 3 2022, 5:28 PM · Origin-GitHub, Git loader

Nov 2 2022

vlorentz closed T4660: Workaround / in tree entry names as Resolved.
Nov 2 2022, 11:52 AM · Git loader
vlorentz closed T4660: Workaround / in tree entry names, a subtask of T4659: Fix all crashes of the git loader caused by malformed git objects, as Resolved.
Nov 2 2022, 11:52 AM · meta-task, Git loader

Oct 26 2022

vlorentz added a parent task for T4663: ObjectFormatException: not enough values to unpack (expected 2, got 1): T4659: Fix all crashes of the git loader caused by malformed git objects.
Oct 26 2022, 4:45 PM · Git loader
vlorentz added a subtask for T4659: Fix all crashes of the git loader caused by malformed git objects: T4663: ObjectFormatException: not enough values to unpack (expected 2, got 1).
Oct 26 2022, 4:45 PM · meta-task, Git loader
vlorentz placed T4663: ObjectFormatException: not enough values to unpack (expected 2, got 1) up for grabs.
Oct 26 2022, 4:45 PM · Git loader
swh-sentry-integration assigned T4663: ObjectFormatException: not enough values to unpack (expected 2, got 1) to vlorentz.
Oct 26 2022, 4:45 PM · Git loader
vlorentz placed T3880: Support Git commits with no angle brackets in author name up for grabs.
Oct 26 2022, 11:22 AM · Git loader
vlorentz added a revision to T4660: Workaround / in tree entry names: D8776: converters: Replace '/' with '_' in directory entries.
Oct 26 2022, 11:21 AM · Git loader
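
The title of D8776 describes replacing '/' with '_' in directory entry names. A minimal sketch of that kind of substitution follows. It is not the actual swh.loader.git converter code: the function name `sanitize_entry_name` is hypothetical, and the sketch assumes entry names are handled as raw bytes.

```python
def sanitize_entry_name(name: bytes) -> bytes:
    """Replace every '/' in a tree entry name with '_'.

    Hypothetical helper for illustration only; the real converter
    may handle more cases than this.
    """
    return name.replace(b"/", b"_")


# A malformed entry name that contains a path separator.
print(sanitize_entry_name(b"docs/readme.txt"))
```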
vlorentz triaged T4660: Workaround / in tree entry names as Low priority.
Oct 26 2022, 11:21 AM · Git loader
vlorentz added a parent task for T4659: Fix all crashes of the git loader caused by malformed git objects: T3653: Stabilize loader git.
Oct 26 2022, 10:37 AM · meta-task, Git loader
vlorentz added a subtask for T3653: Stabilize loader git: T4659: Fix all crashes of the git loader caused by malformed git objects.
Oct 26 2022, 10:37 AM · Git loader
vlorentz added a parent task for T1339: Handle malformed author and committer dates: T4659: Fix all crashes of the git loader caused by malformed git objects.
Oct 26 2022, 10:33 AM · Storage manager, Git loader
vlorentz added a subtask for T4659: Fix all crashes of the git loader caused by malformed git objects: T1339: Handle malformed author and committer dates.
Oct 26 2022, 10:33 AM · meta-task, Git loader
vlorentz renamed T4659: Fix all crashes of the git loader caused by malformed git objects from Fix all crashes of the git loader to Fix all crashes of the git loader caused by malformed git objects.
Oct 26 2022, 10:32 AM · meta-task, Git loader
vlorentz added a subtask for T4659: Fix all crashes of the git loader caused by malformed git objects: Unknown Object (Maniphest Task).
Oct 26 2022, 10:32 AM · meta-task, Git loader
vlorentz placed T4658: ObjectFormatException: Unknown field b'>' up for grabs.
Oct 26 2022, 10:31 AM · Git loader
vlorentz added a parent task for T4658: ObjectFormatException: Unknown field b'>': T4659: Fix all crashes of the git loader caused by malformed git objects.
Oct 26 2022, 10:31 AM · Git loader
vlorentz added a parent task for T3880: Support Git commits with no angle brackets in author name: T4659: Fix all crashes of the git loader caused by malformed git objects.
Oct 26 2022, 10:31 AM · Git loader
vlorentz added subtasks for T4659: Fix all crashes of the git loader caused by malformed git objects: T4658: ObjectFormatException: Unknown field b'>', T3880: Support Git commits with no angle brackets in author name.
Oct 26 2022, 10:31 AM · meta-task, Git loader
vlorentz triaged T4659: Fix all crashes of the git loader caused by malformed git objects as Normal priority.
Oct 26 2022, 10:31 AM · meta-task, Git loader
swh-sentry-integration assigned T4658: ObjectFormatException: Unknown field b'>' to vlorentz.
Oct 26 2022, 10:30 AM · Git loader

Oct 19 2022

gitlab-migration closed T4400: Fill in the gap with scanoss tool as Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:07 PM · System administration, Git loader
gitlab-migration closed T4390: Reschedule pack file too big failing loading task to dedicated queue consumed by large enough workers as Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:07 PM · System administration, Git loader
gitlab-migration changed the status of T4311: Package and deploy dulwich 0.20.43 in production from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:07 PM · System administration, Git loader
gitlab-migration changed the status of T4283: Load https://github.com/chromium/chromium with a higher packfile size limit from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:07 PM · System administration, Git loader
gitlab-migration changed the status of T4243: Deploy loader.metadata credentials for high and oneshot loaders from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:06 PM · System administration, Metadata Loaders, Git loader
gitlab-migration changed the status of T4242: Deployed loader.git v1.8, a subtask of T4219: Investigate why GitHub fork detection did not bring a speed-up, from Resolved to Migrated.
Oct 19 2022, 6:06 PM · Origin-GitHub, Git loader
gitlab-migration changed the status of T4242: Deployed loader.git v1.8 from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:06 PM · System administration, Git loader
gitlab-migration changed the status of T4225: Deploy a more recent version of prometheus-statsd-exporter on all nodes, a subtask of T4219: Investigate why GitHub fork detection did not bring a speed-up, from Resolved to Migrated.
Oct 19 2022, 6:06 PM · Origin-GitHub, Git loader
gitlab-migration changed the status of T4206: prod: Deploy metadata loader v0.0.2, a subtask of T3273: Use "fork" relationships to speed-up initial load of large repositories, from Resolved to Migrated.
Oct 19 2022, 6:06 PM · Origin-GitHub, Origin-GitLab, Git loader, Extrinsic metadata, Core Loader
gitlab-migration closed T3640: Make long running task stop fast when warm shutdown is triggered, a subtask of T3653: Stabilize loader git, as Migrated.
Oct 19 2022, 6:04 PM · Git loader
gitlab-migration changed the status of T3614: Deploy swh.loader.git v1.1.0 and swh.model v3.0 from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:04 PM · System administration, Git loader
gitlab-migration changed the status of T3588: Deploy swh.loader.git v1.0 from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:04 PM · System administration, Git loader
gitlab-migration closed T3025: git loaders are getting oom-killed repeatedly in prod as Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 6:01 PM · Git loader, System administration
gitlab-migration changed the status of T1988: Upgrade dulwich on celery workers from Resolved to Migrated.

This task has been migrated to GitLab.

Oct 19 2022, 5:56 PM · System administration, Git loader

Sep 27 2022

swh-sentry-integration added a comment to T3880: Support Git commits with no angle brackets in author name.

Sentry issue: SWH-LOADER-GIT-198

Sep 27 2022, 9:25 AM · Git loader

Aug 11 2022

swh-sentry-integration added a comment to T3880: Support Git commits with no angle brackets in author name.

Sentry issue: SWH-LOADER-GIT-16G

Aug 11 2022, 12:43 PM · Git loader

Aug 6 2022

rdicosmo added a comment to T4400: Fill in the gap with scanoss tool.

Polished up and shared the tool built to produce the refined list priority.list.github.
It is now available at https://github.com/rdicosmo/swh-check-repositories

Aug 6 2022, 3:43 PM · System administration, Git loader

Aug 4 2022

ardumont added a comment to T4400: Fill in the gap with scanoss tool.

@ardumont here is the subset of the repositories on GitHub, ordered by number of stars, that are:

  • still on GitHub
  • not a fork

Could you purge the current queue and reinsert this list instead?

Aug 4 2022, 8:01 PM · System administration, Git loader
rdicosmo added a comment to T4400: Fill in the gap with scanoss tool.

@ardumont here is the subset of the repositories on GitHub, ordered by number of stars, that are:

  • still on GitHub
  • not a fork
Aug 4 2022, 6:02 PM · System administration, Git loader
rdicosmo added a comment to T4400: Fill in the gap with scanoss tool.

I took the opportunity to retrieve large origins [7] from the Sentry issue listing [6] (cf. description) [8],
and scheduled those in the large queues after the ones scheduled from the scanoss exchange.

If it's considered useless at some point, feel free to dismiss them (by purging the queue).

[7] 28310 unique origins ->

[8] command used to create the listing out of sentry, in a venv (snippets repository) in worker1.staging:

(sentry-U52ipwI-) ardumont@worker1:~/snippets/ardumont/sentry% python -m list-urls-from-issue --project-name swh-loader-git --event-id 5823 | tee loader-git.pack-file-too-big-issue-5823.urls.txt
...
Aug 4 2022, 3:11 PM · System administration, Git loader

Aug 2 2022

ardumont added a comment to T4400: Fill in the gap with scanoss tool.

Since the normal ingestion is mostly done (one last normal ingestion is ongoing), I've now made worker17-18 consume one more task from the large repositories queue as well (vs. letting them twiddle their thumbs ;).

Aug 2 2022, 11:12 AM · System administration, Git loader

Aug 1 2022

ardumont updated subscribers of T4400: Fill in the gap with scanoss tool.
Aug 1 2022, 12:32 PM · System administration, Git loader
ardumont added a comment to T4400: Fill in the gap with scanoss tool.

At this point in time:

  • 1 "normal" origin
  • 22 "large" origins
Aug 1 2022, 12:16 PM · System administration, Git loader

Jul 29 2022

ardumont updated the task description for T4400: Fill in the gap with scanoss tool.
Jul 29 2022, 6:53 PM · System administration, Git loader

Jul 25 2022

ardumont added a comment to T4400: Fill in the gap with scanoss tool.

fwiw, large repositories are taking their sweet time but it's on its way:

Jul 25 2022, 9:41 AM · System administration, Git loader

Jul 22 2022

ardumont updated the task description for T4400: Fill in the gap with scanoss tool.
Jul 22 2022, 6:23 PM · System administration, Git loader

Jul 19 2022

ardumont added a comment to T4400: Fill in the gap with scanoss tool.

It's currently ingesting [1].

Jul 19 2022, 6:05 PM · System administration, Git loader
ardumont updated the task description for T4400: Fill in the gap with scanoss tool.
Jul 19 2022, 3:58 PM · System administration, Git loader
ardumont moved T4400: Fill in the gap with scanoss tool from in-progress to deployed/landed/monitoring on the System administration board.
Jul 19 2022, 3:51 PM · System administration, Git loader
ardumont updated the task description for T4400: Fill in the gap with scanoss tool.
Jul 19 2022, 3:51 PM · System administration, Git loader
ardumont updated the task description for T4400: Fill in the gap with scanoss tool.
Jul 19 2022, 1:51 PM · System administration, Git loader
ardumont changed the status of T4400: Fill in the gap with scanoss tool from Open to Work in Progress.
Jul 19 2022, 1:50 PM · System administration, Git loader
ardumont added projects to T4400: Fill in the gap with scanoss tool: Git loader, System administration.
Jul 19 2022, 12:21 PM · System administration, Git loader

Jul 12 2022

ardumont added projects to T4390: Reschedule pack file too big failing loading task to dedicated queue consumed by large enough workers: Git loader, System administration.
Jul 12 2022, 11:58 AM · System administration, Git loader

Jun 16 2022

anlambert added a revision to T4311: Package and deploy dulwich 0.20.43 in production: D7996: loader: Bump dulwich and remove no longer valid comments.
Jun 16 2022, 1:47 PM · System administration, Git loader
olasd closed T4311: Package and deploy dulwich 0.20.43 in production as Resolved.

All production loaders have been restarted now.

Jun 16 2022, 11:36 AM · System administration, Git loader

Jun 10 2022

ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Let's do this the other way around, closing this as i'm done.
Please reopen if you need something else.

Jun 10 2022, 9:05 AM · System administration, Git loader
ardumont closed T4283: Load https://github.com/chromium/chromium with a higher packfile size limit as Resolved.
Jun 10 2022, 9:05 AM · System administration, Git loader

Jun 9 2022

ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

@vlorentz I don't have anything left to do, can i close it now?

Jun 9 2022, 6:10 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

And the 2nd fork ingestion is done as well:

swhworker@worker17:~$ url=https://github.com/Tomahawkd/chromium; /usr/bin/time -v swh loader run git $url lister_name=github lister_instance_name=github pack_size_bytes=34359738368 | tee chromium-20220607-05-pack-size-limit-32g-fork2.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/Tomahawkd/chromium' with type 'git'
Enumerating objects: 12661350, done.
Counting objects: 100% (191/191), done.
Compressing objects: 100% (56/56), done.
Total 12661350 (delta 140), reused 135 (delta 135), pack-reused 12661159
INFO:swh.loader.git.loader:Listed 15230 refs for repo https://github.com/Tomahawkd/chromium
INFO:swh.loader.git.loader.GitLoader:Fetched 12661351 objects; 2 are new
self.statsd.constant_tags: {'visit_type': 'git', 'incremental_enabled': True, 'has_parent_origins': True, 'has_parent_snapshot': True, 'has_previous_snapshot': False}
self.parent_origins: [Origin(url='https://github.com/chromium/chromium', id=b'\xa9\xf66\xa1/\\\xc3\\\xa4\x18+\r\xe7L\x91\x94\xe9\x00\x96J')]
{'status': 'eventful'} for origin 'https://github.com/Tomahawkd/chromium'
        Command being timed: "swh loader run git https://github.com/Tomahawkd/chromium lister_name=github lister_instance_name=github pack_size_bytes=34359738368"
        User time (seconds): 62323.33
        System time (seconds): 3001.76
        Percent of CPU this job got: 72%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 25:03:29
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 29352136
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 8
        Minor (reclaiming a frame) page faults: 10355329
        Voluntary context switches: 265156
        Involuntary context switches: 265330
        Swaps: 0
        File system inputs: 2048
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
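
To put the figures above in more readable units, here is a small sketch that converts them. It assumes that GNU time reports "kbytes" in units of 1024 bytes and that the elapsed time has whole seconds. The 134:40:21 value is the duration reported for the first repository run elsewhere in this thread. The helper name `hms_to_seconds` is illustrative only.

```python
def hms_to_seconds(text: str) -> int:
    """Convert 'h:mm:ss' or 'm:ss' (whole seconds only) to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


KIB = 1024
GIB = 1024 ** 3

# Values copied from the command line and the time output above.
pack_size_limit_gib = 34359738368 / GIB
max_rss_gib = 29352136 * KIB / GIB  # assumes kbytes means KiB
fork_seconds = hms_to_seconds("25:03:29")
initial_seconds = hms_to_seconds("134:40:21")

print(f"pack size limit: {pack_size_limit_gib:.0f} GiB")
print(f"peak resident memory: {max_rss_gib:.1f} GiB")
print(f"fork load: {fork_seconds / 3600:.1f} h")
print(f"initial load: {initial_seconds / 3600:.1f} h")
```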
Jun 9 2022, 6:01 PM · System administration, Git loader
olasd added a comment to T4311: Package and deploy dulwich 0.20.43 in production.

I've restarted the staging workers (loader_git and loader_high_priority) with the new dulwich version

Jun 9 2022, 5:36 PM · System administration, Git loader
vlorentz added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

heh, ok, so it's indeed because github sends us way too much

Jun 9 2022, 2:41 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

status, second fork ingestion done (prior to the other one still ongoing) [1]

Jun 9 2022, 2:28 PM · System administration, Git loader

Jun 8 2022

ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

That's still a pretty big packfile ~12.6G [1]... I'm pondering whether I should stop it,
install the new python3-dulwich olasd packaged, and trigger it again...

Jun 8 2022, 4:32 PM · System administration, Git loader
ardumont added a comment to T4311: Package and deploy dulwich 0.20.43 in production.

fwiw, jenkins is python3-dulwich aware.

I don't see the point of that for packages that can be backported with no changes, which is what I had done before, so I admit I hadn't even looked.

Jun 8 2022, 4:25 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

So the first fork ingestion finished and took less time.

Looks like either the loader didn't detect it is a fork, or github sent a large packfile anyway.

In swh/loader/git/loader.py at the end of the prepare function, could you print self.statsd.constant_tags and self.parent_origins, to see which it is?
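
A minimal sketch of what such a debug print might look like. It runs against a stand-in object, not a real GitLoader: the `fake_loader` values and the helper name `format_fork_detection_state` are hypothetical, and only the two attribute names come from the request above.

```python
from types import SimpleNamespace


def format_fork_detection_state(loader) -> list:
    """Return the two debug lines requested above, one per attribute."""
    return [
        f"self.statsd.constant_tags: {loader.statsd.constant_tags}",
        f"self.parent_origins: {loader.parent_origins}",
    ]


# Hypothetical stand-in exposing only the attributes being inspected.
fake_loader = SimpleNamespace(
    statsd=SimpleNamespace(constant_tags={"has_parent_origins": True}),
    parent_origins=["https://github.com/chromium/chromium"],
)

for line in format_fork_detection_state(fake_loader):
    print(line)
```

In the real loader the same two lines would presumably be printed directly at the end of the prepare function, where `self` is the loader instance.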

jsyk, I've edited the file accordingly and triggered another fork ingestion:

swhworker@worker17:~$ url=https://github.com/Tomahawkd/chromium; /usr/bin/time -v swh loader run git $url lister_name=github lister_instance_name=github pack_size_bytes=34359738368 | tee chromium-20220607-05-pack-size-limit-32g-fork2.txt
INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/Tomahawkd/chromium' with type 'git'
Jun 8 2022, 4:02 PM · System administration, Git loader
olasd added a comment to T4311: Package and deploy dulwich 0.20.43 in production.

fwiw, jenkins is python3-dulwich aware.

Jun 8 2022, 3:59 PM · System administration, Git loader
ardumont added a comment to T4311: Package and deploy dulwich 0.20.43 in production.

fwiw, jenkins is python3-dulwich aware.

Jun 8 2022, 3:52 PM · System administration, Git loader
olasd added a comment to T4311: Package and deploy dulwich 0.20.43 in production.

I checked that the swh.loader.git tests are green with the new dulwich version.

Jun 8 2022, 3:51 PM · System administration, Git loader
olasd changed the status of T4311: Package and deploy dulwich 0.20.43 in production from Open to Work in Progress.
Jun 8 2022, 3:43 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

@vlorentz I also encountered [1] this morning which might explain the large packfile...

Jun 8 2022, 3:30 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

So the first fork ingestion finished and took less time.

Jun 8 2022, 3:20 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Note that the first repo run took 134:40:21 (after multiple iterations so maybe more than that actually), so even if the fork ingestion takes ~10h, that'd be much quicker already ¯\_(ツ)_/¯ (been ongoing for ~52min now)

Jun 8 2022, 3:10 PM · System administration, Git loader

Jun 7 2022

ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Which one has that many more commits, the initial one?

Yes

If so, i would expect the fork to be loaded way faster since they should have a shared history at some point in the past.

I would have expected it not to run out of memory (which was the point of the manual load), and it already failed that test

Jun 7 2022, 4:28 PM · System administration, Git loader
vlorentz added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Which one has that many more commits, the initial one?

Jun 7 2022, 4:24 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

initial load of a different repository, which has 338k more commits

Jun 7 2022, 4:16 PM · System administration, Git loader
vlorentz added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

initial load of a different repository, which has 338k more commits

Jun 7 2022, 3:36 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Looks like either the loader didn't detect it is a fork, or github sent a large packfile anyway.

In swh/loader/git/loader.py at the end of the prepare function, could you print self.statsd.constant_tags and self.parent_origins, to see which it is?

Jun 7 2022, 3:34 PM · System administration, Git loader
vlorentz added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Looks like either the loader didn't detect it is a fork, or github sent a large packfile anyway.

Jun 7 2022, 3:30 PM · System administration, Git loader
ardumont added a comment to T4283: Load https://github.com/chromium/chromium with a higher packfile size limit.

Loader crashed with memory issues. Probably too many loads running in parallel.
Currently stopping the worker's other processes to let this one finish (i'll restart it).

Jun 7 2022, 3:12 PM · System administration, Git loader
anlambert triaged T4311: Package and deploy dulwich 0.20.43 in production as Normal priority.
Jun 7 2022, 2:45 PM · System administration, Git loader