
Feb 5 2019

olasd added a comment to T1514: MemoryError in loader-git.

That's a fairly large repo (as seen with how the content bundles get spread out to limit their size). It looks like it has some large directories (e.g. the .bugs directory looks like it has a lot of entries) so I'm not too surprised.

Feb 5 2019, 3:40 PM · Git loader
douardda triaged T1514: MemoryError in loader-git as Normal priority.
Feb 5 2019, 9:47 AM · Git loader

Jan 21 2019

anlambert added a comment to T1280: git origins: latest failure reports.

Errors of type dulwich.errors.NotGitRepository [1] are likely caused by a dulwich bug: redirected repository URLs are not handled correctly.
A pull request [2] has been submitted to fix that issue.

Jan 21 2019, 5:53 PM · Git loader

Dec 17 2018

ardumont raised the priority of T1219: add tests to git loader from High to Needs Triage.
Dec 17 2018, 1:56 PM · Sprint 2018 12, Git loader
ardumont moved T1219: add tests to git loader from in progress to done on the Sprint 2018 12 board.
Dec 17 2018, 1:56 PM · Sprint 2018 12, Git loader
ardumont closed T1219: add tests to git loader as Resolved.

Up to 85% now.

Dec 17 2018, 1:55 PM · Sprint 2018 12, Git loader
ardumont moved T1219: add tests to git loader from Backlog to in progress on the Sprint 2018 12 board.
Dec 17 2018, 12:02 PM · Sprint 2018 12, Git loader
ardumont added a project to T1219: add tests to git loader: Sprint 2018 12.
Dec 17 2018, 12:01 PM · Sprint 2018 12, Git loader
ardumont changed the status of T1219: add tests to git loader from Open to Work in Progress.
Dec 17 2018, 12:01 PM · Sprint 2018 12, Git loader

Dec 4 2018

vlorentz added a parent task for T1219: add tests to git loader: T1411: at least 80% SLOC coverage in all components.
Dec 4 2018, 11:27 AM · Sprint 2018 12, Git loader

Nov 27 2018

zack placed T917: Git loader: update README for YAML-based syntax up for grabs.
Nov 27 2018, 12:17 PM · Git loader, Development documentation
zack added a parent task for T917: Git loader: update README for YAML-based syntax: T1388: Document the configuration system of each component.
Nov 27 2018, 12:17 PM · Git loader, Development documentation

Nov 16 2018

vlorentz added a revision to T1219: add tests to git loader: D665: Run git loader tests on BulkUpdater too..
Nov 16 2018, 12:29 PM · Sprint 2018 12, Git loader

Nov 14 2018

anlambert added a comment to T1280: git origins: latest failure reports.

Errors of type ValueError: year is out of range [1] are related to commit dates that cannot be represented by a standard Python datetime.datetime object (datetime.MINYEAR = 1, datetime.MAXYEAR = 9999).
See for instance:
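To illustrate the failure mode independently of the loader's code, here is a minimal sketch. The helper name `to_datetime_or_none` is hypothetical, and returning `None` is only one possible policy, not necessarily what the loader does.

```python
from datetime import datetime, timezone
from typing import Optional


def to_datetime_or_none(timestamp: int) -> Optional[datetime]:
    """Convert a Unix timestamp to an aware UTC datetime.

    Returns None when the timestamp falls outside the range that
    datetime.datetime can represent (years MINYEAR..MAXYEAR).
    """
    try:
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        # Depending on the platform, an unrepresentable timestamp surfaces
        # as one of these exceptions (e.g. "year ... is out of range").
        return None
```

A caller could then decide whether to skip the object, log a warning, or store the raw timestamp in another form.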

Nov 14 2018, 5:13 PM · Git loader
anlambert added a comment to T1280: git origins: latest failure reports.

All errors of type psycopg2.IntegrityError: duplicate key value violates unique constraint "content_pkey" [1] are about sha1 collisions, mainly from repositories testing the attack uncovered by SHAttered [2].

Nov 14 2018, 3:47 PM · Git loader
anlambert updated the task description for T1339: Handle malformed author and committer dates.
Nov 14 2018, 3:39 PM · Storage manager, Git loader
anlambert triaged T1342: Handle annotated tag with no tagger as Normal priority.
Nov 14 2018, 3:38 PM · Git loader
anlambert added a comment to T1339: Handle malformed author and committer dates.

Indeed, you're right: the timezone offset is used to compute a revision identifier, so even if its value is incorrect it should be stored anyway.

Nov 14 2018, 1:25 PM · Storage manager, Git loader
zack added a comment to T1339: Handle malformed author and committer dates.

The simplest solution would be to check if the computed timezone offset lies in the adequate bounds [UTC−14:00, UTC+14:00] and set it to 0 if not.
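A minimal sketch of that bounds check, assuming the offset is expressed in minutes (the function name and the unit are assumptions, not taken from the loader or the storage code):

```python
# Bounds quoted in the comment: UTC-14:00 .. UTC+14:00, expressed in minutes.
MAX_OFFSET_MINUTES = 14 * 60


def sanitize_offset(offset_minutes: int) -> int:
    """Return the offset unchanged when it is within bounds, else 0."""
    if -MAX_OFFSET_MINUTES <= offset_minutes <= MAX_OFFSET_MINUTES:
        return offset_minutes
    return 0
```

Note that the reply in this thread points out that the offset is part of the data used to compute the revision identifier, so resetting it is not a neutral change.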

Nov 14 2018, 12:03 PM · Storage manager, Git loader
anlambert triaged T1339: Handle malformed author and committer dates as Normal priority.
Nov 14 2018, 11:57 AM · Storage manager, Git loader

Nov 9 2018

vlorentz reopened T1219: add tests to git loader as "Open".
Nov 9 2018, 5:49 PM · Sprint 2018 12, Git loader
zack added a comment to T1219: add tests to git loader.

Resolved by D622.

Nov 9 2018, 1:49 PM · Sprint 2018 12, Git loader
vlorentz closed T1219: add tests to git loader as Resolved.

Resolved by D622.

Nov 9 2018, 1:45 PM · Sprint 2018 12, Git loader

Oct 30 2018

vlorentz added a revision to T1219: add tests to git loader: D622: New tests for the Git loader..
Oct 30 2018, 3:47 PM · Sprint 2018 12, Git loader
vlorentz claimed T1219: add tests to git loader.
Oct 30 2018, 3:14 PM · Sprint 2018 12, Git loader

Oct 20 2018

ardumont edited P320 loader errors per loader type: ~/.config/swh/kibana/group-by.yml.
Oct 20 2018, 12:54 PM · Git loader, Mercurial loader, PyPI loader
ardumont added a comment to T1280: git origins: latest failure reports.

Heads up, I amended the description.
I did it that way so as not to disrupt the analysis thread (adding a bloated comment in the middle of it seemed wrong to me).

Oct 20 2018, 8:43 AM · Git loader
ardumont updated the task description for T1280: git origins: latest failure reports.
Oct 20 2018, 8:38 AM · Git loader

Oct 19 2018

olasd added a comment to T1280: git origins: latest failure reports.

"dulwich.errors.ObjectFormatException: invalid literal for int() with base 10:" is https://github.com/dulwich/dulwich/pull/664

Oct 19 2018, 5:53 PM · Git loader
olasd claimed T1280: git origins: latest failure reports.

Thanks for the analysis!

Oct 19 2018, 4:08 PM · Git loader
ardumont triaged T1280: git origins: latest failure reports as Normal priority.
Oct 19 2018, 11:49 AM · Git loader
ardumont edited P320 loader errors per loader type: ~/.config/swh/kibana/group-by.yml.
Oct 19 2018, 10:20 AM · Git loader, Mercurial loader, PyPI loader
ardumont added projects to P320 loader errors per loader type: ~/.config/swh/kibana/group-by.yml: PyPI loader, Mercurial loader, Git loader.
Oct 19 2018, 9:54 AM · Git loader, Mercurial loader, PyPI loader

Oct 11 2018

olasd closed T1263: Git loader: created scheduler tasks always fail as Resolved by committing rDLDG85866507949a: Use explicit keyword argument for base_url in the load task.
Oct 11 2018, 3:42 PM · Git loader
anlambert triaged T1263: Git loader: created scheduler tasks always fail as High priority.
Oct 11 2018, 3:10 PM · Git loader

Oct 1 2018

zack triaged T1219: add tests to git loader as High priority.
Oct 1 2018, 7:40 PM · Sprint 2018 12, Git loader

Sep 27 2018

olasd merged task T1213: Git loader: allow ignoring the contents of the archive when updating an origin into T1212: git loader: add an option to force re-download/ingest everything.
Sep 27 2018, 11:40 AM · Git loader
olasd triaged T1213: Git loader: allow ignoring the contents of the archive when updating an origin as Normal priority.
Sep 27 2018, 11:39 AM · Git loader
zack triaged T1212: git loader: add an option to force re-download/ingest everything as Normal priority.
Sep 27 2018, 11:39 AM · Git loader

Sep 21 2018

olasd closed T1199: (Incremental) loading of large git repositories fails with an HTTP timeout as Resolved.

That's deployed now.

Sep 21 2018, 2:37 PM · Git loader

Sep 20 2018

olasd added a comment to T1199: (Incremental) loading of large git repositories fails with an HTTP timeout.

Turns out that eadefcb15384ac0c68f3ba664f9607e1f588257d already truncates the history to a depth of 0 *cough*. It just needs deploying.

Sep 20 2018, 5:21 PM · Git loader

Sep 17 2018

olasd claimed T1199: (Incremental) loading of large git repositories fails with an HTTP timeout.
Sep 17 2018, 1:58 PM · Git loader
olasd triaged T1199: (Incremental) loading of large git repositories fails with an HTTP timeout as High priority.
Sep 17 2018, 1:58 PM · Git loader

Sep 14 2018

olasd added a comment to T1195: git loader: fail to ingest our own hello world repository.

Tracked down the dulwich bug. Pull request: https://github.com/dulwich/dulwich/pull/659

Sep 14 2018, 6:57 PM · Git loader
olasd added a comment to T1195: git loader: fail to ingest our own hello world repository.

Thanks to

export GIT_TRACE_PACKET=1
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1

I've compared traces from git cloning on our forge and on other repositories with actual git clone.
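For reference, the same trace variables can be set from Python when driving a real git clone. This is only a sketch: the helper names, the URL, and the destination path are hypothetical, and it is not how the comparison above was actually done.

```python
import os
import subprocess

# Trace variables quoted in the comment above.
GIT_TRACE_VARS = {
    "GIT_TRACE_PACKET": "1",
    "GIT_TRACE": "1",
    "GIT_CURL_VERBOSE": "1",
}


def traced_env(base=None):
    """Return a copy of the environment with the git trace variables set."""
    env = dict(os.environ if base is None else base)
    env.update(GIT_TRACE_VARS)
    return env


def run_traced_clone(url, dest):
    """Run `git clone` with tracing enabled and return the captured stderr.

    git writes its trace output to stderr, so that stream is captured.
    """
    result = subprocess.run(
        ["git", "clone", url, dest],
        env=traced_env(),
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stderr
```

Saving the returned trace for two different remotes makes it possible to diff them side by side.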

Sep 14 2018, 5:29 PM · Git loader
zack triaged T1195: git loader: fail to ingest our own hello world repository as Normal priority.
Sep 14 2018, 5:01 PM · Git loader

Jun 26 2018

olasd added a comment to T917: Git loader: update README for YAML-based syntax.
---
content_size_limit: 1000000
save_data: True
save_data_path: /home/ndandrim/.cache/swh/packfiles
storage:
  cls: remote
  args:
    url: http://localhost:5002/
Jun 26 2018, 4:55 PM · Git loader, Development documentation

Jun 13 2018

moranegg added a parent task for T17: handle github assets in git loader: T1102: Handle all GitHub elements (meta task).
Jun 13 2018, 4:24 PM · Git loader
moranegg added a parent task for T1101: fetch release note from github to keep in release_metadata table: T1102: Handle all GitHub elements (meta task).
Jun 13 2018, 4:24 PM · Git loader
moranegg removed a subtask for T17: handle github assets in git loader: T1101: fetch release note from github to keep in release_metadata table.
Jun 13 2018, 4:18 PM · Git loader
moranegg removed a parent task for T1101: fetch release note from github to keep in release_metadata table: T17: handle github assets in git loader.
Jun 13 2018, 4:18 PM · Git loader
moranegg renamed T1101: fetch release note from github to keep in release_metadata table from fetch release note from github to keep with release object to fetch release note from github to keep in release_metadata table.
Jun 13 2018, 4:07 PM · Git loader
olasd added a comment to T1101: fetch release note from github to keep in release_metadata table.

In my opinion, the release notes shouldn't be stored in the release objects, for the following reasons:

  • they depend on the origin (a clone of this exact same repository on another git hosting platform won't have that information)
  • they aren't part of the data used to compute the release identifier
  • they can be modified after the fact
Jun 13 2018, 4:04 PM · Git loader
moranegg updated the task description for T1101: fetch release note from github to keep in release_metadata table.
Jun 13 2018, 4:00 PM · Git loader
moranegg triaged T1101: fetch release note from github to keep in release_metadata table as Low priority.
Jun 13 2018, 3:59 PM · Git loader

Mar 16 2018

olasd triaged T996: Load git origins with missing revisions again as High priority.
Mar 16 2018, 5:55 PM · Git loader

Mar 14 2018

ardumont created P233 ~/.config/swh/loader/git-updater.yml.
Mar 14 2018, 10:12 AM · Git loader

Mar 2 2018

rdicosmo updated the task description for T980: Identify and fix releases that are stored as revisions.
Mar 2 2018, 5:19 PM · Archive content
rdicosmo created T980: Identify and fix releases that are stored as revisions.
Mar 2 2018, 5:17 PM · Archive content

Jan 11 2018

zack added a comment to T917: Git loader: update README for YAML-based syntax.
In T917#16911, @olasd wrote:

Moving the documentation about the config to a top-level configuration document, in the docs directory, sounds like a very good plan. You assigned yourself the ticket, will you do it?

Jan 11 2018, 8:25 AM · Git loader, Development documentation

Jan 10 2018

olasd added a comment to T917: Git loader: update README for YAML-based syntax.

Moving the documentation about the config to a top-level configuration document, in the docs directory, sounds like a very good plan. You assigned yourself the ticket, will you do it?

Jan 10 2018, 4:46 PM · Git loader, Development documentation

Jan 6 2018

zack renamed T917: Git loader: update README for YAML-based syntax from Git loader: update README for YAML-based syntax. to Git loader: update README for YAML-based syntax.
Jan 6 2018, 8:50 AM · Git loader, Development documentation
zack created T917: Git loader: update README for YAML-based syntax.
Jan 6 2018, 8:50 AM · Git loader, Development documentation

Dec 15 2017

ardumont closed T816: Gitorious import: loose object parsing error with corrupted file as empty one as Resolved.
Dec 15 2017, 7:43 PM · Git loader, Origin-Gitorious
ardumont added a comment to T816: Gitorious import: loose object parsing error with corrupted file as empty one.

Fixed with the latest packaged version:

Dec 15 2017, 7:43 PM · Git loader, Origin-Gitorious
ardumont added a comment to T816: Gitorious import: loose object parsing error with corrupted file as empty one.

Packaged it and pushed to our own repository.

Dec 15 2017, 7:37 PM · Git loader, Origin-Gitorious
ardumont added a comment to T816: Gitorious import: loose object parsing error with corrupted file as empty one.

Update on this:

  • Issue opened.
  • Pull Request (PR) proposed and merged.
Dec 15 2017, 1:43 PM · Git loader, Origin-Gitorious

Nov 13 2017

ardumont closed T823: Gitorious import: Overflow error in revision time as Resolved by committing rDLDG120f23dd0bf2: swh.loader.git.disk: Force further checks on objects.
Nov 13 2017, 6:40 PM · Origin-Gitorious, Storage manager, Git loader
ardumont added a comment to T816: Gitorious import: loose object parsing error with corrupted file as empty one.

This error slipped under my radar last week.
I opened a related issue in dulwich since it should be handled upstream.

Nov 13 2017, 2:53 PM · Git loader, Origin-Gitorious

Nov 10 2017

ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

PR got merged \m/

Nov 10 2017, 6:29 PM · Origin-Gitorious, Storage manager, Git loader

Nov 7 2017

ardumont closed T822: Gitorious import: ObjectFormatException raised when badly formatted object (around date?) as Resolved by committing rDLDGfece2335e246: swh.loader.git.loader: Warn when object malformed and continue.
Nov 7 2017, 6:22 PM · Origin-Gitorious, Git loader
ardumont closed T819: Gitorious import: ObjectFormatException raised when badly formatted tag object exists in the repository as Resolved by committing rDLDGfece2335e246: swh.loader.git.loader: Warn when object malformed and continue.
Nov 7 2017, 6:22 PM · Origin-Gitorious, Git loader
ardumont closed T814: Gitorious import: unexisting object retrieval makes the loading fail as Resolved by committing rDLDG5e2d236b6a3f: swh.loader.git.loader: Trap missing object id and continue.
Nov 7 2017, 6:22 PM · Origin-Gitorious, Git loader

Nov 4 2017

ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

PR got merged \m/

Nov 4 2017, 12:58 PM · Origin-Gitorious, Storage manager, Git loader

Oct 31 2017

ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

Follow up on this:

Oct 31 2017, 3:38 PM · Origin-Gitorious, Storage manager, Git loader

Oct 27 2017

ardumont triaged T823: Gitorious import: Overflow error in revision time as Normal priority.
Oct 27 2017, 2:26 PM · Origin-Gitorious, Storage manager, Git loader
ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

The revision in question is:

Oct 27 2017, 2:18 PM · Origin-Gitorious, Storage manager, Git loader
ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

Debugging some more, the date generating this error is the following, which indeed raises the initial overflow error:

Oct 27 2017, 2:10 PM · Origin-Gitorious, Storage manager, Git loader
ardumont created T823: Gitorious import: Overflow error in revision time.
Oct 27 2017, 2:09 PM · Origin-Gitorious, Storage manager, Git loader
ardumont added a comment to T814: Gitorious import: unexisting object retrieval makes the loading fail.

Possibly related error.

Oct 27 2017, 1:36 PM · Origin-Gitorious, Git loader
ardumont updated the task description for T822: Gitorious import: ObjectFormatException raised when badly formatted object (around date?).
Oct 27 2017, 12:25 PM · Origin-Gitorious, Git loader
ardumont added a comment to T822: Gitorious import: ObjectFormatException raised when badly formatted object (around date?).

Debugging the problematic object shows 1e82c9224b8898672b3b6fe8b6b737f7eed24cf6, which git fsck reports as well.
Turns out it's a badly formatted tag:

Oct 27 2017, 11:52 AM · Origin-Gitorious, Git loader
ardumont added a parent task for T822: Gitorious import: ObjectFormatException raised when badly formatted object (around date?): T674: Gitorious import: Examine ingestion logs for errors and list them if any.
Oct 27 2017, 11:00 AM · Origin-Gitorious, Git loader
ardumont renamed T822: Gitorious import: ObjectFormatException raised when badly formatted object (around date?) from Gitorious import: ObjectFormatException raised when badly formatted object to Gitorious import: ObjectFormatException raised when badly formatted object (around date?).
Oct 27 2017, 10:59 AM · Origin-Gitorious, Git loader
ardumont created T822: Gitorious import: ObjectFormatException raised when badly formatted object (around date?).
Oct 27 2017, 10:59 AM · Origin-Gitorious, Git loader

Oct 26 2017

ardumont renamed T816: Gitorious import: loose object parsing error with corrupted file as empty one from Gitorious import: loose object parsing error with the empty file to Gitorious import: loose object parsing error with corrupted file as empty one.
Oct 26 2017, 4:14 PM · Git loader, Origin-Gitorious
ardumont added a parent task for T819: Gitorious import: ObjectFormatException raised when badly formatted tag object exists in the repository: T674: Gitorious import: Examine ingestion logs for errors and list them if any.
Oct 26 2017, 4:07 PM · Origin-Gitorious, Git loader
ardumont renamed T819: Gitorious import: ObjectFormatException raised when badly formatted tag object exists in the repository from Gitorious import: ObjectFormatException on what looks like a date field to Gitorious import: ObjectFormatException raised when badly formatted tag object exists in the repository.
Oct 26 2017, 4:07 PM · Origin-Gitorious, Git loader
ardumont added a comment to T819: Gitorious import: ObjectFormatException raised when badly formatted tag object exists in the repository.

After patching the version to print the identifier of the failing object, I retrieve the following object: ae51106031a0bb39a8def57a8592f70116487eab (which is among the badly formatted tags listed by git fsck below).

Oct 26 2017, 4:02 PM · Origin-Gitorious, Git loader
ardumont updated the task description for T815: Gitorious import: Release time conversion issue when no release date is provided.
Oct 26 2017, 3:48 PM · Origin-Gitorious, Git loader
ardumont updated the task description for T816: Gitorious import: loose object parsing error with corrupted file as empty one.
Oct 26 2017, 3:48 PM · Git loader, Origin-Gitorious
ardumont renamed T819: Gitorious import: ObjectFormatException raised when badly formatted tag object exists in the repository from Gitorious import: to Gitorious import: ObjectFormatException on what looks like a date field.
Oct 26 2017, 3:48 PM · Origin-Gitorious, Git loader
ardumont updated the task description for T814: Gitorious import: unexisting object retrieval makes the loading fail.
Oct 26 2017, 3:45 PM · Origin-Gitorious, Git loader
ardumont closed T815: Gitorious import: Release time conversion issue when no release date is provided as Resolved by committing rDLDG2c91a6feb6f0: converters: Fix release time conversion issue when no date provided.
Oct 26 2017, 1:11 PM · Origin-Gitorious, Git loader
ardumont renamed T815: Gitorious import: Release time conversion issue when no release date is provided from Gitorious import: Release time conversion issue when none is provided to Gitorious import: Release time conversion issue when no release date is provided.
Oct 26 2017, 1:09 PM · Origin-Gitorious, Git loader
ardumont renamed T816: Gitorious import: loose object parsing error with corrupted file as empty one from Gitorious import: loose object parsing error to Gitorious import: loose object parsing error with the empty file.
Oct 26 2017, 11:55 AM · Git loader, Origin-Gitorious
ardumont renamed T815: Gitorious import: Release time conversion issue when no release date is provided from Gitorious import: Time conversion issue to Gitorious import: Release time conversion issue when none is provided.
Oct 26 2017, 11:53 AM · Origin-Gitorious, Git loader
ardumont added a comment to T815: Gitorious import: Release time conversion issue when no release date is provided.

In that particular repository, the tag has no time (tag.tag_time and tag.tag_timezone are None, and tag._tag_timezone_neg_utc is False; those are the default values for that object).
But swh-loader-git's code expects those values to exist.
In our model, though, it is fine for that date not to be provided.
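A minimal sketch of a converter that tolerates a missing tag date. `FakeTag` is a hypothetical stand-in exposing only the two attributes named above, `release_date` is an illustrative name, and treating the timezone as an offset in seconds is an assumption; this is not the actual fix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTag:
    """Hypothetical stand-in for a tag object; both fields default to None."""
    tag_time: Optional[int] = None
    tag_timezone: Optional[int] = None


def release_date(tag) -> Optional[dict]:
    """Return the release date as a dict, or None when the tag has no time."""
    if tag.tag_time is None:
        # No date on the tag: the model accepts a missing release date.
        return None
    return {
        "timestamp": tag.tag_time,
        # Assumed to be an offset in seconds; a missing value maps to 0.
        "offset_seconds": tag.tag_timezone or 0,
    }
```

With this shape, a tag without a date converts to `None` instead of raising when the attributes are read.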

Oct 26 2017, 11:53 AM · Origin-Gitorious, Git loader
ardumont added a comment to T816: Gitorious import: loose object parsing error with corrupted file as empty one.

Tweaking the git loader to print the actual sha1:

Oct 26 2017, 11:36 AM · Git loader, Origin-Gitorious