Jan 21 2019

anlambert added a comment to T1280: git origins: latest failure reports.

Errors of type dulwich.errors.NotGitRepository [1] are likely caused by a bug in dulwich, which does not correctly handle redirected repository URLs.
A pull request [2] has been submitted to fix that issue.

Jan 21 2019, 5:53 PM · Git loader

Dec 17 2018

ardumont raised the priority of T1219: add tests to git loader from High to Needs Triage.
Dec 17 2018, 1:56 PM · Sprint 2018 12, Git loader
ardumont moved T1219: add tests to git loader from in progress to done on the Sprint 2018 12 board.
Dec 17 2018, 1:56 PM · Sprint 2018 12, Git loader
ardumont closed T1219: add tests to git loader as Resolved.

Up to 85% now.

Dec 17 2018, 1:55 PM · Sprint 2018 12, Git loader
ardumont moved T1219: add tests to git loader from Backlog to in progress on the Sprint 2018 12 board.
Dec 17 2018, 12:02 PM · Sprint 2018 12, Git loader
ardumont added a project to T1219: add tests to git loader: Sprint 2018 12.
Dec 17 2018, 12:01 PM · Sprint 2018 12, Git loader
ardumont changed the status of T1219: add tests to git loader from Open to Work in Progress.
Dec 17 2018, 12:01 PM · Sprint 2018 12, Git loader

Dec 4 2018

vlorentz added a parent task for T1219: add tests to git loader: T1411: reach a minimum of 80% SLOC coverage across all components.
Dec 4 2018, 11:27 AM · Sprint 2018 12, Git loader

Nov 27 2018

zack placed T917: Git loader: update README for YAML-based syntax up for grabs.
Nov 27 2018, 12:17 PM · Git loader, Documentation
zack added a parent task for T917: Git loader: update README for YAML-based syntax: T1388: Document the configuration system of each component.
Nov 27 2018, 12:17 PM · Git loader, Documentation

Nov 16 2018

vlorentz added a revision to T1219: add tests to git loader: D665: Run git loader tests on BulkUpdater too..
Nov 16 2018, 12:29 PM · Sprint 2018 12, Git loader

Nov 14 2018

anlambert added a comment to T1280: git origins: latest failure reports.

Errors of type ValueError: year is out of range [1] are related to commit dates that cannot be represented by the standard Python datetime.datetime object (MINYEAR = 1, MAXYEAR = 9999).
See for instance:
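
The linked example is not included in this feed. As a hypothetical illustration only, here is a minimal sketch of how such a date fails to convert. The helper name `to_datetime` and the timestamp values are assumptions and are not taken from any real repository or from the loader's code.

```python
from datetime import datetime, timezone
from typing import Optional


def to_datetime(timestamp: int) -> Optional[datetime]:
    """Convert a Unix timestamp to an aware datetime, or None if it cannot be represented.

    Hypothetical helper: it only illustrates the failure mode, it is not the loader's code.
    """
    try:
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)
    except (ValueError, OverflowError, OSError):
        # Depending on the platform and the magnitude of the value, an
        # unrepresentable timestamp surfaces as one of these exceptions.
        return None


# A timestamp far beyond year 9999 cannot be held by datetime.datetime.
print(to_datetime(0))
print(to_datetime(10**12))
```

Returning None here is only one possible design choice; a loader could just as well keep the raw timestamp and skip the conversion.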

Nov 14 2018, 5:13 PM · Git loader
anlambert added a comment to T1280: git origins: latest failure reports.

All errors of type psycopg2.IntegrityError: duplicate key value violates unique constraint "content_pkey" [1] are caused by sha1 collisions, mainly from repositories testing the attack uncovered by SHAttered [2].

Nov 14 2018, 3:47 PM · Git loader
anlambert updated the task description for T1339: Handle malformed author and committer dates.
Nov 14 2018, 3:39 PM · Storage manager, Git loader
anlambert triaged T1342: Handle annotated tag with no tagger as Normal priority.
Nov 14 2018, 3:38 PM · Git loader
anlambert added a comment to T1339: Handle malformed author and committer dates.

Indeed, you're right: the timezone offset is used to compute a revision identifier, so even if its value is incorrect it should be stored anyway.

Nov 14 2018, 1:25 PM · Storage manager, Git loader
zack added a comment to T1339: Handle malformed author and committer dates.

The simplest solution would be to check if the computed timezone offset lies in the adequate bounds [UTC−14:00, UTC+14:00] and set it to 0 if not.
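
A minimal sketch of the bounds check described above, assuming the offset is expressed in minutes. The function name `sanitize_offset` and the constants are hypothetical; this is not the storage or loader implementation.

```python
# Bounds taken from the comment: [UTC-14:00, UTC+14:00], expressed in minutes.
MIN_OFFSET_MINUTES = -14 * 60
MAX_OFFSET_MINUTES = 14 * 60


def sanitize_offset(offset_minutes: int) -> int:
    """Return the offset unchanged if it lies within the bounds, else 0."""
    if MIN_OFFSET_MINUTES <= offset_minutes <= MAX_OFFSET_MINUTES:
        return offset_minutes
    return 0


print(sanitize_offset(120))     # within bounds, kept as is
print(sanitize_offset(100000))  # out of bounds, reset to 0
```

Note that resetting the offset discards the original value, which matters if the offset takes part in computing an identifier.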

Nov 14 2018, 12:03 PM · Storage manager, Git loader
anlambert triaged T1339: Handle malformed author and committer dates as Normal priority.
Nov 14 2018, 11:57 AM · Storage manager, Git loader

Nov 9 2018

vlorentz reopened T1219: add tests to git loader as "Open".
Nov 9 2018, 5:49 PM · Sprint 2018 12, Git loader
zack added a comment to T1219: add tests to git loader.

Resolved by D622.

Nov 9 2018, 1:49 PM · Sprint 2018 12, Git loader
vlorentz closed T1219: add tests to git loader as Resolved.

Resolved by D622.

Nov 9 2018, 1:45 PM · Sprint 2018 12, Git loader

Oct 30 2018

vlorentz added a revision to T1219: add tests to git loader: D622: New tests for the Git loader..
Oct 30 2018, 3:47 PM · Sprint 2018 12, Git loader
vlorentz claimed T1219: add tests to git loader.
Oct 30 2018, 3:14 PM · Sprint 2018 12, Git loader

Oct 20 2018

ardumont edited P320 loader errors per loader type: ~/.config/swh/kibana/group-by.yml.
Oct 20 2018, 12:54 PM · Git loader, Mercurial loader, PyPI loader
ardumont added a comment to T1280: git origins: latest failure reports.

Heads up, I amended the description.
I did it that way to not destroy the analysis thread (adding a bloated comment in the middle of it seemed wrong to me).

Oct 20 2018, 8:43 AM · Git loader
ardumont updated the task description for T1280: git origins: latest failure reports.
Oct 20 2018, 8:38 AM · Git loader

Oct 19 2018

olasd added a comment to T1280: git origins: latest failure reports.

"dulwich.errors.ObjectFormatException: invalid literal for int() with base 10:" is https://github.com/dulwich/dulwich/pull/664

Oct 19 2018, 5:53 PM · Git loader
olasd claimed T1280: git origins: latest failure reports.

Thanks for the analysis!

Oct 19 2018, 4:08 PM · Git loader
ardumont triaged T1280: git origins: latest failure reports as Normal priority.
Oct 19 2018, 11:49 AM · Git loader
ardumont edited P320 loader errors per loader type: ~/.config/swh/kibana/group-by.yml.
Oct 19 2018, 10:20 AM · Git loader, Mercurial loader, PyPI loader
ardumont added projects to P320 loader errors per loader type: ~/.config/swh/kibana/group-by.yml: PyPI loader, Mercurial loader, Git loader.
Oct 19 2018, 9:54 AM · Git loader, Mercurial loader, PyPI loader

Oct 11 2018

olasd closed T1263: Git loader: created scheduler tasks always fail as Resolved by committing rDLDG85866507949a: Use explicit keyword argument for base_url in the load task.
Oct 11 2018, 3:42 PM · Git loader
anlambert triaged T1263: Git loader: created scheduler tasks always fail as High priority.
Oct 11 2018, 3:10 PM · Git loader

Oct 1 2018

zack triaged T1219: add tests to git loader as High priority.
Oct 1 2018, 7:40 PM · Sprint 2018 12, Git loader

Sep 27 2018

olasd merged task T1213: Git loader: allow ignoring the contents of the archive when updating an origin into T1212: git loader: add an option to force re-download/ingest everything.
Sep 27 2018, 11:40 AM · Git loader
olasd merged T1213: Git loader: allow ignoring the contents of the archive when updating an origin into T1212: git loader: add an option to force re-download/ingest everything.
Sep 27 2018, 11:40 AM · Git loader
olasd triaged T1213: Git loader: allow ignoring the contents of the archive when updating an origin as Normal priority.
Sep 27 2018, 11:39 AM · Git loader
zack triaged T1212: git loader: add an option to force re-download/ingest everything as Normal priority.
Sep 27 2018, 11:39 AM · Git loader

Sep 21 2018

olasd closed T1199: (Incremental) loading of large git repositories fails with an HTTP timeout as Resolved.

That's deployed now.

Sep 21 2018, 2:37 PM · Git loader

Sep 20 2018

olasd added a comment to T1199: (Incremental) loading of large git repositories fails with an HTTP timeout.

Turns out that eadefcb15384ac0c68f3ba664f9607e1f588257d already truncates the history to a depth of 0 *cough*. It just needs deploying.

Sep 20 2018, 5:21 PM · Git loader

Sep 17 2018

olasd claimed T1199: (Incremental) loading of large git repositories fails with an HTTP timeout.
Sep 17 2018, 1:58 PM · Git loader
olasd triaged T1199: (Incremental) loading of large git repositories fails with an HTTP timeout as High priority.
Sep 17 2018, 1:58 PM · Git loader

Sep 14 2018

olasd added a comment to T1195: git loader: fail to ingest our own hello world repository.

Tracked down the dulwich bug. Pull request: https://github.com/dulwich/dulwich/pull/659

Sep 14 2018, 6:57 PM · Git loader
olasd added a comment to T1195: git loader: fail to ingest our own hello world repository.

Thanks to

export GIT_TRACE_PACKET=1
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1

I've compared traces from git cloning on our forge and on other repositories with actual git clone.
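
For scripted comparisons, the same three variables can be set from Python before invoking git. This is only a sketch: the helper name `git_trace_env` is hypothetical, and the clone command in the comment uses a placeholder URL.

```python
import os
from typing import Mapping, Optional


def git_trace_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with git tracing enabled."""
    env = dict(os.environ if base is None else base)
    env.update(
        {
            "GIT_TRACE_PACKET": "1",  # trace the pack protocol exchange
            "GIT_TRACE": "1",         # general git tracing
            "GIT_CURL_VERBOSE": "1",  # verbose HTTP transport output
        }
    )
    return env


# Possible usage (not run here, needs git and network access):
#   import subprocess
#   subprocess.run(["git", "clone", "<repository-url>"], env=git_trace_env())
print(sorted(k for k in git_trace_env({}) if k.startswith("GIT_")))
```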

Sep 14 2018, 5:29 PM · Git loader
zack triaged T1195: git loader: fail to ingest our own hello world repository as Normal priority.
Sep 14 2018, 5:01 PM · Git loader

Jun 26 2018

olasd added a comment to T917: Git loader: update README for YAML-based syntax.
---
content_size_limit: 1000000
save_data: True
save_data_path: /home/ndandrim/.cache/swh/packfiles
storage:
  cls: remote
  args:
    url: http://localhost:5002/
Jun 26 2018, 4:55 PM · Git loader, Documentation

Jun 13 2018

moranegg added a parent task for T17: handle github assets in git loader: T1102: Handle all GitHub elements.
Jun 13 2018, 4:24 PM · Git loader
moranegg added a parent task for T1101: fetch release note from github to keep in release_metadata table: T1102: Handle all GitHub elements.
Jun 13 2018, 4:24 PM · Git loader
moranegg removed a subtask for T17: handle github assets in git loader: T1101: fetch release note from github to keep in release_metadata table.
Jun 13 2018, 4:18 PM · Git loader
moranegg removed a parent task for T1101: fetch release note from github to keep in release_metadata table: T17: handle github assets in git loader.
Jun 13 2018, 4:18 PM · Git loader
moranegg renamed T1101: fetch release note from github to keep in release_metadata table from fetch release note from github to keep with release object to fetch release note from github to keep in release_metadata table.
Jun 13 2018, 4:07 PM · Git loader
olasd added a comment to T1101: fetch release note from github to keep in release_metadata table.

In my opinion, the release notes shouldn't be stored in the release objects, for the following reasons:

  • they depend on the origin (a clone of this exact same repository on another git hosting platform won't have that information)
  • they aren't part of the data used to compute the release identifier
  • they can be modified after the fact
Jun 13 2018, 4:04 PM · Git loader
moranegg updated the task description for T1101: fetch release note from github to keep in release_metadata table.
Jun 13 2018, 4:00 PM · Git loader
moranegg triaged T1101: fetch release note from github to keep in release_metadata table as Low priority.
Jun 13 2018, 3:59 PM · Git loader

Mar 16 2018

olasd triaged T996: Load git origins with missing revisions again as High priority.
Mar 16 2018, 5:55 PM · Git loader

Mar 14 2018

ardumont created P233 ~/.config/swh/loader/git-updater.yml.
Mar 14 2018, 10:12 AM · Git loader

Mar 2 2018

rdicosmo updated the task description for T980: Identify and fix releases that are stored as revisions.
Mar 2 2018, 5:19 PM · Archive content
rdicosmo created T980: Identify and fix releases that are stored as revisions.
Mar 2 2018, 5:17 PM · Archive content

Jan 11 2018

zack added a comment to T917: Git loader: update README for YAML-based syntax.
In T917#16911, @olasd wrote:

Moving the documentation about the config to a top-level configuration document, in the docs directory, sounds like a very good plan. You assigned yourself the ticket, will you do it?

Jan 11 2018, 8:25 AM · Git loader, Documentation

Jan 10 2018

olasd added a comment to T917: Git loader: update README for YAML-based syntax.

Moving the documentation about the config to a top-level configuration document, in the docs directory, sounds like a very good plan. You assigned yourself the ticket, will you do it?

Jan 10 2018, 4:46 PM · Git loader, Documentation

Jan 6 2018

zack renamed T917: Git loader: update README for YAML-based syntax from Git loader: update README for YAML-based syntax. to Git loader: update README for YAML-based syntax.
Jan 6 2018, 8:50 AM · Git loader, Documentation
zack created T917: Git loader: update README for YAML-based syntax.
Jan 6 2018, 8:50 AM · Git loader, Documentation

Dec 15 2017

ardumont closed T816: Gitorious import: loose object parsing error with corrupted file as empty one as Resolved.
Dec 15 2017, 7:43 PM · Git loader, Origin-Gitorious
ardumont added a comment to T816: Gitorious import: loose object parsing error with corrupted file as empty one.

Fixed with the latest packaged version:

Dec 15 2017, 7:43 PM · Git loader, Origin-Gitorious
ardumont added a comment to T816: Gitorious import: loose object parsing error with corrupted file as empty one.

Packaged it and pushed to our own repository.

Dec 15 2017, 7:37 PM · Git loader, Origin-Gitorious
ardumont added a comment to T816: Gitorious import: loose object parsing error with corrupted file as empty one.

Update on this:

  • Issue opened.
  • Pull Request (PR) proposed and merged.
Dec 15 2017, 1:43 PM · Git loader, Origin-Gitorious

Nov 13 2017

ardumont closed T823: Gitorious import: Overflow error in revision time as Resolved by committing rDLDG120f23dd0bf2: swh.loader.git.disk: Force further checks on objects.
Nov 13 2017, 6:40 PM · Origin-Gitorious, Storage manager, Git loader
ardumont added a comment to T816: Gitorious import: loose object parsing error with corrupted file as empty one.

This error slipped under my radar last week.
I opened a related issue in dulwich since it should be handled upstream.

Nov 13 2017, 2:53 PM · Git loader, Origin-Gitorious

Nov 10 2017

ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

PR got merged \m/

Nov 10 2017, 6:29 PM · Origin-Gitorious, Storage manager, Git loader

Nov 7 2017

ardumont closed T822: Gitorious import: ObjectFormatException raised when badly formatted object (around date?) as Resolved by committing rDLDGfece2335e246: swh.loader.git.loader: Warn when object malformed and continue.
Nov 7 2017, 6:22 PM · Origin-Gitorious, Git loader
ardumont closed T819: Gitorious import: ObjectFormatException raised when badly formatted tag object exists in the repository as Resolved by committing rDLDGfece2335e246: swh.loader.git.loader: Warn when object malformed and continue.
Nov 7 2017, 6:22 PM · Origin-Gitorious, Git loader
ardumont closed T814: Gitorious import: unexisting object retrieval makes the loading fail as Resolved by committing rDLDG5e2d236b6a3f: swh.loader.git.loader: Trap missing object id and continue.
Nov 7 2017, 6:22 PM · Origin-Gitorious, Git loader

Nov 4 2017

ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

PR got merged \m/

Nov 4 2017, 12:58 PM · Origin-Gitorious, Storage manager, Git loader

Oct 31 2017

ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

Follow up on this:

Oct 31 2017, 3:38 PM · Origin-Gitorious, Storage manager, Git loader

Oct 27 2017

ardumont triaged T823: Gitorious import: Overflow error in revision time as Normal priority.
Oct 27 2017, 2:26 PM · Origin-Gitorious, Storage manager, Git loader
ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

The revision in question is:

Oct 27 2017, 2:18 PM · Origin-Gitorious, Storage manager, Git loader
ardumont added a comment to T823: Gitorious import: Overflow error in revision time.

Debugging some more, the date generating this error is the following, which indeed raises the initial overflow error:

Oct 27 2017, 2:10 PM · Origin-Gitorious, Storage manager, Git loader
ardumont created T823: Gitorious import: Overflow error in revision time.
Oct 27 2017, 2:09 PM · Origin-Gitorious, Storage manager, Git loader
ardumont added a comment to T814: Gitorious import: unexisting object retrieval makes the loading fail.

Possibly related error.

Oct 27 2017, 1:36 PM · Origin-Gitorious, Git loader
ardumont updated the task description for T822: Gitorious import: ObjectFormatException raised when badly formatted object (around date?).
Oct 27 2017, 12:25 PM · Origin-Gitorious, Git loader
ardumont added a comment to T822: Gitorious import: ObjectFormatException raised when badly formatted object (around date?).

Debugging the problematic object shows 1e82c9224b8898672b3b6fe8b6b737f7eed24cf6, which git fsck references as well.
Turns out it's a badly formatted tag:

Oct 27 2017, 11:52 AM · Origin-Gitorious, Git loader
ardumont added a parent task for T822: Gitorious import: ObjectFormatException raised when badly formatted object (around date?): T674: Gitorious import: Examine ingestion logs for errors and list them if any.
Oct 27 2017, 11:00 AM · Origin-Gitorious, Git loader
ardumont renamed T822: Gitorious import: ObjectFormatException raised when badly formatted object (around date?) from Gitorious import: ObjectFormatException raised when badly formatted object to Gitorious import: ObjectFormatException raised when badly formatted object (around date?).
Oct 27 2017, 10:59 AM · Origin-Gitorious, Git loader
ardumont created T822: Gitorious import: ObjectFormatException raised when badly formatted object (around date?).
Oct 27 2017, 10:59 AM · Origin-Gitorious, Git loader

Oct 26 2017

ardumont renamed T816: Gitorious import: loose object parsing error with corrupted file as empty one from Gitorious import: loose object parsing error with the empty file to Gitorious import: loose object parsing error with corrupted file as empty one.
Oct 26 2017, 4:14 PM · Git loader, Origin-Gitorious
ardumont added a parent task for T819: Gitorious import: ObjectFormatException raised when badly formatted tag object exists in the repository: T674: Gitorious import: Examine ingestion logs for errors and list them if any.
Oct 26 2017, 4:07 PM · Origin-Gitorious, Git loader
ardumont renamed T819: Gitorious import: ObjectFormatException raised when badly formatted tag object exists in the repository from Gitorious import: ObjectFormatException on what looks like a date field to Gitorious import: ObjectFormatException raised when badly formatted tag object exists in the repository.
Oct 26 2017, 4:07 PM · Origin-Gitorious, Git loader
ardumont added a comment to T819: Gitorious import: ObjectFormatException raised when badly formatted tag object exists in the repository.

Patching the version to print the identifier in error, I retrieve the following object: ae51106031a0bb39a8def57a8592f70116487eab (which is among the badly formatted tags listed by git fsck below).

Oct 26 2017, 4:02 PM · Origin-Gitorious, Git loader
ardumont updated the task description for T815: Gitorious import: Release time conversion issue when no release date is provided.
Oct 26 2017, 3:48 PM · Origin-Gitorious, Git loader
ardumont updated the task description for T816: Gitorious import: loose object parsing error with corrupted file as empty one.
Oct 26 2017, 3:48 PM · Git loader, Origin-Gitorious
ardumont renamed T819: Gitorious import: ObjectFormatException raised when badly formatted tag object exists in the repository from Gitorious import: to Gitorious import: ObjectFormatException on what looks like a date field.
Oct 26 2017, 3:48 PM · Origin-Gitorious, Git loader
ardumont updated the task description for T814: Gitorious import: unexisting object retrieval makes the loading fail.
Oct 26 2017, 3:45 PM · Origin-Gitorious, Git loader
ardumont closed T815: Gitorious import: Release time conversion issue when no release date is provided as Resolved by committing rDLDG2c91a6feb6f0: converters: Fix release time conversion issue when no date provided.
Oct 26 2017, 1:11 PM · Origin-Gitorious, Git loader
ardumont renamed T815: Gitorious import: Release time conversion issue when no release date is provided from Gitorious import: Release time conversion issue when none is provided to Gitorious import: Release time conversion issue when no release date is provided.
Oct 26 2017, 1:09 PM · Origin-Gitorious, Git loader
ardumont renamed T816: Gitorious import: loose object parsing error with corrupted file as empty one from Gitorious import: loose object parsing error to Gitorious import: loose object parsing error with the empty file.
Oct 26 2017, 11:55 AM · Git loader, Origin-Gitorious
ardumont renamed T815: Gitorious import: Release time conversion issue when no release date is provided from Gitorious import: Time conversion issue to Gitorious import: Release time conversion issue when none is provided.
Oct 26 2017, 11:53 AM · Origin-Gitorious, Git loader
ardumont added a comment to T815: Gitorious import: Release time conversion issue when no release date is provided.

In that particular repository, the tag has no time (tag.tag_time and tag.tag_timezone are None, tag._tag_timezone_neg_utc is False - those are the default values for that object).
But swh-loader-git's code expects those values to exist.
In our model though, we are ok with that date not being provided.
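
A minimal sketch of a conversion that tolerates a missing tag date. `FakeTag` is a hypothetical stand-in that only mirrors the three attributes named above, and `release_date`, the returned dictionary layout, and the assumption that the timezone is given in seconds are illustrative choices, not the actual swh-loader-git converter.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeTag:
    """Hypothetical stand-in for a parsed tag; defaults mirror a tag with no date."""
    tag_time: Optional[int] = None
    tag_timezone: Optional[int] = None  # assumed to be an offset in seconds
    _tag_timezone_neg_utc: bool = False


def release_date(tag: FakeTag) -> Optional[dict]:
    """Return a date dictionary, or None when the tag carries no date."""
    if tag.tag_time is None or tag.tag_timezone is None:
        # The model accepts a release without a date, so do not fail here.
        return None
    return {
        "timestamp": tag.tag_time,
        "offset_minutes": tag.tag_timezone // 60,
        "negative_utc": tag._tag_timezone_neg_utc,
    }


print(release_date(FakeTag()))
print(release_date(FakeTag(tag_time=0, tag_timezone=3600)))
```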

Oct 26 2017, 11:53 AM · Origin-Gitorious, Git loader
ardumont added a comment to T816: Gitorious import: loose object parsing error with corrupted file as empty one.

Tweaking the git loader to print the actual sha1:

Oct 26 2017, 11:36 AM · Git loader, Origin-Gitorious
ardumont updated the task description for T816: Gitorious import: loose object parsing error with corrupted file as empty one.
Oct 26 2017, 11:21 AM · Git loader, Origin-Gitorious
ardumont added a parent task for T816: Gitorious import: loose object parsing error with corrupted file as empty one: T674: Gitorious import: Examine ingestion logs for errors and list them if any.
Oct 26 2017, 11:20 AM · Git loader, Origin-Gitorious