
Git loader
Active · Public

Recent Activity

Tue, Dec 4

vlorentz added a parent task for T1219: add tests to git loader: T1411: 80% SLOC coverage.
Tue, Dec 4, 11:27 AM · Git loader

Tue, Nov 27

zack placed T917: Git loader: update README for YAML-based syntax up for grabs.
Tue, Nov 27, 12:17 PM · Git loader, Development documentation
zack added a parent task for T917: Git loader: update README for YAML-based syntax: T1388: Document the configuration system of each component.
Tue, Nov 27, 12:17 PM · Git loader, Development documentation

Fri, Nov 16

vlorentz added a revision to T1219: add tests to git loader: D665: Run git loader tests on BulkUpdater too..
Fri, Nov 16, 12:29 PM · Git loader

Wed, Nov 14

anlambert added a comment to T1280: git origins: latest failure reports.

Errors of type ValueError: year is out of range [1] are caused by commit dates that cannot be represented by Python's standard datetime.datetime object (MINYEAR = 1, MAXYEAR = 9999).
See for instance:

Wed, Nov 14, 5:13 PM · Git loader
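
The ValueError described in the comment above can be illustrated with a minimal sketch. It assumes the commit date arrives as a Unix timestamp in seconds; the helper name commit_date_or_none is hypothetical and is not part of the git loader.

```python
import datetime


def commit_date_or_none(timestamp):
    """Return a UTC datetime for a Unix timestamp (seconds), or None when the
    value falls outside what datetime.datetime can represent.

    Hypothetical helper, for illustration only.
    """
    try:
        return datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
    except (ValueError, OverflowError, OSError):
        # ValueError: "year ... is out of range"; OverflowError and OSError
        # cover values the platform's time functions refuse to convert.
        return None


# datetime only supports years in [MINYEAR, MAXYEAR].
print(datetime.MINYEAR, datetime.MAXYEAR)
print(commit_date_or_none(0))
print(commit_date_or_none(10**12))  # a date tens of thousands of years ahead
```

Returning None is only one possible policy; a caller could equally keep the raw timestamp and skip the conversion.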
anlambert added a comment to T1280: git origins: latest failure reports.

The errors of type psycopg2.IntegrityError: duplicate key value violates unique constraint "content_pkey" [1] are all caused by sha1 collisions, mainly from repositories testing the attack uncovered by SHAttered [2].

Wed, Nov 14, 3:47 PM · Git loader
anlambert updated the task description for T1339: Handle malformed author and committer dates.
Wed, Nov 14, 3:39 PM · Storage manager, Git loader
anlambert triaged T1342: Handle annotated tag with no tagger as Normal priority.
Wed, Nov 14, 3:38 PM · Git loader
anlambert added a comment to T1339: Handle malformed author and committer dates.

Indeed, you're right: the timezone offset is used to compute a revision identifier, so even if its value is incorrect it should be stored anyway.

Wed, Nov 14, 1:25 PM · Storage manager, Git loader
zack added a comment to T1339: Handle malformed author and committer dates.

The simplest solution would be to check whether the computed timezone offset lies within the valid bounds [UTC−14:00, UTC+14:00] and set it to 0 if not.
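
A minimal sketch of that bounds check, assuming the offset is expressed in minutes relative to UTC. The names MAX_OFFSET_MINUTES and sanitize_offset are hypothetical and do not come from the loader's code.

```python
# Assumption: offsets are integer minutes east of UTC.
MAX_OFFSET_MINUTES = 14 * 60  # UTC+14:00; the lower bound is the negation


def sanitize_offset(offset_minutes):
    """Return the offset unchanged when it lies in [UTC-14:00, UTC+14:00],
    otherwise fall back to 0 (UTC). Hypothetical helper."""
    if -MAX_OFFSET_MINUTES <= offset_minutes <= MAX_OFFSET_MINUTES:
        return offset_minutes
    return 0


print(sanitize_offset(120))    # within bounds, kept as is
print(sanitize_offset(-840))   # exactly UTC-14:00, still valid
print(sanitize_offset(1500))   # out of bounds, replaced by 0
```

Resetting the offset changes the stored value, which matters if the offset takes part in computing an identifier, so this sketch only shows the check itself.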

Wed, Nov 14, 12:03 PM · Storage manager, Git loader
anlambert triaged T1339: Handle malformed author and committer dates as Normal priority.
Wed, Nov 14, 11:57 AM · Storage manager, Git loader

Nov 9 2018

vlorentz reopened T1219: add tests to git loader as "Open".
Nov 9 2018, 5:49 PM · Git loader
zack added a comment to T1219: add tests to git loader.

Resolved by D622.

Nov 9 2018, 1:49 PM · Git loader
vlorentz closed T1219: add tests to git loader as Resolved.

Resolved by D622.

Nov 9 2018, 1:45 PM · Git loader

Oct 30 2018

vlorentz added a revision to T1219: add tests to git loader: D622: New tests for the Git loader..
Oct 30 2018, 3:47 PM · Git loader
vlorentz claimed T1219: add tests to git loader.
Oct 30 2018, 3:14 PM · Git loader

Oct 20 2018

ardumont edited P320 loader errors per loader type: ~/.config/swh/kibana/group-by.yml.
Oct 20 2018, 12:54 PM · Git loader, Mercurial loader, PyPI loader
ardumont added a comment to T1280: git origins: latest failure reports.

Heads up, I amended the description.
I did it that way so as not to destroy the analysis thread (adding a bloated comment in the middle of it seemed wrong to me).

Oct 20 2018, 8:43 AM · Git loader
ardumont updated the task description for T1280: git origins: latest failure reports.
Oct 20 2018, 8:38 AM · Git loader

Oct 19 2018

olasd added a comment to T1280: git origins: latest failure reports.

"dulwich.errors.ObjectFormatException: invalid literal for int() with base 10:" is https://github.com/dulwich/dulwich/pull/664

Oct 19 2018, 5:53 PM · Git loader
olasd claimed T1280: git origins: latest failure reports.

Thanks for the analysis!

Oct 19 2018, 4:08 PM · Git loader
ardumont triaged T1280: git origins: latest failure reports as Normal priority.
Oct 19 2018, 11:49 AM · Git loader
ardumont edited P320 loader errors per loader type: ~/.config/swh/kibana/group-by.yml.
Oct 19 2018, 10:20 AM · Git loader, Mercurial loader, PyPI loader
ardumont added projects to P320 loader errors per loader type: ~/.config/swh/kibana/group-by.yml: PyPI loader, Mercurial loader, Git loader.
Oct 19 2018, 9:54 AM · Git loader, Mercurial loader, PyPI loader

Oct 11 2018

olasd closed T1263: Git loader: created scheduler tasks always fail as Resolved by committing rDLDG85866507949a: Use explicit keyword argument for base_url in the load task.
Oct 11 2018, 3:42 PM · Git loader
anlambert triaged T1263: Git loader: created scheduler tasks always fail as High priority.
Oct 11 2018, 3:10 PM · Git loader

Oct 1 2018

zack triaged T1219: add tests to git loader as High priority.
Oct 1 2018, 7:40 PM · Git loader

Sep 27 2018

olasd merged task T1213: Git loader: allow ignoring the contents of the archive when updating an origin into T1212: git loader: add an option to force re-download/ingest everything.
Sep 27 2018, 11:40 AM · Git loader
olasd merged T1213: Git loader: allow ignoring the contents of the archive when updating an origin into T1212: git loader: add an option to force re-download/ingest everything.
Sep 27 2018, 11:40 AM · Git loader
olasd triaged T1213: Git loader: allow ignoring the contents of the archive when updating an origin as Normal priority.
Sep 27 2018, 11:39 AM · Git loader
zack triaged T1212: git loader: add an option to force re-download/ingest everything as Normal priority.
Sep 27 2018, 11:39 AM · Git loader

Sep 21 2018

olasd closed T1199: (Incremental) loading of large git repositories fails with an HTTP timeout as Resolved.

That's deployed now.

Sep 21 2018, 2:37 PM · Git loader

Sep 20 2018

olasd added a comment to T1199: (Incremental) loading of large git repositories fails with an HTTP timeout.

Turns out that eadefcb15384ac0c68f3ba664f9607e1f588257d already truncates the history to a depth of 0 *cough*. It just needs deploying.

Sep 20 2018, 5:21 PM · Git loader

Sep 17 2018

olasd claimed T1199: (Incremental) loading of large git repositories fails with an HTTP timeout.
Sep 17 2018, 1:58 PM · Git loader
olasd triaged T1199: (Incremental) loading of large git repositories fails with an HTTP timeout as High priority.
Sep 17 2018, 1:58 PM · Git loader

Sep 14 2018

olasd added a comment to T1195: git loader: fail to ingest our own hello world repository.

Tracked down the dulwich bug. Pull request: https://github.com/dulwich/dulwich/pull/659

Sep 14 2018, 6:57 PM · Git loader
olasd added a comment to T1195: git loader: fail to ingest our own hello world repository.

Thanks to

export GIT_TRACE_PACKET=1
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1

I've compared traces from git cloning on our forge and on other repositories with actual git clone.
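
When the clone is driven from a script rather than a shell, the same three variables can be passed through the process environment. The sketch below assumes that approach; git_trace_env is a hypothetical helper, and the clone command in the trailing comment uses placeholder arguments.

```python
import os

# The three tracing variables listed above.
TRACE_VARS = {
    "GIT_TRACE_PACKET": "1",
    "GIT_TRACE": "1",
    "GIT_CURL_VERBOSE": "1",
}


def git_trace_env(base_env=None):
    """Return a copy of the environment with git tracing switched on.

    Hypothetical helper: base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(TRACE_VARS)
    return env


# Usage sketch (url and dest are placeholders):
#   import subprocess
#   subprocess.run(["git", "clone", url, dest], env=git_trace_env(), check=True)
```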

Sep 14 2018, 5:29 PM · Git loader
zack triaged T1195: git loader: fail to ingest our own hello world repository as Normal priority.
Sep 14 2018, 5:01 PM · Git loader

Jun 26 2018

olasd added a comment to T917: Git loader: update README for YAML-based syntax.
---
content_size_limit: 1000000
save_data: True
save_data_path: /home/ndandrim/.cache/swh/packfiles
storage:
  cls: remote
  args:
    url: http://localhost:5002/
Jun 26 2018, 4:55 PM · Git loader, Development documentation

Jun 13 2018

moranegg added a parent task for T17: handle github assets in git loader: T1102: Handle GitHub elements.
Jun 13 2018, 4:24 PM · Git loader
moranegg added a parent task for T1101: fetch release note from github to keep in release_metadata table: T1102: Handle GitHub elements.
Jun 13 2018, 4:24 PM · Git loader
moranegg removed a subtask for T17: handle github assets in git loader: T1101: fetch release note from github to keep in release_metadata table.
Jun 13 2018, 4:18 PM · Git loader
moranegg removed a parent task for T1101: fetch release note from github to keep in release_metadata table: T17: handle github assets in git loader.
Jun 13 2018, 4:18 PM · Git loader
moranegg renamed T1101: fetch release note from github to keep in release_metadata table from fetch release note from github to keep with release object to fetch release note from github to keep in release_metadata table.
Jun 13 2018, 4:07 PM · Git loader
olasd added a comment to T1101: fetch release note from github to keep in release_metadata table.

In my opinion, the release notes shouldn't be stored in the release objects, for the following reasons:

  • they are dependent on the origin (a clone of this exact same repository on another git hosting platform won't have that information)
  • they aren't part of the data used to compute the release identifier
  • they can be modified after the fact
Jun 13 2018, 4:04 PM · Git loader
moranegg updated the task description for T1101: fetch release note from github to keep in release_metadata table.
Jun 13 2018, 4:00 PM · Git loader
moranegg triaged T1101: fetch release note from github to keep in release_metadata table as Low priority.
Jun 13 2018, 3:59 PM · Git loader

Mar 16 2018

olasd triaged T996: Load git origins with missing revisions again as High priority.
Mar 16 2018, 5:55 PM · Git loader

Mar 14 2018

ardumont created P233 ~/.config/swh/loader/git-updater.yml.
Mar 14 2018, 10:12 AM · Git loader

Mar 2 2018

rdicosmo updated the task description for T980: Identify and fix releases that are stored as revisions.
Mar 2 2018, 5:19 PM · Archive content