
Git loader
Active · Public

Members

  • This project does not have any members.

Watchers

  • This project does not have any watchers.

Recent Activity

Feb 5 2019

olasd added a comment to T1514: MemoryError in loader-git.

That's a fairly large repo (as seen with how the content bundles get spread out to limit their size). It looks like it has some large directories (e.g. the .bugs directory looks like it has a lot of entries) so I'm not too surprised.

Feb 5 2019, 3:40 PM · Git loader
douardda triaged T1514: MemoryError in loader-git as Normal priority.
Feb 5 2019, 9:47 AM · Git loader

Jan 21 2019

anlambert added a comment to T1280: git origins: latest failure reports.

Errors of type dulwich.errors.NotGitRepository [1] are likely caused by a dulwich bug: redirected repository URLs are not handled correctly.
A pull request [2] has been submitted to fix the issue.

Jan 21 2019, 5:53 PM · Git loader

Dec 17 2018

ardumont raised the priority of T1219: add tests to git loader from High to Needs Triage.
Dec 17 2018, 1:56 PM · Sprint 2018 12, Git loader
ardumont moved T1219: add tests to git loader from in progress to done on the Sprint 2018 12 board.
Dec 17 2018, 1:56 PM · Sprint 2018 12, Git loader
ardumont closed T1219: add tests to git loader as Resolved.

Up to 85% now.

Dec 17 2018, 1:55 PM · Sprint 2018 12, Git loader
ardumont moved T1219: add tests to git loader from Backlog to in progress on the Sprint 2018 12 board.
Dec 17 2018, 12:02 PM · Sprint 2018 12, Git loader
ardumont added a project to T1219: add tests to git loader: Sprint 2018 12.
Dec 17 2018, 12:01 PM · Sprint 2018 12, Git loader
ardumont changed the status of T1219: add tests to git loader from Open to Work in Progress.
Dec 17 2018, 12:01 PM · Sprint 2018 12, Git loader

Dec 4 2018

vlorentz added a parent task for T1219: add tests to git loader: T1411: at least 80% SLOC coverage in all components.
Dec 4 2018, 11:27 AM · Sprint 2018 12, Git loader

Nov 27 2018

zack placed T917: Git loader: update README for YAML-based syntax up for grabs.
Nov 27 2018, 12:17 PM · Git loader, Development documentation
zack added a parent task for T917: Git loader: update README for YAML-based syntax: T1388: Document the configuration system of each component.
Nov 27 2018, 12:17 PM · Git loader, Development documentation

Nov 16 2018

vlorentz added a revision to T1219: add tests to git loader: D665: Run git loader tests on BulkUpdater too..
Nov 16 2018, 12:29 PM · Sprint 2018 12, Git loader

Nov 14 2018

anlambert added a comment to T1280: git origins: latest failure reports.

Errors of type ValueError: year is out of range [1] are caused by commit dates that cannot be represented by a standard Python datetime.datetime object (datetime.MINYEAR = 1, datetime.MAXYEAR = 9999).
See for instance:
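
Independently of the examples linked in the task, the limit itself can be checked with a minimal sketch. The helper name `representable` is hypothetical and is not taken from the loader's code; it only shows which years the standard library accepts.

```python
import datetime


def representable(year: int) -> bool:
    """Return True if `year` fits in a datetime.datetime object.

    Hypothetical helper for illustration only; not the loader's code.
    """
    try:
        datetime.datetime(year, 1, 1)
    except ValueError:
        # Raised when the year falls outside [MINYEAR, MAXYEAR].
        return False
    return True


if __name__ == "__main__":
    print(datetime.MINYEAR, datetime.MAXYEAR)
    for year in (0, 1, 2018, 9999, 10000):
        print(year, representable(year))
```

A commit whose date falls outside that range would need to be handled before a datetime.datetime object is built from it.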

Nov 14 2018, 5:13 PM · Git loader
anlambert added a comment to T1280: git origins: latest failure reports.

All errors of type psycopg2.IntegrityError: duplicate key value violates unique constraint "content_pkey" [1] are caused by SHA-1 collisions, mainly from repositories testing the attack demonstrated by SHAttered [2].

Nov 14 2018, 3:47 PM · Git loader
anlambert updated the task description for T1339: Handle malformed author and committer dates.
Nov 14 2018, 3:39 PM · Storage manager, Git loader
anlambert triaged T1342: Handle annotated tag with no tagger as Normal priority.
Nov 14 2018, 3:38 PM · Git loader
anlambert added a comment to T1339: Handle malformed author and committer dates.

Indeed, you're right: the timezone offset is used to compute a revision identifier, so even if its value is incorrect it should be stored anyway.

Nov 14 2018, 1:25 PM · Storage manager, Git loader
zack added a comment to T1339: Handle malformed author and committer dates.

The simplest solution would be to check if the computed timezone offset lies in the adequate bounds [UTC−14:00, UTC+14:00] and set it to 0 if not.
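
The bounds check proposed here could look like the sketch below. It assumes the offset is available as a signed number of minutes; the function and constant names are hypothetical. The follow-up comment in the same task notes that the offset feeds into the revision identifier, so the original value would still have to be stored somewhere rather than discarded.

```python
# Bounds from the comment above: [UTC-14:00, UTC+14:00], in minutes.
MIN_OFFSET_MINUTES = -14 * 60
MAX_OFFSET_MINUTES = 14 * 60


def normalize_offset(offset_minutes: int) -> int:
    """Return the offset unchanged when it is within bounds, else 0.

    Hypothetical helper illustrating the proposal; not the loader's code.
    """
    if MIN_OFFSET_MINUTES <= offset_minutes <= MAX_OFFSET_MINUTES:
        return offset_minutes
    return 0


if __name__ == "__main__":
    for offset in (120, -840, 841, -100000):
        print(offset, normalize_offset(offset))
```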

Nov 14 2018, 12:03 PM · Storage manager, Git loader
anlambert triaged T1339: Handle malformed author and committer dates as Normal priority.
Nov 14 2018, 11:57 AM · Storage manager, Git loader

Nov 9 2018

vlorentz reopened T1219: add tests to git loader as "Open".
Nov 9 2018, 5:49 PM · Sprint 2018 12, Git loader
zack added a comment to T1219: add tests to git loader.

Resolved by D622.

Nov 9 2018, 1:49 PM · Sprint 2018 12, Git loader
vlorentz closed T1219: add tests to git loader as Resolved.

Resolved by D622.

Nov 9 2018, 1:45 PM · Sprint 2018 12, Git loader

Oct 30 2018

vlorentz added a revision to T1219: add tests to git loader: D622: New tests for the Git loader..
Oct 30 2018, 3:47 PM · Sprint 2018 12, Git loader
vlorentz claimed T1219: add tests to git loader.
Oct 30 2018, 3:14 PM · Sprint 2018 12, Git loader

Oct 20 2018

ardumont edited P320 loader errors per loader type: ~/.config/swh/kibana/group-by.yml.
Oct 20 2018, 12:54 PM · Git loader, Mercurial loader, PyPI loader
ardumont added a comment to T1280: git origins: latest failure reports.

Heads up, I amended the description.
I did it that way to not destroy the analysis thread (adding a bloated comment in the middle of it seemed wrong to me).

Oct 20 2018, 8:43 AM · Git loader
ardumont updated the task description for T1280: git origins: latest failure reports.
Oct 20 2018, 8:38 AM · Git loader

Oct 19 2018

olasd added a comment to T1280: git origins: latest failure reports.

"dulwich.errors.ObjectFormatException: invalid literal for int() with base 10:" is https://github.com/dulwich/dulwich/pull/664

Oct 19 2018, 5:53 PM · Git loader
olasd claimed T1280: git origins: latest failure reports.

Thanks for the analysis!

Oct 19 2018, 4:08 PM · Git loader
ardumont triaged T1280: git origins: latest failure reports as Normal priority.
Oct 19 2018, 11:49 AM · Git loader
ardumont edited P320 loader errors per loader type: ~/.config/swh/kibana/group-by.yml.
Oct 19 2018, 10:20 AM · Git loader, Mercurial loader, PyPI loader
ardumont added projects to P320 loader errors per loader type: ~/.config/swh/kibana/group-by.yml: PyPI loader, Mercurial loader, Git loader.
Oct 19 2018, 9:54 AM · Git loader, Mercurial loader, PyPI loader

Oct 11 2018

olasd closed T1263: Git loader: created scheduler tasks always fail as Resolved by committing rDLDG85866507949a: Use explicit keyword argument for base_url in the load task.
Oct 11 2018, 3:42 PM · Git loader
anlambert triaged T1263: Git loader: created scheduler tasks always fail as High priority.
Oct 11 2018, 3:10 PM · Git loader

Oct 1 2018

zack triaged T1219: add tests to git loader as High priority.
Oct 1 2018, 7:40 PM · Sprint 2018 12, Git loader

Sep 27 2018

olasd merged task T1213: Git loader: allow ignoring the contents of the archive when updating an origin into T1212: git loader: add an option to force re-download/ingest everything.
Sep 27 2018, 11:40 AM · Git loader
olasd triaged T1213: Git loader: allow ignoring the contents of the archive when updating an origin as Normal priority.
Sep 27 2018, 11:39 AM · Git loader
zack triaged T1212: git loader: add an option to force re-download/ingest everything as Normal priority.
Sep 27 2018, 11:39 AM · Git loader

Sep 21 2018

olasd closed T1199: (Incremental) loading of large git repositories fails with an HTTP timeout as Resolved.

That's deployed now.

Sep 21 2018, 2:37 PM · Git loader

Sep 20 2018

olasd added a comment to T1199: (Incremental) loading of large git repositories fails with an HTTP timeout.

Turns out that eadefcb15384ac0c68f3ba664f9607e1f588257d already truncates the history to a depth of 0 *cough*. It just needs deploying.

Sep 20 2018, 5:21 PM · Git loader

Sep 17 2018

olasd claimed T1199: (Incremental) loading of large git repositories fails with an HTTP timeout.
Sep 17 2018, 1:58 PM · Git loader
olasd triaged T1199: (Incremental) loading of large git repositories fails with an HTTP timeout as High priority.
Sep 17 2018, 1:58 PM · Git loader

Sep 14 2018

olasd added a comment to T1195: git loader: fail to ingest our own hello world repository.

Tracked down the dulwich bug. Pull request: https://github.com/dulwich/dulwich/pull/659

Sep 14 2018, 6:57 PM · Git loader
olasd added a comment to T1195: git loader: fail to ingest our own hello world repository.

Thanks to

export GIT_TRACE_PACKET=1
export GIT_TRACE=1
export GIT_CURL_VERBOSE=1

I've compared traces from git cloning on our forge and on other repositories with actual git clone.
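
The same tracing setup can be applied from a script. The sketch below sets the three variables listed above for a single `git clone` call and returns what git writes to stderr. The function names, URL argument, and destination argument are hypothetical, and running `traced_clone` assumes that git is installed and the repository is reachable.

```python
import os
import subprocess

# The three variables from the comment above.
TRACE_VARS = {
    "GIT_TRACE_PACKET": "1",
    "GIT_TRACE": "1",
    "GIT_CURL_VERBOSE": "1",
}


def git_trace_env(base=None):
    """Return a copy of the environment with git tracing enabled."""
    env = dict(os.environ if base is None else base)
    env.update(TRACE_VARS)
    return env


def traced_clone(url, dest):
    """Clone `url` into `dest` and return git's stderr output as text."""
    completed = subprocess.run(
        ["git", "clone", url, dest],
        env=git_trace_env(),
        stderr=subprocess.PIPE,
        text=True,
    )
    return completed.stderr
```

Saving the output of two such runs to files makes it possible to diff a working clone against a failing one.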

Sep 14 2018, 5:29 PM · Git loader
zack triaged T1195: git loader: fail to ingest our own hello world repository as Normal priority.
Sep 14 2018, 5:01 PM · Git loader

Jun 26 2018

olasd added a comment to T917: Git loader: update README for YAML-based syntax.

---
content_size_limit: 1000000
save_data: True
save_data_path: /home/ndandrim/.cache/swh/packfiles
storage:
  cls: remote
  args:
    url: http://localhost:5002/

Jun 26 2018, 4:55 PM · Git loader, Development documentation

Jun 13 2018

moranegg added a parent task for T17: handle github assets in git loader: T1102: Handle GitHub elements.
Jun 13 2018, 4:24 PM · Git loader
moranegg added a parent task for T1101: fetch release note from github to keep in release_metadata table: T1102: Handle GitHub elements.
Jun 13 2018, 4:24 PM · Git loader