That's a fairly large repo (judging by how the content bundles get split up to limit their size). It has some large directories (e.g. the .bugs directory seems to have a lot of entries), so I'm not too surprised.
Feb 5 2019
Jan 21 2019
Errors of type dulwich.errors.NotGitRepository [1] are likely caused by a bug in dulwich: redirected repository URLs are not handled correctly.
A pull request [2] has been submitted to fix that issue.
Dec 17 2018
Up to 85% now.
Dec 4 2018
Nov 27 2018
Nov 16 2018
Nov 14 2018
Errors of type ValueError: year is out of range [1] are caused by commit dates that cannot be represented by the standard Python datetime.datetime object (datetime.MINYEAR = 1, datetime.MAXYEAR = 9999).
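As a minimal sketch (the helper `commit_time_to_datetime` and the constants below are hypothetical names, not swh-loader-git's API), a loader could check a raw commit timestamp against the range `datetime.datetime` supports before converting it, instead of letting the conversion raise:

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as an aware datetime, used as the reference point.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Smallest and largest whole-second Unix timestamps that datetime can represent
# (0001-01-01T00:00:00Z and 9999-12-31T23:59:59Z).
MIN_TIMESTAMP = (datetime.min.replace(tzinfo=timezone.utc) - EPOCH) // timedelta(seconds=1)
MAX_TIMESTAMP = (datetime.max.replace(tzinfo=timezone.utc) - EPOCH) // timedelta(seconds=1)


def commit_time_to_datetime(timestamp):
    """Convert a Unix timestamp (seconds) to an aware UTC datetime.

    Hypothetical helper: returns None instead of raising when the year would
    fall outside [datetime.MINYEAR, datetime.MAXYEAR], so the caller can keep
    the raw integer timestamp around.
    """
    if not MIN_TIMESTAMP <= timestamp <= MAX_TIMESTAMP:
        return None
    return EPOCH + timedelta(seconds=timestamp)
```

For instance, `commit_time_to_datetime(10**18)` returns None rather than raising, while the raw value stays available for storage.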
See for instance:
All the errors of type psycopg2.IntegrityError: duplicate key value violates unique constraint "content_pkey" [1] are caused by sha1 collisions, mainly from repositories testing the attack uncovered by SHAttered [2].
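The failure mode can be illustrated with a hypothetical in-memory content table keyed on sha1 only (the names are illustrative, not the swh-storage schema): when a new blob's sha1 is already present but another of its hashes differs, it is a genuine sha1 collision rather than a re-import of the same content. Using sha256 as the secondary hash is an assumption of this sketch.

```python
import hashlib


def content_hashes(data):
    """Compute the two hashes used in this sketch for a blob."""
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def classify_insert(table, hashes):
    """Classify inserting `hashes` into `table`, a dict keyed by sha1 (hypothetical).

    Returns "new" (and stores the row), "duplicate" when the same content is
    already known, or "collision" when the sha1 matches an existing row but
    the sha256 differs, i.e. two distinct blobs share one sha1.
    """
    existing = table.get(hashes["sha1"])
    if existing is None:
        table[hashes["sha1"]] = hashes
        return "new"
    if existing["sha256"] == hashes["sha256"]:
        return "duplicate"
    return "collision"
```

A duplicate can simply be skipped, but a collision has no valid place in a table whose key is sha1 alone.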
Indeed, you're right: the timezone offset is used to compute the revision identifier, so even if its value is incorrect, it should be stored anyway.
The simplest solution would be to check whether the computed timezone offset lies within the valid bounds [UTC−14:00, UTC+14:00] and to set it to 0 if it does not.
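A minimal sketch of that bounds check, assuming offsets come from git's well-formed `±HHMM` field and are expressed in minutes (`parse_git_offset` and `sanitize_offset` are hypothetical names). Given the point above that the offset feeds into the revision identifier, such a check would only be safe on a derived value, with the raw offset kept as-is.

```python
# Valid UTC offsets range from UTC-14:00 to UTC+14:00, expressed here in minutes.
MIN_OFFSET_MINUTES = -14 * 60
MAX_OFFSET_MINUTES = 14 * 60


def parse_git_offset(raw):
    """Parse a git timezone field such as b"+0200" or b"-0130" into minutes.

    Assumes the field is a sign followed by at least two hour digits and two
    minute digits; bogus but well-shaped values (e.g. b"+5959") still parse.
    """
    sign = -1 if raw.startswith(b"-") else 1
    digits = raw.lstrip(b"+-")
    hours, minutes = int(digits[:-2]), int(digits[-2:])
    return sign * (hours * 60 + minutes)


def sanitize_offset(offset_minutes):
    """Return offset_minutes if it lies within [UTC-14:00, UTC+14:00], else 0."""
    if MIN_OFFSET_MINUTES <= offset_minutes <= MAX_OFFSET_MINUTES:
        return offset_minutes
    return 0
```

For example, `sanitize_offset(parse_git_offset(b"+5959"))` yields 0, while `b"+1400"` is kept as 840 minutes.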
Nov 9 2018
In T1219#24586, @vlorentz wrote: Resolved by D622.
Resolved by D622.
Oct 30 2018
Oct 20 2018
Heads up: I amended the description.
I did it that way so as not to break up the analysis thread (adding a bloated comment in the middle of it seemed wrong to me).
Oct 19 2018
"dulwich.errors.ObjectFormatException: invalid literal for int() with base 10:" is https://github.com/dulwich/dulwich/pull/664
Thanks for the analysis!
Oct 11 2018
Oct 1 2018
Sep 27 2018
Sep 21 2018
That's deployed now.
Sep 20 2018
Turns out that eadefcb15384ac0c68f3ba664f9607e1f588257d already truncates the history to a depth of 0 *cough*. It just needs deploying.
Sep 17 2018
Sep 14 2018
Tracked down the dulwich bug. Pull request: https://github.com/dulwich/dulwich/pull/659
Thanks to these environment variables:
    export GIT_TRACE_PACKET=1
    export GIT_TRACE=1
    export GIT_CURL_VERBOSE=1
I've compared traces of cloning repositories from our forge and from other hosts, using the actual git clone.
Jun 26 2018
    ---
    content_size_limit: 1000000
    save_data: True
    save_data_path: /home/ndandrim/.cache/swh/packfiles
    storage:
      cls: remote
      args:
        url: http://localhost:5002/
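As an illustration only, assuming `content_size_limit` is a maximum blob size in bytes above which the loader records a content as skipped instead of archiving its data (an assumption about the loader's behavior, not stated in this thread), the check could look like this; `partition_contents` is a hypothetical name.

```python
# Value from the configuration above; interpreting it as bytes is an assumption.
CONTENT_SIZE_LIMIT = 1000000


def partition_contents(blobs, limit=CONTENT_SIZE_LIMIT):
    """Split a {sha1: data} mapping into (kept, skipped) lists of sha1s by size.

    Blobs strictly larger than `limit` bytes are reported as skipped.
    """
    kept, skipped = [], []
    for sha1, data in blobs.items():
        (skipped if len(data) > limit else kept).append(sha1)
    return kept, skipped
```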
Jun 13 2018
In my opinion, the release notes shouldn't be stored in the release objects, for the following reasons:
- they depend on the origin (a clone of this exact same repository on another git hosting platform won't have that information)
- they aren't part of the data used to compute the release identifier
- they can be modified after the fact
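To make the second point concrete, here is a small sketch of how git derives an annotated tag's identifier: the sha1 of a `tag <length>` header, a NUL byte, and the tag's raw body. Only the fields in that body count; release notes kept by a hosting platform are not among them. The helper names are hypothetical, and the target id and tagger line used with them are illustrative placeholders.

```python
import hashlib


def git_object_id(obj_type, body):
    """Compute a git object id: sha1 of "<type> <length>", a NUL byte, then body."""
    header = obj_type.encode() + b" " + str(len(body)).encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tag_manifest(target, target_type, name, tagger, message):
    """Build the raw body of an annotated tag object.

    These are the only inputs to the tag identifier; nothing stored by a
    hosting platform (such as release notes) appears here.
    """
    return (
        b"object " + target.encode() + b"\n"
        + b"type " + target_type.encode() + b"\n"
        + b"tag " + name.encode() + b"\n"
        + b"tagger " + tagger.encode() + b"\n"
        + b"\n"
        + message.encode()
    )
```

Editing release notes on the hosting platform changes none of these inputs, so the identifier, `git_object_id("tag", tag_manifest(...))`, stays the same.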
Mar 16 2018
Mar 14 2018
Mar 2 2018
Jan 11 2018
In T917#16911, @olasd wrote: Moving the documentation about the config to a top-level configuration document, in the docs directory, sounds like a very good plan. You assigned yourself the ticket, will you do it?
Jan 10 2018
Moving the documentation about the config to a top-level configuration document, in the docs directory, sounds like a very good plan. You assigned yourself the ticket, will you do it?
Jan 6 2018
Dec 15 2017
Fixed with the latest packaged version:
Packaged it and pushed to our own repository.
Update on this:
- Issue opened.
- Pull Request (PR) proposed and merged.
Nov 13 2017
This error slipped under my radar last week.
I opened a related issue in dulwich, since it should be handled upstream.
Nov 10 2017
PR got merged \m/
Nov 7 2017
Nov 4 2017
PR got merged \m/
Oct 31 2017
Follow-up on this:
Oct 27 2017
The revision in question is:
Debugging some more, the date triggering this error is the following, which does indeed raise the initial overflow error:
Possibly related error.
Debugging the problematic object points to 1e82c9224b8898672b3b6fe8b6b737f7eed24cf6, which git fsck reports as well.
It turns out it's a badly formatted tag:
Oct 26 2017
After patching the loader to print the identifier of the failing object, I retrieved the following object: ae51106031a0bb39a8def57a8592f70116487eab (which is among the badly formatted tags listed by git fsck below).
In that particular repository, the tag has no time (tag.tag_time and tag.tag_timezone are None and tag._tag_timezone_neg_utc is False; those are the default values for that object).
But swh-loader-git's code expects those values to be set.
In our model, though, we are fine with that date not being provided.
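A minimal sketch of tolerating such tags: return no date at all instead of assuming the attributes are set. `tag_date` is a hypothetical helper; it only relies on the three attributes named above (dulwich stores `tag_timezone` in seconds east of UTC), and the output dict layout is an illustrative assumption rather than the exact swh model format.

```python
from types import SimpleNamespace


def tag_date(tag):
    """Return a date dict for a tag-like object, or None when it carries no time.

    `tag` needs tag_time, tag_timezone (seconds) and _tag_timezone_neg_utc.
    """
    if tag.tag_time is None or tag.tag_timezone is None:
        return None
    return {
        "timestamp": tag.tag_time,
        "offset": tag.tag_timezone // 60,  # seconds -> minutes
        "negative_utc": tag._tag_timezone_neg_utc,
    }


# Example: a tag without any time information, like the one described above.
untimed_tag = SimpleNamespace(tag_time=None, tag_timezone=None, _tag_timezone_neg_utc=False)
```

With this, `tag_date(untimed_tag)` returns None and the release can be stored without a date.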
Tweaking the git loader to print the actual sha1: