Last origin rescheduled and injected.
Jun 19 2018
Apr 12 2018
python3-dulwich (fix included) packaged and pushed to our debian repository.
After discussion with jelmer (dulwich's author), he proposed and implemented the real solution: deal with bytes directly, avoiding encoding guesswork altogether ;)
It has landed in dulwich's master branch \m/.
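To illustrate why bytes-in/bytes-out sidesteps the problem (a minimal sketch, not dulwich's actual code): decoding with a guessed encoding is lossy when the guess is wrong, while raw bytes round-trip untouched.

```python
raw = b"R\xe9mi"  # a latin-1 encoded author name, as stored in a commit

# Guessing utf-8 and forcing a decode loses information:
mangled = raw.decode("utf-8", errors="replace")  # \xe9 becomes U+FFFD
assert mangled.encode("utf-8") != raw            # round-trip is lossy

# Keeping the field as bytes preserves it exactly; decode only at
# display time, where a wrong guess does no lasting damage.
assert raw.decode("latin-1") == "Rémi"
```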
Apr 11 2018
Patching dulwich to try to detect the encoding (when the problem arises) seems to do the trick:
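An encoding-detection fallback of the kind described here could look like this sketch (hypothetical helper, not the actual dulwich patch; `decode_commit_field` is a made-up name):

```python
def decode_commit_field(raw: bytes) -> str:
    """Best-effort decode of a commit header field.

    Hypothetical sketch of the workaround: try utf-8 first, then fall
    back to latin-1, which maps every byte and therefore always succeeds.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")
```

For example, `decode_commit_field(b"caf\xe9")` yields `"café"` instead of raising `UnicodeDecodeError`.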
With the latest dulwich (> 0.19.1, current HEAD) we now break somewhere else, still encoding-related:
I opened a discussion about this case at https://github.com/jelmer/dulwich/issues/608.
Jan 19 2018
This was initially opened to clean up the repository because I thought it was some form of corruption.
But I no longer think that's the case, and I don't want to tamper with the sources.
Jan 18 2018
After some digging, it seems to be an encoding problem:
Analyzing that repository a bit further, we can see this:
Dec 21 2017
The SQL error was sheer bad luck: it tested fine locally, so it was rescheduled and loaded successfully.
Only 2 errors left:
- 1 about bad transaction in db
- 1 about unicode error:
Dec 19 2017
Updated and scheduled the last 170 repositories.
Those now remain to be checked for errors.
Dec 15 2017
Nov 7 2017
Oct 27 2017
Oct 26 2017
Oct 3 2017
Jul 28 2017
For information, the last injection has been done. The remaining errors:
(but we should have a list of those repos, for posterity).
Jul 27 2017
These should be rescheduled and driven to successful completion.
- we sent something that was not a git repository.
- integrity error (which is expected for now)
Jul 26 2017
After much learning on how to read and extract logs from our Kibana instance, here is the error breakdown.
Jun 6 2017
As of now, after multiple (re)schedulings, the ingestion is done.
Apr 26 2017
Update on this.
Apr 7 2017
Feb 15 2017
Visit dates have been fixed for the origins already injected.
Feb 12 2017
Feb 11 2017
Command to trigger the messages (from worker01):
cat /srv/storage/space/mirrors/gitorious.org/full_mapping.txt | SWH_WORKER_INSTANCE=swh_loader_git_disk ./load_gitorious.py --root-repositories /srv/storage/space/mirrors/gitorious.org/mnt/repositories
(The script defaults to the right queue 'swh_loader_git_express' and the right origin date 'Wed, 30 Mar 2016 09:40:04 +0200'.)
Feb 10 2017
start-date: Fri Feb 10 16:40:00 UTC 2017
The full mapping of gitorious repositories URLs to on-disk location is at uffizi:/srv/storage/space/mirrors/gitorious.org/full_mapping.txt
May 25 2016
May 13 2016
May 12 2016
I'm now running a git fsck on all the repositories. Output and results in worker01:/tmp/fsck.
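A batch fsck like the one described can be sketched as follows (hypothetical helper; it assumes bare repositories named `*.git` sitting directly under a root directory, which may not match the actual gitorious layout):

```python
import subprocess
from pathlib import Path


def fsck_all(root: Path, logdir: Path) -> dict:
    """Run `git fsck --full` on every bare repository under `root`,
    saving each repository's output under `logdir`.

    Sketch only: assumes bare repos are `*.git` directories directly
    under `root`; returns {repo name: git exit code}.
    """
    logdir.mkdir(parents=True, exist_ok=True)
    results = {}
    for repo in sorted(root.glob("*.git")):
        proc = subprocess.run(
            ["git", "-C", str(repo), "fsck", "--full"],
            capture_output=True, text=True,
        )
        (logdir / (repo.name + ".log")).write_text(proc.stdout + proc.stderr)
        results[repo.name] = proc.returncode
    return results
```

A non-zero exit code flags a repository worth inspecting by hand.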
I've collapsed the two mappings into a single file: /srv/softwareheritage/mirrors/gitorious.org/full_mapping.txt
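The collapsing step can be sketched like this (hypothetical helper; it assumes tab-separated `URL<TAB>path` lines, which may differ from the real files' layout, and keeps the first location seen for a duplicated URL):

```python
def merge_mappings(files, out_path):
    """Collapse several URL -> on-disk-path mapping files into one.

    Sketch only: assumes one tab-separated `url\tpath` entry per line;
    on duplicate URLs the first occurrence wins.
    """
    seen = {}
    for path in files:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                if not line.strip():
                    continue
                url, location = line.rstrip("\n").split("\t", 1)
                seen.setdefault(url, location)
    with open(out_path, "w", encoding="utf-8") as out:
        for url, location in sorted(seen.items()):
            out.write(f"{url}\t{location}\n")
```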
Here is all the information I have about the on-disk gitorious layout (credit: astrid):
Deployed the uid+gid changes and added the filesystem to uffizi:/etc/exports
To prepare for this, I moved temp-drydock to uid=10000 on worker01. If CI is broken it's my fault.
Apr 1 2016
For reference, see the content of the gitorious disk image uffizi:/srv/softwareheritage/mirrors/gitorious.org/gitorious.img
Mar 29 2016
This is now done. I'm running an fsck on the retrieved file system image just in case.
Mar 10 2016
Mar 9 2016
The transfer is now in progress on uffizi:/srv/softwareheritage/mirrors/gitorious.org/, within a screen session of my user titled "gitorious-transfer".
Mar 5 2016
We are now all set to start (after having automated it properly…) the transfer of Gitorious stuff to SWH.
Feb 27 2016
Here is the complete list of URLs that can be used to "git clone" (via HTTPS) all the repositories available from the Gitorious valhalla:
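Such a one-URL-per-line list can be consumed with a simple loop; a minimal sketch (helper names are hypothetical, and `--mirror` is one choice among several for a full clone):

```python
import subprocess
from pathlib import Path


def target_dir(url: str) -> str:
    """Derive a local directory name from a clone URL (last path segment)."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def clone_all(url_list_file: str, dest: Path) -> None:
    """Mirror-clone every repository listed (one HTTPS URL per line)
    into `dest`. Hypothetical helper; sketch only, no retry handling."""
    dest.mkdir(parents=True, exist_ok=True)
    for line in Path(url_list_file).read_text().splitlines():
        url = line.strip()
        if not url:
            continue
        subprocess.run(
            ["git", "clone", "--mirror", url, str(dest / target_dir(url))],
            check=True,
        )
```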