Merge the current git cloner and git loader into a single tool that retrieves from the remote git repository only the objects that are new since our last visit.
Description
Status | Assigned | Task
---|---|---
Migrated | gitlab-migration | T382 stay up to date w.r.t. new GitHub repositories
Migrated | gitlab-migration | T66 clone and load fork GitHub repositories
Migrated | gitlab-migration | T51 smart, all-in-one git cloner/loader/ (+ dealing with updates too)
Migrated | gitlab-migration | T116 Add storage endpoints to help with repository updates
Migrated | gitlab-migration | T264 directory_get: Retrieve information on a directory per hash.
Event Timeline
Started experimenting with dulwich's client for the git smart protocol.
- The HTTP "smart" client doesn't read the server's responses during negotiation, and therefore sends the whole commit history at once. It is also badly broken under Python 3.
- The native git protocol smart client, on the other hand, seems to work well.
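To make the negotiation problem concrete, here is a minimal sketch of the client side of a protocol v0 fetch request, built from pkt-lines (4 hex digits giving the total line length, header included, with "0000" as the flush packet). It is deliberately simplified: no capabilities on the first want line, and every "have" is sent in one go before "done". That is essentially the "everything at once" behaviour described above; it is not dulwich's code, and the function names are hypothetical.

```python
FLUSH_PKT = b"0000"
MAX_PKT_LEN = 65520  # largest pkt-line allowed, 4-byte header included


def pkt_line(payload: bytes) -> bytes:
    """Encode one pkt-line: 4 hex digits of total length, then the payload."""
    length = len(payload) + 4
    if length > MAX_PKT_LEN:
        raise ValueError("payload too large for a single pkt-line")
    return b"%04x" % length + payload


def negotiation_request(wants, haves):
    """Build a naive fetch request: want lines, flush, all have lines, done.

    Hypothetical helper. A well-behaved client would instead send haves in
    small batches, each followed by a flush, and read the server's ACK/NAK
    replies in between so it can stop as soon as a common ancestor is found.
    """
    out = [pkt_line(b"want " + sha + b"\n") for sha in wants]
    out.append(FLUSH_PKT)
    out.extend(pkt_line(b"have " + sha + b"\n") for sha in haves)
    out.append(pkt_line(b"done\n"))
    return b"".join(out)
```

With a history of hundreds of thousands of commits, the naive request above carries one "have" line per known commit, which is why reading the server's acknowledgements between batches matters.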
Requesting an update of all of linux.git's refs makes the GitHub server hang up the connection. It seems that the server rejects requests for peeled refs (the `^{}` entries of the ref advertisement), so we need to filter them out before asking for the missing refs.
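A minimal sketch of that filtering, assuming the advertised refs arrive as a dict mapping ref names to SHA-1s (both bytes), which is the shape of ref map dulwich hands to a `determine_wants` callback. The function names are hypothetical, and `known_shas` stands in for whatever the updater already has in storage. It also shows the incremental idea from the task title: only ref targets we don't already know are requested.

```python
PEELED_SUFFIX = b"^{}"


def filter_peeled_refs(refs):
    """Return a copy of an advertised ref map without peeled entries.

    An annotated tag is advertised twice: b"refs/tags/v4.0" points at the
    tag object, b"refs/tags/v4.0^{}" at the commit the tag peels to.
    Only the second, name-suffixed form is dropped here.
    """
    return {name: sha for name, sha in refs.items()
            if not name.endswith(PEELED_SUFFIX)}


def missing_wants(remote_refs, known_shas):
    """SHA-1s to request: targets of non-peeled refs not already known."""
    targets = set(filter_peeled_refs(remote_refs).values())
    return sorted(targets - set(known_shas))
```

Usage, with made-up SHA-1s: a remote advertising `HEAD`, `refs/heads/master`, `refs/tags/v1` and `refs/tags/v1^{}`, where the master commit is already archived, yields a single want (the tag object), and the peeled commit id is never requested.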
Related commits (non-exhaustive list):
58903e5 * origin/master origin/HEAD Open occurrence_get(origin_id) to retrieve latest occurrences per origin
bc23eb9 * sql/upgrades/043: add 042→043 upgrade script
d05afde * revision_log from multiple root revisions
3a40f00 * sql/upgrades/042: add 041→042 upgrade script
f54fd8d * Open release_get_by to retrieve a release by origin.
5dc4244 * revision_get_by: branch name filtering is optional
7e623c8 * sql/upgrades/040: add 040→041 upgrade script
7e2dcbc * Open directory_get to retrieve information on directory by id
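To illustrate what commit d05afde ("revision_log from multiple root revisions") implies, here is a minimal sketch of walking revision history from several roots at once, visiting each revision only once even when histories share ancestors. It assumes an in-memory map from revision id to parent ids; the actual swh-storage endpoint queries the database, and this `revision_log` signature is hypothetical.

```python
from collections import deque


def revision_log(parents, roots, limit=None):
    """Yield revision ids reachable from any of `roots`, each exactly once.

    `parents` maps a revision id to the list of its parent ids (a stand-in
    for the storage layer). Traversal is breadth-first; `limit` caps how
    many revisions are yielded.
    """
    seen = set()
    queue = deque(roots)
    count = 0
    while queue:
        rev = queue.popleft()
        if rev in seen:
            continue  # shared ancestor already visited from another root
        seen.add(rev)
        yield rev
        count += 1
        if limit is not None and count >= limit:
            return
        queue.extend(parents.get(rev, ()))
```

Starting from the current branch heads, this lets an updater enumerate everything already archived for an origin in one pass instead of one walk per branch.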
For information, the sample test_update.py has been adapted in swh-loader-git (https://forge.softwareheritage.org/diffusion/DLDG/browse/master/swh/loader/git/updater.py) to use swh-storage.
(updater.py was introduced in commit 5d5f3ea)
A new git updater, based on @ardumont's proof of concept, is now available in rDLDGIT.
This updater has been wired to a new task, and the workers have been updated to accept it.