This schema change is now done in production.
Jan 28 2016
Jan 27 2016
Full SQL code with the new schemata for origin_visit, occurrence_history and occurrence. Those three tables are implicitly relevant only to the "Software Heritage" authority.
Jan 26 2016
Reading the dulwich code a bit further, it turns out that git commits can have more header attributes than we initially expected.
Dulwich seems to handle some of those special cases just fine.
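Git stores commit headers as `key value` lines. Multi-line values (for example `gpgsig` or `mergetag`) continue on lines that start with a single space. The sketch below is a tolerant parser that keeps every header in order instead of assuming a fixed set. `parse_commit_headers` is a hypothetical helper for illustration, not dulwich's API.

```python
from typing import List, Tuple


def parse_commit_headers(raw: bytes) -> List[Tuple[bytes, bytes]]:
    """Return the ordered (key, value) headers of a raw git commit body.

    Hypothetical helper: `raw` is the commit object without its
    'commit <len>' prefix. Parsing stops at the blank line that separates
    headers from the message. Unknown keys are kept rather than rejected.
    """
    headers: List[Tuple[bytes, bytes]] = []
    pos = 0
    while pos < len(raw):
        end = raw.find(b"\n", pos)
        if end == -1:
            end = len(raw)
        line = raw[pos:end]
        if line == b"":
            break  # blank separator line: the message starts after it
        if line.startswith(b" ") and headers:
            # Continuation line: append to the previous header's value.
            key, value = headers[-1]
            headers[-1] = (key, value + b"\n" + line[1:])
        else:
            key, _, value = line.partition(b" ")
            headers.append((key, value))
        pos = end + 1
    return headers
```

Because the result is a list rather than a dict, repeated keys such as several `parent` lines survive, and so does the original header order.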
Roberto Di Cosmo (via mobile/cell)
On 26 Jan 2016 at 07:47, "olasd (Nicolas Dandrimont)" <forge@softwareheritage.org> wrote:
This has now been deployed in swh.storage v0.0.30: occurrences and releases can now point to arbitrary objects.
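Letting occurrences and releases point to arbitrary objects amounts to storing a typed target instead of a bare revision id. Below is a minimal sketch of such a data model. It assumes a (sha1_git, object type) pair. The class and field names are hypothetical and are not the swh.storage schema.

```python
from dataclasses import dataclass
from enum import Enum


class ObjectType(Enum):
    # Kinds of object a pointer may target (illustrative set).
    CONTENT = "content"
    DIRECTORY = "directory"
    REVISION = "revision"
    RELEASE = "release"


@dataclass(frozen=True)
class Target:
    """Typed pointer: the object's sha1_git plus what kind of object it is."""
    sha1_git: bytes
    kind: ObjectType

    def __post_init__(self) -> None:
        # A SHA-1 digest is always 20 bytes long.
        if len(self.sha1_git) != 20:
            raise ValueError("sha1_git must be a 20-byte SHA-1 digest")


@dataclass(frozen=True)
class Occurrence:
    # Hypothetical fields: which origin/branch points where.
    origin: int
    branch: bytes
    target: Target


@dataclass(frozen=True)
class Release:
    # Hypothetical fields: a named release pointing at any object kind.
    name: bytes
    target: Target
```

Keeping the type next to the id tells a reader of the row which table holds the target, so it does not have to probe every table.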
Jan 25 2016
Jan 22 2016
This has now been done.
Still thinking about this.
Jan 21 2016
Currently running.
I just noticed that empty messages with empty lines are stored as an empty bytea, whereas empty messages without the empty line are stored as NULL. So there's that.
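The sketch below assumes that "the empty line" means the blank line separating a commit's headers from its message. Under that assumption, the raw object itself distinguishes the two cases. `split_message` is a hypothetical helper.

```python
from typing import Optional


def split_message(raw_commit: bytes) -> Optional[bytes]:
    """Return the message part of a raw git commit body (hypothetical helper).

    Returns None when there is no blank separator line at all, and b""
    when the separator is present but nothing follows it.
    """
    if raw_commit.startswith(b"\n"):
        # Degenerate case: no headers, separator on the first line.
        return raw_commit[1:]
    sep = raw_commit.find(b"\n\n")
    if sep == -1:
        return None  # no separator line: there is no message at all
    return raw_commit[sep + 2:]
```

If both cases should be stored the same way, they would need to be normalized explicitly. Keeping them distinct preserves enough information to rebuild the original object.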
Some example releases:
I have done some investigation of this in light of T272. Bottom line: not good; git is very proficient in the corner-case department.
As a starting point, I've briefly discussed with Darcs developers how the push/pull protocol works in Darcs. Unfortunately, the protocol doesn't seem to be documented anywhere. The relevant entry points in the code are:
Our main query on occurrences looks for occurrences that are:
- from a given origin
- on a given branch (or on all branches)
- the newest, or the closest to a given timestamp.
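The query described above can be sketched over an in-memory list of visit rows. The row layout (origin, branch, target, visit_date) and the name `find_occurrences` are assumptions for illustration, not the swh.storage schema. In PostgreSQL, a similar per-branch selection could be expressed with `DISTINCT ON (branch)`. The Python version keeps the selection logic explicit.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class OccurrenceVisit:
    # Hypothetical row layout for illustration.
    origin: int
    branch: str
    target: str       # id of the pointed-to object
    visit_date: int   # POSIX timestamp of the visit that saw it


def find_occurrences(
    rows: Iterable[OccurrenceVisit],
    origin: int,
    branch: Optional[str] = None,
    timestamp: Optional[int] = None,
) -> List[OccurrenceVisit]:
    """Pick one occurrence per branch of `origin` (optionally one branch only).

    With timestamp=None, keep the newest row per branch. Otherwise, keep the
    row whose visit_date is closest to `timestamp`. Ties go to the row seen first.
    """
    best: Dict[str, OccurrenceVisit] = {}
    for row in rows:
        if row.origin != origin or (branch is not None and row.branch != branch):
            continue
        current = best.get(row.branch)
        if current is None:
            best[row.branch] = row
        elif timestamp is None:
            if row.visit_date > current.visit_date:
                best[row.branch] = row
        elif abs(row.visit_date - timestamp) < abs(current.visit_date - timestamp):
            best[row.branch] = row
    return sorted(best.values(), key=lambda r: r.branch)
```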
For information: the sample test_update.py has been adapted in swh-loader-git (https://forge.softwareheritage.org/diffusion/DLDG/browse/master/swh/loader/git/updater.py) to use swh-storage.
+ v0.0.21 deployed on archive
- 68a8283 Deal nicely with communication downtime with storage
- 3afbd2d Deal more appropriately with storage error
Related but not limited to:
58903e5 * origin/master origin/HEAD Open occurrence_get(origin_id) to retrieve latest occurrences per origin
bc23eb9 * sql/upgrades/043: add 042→043 upgrade script
d05afde * revision_log from multiple root revisions
3a40f00 * sql/upgrades/042: add 041→042 upgrade script
f54fd8d * Open release_get_by to retrieve a release by origin.
5dc4244 * revision_get_by: branch name filtering is optional
7e623c8 * sql/upgrades/040: add 040→041 upgrade script
7e2dcbc * Open directory_get to retrieve information on directory by id
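Computing a revision log from multiple root revisions means walking history from several starting points while emitting shared ancestors only once. The sketch below shows one way to do that: a breadth-first walk that assumes history is available as a hypothetical `parents` mapping from revision id to parent ids. It is not the swh.storage implementation.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Iterator, List, Set


def revision_log(
    parents: Dict[str, List[str]], roots: Iterable[str]
) -> Iterator[str]:
    """Yield every revision reachable from any of `roots`, each exactly once.

    Breadth-first over the (hypothetical) parent mapping. Revisions shared by
    several roots' histories are emitted only the first time they are reached.
    """
    seen: Set[str] = set()
    queue: Deque[str] = deque()
    for root in roots:
        if root not in seen:
            seen.add(root)
            queue.append(root)
    while queue:
        rev = queue.popleft()
        yield rev
        for parent in parents.get(rev, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
```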
Jan 18 2016
storage:
- 7e623c8 * master origin/master origin/HEAD sql/upgrades/040: add 040→041 upgrade script
- 7e2dcbc * Open directory_get to retrieve information on directory by id
- ac380c9 * Rename directory_get to directory_ls
- 'Single entity query' done in 9fe94d3, ebe3a29 (entity_get)
Current status on this:
- find objects of any type by sha1_git (release/revision/directory/content)
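A sha1_git alone does not say what kind of object it names, so a lookup by id has to try each object type. The sketch below assumes hypothetical in-memory per-type tables keyed by hex sha1_git. The blob hash follows git's object format: a `blob <size>` header, a NUL byte, then the data.

```python
import hashlib
from typing import Dict, Optional, Tuple


def sha1_git_blob(data: bytes) -> str:
    """Git object id of a blob: SHA-1 over 'blob <size>', a NUL byte, then data."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def find_by_sha1_git(
    tables: Dict[str, Dict[str, object]], sha1_git: str
) -> Optional[Tuple[str, object]]:
    """Look up an id in each (hypothetical) per-type table.

    Returns (object_type, object) for the first match, or None.
    """
    for object_type in ("release", "revision", "directory", "content"):
        obj = tables.get(object_type, {}).get(sha1_git)
        if obj is not None:
            return object_type, obj
    return None
```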
Jan 15 2016
What about having "occurrences" for all kinds of VCS objects (releases, tags, revisions, etc.)?
We would definitely need to look at other VCSs to get a general model; for example, in Darcs, patches are first-class citizens: do we have a way of accommodating this in our data model?