Here are some bonnie++ tests on both uffizi and banco. They seem consistent with the observation that reads from the object storage on uffizi are much slower (a factor of 3x) than on banco, but further investigation is needed.
- Queries
- All Stories
- Search
- Advanced Search
- Transactions
- Transaction Logs
Advanced Search
Jan 21 2016
This is back on hold now, as we discovered that the read performance on uffizi from the object store is not as good as it should be.
Currently running
I just noticed that empty messages with an empty line are stored as an empty bytea, whereas empty messages without the empty line are stored as NULL. So there's that.
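If we want the two on-disk encodings to be treated identically at read time, a small normalization helper could paper over the difference. A minimal sketch (the function name is made up; NULL maps to None via the database driver, the empty bytea to b''):

```python
def normalize_message(raw):
    """Map the two stored encodings of an empty commit message to one
    canonical value: both NULL (None) and an empty bytea (b'') mean
    "no message body".  Hypothetical helper, not the actual storage API."""
    if raw is None or raw == b'':
        return b''
    return raw
```

This would let consumers compare messages without caring which variant the loader produced.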
Some example releases:
I have done some investigation on this in light of T272. Bottom line: not good; git is very prolific in the corner-case department.
As a starting point, I've briefly discussed with Darcs developers how the push/pull protocol works in Darcs. Unfortunately the protocol doesn't seem to be documented anywhere. The relevant entry points in the code are:
Our main query on occurrences is looking for occurrences that are
- from a given origin
- on a given branch (or all branches)
- that are the newest, or the closest to a given timestamp.
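The selection logic above can be sketched in Python. This is a simplified stand-in: real occurrences live in PostgreSQL and carry more fields, and the dict layout and function name here are assumptions made for illustration:

```python
from datetime import datetime, timezone

def closest_occurrence(occurrences, branch=None, timestamp=None):
    """Pick the occurrence that is the newest (timestamp=None) or the
    closest to `timestamp`, optionally restricted to one branch.

    `occurrences` is a list of dicts with 'branch' and 'validity'
    (a timezone-aware datetime) keys -- a toy model of the schema.
    """
    # Filter by branch when one is given; None means "all branches".
    candidates = [o for o in occurrences
                  if branch is None or o['branch'] == branch]
    if not candidates:
        return None
    if timestamp is None:
        # Newest occurrence overall.
        return max(candidates, key=lambda o: o['validity'])
    # Occurrence whose validity is closest to the requested timestamp.
    return min(candidates, key=lambda o: abs(o['validity'] - timestamp))
```

In SQL this would translate to an ORDER BY on the (absolute) distance to the requested timestamp, with a LIMIT 1 per (origin, branch) group.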
PostgreSQL has now been updated to 9.5 (and split into three clusters).
For reference, test_update.py has been adapted in swh-loader-git (https://forge.softwareheritage.org/diffusion/DLDG/browse/master/swh/loader/git/updater.py) to use swh-storage.
+ v0.0.21 deployed on archive
- 68a8283 Deal nicely with communication downtime with storage
- 3afbd2d Deal more appropriately with storage error
Related but not limited to:
58903e5 * origin/master origin/HEAD Open occurrence_get(origin_id) to retrieve latest occurrences per origin
bc23eb9 * sql/upgrades/043: add 042→043 upgrade script
d05afde * revision_log from multiple root revisions
3a40f00 * sql/upgrades/042: add 041→042 upgrade script
f54fd8d * Open release_get_by to retrieve a release by origin.
5dc4244 * revision_get_by: branch name filtering is optional
7e623c8 * sql/upgrades/040: add 040→041 upgrade script
7e2dcbc * Open directory_get to retrieve information on directory by id
pgbouncer is now listening on port 5432, and postgres 9.4 on port 5439.
Database cluster initialization and credentials sync (-g: dump only tablespaces and users):
pg_dumpall -g -p <old db port> | psql -p <new db port>
Jan 18 2016
With your changes regarding secure connections:
Added opportunistic TLS for client and server connections:
Configuration changed in ~/.pg_service.conf and ~/.pg_pass (for the port)
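For illustration, a ~/.pg_service.conf entry pointing clients at the pgbouncer port might look like the following; the service name and database name are assumptions, not the actual configuration:

```
# ~/.pg_service.conf -- illustrative entry only
[swh]
host=prado
port=6432
dbname=softwareheritage
```

Clients then connect with service=swh (e.g. `psql service=swh`), and the port can be flipped between pgbouncer and postgres in one place.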
pgbouncer has been set up on prado (port 6432) and should proxy all connections to postgres 9.4.
Now that the first batch import (github + snapshot.debian.org + gnu.org) is done and we won't be importing other sources for a while, a full object store backup from uffizi to banco has now started.
louvre# for i in `seq 0 9` a b c d e f; do lvextend -L +5T vg-data/$i; done
storage:
- 7e623c8 * master origin/master origin/HEAD sql/upgrades/040: add 040→041 upgrade script
- 7e2dcbc * Open directory_get to retrieve information on directory by id
- ac380c9 * Rename directory_get to directory_ls
- 'Single entity query' done in 9fe94d3, ebe3a29 (entity_get)
Current status on this:
- find objects of any type by sha1_git (release/revision/directory/content)
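That cross-type lookup amounts to probing each object table in turn for the given sha1_git. A minimal sketch, where `db` is a toy mapping of type name to known ids (the real storage queries one table per type, and the function name here is made up):

```python
def object_find_by_sha1_git(db, sha1_git):
    """Return the type of the first object whose id matches `sha1_git`,
    probing the tables in a fixed order, or None if nothing matches."""
    for object_type in ('release', 'revision', 'directory', 'content'):
        if sha1_git in db.get(object_type, ()):
            return {'sha1_git': sha1_git, 'type': object_type}
    return None
```

Since sha1_git collisions across types are not expected, the probe order only matters for performance, not correctness.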
Jan 15 2016
What about having "occurrences" for all kinds of objects in a VCS, releases, tags, revisions, etc. ?
We would definitely need to look at other VCSs to build a general model; for example, in Darcs patches are first-class citizens: do we have a way of accommodating this in our data model?
Jan 14 2016
It's not visible in the commit, but I've also just added the following (horrible) anacrontab line to our gandi simple-paas instance
4@hourly 0 statcache wget -qO /srv/data/web/vhosts/www.softwareheritage.org/htdocs/stat-counters.js.new https://archive.softwareheritage.org/api/1/stat/counters/\?callback=swh_load_counters && mv /srv/data/web/vhosts/www.softwareheritage.org/htdocs/stat-counters.js.new /srv/data/web/vhosts/www.softwareheritage.org/htdocs/stat-counters.js
to keep the cached copy of stat-counters up to date (every 4 hours).
Jan 13 2016
Jan 12 2016
I'd be in favor of working with three clusters then:
That sounds sensible. We should be able to export a LV from our spinning rust storage to prado for that "mirrors" postgresql cluster.