sadly, 14k is only .1% ;)
Oct 4 2015
zack renamed T66: clone and load fork GitHub repositories from retrieve non-fork GitHub repositories to clone and load non-fork GitHub repositories.
zack added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.
zack added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.
IPython notebook to play with the result times scatter plot:
result_times.ipynb (2 KB)
olasd added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.
Based on that data, here are the current average/stddev processing times per repository based on the first ~14k random repositories loaded (~1% of our total):
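A minimal sketch of how such an estimate could be derived from a random sample of per-repository processing times. The function name, the placeholder timings, and the assumption that throughput scales linearly with the number of workers are all illustrative; none of the figures below come from the actual measurements.

```python
import statistics


def estimate_total_hours(sample_seconds, total_repos, workers=1):
    """Extrapolate total loading time from a sample of per-repo durations.

    Assumes the sample is representative of the whole population and that
    work divides evenly across workers (both are simplifications).
    """
    mean = statistics.mean(sample_seconds)
    stdev = statistics.stdev(sample_seconds)  # sample standard deviation
    total_hours = mean * total_repos / workers / 3600
    return mean, stdev, total_hours


# Placeholder timings in seconds, NOT measurements from this task.
sample = [2.0, 4.0, 6.0]
mean, stdev, hours = estimate_total_hours(sample, total_repos=3600, workers=2)
print(f"mean={mean}s stdev={stdev}s estimated total={hours}h")
```

A large standard deviation relative to the mean is a hint that a few very large repositories dominate the total, in which case the mean-based extrapolation should be treated with caution.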
Oct 3 2015
zack added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.
(Thanks for making me play with an IPython notebook for the first time; it's a pretty impressive environment for exploring scientific data.)
olasd added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.
IPython notebook to play with the result times scatter plot:
result_times.ipynb (2 KB)
Oct 2 2015
zack added a project to T45: Fix swh.storage.storage.occurrence_add for overlapping intervals: Storage manager.
see also T62
see also T22
zack added a project to T49: DB schema: add missing unicity constraint on origin (type, url): Storage manager.
zack added projects to T51: smart, all-in-one git cloner/loader/ (+ dealing with updates too): Git loader, Git cloner.
zack added a project to T62: DB schema: add directory→tarball provenance information: Storage manager.
zack added a project to T9: directory (= extracted archive) loader - 1st deployable version: Directory loader.
zack added a project to T38: port ghlister to swh task interface - list all / catch up: GitHub lister.
olasd added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.
This makes me think that we are now I/O-bound on writes to our storage.
olasd added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.
This task made good progress today. I spent a small while perusing our logging to understand the margins for performance.
Oct 1 2015
- /revision/<SHA1_GIT>: show commit information
- /directory/<SHA1_GIT>: show directory information (including ls)
- /directory/<SHA1_GIT>/path/to/file-or-dir: ditto, but for dir pointed by path
- /content/[<HASH_ALGO>:]<HASH>: show content information
- /release/<SHA1_GIT>: show release information
- /person/<PERSON_ID>: show person information
- /origin/<ORIGIN_ID>: show origin information
- /project/<PROJECT_ID>: show project information
- /organization/<ORGANIZATION_ID>: show organization information
- /directory/<TIMESTAMP>/<ORIGIN>|/<BRANCH>|/path/to/file-or-dir: show directory information at timestamp/origin/branch
- /revision/<TIMESTAMP>/<ORIGIN>|/<BRANCH>: show revision information at origin/branch/timestamp
- /revision/<TIMESTAMP>/<ORIGIN>|: show all branches of origin at a given timestamp
- /revision/<TIMESTAMP>/<ORIGIN>|/<BRANCH>|: show all revisions (~git log) of origin and branch at a given timestamp
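The simpler routes in the list above could be matched roughly as follows. This is only a sketch, not the webapp's actual routing code: the `ROUTES` table and `match_route` helper are hypothetical, and it assumes that `<SHA1_GIT>` is a 40-character lowercase hex digest and that person and origin identifiers are numeric. The timestamp-based routes are left out because their `|` separator syntax is not fully specified here.

```python
import re

# Assumption: SHA1_GIT is a 40-character lowercase hexadecimal digest.
SHA1 = r"[0-9a-f]{40}"

# Hypothetical route table: (route name, compiled pattern).
ROUTES = [
    ("revision", re.compile(rf"^/revision/(?P<sha1_git>{SHA1})$")),
    ("directory", re.compile(rf"^/directory/(?P<sha1_git>{SHA1})(?:/(?P<path>.+))?$")),
    # The hash algorithm prefix is optional, as in /content/[<HASH_ALGO>:]<HASH>.
    ("content", re.compile(r"^/content/(?:(?P<hash_algo>[a-z0-9_]+):)?(?P<hash>[0-9a-f]+)$")),
    ("release", re.compile(rf"^/release/(?P<sha1_git>{SHA1})$")),
    # Assumption: person and origin identifiers are integers.
    ("person", re.compile(r"^/person/(?P<person_id>\d+)$")),
    ("origin", re.compile(r"^/origin/(?P<origin_id>\d+)$")),
]


def match_route(url_path):
    """Return (route_name, params) for the first matching route, else None."""
    for name, pattern in ROUTES:
        match = pattern.match(url_path)
        if match:
            return name, match.groupdict()
    return None


sha = "a" * 40
print(match_route("/directory/" + sha + "/src/main.c"))
print(match_route("/content/sha1:" + sha))
```

When the algorithm prefix is omitted, `hash_algo` comes back as `None`, leaving the choice of a default algorithm to the caller.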
zack added a project to T60: deploy webapp at http://base.softwareheritage.org: System administrators.
zack raised the priority of T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have from Normal to High.
zack renamed T9: directory (= extracted archive) loader - 1st deployable version from Debian dir loader - 1st deployable version to directory (= extracted archive) loader - 1st deployable version.
zack renamed T9: directory (= extracted archive) loader - 1st deployable version from Debian (.dsc) loader - 1st deployable version to Debian dir loader - 1st deployable version.
zack closed T21: gzip antelink content on sesi-pv-lc2, a subtask of T19: transfer antelink content from sesi-pv-lc2 to SWH infra, as Resolved.
ardumont closed T47: lookup one hash and returns information about it (origin, revision, etc...), a subtask of T32: web UI: checksum search, as Resolved.
Sep 30 2015
daily pg_dump over the net is now set up on prado for the databases gitimport and snapshot.debian.org; see prado:/usr/local/bin/swh-postgres-backup-sesi and /srv/softwareheritage/postgres/backup.conf
olasd moved T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have from Backlog to This week on the Staff board.
Done as of rDLDG69a5070
Resolved as of rDLDGc8f7d27.
mv started on uffizi, in a screen session
zack added a project to T58: move last batch of github clones (~3M) from /incoming to /data: Developers.
this is now done (thanks Laurent!)
zack closed T53: open network connectivity between sesi-pv-lc2 and swh machines, a subtask of T6: backup: postgres DB, as Resolved.
technology suggestion for how to deal with this nicely: https://pypi.python.org/pypi/retrying
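The linked package offers a decorator for retrying failing calls. The sketch below illustrates the same general idea using only the standard library; it is a stand-in, not the `retrying` API, and the `retry` decorator, its parameters, and the `flaky_fetch` function are all hypothetical names chosen for illustration.

```python
import functools
import time


def retry(max_attempts=3, base_delay=0.1, exceptions=(Exception,)):
    """Retry the wrapped call with exponential backoff between attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: propagate the last error
                    # Wait base_delay, then 2x, 4x, ... before retrying.
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


calls = {"count": 0}


@retry(max_attempts=4, base_delay=0.01, exceptions=(ConnectionError,))
def flaky_fetch():
    """Hypothetical operation that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = flaky_fetch()
print(result, "after", calls["count"], "attempts")
```

Restricting `exceptions` to the errors known to be transient keeps genuine bugs from being silently retried.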
Sep 29 2015
zack added a comment to T56: "devis" for server + disk array to be used as backup for the object storage.
service tag sent to the Dell commercial (thanks Laurent!)
Sep 29 2015, 6:07 PM · Restricted Project
olasd closed T43: Convention for error passing from storage "backend" to storage "API server" to storage "API client" as Resolved.
Resolved as of rDSTO2b46e6941afe
As discussed on swh-private, this is no longer required. We will reassess after having injected all the content we already have, selectively transferring only what we want/need.
It is now done for the workers, but not for the other hosts (louvre, tait, etc.)
zack added a comment to T56: "devis" for server + disk array to be used as backup for the object storage.
Status update: I've established a first contact with the Dell commercial.
To proceed, he is asking for the serial number of our *current* power vault.
Sep 29 2015, 4:25 PM · Restricted Project
zack added a comment to T56: "devis" for server + disk array to be used as backup for the object storage.
new quotation iteration, after discussion with olasd:
Basket - Dell.html (38 KB)
Sep 29 2015, 12:47 PM · Restricted Project
- Done once with basic API
- Refactor to use a unified API call
- Keep up with the latest changes on swh-storage
ardumont closed T33: Git cloner: catch up with new GitHub repositories after the summer as Resolved.
Sep 28 2015
zack added a comment to T56: "devis" for server + disk array to be used as backup for the object storage.
as a start, I've created a couple of quotations on https://dell.quadrem.net/
- panier-dell-r430.html (37 KB): the cheapest (~1K) Dell server configuration (R430) I've found; AFAICT it is compatible with the controller required for the disk array
- panier-dell-r630.html (38 KB): slightly more expensive (2K) server (R630)
either way, the overall price is completely dominated by the disk cost…
Sep 28 2015, 10:23 PM · Restricted Project
Sep 28 2015, 5:53 PM · Restricted Project
done in rDSTObe3910ecff368967cbef7f803dbdf191c1510c3d (and subsequent fixups by olasd)
Sep 27 2015
python3-swh.loader.git is installed and running on worker0{5..8}
Sep 26 2015
gzip/checksumming restarted, after fixing the /etc/fstab mess on the machine
priority lowered as, for better or worse, we have already freed enough space on the machine for DB backups without having to transfer the data
zack lowered the priority of T19: transfer antelink content from sesi-pv-lc2 to SWH infra from Normal to Low.
Sep 25 2015