
Oct 4 2015

zack added a project to T67: prototype: git clone from SWH: Developers.
Oct 4 2015, 5:44 PM · Vault, General
zack created T67: prototype: git clone from SWH.
Oct 4 2015, 5:44 PM · Vault, General
zack renamed T66: clone and load fork GitHub repositories from retrieve non-fork GitHub repositories to clone and load non-fork GitHub repositories.
Oct 4 2015, 5:40 PM · Restricted Project, General
zack added a subtask for T66: clone and load fork GitHub repositories: T51: smart, all-in-one git cloner/loader/ (+ dealing with updates too).
Oct 4 2015, 5:40 PM · Restricted Project, General
zack added a parent task for T51: smart, all-in-one git cloner/loader/ (+ dealing with updates too): T66: clone and load fork GitHub repositories.
Oct 4 2015, 5:40 PM · Git cloner, Git loader
zack created T66: clone and load fork GitHub repositories.
Oct 4 2015, 5:40 PM · Restricted Project, General
olasd updated the task description for T65: Support authors with non-utf8-encoded names.
Oct 4 2015, 5:31 PM · Storage manager, Git loader
olasd created T65: Support authors with non-utf8-encoded names.
Oct 4 2015, 5:30 PM · Storage manager, Git loader
olasd created T64: Support tags with empty or non-utf8 messages.
Oct 4 2015, 5:29 PM · Git loader
olasd created T63: Fix backups failing when a file changes.
Oct 4 2015, 2:23 PM
zack added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.
In T36#629, @olasd wrote:

sadly, 14k is only .1% ;)
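The correction above is plain arithmetic: if ~14k repositories are 0.1% of the total, the full set is roughly 14 million. A minimal sketch of that extrapolation follows. The mean per-repository time and the worker count are hypothetical placeholders, not figures from T36.

```python
def estimate_total_repos(sample_size, sample_fraction):
    """Extrapolate the full population size from a sample and its fraction."""
    return round(sample_size / sample_fraction)


def estimate_wall_clock_days(total_repos, mean_seconds_per_repo, workers):
    """Rough wall-clock estimate, assuming perfectly parallel workers."""
    total_seconds = total_repos * mean_seconds_per_repo / workers
    return total_seconds / 86400  # seconds per day


# 14k repositories being 0.1% of the total (figures quoted in the thread).
total = estimate_total_repos(14_000, 0.001)

# Hypothetical inputs: 10 s per repository on average, 4 workers.
days = estimate_wall_clock_days(total, mean_seconds_per_repo=10.0, workers=4)
print(total, round(days, 1))
```

Swapping in the measured mean from the notebook would turn this into an actual estimate; the standard deviation matters too, since a few very large repositories can dominate the total.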

Oct 4 2015, 1:47 PM · Git loader
zack added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.
In T36#626, @olasd wrote:

IPython notebook to play with the result times scatter plot:

Oct 4 2015, 12:47 PM · Git loader
olasd added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.
In T36#628, @zack wrote:

Based on that data, here are the current average/stddev processing times per repository based on the first ~14k random repositories loaded (~1% of our total):

Oct 4 2015, 1:34 AM · Git loader

Oct 3 2015

zack added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.

(Thanks for making me play with an IPython notebook for the first time; it's a pretty impressive environment for exploring scientific data.)

Oct 3 2015, 8:17 PM · Git loader
olasd added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.

IPython notebook to play with the result times scatter plot:

Oct 3 2015, 5:34 PM · Git loader

Oct 2 2015

zack added a project to T17: handle github assets in git loader: Git loader.
Oct 2 2015, 11:04 PM · Git loader
zack added a project to T45: Fix swh.storage.storage.occurrence_add for overlapping intervals: Storage manager.
Oct 2 2015, 11:03 PM · Storage manager
zack added a project to T3: Discuss the project <-> origin mapping: Storage manager.
Oct 2 2015, 11:03 PM · Storage manager
zack added a comment to T22: DB schema: add metadata field to revision table.

see also T62

Oct 2 2015, 11:02 PM · Storage manager
zack added a comment to T62: DB schema: add directory→tarball provenance information.

see also T22

Oct 2 2015, 11:02 PM · Storage manager
zack added a project to T22: DB schema: add metadata field to revision table: Storage manager.
Oct 2 2015, 11:02 PM · Storage manager
zack added a project to T41: add actual content to swh-storage-testdata: Storage manager.
Oct 2 2015, 11:00 PM · Storage manager
zack updated subscribers of T48: upload-based content search.
Oct 2 2015, 10:59 PM · Web app
zack added a project to T49: DB schema: add missing unicity constraint on origin (type, url): Storage manager.
Oct 2 2015, 10:59 PM · Restricted Project, Storage manager
zack added projects to T51: smart, all-in-one git cloner/loader/ (+ dealing with updates too): Git loader, Git cloner.
Oct 2 2015, 10:59 PM · Git cloner, Git loader
zack added a project to T60: deploy webapp at http://base.softwareheritage.org: Web app.
Oct 2 2015, 10:58 PM · Web app
zack added a project to T61: loading: trigger to update occurrence table: Storage manager.
Oct 2 2015, 10:58 PM · Storage manager
zack added a project to T62: DB schema: add directory→tarball provenance information: Storage manager.
Oct 2 2015, 10:58 PM · Storage manager
zack added a project to T9: directory (= extracted archive) loader - 1st deployable version: Directory loader.
Oct 2 2015, 10:58 PM · Directory loader
zack added a project to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have: Git loader.
Oct 2 2015, 10:58 PM · Git loader
zack added a project to T29: evaluate conffile/argparse Python module: Core & foundations.
Oct 2 2015, 10:57 PM · Core & foundations
zack added a project to T30: port ghlister to swh.* namespace and conventions: GitHub lister.
Oct 2 2015, 10:56 PM · GitHub lister
zack added a project to T38: port ghlister to swh task interface - list all / catch up: GitHub lister.
Oct 2 2015, 10:56 PM · GitHub lister
zack added a project to T37: port ghlister to swh task interface - daily updates: GitHub lister.
Oct 2 2015, 10:56 PM · GitHub lister
olasd added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.

This makes me think that we are now i/o bound on writes on our storage.

Oct 2 2015, 6:19 PM · Git loader
olasd added a comment to T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have.

This task made good progress today. I spent a small while perusing our logging to understand the margins for performance.

Oct 2 2015, 5:56 PM · Git loader

Oct 1 2015

ardumont added a comment to T60: deploy webapp at http://base.softwareheritage.org.
  • /revision/<SHA1_GIT>: show commit information
  • /directory/<SHA1_GIT>: show directory information (including ls)
  • /directory/<SHA1_GIT>/path/to/file-or-dir: ditto, but for dir pointed by path
  • /content/[<HASH_ALGO>:]<HASH>: show content information
  • /release/<SHA1_GIT>: show release information
  • /person/<PERSON_ID>: show person information
  • /origin/<ORIGIN_ID>: show origin information
  • /project/<PROJECT_ID>: show project information
  • /organization/<ORGANIZATION_ID>: show organization information
  • /directory/<TIMESTAMP>/<ORIGIN>|/<BRANCH>|/path/to/file-or-dir: show directory information at timestamp/origin/branch
  • /revision/<TIMESTAMP>/<ORIGIN>|/<BRANCH>: show revision information at origin/branch/timestamp
  • /revision/<TIMESTAMP>/<ORIGIN>|: show all branches of origin at a given timestamp
  • /revision/<TIMESTAMP>/<ORIGIN>|/<BRANCH>|: show all revisions (~git log) of origin and branch at a given timestamp
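As an illustration of how a few of the identifier-based URLs above could be dispatched, here is a minimal sketch using only Python's `re` module. The route table, handler names and the `resolve` helper are hypothetical, and it assumes `<SHA1_GIT>` appears in the URL as 40 lowercase hex characters. It is not the swh-web-ui implementation.

```python
import re

# Hypothetical route table: patterns mirror a subset of the URL scheme listed
# above; the handler names are illustrative only.
ROUTES = [
    (re.compile(r"^/revision/(?P<sha1_git>[0-9a-f]{40})$"), "revision"),
    (re.compile(r"^/directory/(?P<sha1_git>[0-9a-f]{40})(?:/(?P<path>.+))?$"),
     "directory"),
    (re.compile(r"^/content/(?:(?P<hash_algo>[a-z0-9_]+):)?(?P<hash>[0-9a-f]+)$"),
     "content"),
    (re.compile(r"^/release/(?P<sha1_git>[0-9a-f]{40})$"), "release"),
]


def resolve(url_path):
    """Return (handler_name, params) for the first matching route, else None."""
    for pattern, name in ROUTES:
        match = pattern.match(url_path)
        if match:
            # Drop optional groups that did not participate in the match.
            params = {k: v for k, v in match.groupdict().items() if v is not None}
            return name, params
    return None


print(resolve("/content/sha1:" + "a" * 40))
print(resolve("/directory/" + "b" * 40 + "/path/to/file"))
```

The optional `<HASH_ALGO>:` prefix and the optional trailing path are modelled as optional groups, so one pattern covers both forms of each URL.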
Oct 1 2015, 5:03 PM · Web app
zack created T62: DB schema: add directory→tarball provenance information.
Oct 1 2015, 4:38 PM · Storage manager
ardumont claimed T9: directory (= extracted archive) loader - 1st deployable version.
Oct 1 2015, 4:02 PM · Directory loader
zack created T61: loading: trigger to update occurrence table.
Oct 1 2015, 3:48 PM · Storage manager
zack added a project to T60: deploy webapp at http://base.softwareheritage.org: System administrators.
Oct 1 2015, 3:45 PM · Web app
zack created T60: deploy webapp at http://base.softwareheritage.org.
Oct 1 2015, 3:45 PM · Web app
zack raised the priority of T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have from Normal to High.
Oct 1 2015, 3:44 PM · Git loader
zack renamed T9: directory (= extracted archive) loader - 1st deployable version from Debian dir loader - 1st deployable version to directory (= extracted archive) loader - 1st deployable version.
Oct 1 2015, 3:41 PM · Directory loader
zack renamed T9: directory (= extracted archive) loader - 1st deployable version from Debian (.dsc) loader - 1st deployable version to Debian dir loader - 1st deployable version.
Oct 1 2015, 3:41 PM · Directory loader
zack closed T58: move last batch of github clones (~3M) from /incoming to /data as Resolved.
Oct 1 2015, 2:25 PM
zack added a comment to T6: backup: postgres DB.

update, backup conf is now:

Oct 1 2015, 11:51 AM
ardumont updated the task description for T59: Build and upload one swh-web-ui debian package.
Oct 1 2015, 11:30 AM
ardumont updated the task description for T59: Build and upload one swh-web-ui debian package.
Oct 1 2015, 11:29 AM
ardumont closed T59: Build and upload one swh-web-ui debian package as Resolved.
Oct 1 2015, 11:28 AM
ardumont updated the task description for T59: Build and upload one swh-web-ui debian package.
Oct 1 2015, 11:27 AM
ardumont updated the task description for T59: Build and upload one swh-web-ui debian package.
Oct 1 2015, 11:13 AM
ardumont triaged T59: Build and upload one swh-web-ui debian package as Normal priority.
Oct 1 2015, 11:08 AM
zack closed T21: gzip antelink content on sesi-pv-lc2 as Resolved.
Oct 1 2015, 10:00 AM
zack closed T21: gzip antelink content on sesi-pv-lc2, a subtask of T19: transfer antelink content from sesi-pv-lc2 to SWH infra, as Resolved.
Oct 1 2015, 10:00 AM
ardumont created T59: Build and upload one swh-web-ui debian package.
Oct 1 2015, 9:49 AM
ardumont closed T47: lookup one hash and returns information about it (origin, revision, etc...), a subtask of T32: web UI: checksum search, as Resolved.
Oct 1 2015, 9:43 AM
ardumont closed T47: lookup one hash and returns information about it (origin, revision, etc...) as Resolved.
Oct 1 2015, 9:43 AM

Sep 30 2015

zack added a project to T30: port ghlister to swh.* namespace and conventions: Developers.
Sep 30 2015, 6:28 PM · GitHub lister
zack lowered the priority of T6: backup: postgres DB from High to Normal.
Sep 30 2015, 6:25 PM
zack added a comment to T6: backup: postgres DB.

daily pg_dump over the net is now set up on prado for the databases gitimport and snapshot.debian.org, see prado:/usr/local/bin/swh-postgres-backup-sesi and /srv/softwareheritage/postgres/backup.conf

Sep 30 2015, 6:25 PM
olasd moved T36: performance estimation: how long will it take to git-bulk-load all the GitHub repos we have from Backlog to This week on the Staff board.
Sep 30 2015, 4:35 PM · Git loader
olasd closed T35: port git-bulk-loader to generic task scheduling API as Resolved.

Done as of rDLDG69a5070

Sep 30 2015, 4:34 PM
olasd closed T50: swh.loader.git: improve robustness of loading as Resolved.

Resolved as of rDLDGc8f7d27.

Sep 30 2015, 4:33 PM
zack added a comment to T58: move last batch of github clones (~3M) from /incoming to /data.

mv started on uffizi, in a screen session

Sep 30 2015, 2:33 PM
zack added a project to T58: move last batch of github clones (~3M) from /incoming to /data: Developers.
Sep 30 2015, 1:35 PM
zack created T58: move last batch of github clones (~3M) from /incoming to /data.
Sep 30 2015, 1:35 PM
zack added a comment to T53: open network connectivity between sesi-pv-lc2 and swh machines.

this is now done (thanks Laurent!)

Sep 30 2015, 1:30 PM
zack closed T53: open network connectivity between sesi-pv-lc2 and swh machines as Resolved.
Sep 30 2015, 1:30 PM
zack closed T53: open network connectivity between sesi-pv-lc2 and swh machines, a subtask of T6: backup: postgres DB, as Resolved.
Sep 30 2015, 1:30 PM
zack added a comment to T50: swh.loader.git: improve robustness of loading.

technology suggestion for how to deal with this nicely: https://pypi.python.org/pypi/retrying
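The general idea behind that suggestion, retrying a transiently failing operation with a growing delay, can be sketched with the standard library alone. This is not the `retrying` package's API. The decorator, its parameters and the `flaky_store` function are hypothetical stand-ins.

```python
import functools
import time


def retry(max_attempts=3, base_delay=0.1, exceptions=(Exception,), sleep=time.sleep):
    """Retry the decorated callable with exponential backoff between attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Delay doubles after each failed attempt.
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator


calls = []


# Hypothetical flaky operation standing in for a storage write.
@retry(max_attempts=4, base_delay=0.0, exceptions=(ConnectionError,))
def flaky_store(obj_id):
    calls.append(obj_id)
    if len(calls) < 3:
        raise ConnectionError("transient failure")
    return "stored"


result = flaky_store("deadbeef")
print(result, len(calls))
```

Restricting `exceptions` to errors known to be transient keeps genuine bugs from being silently retried.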

Sep 30 2015, 1:29 PM
zack closed T55: DB stored procedures: use LANGUAGE sql for pure-SQL functions instead of LANGUAGE plpgsql as Resolved by committing rDSTO3b67e58d1e2d: swh-func.sql: use LANGUAGE sql wherever possible.
Sep 30 2015, 11:58 AM

Sep 29 2015

zack added a comment to T56: "devis" for server + disk array to be used as backup for the object storage.

service tag sent to the Dell sales representative (thanks Laurent!)

Sep 29 2015, 6:07 PM · Restricted Project
olasd closed T43: Convention for error passing from storage "backend" to storage "API server" to storage "API client" as Resolved.

Resolved as of rDSTO2b46e6941afe

Sep 29 2015, 5:02 PM
zack added a comment to T19: transfer antelink content from sesi-pv-lc2 to SWH infra.

As discussed on swh-private, this is no longer required. We will reassess after having injected all the content we already have, selectively transferring only what we want/need.

Sep 29 2015, 4:34 PM
zack closed T19: transfer antelink content from sesi-pv-lc2 to SWH infra as Wontfix.
Sep 29 2015, 4:33 PM
zack added a comment to T53: open network connectivity between sesi-pv-lc2 and swh machines.

It is now done for the workers, but not for the other hosts (louvre, tait, etc.)

Sep 29 2015, 4:33 PM
zack created T57: git commit IRC notifications on #swh-devel.
Sep 29 2015, 4:31 PM · General
zack added a comment to T56: "devis" for server + disk array to be used as backup for the object storage.

Status update: I've established a first contact with the Dell sales representative.
To proceed, he is asking for the serial number of our *current* power vault.

Sep 29 2015, 4:25 PM · Restricted Project
zack added a comment to T56: "devis" for server + disk array to be used as backup for the object storage.

new quotation iteration, after discussion with olasd:

Sep 29 2015, 12:47 PM · Restricted Project
ardumont closed T32: web UI: checksum search as Resolved.
Sep 29 2015, 11:09 AM
ardumont added a comment to T32: web UI: checksum search.
  • Done once with basic API
  • Refactor to use a unified API call
  • Keep up with latest changes on swh-storage
Sep 29 2015, 11:09 AM
ardumont closed T33: Git cloner: catch up with new GitHub repositories after the summer as Resolved.
Sep 29 2015, 11:08 AM
olasd closed T44: Rename swh.storage.api to swh.storage.api.server, swh.storage.remote_storage to swh.storage.api.client as Resolved by committing rDSTO1ddeff4be42d: Move remote_storage and api to api.client and api.server.
Sep 29 2015, 10:22 AM

Sep 28 2015

zack added a comment to T56: "devis" for server + disk array to be used as backup for the object storage.

as a start, I've created a couple of quotations on https://dell.quadrem.net/

  • : with the cheapest (~1K) dell server/configuration (R430) I've found, but AFAICT compatible with the controller required for the disk array
  • : slightly more expensive (2K) server (R630)

either way, the overall price is completely dominated by the disk cost…

Sep 28 2015, 10:23 PM · Restricted Project
zack created T56: "devis" for server + disk array to be used as backup for the object storage.
Sep 28 2015, 5:53 PM · Restricted Project
zack closed T54: DB schema: merge directory_list_* tables into directory as Resolved.
Sep 28 2015, 4:36 PM
zack added a comment to T54: DB schema: merge directory_list_* tables into directory.

done in rDSTObe3910ecff368967cbef7f803dbdf191c1510c3d (and subsequent fixups by olasd)

Sep 28 2015, 4:36 PM

Sep 27 2015

zack created T55: DB stored procedures: use LANGUAGE sql for pure-SQL functions instead of LANGUAGE plpgsql.
Sep 27 2015, 7:12 PM
zack created T54: DB schema: merge directory_list_* tables into directory.
Sep 27 2015, 4:50 PM
zack created T53: open network connectivity between sesi-pv-lc2 and swh machines.
Sep 27 2015, 1:17 PM
zack created T52: swh-cron: manifest-based scheduler for recurring tasks.
Sep 27 2015, 11:21 AM
zack created T51: smart, all-in-one git cloner/loader/ (+ dealing with updates too).
Sep 27 2015, 11:16 AM · Git cloner, Git loader
olasd created T50: swh.loader.git: improve robustness of loading.
Sep 27 2015, 11:05 AM
olasd created T49: DB schema: add missing unicity constraint on origin (type, url).
Sep 27 2015, 11:02 AM · Restricted Project, Storage manager
olasd closed T8: Git loader - 1st deployable version as Resolved.

python3-swh.loader.git is installed and running on worker0{5..8}

Sep 27 2015, 9:51 AM

Sep 26 2015

zack added a comment to T21: gzip antelink content on sesi-pv-lc2.

gzip/checksumming restarted, after fixing the /etc/fstab mess on the machine

Sep 26 2015, 9:28 PM
zack added a comment to T19: transfer antelink content from sesi-pv-lc2 to SWH infra.

priority lowered as, for better or worse, we have already freed enough space on the machine for DB backups without having to transfer the data

Sep 26 2015, 3:36 PM
zack lowered the priority of T19: transfer antelink content from sesi-pv-lc2 to SWH infra from Normal to Low.
Sep 26 2015, 3:35 PM

Sep 25 2015

zack removed a subtask for T6: backup: postgres DB: T19: transfer antelink content from sesi-pv-lc2 to SWH infra.
Sep 25 2015, 5:07 PM