
set up a git-annex repository for media, slides, and the like
Closed, Migrated

Description

[ note: this is not about storing content of the Software Heritage archive into git-annex, but rather work material that is too big to fit nicely into a phabricator-hosted git repo ]

We need a shared place to deposit potentially large material about Software Heritage, such as project slides, talk videos and slides, press articles, etc. git-annex would be ideal, with a main copy hosted on a project machine (e.g., uffizi), another always-up-to-date copy on a different machine (e.g., banco), and local copies for anyone who might need to access it.

Ideally, we would like to be able to easily link into one of the copies of the repository via HTTP URL. That way we can for instance hyperlink entries of the wiki talk page to slides and videos stored in the annex.

Event Timeline

zack changed the task status from Open to Work in Progress. Edited Jul 15 2016, 7:30 PM
zack added subscribers: rdicosmo, olasd.

I've now created two git annexes on uffizi:/srv/softwareheritage/annex/{public,private}, both pointing into /srv/storace/space/. In the first one I've added the various press articles (PDFs) and radio clips (MP3).

Plenty of stuff remains to be done to finish this. The following comes to mind:

  • create an additional copy of the annexes on banco
  • make sure that banco and uffizi can copy content between each other via ssh without a password
  • change the Unix group from swhdev to a new swhteam one
  • add @rdicosmo to the latter group, and create his user on both uffizi and banco
  • publish via HTTP a checked out copy of the public annex (@olasd: can you suggest on which machine we will need a checked out copy of the public annex to this end?)
  • set up automatic periodic syncing of content between the uffizi and banco annexes (not needed now that sync --content is ok)
  • configure the annexes so that mincopy is set to 2
  • set up a cron job to periodically fsck the two annexes (and barf via email if something is amiss) (overkill)
  • document on the Intranet how to use the annex
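
For the periodic fsck item above, here is a minimal sketch of what such a job could look like. It is an illustration, not something deployed: the ANNEXES paths are copied from this comment, the function names are made up, and it relies on cron's usual behaviour of mailing whatever a job prints, so it stays silent when everything is fine.

```python
import subprocess

# Assumed paths, taken from the comment above; adjust to the real layout.
ANNEXES = [
    "/srv/softwareheritage/annex/public",
    "/srv/softwareheritage/annex/private",
]

# `git annex fsck` verifies the annexed content of the current repository.
FSCK_CMD = ["git", "annex", "fsck"]


def check(paths, cmd=FSCK_CMD):
    """Run cmd inside each path; return [(path, output)] for the failures."""
    failures = []
    for path in paths:
        try:
            proc = subprocess.run(
                cmd,
                cwd=path,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
        except OSError as exc:
            # Missing directory or missing executable counts as a failure.
            failures.append((path, str(exc)))
            continue
        if proc.returncode != 0:
            failures.append((path, proc.stdout))
    return failures


def main():
    """Print a report only when something is amiss; return an exit code."""
    failures = check(ANNEXES)
    for path, output in failures:
        print("fsck failed in {}:\n{}".format(path, output))
    return 1 if failures else 0
```

A cron entry would run a script that ends with `sys.exit(main())`; the cmd parameter of check() exists only so the logic can be exercised without git-annex installed.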

Now done: created @rdicosmo *nix user (with his @ voyager SSH key), chgrp of the annex on uffizi to the new swhteam *nix group

Created the annexes on banco, and configured both banco and uffizi annexes to be "backup" copies. This way, running "git annex sync --content" should be enough to keep them in sync.
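
For reference, a sketch of the kind of commands this setup involves, written as a dry-run plan so it can be read without touching a repository. The exact commands that were run are not recorded here, so treat this as an assumption: it uses git-annex's standard "backup" group together with the "standard" preferred-content setting, and the helper names plan() and render() are made up for illustration.

```python
import shlex

# Paths as given earlier in this task; the same layout on banco is assumed.
ANNEXES = [
    "/srv/softwareheritage/annex/public",
    "/srv/softwareheritage/annex/private",
]


def plan(annexes):
    """Return (cwd, command) pairs: mark each annex as a backup, then sync."""
    steps = []
    for path in annexes:
        # Put this repository in the standard "backup" group ...
        steps.append((path, ["git", "annex", "group", "here", "backup"]))
        # ... and let its preferred content follow that group's default.
        steps.append((path, ["git", "annex", "wanted", "here", "standard"]))
        # Sync the git branches and transfer file content as well.
        steps.append((path, ["git", "annex", "sync", "--content"]))
    return steps


def render(steps):
    """Format the plan as shell lines, for review before running anything."""
    return [
        "cd {} && {}".format(
            shlex.quote(cwd), " ".join(shlex.quote(arg) for arg in cmd)
        )
        for cwd, cmd in steps
    ]
```

Printing render(plan(ANNEXES)) gives one line per step, which can be checked by eye and then run by hand on each host.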

user swhteamannex can now sync without password between banco and uffizi

We currently have web-facing stuff on moma, tate and pergamon. I don't know which of those is best for that purpose; I believe it might be pergamon, as it doesn't host critical stuff.

The git-annex hooks on alioth might be an interesting thing to look at: git.debian.org:/git/collab-maint/debconf-share.git/hooks/annex-content

While writing this comment, I remembered reading https://secure.phabricator.com/T7789 which says that phabricator now has a prototype for git-lfs support.

In T479#8135, @olasd wrote:

While writing this comment, I remembered reading https://secure.phabricator.com/T7789 which says that phabricator now has a prototype for git-lfs support.

While I've never used git-lfs myself, it's clear that something supported directly by phabricator would be much easier to set up and maintain than a custom git-annex deployment, and hence preferable. Two questions:

  • is it already part of the version of phabricator we are currently running?
  • where would the disk space for the lfs storage come from in our case?

The 2nd question might be a blocker. On the one hand, I don't think we want to use the regular phabricator partition for that, as it might grow significantly if we start storing several talks. On the other hand, I do want to have at least 2 independent copies of this stuff, and I'm unclear about how git-lfs would allow that.

If we have decent answers for the 2nd point, I wouldn't mind dropping the git-annex idea and switching to phabricator's git-lfs.

zack closed subtask Unknown Object (Maniphest Task) as Resolved. Jul 21 2016, 2:50 PM

This is now done. The public annex is now available starting from https://annex.softwareheritage.org/
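
To link from the wiki to a file in the public annex, something along these lines could build the URL. This is only a sketch: it assumes files are served under the base URL at the same relative path they have in the annex, which is not confirmed here, and the example path in the usage note is hypothetical.

```python
from urllib.parse import quote, urljoin

BASE_URL = "https://annex.softwareheritage.org/"


def annex_url(relative_path, base=BASE_URL):
    """Build a link to a file of the public annex.

    Assumes the web server exposes the checkout as-is under `base`.
    """
    # Percent-encode unsafe characters (spaces, etc.) but keep the slashes.
    encoded = quote(relative_path.lstrip("/"))
    return urljoin(base, encoded)
```

For instance, annex_url("talks/2016/my slides.pdf") would yield a link with the space encoded as %20.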

gitlab-migration changed the status of subtask Unknown Object (Maniphest Task) from Resolved to Migrated.