Formalize the default branch convention for snapshots
Closed, Migrated

Description

For now, each loader decides what convention it uses for branches. This makes working with our snapshots harder than it should be for our users.

One of the things we need to decide is what the default branch of a snapshot is. This is useful, e.g., when presenting data in the web UI or when running a metadata indexer.

My proposal is the following:

  • the default branch for snapshots is defined to be HEAD.
    • if the concept of HEAD exists with the same name in the upstream VCS (e.g. git, svn), this branch should be a literal pointer to the corresponding archived object
    • if the concept of HEAD doesn't exist with the same name in the upstream VCS (e.g. mercurial), this branch should be an alias pointing at the default branch, named using the upstream VCS context (e.g. in the mercurial case, that would be an alias for the tip of the default branch)
    • if the concept of a default branch/version doesn't exist in the upstream VCS, no HEAD branch should exist in the snapshot
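
To make the three rules above concrete, here is a minimal Python sketch. It does not use the real swh-model classes: `Branch`, `add_head` and `resolve_head` are hypothetical names invented for illustration, and the branch mapping (bytes name to target) is only an assumption about how a snapshot could be represented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Branch:
    """Hypothetical snapshot branch (not the swh-model class)."""
    target_type: str  # e.g. "revision", "release", or "alias"
    target: bytes     # object id, or a branch name when target_type == "alias"


def add_head(
    branches: Dict[bytes, Branch],
    *,
    upstream_head: Optional[Branch] = None,
    default_branch: Optional[bytes] = None,
) -> Dict[bytes, Branch]:
    """Return a copy of `branches` with HEAD set according to the proposal."""
    result = dict(branches)
    if upstream_head is not None:
        # Rule 1: upstream has a HEAD; store a literal pointer to the object.
        result[b"HEAD"] = upstream_head
    elif default_branch is not None:
        # Rule 2: upstream names its default differently; HEAD is an alias.
        if default_branch not in result:
            raise KeyError(f"unknown default branch: {default_branch!r}")
        result[b"HEAD"] = Branch("alias", default_branch)
    # Rule 3: no default concept upstream; no HEAD branch is created.
    return result


def resolve_head(
    branches: Dict[bytes, Branch], max_hops: int = 8
) -> Optional[Branch]:
    """Follow HEAD through aliases; None if absent, dangling, or cyclic."""
    name = b"HEAD"
    for _ in range(max_hops):
        branch = branches.get(name)
        if branch is None:
            return None
        if branch.target_type != "alias":
            return branch
        name = branch.target
    return None
```

A consumer such as the web UI or an indexer would then only need `resolve_head(branches)`, and would treat a `None` result as "this snapshot has no default branch".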

I'm not sure where this documentation should live; I think https://wiki.softwareheritage.org/index.php?title=Repository_snapshot_objects should be reintegrated in swh-docs.

Event Timeline

olasd triaged this task as High priority. Oct 10 2018, 11:58 AM
olasd created this task.

LGTM.

I guess there aren't many examples of the last case (you mentioned Debian f2f, which sounds correct), but indeed we shouldn't try to arbitrarily pick a HEAD in that case.

As per oral discussion, from the development point of view, we should now be good:

  1. concept HEAD exists
    • loader-git: ok
    • loader-svn: use HEAD as the default branch (cf. point 1)
  2. concept HEAD exists with a different name
    • loader-mercurial: use HEAD for the repository's default branch (cf. point 2)
    • loader-pypi: already aliases the default 'release' (symmetrically to what PyPI provides)
  3. nothing to do (no default branch concept upstream)
    • loader-debian
    • loader-tar
    • loader-dir
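
As a rough illustration of the per-loader summary above, the mapping can be written down as a lookup table and used to check whether a snapshot has a HEAD branch when it should. This is only a sketch: `HeadRule`, `HEAD_RULE_BY_LOADER` and `head_presence_ok` are hypothetical names, not part of any loader.

```python
from enum import Enum
from typing import Iterable


class HeadRule(Enum):
    LITERAL = "literal"  # HEAD exists upstream under that name
    ALIAS = "alias"      # default exists upstream under another name
    ABSENT = "absent"    # no default branch/version concept upstream


# Transcription of the list above; loader names as written in this task.
HEAD_RULE_BY_LOADER = {
    "loader-git": HeadRule.LITERAL,
    "loader-svn": HeadRule.LITERAL,
    "loader-mercurial": HeadRule.ALIAS,
    "loader-pypi": HeadRule.ALIAS,
    "loader-debian": HeadRule.ABSENT,
    "loader-tar": HeadRule.ABSENT,
    "loader-dir": HeadRule.ABSENT,
}


def head_presence_ok(loader: str, branch_names: Iterable[bytes]) -> bool:
    """True if HEAD is present exactly when the loader's rule expects it."""
    has_head = b"HEAD" in set(branch_names)
    expects_head = HEAD_RULE_BY_LOADER[loader] is not HeadRule.ABSENT
    return has_head == expects_head
```

A check like this only tests for the presence of HEAD; it does not verify that the branch is a literal pointer or an alias as the rule requires.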

What remains is the snapshot migration for the origins already visited.

Cheers,

I'm closing this as it was about defining the naming convention and we have done so. I'm going to file a task about documenting it as part of the data model documentation.

@ardumont: can you file a separate task for migrating the existing snapshots in the archive? (I suspect you have a clearer idea than I do of which snapshots need to be migrated…) TIA

Yes, done. T1268.

tl;dr: only the svn loader's data needs to be migrated.