
Define the mapping for Bazaar repositories/branches to the SWH data model
Closed, Migrated

Event Timeline

Alphare triaged this task as High priority. Sep 27 2021, 1:20 PM
Alphare created this task.
Alphare created this object in space S1 Public.
Alphare added a parent task: T3610: Bazaar/Breezy loader.

The conclusions in the meeting were as follows:

  • Treat each bzr branch as a separate origin, since we have no way of knowing if branches inside a repository are related in bzr terms, and also because we de-duplicate on the SWH side:
    • Users will most likely search by branch. If they search by shared repository, they will be searching with a prefix of a branch URL, which should also work.
    • Since bzr branches do *not* have multiple heads, we don't have to worry about any sort of mapping: we will simply have HEAD
    • Tags are per-branch, so that also works
  • Renames are not tracked in SWH yet and are out of scope
  • Remembering a repository seems pointless since it depends on each user's way of working
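
The conclusions above can be illustrated with a minimal Python sketch. It is not the loader's implementation: `BzrBranch`, `branch_to_snapshot`, `search_by_prefix` and the `tags/` naming are hypothetical names chosen for illustration, and the snapshot is a plain dict rather than a real SWH model object.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BzrBranch:
    """Hypothetical stand-in for what a loader reads from one bzr branch."""
    url: str
    tip_revision: str  # a bzr branch has a single head
    tags: Dict[str, str] = field(default_factory=dict)  # tag name -> revision id


def branch_to_snapshot(branch: BzrBranch) -> dict:
    """Map one bzr branch to one origin with a snapshot-like dict.

    The single head becomes HEAD; per-branch tags are added next to it.
    The "tags/" prefix is an assumption made for this sketch.
    """
    branches = {"HEAD": branch.tip_revision}
    for name, revision in sorted(branch.tags.items()):
        branches[f"tags/{name}"] = revision
    return {"origin": branch.url, "branches": branches}


def search_by_prefix(origin_urls: List[str], prefix: str) -> List[str]:
    """Searching by shared repository amounts to a prefix match on branch URLs."""
    return [url for url in origin_urls if url.startswith(prefix)]


trunk = BzrBranch("https://example.org/repo/trunk", "rev-3", {"1.0": "rev-2"})
feature = BzrBranch("https://example.org/repo/feature", "rev-5")
snapshots = [branch_to_snapshot(b) for b in (trunk, feature)]
origins = [s["origin"] for s in snapshots]
matches = search_by_prefix(origins, "https://example.org/repo/")
```

Each branch yields its own origin, and the shared repository is never stored; it is only recoverable as the common URL prefix.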

Edge cases:

  • We have to look into stacked branches: how they work, and whether we can do anything more interesting than just failing
  • How do ghost revisions present themselves, and can we store them even if we don't have the revisions they point to?

An unresolved question that I forgot to ask, so I'm writing it down here: bzr can hold multiple authors (line-separated) for each commit, as well as associated bug fixes. Should we store the authors directly in the author field? What about the bug fixes: should we add them to extra_headers?
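
To make the question concrete, here is a sketch of one possible answer, not a decision: the first author goes in the author field, and the remaining authors and the bug fixes go to extra headers. The function name, the header keys (`b"author"`, `b"bug"`), the `authors`/`bugs` property keys and the fallback to the committer are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Header = Tuple[bytes, bytes]


def split_revision_properties(
    properties: Dict[str, str], committer: str
) -> Tuple[str, List[Header]]:
    """Split line-separated authors and bug fixes into (author, extra_headers).

    `properties` is a plain dict standing in for the revision's properties;
    the "authors" and "bugs" keys are assumed for this sketch.
    """
    authors = [
        line.strip()
        for line in properties.get("authors", "").splitlines()
        if line.strip()
    ]
    # Assumption: fall back to the committer when no author is recorded.
    author = authors[0] if authors else committer

    # Remaining authors are kept in order as repeated extra headers.
    extra_headers: List[Header] = [
        (b"author", name.encode("utf-8")) for name in authors[1:]
    ]
    # Each bug-fix line is stored verbatim under a hypothetical "bug" key.
    for line in properties.get("bugs", "").splitlines():
        if line.strip():
            extra_headers.append((b"bug", line.strip().encode("utf-8")))
    return author, extra_headers


props = {
    "authors": "Alice <alice@example.org>\nBob <bob@example.org>",
    "bugs": "https://example.org/bugs/1 fixed",
}
author, headers = split_revision_properties(props, "Carol <carol@example.org>")
```

With this option no information is dropped, but anything reading only the author field would see just the first author.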

Thanks.

Would it be possible to add a "conception documentation" to the docs/ of the BZR loader repo (possibly with D6344 or as a standalone diff)?

Ideally this doc would (briefly) describe how Bazaar works and how it differs from the already supported DVCSs, then document the chosen "mapping" of the bzr model into SWH (especially mentioning what is lost in the process).