
Storing multiple authors in Revisions and Releases
Closed, Migrated

Description

As underlined by @vlorentz in D6344, we need to define our strategy for storing multiple authors for bzr commits.

Do we use a weaker mechanism like GitHub's Co-authored-by trailers (https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors), or do we explicitly migrate the existing model to support multiple authors?

Event Timeline

Alphare triaged this task as Normal priority. Jan 25 2022, 2:03 PM
Alphare created this task.
Alphare created this object in space S1 Public.

Heads up, we have some other tasks related to this: T3284 and T1645.

Also, @moranegg would most likely be interested in this (at least for the deposit).

Practically, we could store the metadata on additional authors *now* in the extra_headers field, as a series of (b'author', b'XXX <yyy@zzz.ttt>') entries. Of course, that doesn't solve the question of presenting the information.
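To make the workaround concrete, here is a minimal sketch in plain Python. It assumes extra_headers is a tuple of (bytes, bytes) pairs, as the entries above suggest; the helper names and the b"branch-nick" header are hypothetical and only there for illustration, not part of any existing API.

```python
from typing import Iterable, List, Tuple

# Assumption: extra headers are an ordered tuple of (key, value) byte pairs.
ExtraHeaders = Tuple[Tuple[bytes, bytes], ...]


def add_author_headers(
    extra_headers: ExtraHeaders, authors: Iterable[bytes]
) -> ExtraHeaders:
    """Append one (b"author", fullname) entry per additional author."""
    return tuple(extra_headers) + tuple((b"author", a) for a in authors)


def additional_authors(extra_headers: ExtraHeaders) -> List[bytes]:
    """Read back the additional authors, preserving their order."""
    return [value for key, value in extra_headers if key == b"author"]


headers = add_author_headers(
    ((b"branch-nick", b"trunk"),),  # hypothetical pre-existing header
    [b"Jane Doe <jane@example.org>", b"John Roe <john@example.org>"],
)
```

Existing entries are kept in place and the new ones are appended, so a reader only has to filter on the b"author" key.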

I think having multiple authors in a first-party field in revisions (and releases) will need a revision of our manifest format (which should probably be done with SWHIDv2). The current manifest format mixes authorship and timestamping in the same line, which is less than ideal for making such fields multi-valued.

olasd lowered the priority of this task from Normal to Wishlist. Jan 27 2022, 5:27 PM
olasd added a subscriber: anlambert.

From merged tasks, this would also be useful for some package loaders, e.g. npm, that support multiple authors in their packaging metadata.

olasd renamed this task from Storing multiple authors to Storing multiple authors in Revisions and Releases. Jan 27 2022, 5:30 PM

Now that I've written it out loud, of course, Releases don't have extra_headers so the package loaders can't make use of this workaround/hack for now.

In T3887#77949, @olasd wrote:

Now that I've written it out loud, of course, Releases don't have extra_headers so the package loaders can't make use of this workaround/hack for now.

Heh :-)

Do you foresee any issue in adding extra_headers to releases as well, other than "someone should do it"?

In T3887#77951, @zack wrote:

Do you foresee any issue in adding extra_headers to releases as well, other than "someone should do it"?

Not really, no. We haven't done it because, afaict, we haven't had a strong need for it. The manifest format should be able to accommodate it even more easily than it does for revisions (because it has fewer edge cases).

Then let's just go for it (insert here ref. to upcoming separate task :-)).

The added advantage is that it will be easier to change "Merkle modeling decisions" in the future, by making it simpler to switch between release and revision objects. (As we had to do in the past for deposits, for example.)

I suggested Co-Authored-By because it is a de-facto standard in Git now thanks to GitHub, so we already have many revisions using this "format" (no releases as far as I know, though).

Of course it's not great to make the BZR loader write metadata in revision *messages*, but at least it doesn't introduce a new format.
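For comparison, a rough sketch of what the message-based approach could look like, assuming messages are handled as bytes and trailers follow GitHub's "Co-authored-by: Name <email>" form. The function names are hypothetical, and the sketch deliberately ignores messages that already end with a trailer block.

```python
import re
from typing import List

# Matches one trailer per line, case-insensitively (Co-authored-by / Co-Authored-By).
_CO_AUTHOR_RE = re.compile(
    rb"^co-authored-by:[ \t]*(.+?)[ \t]*$", re.IGNORECASE | re.MULTILINE
)


def co_authors(message: bytes) -> List[bytes]:
    """Extract the co-author identities found in a commit message."""
    return _CO_AUTHOR_RE.findall(message)


def with_co_authors(message: bytes, authors: List[bytes]) -> bytes:
    """Append one trailer per author, separated from the body by a blank line."""
    trailers = b"".join(b"Co-authored-by: " + a + b"\n" for a in authors)
    return message.rstrip(b"\n") + b"\n\n" + trailers


msg = with_co_authors(b"Fix parser\n", [b"Jane Doe <jane@example.org>"])
```

The same co_authors helper would also work on existing Git revisions that already use this convention, which is the main appeal of not introducing a new format.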