
Add archival of bug tracker databases, as well as an unofficial per-project bug tracker
Closed, Migrated (edits locked)

Description

There seems to be a significant oversight: bug reports are not being collected. The most important information to preserve alongside software code is its bug history. There are many reasons SW Heritage should collect bug reports:

  • Issues in bug databases are not generally part of git (apart from a recent innovation). They are fragile. Bug reports are a burden to create but easy to lose or destroy.
  • Sometimes bug databases are sabotaged. Sometimes a forge attacks a project and wipes it out, including the bug reports. This happened recently on a Codeberg project. Or a forge could simply shut its doors and go offline without notice.
  • Sometimes bug databases are lost in migrations from one forge to another.
  • Some forges restrict access to bug trackers. In the case of gitlab.com, the bug reports are *more* restricted than the software. That is, Tor users are blocked from even reading bug reports.

As a separate matter, I suggest going a step further: not only archiving bug reports but also creating a mechanism for people to submit new, original reports. This matters because in some situations people are unwilling or unable to use the official bug tracker, so those who discover bugs end up keeping their reports to themselves. It also matters because a developer might suppress a bug report, for example when someone discovers a deliberate tracker in the code that benefits the creator but works against the users.

Event Timeline

embed the bugs within git

A possible approach would be to harvest the bugs from the official bug tracker and embed them in git. The tool for embedding bug reports in git (git-bug) already exists. I've not used it, but I see that it has bridges to facilitate harvesting bug reports. The advantage is that this avoids the effort of updating the SWH GUI.

use an intern

This could be a good project for an intern so someone might want to add it here.

vlorentz triaged this task as Wishlist priority. Apr 12 2021, 11:31 AM
vlorentz added a subscriber: vlorentz.

Hi, thanks for the suggestion.

We agree bugs are a very important part of software projects and deserve archiving, for the reasons you cite and more.

However, this would require considerable work that isn't part of the (current) mission of Software Heritage, and definitely isn't something an intern could do in a few months.

About git-bug: as it stores all the data in git, we automatically archive its data like any other git object, so projects just need to mirror their bug tracker in a git repository if they want it to be archived.
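Why storing bugs "in git" is enough for them to be archived: git objects are content-addressed, so any blob written into a repository gets an identifier computed from its bytes alone. The sketch below shows how git names a blob; Software Heritage's content identifiers (swh:1:cnt:<hash>) use this same hash. The serialized report is a made-up example, not git-bug's actual on-disk format.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id git assigns to a blob with this content.

    Git hashes the header b"blob <size>\\0" followed by the raw bytes,
    so identical content gets the same id wherever it is stored.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical serialized bug report (git-bug's real format differs).
report = b'{"title": "Crash on startup", "status": "open"}\n'
print(git_blob_id(report))
print(git_blob_id(b""))  # → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the id depends only on content, a bug report stored this way is deduplicated and archived exactly like source files.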

However, this would require considerable work

The code has been written. The only thing SWH needs to do is run git bug bridge pull [<name>] after the git clone, and schedule that to run periodically. AFAICT, the work is merely installing git-bug and adding a line to a couple of scripts. The configuration calls for login credentials, so creating an account at each forge may be needed (a task for interns).
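A minimal sketch of the periodic job described above, assuming a local working clone and a git-bug bridge that was already configured for it (git-bug's `bridge configure` subcommand is where forge credentials would go). The helper names `update_commands` and `run_update` are hypothetical; the commands themselves are plain `git clone`, `git pull` and the `git bug bridge pull [<name>]` invocation quoted above.

```python
import os
import subprocess
from typing import List, Optional


def update_commands(url: str, workdir: str, bridge: Optional[str] = None) -> List[List[str]]:
    """Return the commands for one scheduled update of a mirror."""
    if os.path.isdir(os.path.join(workdir, ".git")):
        # Already cloned: only fetch new commits.
        commands = [["git", "-C", workdir, "pull", "--ff-only"]]
    else:
        # First run: make the local clone.
        commands = [["git", "clone", url, workdir]]
    # Then import issues through the pre-configured git-bug bridge;
    # the bridge name is optional, as in `git bug bridge pull [<name>]`.
    bridge_pull = ["git", "-C", workdir, "bug", "bridge", "pull"]
    if bridge is not None:
        bridge_pull.append(bridge)
    commands.append(bridge_pull)
    return commands


def run_update(url: str, workdir: str, bridge: Optional[str] = None) -> None:
    """Run the commands in order, stopping at the first failure."""
    for command in update_commands(url, workdir, bridge):
        subprocess.run(command, check=True)
```

A scheduler (cron or similar) would call `run_update` per repository. Keeping command construction separate from execution makes the sequence easy to inspect before anything touches the network.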

The only large undertaking that I suggested was to create a bug tracker that accepts new bug reports from those who can't access the official one. That's a separate feature that can be decided independently.

that isn't part of the (current) mission of Software Heritage,

Part of the mission states "Software is a precious part of our cultural heritage." There's a lot more culture in the bug reports than in the code.

we automatically archive its data like any other git object,

You are likely doing a git pull on a periodic basis. Just add git bug bridge pull [<name>] next to it.

so projects just need to mirror their bug tracker in a git repository

Being unable to rely on source projects' motivation to preserve their own work is why SWH exists. You certainly can't count on every project taking an action of any kind. This is a non-starter.

if they want it to be archived.

I don't believe SWH is serving the original project developers, who all generally have their own copy of their code. SWH serves everyone else, who would like the bug reports preserved regardless of the intent and motivation of the project originators.

You are likely doing a git pull on a periodic basis. Just add git bug bridge pull [<name>] next to it.

The devil is in the details.

First, we don't have local clones of the repositories. We also use Dulwich instead of the main Git implementation for this reason.

Second, our Git loader is incremental: it finds which commits are already loaded in the main storage, and asks the server only for new commits since those commits. It's unclear if this could work with git bug bridge pull.
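To make "incremental" concrete, here is a simplified model of the negotiation: list the remote's ref tips and request only those not already in the archive (Dulwich's fetch API takes a callback for exactly this decision). This is an illustration under that simplification, not Software Heritage's loader code; `wanted_commits` is a hypothetical name.

```python
from typing import Dict, Iterable, List


def wanted_commits(remote_refs: Dict[str, str], known: Iterable[str]) -> List[str]:
    """Return ref tips to request, skipping commits already archived.

    Refs are visited in sorted order so the result is deterministic,
    and a tip shared by several refs is requested only once.
    """
    known_ids = set(known)
    wants: List[str] = []
    for ref in sorted(remote_refs):
        sha = remote_refs[ref]
        if sha not in known_ids and sha not in wants:
            wants.append(sha)
    return wants
```

A bridge pull has no counterpart to this step: it talks to the forge's issue API rather than to a git server, which is part of why it is not obvious how it would fit the incremental model.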

It's also unclear whether git bug bridge pull is even deterministic, and whether its output is stable across releases.
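One way to probe this empirically, sketched under assumptions: run the bridge pull into two fresh clones (or with two git-bug releases), dump each repository's refs with `git for-each-ref --format='%(objectname) %(refname)'`, and diff the results. An empty diff is evidence of determinism, not proof. The helper names and the `refs/bugs/...` values in the test are hypothetical.

```python
from typing import Dict, List


def parse_for_each_ref(output: str) -> Dict[str, str]:
    """Parse `git for-each-ref --format='%(objectname) %(refname)'` output."""
    refs: Dict[str, str] = {}
    for line in output.splitlines():
        if line.strip():
            object_id, ref_name = line.split(" ", 1)
            refs[ref_name] = object_id
    return refs


def snapshot_diff(first: Dict[str, str], second: Dict[str, str]) -> List[str]:
    """Return refs that differ between two runs, including refs present in only one."""
    names = set(first) | set(second)
    return sorted(name for name in names if first.get(name) != second.get(name))
```

Any ref listed by `snapshot_diff` would change the archived snapshot, so a non-empty result means two loads of the same project would not be identical.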

Then, even assuming we did have a local clone, the <name> part in git bug bridge pull [<name>] is harder than it looks: our Git loader is only aware of Git URLs, not of the concept of forges. So we would have to change our task model to support it.
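To illustrate the task-model problem: a loader that only receives a Git URL would have to infer the forge, and thus which git-bug bridge to use (git-bug ships bridges for forges such as GitHub and GitLab), from that URL. The sketch below uses a hypothetical host table and a hypothetical `LoadTask` record; it is not Software Heritage's scheduler model.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical host-to-bridge table. Self-hosted instances (a company's own
# GitLab, a Gitea server) cannot be recognised from the host name alone.
KNOWN_FORGES = {
    "github.com": "github",
    "gitlab.com": "gitlab",
}


@dataclass(frozen=True)
class LoadTask:
    origin_url: str
    forge: Optional[str]  # None: plain Git origin, no bug tracker to import


def guess_forge(origin_url: str) -> Optional[str]:
    """Return the bridge type for a known forge host, else None."""
    host = urlsplit(origin_url).hostname or ""  # hostname is lowercased
    return KNOWN_FORGES.get(host)


def make_task(origin_url: str) -> LoadTask:
    return LoadTask(origin_url, guess_forge(origin_url))
```

URL guessing fails for every host missing from the table (codeberg.org here), which is why the forge would more likely need to become an explicit field of the task rather than something derived at load time.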

Finally, git bug creates its own branches in a repository, which means we are no longer archiving repositories verbatim; we are introducing our own changes to them.
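If bug data were ever captured this way, one option would be to keep the two kinds of refs apart, so the upstream snapshot stays verbatim and git-bug's additions are recorded separately. The sketch assumes git-bug's data lives under the `refs/bugs/` and `refs/identities/` namespaces it has used; verify this against the installed version. `split_refs` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Namespaces git-bug has used for its data (assumption: check the
# installed git-bug version, as layouts can change between releases).
GIT_BUG_PREFIXES = ("refs/bugs/", "refs/identities/")


def split_refs(refs: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Separate upstream refs from refs added locally by git-bug."""
    upstream: Dict[str, str] = {}
    bug_data: Dict[str, str] = {}
    for name, target in refs.items():
        if name.startswith(GIT_BUG_PREFIXES):
            bug_data[name] = target
        else:
            upstream[name] = target
    return upstream, bug_data
```

Only `upstream` would go into the repository's snapshot; `bug_data` could be stored under a separate, clearly labeled origin so nobody mistakes it for what the project itself published.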

so projects just need to mirror their bug tracker in a git repository

Being unable to rely on source projects' motivation to preserve their own work is why SWH exists. You certainly can't count on every project taking an action of any kind. This is a non-starter.

if they want it to be archived.

I don't believe SWH is serving the original project developers, who all generally have their own copy of their code. SWH serves everyone else, who would like the bug reports preserved regardless of the intent and motivation of the project originators.

True, of course. I was merely proposing a workaround.

Another would be for some other project to host git-bug mirrors of all repositories, which we would then archive automatically.