Add "forge" now
Open, Normal, Public

Description

We want to streamline the process of adding a new code hosting platform to our regular
crawling queue, similarly to the currently available "Save Code Now" feature. In a
nutshell:

  • a user-facing form that allows any user to submit the URL of a new code hosting platform, specifying the technology used on that platform (this is similar to the interface used for the "Save Code Now" feature)
  • the request goes into a moderation queue for human review, through an admin interface similar to the one for "Save Code Now"
  • if accepted, the process continues as described in T1537.
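
To make the three steps above concrete, here is a minimal sketch of the request lifecycle. It is only an illustration under assumptions: the names `ForgeRequest`, `ModerationQueue`, `Status` and the example URL are hypothetical and do not correspond to any existing Software Heritage code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Hypothetical states of a submitted forge request."""
    PENDING = "pending"    # waiting in the moderation queue
    ACCEPTED = "accepted"  # handed over to the follow-up process (T1537)
    REJECTED = "rejected"  # declined by a moderator


@dataclass
class ForgeRequest:
    forge_url: str
    forge_type: str  # technology used by the forge, e.g. "gitlab"
    status: Status = Status.PENDING


@dataclass
class ModerationQueue:
    requests: list = field(default_factory=list)

    def submit(self, forge_url, forge_type):
        """Record a user submission; it always starts as PENDING."""
        request = ForgeRequest(forge_url, forge_type)
        self.requests.append(request)
        return request

    def pending(self):
        """Requests still waiting for human review."""
        return [r for r in self.requests if r.status is Status.PENDING]

    def review(self, request, accept):
        """Moderator decision; a request can only be reviewed once."""
        if request.status is not Status.PENDING:
            raise ValueError("request was already reviewed")
        request.status = Status.ACCEPTED if accept else Status.REJECTED
        return request


if __name__ == "__main__":
    queue = ModerationQueue()
    req = queue.submit("https://forge.example.org", "gitlab")
    queue.review(req, accept=True)
    print(req.status.value)
```

The point of the sketch is that nothing is scheduled at submission time: a request only moves forward once a moderator has accepted it.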

We need to think about the name we use for this feature: "Save Forge Now" is
appealing but misleading, as it will take time to actually save the full state of the
forge (it will not be "now"), and we actually do more than that, since this is not a
one-shot operation but an addition to the regular crawling. "Add Forge Now" would
better reflect reality.

Event Timeline

zack triaged this task as Low priority. Feb 21 2019, 8:20 PM
zack created this task.
rdicosmo raised the priority of this task from Low to Normal. Mar 4 2021, 10:21 AM
rdicosmo added a subscriber: douardda.
rdicosmo added a subscriber: rdicosmo.
zack renamed this task from save "forge" now to Save "forge" now. Mar 31 2021, 11:07 AM
ardumont renamed this task from Save "forge" now to Add "forge" now. Mon, Nov 8, 11:23 AM

This is an example of a use-case in Open Science:
An institution has a forge and wishes to add the forge as a whole to SWH.

Is the request for regular pulling?
If yes, it should be added for visibility on the home page of the archive.
If no, it should be clear this is a one-shot thing.

Thanks a lot for this use case.

About the last point, do we really want to support archiving a forge only once?
In my mind, it has always been a first step when archiving a new forge, which we will then crawl regularly forever (until it shuts down, which we should eventually be able to detect automatically).
Other than that, I don't think it is in the archive's interest to stop regularly crawling a forge that we know exists, after having archived it once.

Is the request for regular pulling?

It is.

The requirement is that the first listing should happen quickly; after that, regular
"re-listing" (and crawling) happens the same way as for the rest of the stack (see
scheduler heuristics).