Web-app: save code now accepts GitHub user URLs and then fails
Closed, Migrated

Description

For example, this URL: https://github.com/TIBHannover/ (a GitHub user URL with 9 repos).

It was first accepted by the save code form, and then the loading process failed (which I find normal).
I think it should not be accepted if we don't plan on listing all the repos. Rejecting it up front would give the SWH user a chance to submit the URL of each repo instead of getting the failure message.

Event Timeline

moranegg created this task.

Nice catch.

We shouldn't try to load all repos of a user, because the 1-N mapping from user URL to repo URLs is too forge-dependent, and we don't want to have to implement it for all forges out there.

On the other hand, I guess the go/no-go mechanism could be based on regexes (maybe it already is, I haven't checked, sorry; if so, the GitHub regexes should be tuned) so that we can easily reject user-looking GitHub URLs and accept repo-looking GitHub URLs.
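
To make the regex idea concrete, here is a minimal Python sketch of a static check for GitHub. The function name and the pattern are hypothetical and not taken from the web-app code; the sketch assumes a repo URL has exactly two path segments (owner and repo), with an optional ".git" suffix and trailing slash.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: "/<owner>/<repo>", optionally ending in ".git" and/or "/".
_GITHUB_REPO_PATH = re.compile(r"^/[^/]+/[^/]+?(?:\.git)?/?$")


def looks_like_github_repo_url(url: str) -> bool:
    """Static check only: no network access, just the shape of the URL.

    Returns False for anything that is not an http(s) github.com URL,
    so it is meant to be applied to GitHub URLs only.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    if parts.hostname not in ("github.com", "www.github.com"):
        return False
    return _GITHUB_REPO_PATH.match(parts.path) is not None
```

With this pattern, https://github.com/TIBHannover/ is rejected because its path has a single segment, while a URL of the form https://github.com/owner/repo is accepted.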

For the same reason as above I don't think the error message should be sophisticated and try to tell the user to enter a repo URL instead. It should just say something like "no repo to clone there". Users will figure out what they need to do.
(And we can have some example doc on the wiki, about how to "save code now" for common forges, if really needed.)

I agree it shouldn't be elaborate: a generic rejection error with a link to the user documentation on how to use the "save code now" feature is sufficient.

I think the most generic solution would be to have our loader abstraction provide an entry point that allows checking for the existence of a loadable entity at a given origin url.

This entry point could then be called synchronously on URLs where a positive pattern match happens, improving the UX.

For the git loader, this entry point could be "can I HTTP GET the info/refs endpoint for the repository?"
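
A rough Python sketch of what such a check might look like for git over smart HTTP. The function names and the injectable `opener` parameter are hypothetical and not part of any existing loader API; the sketch assumes the origin serves the smart HTTP protocol, where the ref advertisement lives at `info/refs?service=git-upload-pack`.

```python
import urllib.request


def info_refs_url(origin_url: str) -> str:
    """Build the smart-HTTP ref advertisement URL for a git origin."""
    return origin_url.rstrip("/") + "/info/refs?service=git-upload-pack"


def git_origin_exists(origin_url, opener=urllib.request.urlopen, timeout=10.0):
    """Return True if the info/refs endpoint answers with HTTP 200.

    Any network or HTTP error (URLError and HTTPError are OSError
    subclasses) or malformed URL (ValueError) counts as "not loadable".
    `opener` is injectable so the check can be exercised without network.
    """
    try:
        with opener(info_refs_url(origin_url), timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        return False
```

Note that a False result only means the endpoint did not answer successfully at that moment; it does not distinguish a missing repo from a forge that is temporarily down.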

The idea of using a chain of responsibility where loaders can check specific URLs is indeed nice and elegant.

OTOH, I don't like turning static checks into dynamic ones for the specific purpose of auto-rejecting/auto-accepting URLs. The repo might be down at the time it is tested and back up again later, or vice versa for what it's worth. I'm sure we do not want retry logic in the pre-filtering phase, so relying on dynamic checks isn't great in my opinion. We can use a chain of responsibility for static checks, but it will be quite useless at this point, because most VCS support, e.g., http:// URLs.

Still, dynamic checks might be useful as an additional element that queue moderators use to guide their decision making. I imagine the list of pending save-code-now URLs with a checkmark (or not) on the side, stating whether the dynamic check succeeded or not, and a retry button. Similar to what CI integration does on popular forges.
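
A minimal Python sketch of that moderation-side idea, with the dynamic check kept as advisory state next to each pending request. The class and field names are hypothetical, and the `check` callable stands in for whatever dynamic check a loader would provide.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PendingRequest:
    """One pending save-code-now URL as seen by a queue moderator."""

    origin_url: str
    # None means the dynamic check has not been run yet.
    check_ok: Optional[bool] = None

    def retry(self, check: Callable[[str], bool]) -> None:
        """(Re)run the dynamic check; this is what a retry button would call."""
        self.check_ok = check(self.origin_url)

    def marker(self) -> str:
        """Text shown next to the URL in the pending list."""
        if self.check_ok is None:
            return "unchecked"
        return "ok" if self.check_ok else "failed"
```

The result never accepts or rejects a request by itself; it is only displayed, so a forge being down at check time costs a retry rather than a wrongly rejected URL.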