Better handling of erroneous origins submitted to save code now
Open, Normal, Public

Description

Despite clear instructions to submit to "save code now" the exact origin from which a checkout/clone can be performed, several users keep submitting erroneous origins.
A typical example is https://github.com/TIBHannover/, which points to an "organization" on GitHub with some 80+ projects under its umbrella rather than to a repository; the request then, rightfully, fails.

In the current setup, the user who submits such an erroneous request does not get any feedback about the mistake (unless she proactively looks up the result later on), and the "save code now" request is wasted.

It would be desirable to provide the user with feedback that helps fix the issue.
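To illustrate the kind of feedback meant here, a minimal sketch of a submission-time check for the GitHub case described above. The function name `check_github_origin` and the hint wording are hypothetical, it only looks at the shape of the URL path, and it is not the check swh-web performs:

```python
from typing import Optional
from urllib.parse import urlparse


def check_github_origin(url: str) -> Optional[str]:
    """Return a hint for the user, or None if the URL shape looks fine.

    Hypothetical helper: it only inspects the URL path and never
    contacts GitHub, so it cannot tell whether the repository exists.
    """
    parsed = urlparse(url)
    if parsed.netloc.lower() != "github.com":
        return None  # this sketch only knows about GitHub URLs
    # Drop empty segments produced by leading/trailing slashes.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return (
            "This URL points to a GitHub user or organization, not to a "
            "repository. Please submit the exact URL you would clone."
        )
    return None
```

With this sketch, `check_github_origin("https://github.com/TIBHannover/")` returns a hint, while a URL with both an owner and a repository segment returns `None`.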

Event Timeline

rdicosmo triaged this task as Normal priority. (Apr 15 2021, 10:47 PM)
rdicosmo created this task.

It would be desirable to provide the user with feedback that helps fix the issue.

Totally.

Now that we have a decent user registration system I think we should consider:

  1. requiring user registration for submitting save code now requests (which will also provide an audit trail for users that repeatedly submit bogus, if not actively harmful, requests)
  2. sending email notifications by default about the outcome of save code now requests, both successes and failures, with the possibility of disabling email notifications in the user profile

This will make the overall UX of interacting with the archive feel much more "reliable" for users, whereas right now it feels much like a leap of faith whether it will work or not, in good part due to the lack of systematic out-of-band notifications.

Oh, and now that we have user profile pages, we should have a list of "my" save code now requests with their status visible in the user profile, for those who want to check synchronously the status of their requests (and might have disabled email notifications).

As a first step towards giving more feedback to users who submitted wrong origins for
ingestion (e.g. organization links, tarballs with the wrong visit type, links to HTML pages
probably meant for listing, etc.), we could give the operator who rejects an origin a
free-form input field to explain the reason for the rejection. It would make for a less
brutal rejection.

This does not require the user registration part discussed above nor does it exclude it.

Bonus point for this, it's an easy hack ;)

As an incremental step after that, we could turn it into a selection box of configurable,
predefined rejection-reason templates, as I don't think there are that many different reasons
after all (unsupported for now, not an origin of type <type>, not a repository link,
...). Drawing stats from the first implementation could help in designing the initial
rejection templates.

That could be another easy hack once the first part is done (if we want).
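A minimal sketch of what the two steps could look like together, assuming a plain dictionary of templates. The keys, the wording and the `render_rejection` helper are hypothetical, and the free-form operator note from the first step is simply appended to the chosen template:

```python
# Hypothetical template set, based on the reasons listed above.
REJECTION_TEMPLATES = {
    "unsupported": "This kind of origin is not supported for now.",
    "wrong_type": "The submitted URL is not an origin of type {visit_type}.",
    "not_a_repository": "The submitted URL is not a repository link.",
}


def render_rejection(reason: str, note: str = "", **params: str) -> str:
    """Build the rejection message shown (or sent) to the submitter."""
    message = REJECTION_TEMPLATES[reason].format(**params)
    if note:
        # Free-form explanation typed by the operator, if any.
        message = f"{message} {note}"
    return message
```

For example, `render_rejection("wrong_type", visit_type="git")` yields "The submitted URL is not an origin of type git.".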

As suggested to @anlambert recently (@antoine, I have given it a bit more thought and added the
second incremental part since then, thus the ping ;)

In T3252#63314, @zack wrote:

It would be desirable to provide the user with feedback that helps fix the issue.

Totally.

Now that we have a decent user registration system I think we should consider:

  1. requiring user registration for submitting save code now requests (which will also provide an audit trail for users that repeatedly submit bogus, if not actively harmful, requests)
  2. sending email notifications by default about the outcome of save code now requests, both successes and failures, with the possibility of disabling email notifications in the user profile

This will make the overall UX of interacting with the archive feel much more "reliable" for users, whereas right now it feels much like a leap of faith whether it will work or not, in good part due to the lack of systematic out-of-band notifications.

Not sure if we should require a user to be authenticated for submitting a save code now request, but adding an email field (auto-filled for registered users) to send a notification after the origin was loaded seems a good tradeoff. To implement the email notification, we will have to add a journal client in swh-web that processes origin visit messages.
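A rough sketch of the message-handling side of such a journal client, under stated assumptions: messages are taken to be dictionaries with `origin` and `status` keys, the set of final statuses is my guess at the relevant values, and `pending` and `notify` are hypothetical stand-ins for the save request table and the email backend. The wiring to the actual journal is left out on purpose:

```python
from typing import Callable, Dict, Iterable, List

# Assumed set of statuses meaning "the visit is over".
FINAL_STATUSES = {"full", "partial", "not_found", "failed"}


def process_visit_statuses(
    messages: Iterable[dict],
    pending: Dict[str, List[str]],
    notify: Callable[[str, str, str], None],
) -> int:
    """Notify the submitters of every origin whose visit has finished.

    `pending` maps an origin URL to the emails waiting for its outcome;
    entries are removed once notified. Returns the number of
    notifications sent.
    """
    sent = 0
    for msg in messages:
        status = msg.get("status")
        if status not in FINAL_STATUSES:
            continue  # visit still created/ongoing, nothing to report yet
        for email in pending.pop(msg["origin"], []):
            notify(email, msg["origin"], status)
            sent += 1
    return sent
```

Keeping the handler a pure function over messages makes it testable without a running journal; whether swh-web would structure it this way is an open question.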

In T3252#63315, @zack wrote:

Oh, and now that we have user profile pages, we should have a list of "my" save code now requests with their status visible in the user profile, for those who want to check synchronously the status of their requests (and might have disabled email notifications).

+1, great idea!

As a first step towards giving more feedback to users who submitted wrong origins for
ingestion (e.g. organization links, tarballs with the wrong visit type, links to HTML pages
probably meant for listing, etc.), we could give the operator who rejects an origin a
free-form input field to explain the reason for the rejection. It would make for a less
brutal rejection.

This does not require the user registration part discussed above nor does it exclude it.

Bonus point for this, it's an easy hack ;)

As an incremental step after that, we could turn it into a selection box of configurable,
predefined rejection-reason templates, as I don't think there are that many different reasons
after all (unsupported for now, not an origin of type <type>, not a repository link,
...). Drawing stats from the first implementation could help in designing the initial
rejection templates.

That could be another easy hack once the first part is done (if we want).

As suggested to @anlambert recently (@antoine, I have given it a bit more thought and added the
second incremental part since then, thus the ping ;)

+1, can you create a task about it? This could be handled by a GSoC student who chooses to
work on the webapp.

but adding an email field (auto-filled for registered users) to send a notification after the origin was loaded seems a good tradeoff. To implement the email notification, we will have to add a journal client in swh-web that processes origin visit messages.

Adding an email field is a poor UX solution (it needs to be re-entered every time or saved in a cookie), which we used for the vault at the time because we didn't have user registration.
Now that we have user registration we can just tell users that if they want to be notified, they should log in. (Which is indeed something independent from requiring user registration for being able to submit.) That will encourage users to register to get added-value functionalities, like notifications.
And then we should go back to all places that could use notifications (vault, save code now, deposit, "save again" button) and make things uniform.

In T3252#63374, @zack wrote:

but adding an email field (auto-filled for registered users) to send a notification after the origin was loaded seems a good tradeoff. To implement the email notification, we will have to add a journal client in swh-web that processes origin visit messages.

Adding an email field is a poor UX solution (it needs to be re-entered every time or saved in a cookie), which we used for the vault at the time because we didn't have user registration.
Now that we have user registration we can just tell users that if they want to be notified, they should log in. (Which is indeed something independent from requiring user registration for being able to submit.) That will encourage users to register to get added-value functionalities, like notifications.
And then we should go back to all places that could use notifications (vault, save code now, deposit, "save again" button) and make things uniform.

Ack, I got the idea. I will create related tasks if they do not exist yet.

+1, can you create a task about it? This could be handled by a GSoC student who chooses to
work on the webapp.

Sure, done: T3256 and T3257, respectively.

+1, can you create a task about it? This could be handled by a GSoC student who chooses to
work on the webapp.

Sure, done: T3256 and T3257, respectively.

Great, thanks!

Thanks to all of you for this discussion and these proposals.

Here is a summary of my understanding of all the above:

  1. registration remains optional for submitting a save code now/save again request
  2. an option to log in or register is shown to logged-out users, advertising the key extra features they may gain, namely:
    • notification of success, rejection or failure
    • access to a "My save code now" tab that presents all their requests with their status
  3. in case of failure, a boilerplate message is sent back to the registered user with a reminder of the main mistakes to avoid
  4. in case of rejection, the operator may add extra information as she sees fit (recording this is useful even if we do not have a user to notify); this is now tracked in tasks T3256 and T3257.
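As a way of checking that the four points fit together, here is a small sketch of the resulting notification rule. Everything in it is an assumption made for illustration: the outcome names, the message wording and the `build_notification` helper are hypothetical.

```python
from typing import Optional

# Hypothetical boilerplate reminder sent on failure (point 3).
FAILURE_REMINDER = (
    "Your save code now request failed. Please check that the URL is the "
    "exact origin from which a checkout/clone can be performed."
)


def build_notification(
    outcome: str, registered: bool, operator_note: str = ""
) -> Optional[str]:
    """Return the message to send, or None when nobody can be notified."""
    if not registered:
        # Points 1 and 2: anonymous submissions are allowed but get no
        # notification; the operator note can still be recorded elsewhere.
        return None
    if outcome == "succeeded":
        return "Your save code now request succeeded."
    if outcome == "failed":
        return FAILURE_REMINDER
    if outcome == "rejected":
        # Point 4: the operator's extra information, when provided.
        base = "Your save code now request was rejected."
        return f"{base} {operator_note}" if operator_note else base
    raise ValueError(f"unknown outcome: {outcome}")
```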

Note that T3213 already tracks an extra feature for selected registered users: triggering archival of bundles (.tar.gz, .zip, packages, etc.).

Are we all on the same page?

@rdicosmo great summary, I'm certainly on that page :)

In T3252#63315, @zack wrote:

Oh, and now that we have user profile pages, we should have a list of "my" save code now requests with their status visible in the user profile, for those who want to check synchronously the status of their requests (and might have disabled email notifications).

+1, great idea!

T3272