Use a FOSS alternative or drop Google ReCAPTCHA use
Closed, Migrated

Description

The Software Heritage archive web application uses Google ReCAPTCHA to protect "Save code now" request forms in production.

This is currently the only piece of loaded JavaScript code that is not under a free license.
All JavaScript code loaded by the Software Heritage web application should be FOSS, so we must find an alternative here.

As pointed out by this article, a good solution would be to use django-simple-captcha.

Event Timeline

anlambert triaged this task as Normal priority. Jun 17 2019, 11:48 AM
anlambert created this task.

django-simple-captcha works best out of the box with Forms or ModelForms, but the origin/save page is not rendered using forms; it's plain HTML. One possible solution is to use a Form for origin save submission; the other is to write a custom captcha template and include it in the page. Which one did you have in mind?

With the post-hoc moderation of Save Code Now requests, do we really need a captcha? Isn't the base rate limiting enough?

As an alternative, we could just set the Django CSRF token on the form using a bit of JavaScript code rather than having the view send it directly in the form, which would thwart most dumb bots (that's the "ReCAPTCHA alternatives for uncustomized spam > Javascript" section of the aforementioned article).
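
To illustrate why a token that only JavaScript can attach stops a bot that merely replays the form, here is a minimal sketch of a double-submit check. It is a deliberate simplification, not Django's actual middleware (which also masks tokens and performs additional checks). `csrftoken` and `X-CSRFToken` are Django's default cookie and header names; the function name `passes_csrf_check` is hypothetical.

```python
import hmac

CSRF_COOKIE = "csrftoken"    # Django's default CSRF cookie name
CSRF_HEADER = "X-CSRFToken"  # Django's default CSRF header name


def passes_csrf_check(cookies, headers):
    """Return True only if the request echoes the cookie token in the header.

    Hypothetical helper: a client that does not run the page's JavaScript
    never copies the cookie value into the header, so it is rejected.
    """
    cookie_token = cookies.get(CSRF_COOKIE, "")
    header_token = headers.get(CSRF_HEADER, "")
    if not cookie_token or not header_token:
        return False
    # Constant-time comparison to avoid leaking the token through timing.
    return hmac.compare_digest(cookie_token, header_token)
```

A browser running the page's script would send both values and pass; a bot posting only the visible form fields would send no header and fail.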

I should have read the django-simple-captcha documentation; indeed, its integration is not really straightforward for swh-web.

Currently, only the API endpoint for creating save requests is rate limited, while the Save code now form is submitted using JavaScript
(validating input, then setting the appropriate Django CSRF token before sending the POST request).

So even without a captcha, it will still be difficult for a dumb bot to spam us.

I would go for removing the captcha but adding rate limiting to the form submission, just in case.

SGTM

Getting rid of ReCaptcha for save code now LGTM too.
I just wasn't sure that the rate limit applies to Web UI submissions (e.g., will API requests come from our own IP? and if so, is that whitelisted?); I'm assuming that is what @anlambert plans to check.

anlambert renamed this task from Use a FOSS alternative to Google ReCAPTCHA to Use a FOSS alternative or drop Google ReCAPTCHA use. Jun 18 2019, 3:28 PM
anlambert claimed this task.

The use of Google ReCAPTCHA is now dropped in favor of rate limiting the number of save requests a user can create through the form (10/h).
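
For readers unfamiliar with this kind of limit, the following is a generic sliding-window sketch of a "10 requests per hour per user" rule. It is not the swh-web implementation, whose mechanism is not shown in this task; the class name `SlidingWindowRateLimiter`, its parameters, and the in-memory storage are all assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` events per `window` seconds for each key.

    Hypothetical in-memory limiter; a real deployment would typically
    keep its counters in a shared cache rather than in process memory.
    """

    def __init__(self, limit=10, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._events = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key):
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have fallen out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True
```

With the defaults, the first ten calls to `allow()` for a given user key within an hour return `True`, and further calls return `False` until the oldest accepted request is more than an hour old.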

The good news is that we are now fully compliant with LibreJS (no more blocked scripts, and a green icon in Firefox); see the report below:

This has been deployed to production so closing this.