
Implement support for takedown notices (infra, admin tools, workflow)
Closed, Migrated. Edits Locked.

Description

Takedown notices are coming, and we need full support for dereferencing certain contents, as is done at https://github.com/github/dmca.

This involves several subtasks:

  • low-level support for blacklisting specified contents (not only URLs, but also SWHIDs), with support for regexps (see the sketch after this list)
  • admin interface to add/remove entries from the blacklist
  • a journal of these operations (what was added to or removed from the blacklist, when, and why)
  • a public webpage that maintains the list of accepted takedown notices
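
As a minimal sketch of how the first three bullets could fit together (all names here are hypothetical, not an existing swh API): a blocklist entry matches origin URLs or SWHIDs through a regexp, and every add/remove operation is recorded in a journal together with its reason and timestamp.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class BlocklistEntry:
    pattern: str      # regexp matched against an origin URL or a SWHID
    notice_ref: str   # reference of the takedown notice (e.g. a ticket id)
    reason: str

    def matches(self, identifier: str) -> bool:
        # identifier can be an origin URL or a SWHID such as
        # "swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2"
        return re.search(self.pattern, identifier) is not None


@dataclass
class Blocklist:
    entries: List[BlocklistEntry] = field(default_factory=list)
    journal: List[Dict[str, str]] = field(default_factory=list)  # audit trail

    def _log(self, action: str, entry: BlocklistEntry, why: str) -> None:
        self.journal.append({
            "action": action,
            "pattern": entry.pattern,
            "notice_ref": entry.notice_ref,
            "why": why,
            "when": datetime.now(timezone.utc).isoformat(),
        })

    def add(self, entry: BlocklistEntry, why: str) -> None:
        self.entries.append(entry)
        self._log("add", entry, why)

    def remove(self, entry: BlocklistEntry, why: str) -> None:
        self.entries.remove(entry)
        self._log("remove", entry, why)

    def is_blocked(self, identifier: str) -> bool:
        return any(e.matches(identifier) for e in self.entries)
```

The admin interface and the public webpage would then be thin layers over such a structure: the former calling add/remove, the latter rendering the list of accepted notices.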

Event Timeline

rdicosmo merged a task: Unknown Object (Maniphest Task). Mar 11 2021, 8:13 PM
rdicosmo added a subscriber: douardda.
olasd removed olasd as the assignee of this task. Apr 12 2021, 4:15 PM

Are we planning to add a way to notify the mirrors of the takedown notices?
I'm just wondering whether it could be interesting to subscribe the staging environment to it, to ensure the content is also removed from staging (and also flagged to avoid any further ingestion).

Are we planning to add a way to notify the mirrors of the takedown notices?

Yeah, we'll have to do that.

What we (@rdicosmo and I) have been thinking of so far is providing mirrors with a feed of the following information:

  • reference of the takedown request
  • SWHID of the affected object
  • reason for takedown (maybe; it could be derived from the takedown request reference if we structure it properly; useful for automated processing, I guess)
  • decision taken by Software Heritage (hide / remove once / blocklist forever)

We'd expect mirror operators to follow the feed, and to make their own decisions about which actions to enact on their own infrastructure.
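
For illustration only, a single entry in such a feed might look roughly like this (the field names, the Decision enum and the notice reference are made up, not a proposed schema; the SWHID is just the usual documentation example):

```python
from enum import Enum


class Decision(Enum):
    # the three outcomes mentioned above
    HIDE = "hide"
    REMOVE_ONCE = "remove-once"
    BLOCKLIST_FOREVER = "blocklist-forever"


# One hypothetical feed entry, mirroring the four fields listed above.
feed_entry = {
    "notice_ref": "takedown-2021-0001",  # reference of the takedown request
    "swhid": "swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2",
    "reason": "dmca",                    # optional, may be derivable from the notice
    "decision": Decision.BLOCKLIST_FOREVER.value,
}
```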

I'm just wondering whether it could be interesting to subscribe the staging environment to it, to ensure the content is also removed from staging

Once this scaffolding exists, it would certainly make sense to have it used to push the decisions from prod to staging.

(and also flagged to avoid any further ingestion).

For now my working assumption is that we'll remove objects *once* but we won't make the decision sticky. But I can see how having a sticky ingestion blocklist could be useful in some cases.
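
As a purely illustrative follow-up to the blocklist sketch in the task description (names are still hypothetical), a sticky policy would roughly amount to a check like this at ingestion time:

```python
from typing import Callable


def maybe_ingest(origin_url: str, blocklist, loader: Callable[[str], None]) -> bool:
    # blocklist is the hypothetical Blocklist object sketched above.
    # With a non-sticky, remove-once policy this check would not exist,
    # so previously removed content could be re-ingested later.
    if blocklist.is_blocked(origin_url):
        return False
    loader(origin_url)
    return True
```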

rdicosmo raised the priority of this task from Normal to High. Apr 13 2021, 2:53 PM

So what about exports of the archive available on git-annex?

In the most serious cases, we will be obliged to remove the offending content from these exports too.

One can imagine at least two ways to go:

  1. open up the export, track down the offending content, remove it or zero it out, then repack and replace the original export
  2. rebuild the export after removing the content from the archive

For 2., it would be handy to have timestamps on all objects (a feature mentioned in another thread), so one could rebuild an export with the same content (minus the removed objects) as the original export.
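
To make option 2 a bit more concrete, here is a rough sketch that assumes, hypothetically, that every archived object carries a first-seen timestamp and that an export can be reduced to a stream of SWHIDs (neither assumption matches the actual export formats):

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, Set, Tuple

# Timestamp of the export being replaced (hypothetical value).
ORIGINAL_EXPORT_DATE = datetime(2021, 4, 1, tzinfo=timezone.utc)


def rebuild_export(
    archive_objects: Iterable[Tuple[str, datetime]],  # (SWHID, first-seen timestamp)
    blocked_swhids: Set[str],
) -> Iterator[str]:
    """Yield the SWHIDs of a fresh export equivalent to the original one:
    same timestamp cutoff, minus the content removed after a takedown."""
    for swhid, first_seen in archive_objects:
        if first_seen > ORIGINAL_EXPORT_DATE:
            continue  # ingested after the original export: not part of it
        if swhid in blocked_swhids:
            continue  # removed from the archive following a takedown
        yield swhid
```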

Any thoughts on this? Any other ways to handle this issue (short of simply removing the exports)?

So what about exports of the archive available on git-annex?

Those exports do not contain blobs, so if the takedowns to be handled only concern file contents, they should not be impacted.
They might be impacted in the case of takedowns related to metadata, e.g., commit messages.

In that case we can go with what Roberto suggests (in short: "hot-fixing" the exports), but that will take a significant amount of processing; for instance, graph compression will need to be redone from scratch. An alternative option, assuming that takedowns impacting metadata will be rare enough, would be to just pull the entire graph exports. Once we have regular graph exports (which could happen as often as monthly), the impact of doing so would be fairly limited.