
Review / extend web app robots.txt
Open, High, Public

Description

https://archive.softwareheritage.org/browse/ is currently being crawled by a random bot.

While this bot is quite slow and therefore doesn't incur much load, I don't think being indexed is generally desirable, at least for now. We should look at which other endpoints to add to the current robots.txt.
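As a rough way to check a candidate rule set before deploying it, the sketch below uses Python's standard `urllib.robotparser` to test whether a URL would be blocked. The `ROBOTS_TXT` content and the `is_allowed` helper are assumptions for illustration only; the actual robots.txt and the final list of endpoints are not given in this task.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: blocks every crawler from /browse/.
# The real robots.txt may list other endpoints as well.
ROBOTS_TXT = """\
User-agent: *
Disallow: /browse/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://archive.softwareheritage.org/browse/",
        "https://archive.softwareheritage.org/",
    ):
        print(url, "allowed" if is_allowed(ROBOTS_TXT, url) else "disallowed")
```

Note that robots.txt is only advisory: well-behaved crawlers honor it, but it does not prevent a bot that ignores it from fetching these pages.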

Event Timeline

olasd created this task. Sep 22 2018, 2:50 PM
olasd triaged this task as High priority.
zack added a subscriber: zack. Sep 22 2018, 2:52 PM

+1

being crawled along million-revision VCS histories doesn't sound particularly appealing :-)