
facet/metadata-based project search
Closed, Migrated, Edits Locked

Description

Under the URL "https://archive.softwareheritage.org/browse/" one finds the call to action "Enter string pattern(s) to search for in origin urls:"

There is not, as far as I can see, any way for a human user to determine what URLs are available. IMO "browse" implies that one does not need prior knowledge of content scope in order to explore/discover the content. Guessing what URLs might be in the dataset in order to seed the browse view puts an unnecessary burden on new users for basic discovery.

One solution would be to provide a faceted search via domain in an alpha list, but ideally other top-level facets would also be provided (e.g., language, date, program/package/library name).

Event Timeline

paregorios created this object in space S1 Public.
zack triaged this task as Low priority. Jun 12 2018, 12:13 PM
zack added subscribers: moranegg, zack.

Thanks for this feature request! Below you can find some additional background on what's needed to make it real.

> There is not, as far as I can see, any way for a human user to determine what URLs are available

Yes and no. It's true that URLs are not always informative. But forge repository URLs often contain the name of the corresponding project and/or of the umbrella organization running it. So searching for one or both of these facets of what you're looking for will very often narrow down your search results quite a bit. I certainly agree that this is just a temporary workaround for the lack of a more general project-based search interface.

> One solution would be to provide a faceted search via domain in an alpha list, but ideally other top-level facets would also be provided (e.g., language, date, program/package/library name).

Yes, we want to have this. But it will take a while. The main reason is that there are tons of different ways ("ontologies", if you want) to define project facets (or, more generally, project "metadata"), and each development forge, or really each software database, uses its own. First of all, we need to archive all of them, because they deserve preservation too. Then we need to make them searchable in a uniform way, at least on a subset of facets. Additionally, we want to extract metadata embedded in the archived code itself (e.g., the programming language, dependencies, etc.) and make those searchable too. At the scale of Software Heritage this is quite challenging.
We're already working on it (@moranegg is leading this and might comment more), but don't hold your breath :)

zack renamed this task from "Browse" should mean browse to facet/metadata-bases project search. Jun 12 2018, 12:14 PM
zack renamed this task from facet/metadata-bases project search to facet/metadata-based project search.
zack added a project: General.
zack raised the priority of this task from Low to Normal. Oct 4 2018, 11:55 AM
zack added a project: Metadata workflow.
zack added a subscriber: vlorentz.
zack claimed this task.

This is now fixed (by @vlorentz) and deployed.

The UI is *not* great on many levels, but we should submit more precise issues for the various needed improvements, instead of reusing this task.

Closing.