
textual search language for the Web UI
Open, Normal, Public

Description

The current Web UI search accepts a single string that is used as a list of tokens to search either in URLs or in metadata.

We want to have a "search engine-like" sub-syntax that, without becoming too structured, allows users to express specific facets.

Here are facets that come to mind as already useful today:

  • loader: (or "visit type", not sure what would be the best name for this) to filter on which loader has been used (without this today we cannot easily select all pypi, debian, or cran packages, for instance)
  • last_visited: (with some relational operator) to return only results that have been last visited in a given time frame
  • metadata fields: our metadata indexing extracts a number of properties that would be great to filter on individually, rather than lumping them all together into a single full-text search (this requires some care to avoid clashes between the metadata key namespace and the search facet namespace)
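
To make the idea concrete, here is a minimal sketch of how such a sub-syntax could be tokenized. Everything in it is an assumption for illustration only: the `key:value` form, the set of relational operators, and the names `KNOWN_FACETS`, `ParsedQuery` and `parse_query` are hypothetical and not an agreed-upon design. Words that do not match a known facet fall back to plain full-text tokens, which is one possible way to keep URLs such as `http://...` from being mistaken for facets.

```python
import re
from dataclasses import dataclass, field

# Hypothetical facet names, taken from the list above; not a settled syntax.
KNOWN_FACETS = {"loader", "last_visited"}

# key:value, with an optional relational operator in front of the value.
FACET_RE = re.compile(r"^(?P<key>[a-z_]+):(?P<op>>=|<=|>|<|=)?(?P<value>.+)$")


@dataclass
class ParsedQuery:
    facets: list = field(default_factory=list)  # (key, operator, value) triples
    tokens: list = field(default_factory=list)  # plain full-text tokens


def parse_query(text: str) -> ParsedQuery:
    """Split a search string into facet filters and free-text tokens."""
    parsed = ParsedQuery()
    for word in text.split():
        match = FACET_RE.match(word)
        if match and match.group("key") in KNOWN_FACETS:
            # A missing operator is treated as equality.
            op = match.group("op") or "="
            parsed.facets.append((match.group("key"), op, match.group("value")))
        else:
            # Unknown keys (including URL schemes) stay ordinary search tokens.
            parsed.tokens.append(word)
    return parsed


if __name__ == "__main__":
    print(parse_query("loader:pypi last_visited:>2020-01-01 requests"))
```

In this sketch, metadata fields are deliberately left out of `KNOWN_FACETS`; how to expose them without clashing with facet names remains an open question.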

Other stuff that might be added in the future, provided we have suitable backend indexing:

  • project size: filter on number of, e.g., commits
  • file content: filter on projects that contain a given substring. This one hints at the fact that we will probably want a selector determining which type of object is returned, e.g., origin (the only possibility today) vs. a specific type of Merkle DAG node

Event Timeline

zack triaged this task as Normal priority. Jan 29 2020, 1:31 PM
zack created this task.

Just saw this search page on Zenodo: https://help.zenodo.org/guides/search/

It may help to give some insight into what types of searches are needed in the academic domain.

Hi @zack, we could consider using the Elasticsearch query string syntax to implement this feature without having to design a syntax from scratch. It is quite powerful and should cover most of the use cases.

Please have a look at these and let me know your opinion:
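
As a rough illustration of that suggestion, the sketch below builds a request body for Elasticsearch's `query_string` query, which passes the user's text through largely as typed. The field names `visit_type`, `last_visited` and `url`, and the helper `build_query_string_body`, are assumptions made up for this example; the actual index mapping may differ.

```python
import json


def build_query_string_body(user_query: str, default_field: str = "url") -> dict:
    """Wrap user-typed search text in an Elasticsearch query_string query.

    `default_field` is the field searched when a term has no explicit
    `field:` prefix; "url" is a hypothetical choice here.
    """
    return {
        "query": {
            "query_string": {
                "query": user_query,
                "default_field": default_field,
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical field names; the syntax itself is Elasticsearch's.
    body = build_query_string_body("visit_type:pypi AND last_visited:>=2020-01-01")
    print(json.dumps(body, indent=2))
```

One trade-off worth noting: this exposes index field names directly to users, so the facet names would be tied to the backend mapping unless a translation layer is added.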