Search tools on metadata
Closed, Migrated

Description

For now, metadata are only used for full-text/keyword search, but there is a lot more we can do with them. This meta-task tracks ideas and the progress of their design.

Ideas:

  • Better full-text search (for now, it's keyword-based)
  • Search per field
    • Single field
    • Combining fields
  • Filter results to a subset of metadata file types ("mappings")
  • Stats on frequency of values for each field (e.g., license, ...)
    • on all metadata
    • on a subset of metadata (filtered by a search query)
  • Metadata-only search (right now it's always in OR with URL-based search)
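To make the per-field search, mapping filter, and value-frequency ideas above more concrete, here is a minimal in-memory sketch. Everything in it is an assumption for illustration: the record layout, the `mapping` field name, the sample values, and the helper names `search_fields` and `field_frequencies` are hypothetical and do not describe how the actual search backend works.

```python
from collections import Counter

# Hypothetical sample of extracted metadata records (not real data).
DOCS = [
    {"mapping": "codemeta", "name": "foo", "license": "MIT", "author": "alice"},
    {"mapping": "npm", "name": "bar", "license": "MIT", "author": "bob"},
    {"mapping": "codemeta", "name": "baz", "license": "GPL-3.0", "author": "alice"},
]


def search_fields(docs, mappings=None, **criteria):
    """Return the docs whose fields equal all given criteria (AND semantics).

    If `mappings` is given, only docs coming from those metadata file
    types are considered.
    """
    results = []
    for doc in docs:
        if mappings is not None and doc.get("mapping") not in mappings:
            continue
        if all(doc.get(field) == value for field, value in criteria.items()):
            results.append(doc)
    return results


def field_frequencies(docs, field):
    """Count how often each value of `field` occurs in `docs`."""
    return Counter(doc[field] for doc in docs if field in doc)
```

For example, `search_fields(DOCS, license="MIT", author="alice")` combines two fields, and `field_frequencies(search_fields(DOCS, author="alice"), "license")` computes stats on a subset filtered by a search query.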

Event Timeline

vlorentz triaged this task as Normal priority. (Feb 8 2019, 1:09 PM)
vlorentz created this task.
vlorentz added a project: Restricted Project.

I've added an item to the above list (metadata-only search); I think the ideal UI would be a single form with two checkboxes under it, one enabling URL-based search (enabled by default), one enabling metadata-based search (disabled by default).
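The semantics of the two checkboxes could be sketched roughly as follows. This is only an illustration under assumptions: the `search` function, its `use_url` and `use_metadata` flags, the record layout, and the sample origins are all hypothetical, and substring matching stands in for whatever the real backend does.

```python
# Hypothetical sample origins (not real data).
ORIGINS = [
    {
        "url": "https://example.org/alice/parser",
        "metadata": {"description": "A JSON parser"},
    },
    {
        "url": "https://example.org/bob/json-tools",
        "metadata": {"description": "Command-line utilities"},
    },
]


def search(origins, query, use_url=True, use_metadata=False):
    """Return URLs of origins matching `query` in any enabled source.

    `use_url` and `use_metadata` mirror the two proposed checkboxes:
    URL-based search is on by default, metadata-based search is off.
    """
    query = query.lower()
    matches = []
    for origin in origins:
        url_hit = use_url and query in origin["url"].lower()
        meta_hit = use_metadata and any(
            query in str(value).lower()
            for value in origin.get("metadata", {}).values()
        )
        if url_hit or meta_hit:
            matches.append(origin["url"])
    return matches
```

With both flags enabled the result is the union of the two searches; with only `use_metadata=True` it is a metadata-only search.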

Relatedly, I don't understand what the "stats" items do in this list, they don't seem to be related to metadata-based *search*.

> right now it's always in OR with URL-based search

Are you sure? Looking at the code, it seems to me it searches either in URL or in metadata, but not both at the same time.

That's the impression I got from testing. Either way, the current UI and semantics are bad; the proposed ones would be much better.

> That's the impression I got from testing.

The URL is often part of the metadata, in the codeRepository key.
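A small sketch of why this can look like an OR in testing: if the repository URL is stored under `codeRepository`, a keyword search over the metadata alone will also match URL fragments. The record below and the `metadata_matches` helper are hypothetical; only the `codeRepository` key comes from the discussion.

```python
# Hypothetical metadata record; only the `codeRepository` key is taken
# from the discussion, the rest is made up for illustration.
RECORD = {
    "name": "parser",
    "codeRepository": "https://example.org/alice/parser",
}


def metadata_matches(metadata, keyword):
    """Return True if `keyword` occurs in any metadata value (case-insensitive)."""
    keyword = keyword.lower()
    return any(keyword in str(value).lower() for value in metadata.values())
```

Here `metadata_matches(RECORD, "example.org/alice")` is true even though no URL-based search is involved.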

> Relatedly, I don't understand what the "stats" items do in this list, they don't seem to be related to metadata-based *search*.

The second item ("on a subset of metadata") involves search.

vlorentz renamed this task from Search tools on metadata (meta task) to Search tools on metadata. (Jan 22 2020, 4:23 PM)
vlorentz added a project: meta-task.

Hi everyone, I am new here. I am excited by the idea of preserving open-source code and would love to contribute this year during GSoC. I do have a decent knowledge of Elasticsearch, Django, and Web Development in general.

I want to work on the metadata search engine idea as mentioned here. But I noticed that the GSoC page for 2020 is exactly the same as that of 2021 (except the banner). So I wanted to ask if this project can be a part of GSoC this year?

Cheers!
Kumar Shivendu

@KShivendu Thanks for your interest!

Yes, it can be part of GSoC this year; we added it earlier this month (history). It only appears on the GSoC 2020 page because that page is automatically generated from the same list of projects.

Hi, I would like to get familiar with this. As mentioned in the wiki:

> By the time GSoC starts, it will be implemented by a very small Python service (under 100 lines of code) backed by ElasticSearch.

Can anybody point me to the relevant repo?

> Can anybody point me to the relevant repo?

Most likely, swh-search [1]

[1] https://forge.softwareheritage.org/source/swh-search/