Polish the swh-search QL
Closed, Migrated (Edits Locked)

Assigned To

Authored By

	vlorentz
	Sep 6 2021, 10:37 AM

Description

Make sure it's:

- consistent
- future-proof (we should avoid changing it after users start relying on it)
- user-friendly
- well-documented

Related Objects

Status	Assigned	Task
Migrated	gitlab-migration	T3097 Expose metadata in the WebApp and make it searchable
Migrated	gitlab-migration	T3952 Make the search query language a first class citizen
Migrated	gitlab-migration	T3558 Enable the swh-search QL in production
Migrated	gitlab-migration	T3559 Enable the swh-search QL in staging
Migrated	gitlab-migration	T3967 "Link" header is not properly displayed in apidoc when it contains []
Migrated	gitlab-migration	T3560 Polish the swh-search QL
Migrated	gitlab-migration	T3909 Get feedback on the swh-search QL
Migrated	gitlab-migration	T3910 Add an option to enable the swh-search QL for some users
Migrated	gitlab-migration	T3927 swh-search crashes when "only show origins visited at least once" is not checked
Migrated	gitlab-migration	T3930 Update swh-search QL examples
Migrated	gitlab-migration	T3931 Pagination UI does not work at all when using the search QL
		Restricted Maniphest Task
Migrated	gitlab-migration	T3926 Better syntax errors for the search query language
		Restricted Maniphest Task
Migrated	gitlab-migration	T3944 Deploy swh-search v0.13.0
Migrated	gitlab-migration	T3943 rebuild python3-tree-sitter for python 3.10
Migrated	gitlab-migration	T4296 Pagination does not work when using sort_by in search query language

Event Timeline

vlorentz triaged this task as Normal priority. Sep 6 2021, 10:37 AM

vlorentz created this task.

vlorentz updated the task description.

vlorentz removed a project: System administration. Feb 14 2022, 1:07 PM

vlorentz added a subtask: Restricted Maniphest Task. Feb 16 2022, 9:54 AM

vlorentz added a subtask: T3926: Better syntax errors for the search query language. Feb 16 2022, 1:41 PM

vlorentz closed subtask T3931: Pagination UI does not work at all when using the search QL as Resolved. Feb 21 2022, 1:36 PM

vlorentz closed subtask Restricted Maniphest Task as Resolved.

vlorentz closed subtask T3930: Update swh-search QL examples as Resolved.

vlorentz closed subtask T3927: swh-search crashes when "only show origins visited at least once" is not checked as Resolved.

zack added a parent task: T3952: Make the search query language a first class citizen. Feb 22 2022, 6:46 PM

vlorentz removed vlorentz as the assignee of this task. Apr 27 2022, 2:29 PM

Hey @vlorentz @zack, I've been using sourcegraph.com for almost a year now and I feel that they have worked a lot on polishing their search query language. I think we can learn from them and adapt our language. Here are a few suggestions:

- Instead of making the origin and metadata keywords mandatory, we can let users type terms without naming a field, and search those terms in both the origin (higher score) and metadata fields. This will allow users to write shorter, more effective queries:
  - django last_visit > 2022 instead of origin:django and last_visit > 2022
  - progval instead of metadata:progval
- Make it faster to write array filters like language and license:
  - language: python|go instead of language in [python, go]
- It should be possible to negate any filter with a leading -:
  - -origin:XYZ should exclude origins containing the term XYZ (exact opposite of origin:XYZ)
- Provide aliases for writing queries faster:
  - o:xyz should be equivalent to origin:xyz
  - m:abc should be equivalent to metadata:abc
  - lang:python or l:python should be equivalent to language:python
- Assume and between filters when no operator is provided:
  - origin:X metadata:Y instead of origin: X and metadata: Y

These suggestions are based on the following assumptions:

- Search queries should be short, and hence fast to type.
- A search query language should pick up the user's most likely intention by default, while still allowing that default behavior to be overridden.
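
To make the proposals above concrete, here is a minimal sketch of how the shorthand could be parsed into a neutral list of filters. It is not the swh-search grammar (the Filter class, the alias table, and the regular expression are all hypothetical, and quoted strings, parentheses, or and negated bare terms are not handled); it only illustrates aliases, | for array filters, - for negation, and the implicit and.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical alias table and array-valued fields, taken from the examples above.
ALIASES = {"o": "origin", "m": "metadata", "lang": "language", "l": "language"}
ARRAY_FIELDS = {"language", "license"}

TOKEN = re.compile(
    r"""
      (?P<cmp>(?P<cfield>\w+)\s*(?P<op>>=|<=|>|<|=)\s*(?P<cvalue>\S+))
    | (?P<filt>(?P<neg>-)?(?P<field>\w+):\s*(?P<value>\S+))
    | (?P<bare>\S+)
    """,
    re.VERBOSE,
)


@dataclass(frozen=True)
class Filter:
    field: Optional[str]      # None means "unqualified term"
    op: str                   # ":", "in", or a comparison operator
    values: Tuple[str, ...]
    negated: bool = False


def parse(query: str) -> List[Filter]:
    """Parse a shorthand query; all returned filters are implicitly and-ed."""
    filters: List[Filter] = []
    for m in TOKEN.finditer(query):
        if m.group("cmp"):
            field = ALIASES.get(m.group("cfield"), m.group("cfield"))
            filters.append(Filter(field, m.group("op"), (m.group("cvalue"),)))
        elif m.group("filt"):
            field = ALIASES.get(m.group("field"), m.group("field"))
            negated = m.group("neg") is not None
            if field in ARRAY_FIELDS:
                # "python|go" becomes a membership test on several values
                values = tuple(v for v in m.group("value").split("|") if v)
                filters.append(Filter(field, "in", values, negated))
            else:
                filters.append(Filter(field, ":", (m.group("value"),), negated))
        elif m.group("bare").lower() != "and":
            # An explicit "and" is redundant; any other bare word is a search term
            filters.append(Filter(None, ":", (m.group("bare"),)))
    return filters
```

With this sketch, origin:X metadata:Y and origin: X and metadata: Y parse to the same two filters, and l:python, lang:python and language:python are interchangeable.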

In T3560#85153, @KShivendu wrote:

Hey @vlorentz @zack, I've been using sourcegraph.com for almost a year now and I feel that they have worked a lot on polishing their search query language. I think we can learn from them and adapt our language. Here are a few suggestions:

Thanks for investigating this and making a list of actionable suggestions!
Here is a case-by-case commentary:

Instead of making the origin and metadata keywords mandatory, we can let users type terms without naming a field, and search those terms in both the origin (higher score) and metadata fields. This will allow users to write shorter, more effective queries:

- django last_visit > 2022 instead of origin:django and last_visit > 2022
- progval instead of metadata:progval

This one gives me pause, but only because we need to make sure it's not semantically ambiguous. Let's see if I'm getting it right:

- if there are no qualifiers ("o:", "m:"), we search by default in both origin and metadata, and rank the results
- if there are qualifiers, we only search in the associated data

Correct?

If so, I'm fine with this, but we need to check how much worse performance gets.

Also, I'm not so sure the ranking criterion should be "origin hits win"; maybe there's something smarter to be used there...
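
The semantics being confirmed here can be sketched with a toy in-memory scorer. Everything in it is an assumption made for illustration: documents are plain dicts with hypothetical "origin" and "metadata" string fields, matching is a case-insensitive substring test, and the weights are placeholders precisely because the ranking criterion is still open. It says nothing about how the real backend would score results or what the performance cost would be.

```python
from typing import Dict, List, Mapping, Optional

# Placeholder weights: "origin hits win" is only one possible choice.
DEFAULT_WEIGHTS: Mapping[str, float] = {"origin": 2.0, "metadata": 1.0}


def score(
    doc: Dict[str, str],
    term: str,
    qualifier: Optional[str] = None,
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> float:
    """Sum the weights of the fields that contain the term.

    Without a qualifier every weighted field is searched; with a
    qualifier only that field is.
    """
    fields = [qualifier] if qualifier else list(weights)
    needle = term.lower()
    return sum(
        weights.get(field, 1.0)
        for field in fields
        if needle in doc.get(field, "").lower()
    )


def search(
    docs: List[Dict[str, str]],
    term: str,
    qualifier: Optional[str] = None,
    weights: Mapping[str, float] = DEFAULT_WEIGHTS,
) -> List[Dict[str, str]]:
    """Return matching documents, best score first (ties keep input order)."""
    scored = [(score(doc, term, qualifier, weights), doc) for doc in docs]
    hits = [(s, doc) for s, doc in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits]
```

Passing a different weights mapping is enough to try another ranking policy, which keeps the "what should win" question separate from the "where do we search" question.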

Make it faster to write array filters like language and license

- language: python|go instead of language in [python, go]

LGTM

It should be possible to negate any filter with -

- -origin:XYZ should exclude origins containing the term XYZ (exact opposite of origin:XYZ)

LGTM

Provide aliases for writing queries faster

- o:xyz should be equivalent to origin:xyz
- m:abc should be equivalent to metadata:abc
- lang:python or l:python should be equivalent to language:python

OK, but again as long as they're not ambiguous.

Assume and between filters when no operator is provided.

- origin:X metadata:Y instead of origin: X and metadata: Y

Hell yes!

anlambert added a subtask: T4296: Pagination does not work when using sort_by in search query language. Jun 1 2022, 1:42 PM

What about having a UI, like the ones in GitHub or Phabricator, for building an advanced query?
For example:
https://github.com/search/advanced

We can continue to support the query language, and the UI can generate the QL. This will help us support saved searches and bookmarks.
We could have more search contexts (beyond origin and metadata) in the future as we index more data.
It would be too hard to support different contexts with varying inputs using only a QL. It can be done in a UI with some moving elements.
What do you think?
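
As a rough illustration of "the UI can generate the QL", the sketch below turns a few form inputs into a query string using the verbose syntax quoted earlier in this thread. The function and its parameter names are hypothetical, values are inserted without any quoting or escaping, and only the four fields mentioned in the thread are covered.

```python
from typing import Optional, Sequence


def build_query(
    origin: Optional[str] = None,
    metadata: Optional[str] = None,
    languages: Sequence[str] = (),
    last_visit_after: Optional[str] = None,
) -> str:
    """Assemble a query string from advanced-search form fields.

    Empty fields are skipped; the remaining clauses are joined with "and".
    """
    clauses = []
    if origin:
        clauses.append(f"origin:{origin}")
    if metadata:
        clauses.append(f"metadata:{metadata}")
    if languages:
        clauses.append(f"language in [{', '.join(languages)}]")
    if last_visit_after:
        clauses.append(f"last_visit > {last_visit_after}")
    return " and ".join(clauses)
```

Because the output is an ordinary query string, it could be stored as a saved search or placed in a bookmarkable URL without the UI state itself being persisted.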

jayeshv mentioned this in T4613: Generalize and simplify the query language. Oct 21 2022, 10:16 AM

gitlab-migration changed the status of subtask T3927: swh-search crashes when "only show origins visited at least once" is not checked from Resolved to Migrated. Jan 8 2023, 4:36 PM

gitlab-migration changed the status of subtask Restricted Maniphest Task from Resolved to Migrated.

This task has been migrated to GitLab.

gitlab-migration closed subtask T3909: Get feedback on the swh-search QL as Migrated. Jan 8 2023, 5:03 PM

gitlab-migration closed subtask T3926: Better syntax errors for the search query language as Migrated.

gitlab-migration closed subtask T4296: Pagination does not work when using sort_by in search query language as Migrated.

gitlab-migration changed the status of subtask T3930: Update swh-search QL examples from Resolved to Migrated. Jan 8 2023, 10:03 PM

gitlab-migration changed the status of subtask T3931: Pagination UI does not work at all when using the search QL from Resolved to Migrated.
