Page MenuHomeSoftware Heritage

origin_search: Add sort_by feature
ClosedPublic

Authored by KShivendu on Jun 23 2021, 6:00 PM.

Details

Summary

Sorting options are important features of an advanced search interface.

This diff introduces sort_by parameter in origin_search which shall
facilitate the same.

Diff Detail

Repository
rDSEA Archive search
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

Build has FAILED

Patch application report for D5918 (id=21244)

Rebasing onto 9bedaa95a3...

Current branch diff-target is up to date.
Changes applied before test
commit f0349f94a57a620584ca40d3e92c3834da149e67
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Wed Jun 23 15:47:25 2021 +0000

    origin_search: Add sort_by feature

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/166/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/166/console

Harbormaster returned this revision to the author for changes because remote builds failed.Jun 23 2021, 6:04 PM
Harbormaster failed remote builds in B22228: Diff 21244!

Build is green

Patch application report for D5918 (id=21252)

Rebasing onto 3e15303528...

First, rewinding head to replay your work on top of it...
Applying: origin_search: Add sort_by feature
Changes applied before test
commit 042c7114b9bd1681eab18ef693baae3fa18c89d3
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Wed Jun 23 15:47:25 2021 +0000

    origin_search: Add sort_by feature

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/167/ for more details.

swh/search/elasticsearch.py
336

Why is this not covered ? Is something wrong with test_origin_sort_by_nb_visits_search that I've written ?

swh/search/in_memory.py
286

sorting the Iterable hits converts it into a List and hence renders itertools.islice useless so I replaced it with [start:end]

Can we do something better than this ?

swh/search/tests/test_search.py
427

Please ignore this. I'll fix it in the next draft.

  • test_search: Improve tests for sort_by

Build is green

Patch application report for D5918 (id=21253)

Rebasing onto 3e15303528...

First, rewinding head to replay your work on top of it...
Applying: origin_search: Add sort_by feature
Applying: test_search: Improve tests for sort_by
Changes applied before test
commit d648eaa500c9fc3e2cd083c2984e1ad331e2b78c
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jun 25 06:09:24 2021 +0000

    test_search: Improve tests for sort_by

commit 2cc7a3cef4e3c22fb5d529a81889a4d87ec02ce0
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Wed Jun 23 15:47:25 2021 +0000

    origin_search: Add sort_by feature

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/168/ for more details.

Would it make sense to have sort_by take a list of fields, instead of a single field?

ardumont added inline comments.
swh/search/elasticsearch.py
336

Not saying it's the case here but sometimes, there is a rendering glitch in the coverage.
You can also check the output in jenkins directly (but i don't remember nor find how...)

441–447

Please.

Extract this list into a constant list of sorting params into the interface and import
it in this module (same for the in_memory implementation). Then explain in the docstring
in the interface all the sorting params you can use in the endpoint (targetting the
constant list variable).

swh/search/in_memory.py
286

why useless?

288

why isn't there nb_visits in the sort params here? It's there in the main implementation (.../elasticsearch.py).

swh/search/elasticsearch.py
336

yeah it definitely is

swh/search/in_memory.py
286

My bad. I had some misunderstanding.

I just wanted to say that, according to https://stackoverflow.com/questions/41079001/python-3-5-slice-vs-islice-vs-alternatives-efficiency-comparison , simply doing mylist[start:stop] is faster than using list(itertools.islice(..)).

So it doesn't have any performance increase plus it imports itertools just for this one line.

So what do you suggest ?

swh/search/in_memory.py
286

you're right, we don't need islice anymore.

  • interface: Maintain SORT_BY_OPTIONS list

Build is green

Patch application report for D5918 (id=21259)

Rebasing onto 3e15303528...

First, rewinding head to replay your work on top of it...
Applying: origin_search: Add sort_by feature
Applying: test_search: Improve tests for sort_by
Applying: interface: Maintain SORT_BY_OPTIONS list
Changes applied before test
commit a85de23b672061f572314a4dc75550f5f9e0e9c7
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jun 25 10:02:13 2021 +0000

    interface: Maintain SORT_BY_OPTIONS list

commit b3e28c8951e1b0dd089a556bc3588ed771bfcdd8
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jun 25 06:09:24 2021 +0000

    test_search: Improve tests for sort_by

commit 5f41d603d9377c711d92b81587e9880349f6e399
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Wed Jun 23 15:47:25 2021 +0000

    origin_search: Add sort_by feature

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/169/ for more details.

swh/search/in_memory.py
286

right, thanks for the clarification ;)

  • origin_search: Allow sorting with multiple fields

Build is green

Patch application report for D5918 (id=21296)

Rebasing onto 3e15303528...

First, rewinding head to replay your work on top of it...
Applying: origin_search: Add sort_by feature
Applying: test_search: Improve tests for sort_by
Applying: interface: Maintain SORT_BY_OPTIONS list
Applying: origin_search: Allow sorting with multiple fields
Changes applied before test
commit 104cffe41f356b52c0d58e3c442c96556faa24fd
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jun 25 16:14:35 2021 +0000

    origin_search: Allow sorting with multiple fields

commit 36c982b30378a77f86c2bd229ecb37ad1d858496
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jun 25 10:02:13 2021 +0000

    interface: Maintain SORT_BY_OPTIONS list

commit 340b645dab741e48507bfcdb5053f41f0d7358a2
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jun 25 06:09:24 2021 +0000

    test_search: Improve tests for sort_by

commit 5d1f34296b01479dd796f8f7511e92d8aaf3e9c2
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Wed Jun 23 15:47:25 2021 +0000

    origin_search: Add sort_by feature

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/170/ for more details.

swh/search/tests/test_search.py
427

This one gets flattened by black (probably because of line length) while other members of the array remain intact.

What can I do to make this dictionary occupy 3 lines instead of just 1 ?

vlorentz added inline comments.
swh/search/elasticsearch.py
437

you can remove this if. If the list is empty, the loop will be skipped.

445

to avoid copying the list every time

447–450

ditto

swh/search/in_memory.py
279

you can also remove this one

281–305

Can you move this outside the function, improve the name (if possi ble), and add a docstring?

This revision now requires changes to proceed.Jun 25 2021, 7:23 PM
  • origin_search: Polish code related to sort_by

Build is green

Patch application report for D5918 (id=21298)

Rebasing onto 3e15303528...

First, rewinding head to replay your work on top of it...
Applying: origin_search: Add sort_by feature
Applying: test_search: Improve tests for sort_by
Applying: interface: Maintain SORT_BY_OPTIONS list
Applying: origin_search: Allow sorting with multiple fields
Applying: origin_search: Polish code related to sort_by
Changes applied before test
commit 8886591ebf292a5b843bb304dd6c0792abb6a54c
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Sat Jun 26 06:10:45 2021 +0000

    origin_search: Polish code related to sort_by

commit 9615ba2d293730d719f7e80f21108e041db1bc37
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jun 25 16:14:35 2021 +0000

    origin_search: Allow sorting with multiple fields

commit 450e015a20756ec1f4bbd4d902bd4cf4eb1b494d
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jun 25 10:02:13 2021 +0000

    interface: Maintain SORT_BY_OPTIONS list

commit 15c3d603897d9f4d49e5c94db5cd889b0fa96646
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jun 25 06:09:24 2021 +0000

    test_search: Improve tests for sort_by

commit e82353210bcd5c4886e3e13dbb9fbc9a3027ccf9
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Wed Jun 23 15:47:25 2021 +0000

    origin_search: Add sort_by feature

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/171/ for more details.

swh/search/tests/test_search.py
427

Any comments for this ?

Thanks again!

swh/search/tests/test_search.py
427

You can keep it like this, no big deal.

If you really don't want to, you can use # fmt: off before the code block and # fmt: on after.

This revision is now accepted and ready to land.Jun 26 2021, 10:06 AM

Build is green

Patch application report for D5918 (id=21300)

Rebasing onto 3e15303528...

Current branch diff-target is up to date.
Changes applied before test
commit 6b1d563d42e9643a4f957bfccd06a75a0dbabbc6
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Wed Jun 23 15:47:25 2021 +0000

    origin_search: Add sort_by feature
    
    Sorting options are important features of an advanced search interface.
    
    This diff introduces sort_by parameter in origin_search which shall
    facilitate the same.

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/172/ for more details.

This revision was automatically updated to reflect the committed changes.