
origin_search: Add keyword search for instrinsic_metadata keywords/description
Closed · Public

Authored by KShivendu on Jul 2 2021, 1:24 PM.

Details

Summary

intrinsic_metadata contains keywords and a description, both of which are useful
for finding relevant origins from a list of keywords provided by the user.

Diff Detail

Repository
rDSEA Archive search
Branch
metadata-description
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 22412
Build 34914: Phabricator diff pipeline on jenkins (Jenkins console · Jenkins)
Build 34913: arc lint + arc unit

Unit Tests: Failed

Time    Test
280 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.search.tests.test_api_client.TestRemoteSearch::test_origin_keywords_search
        self = <swh.search.tests.test_api_client.TestRemoteSearch testMethod=test_origin_keywords_search> def test_origin_keywords_search(self):
134 ms  Jenkins > .tox.py3.lib.python3.7.site-packages.swh.search.tests.test_elasticsearch.TestElasticsearchSearch::test_origin_keywords_search
        self = <swh.search.tests.test_elasticsearch.TestElasticsearchSearch testMethod=test_origin_keywords_search> def test_origin_keywords_search(self):
6 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.search.tests.test_in_memory.InmemorySearchTest::test_origin_keywords_search
        self = <swh.search.tests.test_in_memory.InmemorySearchTest testMethod=test_origin_keywords_search> def test_origin_keywords_search(self):
3 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.search.tests.test_init::test_types[memory-InMemorySearch-None]
        mocker = <pytest_mock.plugin.MockerFixture object at 0x7f4a7ac74a90> class_ = 'memory' expected_class = <class 'swh.search.in_memory.InMemorySearch'>, kwargs = None
4 ms    Jenkins > .tox.py3.lib.python3.7.site-packages.swh.search.in_memory::swh.search.in_memory._nested_get

Full test results: 4 Failed · 130 Passed · 1 Skipped

Event Timeline

Build has FAILED

Patch application report for D5963 (id=21434)

Rebasing onto 2e1fb86387...

Current branch diff-target is up to date.
Changes applied before test
commit c100cd9f5d6dfd7042c0a717cff3fa609e1b3ac1
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jul 2 11:19:08 2021 +0000

    origin_search: Add keyword search for instrinsic_metadata keywords/description

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/183/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/183/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Jul 2 2021, 1:29 PM)
Harbormaster failed remote builds in B22412: Diff 21434!
  • test_search: Fix failing tests

Build has FAILED

Patch application report for D5963 (id=21435)

Rebasing onto 2e1fb86387...

Current branch diff-target is up to date.
Changes applied before test
commit f8ded206f956b441750cc17a111fa1c810660dcf
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jul 2 12:15:55 2021 +0000

    test_search: Fix failing tests

commit c100cd9f5d6dfd7042c0a717cff3fa609e1b3ac1
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jul 2 11:19:08 2021 +0000

    origin_search: Add keyword search for instrinsic_metadata keywords/description

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/184/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/184/console

Harbormaster returned this revision to the author for changes because remote builds failed. (Jul 2 2021, 2:48 PM)
Harbormaster failed remote builds in B22413: Diff 21435!
  • origin_search: Fix signature

Build is green

Patch application report for D5963 (id=21442)

Rebasing onto 2e1fb86387...

Current branch diff-target is up to date.
Changes applied before test
commit 8238dd0163136d45a51fbb44b5c32f0f73eecc8e
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jul 2 14:01:43 2021 +0000

    origin_search: Fix signature

commit f8ded206f956b441750cc17a111fa1c810660dcf
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jul 2 12:15:55 2021 +0000

    test_search: Fix failing tests

commit c100cd9f5d6dfd7042c0a717cff3fa609e1b3ac1
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jul 2 11:19:08 2021 +0000

    origin_search: Add keyword search for instrinsic_metadata keywords/description

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/185/ for more details.

vlorentz added inline comments.
swh/search/elasticsearch.py
450

what does ^2 do?

465–474

why these two changes?

swh/search/in_memory.py
447–452

you can sort in-place, no need to copy
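
A minimal sketch of the in-place suggestion above. The hit structure (dicts with `url` and `score` keys) and the helper name `rank_hits_in_place` are hypothetical, chosen only for illustration; they are not taken from `swh/search/in_memory.py`.

```python
def rank_hits_in_place(hits):
    """Order hits by descending score without allocating a second list."""
    # list.sort() mutates the list it is called on and returns None,
    # whereas sorted(hits, ...) would build and return a new list.
    hits.sort(key=lambda hit: hit["score"], reverse=True)


# Hypothetical search hits, for illustration only.
hits = [
    {"url": "https://example.org/a", "score": 1},
    {"url": "https://example.org/b", "score": 3},
    {"url": "https://example.org/c", "score": 2},
]
same_list = hits  # second reference to the same list object

rank_hits_in_place(hits)
ranked_urls = [hit["url"] for hit in hits]
```

Because the list is reordered in place, every reference to it (here `same_list`) sees the new order. That is only safe when no caller relies on the original ordering.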

  • origin_search: Polish the code with get_expansion and other methods

Build is green

Patch application report for D5963 (id=21471)

Rebasing onto 2e1fb86387...

Current branch diff-target is up to date.
Changes applied before test
commit cd468bdb2fcfc577edf83432f329faa943d46918
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jul 2 16:14:08 2021 +0000

    origin_search: Polish the code with get_expansion and other methods

commit 8238dd0163136d45a51fbb44b5c32f0f73eecc8e
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jul 2 14:01:43 2021 +0000

    origin_search: Fix signature

commit f8ded206f956b441750cc17a111fa1c810660dcf
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jul 2 12:15:55 2021 +0000

    test_search: Fix failing tests

commit c100cd9f5d6dfd7042c0a717cff3fa609e1b3ac1
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jul 2 11:19:08 2021 +0000

    origin_search: Add keyword search for instrinsic_metadata keywords/description

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/187/ for more details.

KShivendu added inline comments.
swh/search/elasticsearch.py
450

It boosts a result's score by roughly a factor of 2 (not necessarily exactly 2x, but that's the idea) if the queried keywords are found in that field.
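
For readers unfamiliar with the `^2` suffix, here is a rough sketch of a `multi_match` query body that uses it. The field names `metadata.keywords` and `metadata.description` and the helper `build_keyword_query` are assumptions made for illustration; the field paths actually used in `swh/search/elasticsearch.py` may differ.

```python
import json


def build_keyword_query(keywords, boosted_field, other_fields, boost=2):
    """Build a multi_match query body in which one field carries a boost."""
    # "field^N" is Elasticsearch's per-field boost syntax: matches in that
    # field weigh more in the relevance score than matches in other fields.
    fields = [f"{boosted_field}^{boost}", *other_fields]
    return {
        "multi_match": {
            "query": " ".join(keywords),
            "fields": fields,
        }
    }


# Hypothetical field names, for illustration only.
query = build_keyword_query(
    ["archive", "search"],
    boosted_field="metadata.keywords",
    other_fields=["metadata.description"],
)
print(json.dumps(query, indent=2))
```

The boost is a relative weight applied during scoring, so the final score is not guaranteed to be exactly doubled.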

swh/search/tests/test_search.py
458–467

These comments are outdated. I'll update them.
In the meantime, you may check whether these test cases are sufficient.

This revision is now accepted and ready to land. (Jul 5 2021, 11:54 AM)
KShivendu marked an inline comment as done.
  • Squash
  • Minor polishes

Build is green

Patch application report for D5963 (id=21477)

Rebasing onto 2e1fb86387...

Current branch diff-target is up to date.
Changes applied before test
commit d2c137d31d482439f17420cfa12c2bba05c6a6ec
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jul 2 11:19:08 2021 +0000

    origin_search: Add keyword search for instrinsic_metadata keywords/description

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/188/ for more details.

Build is green

Patch application report for D5963 (id=21478)

Rebasing onto 2e1fb86387...

Current branch diff-target is up to date.
Changes applied before test
commit f378a989e972f1896f4b99fac7e916e4fb133a00
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Fri Jul 2 11:19:08 2021 +0000

    origin_search: Add keyword search for instrinsic_metadata keywords/description
    
    instrinsic_metadata contains keywords and description which are very useful for
    finding desirable origins based on a list of keywords provided by the user.

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/189/ for more details.