
journal_client: Store visit count and last visit date
ClosedPublic

Authored by KShivendu on Jun 7 2021, 5:23 PM.

Details

Summary

swh.storage passes the visit count and visit date for each OriginVisitStatus through swh.journal (Kafka).
These values are good candidates for building filters and sorting. This diff provides the code to
store these values when they are received by the swh.search journal client.

Diff Detail

Repository
rDSEA Archive search
Branch
arcpatch-D5824_2
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 21978
Build 34186: Phabricator diff pipeline on Jenkins
Build 34185: arc lint + arc unit

Event Timeline


Use painless script to atomically merge nb_visit and last_visit_date (max values)
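
For readers following the thread, here is a minimal sketch of the max-merge idea. It is not the script from this diff: the field names `nb_visits` and `last_visit_date`, the Painless source, and the helpers `merge_visit_fields` and `build_update_body` are all assumptions made for illustration. The Python function models the same rule so it can be checked without a cluster.

```python
from typing import Any, Dict

# Illustrative Painless source (an assumption, not the script from the diff).
# A stored value is only overwritten when the incoming one is larger.
# It assumes both params are always provided.
MERGE_SCRIPT = """
if (ctx._source.nb_visits == null
        || ctx._source.nb_visits < params.nb_visits) {
    ctx._source.nb_visits = params.nb_visits;
}
if (ctx._source.last_visit_date == null
        || ctx._source.last_visit_date.compareTo(params.last_visit_date) < 0) {
    ctx._source.last_visit_date = params.last_visit_date;
}
"""


def merge_visit_fields(
    stored: Dict[str, Any], incoming: Dict[str, Any]
) -> Dict[str, Any]:
    """Pure-Python model of the same rule: keep the maximum of each field."""
    merged = dict(stored)
    for field in ("nb_visits", "last_visit_date"):
        new = incoming.get(field)
        old = merged.get(field)
        if new is not None and (old is None or old < new):
            merged[field] = new
    return merged


def build_update_body(incoming: Dict[str, Any]) -> Dict[str, Any]:
    """Body for an Elasticsearch scripted update, with an upsert fallback."""
    return {
        "script": {"lang": "painless", "source": MERGE_SCRIPT, "params": incoming},
        "upsert": incoming,
    }
```

Because the comparison and the write happen inside one scripted update, a late-arriving message with a smaller count should not clobber a larger stored value.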

Could you add tests to check larger values aren't overwritten?

vlorentz requested changes to this revision.Jun 8 2021, 2:17 PM
This revision now requires changes to proceed.Jun 8 2021, 2:17 PM

Add code to test painless script for merging values atomically

Use """ to work around line-length limits when writing long Painless script statements

Build has FAILED

Patch application report for D5824 (id=20850)

Rebasing onto c4d6fed488...

Current branch diff-target is up to date.
Changes applied before test
commit 8c9bf75efb07895e7ec821a359e8bba6136085ed
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 7 15:17:22 2021 +0530

    Store visit count and last visit date

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/115/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/115/console

Use noqa instead of multiple """

Build has FAILED

Patch application report for D5824 (id=20855)

Rebasing onto c4d6fed488...

Current branch diff-target is up to date.
Changes applied before test
commit 4800883d37fa7d434c4fd923046a3fa8f6331441
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 7 15:17:22 2021 +0530

    Store visit count and last visit date

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/116/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/116/console

Could you also add a test in test_search.py making sure nb_visit actually does get updated?

And also add filter and tests for last_visit_date.

vlorentz requested changes to this revision.Jun 9 2021, 1:02 PM
This revision now requires changes to proceed.Jun 9 2021, 1:02 PM

Add tests for last_visit_date

> Could you also add a test in test_search.py making sure nb_visit actually does get updated?

I believe test_origin_nb_visit_update_search already does that.
Lines 211 to 213 in test_search.py cover this case.

Please let me know if you think it's not enough.

> And also add filter and tests for last_visit_date.

Done

  • mypy: Fix errors with release >= v0.900 (commit made by @anlambert. I'm including it so that my builds don't fail)
  • Store visit count and last visit date
  • Add tests for last_visit_date

Could you also add a test in test_search.py making sure nb_visit actually does get updated?

swh/search/interface.py
61–62

It's unclear from the names that they will return visits *after* the given values (rather than equal to them).

Could you rename the parameters and document them in the docstring?

swh/search/tests/test_search.py
236–243

It's hard to spot the difference. Could you make the dates constants and add a comment that the difference is in the years?
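
One way this could look in the test module, as a sketch only: the constant names `DATE_2020` and `DATE_2021` and the chosen day are hypothetical, not taken from test_search.py.

```python
from datetime import datetime, timezone

# The two dates are identical except for the year.
DATE_2020 = datetime(2020, 6, 10, 12, 0, tzinfo=timezone.utc)
DATE_2021 = DATE_2020.replace(year=2021)
```

Deriving the second constant from the first makes it explicit that only the year differs.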

swh/search/interface.py
61–62

Actually, I had planned to support only equality because, at the moment, these two parameters are used only in the tests for the atomic merges, and they will later be replaced by the query parser. In the query, users can specify < or >, and Elasticsearch queries will be generated accordingly. So can I keep it as it is?

> Could you rename the parameters and document them in the docstring?

Sure.

  • Improve documentation + tests and fix painless script
  • use compareTo function instead of <. This should fix the painless script
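
A small Python sketch of why a string comparison can work for dates. It assumes the dates are stored as ISO 8601 strings in one fixed format and timezone, which this thread does not state; `compare_to` is a hypothetical stand-in that only mirrors the sign of a Java-style `compareTo`.

```python
def compare_to(a: str, b: str) -> int:
    """Sign-compatible stand-in for String.compareTo: -1, 0 or 1."""
    return (a > b) - (a < b)


older = "2020-06-10T15:25:25Z"
newer = "2021-06-10T15:25:25Z"

# Keep the stored date unless the incoming one sorts after it.
stored = older
if compare_to(stored, newer) < 0:
    stored = newer
```

Lexicographic order matches chronological order only while both strings share the same layout. Mixing precisions (for example, one value with fractional seconds and one without) or UTC offsets can break that equivalence.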

Build has FAILED

Patch application report for D5824 (id=20913)

Rebasing onto 3e129a3f48...

Current branch diff-target is up to date.
Changes applied before test
commit 7eb2cccb51e69607c254d86f73df3bb2e08e1513
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 16:12:08 2021 +0530

    Improve documentation + tests and fix painless script

commit 6ea5d4df4e541b3f952d5f8169119a9ad916db82
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Wed Jun 9 16:18:43 2021 +0530

    Add tests for last_visit_date

commit 19521597d7bcdbbf59ff25c3c1b257247fc62af8
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 7 15:17:22 2021 +0530

    Store visit count and last visit date

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/121/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/121/console

Build has FAILED

Patch application report for D5824 (id=20915)

Rebasing onto 3e129a3f48...

Current branch diff-target is up to date.
Changes applied before test
commit 9911e8bcee0f0bee683b5c1e52bbb73e4e573b12
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 16:12:08 2021 +0530

    Improve documentation + tests and fix painless script

commit 6ea5d4df4e541b3f952d5f8169119a9ad916db82
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Wed Jun 9 16:18:43 2021 +0530

    Add tests for last_visit_date

commit 19521597d7bcdbbf59ff25c3c1b257247fc62af8
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 7 15:17:22 2021 +0530

    Store visit count and last visit date

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/122/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/122/console

Handle the difference between the ISO formats of Python datetime and Elasticsearch (Painless) using .replace
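
A sketch of the kind of normalization this update describes. The exact strings being replaced are not given in this thread, so the assumption here is that the mismatch is Python's `+00:00` suffix versus a trailing `Z`; `to_es_iso` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_es_iso(dt: datetime) -> str:
    """Render an aware datetime in UTC with a trailing 'Z' instead of '+00:00'."""
    return dt.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
```

For example, `to_es_iso(datetime(2021, 6, 10, 15, 25, 25, tzinfo=timezone.utc))` yields `2021-06-10T15:25:25Z`. Converting to UTC first keeps all stored strings in one timezone, so they stay comparable as plain strings.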

Build has FAILED

Patch application report for D5824 (id=20916)

Rebasing onto 3e129a3f48...

Current branch diff-target is up to date.
Changes applied before test
commit b42a650dda27bc4d490d10f1e7cef7974ec15953
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 16:12:08 2021 +0530

    Improve documentation + tests and fix painless script

commit 6ea5d4df4e541b3f952d5f8169119a9ad916db82
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Wed Jun 9 16:18:43 2021 +0530

    Add tests for last_visit_date

commit 19521597d7bcdbbf59ff25c3c1b257247fc62af8
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 7 15:17:22 2021 +0530

    Store visit count and last visit date

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/123/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/123/console

Build has FAILED

Patch application report for D5824 (id=20935)

Rebasing onto 3e129a3f48...

Current branch diff-target is up to date.
Changes applied before test
commit 4fa01d6af279b75ee953ff2edf96fe8cf75caa43
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 17:18:54 2021 +0000

    Attempt to fix error

commit 4c012e025b300f79ed463507a8da2887a347626e
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 15:25:25 2021 +0000

    journal_client: Store visit count and last visit date
    
    Summary:
    swh.storage passes visit count and visit date for each OriginVisitStatus through swh.journal(kafka).
    These values are good candidates for building filters and sorting. this diff provides the code to
    store these values when they're recieved by the swh.search journal client
    
    Reviewers: vlorentz, vsellier, #reviewers
    
    Subscribers: anlambert, ardumont
    
    Differential Revision: https://forge.softwareheritage.org/D5824

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/124/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/124/console

Build has FAILED

Patch application report for D5824 (id=20937)

Rebasing onto 3e129a3f48...

Current branch diff-target is up to date.
Changes applied before test
commit bfcef57673edde2898b2450c46732332fd2456cb
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 18:00:25 2021 +0000

    Add field type in es

commit 4fa01d6af279b75ee953ff2edf96fe8cf75caa43
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 17:18:54 2021 +0000

    Attempt to fix error

commit 4c012e025b300f79ed463507a8da2887a347626e
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 15:25:25 2021 +0000

    journal_client: Store visit count and last visit date
    
    Summary:
    swh.storage passes visit count and visit date for each OriginVisitStatus through swh.journal(kafka).
    These values are good candidates for building filters and sorting. this diff provides the code to
    store these values when they're recieved by the swh.search journal client
    
    Reviewers: vlorentz, vsellier, #reviewers
    
    Subscribers: anlambert, ardumont
    
    Differential Revision: https://forge.softwareheritage.org/D5824

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/126/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/126/console

Update docstring to fix sphinx warnings

Build is green

Patch application report for D5824 (id=20943)

Rebasing onto 3e129a3f48...

Current branch diff-target is up to date.
Changes applied before test
commit 1555107a272d69ab06a251929cef9e7b0b3037c8
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 15:25:25 2021 +0000

    journal_client: Store visit count and last visit date
    
    Summary:
    swh.storage passes visit count and visit date for each OriginVisitStatus through swh.journal(kafka).
    These values are good candidates for building filters and sorting. this diff provides the code to
    store these values when they're recieved by the swh.search journal client
    
    Reviewers: vlorentz, vsellier, #reviewers
    
    Subscribers: anlambert, ardumont
    
    Differential Revision: https://forge.softwareheritage.org/D5824

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/127/ for more details.

swh/search/interface.py
61–62

The point of having them here is to be able to use them even outside tests while we are working on the query language. Otherwise we would just use origin_dump in tests.

  • Use gte filter instead of equality
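
To illustrate the change from equality to "greater than or equal", here is a hedged sketch that builds Elasticsearch `range` clauses. The parameter names `min_nb_visits` and `min_last_visit_date` and the helper `visit_filters` are assumptions for this example and may not match the signature in interface.py.

```python
from typing import Any, Dict, List, Optional


def visit_filters(
    min_nb_visits: int = 0, min_last_visit_date: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Build `range` clauses that keep documents at or above the minimums."""
    filters: List[Dict[str, Any]] = []
    if min_nb_visits:
        filters.append({"range": {"nb_visits": {"gte": min_nb_visits}}})
    if min_last_visit_date:
        filters.append(
            {"range": {"last_visit_date": {"gte": min_last_visit_date}}}
        )
    return filters


# The clauses can be placed in the filter section of a bool query.
query = {"query": {"bool": {"filter": visit_filters(2, "2021-01-01T00:00:00Z")}}}
```

With `gte`, the parameters are usable outside tests as lower bounds, which is what the review comment above asks for.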

Build has FAILED

Patch application report for D5824 (id=20997)

Rebasing onto f3739ba16d...

First, rewinding head to replay your work on top of it...
Applying: journal_client: Store visit count and last visit date
Using index info to reconstruct a base tree...
M	swh/search/elasticsearch.py
M	swh/search/tests/test_search.py
Falling back to patching base and 3-way merge...
Auto-merging swh/search/tests/test_search.py
Auto-merging swh/search/elasticsearch.py
CONFLICT (content): Merge conflict in swh/search/elasticsearch.py
Patch failed at 0001 journal_client: Store visit count and last visit date

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto f3739ba16d...

Already up to date.
Changes applied before test
commit c2b26194a888310e28f603a028003673278c6707
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 14 15:37:55 2021 +0000

    Use gte filter instead of equality

commit 1555107a272d69ab06a251929cef9e7b0b3037c8
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 15:25:25 2021 +0000

    journal_client: Store visit count and last visit date
    
    Summary:
    swh.storage passes visit count and visit date for each OriginVisitStatus through swh.journal(kafka).
    These values are good candidates for building filters and sorting. this diff provides the code to
    store these values when they're recieved by the swh.search journal client
    
    Reviewers: vlorentz, vsellier, #reviewers
    
    Subscribers: anlambert, ardumont
    
    Differential Revision: https://forge.softwareheritage.org/D5824

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/132/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/132/console

Pull from origin and rebase branch

Build is green

Patch application report for D5824 (id=20998)

Rebasing onto f3739ba16d...

Current branch diff-target is up to date.
Changes applied before test
commit 4bbf80152638648402941a92fe6c0a0b4b1687a7
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 14 15:48:09 2021 +0000

    journal_client: Store visit count and last visit date

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/133/ for more details.

Use gte filter instead of equality

Build has FAILED

Patch application report for D5824 (id=21000)

Rebasing onto f3739ba16d...

Current branch diff-target is up to date.
Changes applied before test
commit aa1b1cdc2c2f405a9277528688a71a440632b1ce
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 14 15:37:55 2021 +0000

    Use gte filter instead of equality

commit e34daab3a35e2bfa364c490461f4d939487465bd
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 15:25:25 2021 +0000

    journal_client: Store visit count and last visit date
    
    Summary:
    swh.storage passes visit count and visit date for each OriginVisitStatus through swh.journal(kafka).
    These values are good candidates for building filters and sorting. this diff provides the code to
    store these values when they're recieved by the swh.search journal client
    
    Reviewers: vlorentz, vsellier, #reviewers
    
    Subscribers: anlambert, ardumont
    
    Differential Revision: https://forge.softwareheritage.org/D5824

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/134/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/134/console

Build has FAILED

Patch application report for D5824 (id=21001)

Rebasing onto f3739ba16d...

Current branch diff-target is up to date.
Changes applied before test
commit 1d181f8b335aded8afe8312c50971a9b5584b64d
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 14 15:37:55 2021 +0000

    Use gte filter instead of equality

commit e34daab3a35e2bfa364c490461f4d939487465bd
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 15:25:25 2021 +0000

    journal_client: Store visit count and last visit date
    
    Summary:
    swh.storage passes visit count and visit date for each OriginVisitStatus through swh.journal(kafka).
    These values are good candidates for building filters and sorting. this diff provides the code to
    store these values when they're recieved by the swh.search journal client
    
    Reviewers: vlorentz, vsellier, #reviewers
    
    Subscribers: anlambert, ardumont
    
    Differential Revision: https://forge.softwareheritage.org/D5824

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/135/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/135/console

Re-adjust position of "noqa" for painless script (after applying dedent)
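
A sketch of the layout this update refers to, with a hypothetical long Painless statement standing in for the real one. It assumes a flake8 version in which a `# noqa` comment on the line that closes a multi-line string applies to every line of that string.

```python
from textwrap import dedent

# Hypothetical long Painless statement; only the layout matters here.
SCRIPT = dedent(
    """\
    if (ctx._source.last_visit_date == null || ctx._source.last_visit_date.compareTo(params.last_visit_date) < 0) {
        ctx._source.last_visit_date = params.last_visit_date;
    }
    """  # noqa
)
```

Once the string is wrapped in `dedent(...)`, the comment has to follow the closing quotes inside the call rather than the closing parenthesis, because the long line belongs to the string token.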

Build has FAILED

Patch application report for D5824 (id=21002)

Rebasing onto f3739ba16d...

Current branch diff-target is up to date.
Changes applied before test
commit bf72257d1f784e58262585d7ddd922eb7bc8f01f
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 14 16:40:55 2021 +0000

    nb_vist -> nb_visits

commit 1d181f8b335aded8afe8312c50971a9b5584b64d
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 14 15:37:55 2021 +0000

    Use gte filter instead of equality

commit e34daab3a35e2bfa364c490461f4d939487465bd
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 15:25:25 2021 +0000

    journal_client: Store visit count and last visit date
    
    Summary:
    swh.storage passes visit count and visit date for each OriginVisitStatus through swh.journal(kafka).
    These values are good candidates for building filters and sorting. this diff provides the code to
    store these values when they're recieved by the swh.search journal client
    
    Reviewers: vlorentz, vsellier, #reviewers
    
    Subscribers: anlambert, ardumont
    
    Differential Revision: https://forge.softwareheritage.org/D5824

Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/136/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/136/console

Match origin_search signatures

Build is green

Patch application report for D5824 (id=21003)

Rebasing onto f3739ba16d...

Current branch diff-target is up to date.
Changes applied before test
commit e95fbbc76ef02999ef0cb4d0c5b0d6042f3bf163
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 14 16:40:55 2021 +0000

    nb_vist -> nb_visits

commit 1d181f8b335aded8afe8312c50971a9b5584b64d
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Mon Jun 14 15:37:55 2021 +0000

    Use gte filter instead of equality

commit e34daab3a35e2bfa364c490461f4d939487465bd
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 15:25:25 2021 +0000

    journal_client: Store visit count and last visit date
    
    Summary:
    swh.storage passes visit count and visit date for each OriginVisitStatus through swh.journal(kafka).
    These values are good candidates for building filters and sorting. this diff provides the code to
    store these values when they're recieved by the swh.search journal client
    
    Reviewers: vlorentz, vsellier, #reviewers
    
    Subscribers: anlambert, ardumont
    
    Differential Revision: https://forge.softwareheritage.org/D5824

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/137/ for more details.

One last change, then you can land this

swh/search/interface.py
74–77
This revision is now accepted and ready to land.Jun 15 2021, 10:14 AM

Fix origin_search() documentation and squash commits

Build is green

Patch application report for D5824 (id=21007)

Rebasing onto f3739ba16d...

Current branch diff-target is up to date.
Changes applied before test
commit 8b2c87f4e1b454a250553f95277dac2b13329ca5
Author: KShivendu <shivendu@iitbhilai.ac.in>
Date:   Thu Jun 10 15:25:25 2021 +0000

    Store nb_visits and last_visit_date
    
    swh.storage passes visit count and visit date for each OriginVisitStatus through swh.journal(kafka).
    These two values are good candidates for filters and the sorting feature so this commit provides the code to
    store these values when they are recieved by the swh.search journal client

See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/139/ for more details.

This revision was automatically updated to reflect the committed changes.