Last revision/release dates are excellent candidates for sorting options and filters. These changes will store these values whenever an OriginVisitStatus with a valid snapshot id is received. The dates are fetched from swh.storage by passing the snapshot id. So these changes also configure swh.storage client in swh.search
Details
- Reviewers
anlambert - Group Reviewers
Reviewers - Commits
- rDSEA9bedaa95a39e: journal_client: Store last revision/release date
Diff Detail
- Repository
- rDSEA Archive search
- Branch
- last-revision-release-date
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 22201 Build 34550: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 34549: arc lint + arc unit
Event Timeline
Build is green
Patch application report for D5883 (id=21114)
Rebasing onto 35570b4284...
First, rewinding head to replay your work on top of it... Applying: Store last_eventful_visit_date Using index info to reconstruct a base tree... M swh/search/elasticsearch.py M swh/search/in_memory.py M swh/search/interface.py M swh/search/journal_client.py M swh/search/tests/test_journal_client.py M swh/search/tests/test_search.py Falling back to patching base and 3-way merge... Auto-merging swh/search/tests/test_search.py Auto-merging swh/search/elasticsearch.py No changes -- Patch already applied. Applying: Add last_revision_daate and last_release_date
Changes applied before test
commit 2bfb347ec05e6a414b7a642a73f6f6eb196e5e2d Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Thu Jun 17 10:39:17 2021 +0000 Add last_revision_daate and last_release_date FRemove duplicate cls param for get_storage Fix failing tests Adjust tests related to previous commit Load swh.storage from config
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/152/ for more details.
Build is green
Patch application report for D5883 (id=21115)
Rebasing onto 35570b4284...
Current branch diff-target is up to date.
Changes applied before test
commit fe74c7e858310cba1232ca9885a38d3940a773e3 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Thu Jun 17 10:39:17 2021 +0000 Add last_revision_daate and last_release_date FRemove duplicate cls param for get_storage Fix failing tests Adjust tests related to previous commit Load swh.storage from config
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/153/ for more details.
swh/search/journal_client.py | ||
---|---|---|
96–134 | storage.snapshot_get_branches does not return all branches, a limit of 1000 is set by default. Also you should merge the fetch_last_*_date method into a single one to avoid iterating on the whole set of branches twice. |
- merge the fetch_last_*_date methods
- Use snapshot_get_all_branches instead of snapshot_get_branches
- Fix diff/commit description/message
- Use in-memory backend in test_cli.py
Build is green
Patch application report for D5883 (id=21128)
Rebasing onto 35570b4284...
Current branch diff-target is up to date.
Changes applied before test
commit 835edc7e5b1efd21d8486b228252a1d1056ebc92 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Thu Jun 17 10:39:17 2021 +0000 Setup storage and store last revision/release date Last revision/release dates are excellent candidates for sorting options and filters. These changes will store these values whenever an OriginVisitStatus with a valid snapshot id is received. The dates are fetched from swh.storage by passing the snapshot id. So these changes also configure swh.storage client in swh.search
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/154/ for more details.
Build has FAILED
Patch application report for D5883 (id=21129)
Rebasing onto 35570b4284...
Current branch diff-target is up to date.
Changes applied before test
commit 7d6c68e37f61409e2002829bdc873084194cc427 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Fri Jun 18 11:23:02 2021 +0000 Include last revision/release date only when available commit 835edc7e5b1efd21d8486b228252a1d1056ebc92 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Thu Jun 17 10:39:17 2021 +0000 Setup storage and store last revision/release date Last revision/release dates are excellent candidates for sorting options and filters. These changes will store these values whenever an OriginVisitStatus with a valid snapshot id is received. The dates are fetched from swh.storage by passing the snapshot id. So these changes also configure swh.storage client in swh.search
Link to build: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/155/
See console output for more information: https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/155/console
swh/search/journal_client.py | ||
---|---|---|
127 | This include last revision/release date only when snapshot_id is available. Is it okay to do this ? |
swh/search/elasticsearch.py | ||
---|---|---|
43 | Should last_release_date be added too here ? | |
308 | no search on min release date ? | |
swh/search/in_memory.py | ||
136 | same here | |
swh/search/interface.py | ||
64 | same here | |
swh/search/journal_client.py | ||
127 | lgtm but you should test for None snapshot at the beginning of the function, this way you can dedent the code for non None snapshot. |
- Add test for fetch_last_revision_release_date in journal_client
- Add missing arguments and tests related to last_release_date
Build is green
Patch application report for D5883 (id=21150)
Rebasing onto 35570b4284...
Current branch diff-target is up to date.
Changes applied before test
commit 426a7d3d72bfd23e6fb547d40ddf43ded1507afe Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Fri Jun 18 15:06:34 2021 +0000 Add test for journal_client and add tests for last_release_date commit 7d6c68e37f61409e2002829bdc873084194cc427 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Fri Jun 18 11:23:02 2021 +0000 Include last revision/release date only when available commit 835edc7e5b1efd21d8486b228252a1d1056ebc92 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Thu Jun 17 10:39:17 2021 +0000 Setup storage and store last revision/release date Last revision/release dates are excellent candidates for sorting options and filters. These changes will store these values whenever an OriginVisitStatus with a valid snapshot id is received. The dates are fetched from swh.storage by passing the snapshot id. So these changes also configure swh.storage client in swh.search
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/156/ for more details.
swh/search/journal_client.py | ||
---|---|---|
93 | import at the toplevel | |
95–123 | and this function definition too | |
swh/search/tests/test_journal_client.py | ||
190–295 | Is this copy-pasted from swh-model's tests? You can import data from there instead. (And if not, move the definitions outside the function) | |
206–217 | why is did the value of snapshot_id change? this doesn't seem related to this diff. |
Build is green
Patch application report for D5883 (id=21153)
Rebasing onto 35570b4284...
Current branch diff-target is up to date.
Changes applied before test
commit 2e6a34b8ed706dc9c5e03a5c062fdfd17e0739a6 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Mon Jun 21 06:11:02 2021 +0000 Polish the code commit e6fbdc54f916b6fd6c2bfed97b2acaeecc77fdb5 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Fri Jun 18 15:20:26 2021 +0000 Throw error in absence of storage config commit 426a7d3d72bfd23e6fb547d40ddf43ded1507afe Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Fri Jun 18 15:06:34 2021 +0000 Add test for journal_client and add tests for last_release_date commit 7d6c68e37f61409e2002829bdc873084194cc427 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Fri Jun 18 11:23:02 2021 +0000 Include last revision/release date only when available commit 835edc7e5b1efd21d8486b228252a1d1056ebc92 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Thu Jun 17 10:39:17 2021 +0000 Setup storage and store last revision/release date Last revision/release dates are excellent candidates for sorting options and filters. These changes will store these values whenever an OriginVisitStatus with a valid snapshot id is received. The dates are fetched from swh.storage by passing the snapshot id. So these changes also configure swh.storage client in swh.search
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/158/ for more details.
swh/search/tests/test_journal_client.py | ||
---|---|---|
23 | Please don't mind this. I'll remove it before final squashing. |
Almost there ! Some minor changes can still be added before landing this. Regarding my comment about the painless script formatting, it can be handled in another diff.
swh/search/elasticsearch.py | ||
---|---|---|
195–264 | The code of that painless script could be improved a little for better readability. Could you add a second commit (or create another diff once this one gets landed) formatting the code in a way like below: update_script = dedent( """ // utility function to get and parse date ZonedDateTime getDate(def ctx, String date_field) { String default_date = "0001-01-01T00:00:00Z"; String date = ctx._source.getOrDefault(date_field, default_date); return ZonedDateTime.parse(date); } // backup current visit_types field value List visit_types = ctx._source.getOrDefault("visit_types", []); int nb_visits = ctx._source.getOrDefault("nb_visits", 0); ZonedDateTime last_visit_date = getDate(ctx, "last_visit_date"); String snapshot_id = ctx._source.getOrDefault("snapshot_id", ""); ZonedDateTime last_eventful_visit_date = getDate(ctx, "last_eventful_visit_date"); ZonedDateTime last_revision_date = getDate(ctx, "last_revision_date"); ZonedDateTime last_release_date = getDate(ctx, "last_release_date"); // update origin document with new field values ctx._source.putAll(params); // restore previous visit types after visit_types field overriding if (ctx._source.containsKey("visit_types")) { for (int i = 0; i < visit_types.length; ++i) { if (!ctx._source.visit_types.contains(visit_types[i])) { ctx._source.visit_types.add(visit_types[i]); } } } // Undo overwrite if incoming nb_visits is smaller if (ctx._source.containsKey("nb_visits")) { int incoming_nb_visits = ctx._source.getOrDefault("nb_visits", 0); if(incoming_nb_visits < nb_visits){ ctx._source.nb_visits = nb_visits; } } // Undo overwrite if incoming last_visit_date is older if (ctx._source.containsKey("last_visit_date")) { ZonedDateTime incoming_last_visit_date = getDate(ctx, "last_visit_date"); int difference = // returns -1, 0 or 1 incoming_last_visit_date.compareTo(last_visit_date); if(difference < 0){ ctx._source.last_visit_date = last_visit_date; } } // Undo update of last_eventful_date and snapshot_id if // snapshot_id hasn't changed OR incoming_last_eventful_visit_date is older if (ctx._source.containsKey("snapshot_id")) { String incoming_snapshot_id = ctx._source.getOrDefault("snapshot_id", ""); ZonedDateTime incoming_last_eventful_visit_date = getDate(ctx, "last_eventful_visit_date"); int difference = // returns -1, 0 or 1 incoming_last_eventful_visit_date.compareTo(last_eventful_visit_date); if(snapshot_id == incoming_snapshot_id || difference < 0){ ctx._source.snapshot_id = snapshot_id; ctx._source.last_eventful_visit_date = last_eventful_visit_date; } } // Undo overwrite if incoming last_revision_date is older if (ctx._source.containsKey("last_revision_date")) { ZonedDateTime incoming_last_revision_date = getDate(ctx, "last_revision_date"); int difference = // returns -1, 0 or 1 incoming_last_revision_date.compareTo(last_revision_date); if(difference < 0){ ctx._source.last_revision_date = last_revision_date; } } // Undo overwrite if incoming last_release_date is older if (ctx._source.containsKey("last_release_date")) { ZonedDateTime incoming_last_release_date = getDate(ctx, "last_release_date"); // returns -1, 0 or 1 int difference = incoming_last_release_date.compareTo(last_release_date); if(difference < 0){ ctx._source.last_release_date = last_release_date; } } """ | |
swh/search/tests/test_journal_client.py | ||
55 | you can remove the id parameter, it is automatically computed by the model. | |
71 | same here and below for all models. | |
swh/search/tests/test_search.py | ||
363–441 | Both tests are quite similar, you should wrap the code in a private function (taking a parameter indicating if we are testing revision or release) that will be called by those. |
Build is green
Patch application report for D5883 (id=21213)
Rebasing onto 35570b4284...
Current branch diff-target is up to date.
Changes applied before test
commit 790833f7b0348c0574d820e8e31d11587cc0295c Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Wed Jun 23 04:25:27 2021 +0000 Improve code quality commit 4abac24fd5e6742a5e1b2e0009f750d5eb4976a5 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Mon Jun 21 16:51:59 2021 +0000 Remove unnecessary comment commit 2e6a34b8ed706dc9c5e03a5c062fdfd17e0739a6 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Mon Jun 21 06:11:02 2021 +0000 Polish the code commit e6fbdc54f916b6fd6c2bfed97b2acaeecc77fdb5 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Fri Jun 18 15:20:26 2021 +0000 Throw error in absence of storage config commit 426a7d3d72bfd23e6fb547d40ddf43ded1507afe Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Fri Jun 18 15:06:34 2021 +0000 Add test for journal_client and add tests for last_release_date commit 7d6c68e37f61409e2002829bdc873084194cc427 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Fri Jun 18 11:23:02 2021 +0000 Include last revision/release date only when available commit 835edc7e5b1efd21d8486b228252a1d1056ebc92 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Thu Jun 17 10:39:17 2021 +0000 Setup storage and store last revision/release date Last revision/release dates are excellent candidates for sorting options and filters. These changes will store these values whenever an OriginVisitStatus with a valid snapshot id is received. The dates are fetched from swh.storage by passing the snapshot id. So these changes also configure swh.storage client in swh.search
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/159/ for more details.
Build is green
Patch application report for D5883 (id=21214)
Rebasing onto 35570b4284...
Current branch diff-target is up to date.
Changes applied before test
commit b4c804a3d26f08f22f98461a84d7d9382a1cc3a1 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Wed Jun 23 04:25:27 2021 +0000 Improve code quality commit 4abac24fd5e6742a5e1b2e0009f750d5eb4976a5 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Mon Jun 21 16:51:59 2021 +0000 Remove unnecessary comment commit 2e6a34b8ed706dc9c5e03a5c062fdfd17e0739a6 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Mon Jun 21 06:11:02 2021 +0000 Polish the code commit e6fbdc54f916b6fd6c2bfed97b2acaeecc77fdb5 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Fri Jun 18 15:20:26 2021 +0000 Throw error in absence of storage config commit 426a7d3d72bfd23e6fb547d40ddf43ded1507afe Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Fri Jun 18 15:06:34 2021 +0000 Add test for journal_client and add tests for last_release_date commit 7d6c68e37f61409e2002829bdc873084194cc427 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Fri Jun 18 11:23:02 2021 +0000 Include last revision/release date only when available commit 835edc7e5b1efd21d8486b228252a1d1056ebc92 Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Thu Jun 17 10:39:17 2021 +0000 Setup storage and store last revision/release date Last revision/release dates are excellent candidates for sorting options and filters. These changes will store these values whenever an OriginVisitStatus with a valid snapshot id is received. The dates are fetched from swh.storage by passing the snapshot id. So these changes also configure swh.storage client in swh.search
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/160/ for more details.
Is it okay if I add .vscode in .gitignore. It often gets included by mistake. swh-indexer, swh-storage and swh-web already have it in their .gitignore files.
Is it okay if I add .vscode in .gitignore. It often gets included by mistake. swh-indexer, swh-storage and swh-web already have it in their .gitignore files.
I guess it's fine if it's already in other modules.
And we should also add it in the .gitignore template repository [1].
Can you also please also take care of updating it at the same time?
[1] https://forge.softwareheritage.org/source/swh-py-template/
Indeed, this makes sense. Visual Studio Code is quite a popular text editor nowadays (I also use it).
@KShivendu, code looks good to me now. Before landing this, you should squash all commits into a single one and improve commit message style following that guide.
Also @vlorentz and @vsellier should validate the review too as they are your official GSOC mentors.
swh/search/tests/test_search.py | ||
---|---|---|
364 | Nitpick: could you rename the type parameter to date_type ? |
Build is green
Patch application report for D5883 (id=21226)
Rebasing onto 35570b4284...
Current branch diff-target is up to date.
Changes applied before test
commit b4675a3a1babbf257ded46d42c3d3757da3456ee Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Thu Jun 17 10:39:17 2021 +0000 journal_client: Store last revision/release date Last revision/release dates are excellent candidates for sorting options and filters. These changes will store these values whenever an OriginVisitStatus with a valid snapshot id is received. The dates are fetched from swh.storage using the snapshot_id. So these changes also configure swh.storage client in swh.search
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/161/ for more details.
Ok then. Before I can accept this, please wrap your commit message body to 80 chars max.
Something like:
journal_client: Store last revision/release date Last revision/release dates are excellent candidates for sorting options and filters. These changes store these values whenever an OriginVisitStatus with a valid snapshot id is received. The dates are fetched from swh.storage so these changes also configure swh.storage client in swh.search.
Build is green
Patch application report for D5883 (id=21228)
Rebasing onto 35570b4284...
Current branch diff-target is up to date.
Changes applied before test
commit e16952f22eda3ee5ad04f6d73ba089d56c0502fb Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Thu Jun 17 10:39:17 2021 +0000 journal_client: Store last revision/release date Last revision/release dates are excellent candidates for sorting options and filters. These changes store these values whenever an OriginVisitStatus with a valid snapshot id is received. The dates are fetched from swh.storage so these changes also configure swh.storage client in swh.search.
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/162/ for more details.
Build is green
Patch application report for D5883 (id=21235)
Rebasing onto 35570b4284...
Current branch diff-target is up to date.
Changes applied before test
commit 9bedaa95a39ed8781b3d665346e6ccc62066445b Author: KShivendu <shivendu@iitbhilai.ac.in> Date: Thu Jun 17 10:39:17 2021 +0000 journal_client: Store last revision/release date Last revision/release dates are excellent candidates for sorting options and filters. These changes store these values whenever an OriginVisitStatus with a valid snapshot id is received. The dates are fetched from swh.storage so these changes also configure swh.storage client in swh.search.
See https://jenkins.softwareheritage.org/job/DSEA/job/tests-on-diff/163/ for more details.