buffer: Don't send data from empty buffers
filter: do not call the underlying functions if there's nothing to add
filter: add filtering for release_add
Details
- Reviewers
douardda ardumont - Group Reviewers
Reviewers - Maniphest Tasks
- T3625: Reduce git loader memory footprint
- Commits
- rDSTOabe95b34a2f9: filter: add filtering for release_add
rDSTOc52b7b667911: filter: do not call the underlying functions if there's nothing to add
rDSTO5d5d4c941eac: buffer: Ensure that we don't send data from empty buffers
Diff Detail
- Repository
- rDSTO Storage manager
- Branch
- master
- Lint
No Linters Available - Unit
No Unit Test Coverage - Build Status
Buildable 24267 Build 37876: Phabricator diff pipeline on jenkins Jenkins console · Jenkins Build 37875: arc lint + arc unit
| Time | Test | |
|---|---|---|
| 4 ms | Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_filter::test_filtering_proxy_storage_content swh_storage = <swh.storage.proxies.filter.FilteringProxyStorage object at 0x7f2dfd3c49e8>
sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f2dfd39f908>
| |
| 2 ms | Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_filter::test_filtering_proxy_storage_directory swh_storage = <swh.storage.proxies.filter.FilteringProxyStorage object at 0x7f2dfd3c4780>
sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f2dfd3bfe10>
| |
| 3 ms | Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_filter::test_filtering_proxy_storage_revision swh_storage = <swh.storage.proxies.filter.FilteringProxyStorage object at 0x7f2dff466da0>
sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f2dfd3be358>
| |
| 3 ms | Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_filter::test_filtering_proxy_storage_skipped_content swh_storage = <swh.storage.proxies.filter.FilteringProxyStorage object at 0x7f2dfd39f748>
sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f2dfd394f28>
| |
| 2 ms | Jenkins > .tox.py3.lib.python3.7.site-packages.swh.storage.tests.test_filter::test_filtering_proxy_storage_skipped_content_missing_sha1_git swh_storage = <swh.storage.proxies.filter.FilteringProxyStorage object at 0x7f2dfd3c10f0>
sample_data = <swh.storage.tests.storage_data.StorageData object at 0x7f2dfd3c0da0>
| |
| View Full Test Results (5 Failed · 1,098 Passed · 40 Skipped) |
Event Timeline
Build has FAILED
Patch application report for D6427 (id=23351)
Rebasing onto 113088ab06...
Current branch diff-target is up to date.
Changes applied before test
commit 9fdeb676868b6c927922a31f3ab72f6a3dae5690
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:30:56 2021 +0200
buffer: add some debug logging for number of objects sent
commit ebbeff524bff6b37e55fd8b791f62bf7cfe0d195
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:26:04 2021 +0200
filter: add filtering for release_add
commit 2f882f56aa92efb45b97f55094ff26f8ab55db78
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:25:21 2021 +0200
filter: do not call the underlying functions if there's nothing to add
commit 2959c474e8af5e6c0e986207673716b10d6aef87
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:21:26 2021 +0200
buffer: Don't send data from empty buffersLink to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1434/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1434/console
Build has FAILED
Patch application report for D6427 (id=23353)
Rebasing onto 113088ab06...
Current branch diff-target is up to date.
Changes applied before test
commit 404e5a3bf893a7a626452e90f2f8ae5f9ab7c7dc
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:30:56 2021 +0200
buffer: add some debug logging for number of objects sent
commit 4129cd0d9fba6d31ee1f63a04a5dfe1cac4b70c3
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:26:04 2021 +0200
filter: add filtering for release_add
commit 15029c871061231ba136e7a07fac3e8f6816b87c
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:25:21 2021 +0200
filter: do not call the underlying functions if there's nothing to add
commit 2959c474e8af5e6c0e986207673716b10d6aef87
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:21:26 2021 +0200
buffer: Don't send data from empty buffersLink to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1435/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1435/console
I've triggered a run in staging worker (patched venv with this) on NixOS/nixpkgs.
It's still ongoing.
But i'm already seeing it's doing the right thing about few directories
having lots of directory entries (new threshold introduced here helped).
DEBUG:swh.storage.proxies.buffer:Flushing 46 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 202727 directory entries DEBUG:swh.storage.proxies.buffer:Flushing 46 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 203098 directory entries DEBUG:swh.storage.proxies.buffer:Flushing 46 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 203210 directory entries DEBUG:swh.storage.proxies.buffer:Flushing 1000 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 95783 directory entries DEBUG:swh.storage.proxies.buffer:Flushing 1000 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 21288 directory entries DEBUG:swh.storage.proxies.buffer:Flushing 1000 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 152126 directory entries DEBUG:swh.storage.proxies.buffer:Flushing 327 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 200418 directory entries
It's not complete though, it's currently failing beyond revision.
Probably the new stuff about release.
DEBUG:swh.storage.proxies.buffer:Flushing 67 objects of type revision
DEBUG:swh.storage.proxies.buffer:Flushing 30 objects of type release
ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status
Traceback (most recent call last):
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/core/loader.py", line 339, in load
self.store_data()
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/core/loader.py", line 467, in store_data
self.flush()
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/core/loader.py", line 168, in flush
self.storage.flush()
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 193, in flush
stats = add_fn(list(batch))
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 84, in release_add
missing_ids = self._filter_missing_ids("release", (r.id for r in releases))
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 142, in _filter_missing_ids
fn = fn_by_object_type[object_type]
KeyError: 'release'
DEBUG:swh.storage.proxies.buffer:Flushing 67 objects of type revision
DEBUG:swh.storage.proxies.buffer:Flushing 30 objects of type release
Traceback (most recent call last):
File "/home/swhworker/profiling/bin/swh", line 8, in <module>
sys.exit(main())
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/core/cli/__init__.py", line 185, in main
return swh(auto_envvar_prefix="SWH")
File "/home/swhworker/profiling/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/swhworker/profiling/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/swhworker/profiling/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/swhworker/profiling/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/swhworker/profiling/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/swhworker/profiling/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/swhworker/profiling/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/cli.py", line 105, in run
result = loader.load()
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/core/loader.py", line 382, in load
self.flush()
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/core/loader.py", line 168, in flush
self.storage.flush()
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 193, in flush
stats = add_fn(list(batch))
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 84, in release_add
missing_ids = self._filter_missing_ids("release", (r.id for r in releases))
File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 142, in _filter_missing_ids
fn = fn_by_object_type[object_type]
KeyError: 'release'| swh/storage/proxies/filter.py | ||
|---|---|---|
| 37 | yes, @olasd, regarding my last comment [1], adding release here should fix the problem. [1] D6427#167010 | |
| swh/storage/proxies/filter.py | ||
|---|---|---|
| 37 | Ah, yeah, duh. Time to clean this up and make sure everything is tested... | |
| swh/storage/proxies/buffer.py | ||
|---|---|---|
| 177–187 | small nitpick | |
Update tests for the following changes:
- buffer: Ensure that we don't send data from empty buffers
- filter: do not call the underlying functions if there's nothing to add
- filter: add filtering for release_add
I'll split off the new buffer thresholds in a new diff. This diff now only contains the (small) improvements to the buffer/filter proxies
Build was aborted
Patch application report for D6427 (id=23399)
Rebasing onto 113088ab06...
Current branch diff-target is up to date.
Changes applied before test
commit abe95b34a2f9d8e16b16de7f406c61d16d753339
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:26:04 2021 +0200
filter: add filtering for release_add
commit c52b7b66791166a80239710095764abfd9609c65
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:25:21 2021 +0200
filter: do not call the underlying functions if there's nothing to add
commit 5d5d4c941eacf622de42b4454baa32efa3207d45
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:21:26 2021 +0200
buffer: Ensure that we don't send data from empty buffers
This was already the case (as grouper called on an empty iterator just
returns no batches), but add a test to enforce it.Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1442/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1442/console
Build is green
Patch application report for D6427 (id=23399)
Rebasing onto 113088ab06...
Current branch diff-target is up to date.
Changes applied before test
commit abe95b34a2f9d8e16b16de7f406c61d16d753339
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:26:04 2021 +0200
filter: add filtering for release_add
commit c52b7b66791166a80239710095764abfd9609c65
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:25:21 2021 +0200
filter: do not call the underlying functions if there's nothing to add
commit 5d5d4c941eacf622de42b4454baa32efa3207d45
Author: Nicolas Dandrimont <nicolas@dandrimont.eu>
Date: Wed Oct 6 18:21:26 2021 +0200
buffer: Ensure that we don't send data from empty buffers
This was already the case (as grouper called on an empty iterator just
returns no batches), but add a test to enforce it.See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1443/ for more details.