buffer: Don't send data from empty buffers
filter: do not call the underlying functions if there's nothing to add
filter: add filtering for release_add
Details
- Reviewers
douardda ardumont - Group Reviewers
Reviewers - Maniphest Tasks
- T3625: Reduce git loader memory footprint
- Commits
- rDSTOabe95b34a2f9: filter: add filtering for release_add
rDSTOc52b7b667911: filter: do not call the underlying functions if there's nothing to add
rDSTO5d5d4c941eac: buffer: Ensure that we don't send data from empty buffers
Diff Detail
- Repository
- rDSTO Storage manager
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Event Timeline
Build has FAILED
Patch application report for D6427 (id=23351)
Rebasing onto 113088ab06...
Current branch diff-target is up to date.
Changes applied before test
commit 9fdeb676868b6c927922a31f3ab72f6a3dae5690 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:30:56 2021 +0200 buffer: add some debug logging for number of objects sent commit ebbeff524bff6b37e55fd8b791f62bf7cfe0d195 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:26:04 2021 +0200 filter: add filtering for release_add commit 2f882f56aa92efb45b97f55094ff26f8ab55db78 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:25:21 2021 +0200 filter: do not call the underlying functions if there's nothing to add commit 2959c474e8af5e6c0e986207673716b10d6aef87 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:21:26 2021 +0200 buffer: Don't send data from empty buffers
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1434/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1434/console
Build has FAILED
Patch application report for D6427 (id=23353)
Rebasing onto 113088ab06...
Current branch diff-target is up to date.
Changes applied before test
commit 404e5a3bf893a7a626452e90f2f8ae5f9ab7c7dc Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:30:56 2021 +0200 buffer: add some debug logging for number of objects sent commit 4129cd0d9fba6d31ee1f63a04a5dfe1cac4b70c3 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:26:04 2021 +0200 filter: add filtering for release_add commit 15029c871061231ba136e7a07fac3e8f6816b87c Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:25:21 2021 +0200 filter: do not call the underlying functions if there's nothing to add commit 2959c474e8af5e6c0e986207673716b10d6aef87 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:21:26 2021 +0200 buffer: Don't send data from empty buffers
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1435/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1435/console
I've triggered a run in staging worker (patched venv with this) on NixOS/nixpkgs.
It's still ongoing.
But i'm already seeing it's doing the right thing about few directories
having lots of directory entries (new threshold introduced here helped).
DEBUG:swh.storage.proxies.buffer:Flushing 46 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 202727 directory entries DEBUG:swh.storage.proxies.buffer:Flushing 46 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 203098 directory entries DEBUG:swh.storage.proxies.buffer:Flushing 46 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 203210 directory entries DEBUG:swh.storage.proxies.buffer:Flushing 1000 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 95783 directory entries DEBUG:swh.storage.proxies.buffer:Flushing 1000 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 21288 directory entries DEBUG:swh.storage.proxies.buffer:Flushing 1000 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 152126 directory entries DEBUG:swh.storage.proxies.buffer:Flushing 327 objects of type directory DEBUG:swh.storage.proxies.buffer:Flushed 200418 directory entries
It's not complete though, it's currently failing beyond revision.
Probably the new stuff about release.
DEBUG:swh.storage.proxies.buffer:Flushing 67 objects of type revision DEBUG:swh.storage.proxies.buffer:Flushing 30 objects of type release ERROR:swh.loader.git.loader.GitLoader:Loading failure, updating to `failed` status Traceback (most recent call last): File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/core/loader.py", line 339, in load self.store_data() File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/core/loader.py", line 467, in store_data self.flush() File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/core/loader.py", line 168, in flush self.storage.flush() File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 193, in flush stats = add_fn(list(batch)) File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 84, in release_add missing_ids = self._filter_missing_ids("release", (r.id for r in releases)) File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 142, in _filter_missing_ids fn = fn_by_object_type[object_type] KeyError: 'release' DEBUG:swh.storage.proxies.buffer:Flushing 67 objects of type revision DEBUG:swh.storage.proxies.buffer:Flushing 30 objects of type release Traceback (most recent call last): File "/home/swhworker/profiling/bin/swh", line 8, in <module> sys.exit(main()) File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/core/cli/__init__.py", line 185, in main return swh(auto_envvar_prefix="SWH") File "/home/swhworker/profiling/lib/python3.7/site-packages/click/core.py", line 829, in __call__ return self.main(*args, **kwargs) File "/home/swhworker/profiling/lib/python3.7/site-packages/click/core.py", line 782, in main rv = self.invoke(ctx) File "/home/swhworker/profiling/lib/python3.7/site-packages/click/core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/swhworker/profiling/lib/python3.7/site-packages/click/core.py", line 1259, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/swhworker/profiling/lib/python3.7/site-packages/click/core.py", line 1066, in invoke return ctx.invoke(self.callback, **ctx.params) File "/home/swhworker/profiling/lib/python3.7/site-packages/click/core.py", line 610, in invoke return callback(*args, **kwargs) File "/home/swhworker/profiling/lib/python3.7/site-packages/click/decorators.py", line 21, in new_func return f(get_current_context(), *args, **kwargs) File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/cli.py", line 105, in run result = loader.load() File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/core/loader.py", line 382, in load self.flush() File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/core/loader.py", line 168, in flush self.storage.flush() File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/storage/proxies/buffer.py", line 193, in flush stats = add_fn(list(batch)) File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 84, in release_add missing_ids = self._filter_missing_ids("release", (r.id for r in releases)) File "/home/swhworker/profiling/lib/python3.7/site-packages/swh/storage/proxies/filter.py", line 142, in _filter_missing_ids fn = fn_by_object_type[object_type] KeyError: 'release'
swh/storage/proxies/filter.py | ||
---|---|---|
37 ↗ | (On Diff #23353) | yes, @olasd, regarding my last comment [1], adding release here should fix the problem. [1] D6427#167010 |
swh/storage/proxies/filter.py | ||
---|---|---|
37 ↗ | (On Diff #23353) | Ah, yeah, duh. Time to clean this up and make sure everything is tested... |
swh/storage/proxies/buffer.py | ||
---|---|---|
178–188 ↗ | (On Diff #23353) | small nitpick |
Update tests for the following changes:
- buffer: Ensure that we don't send data from empty buffers
- filter: do not call the underlying functions if there's nothing to add
- filter: add filtering for release_add
I'll split off the new buffer thresholds in a new diff. This diff now only contains the (small) improvements to the buffer/filter proxies
Build was aborted
Patch application report for D6427 (id=23399)
Rebasing onto 113088ab06...
Current branch diff-target is up to date.
Changes applied before test
commit abe95b34a2f9d8e16b16de7f406c61d16d753339 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:26:04 2021 +0200 filter: add filtering for release_add commit c52b7b66791166a80239710095764abfd9609c65 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:25:21 2021 +0200 filter: do not call the underlying functions if there's nothing to add commit 5d5d4c941eacf622de42b4454baa32efa3207d45 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:21:26 2021 +0200 buffer: Ensure that we don't send data from empty buffers This was already the case (as grouper called on an empty iterator just returns no batches), but add a test to enforce it.
Link to build: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1442/
See console output for more information: https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1442/console
Build is green
Patch application report for D6427 (id=23399)
Rebasing onto 113088ab06...
Current branch diff-target is up to date.
Changes applied before test
commit abe95b34a2f9d8e16b16de7f406c61d16d753339 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:26:04 2021 +0200 filter: add filtering for release_add commit c52b7b66791166a80239710095764abfd9609c65 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:25:21 2021 +0200 filter: do not call the underlying functions if there's nothing to add commit 5d5d4c941eacf622de42b4454baa32efa3207d45 Author: Nicolas Dandrimont <nicolas@dandrimont.eu> Date: Wed Oct 6 18:21:26 2021 +0200 buffer: Ensure that we don't send data from empty buffers This was already the case (as grouper called on an empty iterator just returns no batches), but add a test to enforce it.
See https://jenkins.softwareheritage.org/job/DSTO/job/tests-on-diff/1443/ for more details.