Advanced Search
Oct 18 2022
Passing the --single-branch option to git clone does not result in a timeout:
$ git clone https://forge.softwareheritage.org/source/swh-graph.git
Cloning into 'swh-graph'...
error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504
fatal: the remote end hung up unexpectedly
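For reference, a minimal sketch of building a single-branch clone command from Python. The helper name single_branch_clone_cmd and its branch parameter are hypothetical; only the git flags --single-branch and --branch are real git clone options. The clone itself is left commented out because it needs network access.

```python
from typing import List, Optional


def single_branch_clone_cmd(url: str, branch: Optional[str] = None) -> List[str]:
    """Build the argv for a git clone limited to a single branch.

    Hypothetical helper for illustration only.
    """
    cmd = ["git", "clone", "--single-branch"]
    if branch is not None:
        # Without --branch, git clones the branch the remote HEAD points at.
        cmd += ["--branch", branch]
    cmd.append(url)
    return cmd


cmd = single_branch_clone_cmd(
    "https://forge.softwareheritage.org/source/swh-graph.git"
)
# To actually run it (requires network access):
# import subprocess; subprocess.run(cmd, check=True)
```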
Parametrize test with extra loader arguments
Parametrize tests with extra loader arguments
Parametrize tests with extra loader arguments
@vsellier, I landed all optimizations for the CVS loader and tagged a new version v0.5.0 so you can retry the NetBSD repository loading on staging.
Looks good to me.
In D8669#225737, @vlorentz wrote: apidoc links are sometimes useful, e.g. https://docs.softwareheritage.org/devel/apidoc/swh.lister.crates.html documents the lister's design
Oct 17 2022
Add assert fallback
Bump swh.model
Remove double pasted line
In D8682#226232, @vlorentz wrote: That's a surprisingly small diff for such a change, nice!
What speedup do you get with this?
Update: s/cpan-module-json/cpan-release-json/
Update:
- use hash builtin instead of adding a new hash_to_int method
- update tests
In D8652#226221, @vlorentz wrote: Please update https://docs.softwareheritage.org/devel/swh-storage/extrinsic-metadata-specification.html#extrinsic-metadata-formats when landing this
Rebase
Oct 14 2022
In D8682#226178, @swh-public-ci wrote: Build has FAILED
Patch application report for D8682 (id=31362)
Rebasing onto 965c3de498...
Current branch diff-target is up to date.
Changes applied before test
commit b47790a2c8260e5b4e1c3ef8981a76db6563c139
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Oct 13 18:00:37 2022 +0200

    loader: Yield only modified objects in process_cvs_changesets

    Previously, after each revision replay all files and directories of
    the CVS repository being loaded were collected and sent to the storage.
    This is a real bottleneck in terms of loading performance as it
    delegates the filtering of new objects to archive to the storage
    filtering proxy.

    As we know exactly the set of paths that have been modified in a CVS
    revision, prefer to do that filtering on the loader side and only send
    modified objects to storage instead of the whole set of contents and
    directories from the reconstructed filesystem.

    This should greatly improve loading performance for large repositories
    but also reduce loader memory consumption.

commit 76a19ee665b39e6ec31399d1c814b95264b26912
Author: Antoine Lambert <anlambert@softwareheritage.org>
Date:   Thu Oct 13 17:30:51 2022 +0200

    loader: Reconstruct repo filesystem incrementally at each revision

    Instead of creating a from_disk.Directory instance after each replayed
    CVS revision by recursively scanning all directories of the repository,
    prefer to have a single one as class member kept synchronized with the
    reconstructed filesystem after each revision replay.

    This should improve loader in terms of performance, especially when
    dealing with large repositories.

Link to build: https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/134/
See console output for more information: https://jenkins.softwareheritage.org/job/DLDCVS/job/tests-on-diff/134/console
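The idea described in the two commits above can be illustrated with a small sketch. This is not the CVS loader's code: the Filesystem mapping, the objects_to_send helper and the sample revisions are all hypothetical stand-ins. It only shows one in-memory tree being updated incrementally, with just the paths touched by each revision being emitted.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical stand-in for the reconstructed repository filesystem:
# a mapping from file path to file content.
Filesystem = Dict[str, bytes]


def objects_to_send(
    fs: Filesystem, modified_paths: Set[str]
) -> Iterator[Tuple[str, bytes]]:
    """Yield only the (path, content) pairs touched by the current revision."""
    for path in sorted(modified_paths):
        if path in fs:  # a path removed by the revision has nothing to send
            yield path, fs[path]


# Made-up revisions: each maps the paths it modifies to their new content.
revisions: List[Dict[str, bytes]] = [
    {"README": b"v1", "src/main.c": b"int main;"},
    {"README": b"v2"},
]

fs: Filesystem = {}
sent_per_revision = []
for changes in revisions:
    fs.update(changes)  # incremental update instead of a full rescan
    sent_per_revision.append(list(objects_to_send(fs, set(changes))))
```

In this toy run the second revision sends a single object even though the tree holds two files, which is the saving the commit messages aim for.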
Use from_disk.Directory.collect to get added/modified objects instead of maintaining a set of paths.
In D8682#226119, @anlambert wrote: In D8682#226118, @olasd wrote: In D8682#226117, @olasd wrote: swh.model.from_disk.Directory has a collect method which is supposed to do the change tracking by itself (it only returns the nodes that have changed since the last time .collect() was called). This should allow you to drop the modified_paths tracking altogether.
Ah, collect uses get_data which yields a bunch of dicts. Meh. That should probably be updated to just yield the nodes themselves.
Oh nice, I did not know we already had such a feature in swh-model. I will try to use it and adapt the implementation if needed.
In D8682#226118, @olasd wrote: In D8682#226117, @olasd wrote: swh.model.from_disk.Directory has a collect method which is supposed to do the change tracking by itself (it only returns the nodes that have changed since the last time .collect() was called). This should allow you to drop the modified_paths tracking altogether.
Ah, collect uses get_data which yields a bunch of dicts. Meh. That should probably be updated to just yield the nodes themselves.
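The change-tracking behaviour discussed in this thread (return only what changed since the previous collect call) can be sketched with a toy class. TrackedTree is hypothetical and does not reproduce the swh.model.from_disk.Directory API or its return types; it only illustrates why such a method makes a separate modified_paths set unnecessary.

```python
from typing import Dict, List, Set


class TrackedTree:
    """Toy tree (hypothetical, not swh.model) that remembers changed entries."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}
        self._dirty: Set[str] = set()

    def __setitem__(self, path: str, data: bytes) -> None:
        # Only mark the path as changed when its content really differs.
        if self._entries.get(path) != data:
            self._entries[path] = data
            self._dirty.add(path)

    def collect(self) -> List[str]:
        """Return paths changed since the previous collect() call, then reset."""
        changed = sorted(self._dirty)
        self._dirty.clear()
        return changed


tree = TrackedTree()
tree["a"] = b"1"
tree["b"] = b"2"
first = tree.collect()   # both entries are new

tree["a"] = b"1"         # same content, not a change
tree["b"] = b"3"         # real modification
second = tree.collect()

third = tree.collect()   # nothing changed in between
```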
Oct 13 2022
In T4625#93225, @vsellier wrote:
Rebase
Remove unused test archive
Use endswith