Add tree diffing in HgLoaderFromDisk
Closed, Public

Authored by acezar on Fri, Nov 20, 10:50 AM.

Details

Summary

Add tree diffing in HgLoaderFromDisk

By looking at the differences between revisions, the repository tree is
updated rather than fully rebuilt for each one.

Observed load time improvement on https://www.mercurial-scm.org/repo/hg/
1:11:02 -> 47:58
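
The idea in the summary can be sketched roughly as follows. This is only an illustration, not the code from this revision: the tree is modelled as nested dicts, and the names `set_file`, `remove_file` and `apply_diff` are hypothetical helpers. It assumes the difference between two revisions is available as a mapping of added or modified paths plus a list of removed paths.

```python
from typing import Dict, Iterable


def set_file(tree: dict, path: str, content: bytes) -> None:
    """Add or overwrite one file, creating parent directories as needed."""
    *parents, name = path.split("/")
    node = tree
    for part in parents:
        node = node.setdefault(part, {})
    node[name] = content


def remove_file(tree: dict, path: str) -> None:
    """Remove one file, then drop parent directories that became empty."""
    *parents, name = path.split("/")
    stack = [tree]
    for part in parents:
        stack.append(stack[-1][part])
    del stack[-1][name]
    # Walk back from the deepest directory towards the root and stop at
    # the first directory that still has entries.
    for part, parent in zip(reversed(parents), reversed(stack[:-1])):
        if parent[part]:
            break
        del parent[part]


def apply_diff(
    tree: dict, added_or_modified: Dict[str, bytes], removed: Iterable[str]
) -> None:
    """Update the previous revision's tree in place (hypothetical helper)."""
    for path in removed:
        remove_file(tree, path)
    for path, content in added_or_modified.items():
        set_file(tree, path, content)
```

Only the paths touched by a revision are visited, so the cost per revision follows the size of the diff rather than the size of the whole tree.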

Diff Detail

Repository
rDLDHG Mercurial loader
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

acezar created this revision. Fri, Nov 20, 10:50 AM

Build is green

Patch application report for D4540 (id=16094)

Could not rebase; Attempt merge onto bd914dec39...

Updating bd914de..9c6fbd6
Fast-forward
 requirements.txt                                   |   1 +
 setup.py                                           |   2 +
 swh/loader/mercurial/__init__.py                   |   4 +-
 swh/loader/mercurial/cli.py                        |   6 +-
 swh/loader/mercurial/from_bundle.py                | 641 ++++++++++++++++++++
 swh/loader/mercurial/from_disk.py                  | 504 ++++++++++++++++
 swh/loader/mercurial/hgutil.py                     |  66 +++
 swh/loader/mercurial/identify.py                   | 541 +++++++++++++++++
 swh/loader/mercurial/loader.py                     | 645 +--------------------
 swh/loader/mercurial/tasks.py                      |   8 +-
 swh/loader/mercurial/tests/data/build.py           | 265 +++++++++
 swh/loader/mercurial/tests/data/example.json       |   1 +
 swh/loader/mercurial/tests/data/example.sh         |  59 ++
 swh/loader/mercurial/tests/data/example.tgz        | Bin 0 -> 51200 bytes
 swh/loader/mercurial/tests/data/hello.json         |   1 +
 swh/loader/mercurial/tests/data/the-sandbox.json   |   1 +
 swh/loader/mercurial/tests/data/transplant.json    |   1 +
 swh/loader/mercurial/tests/loader_checker.py       |  74 +++
 .../tests/{test_loader.py => test_from_bundle.py}  |  14 +-
 swh/loader/mercurial/tests/test_from_disk.py       | 209 +++++++
 swh/loader/mercurial/tests/test_hgutil.py          |  46 ++
 swh/loader/mercurial/tests/test_identify.py        |  74 +++
 swh/loader/mercurial/tests/test_loader.org         | 121 ----
 swh/loader/mercurial/tests/test_tasks.py           |   6 +-
 24 files changed, 2514 insertions(+), 776 deletions(-)
 create mode 100644 swh/loader/mercurial/from_bundle.py
 create mode 100644 swh/loader/mercurial/from_disk.py
 create mode 100644 swh/loader/mercurial/hgutil.py
 create mode 100644 swh/loader/mercurial/identify.py
 create mode 100755 swh/loader/mercurial/tests/data/build.py
 create mode 100644 swh/loader/mercurial/tests/data/example.json
 create mode 100644 swh/loader/mercurial/tests/data/example.sh
 create mode 100644 swh/loader/mercurial/tests/data/example.tgz
 create mode 100644 swh/loader/mercurial/tests/data/hello.json
 create mode 100644 swh/loader/mercurial/tests/data/the-sandbox.json
 create mode 100644 swh/loader/mercurial/tests/data/transplant.json
 create mode 100644 swh/loader/mercurial/tests/loader_checker.py
 rename swh/loader/mercurial/tests/{test_loader.py => test_from_bundle.py} (93%)
 create mode 100644 swh/loader/mercurial/tests/test_from_disk.py
 create mode 100644 swh/loader/mercurial/tests/test_hgutil.py
 create mode 100644 swh/loader/mercurial/tests/test_identify.py
 delete mode 100644 swh/loader/mercurial/tests/test_loader.org
Changes applied before test
commit 9c6fbd63c8a9c784acff1ee8933966f54bd363ca
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 10:47:29 2020 +0100

    Add tree diffing in HgLoaderFromDisk
    
    Avoid rebuilding the whole tree for revision.
    
    Load time improvement on https://www.mercurial-scm.org/repo/hg/
    1:11:02 -> 47:58.84

commit 96e3da394e5f28b30dc61bdefe303a98b04f89c4
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Wed Oct 28 11:33:17 2020 +0100

    Add mercurial.from_disk.HgLoaderFromDisk
    
    Rather than relying on mercurial bundles this loader expect a local repository.

commit c8c91ab674a9ade49caacd63a5b507bab67df9dc
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:42:21 2020 +0200

    Add new example repository generated from script
    
    First updatable example repository documented by its generation script.

commit bc32e1280cfd6a59df595cdcbcc2c2b51b3618aa
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:22:07 2020 +0200

    Add `Hg20BundleLoader` tests from json files
    
    Generated json files with `swh/loader/mercurial/tests/data/build.py` for
    existing repositories and added them to `Hg20BundleLoader` tests.
    
    Introduce `LoaderChecker` as a standardized way to test repositories
    against json files.

commit ff11f77f1b493bd1c8ed257e790ded8da276101c
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Oct 16 11:28:35 2020 +0200

    Add testing repository builder
    
    This build script purpose is to create example repositories from bash scripts
    and extract assertion data from them into json files.
    
    Advantages:
    
        - the bash script documents the repository creation
        - automating creation allow easy repository update
        - automation extraction allow easier update of assertion data

commit a2e9cf16919a5f81a06f955a533a254a9b3c9689
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/101/ for more details.

marmoute requested changes to this revision. Fri, Nov 20, 12:14 PM
marmoute added a subscriber: marmoute.
marmoute added inline comments.
swh/loader/mercurial/from_disk.py
111–116

We need to iterate from the highest-level directory, not the lower-level one, right?

If I am not mistaken, this code seems wrong. Why is this not caught by the tests?

152

Please document this attribute.

153

This should be last_root, should it not? (Also, these should probably be _last_hg_nodeid and _last_root.)

449–450

We should consider diffing against the parent, but this is not guaranteed to be a good idea. Reusing the manifest cache might give us better performance anyway. So the comment should mention the possible change, but not present it as mandatory.

466

Why do we need this? I would expect any cached hash to be invalidated when we alter the structure in the lines above this one.

This revision now requires changes to proceed. Fri, Nov 20, 12:14 PM
acezar updated this revision to Diff 16204. Mon, Nov 23, 4:15 PM
acezar marked 5 inline comments as done.

Followup

acezar added inline comments. Mon, Nov 23, 4:17 PM
swh/loader/mercurial/from_disk.py
111–116

Given the way __setitem__ is implemented in Directory, the traversal is done recursively from the lower level to the higher. Going the other way would repeat the traversal lookup for each level. So in this case starting from the higher seems better to me.

The code has evolved, but can you explain why the code was wrong?

Build is green

Patch application report for D4540 (id=16204)

Could not rebase; Attempt merge onto bd914dec39...

Updating bd914de..a6480fc
Fast-forward
 requirements.txt                                   |   1 +
 setup.py                                           |   2 +
 swh/loader/mercurial/__init__.py                   |   4 +-
 swh/loader/mercurial/cli.py                        |   6 +-
 swh/loader/mercurial/from_bundle.py                | 641 ++++++++++++++++++++
 swh/loader/mercurial/from_disk.py                  | 493 ++++++++++++++++
 swh/loader/mercurial/hgutil.py                     |  78 +++
 swh/loader/mercurial/identify.py                   | 541 +++++++++++++++++
 swh/loader/mercurial/loader.py                     | 645 +--------------------
 swh/loader/mercurial/tasks.py                      |   8 +-
 swh/loader/mercurial/tests/data/build.py           | 265 +++++++++
 swh/loader/mercurial/tests/data/example.json       |   1 +
 swh/loader/mercurial/tests/data/example.sh         |  59 ++
 swh/loader/mercurial/tests/data/example.tgz        | Bin 0 -> 51200 bytes
 swh/loader/mercurial/tests/data/hello.json         |   1 +
 swh/loader/mercurial/tests/data/the-sandbox.json   |   1 +
 swh/loader/mercurial/tests/data/transplant.json    |   1 +
 swh/loader/mercurial/tests/loader_checker.py       |  74 +++
 .../tests/{test_loader.py => test_from_bundle.py}  |  14 +-
 swh/loader/mercurial/tests/test_from_disk.py       | 199 +++++++
 swh/loader/mercurial/tests/test_hgutil.py          |  46 ++
 swh/loader/mercurial/tests/test_identify.py        |  74 +++
 swh/loader/mercurial/tests/test_loader.org         | 121 ----
 swh/loader/mercurial/tests/test_tasks.py           |   6 +-
 24 files changed, 2507 insertions(+), 774 deletions(-)
 create mode 100644 swh/loader/mercurial/from_bundle.py
 create mode 100644 swh/loader/mercurial/from_disk.py
 create mode 100644 swh/loader/mercurial/hgutil.py
 create mode 100644 swh/loader/mercurial/identify.py
 create mode 100755 swh/loader/mercurial/tests/data/build.py
 create mode 100644 swh/loader/mercurial/tests/data/example.json
 create mode 100644 swh/loader/mercurial/tests/data/example.sh
 create mode 100644 swh/loader/mercurial/tests/data/example.tgz
 create mode 100644 swh/loader/mercurial/tests/data/hello.json
 create mode 100644 swh/loader/mercurial/tests/data/the-sandbox.json
 create mode 100644 swh/loader/mercurial/tests/data/transplant.json
 create mode 100644 swh/loader/mercurial/tests/loader_checker.py
 rename swh/loader/mercurial/tests/{test_loader.py => test_from_bundle.py} (93%)
 create mode 100644 swh/loader/mercurial/tests/test_from_disk.py
 create mode 100644 swh/loader/mercurial/tests/test_hgutil.py
 create mode 100644 swh/loader/mercurial/tests/test_identify.py
 delete mode 100644 swh/loader/mercurial/tests/test_loader.org
Changes applied before test
commit a6480fc459e32ba5463f0d69441f52d893bf39da
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 10:47:29 2020 +0100

    Add tree diffing in HgLoaderFromDisk
    
    Avoid rebuilding the whole tree for revision.
    
    Load time improvement on https://www.mercurial-scm.org/repo/hg/
    1:11:02 -> 47:58.84
    
    Differential Revision: https://forge.softwareheritage.org/D4540

commit 9074bd977debf9b5412fcad025c98f5630d05666
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 11:11:39 2020 +0100

    Add content lru cache to HgLoaderFromDisk
    
    Summary: Avoid recalculation of unchanged content hash between revisions
    
    Reviewers: #reviewers
    
    Differential Revision: https://forge.softwareheritage.org/D4541

commit b35536071623338213dcf22352c3a8f332b32344
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Wed Oct 28 11:33:17 2020 +0100

    Add mercurial.from_disk.HgLoaderFromDisk
    
    Rather than relying on mercurial bundles this loader expect a local repository.
    
    Differential Revision: https://forge.softwareheritage.org/D3435

commit c8c91ab674a9ade49caacd63a5b507bab67df9dc
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:42:21 2020 +0200

    Add new example repository generated from script
    
    First updatable example repository documented by its generation script.

commit bc32e1280cfd6a59df595cdcbcc2c2b51b3618aa
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:22:07 2020 +0200

    Add `Hg20BundleLoader` tests from json files
    
    Generated json files with `swh/loader/mercurial/tests/data/build.py` for
    existing repositories and added them to `Hg20BundleLoader` tests.
    
    Introduce `LoaderChecker` as a standardized way to test repositories
    against json files.

commit ff11f77f1b493bd1c8ed257e790ded8da276101c
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Oct 16 11:28:35 2020 +0200

    Add testing repository builder
    
    This build script purpose is to create example repositories from bash scripts
    and extract assertion data from them into json files.
    
    Advantages:
    
        - the bash script documents the repository creation
        - automating creation allow easy repository update
        - automation extraction allow easier update of assertion data

commit a2e9cf16919a5f81a06f955a533a254a9b3c9689
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/109/ for more details.

acezar updated this revision to Diff 16259. Tue, Nov 24, 2:11 PM

Followup

Build is green

Patch application report for D4540 (id=16259)

Could not rebase; Attempt merge onto bd914dec39...

Updating bd914de..34ffb89
Fast-forward
 requirements.txt                                   |   1 +
 setup.py                                           |   3 +
 swh/loader/mercurial/__init__.py                   |  10 +
 swh/loader/mercurial/from_disk.py                  | 507 +++++++++++++++++++
 swh/loader/mercurial/hgutil.py                     |  78 +++
 swh/loader/mercurial/identify.py                   | 541 +++++++++++++++++++++
 swh/loader/mercurial/tasks_from_disk.py            |  33 ++
 swh/loader/mercurial/tests/data/build.py           | 265 ++++++++++
 swh/loader/mercurial/tests/data/example.json       |   1 +
 swh/loader/mercurial/tests/data/example.sh         |  59 +++
 swh/loader/mercurial/tests/data/example.tgz        | Bin 0 -> 51200 bytes
 swh/loader/mercurial/tests/data/hello.json         |   1 +
 swh/loader/mercurial/tests/data/the-sandbox.json   |   1 +
 swh/loader/mercurial/tests/data/transplant.json    |   1 +
 swh/loader/mercurial/tests/loader_checker.py       |  74 +++
 swh/loader/mercurial/tests/test_from_disk.py       | 199 ++++++++
 swh/loader/mercurial/tests/test_hgutil.py          |  46 ++
 swh/loader/mercurial/tests/test_identify.py        |  74 +++
 swh/loader/mercurial/tests/test_loader.org         | 121 -----
 swh/loader/mercurial/tests/test_tasks_from_disk.py |  47 ++
 20 files changed, 1941 insertions(+), 121 deletions(-)
 create mode 100644 swh/loader/mercurial/from_disk.py
 create mode 100644 swh/loader/mercurial/hgutil.py
 create mode 100644 swh/loader/mercurial/identify.py
 create mode 100644 swh/loader/mercurial/tasks_from_disk.py
 create mode 100755 swh/loader/mercurial/tests/data/build.py
 create mode 100644 swh/loader/mercurial/tests/data/example.json
 create mode 100644 swh/loader/mercurial/tests/data/example.sh
 create mode 100644 swh/loader/mercurial/tests/data/example.tgz
 create mode 100644 swh/loader/mercurial/tests/data/hello.json
 create mode 100644 swh/loader/mercurial/tests/data/the-sandbox.json
 create mode 100644 swh/loader/mercurial/tests/data/transplant.json
 create mode 100644 swh/loader/mercurial/tests/loader_checker.py
 create mode 100644 swh/loader/mercurial/tests/test_from_disk.py
 create mode 100644 swh/loader/mercurial/tests/test_hgutil.py
 create mode 100644 swh/loader/mercurial/tests/test_identify.py
 delete mode 100644 swh/loader/mercurial/tests/test_loader.org
 create mode 100644 swh/loader/mercurial/tests/test_tasks_from_disk.py
Changes applied before test
commit 34ffb89308b19e1ea74c571866f290d3f86d29ca
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 10:47:29 2020 +0100

    Add tree diffing in HgLoaderFromDisk
    
    Avoid rebuilding the whole tree for revision.
    
    Load time improvement on https://www.mercurial-scm.org/repo/hg/
    1:11:02 -> 47:58.84
    
    Differential Revision: https://forge.softwareheritage.org/D4540

commit 908349f1155e8f4abe939e6ae7b0d53419545f58
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 11:11:39 2020 +0100

    Add content lru cache to HgLoaderFromDisk
    
    Summary: Avoid recalculation of unchanged content hash between revisions
    
    Reviewers: #reviewers
    
    Differential Revision: https://forge.softwareheritage.org/D4541

commit ede3e31d7b8b654c81607a967982e7330d88c98a
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Wed Oct 28 11:33:17 2020 +0100

    Add mercurial.from_disk.HgLoaderFromDisk
    
    Rather than relying on mercurial bundles this loader expect a local repository.
    
    Differential Revision: https://forge.softwareheritage.org/D3435

commit c8c91ab674a9ade49caacd63a5b507bab67df9dc
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:42:21 2020 +0200

    Add new example repository generated from script
    
    First updatable example repository documented by its generation script.

commit bc32e1280cfd6a59df595cdcbcc2c2b51b3618aa
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:22:07 2020 +0200

    Add `Hg20BundleLoader` tests from json files
    
    Generated json files with `swh/loader/mercurial/tests/data/build.py` for
    existing repositories and added them to `Hg20BundleLoader` tests.
    
    Introduce `LoaderChecker` as a standardized way to test repositories
    against json files.

commit ff11f77f1b493bd1c8ed257e790ded8da276101c
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Oct 16 11:28:35 2020 +0200

    Add testing repository builder
    
    This build script purpose is to create example repositories from bash scripts
    and extract assertion data from them into json files.
    
    Advantages:
    
        - the bash script documents the repository creation
        - automating creation allow easy repository update
        - automation extraction allow easier update of assertion data

commit a2e9cf16919a5f81a06f955a533a254a9b3c9689
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/113/ for more details.

marmoute added inline comments. Tue, Nov 24, 2:21 PM
swh/loader/mercurial/from_disk.py
111–116

(Note: I cannot find the original code anymore (thanks, phab).)

In short, if you delete empty directories from root to leaf, you can still end up with empty directories, because we checked whether they were empty before deleting their empty children.
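
The ordering issue described in this comment can be reproduced with a small sketch. It is not the code from the diff: the tree is modelled as nested dicts, and `prune_top_down` and `prune_bottom_up` are hypothetical names used only to contrast the two traversal orders.

```python
def prune_top_down(tree: dict) -> None:
    """Problematic order: test a directory for emptiness before its children."""
    for name in list(tree):
        child = tree[name]
        if isinstance(child, dict):
            if not child:
                del tree[name]
            else:
                # The children are pruned only after the parent was tested,
                # so a parent emptied here is never removed.
                prune_top_down(child)


def prune_bottom_up(tree: dict) -> None:
    """Prune the children first, then test whether the directory is empty."""
    for name in list(tree):
        child = tree[name]
        if isinstance(child, dict):
            prune_bottom_up(child)
            if not child:
                del tree[name]
```

With a tree such as `{"a": {"b": {}}, "f": b"x"}`, the top-down variant removes `b` but leaves the now-empty `a` behind, while the bottom-up variant removes both and keeps only `f`.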

acezar marked an inline comment as done. Tue, Nov 24, 3:15 PM
acezar added inline comments.
swh/loader/mercurial/from_disk.py
111–116

(was https://forge.softwareheritage.org/D4540?id=16094#inline-31160)

I see now. Thanks.

The deletion was not necessary anyway, as the new code shows.

acezar updated this revision to Diff 16279. Tue, Nov 24, 7:21 PM
acezar marked an inline comment as done.

Followup

Build is green

Patch application report for D4540 (id=16279)

Could not rebase; Attempt merge onto bd914dec39...

Updating bd914de..4ab0f84
Fast-forward
 requirements.txt                                   |   1 +
 setup.py                                           |   3 +
 swh/loader/mercurial/__init__.py                   |  10 +
 swh/loader/mercurial/from_disk.py                  | 514 ++++++++++++++++++++
 swh/loader/mercurial/hgutil.py                     |  78 +++
 swh/loader/mercurial/identify.py                   | 541 +++++++++++++++++++++
 swh/loader/mercurial/tasks_from_disk.py            |  33 ++
 swh/loader/mercurial/tests/data/build.py           | 265 ++++++++++
 swh/loader/mercurial/tests/data/example.json       |   1 +
 swh/loader/mercurial/tests/data/example.sh         |  59 +++
 swh/loader/mercurial/tests/data/example.tgz        | Bin 0 -> 51200 bytes
 swh/loader/mercurial/tests/data/hello.json         |   1 +
 swh/loader/mercurial/tests/data/the-sandbox.json   |   1 +
 swh/loader/mercurial/tests/data/transplant.json    |   1 +
 swh/loader/mercurial/tests/loader_checker.py       |  74 +++
 swh/loader/mercurial/tests/test_from_disk.py       | 199 ++++++++
 swh/loader/mercurial/tests/test_hgutil.py          |  46 ++
 swh/loader/mercurial/tests/test_identify.py        |  74 +++
 swh/loader/mercurial/tests/test_loader.org         | 121 -----
 swh/loader/mercurial/tests/test_tasks_from_disk.py |  47 ++
 20 files changed, 1948 insertions(+), 121 deletions(-)
 create mode 100644 swh/loader/mercurial/from_disk.py
 create mode 100644 swh/loader/mercurial/hgutil.py
 create mode 100644 swh/loader/mercurial/identify.py
 create mode 100644 swh/loader/mercurial/tasks_from_disk.py
 create mode 100755 swh/loader/mercurial/tests/data/build.py
 create mode 100644 swh/loader/mercurial/tests/data/example.json
 create mode 100644 swh/loader/mercurial/tests/data/example.sh
 create mode 100644 swh/loader/mercurial/tests/data/example.tgz
 create mode 100644 swh/loader/mercurial/tests/data/hello.json
 create mode 100644 swh/loader/mercurial/tests/data/the-sandbox.json
 create mode 100644 swh/loader/mercurial/tests/data/transplant.json
 create mode 100644 swh/loader/mercurial/tests/loader_checker.py
 create mode 100644 swh/loader/mercurial/tests/test_from_disk.py
 create mode 100644 swh/loader/mercurial/tests/test_hgutil.py
 create mode 100644 swh/loader/mercurial/tests/test_identify.py
 delete mode 100644 swh/loader/mercurial/tests/test_loader.org
 create mode 100644 swh/loader/mercurial/tests/test_tasks_from_disk.py
Changes applied before test
commit 4ab0f8401b677c54a716162be674a8f3af5a777b
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 10:47:29 2020 +0100

    Add tree diffing in HgLoaderFromDisk
    
    Avoid rebuilding the whole tree for revision.
    
    Load time improvement on https://www.mercurial-scm.org/repo/hg/
    1:11:02 -> 47:58.84

commit bd98badaa4e3c25363a905f17f91ecc06a42af29
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 11:11:39 2020 +0100

    Add content lru cache to HgLoaderFromDisk
    
    Avoid recalculation of unchanged content hash between revisions

commit f14a65f97b272db087b2ad823dbdeb44fda768e0
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Wed Oct 28 11:33:17 2020 +0100

    Add mercurial.from_disk.HgLoaderFromDisk
    
    Rather than relying on mercurial bundles this loader expect a local repository.

commit c8c91ab674a9ade49caacd63a5b507bab67df9dc
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:42:21 2020 +0200

    Add new example repository generated from script
    
    First updatable example repository documented by its generation script.

commit bc32e1280cfd6a59df595cdcbcc2c2b51b3618aa
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:22:07 2020 +0200

    Add `Hg20BundleLoader` tests from json files
    
    Generated json files with `swh/loader/mercurial/tests/data/build.py` for
    existing repositories and added them to `Hg20BundleLoader` tests.
    
    Introduce `LoaderChecker` as a standardized way to test repositories
    against json files.

commit ff11f77f1b493bd1c8ed257e790ded8da276101c
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Oct 16 11:28:35 2020 +0200

    Add testing repository builder
    
    This build script purpose is to create example repositories from bash scripts
    and extract assertion data from them into json files.
    
    Advantages:
    
        - the bash script documents the repository creation
        - automating creation allow easy repository update
        - automation extraction allow easier update of assertion data

commit a2e9cf16919a5f81a06f955a533a254a9b3c9689
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/116/ for more details.

acezar updated this revision to Diff 16285. Wed, Nov 25, 12:13 PM

Deleted empty directories

Build is green

Patch application report for D4540 (id=16285)

Could not rebase; Attempt merge onto bd914dec39...

Updating bd914de..4f737dc
Fast-forward
 requirements.txt                                   |   1 +
 setup.py                                           |   3 +
 swh/loader/mercurial/__init__.py                   |  10 +
 swh/loader/mercurial/from_disk.py                  | 522 ++++++++++++++++++++
 swh/loader/mercurial/hgutil.py                     |  78 +++
 swh/loader/mercurial/identify.py                   | 541 +++++++++++++++++++++
 swh/loader/mercurial/tasks_from_disk.py            |  33 ++
 swh/loader/mercurial/tests/data/build.py           | 265 ++++++++++
 swh/loader/mercurial/tests/data/example.json       |   1 +
 swh/loader/mercurial/tests/data/example.sh         |  59 +++
 swh/loader/mercurial/tests/data/example.tgz        | Bin 0 -> 51200 bytes
 swh/loader/mercurial/tests/data/hello.json         |   1 +
 swh/loader/mercurial/tests/data/the-sandbox.json   |   1 +
 swh/loader/mercurial/tests/data/transplant.json    |   1 +
 swh/loader/mercurial/tests/loader_checker.py       |  74 +++
 swh/loader/mercurial/tests/test_from_disk.py       | 214 ++++++++
 swh/loader/mercurial/tests/test_hgutil.py          |  46 ++
 swh/loader/mercurial/tests/test_identify.py        |  74 +++
 swh/loader/mercurial/tests/test_loader.org         | 121 -----
 swh/loader/mercurial/tests/test_tasks_from_disk.py |  47 ++
 20 files changed, 1971 insertions(+), 121 deletions(-)
 create mode 100644 swh/loader/mercurial/from_disk.py
 create mode 100644 swh/loader/mercurial/hgutil.py
 create mode 100644 swh/loader/mercurial/identify.py
 create mode 100644 swh/loader/mercurial/tasks_from_disk.py
 create mode 100755 swh/loader/mercurial/tests/data/build.py
 create mode 100644 swh/loader/mercurial/tests/data/example.json
 create mode 100644 swh/loader/mercurial/tests/data/example.sh
 create mode 100644 swh/loader/mercurial/tests/data/example.tgz
 create mode 100644 swh/loader/mercurial/tests/data/hello.json
 create mode 100644 swh/loader/mercurial/tests/data/the-sandbox.json
 create mode 100644 swh/loader/mercurial/tests/data/transplant.json
 create mode 100644 swh/loader/mercurial/tests/loader_checker.py
 create mode 100644 swh/loader/mercurial/tests/test_from_disk.py
 create mode 100644 swh/loader/mercurial/tests/test_hgutil.py
 create mode 100644 swh/loader/mercurial/tests/test_identify.py
 delete mode 100644 swh/loader/mercurial/tests/test_loader.org
 create mode 100644 swh/loader/mercurial/tests/test_tasks_from_disk.py
Changes applied before test
commit 4f737dc7c4efe19a15c3a029f7b6a13c14ba2f81
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 10:47:29 2020 +0100

    Add tree diffing in HgLoaderFromDisk
    
    Avoid rebuilding the whole tree for revision.
    
    Load time improvement on https://www.mercurial-scm.org/repo/hg/
    1:11:02 -> 47:58.84

commit 421c3c696e9ac8e552e833c2fa95d839bbb4dba4
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 11:11:39 2020 +0100

    Add content lru cache to HgLoaderFromDisk
    
    Avoid recalculation of unchanged content hash between revisions

commit 12411aa64133e2578f20444ab09e117ccb4634d5
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Wed Oct 28 11:33:17 2020 +0100

    Add mercurial.from_disk.HgLoaderFromDisk
    
    Rather than relying on mercurial bundles this loader expect a local repository.

commit c8c91ab674a9ade49caacd63a5b507bab67df9dc
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:42:21 2020 +0200

    Add new example repository generated from script
    
    First updatable example repository documented by its generation script.

commit bc32e1280cfd6a59df595cdcbcc2c2b51b3618aa
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:22:07 2020 +0200

    Add `Hg20BundleLoader` tests from json files
    
    Generated json files with `swh/loader/mercurial/tests/data/build.py` for
    existing repositories and added them to `Hg20BundleLoader` tests.
    
    Introduce `LoaderChecker` as a standardized way to test repositories
    against json files.

commit ff11f77f1b493bd1c8ed257e790ded8da276101c
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Oct 16 11:28:35 2020 +0200

    Add testing repository builder
    
    This build script purpose is to create example repositories from bash scripts
    and extract assertion data from them into json files.
    
    Advantages:
    
        - the bash script documents the repository creation
        - automating creation allow easy repository update
        - automation extraction allow easier update of assertion data

commit a2e9cf16919a5f81a06f955a533a254a9b3c9689
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/117/ for more details.

marmoute requested changes to this revision. Wed, Nov 25, 2:19 PM

Almost there.

swh/loader/mercurial/tests/test_from_disk.py
66–72

You should add a variant with a mix of empty and non-empty directories, to make sure we do not over-delete.

This revision now requires changes to proceed. Wed, Nov 25, 2:19 PM
acezar updated this revision to Diff 16301. Wed, Nov 25, 2:37 PM

Followup

acezar marked an inline comment as done. Wed, Nov 25, 2:38 PM

Build is green

Patch application report for D4540 (id=16301)

Could not rebase; Attempt merge onto bd914dec39...

Updating bd914de..a441aa8
Fast-forward
 requirements.txt                                   |   1 +
 setup.py                                           |   3 +
 swh/loader/mercurial/__init__.py                   |  10 +
 swh/loader/mercurial/from_disk.py                  | 526 ++++++++++++++++++++
 swh/loader/mercurial/hgutil.py                     |  78 +++
 swh/loader/mercurial/identify.py                   | 541 +++++++++++++++++++++
 swh/loader/mercurial/tasks_from_disk.py            |  33 ++
 swh/loader/mercurial/tests/data/build.py           | 265 ++++++++++
 swh/loader/mercurial/tests/data/example.json       |   1 +
 swh/loader/mercurial/tests/data/example.sh         |  59 +++
 swh/loader/mercurial/tests/data/example.tgz        | Bin 0 -> 51200 bytes
 swh/loader/mercurial/tests/data/hello.json         |   1 +
 swh/loader/mercurial/tests/data/the-sandbox.json   |   1 +
 swh/loader/mercurial/tests/data/transplant.json    |   1 +
 swh/loader/mercurial/tests/loader_checker.py       |  74 +++
 swh/loader/mercurial/tests/test_from_disk.py       | 215 ++++++++
 swh/loader/mercurial/tests/test_hgutil.py          |  46 ++
 swh/loader/mercurial/tests/test_identify.py        |  74 +++
 swh/loader/mercurial/tests/test_loader.org         | 121 -----
 swh/loader/mercurial/tests/test_tasks_from_disk.py |  47 ++
 20 files changed, 1976 insertions(+), 121 deletions(-)
 create mode 100644 swh/loader/mercurial/from_disk.py
 create mode 100644 swh/loader/mercurial/hgutil.py
 create mode 100644 swh/loader/mercurial/identify.py
 create mode 100644 swh/loader/mercurial/tasks_from_disk.py
 create mode 100755 swh/loader/mercurial/tests/data/build.py
 create mode 100644 swh/loader/mercurial/tests/data/example.json
 create mode 100644 swh/loader/mercurial/tests/data/example.sh
 create mode 100644 swh/loader/mercurial/tests/data/example.tgz
 create mode 100644 swh/loader/mercurial/tests/data/hello.json
 create mode 100644 swh/loader/mercurial/tests/data/the-sandbox.json
 create mode 100644 swh/loader/mercurial/tests/data/transplant.json
 create mode 100644 swh/loader/mercurial/tests/loader_checker.py
 create mode 100644 swh/loader/mercurial/tests/test_from_disk.py
 create mode 100644 swh/loader/mercurial/tests/test_hgutil.py
 create mode 100644 swh/loader/mercurial/tests/test_identify.py
 delete mode 100644 swh/loader/mercurial/tests/test_loader.org
 create mode 100644 swh/loader/mercurial/tests/test_tasks_from_disk.py
Changes applied before test
commit a441aa8ee884af4d901e220fd4a69a10ee02448a
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 10:47:29 2020 +0100

    Add tree diffing in HgLoaderFromDisk
    
    Avoid rebuilding the whole tree for revision.
    
    Load time improvement on https://www.mercurial-scm.org/repo/hg/
    1:11:02 -> 47:58.84

commit 421c3c696e9ac8e552e833c2fa95d839bbb4dba4
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 11:11:39 2020 +0100

    Add content lru cache to HgLoaderFromDisk
    
    Avoid recalculation of unchanged content hash between revisions

commit 12411aa64133e2578f20444ab09e117ccb4634d5
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Wed Oct 28 11:33:17 2020 +0100

    Add mercurial.from_disk.HgLoaderFromDisk
    
    Rather than relying on mercurial bundles this loader expect a local repository.

commit c8c91ab674a9ade49caacd63a5b507bab67df9dc
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:42:21 2020 +0200

    Add new example repository generated from script
    
    First updatable example repository documented by its generation script.

commit bc32e1280cfd6a59df595cdcbcc2c2b51b3618aa
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:22:07 2020 +0200

    Add `Hg20BundleLoader` tests from json files
    
    Generated json files with `swh/loader/mercurial/tests/data/build.py` for
    existing repositories and added them to `Hg20BundleLoader` tests.
    
    Introduce `LoaderChecker` as a standardized way to test repositories
    against json files.

commit ff11f77f1b493bd1c8ed257e790ded8da276101c
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Oct 16 11:28:35 2020 +0200

    Add testing repository builder
    
    This build script purpose is to create example repositories from bash scripts
    and extract assertion data from them into json files.
    
    Advantages:
    
        - the bash script documents the repository creation
        - automating creation allow easy repository update
        - automation extraction allow easier update of assertion data

commit a2e9cf16919a5f81a06f955a533a254a9b3c9689
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/119/ for more details.

acezar updated this revision to Diff 16344.Thu, Nov 26, 2:26 PM

Fix bug when file is replaced by directory between 2 revisions

Build is green

Patch application report for D4540 (id=16344)

Could not rebase; Attempt merge onto bd914dec39...

Updating bd914de..32337fc
Fast-forward
 requirements.txt                                   |   1 +
 setup.py                                           |   3 +
 swh/loader/mercurial/__init__.py                   |  10 +
 swh/loader/mercurial/from_disk.py                  | 529 ++++++++++++++++++++
 swh/loader/mercurial/hgutil.py                     |  78 +++
 swh/loader/mercurial/identify.py                   | 541 +++++++++++++++++++++
 swh/loader/mercurial/tasks_from_disk.py            |  33 ++
 swh/loader/mercurial/tests/data/build.py           | 265 ++++++++++
 swh/loader/mercurial/tests/data/example.json       |   1 +
 swh/loader/mercurial/tests/data/example.sh         |  59 +++
 swh/loader/mercurial/tests/data/example.tgz        | Bin 0 -> 51200 bytes
 swh/loader/mercurial/tests/data/hello.json         |   1 +
 swh/loader/mercurial/tests/data/the-sandbox.json   |   1 +
 swh/loader/mercurial/tests/data/transplant.json    |   1 +
 swh/loader/mercurial/tests/loader_checker.py       |  74 +++
 swh/loader/mercurial/tests/test_from_disk.py       | 221 +++++++++
 swh/loader/mercurial/tests/test_hgutil.py          |  46 ++
 swh/loader/mercurial/tests/test_identify.py        |  74 +++
 swh/loader/mercurial/tests/test_loader.org         | 121 -----
 swh/loader/mercurial/tests/test_tasks_from_disk.py |  47 ++
 20 files changed, 1985 insertions(+), 121 deletions(-)
 create mode 100644 swh/loader/mercurial/from_disk.py
 create mode 100644 swh/loader/mercurial/hgutil.py
 create mode 100644 swh/loader/mercurial/identify.py
 create mode 100644 swh/loader/mercurial/tasks_from_disk.py
 create mode 100755 swh/loader/mercurial/tests/data/build.py
 create mode 100644 swh/loader/mercurial/tests/data/example.json
 create mode 100644 swh/loader/mercurial/tests/data/example.sh
 create mode 100644 swh/loader/mercurial/tests/data/example.tgz
 create mode 100644 swh/loader/mercurial/tests/data/hello.json
 create mode 100644 swh/loader/mercurial/tests/data/the-sandbox.json
 create mode 100644 swh/loader/mercurial/tests/data/transplant.json
 create mode 100644 swh/loader/mercurial/tests/loader_checker.py
 create mode 100644 swh/loader/mercurial/tests/test_from_disk.py
 create mode 100644 swh/loader/mercurial/tests/test_hgutil.py
 create mode 100644 swh/loader/mercurial/tests/test_identify.py
 delete mode 100644 swh/loader/mercurial/tests/test_loader.org
 create mode 100644 swh/loader/mercurial/tests/test_tasks_from_disk.py
Changes applied before test
commit 32337fc7a736d8a7882775f55de19acbbdb43428
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 10:47:29 2020 +0100

    Add tree diffing in HgLoaderFromDisk
    
    Avoid rebuilding the whole tree for revision.
    
    Load time improvement on https://www.mercurial-scm.org/repo/hg/
    1:11:02 -> 47:58.84

commit 31d81786b6b6030597e64f8d3b9d286b8d3e4454
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 11:11:39 2020 +0100

    Add content lru cache to HgLoaderFromDisk
    
    Avoid recalculation of unchanged content hash between revisions

commit bfc44ab8688a8b74ccaf7ecb25be5fb8db27f548
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Wed Oct 28 11:33:17 2020 +0100

    Add mercurial.from_disk.HgLoaderFromDisk
    
    Rather than relying on mercurial bundles this loader expect a local repository.

commit c8c91ab674a9ade49caacd63a5b507bab67df9dc
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:42:21 2020 +0200

    Add new example repository generated from script
    
    First updatable example repository documented by its generation script.

commit bc32e1280cfd6a59df595cdcbcc2c2b51b3618aa
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:22:07 2020 +0200

    Add `Hg20BundleLoader` tests from json files
    
    Generated json files with `swh/loader/mercurial/tests/data/build.py` for
    existing repositories and added them to `Hg20BundleLoader` tests.
    
    Introduce `LoaderChecker` as a standardized way to test repositories
    against json files.

commit ff11f77f1b493bd1c8ed257e790ded8da276101c
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Oct 16 11:28:35 2020 +0200

    Add testing repository builder
    
    This build script purpose is to create example repositories from bash scripts
    and extract assertion data from them into json files.
    
    Advantages:
    
        - the bash script documents the repository creation
        - automating creation allow easy repository update
        - automation extraction allow easier update of assertion data

commit a2e9cf16919a5f81a06f955a533a254a9b3c9689
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/123/ for more details.

marmoute requested changes to this revision.Thu, Nov 26, 2:44 PM
marmoute added inline comments.
swh/loader/mercurial/tests/test_from_disk.py
73

The length of that directory will always be 1, even if 'some' was not deleted, so you need to check that len('path/some') is 1.

This revision now requires changes to proceed.Thu, Nov 26, 2:44 PM
acezar updated this revision to Diff 16349.Thu, Nov 26, 3:47 PM

Fix empty directories removal
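
For readers following this fix, here is a minimal sketch of what removing emptied directories can look like. It assumes the tree is a plain nested dict (directories are dicts, files are bytes); `remove_file` is a hypothetical helper written for illustration, not the code in `from_disk.py`. In the spirit of the inline comment above, the check looks at the emptied directory itself rather than only at the size of its parent.

```python
def remove_file(tree: dict, path: str) -> None:
    """Delete the file at `path`, then drop every directory it leaves empty."""
    parts = path.split("/")
    parents = []  # (directory, child name) pairs, from the root downwards
    node = tree
    for part in parts[:-1]:
        parents.append((node, part))
        node = node[part]
    del node[parts[-1]]
    # Walk back up, removing directories for as long as they are empty.
    while parents and not node:
        node, name = parents.pop()
        del node[name]


# Illustrative tree: 'path/some' holds a single file, 'path/other' is a file.
tree = {"path": {"some": {"file": b"x"}, "other": b"y"}}
remove_file(tree, "path/some/file")

# 'path/some' became empty and is gone; 'path' still holds 'other'.
assert "some" not in tree["path"]
assert tree == {"path": {"other": b"y"}}
```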

acezar marked an inline comment as done.Thu, Nov 26, 3:47 PM

Build is green

Patch application report for D4540 (id=16349)

Could not rebase; Attempt merge onto bd914dec39...

Updating bd914de..e33743a
Fast-forward
 requirements.txt                                   |   1 +
 setup.py                                           |   3 +
 swh/loader/mercurial/__init__.py                   |  10 +
 swh/loader/mercurial/from_disk.py                  | 543 +++++++++++++++++++++
 swh/loader/mercurial/hgutil.py                     |  78 +++
 swh/loader/mercurial/identify.py                   | 541 ++++++++++++++++++++
 swh/loader/mercurial/tasks_from_disk.py            |  33 ++
 swh/loader/mercurial/tests/data/build.py           | 265 ++++++++++
 swh/loader/mercurial/tests/data/example.json       |   1 +
 swh/loader/mercurial/tests/data/example.sh         |  59 +++
 swh/loader/mercurial/tests/data/example.tgz        | Bin 0 -> 51200 bytes
 swh/loader/mercurial/tests/data/hello.json         |   1 +
 swh/loader/mercurial/tests/data/the-sandbox.json   |   1 +
 swh/loader/mercurial/tests/data/transplant.json    |   1 +
 swh/loader/mercurial/tests/loader_checker.py       |  74 +++
 swh/loader/mercurial/tests/test_from_disk.py       | 243 +++++++++
 swh/loader/mercurial/tests/test_hgutil.py          |  46 ++
 swh/loader/mercurial/tests/test_identify.py        |  74 +++
 swh/loader/mercurial/tests/test_loader.org         | 121 -----
 swh/loader/mercurial/tests/test_tasks_from_disk.py |  47 ++
 20 files changed, 2021 insertions(+), 121 deletions(-)
 create mode 100644 swh/loader/mercurial/from_disk.py
 create mode 100644 swh/loader/mercurial/hgutil.py
 create mode 100644 swh/loader/mercurial/identify.py
 create mode 100644 swh/loader/mercurial/tasks_from_disk.py
 create mode 100755 swh/loader/mercurial/tests/data/build.py
 create mode 100644 swh/loader/mercurial/tests/data/example.json
 create mode 100644 swh/loader/mercurial/tests/data/example.sh
 create mode 100644 swh/loader/mercurial/tests/data/example.tgz
 create mode 100644 swh/loader/mercurial/tests/data/hello.json
 create mode 100644 swh/loader/mercurial/tests/data/the-sandbox.json
 create mode 100644 swh/loader/mercurial/tests/data/transplant.json
 create mode 100644 swh/loader/mercurial/tests/loader_checker.py
 create mode 100644 swh/loader/mercurial/tests/test_from_disk.py
 create mode 100644 swh/loader/mercurial/tests/test_hgutil.py
 create mode 100644 swh/loader/mercurial/tests/test_identify.py
 delete mode 100644 swh/loader/mercurial/tests/test_loader.org
 create mode 100644 swh/loader/mercurial/tests/test_tasks_from_disk.py
Changes applied before test
commit e33743aa28db73bd806445ab31fcce6ba9463d8d
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 10:47:29 2020 +0100

    Add tree diffing in HgLoaderFromDisk
    
    Avoid rebuilding the whole tree for revision.
    
    Load time improvement on https://www.mercurial-scm.org/repo/hg/
    1:11:02 -> 47:58.84

commit 31d81786b6b6030597e64f8d3b9d286b8d3e4454
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 11:11:39 2020 +0100

    Add content lru cache to HgLoaderFromDisk
    
    Avoid recalculation of unchanged content hash between revisions

commit bfc44ab8688a8b74ccaf7ecb25be5fb8db27f548
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Wed Oct 28 11:33:17 2020 +0100

    Add mercurial.from_disk.HgLoaderFromDisk
    
    Rather than relying on mercurial bundles this loader expect a local repository.

commit c8c91ab674a9ade49caacd63a5b507bab67df9dc
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:42:21 2020 +0200

    Add new example repository generated from script
    
    First updatable example repository documented by its generation script.

commit bc32e1280cfd6a59df595cdcbcc2c2b51b3618aa
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Mon Oct 19 16:22:07 2020 +0200

    Add `Hg20BundleLoader` tests from json files
    
    Generated json files with `swh/loader/mercurial/tests/data/build.py` for
    existing repositories and added them to `Hg20BundleLoader` tests.
    
    Introduce `LoaderChecker` as a standardized way to test repositories
    against json files.

commit ff11f77f1b493bd1c8ed257e790ded8da276101c
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Oct 16 11:28:35 2020 +0200

    Add testing repository builder
    
    This build script purpose is to create example repositories from bash scripts
    and extract assertion data from them into json files.
    
    Advantages:
    
        - the bash script documents the repository creation
        - automating creation allow easy repository update
        - automation extraction allow easier update of assertion data

commit a2e9cf16919a5f81a06f955a533a254a9b3c9689
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Thu Oct 8 18:07:50 2020 +0200

    add swh-hg-identify a cli to identify hg objects

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/124/ for more details.

marmoute accepted this revision.Thu, Nov 26, 3:48 PM

great

This revision is now accepted and ready to land.Thu, Nov 26, 3:48 PM

Build is green

Patch application report for D4540 (id=16425)

Could not rebase; Attempt merge onto c8c91ab674...

Updating c8c91ab..d3885c7
Fast-forward
 setup.py                                           |   1 +
 swh/loader/mercurial/__init__.py                   |  10 +
 swh/loader/mercurial/from_disk.py                  | 484 +++++++++++++++++++++
 swh/loader/mercurial/hgutil.py                     |  78 ++++
 swh/loader/mercurial/tasks_from_disk.py            |  33 ++
 swh/loader/mercurial/tests/test_from_disk.py       | 205 +++++++++
 swh/loader/mercurial/tests/test_hgutil.py          |  46 ++
 swh/loader/mercurial/tests/test_loader.org         | 121 ------
 swh/loader/mercurial/tests/test_loader.py          |  12 -
 swh/loader/mercurial/tests/test_tasks_from_disk.py |  47 ++
 10 files changed, 904 insertions(+), 133 deletions(-)
 create mode 100644 swh/loader/mercurial/from_disk.py
 create mode 100644 swh/loader/mercurial/hgutil.py
 create mode 100644 swh/loader/mercurial/tasks_from_disk.py
 create mode 100644 swh/loader/mercurial/tests/test_from_disk.py
 create mode 100644 swh/loader/mercurial/tests/test_hgutil.py
 delete mode 100644 swh/loader/mercurial/tests/test_loader.org
 create mode 100644 swh/loader/mercurial/tests/test_tasks_from_disk.py
Changes applied before test
commit d3885c7f6e7a5ab19bb226576cb656ab7501106e
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 11:11:39 2020 +0100

    Add content lru cache to HgLoaderFromDisk
    
    Avoid recalculation of unchanged content hash between revisions

commit a1c8afa5e42cc58eef255c38ce0585aa71eac0a6
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Wed Oct 28 11:33:17 2020 +0100

    Add mercurial.from_disk.HgLoaderFromDisk
    
    Rather than relying on mercurial bundles this loader expect a local repository.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/128/ for more details.

Build is green

Patch application report for D4540 (id=16426)

Could not rebase; Attempt merge onto c8c91ab674...

Updating c8c91ab..66082d2
Fast-forward
 setup.py                                           |   1 +
 swh/loader/mercurial/__init__.py                   |  10 +
 swh/loader/mercurial/from_disk.py                  | 533 +++++++++++++++++++++
 swh/loader/mercurial/hgutil.py                     |  78 +++
 swh/loader/mercurial/tasks_from_disk.py            |  33 ++
 swh/loader/mercurial/tests/test_from_disk.py       | 243 ++++++++++
 swh/loader/mercurial/tests/test_hgutil.py          |  46 ++
 swh/loader/mercurial/tests/test_loader.org         | 121 -----
 swh/loader/mercurial/tests/test_loader.py          |  12 -
 swh/loader/mercurial/tests/test_tasks_from_disk.py |  47 ++
 10 files changed, 991 insertions(+), 133 deletions(-)
 create mode 100644 swh/loader/mercurial/from_disk.py
 create mode 100644 swh/loader/mercurial/hgutil.py
 create mode 100644 swh/loader/mercurial/tasks_from_disk.py
 create mode 100644 swh/loader/mercurial/tests/test_from_disk.py
 create mode 100644 swh/loader/mercurial/tests/test_hgutil.py
 delete mode 100644 swh/loader/mercurial/tests/test_loader.org
 create mode 100644 swh/loader/mercurial/tests/test_tasks_from_disk.py
Changes applied before test
commit 66082d2169f5c8a388b0a77f4fd75a28eaf97b54
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 10:47:29 2020 +0100

    Add tree diffing in HgLoaderFromDisk
    
    Avoid rebuilding the whole tree for revision.
    
    Load time improvement on https://www.mercurial-scm.org/repo/hg/
    1:11:02 -> 47:58.84

commit d3885c7f6e7a5ab19bb226576cb656ab7501106e
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 11:11:39 2020 +0100

    Add content lru cache to HgLoaderFromDisk
    
    Avoid recalculation of unchanged content hash between revisions

commit a1c8afa5e42cc58eef255c38ce0585aa71eac0a6
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Wed Oct 28 11:33:17 2020 +0100

    Add mercurial.from_disk.HgLoaderFromDisk
    
    Rather than relying on mercurial bundles this loader expect a local repository.

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/129/ for more details.

douardda accepted this revision.Tue, Dec 1, 5:27 PM
douardda added a subscriber: douardda.

lgtm *but*:

  • you should remove the ".84" part in the timestamps (I first read 1h -> 42h)
  • I'd like the commit message to give a bit more details on what this diff really is doing (if I get the idea, keep the directory structure between 2 revisions and "apply the modifications" rather than rebuild the whole structure, right?)
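
The idea described in this comment can be sketched as follows, assuming a nested-dict tree and a precomputed list of changed and removed paths between two revisions. `build_tree`, `update_tree` and the sample data are hypothetical names for illustration only; the actual loader works with Mercurial and swh.model objects rather than plain dicts.

```python
def build_tree(files: dict) -> dict:
    """Full rebuild: create the nested directory structure from scratch."""
    tree: dict = {}
    for path, content in files.items():
        *parents, name = path.split("/")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[name] = content
    return tree


def _remove(node: dict, parts: list) -> None:
    head, rest = parts[0], parts[1:]
    if rest:
        _remove(node[head], rest)
        if not node[head]:  # directory emptied by the removal
            del node[head]
    else:
        del node[head]


def _set(node: dict, parts: list, content: bytes) -> None:
    for part in parts[:-1]:
        child = node.get(part)
        if not isinstance(child, dict):
            # Missing, or a file that is now a directory: start a new dict.
            child = node[part] = {}
        node = child
    node[parts[-1]] = content


def update_tree(tree: dict, changed: dict, removed: list) -> None:
    """Apply one revision's differences to the previous revision's tree."""
    for path in removed:
        _remove(tree, path.split("/"))
    for path, content in changed.items():
        _set(tree, path.split("/"), content)


rev1 = {"README": b"v1", "src/a.py": b"a", "src/b.py": b"b", "doc": b"file"}
rev2 = {"README": b"v2", "src/a.py": b"a", "doc/index": b"dir now"}

tree = build_tree(rev1)
# Differences from rev1 to rev2; 'doc' goes from file to directory.
update_tree(
    tree,
    changed={"README": b"v2", "doc/index": b"dir now"},
    removed=["src/b.py", "doc"],
)
assert tree == build_tree(rev2)
```

Only the touched paths are visited on each revision, which is where a saving over a full rebuild would come from when revisions change few files.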
acezar updated this revision to Diff 16466.Tue, Dec 1, 5:42 PM

Improve commit message

Build is green

Patch application report for D4540 (id=16466)

Could not rebase; Attempt merge onto a1c8afa5e4...

Updating a1c8afa..4bf91cf
Fast-forward
 swh/loader/mercurial/from_disk.py            | 84 +++++++++++++++++++++++++---
 swh/loader/mercurial/hgutil.py               |  3 +-
 swh/loader/mercurial/tests/test_from_disk.py | 42 +++++++++++++-
 3 files changed, 117 insertions(+), 12 deletions(-)
Changes applied before test
commit 4bf91cff72d364956b39f78ad8c8a00bd3f0d8c9
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 10:47:29 2020 +0100

    Add tree diffing in HgLoaderFromDisk
    
    By looking at differences between revisions, the repository tree is
    updated rather than fully rebuilt for each one.
    
    Observed load time improvement on https://www.mercurial-scm.org/repo/hg/
    1:11:02 -> 47:58

commit d3885c7f6e7a5ab19bb226576cb656ab7501106e
Author: Antoine Cezar <antoine.cezar@octobus.net>
Date:   Fri Nov 20 11:11:39 2020 +0100

    Add content lru cache to HgLoaderFromDisk
    
    Avoid recalculation of unchanged content hash between revisions

See https://jenkins.softwareheritage.org/job/DLDHG/job/tests-on-diff/130/ for more details.
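
As an aside on the parent commit listed in this report ("Add content lru cache"), the following is a rough sketch of how an LRU cache can avoid re-hashing unchanged content. It uses `functools.lru_cache` from the standard library. The `STORE` mapping, the file identifiers, the cache size and the choice of SHA-1 are all assumptions made for illustration; the thread does not show how the loader keys or sizes its cache.

```python
import hashlib
from functools import lru_cache

# Hypothetical stand-in for repository storage: file identifier -> raw data.
STORE = {b"node-1": b"print('hello')\n"}
READS = []  # records every time the data is actually read and hashed


@lru_cache(maxsize=1024)
def content_sha1(file_id: bytes) -> str:
    """Hash a file's data; repeated calls with the same id hit the cache."""
    READS.append(file_id)
    return hashlib.sha1(STORE[file_id]).hexdigest()


# The same file appearing unchanged in two revisions is hashed only once.
first = content_sha1(b"node-1")
second = content_sha1(b"node-1")
assert first == second
assert len(READS) == 1
```

Keying the cache on an identifier rather than on the data itself matters here: looking up by raw bytes would still require hashing those bytes for the lookup.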

acezar edited the summary of this revision. (Show Details)Tue, Dec 1, 5:44 PM
This revision was automatically updated to reflect the committed changes.