diff --git a/PKG-INFO b/PKG-INFO index 8c849406..2bcf9259 100644 --- a/PKG-INFO +++ b/PKG-INFO @@ -1,218 +1,218 @@ Metadata-Version: 2.1 Name: swh.storage -Version: 0.14.0 +Version: 0.14.1 Summary: Software Heritage storage manager Home-page: https://forge.softwareheritage.org/diffusion/DSTO/ Author: Software Heritage developers Author-email: swh-devel@inria.fr License: UNKNOWN Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest Project-URL: Funding, https://www.softwareheritage.org/donate Project-URL: Source, https://forge.softwareheritage.org/source/swh-storage Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-storage/ Description: swh-storage =========== Abstraction layer over the archive, allowing to access all stored source code artifacts as well as their metadata. See the [documentation](https://docs.softwareheritage.org/devel/swh-storage/index.html) for more details. ## Quick start ### Dependencies Python tests for this module include tests that cannot be run without a local Postgresql database, so you need the Postgresql server executable on your machine (no need to have a running Postgresql server). They also expect a cassandra server. #### Debian-like host ``` $ sudo apt install libpq-dev postgresql-11 cassandra ``` #### Non Debian-like host The tests expects the path to `cassandra` to either be unspecified, it is then looked up at `/usr/sbin/cassandra`, either specified through the environment variable `SWH_CASSANDRA_BIN`. Optionally, you can avoid running the cassandra tests. ``` (swh) :~/swh-storage$ tox -- -m 'not cassandra' ``` ### Installation It is strongly recommended to use a virtualenv. In the following, we consider you work in a virtualenv named `swh`. See the [developer setup guide](https://docs.softwareheritage.org/devel/developer-setup.html#developer-setup) for a more details on how to setup a working environment. You can install the package directly from [pypi](https://pypi.org/p/swh.storage): ``` (swh) :~$ pip install swh.storage [...] ``` Or from sources: ``` (swh) :~$ git clone https://forge.softwareheritage.org/source/swh-storage.git [...] (swh) :~$ cd swh-storage (swh) :~/swh-storage$ pip install . [...] ``` Then you can check it's properly installed: ``` (swh) :~$ swh storage --help Usage: swh storage [OPTIONS] COMMAND [ARGS]... Software Heritage Storage tools. Options: -h, --help Show this message and exit. Commands: rpc-serve Software Heritage Storage RPC server. ``` ## Tests The best way of running Python tests for this module is to use [tox](https://tox.readthedocs.io/). ``` (swh) :~$ pip install tox ``` ### tox From the sources directory, simply use tox: ``` (swh) :~/swh-storage$ tox [...] ========= 315 passed, 6 skipped, 15 warnings in 40.86 seconds ========== _______________________________ summary ________________________________ flake8: commands succeeded py3: commands succeeded congratulations :) ``` ## Development The storage server can be locally started. It requires a configuration file and a running Postgresql database. ### Sample configuration A typical configuration `storage.yml` file is: ``` storage: cls: local args: db: "dbname=softwareheritage-dev user= password=" objstorage: cls: pathslicing args: root: /tmp/swh-storage/ slicing: 0:2/2:4/4:6 ``` which means, this uses: - a local storage instance whose db connection is to `softwareheritage-dev` local instance, - the objstorage uses a local objstorage instance whose: - `root` path is /tmp/swh-storage, - slicing scheme is `0:2/2:4/4:6`. This means that the identifier of the content (sha1) which will be stored on disk at first level with the first 2 hex characters, the second level with the next 2 hex characters and the third level with the next 2 hex characters. And finally the complete hash file holding the raw content. For example: 00062f8bd330715c4f819373653d97b3cd34394c will be stored at 00/06/2f/00062f8bd330715c4f819373653d97b3cd34394c Note that the `root` path should exist on disk before starting the server. ### Starting the storage server If the python package has been properly installed (e.g. in a virtual env), you should be able to use the command: ``` (swh) :~/swh-storage$ swh storage rpc-serve storage.yml ``` This runs a local swh-storage api at 5002 port. ``` (swh) :~/swh-storage$ curl http://127.0.0.1:5002 Software Heritage storage server

You have reached the Software Heritage storage server.
See its documentation and API for more information

``` ### And then what? In your upper layer ([loader-git](https://forge.softwareheritage.org/source/swh-loader-git/), [loader-svn](https://forge.softwareheritage.org/source/swh-loader-svn/), etc...), you can define a remote storage with this snippet of yaml configuration. ``` storage: cls: remote args: url: http://localhost:5002/ ``` You could directly define a local storage with the following snippet: ``` storage: cls: local args: db: service=swh-dev objstorage: cls: pathslicing args: root: /home/storage/swh-storage/ slicing: 0:2/2:4/4:6 ``` Platform: UNKNOWN Classifier: Programming Language :: Python :: 3 Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3) Classifier: Operating System :: OS Independent Classifier: Development Status :: 5 - Production/Stable Requires-Python: >=3.7 Description-Content-Type: text/markdown Provides-Extra: testing Provides-Extra: schemata Provides-Extra: journal diff --git a/swh.storage.egg-info/PKG-INFO b/swh.storage.egg-info/PKG-INFO index 8c849406..2bcf9259 100644 --- a/swh.storage.egg-info/PKG-INFO +++ b/swh.storage.egg-info/PKG-INFO @@ -1,218 +1,218 @@ Metadata-Version: 2.1 Name: swh.storage -Version: 0.14.0 +Version: 0.14.1 Summary: Software Heritage storage manager Home-page: https://forge.softwareheritage.org/diffusion/DSTO/ Author: Software Heritage developers Author-email: swh-devel@inria.fr License: UNKNOWN Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest Project-URL: Funding, https://www.softwareheritage.org/donate Project-URL: Source, https://forge.softwareheritage.org/source/swh-storage Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-storage/ Description: swh-storage =========== Abstraction layer over the archive, allowing to access all stored source code artifacts as well as their metadata. See the [documentation](https://docs.softwareheritage.org/devel/swh-storage/index.html) for more details. ## Quick start ### Dependencies Python tests for this module include tests that cannot be run without a local Postgresql database, so you need the Postgresql server executable on your machine (no need to have a running Postgresql server). They also expect a cassandra server. #### Debian-like host ``` $ sudo apt install libpq-dev postgresql-11 cassandra ``` #### Non Debian-like host The tests expects the path to `cassandra` to either be unspecified, it is then looked up at `/usr/sbin/cassandra`, either specified through the environment variable `SWH_CASSANDRA_BIN`. Optionally, you can avoid running the cassandra tests. ``` (swh) :~/swh-storage$ tox -- -m 'not cassandra' ``` ### Installation It is strongly recommended to use a virtualenv. In the following, we consider you work in a virtualenv named `swh`. See the [developer setup guide](https://docs.softwareheritage.org/devel/developer-setup.html#developer-setup) for a more details on how to setup a working environment. You can install the package directly from [pypi](https://pypi.org/p/swh.storage): ``` (swh) :~$ pip install swh.storage [...] ``` Or from sources: ``` (swh) :~$ git clone https://forge.softwareheritage.org/source/swh-storage.git [...] (swh) :~$ cd swh-storage (swh) :~/swh-storage$ pip install . [...] ``` Then you can check it's properly installed: ``` (swh) :~$ swh storage --help Usage: swh storage [OPTIONS] COMMAND [ARGS]... Software Heritage Storage tools. Options: -h, --help Show this message and exit. Commands: rpc-serve Software Heritage Storage RPC server. ``` ## Tests The best way of running Python tests for this module is to use [tox](https://tox.readthedocs.io/). ``` (swh) :~$ pip install tox ``` ### tox From the sources directory, simply use tox: ``` (swh) :~/swh-storage$ tox [...] ========= 315 passed, 6 skipped, 15 warnings in 40.86 seconds ========== _______________________________ summary ________________________________ flake8: commands succeeded py3: commands succeeded congratulations :) ``` ## Development The storage server can be locally started. It requires a configuration file and a running Postgresql database. ### Sample configuration A typical configuration `storage.yml` file is: ``` storage: cls: local args: db: "dbname=softwareheritage-dev user= password=" objstorage: cls: pathslicing args: root: /tmp/swh-storage/ slicing: 0:2/2:4/4:6 ``` which means, this uses: - a local storage instance whose db connection is to `softwareheritage-dev` local instance, - the objstorage uses a local objstorage instance whose: - `root` path is /tmp/swh-storage, - slicing scheme is `0:2/2:4/4:6`. This means that the identifier of the content (sha1) which will be stored on disk at first level with the first 2 hex characters, the second level with the next 2 hex characters and the third level with the next 2 hex characters. And finally the complete hash file holding the raw content. For example: 00062f8bd330715c4f819373653d97b3cd34394c will be stored at 00/06/2f/00062f8bd330715c4f819373653d97b3cd34394c Note that the `root` path should exist on disk before starting the server. ### Starting the storage server If the python package has been properly installed (e.g. in a virtual env), you should be able to use the command: ``` (swh) :~/swh-storage$ swh storage rpc-serve storage.yml ``` This runs a local swh-storage api at 5002 port. ``` (swh) :~/swh-storage$ curl http://127.0.0.1:5002 Software Heritage storage server

You have reached the Software Heritage storage server.
See its documentation and API for more information

``` ### And then what? In your upper layer ([loader-git](https://forge.softwareheritage.org/source/swh-loader-git/), [loader-svn](https://forge.softwareheritage.org/source/swh-loader-svn/), etc...), you can define a remote storage with this snippet of yaml configuration. ``` storage: cls: remote args: url: http://localhost:5002/ ``` You could directly define a local storage with the following snippet: ``` storage: cls: local args: db: service=swh-dev objstorage: cls: pathslicing args: root: /home/storage/swh-storage/ slicing: 0:2/2:4/4:6 ``` Platform: UNKNOWN Classifier: Programming Language :: Python :: 3 Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3) Classifier: Operating System :: OS Independent Classifier: Development Status :: 5 - Production/Stable Requires-Python: >=3.7 Description-Content-Type: text/markdown Provides-Extra: testing Provides-Extra: schemata Provides-Extra: journal diff --git a/swh/storage/algos/diff.py b/swh/storage/algos/diff.py index a22d47b9..ba098470 100644 --- a/swh/storage/algos/diff.py +++ b/swh/storage/algos/diff.py @@ -1,413 +1,420 @@ -# Copyright (C) 2018 The Software Heritage developers +# Copyright (C) 2018-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information # Utility module to efficiently compute the list of changed files # between two directory trees. # The implementation is inspired from the work of Alberto Cortés # for the go-git project. For more details, you can refer to: # - this blog post: https://blog.sourced.tech/post/difftree/ # - the reference implementation in go: # https://github.com/src-d/go-git/tree/master/utils/merkletrie import collections +from typing import Any, Dict + from swh.model.hashutil import hash_to_bytes from swh.model.identifiers import directory_identifier +from swh.storage.interface import StorageInterface + from .dir_iterators import DirectoryIterator, DoubleDirectoryIterator, Remaining + # get the hash identifier for an empty directory _empty_dir_hash = hash_to_bytes(directory_identifier({"entries": []})) -def _get_rev(storage, rev_id): +def _get_rev(storage: StorageInterface, rev_id: bytes) -> Dict[str, Any]: """ Return revision data from swh storage. """ - return list(storage.revision_get([rev_id]))[0] + revision = storage.revision_get([rev_id])[0] + assert revision is not None + return revision.to_dict() class _RevisionChangesList(object): """ Helper class to track the changes between two revision directories. """ def __init__(self, storage, track_renaming): """ Args: storage: instance of swh storage track_renaming (bool): whether to track or not files renaming """ self.storage = storage self.track_renaming = track_renaming self.result = [] # dicts used to track file renaming based on hash value # we use a list instead of a single entry to handle the corner # case when a repository contains multiple instance of # the same file in different directories and a commit # renames all of them self.inserted_hash_idx = collections.defaultdict(list) self.deleted_hash_idx = collections.defaultdict(list) def add_insert(self, it_to): """ Add a file insertion in the to directory. Args: it_to (swh.storage.algos.dir_iterators.DirectoryIterator): iterator on the to directory """ to_hash = it_to.current_hash() # if the current file hash has been previously marked as deleted, # the file has been renamed if self.track_renaming and self.deleted_hash_idx[to_hash]: # pop the delete change index in the same order it was inserted change = self.result[self.deleted_hash_idx[to_hash].pop(0)] # change the delete change as a rename one change["type"] = "rename" change["to"] = it_to.current() change["to_path"] = it_to.current_path() else: # add the insert change in the list self.result.append( { "type": "insert", "from": None, "from_path": None, "to": it_to.current(), "to_path": it_to.current_path(), } ) # if rename tracking is activated, add the change index in # the inserted_hash_idx dict if self.track_renaming: self.inserted_hash_idx[to_hash].append(len(self.result) - 1) def add_delete(self, it_from): """ Add a file deletion in the from directory. Args: it_from (swh.storage.algos.dir_iterators.DirectoryIterator): iterator on the from directory """ from_hash = it_from.current_hash() # if the current file has been previously marked as inserted, # the file has been renamed if self.track_renaming and self.inserted_hash_idx[from_hash]: # pop the insert change index in the same order it was inserted change = self.result[self.inserted_hash_idx[from_hash].pop(0)] # change the insert change as a rename one change["type"] = "rename" change["from"] = it_from.current() change["from_path"] = it_from.current_path() else: # add the delete change in the list self.result.append( { "type": "delete", "from": it_from.current(), "from_path": it_from.current_path(), "to": None, "to_path": None, } ) # if rename tracking is activated, add the change index in # the deleted_hash_idx dict if self.track_renaming: self.deleted_hash_idx[from_hash].append(len(self.result) - 1) def add_modify(self, it_from, it_to): """ Add a file modification in the to directory. Args: it_from (swh.storage.algos.dir_iterators.DirectoryIterator): iterator on the from directory it_to (swh.storage.algos.dir_iterators.DirectoryIterator): iterator on the to directory """ self.result.append( { "type": "modify", "from": it_from.current(), "from_path": it_from.current_path(), "to": it_to.current(), "to_path": it_to.current_path(), } ) def add_recursive(self, it, insert): """ Recursively add changes from a directory. Args: it (swh.storage.algos.dir_iterators.DirectoryIterator): iterator on a directory insert (bool): the type of changes to add (insertion or deletion) """ # current iterated element is a regular file, # simply add adequate change in the list if not it.current_is_dir(): if insert: self.add_insert(it) else: self.add_delete(it) return # current iterated element is a directory, dir_id = it.current_hash() # handle empty dir insertion/deletion as the swh model allow # to have such object compared to git if dir_id == _empty_dir_hash: if insert: self.add_insert(it) else: self.add_delete(it) # iterate on files reachable from it and add # adequate changes in the list else: sub_it = DirectoryIterator(self.storage, dir_id, it.current_path() + b"/") sub_it_current = sub_it.step() while sub_it_current: if not sub_it.current_is_dir(): if insert: self.add_insert(sub_it) else: self.add_delete(sub_it) sub_it_current = sub_it.step() def add_recursive_insert(self, it_to): """ Recursively add files insertion from a to directory. Args: it_to (swh.storage.algos.dir_iterators.DirectoryIterator): iterator on a to directory """ self.add_recursive(it_to, True) def add_recursive_delete(self, it_from): """ Recursively add files deletion from a from directory. Args: it_from (swh.storage.algos.dir_iterators.DirectoryIterator): iterator on a from directory """ self.add_recursive(it_from, False) def _diff_elts_same_name(changes, it): """" Compare two directory entries with the same name and add adequate changes if any. Args: changes (_RevisionChangesList): the list of changes between two revisions it (swh.storage.algos.dir_iterators.DoubleDirectoryIterator): the iterator traversing two revision directories at the same time """ # compare the two current directory elements of the iterator status = it.compare() # elements have same hash and same permissions: # no changes to add and call next on the two iterators if status["same_hash"] and status["same_perms"]: it.next_both() # elements are regular files and have been modified: # insert the modification change in the list and # call next on the two iterators elif status["both_are_files"]: changes.add_modify(it.it_from, it.it_to) it.next_both() # one element is a regular file, the other a directory: # recursively add delete/insert changes and call next # on the two iterators elif status["file_and_dir"]: changes.add_recursive_delete(it.it_from) changes.add_recursive_insert(it.it_to) it.next_both() # both elements are directories: elif status["both_are_dirs"]: # from directory is empty: # recursively add insert changes in the to directory # and call next on the two iterators if status["from_is_empty_dir"]: changes.add_recursive_insert(it.it_to) it.next_both() # to directory is empty: # recursively add delete changes in the from directory # and call next on the two iterators elif status["to_is_empty_dir"]: changes.add_recursive_delete(it.it_from) it.next_both() # both directories are not empty: # call step on the two iterators to descend further in # the directory trees. elif not status["from_is_empty_dir"] and not status["to_is_empty_dir"]: it.step_both() def _compare_paths(path1, path2): """ Compare paths in lexicographic depth-first order. For instance, it returns: - "a" < "b" - "b/c/d" < "b" - "c/foo.txt" < "c.txt" """ path1_parts = path1.split(b"/") path2_parts = path2.split(b"/") i = 0 while True: if len(path1_parts) == len(path2_parts) and i == len(path1_parts): return 0 elif len(path2_parts) == i: return 1 elif len(path1_parts) == i: return -1 else: if path2_parts[i] > path1_parts[i]: return -1 elif path2_parts[i] < path1_parts[i]: return 1 i = i + 1 def _diff_elts(changes, it): """ Compare two directory entries. Args: changes (_RevisionChangesList): the list of changes between two revisions it (swh.storage.algos.dir_iterators.DoubleDirectoryIterator): the iterator traversing two revision directories at the same time """ # compare current to and from path in depth-first lexicographic order c = _compare_paths(it.it_from.current_path(), it.it_to.current_path()) # current from path is lower than the current to path: # the from path has been deleted if c < 0: changes.add_recursive_delete(it.it_from) it.next_from() # current from path is greater than the current to path: # the to path has been inserted elif c > 0: changes.add_recursive_insert(it.it_to) it.next_to() # paths are the same and need more processing else: _diff_elts_same_name(changes, it) def diff_directories(storage, from_dir, to_dir, track_renaming=False): """ Compute the differential between two directories, i.e. the list of file changes (insertion / deletion / modification / renaming) between them. Args: storage (swh.storage.interface.StorageInterface): instance of a swh storage (either local or remote, for optimal performance the use of a local storage is recommended) from_dir (bytes): the swh identifier of the directory to compare from to_dir (bytes): the swh identifier of the directory to compare to track_renaming (bool): whether or not to track files renaming Returns: list: A list of dict representing the changes between the two revisions. Each dict contains the following entries: - *type*: a string describing the type of change ('insert' / 'delete' / 'modify' / 'rename') - *from*: a dict containing the directory entry metadata in the from revision (None in case of an insertion) - *from_path*: bytes string corresponding to the absolute path of the from revision entry (None in case of an insertion) - *to*: a dict containing the directory entry metadata in the to revision (None in case of a deletion) - *to_path*: bytes string corresponding to the absolute path of the to revision entry (None in case of a deletion) The returned list is sorted in lexicographic depth-first order according to the value of the *to_path* field. """ changes = _RevisionChangesList(storage, track_renaming) it = DoubleDirectoryIterator(storage, from_dir, to_dir) while True: r = it.remaining() if r == Remaining.NoMoreFiles: break elif r == Remaining.OnlyFromFilesRemain: changes.add_recursive_delete(it.it_from) it.next_from() elif r == Remaining.OnlyToFilesRemain: changes.add_recursive_insert(it.it_to) it.next_to() else: _diff_elts(changes, it) return changes.result def diff_revisions(storage, from_rev, to_rev, track_renaming=False): """ Compute the differential between two revisions, i.e. the list of file changes between the two associated directories. Args: storage (swh.storage.interface.StorageInterface): instance of a swh storage (either local or remote, for optimal performance the use of a local storage is recommended) from_rev (bytes): the identifier of the revision to compare from to_rev (bytes): the identifier of the revision to compare to track_renaming (bool): whether or not to track files renaming Returns: list: A list of dict describing the introduced file changes (see :func:`swh.storage.algos.diff.diff_directories`). """ from_dir = None if from_rev: from_dir = _get_rev(storage, from_rev)["directory"] to_dir = _get_rev(storage, to_rev)["directory"] return diff_directories(storage, from_dir, to_dir, track_renaming) def diff_revision(storage, revision, track_renaming=False): """ Computes the differential between a revision and its first parent. If the revision has no parents, the directory to compare from is considered as empty. In other words, it computes the file changes introduced in a specific revision. Args: storage (swh.storage.interface.StorageInterface): instance of a swh storage (either local or remote, for optimal performance the use of a local storage is recommended) revision (bytes): the identifier of the revision from which to compute the introduced changes. track_renaming (bool): whether or not to track files renaming Returns: list: A list of dict describing the introduced file changes (see :func:`swh.storage.algos.diff.diff_directories`). """ rev_data = _get_rev(storage, revision) parent = None if rev_data["parents"]: parent = rev_data["parents"][0] return diff_revisions(storage, parent, revision, track_renaming) diff --git a/swh/storage/tests/algos/test_diff.py b/swh/storage/tests/algos/test_diff.py index 8bbeae75..8998037d 100644 --- a/swh/storage/tests/algos/test_diff.py +++ b/swh/storage/tests/algos/test_diff.py @@ -1,374 +1,389 @@ -# Copyright (C) 2018 The Software Heritage developers +# Copyright (C) 2018-2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information # flake8: noqa +import pytest import unittest + from unittest.mock import patch from swh.model.hashutil import hash_to_bytes from swh.model.identifiers import directory_identifier from swh.storage.algos import diff from .test_dir_iterator import DirectoryModel +def test__get_rev(swh_storage, sample_data): + revision = sample_data.revision + + # does not exist then raises + with pytest.raises(AssertionError): + diff._get_rev(swh_storage, revision.id) + + # otherwise, we retrieve its dict representation + swh_storage.revision_add([revision]) + actual_revision = diff._get_rev(swh_storage, revision.id) + assert actual_revision == revision.to_dict() + + @patch("swh.storage.algos.diff._get_rev") @patch("swh.storage.algos.dir_iterators._get_dir") class TestDiffRevisions(unittest.TestCase): def diff_revisions( self, rev_from, rev_to, from_dir_model, to_dir_model, expected_changes, mock_get_dir, mock_get_rev, ): rev_from_bytes = hash_to_bytes(rev_from) rev_to_bytes = hash_to_bytes(rev_to) def _get_rev(*args, **kwargs): if args[1] == rev_from_bytes: return {"directory": from_dir_model["target"]} else: return {"directory": to_dir_model["target"]} def _get_dir(*args, **kwargs): from_dir = from_dir_model.get_hash_data(args[1]) to_dir = to_dir_model.get_hash_data(args[1]) return from_dir if from_dir != None else to_dir mock_get_rev.side_effect = _get_rev mock_get_dir.side_effect = _get_dir changes = diff.diff_revisions( None, rev_from_bytes, rev_to_bytes, track_renaming=True ) self.assertEqual(changes, expected_changes) def test_insert_delete(self, mock_get_dir, mock_get_rev): rev_from = "898ff03e1e7925ecde3da66327d3cdc7e07625ba" rev_to = "647c3d381e67490e82cdbbe6c96e46d5e1628ce2" from_dir_model = DirectoryModel() to_dir_model = DirectoryModel() to_dir_model.add_file(b"file1", "ea15f54ca215e7920c60f564315ebb7f911a5204") to_dir_model.add_file(b"file2", "3e5faecb3836ffcadf82cc160787e35d4e2bec6a") to_dir_model.add_file(b"file3", "2ae33b2984974d35eababe4890d37fbf4bce6b2c") expected_changes = [ { "type": "insert", "from": None, "from_path": None, "to": to_dir_model.get_path_data(b"file1"), "to_path": b"file1", }, { "type": "insert", "from": None, "from_path": None, "to": to_dir_model.get_path_data(b"file2"), "to_path": b"file2", }, { "type": "insert", "from": None, "from_path": None, "to": to_dir_model.get_path_data(b"file3"), "to_path": b"file3", }, ] self.diff_revisions( rev_from, rev_to, from_dir_model, to_dir_model, expected_changes, mock_get_dir, mock_get_rev, ) from_dir_model = DirectoryModel() from_dir_model.add_file(b"file1", "ea15f54ca215e7920c60f564315ebb7f911a5204") from_dir_model.add_file(b"file2", "3e5faecb3836ffcadf82cc160787e35d4e2bec6a") from_dir_model.add_file(b"file3", "2ae33b2984974d35eababe4890d37fbf4bce6b2c") to_dir_model = DirectoryModel() expected_changes = [ { "type": "delete", "from": from_dir_model.get_path_data(b"file1"), "from_path": b"file1", "to": None, "to_path": None, }, { "type": "delete", "from": from_dir_model.get_path_data(b"file2"), "from_path": b"file2", "to": None, "to_path": None, }, { "type": "delete", "from": from_dir_model.get_path_data(b"file3"), "from_path": b"file3", "to": None, "to_path": None, }, ] self.diff_revisions( rev_from, rev_to, from_dir_model, to_dir_model, expected_changes, mock_get_dir, mock_get_rev, ) def test_onelevel_diff(self, mock_get_dir, mock_get_rev): rev_from = "898ff03e1e7925ecde3da66327d3cdc7e07625ba" rev_to = "647c3d381e67490e82cdbbe6c96e46d5e1628ce2" from_dir_model = DirectoryModel() from_dir_model.add_file(b"file1", "ea15f54ca215e7920c60f564315ebb7f911a5204") from_dir_model.add_file(b"file2", "f4a96b2000be83b61254d107046fa9777b17eb34") from_dir_model.add_file(b"file3", "d3c00f9396c6d0277727cec522ff6ad1ea0bc2da") to_dir_model = DirectoryModel() to_dir_model.add_file(b"file2", "3ee0f38ee0ea23cc2c8c0b9d66b27be4596b002b") to_dir_model.add_file(b"file3", "d3c00f9396c6d0277727cec522ff6ad1ea0bc2da") to_dir_model.add_file(b"file4", "40460b9653b1dc507e1b6eb333bd4500634bdffc") expected_changes = [ { "type": "delete", "from": from_dir_model.get_path_data(b"file1"), "from_path": b"file1", "to": None, "to_path": None, }, { "type": "modify", "from": from_dir_model.get_path_data(b"file2"), "from_path": b"file2", "to": to_dir_model.get_path_data(b"file2"), "to_path": b"file2", }, { "type": "insert", "from": None, "from_path": None, "to": to_dir_model.get_path_data(b"file4"), "to_path": b"file4", }, ] self.diff_revisions( rev_from, rev_to, from_dir_model, to_dir_model, expected_changes, mock_get_dir, mock_get_rev, ) def test_twolevels_diff(self, mock_get_dir, mock_get_rev): rev_from = "898ff03e1e7925ecde3da66327d3cdc7e07625ba" rev_to = "647c3d381e67490e82cdbbe6c96e46d5e1628ce2" from_dir_model = DirectoryModel() from_dir_model.add_file(b"file1", "ea15f54ca215e7920c60f564315ebb7f911a5204") from_dir_model.add_file( b"dir1/file1", "8335fca266811bac7ae5c8e1621476b4cf4156b6" ) from_dir_model.add_file( b"dir1/file2", "a6127d909e79f1fcb28bbf220faf86e7be7831e5" ) from_dir_model.add_file( b"dir1/file3", "18049b8d067ce1194a7e1cce26cfa3ae4242a43d" ) from_dir_model.add_file(b"file2", "d3c00f9396c6d0277727cec522ff6ad1ea0bc2da") to_dir_model = DirectoryModel() to_dir_model.add_file(b"file1", "3ee0f38ee0ea23cc2c8c0b9d66b27be4596b002b") to_dir_model.add_file(b"dir1/file2", "de3548b32a8669801daa02143a66dae21fe852fd") to_dir_model.add_file(b"dir1/file3", "18049b8d067ce1194a7e1cce26cfa3ae4242a43d") to_dir_model.add_file(b"dir1/file4", "f5c3f42aec5fe7b92276196c350cbadaf4c51f87") to_dir_model.add_file(b"file2", "d3c00f9396c6d0277727cec522ff6ad1ea0bc2da") expected_changes = [ { "type": "delete", "from": from_dir_model.get_path_data(b"dir1/file1"), "from_path": b"dir1/file1", "to": None, "to_path": None, }, { "type": "modify", "from": from_dir_model.get_path_data(b"dir1/file2"), "from_path": b"dir1/file2", "to": to_dir_model.get_path_data(b"dir1/file2"), "to_path": b"dir1/file2", }, { "type": "insert", "from": None, "from_path": None, "to": to_dir_model.get_path_data(b"dir1/file4"), "to_path": b"dir1/file4", }, { "type": "modify", "from": from_dir_model.get_path_data(b"file1"), "from_path": b"file1", "to": to_dir_model.get_path_data(b"file1"), "to_path": b"file1", }, ] self.diff_revisions( rev_from, rev_to, from_dir_model, to_dir_model, expected_changes, mock_get_dir, mock_get_rev, ) def test_insert_delete_empty_dirs(self, mock_get_dir, mock_get_rev): rev_from = "898ff03e1e7925ecde3da66327d3cdc7e07625ba" rev_to = "647c3d381e67490e82cdbbe6c96e46d5e1628ce2" from_dir_model = DirectoryModel() from_dir_model.add_file( b"dir3/file1", "ea15f54ca215e7920c60f564315ebb7f911a5204" ) to_dir_model = DirectoryModel() to_dir_model.add_file(b"dir3/file1", "ea15f54ca215e7920c60f564315ebb7f911a5204") to_dir_model.add_file(b"dir3/dir1/") expected_changes = [ { "type": "insert", "from": None, "from_path": None, "to": to_dir_model.get_path_data(b"dir3/dir1"), "to_path": b"dir3/dir1", } ] self.diff_revisions( rev_from, rev_to, from_dir_model, to_dir_model, expected_changes, mock_get_dir, mock_get_rev, ) from_dir_model = DirectoryModel() from_dir_model.add_file(b"dir1/dir2/") from_dir_model.add_file( b"dir1/file1", "ea15f54ca215e7920c60f564315ebb7f911a5204" ) to_dir_model = DirectoryModel() to_dir_model.add_file(b"dir1/file1", "ea15f54ca215e7920c60f564315ebb7f911a5204") expected_changes = [ { "type": "delete", "from": from_dir_model.get_path_data(b"dir1/dir2"), "from_path": b"dir1/dir2", "to": None, "to_path": None, } ] self.diff_revisions( rev_from, rev_to, from_dir_model, to_dir_model, expected_changes, mock_get_dir, mock_get_rev, ) def test_track_renaming(self, mock_get_dir, mock_get_rev): rev_from = "898ff03e1e7925ecde3da66327d3cdc7e07625ba" rev_to = "647c3d381e67490e82cdbbe6c96e46d5e1628ce2" from_dir_model = DirectoryModel() from_dir_model.add_file( b"file1_oldname", "ea15f54ca215e7920c60f564315ebb7f911a5204" ) from_dir_model.add_file( b"dir1/file1_oldname", "ea15f54ca215e7920c60f564315ebb7f911a5204" ) from_dir_model.add_file( b"file2_oldname", "d3c00f9396c6d0277727cec522ff6ad1ea0bc2da" ) to_dir_model = DirectoryModel() to_dir_model.add_file( b"dir1/file1_newname", "ea15f54ca215e7920c60f564315ebb7f911a5204" ) to_dir_model.add_file( b"dir2/file1_newname", "ea15f54ca215e7920c60f564315ebb7f911a5204" ) to_dir_model.add_file( b"file2_newname", "d3c00f9396c6d0277727cec522ff6ad1ea0bc2da" ) expected_changes = [ { "type": "rename", "from": from_dir_model.get_path_data(b"dir1/file1_oldname"), "from_path": b"dir1/file1_oldname", "to": to_dir_model.get_path_data(b"dir1/file1_newname"), "to_path": b"dir1/file1_newname", }, { "type": "rename", "from": from_dir_model.get_path_data(b"file1_oldname"), "from_path": b"file1_oldname", "to": to_dir_model.get_path_data(b"dir2/file1_newname"), "to_path": b"dir2/file1_newname", }, { "type": "rename", "from": from_dir_model.get_path_data(b"file2_oldname"), "from_path": b"file2_oldname", "to": to_dir_model.get_path_data(b"file2_newname"), "to_path": b"file2_newname", }, ] self.diff_revisions( rev_from, rev_to, from_dir_model, to_dir_model, expected_changes, mock_get_dir, mock_get_rev, )