diff --git a/PKG-INFO b/PKG-INFO
index 8402d5fc..a4dc5adc 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,250 +1,250 @@
Metadata-Version: 2.1
Name: swh.storage
-Version: 0.43.0
+Version: 0.43.1
Summary: Software Heritage storage manager
Home-page: https://forge.softwareheritage.org/diffusion/DSTO/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-storage
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-storage/
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
Provides-Extra: journal
License-File: LICENSE
License-File: AUTHORS

swh-storage
===========

Abstraction layer over the archive, which allows access to all stored source
code artifacts as well as their metadata.

See the
[documentation](https://docs.softwareheritage.org/devel/swh-storage/index.html)
for more details.

## Quick start

### Dependencies

Python tests for this module include tests that cannot be run without a local
Postgresql database, so you need the Postgresql server executable on your
machine (no need to have a running Postgresql server). They also expect a
Cassandra server.

#### Debian-like host

```
$ sudo apt install libpq-dev postgresql-11 cassandra
```

#### Non Debian-like host

The tests expect the path to the `cassandra` executable either to be left
unspecified (it is then looked up at `/usr/sbin/cassandra`) or to be specified
through the `SWH_CASSANDRA_BIN` environment variable.

Optionally, you can avoid running the Cassandra tests:

```
(swh) :~/swh-storage$ tox -- -m 'not cassandra'
```

### Installation

It is strongly recommended to use a virtualenv. In the following, we assume
you work in a virtualenv named `swh`. See the
[developer setup guide](https://docs.softwareheritage.org/devel/developer-setup.html#developer-setup)
for more details on how to set up a working environment.

You can install the package directly from
[pypi](https://pypi.org/p/swh.storage):

```
(swh) :~$ pip install swh.storage
[...]
```

Or from sources:

```
(swh) :~$ git clone https://forge.softwareheritage.org/source/swh-storage.git
[...]
(swh) :~$ cd swh-storage
(swh) :~/swh-storage$ pip install .
[...]
```

Then you can check that it is properly installed:

```
(swh) :~$ swh storage --help
Usage: swh storage [OPTIONS] COMMAND [ARGS]...

  Software Heritage Storage tools.

Options:
  -h, --help  Show this message and exit.

Commands:
  rpc-serve  Software Heritage Storage RPC server.
```

## Tests

The best way to run the Python tests for this module is to use
[tox](https://tox.readthedocs.io/).

```
(swh) :~$ pip install tox
```

### tox

From the sources directory, simply use tox:

```
(swh) :~/swh-storage$ tox
[...]
========= 315 passed, 6 skipped, 15 warnings in 40.86 seconds ==========
_______________________________ summary ________________________________
flake8: commands succeeded
py3: commands succeeded
congratulations :)
```

Note: it is possible to set the `JAVA_HOME` environment variable to specify
the version of the JVM to be used by Cassandra.
For example, at the time of writing, Cassandra does not support Java 14, so
one may want to use Java 11:

```
(swh) :~/swh-storage$ export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
(swh) :~/swh-storage$ tox
[...]
```

## Development

The storage server can be started locally. It requires a configuration file
and a running Postgresql database.

### Sample configuration

A typical `storage.yml` configuration file is:

```
storage:
  cls: postgresql
  db: "dbname=softwareheritage-dev user= password="
  objstorage:
    cls: pathslicing
    root: /tmp/swh-storage/
    slicing: 0:2/2:4/4:6
```

This means the server uses:

- a local storage instance whose db connection points to the local
  `softwareheritage-dev` database,
- a local objstorage instance whose:
  - `root` path is `/tmp/swh-storage`,
  - slicing scheme is `0:2/2:4/4:6`: the content identifier (sha1) is mapped
    to a path on disk using its first 2 hex characters for the first directory
    level, the next 2 for the second level and the next 2 for the third level,
    the file holding the raw content being named with the complete hash. For
    example, 00062f8bd330715c4f819373653d97b3cd34394c will be stored at
    00/06/2f/00062f8bd330715c4f819373653d97b3cd34394c

Note that the `root` path should exist on disk before starting the server.

### Starting the storage server

If the python package has been properly installed (e.g. in a virtual env), you
should be able to use the command:

```
(swh) :~/swh-storage$ swh storage rpc-serve storage.yml
```

This runs a local swh-storage API on port 5002.

```
(swh) :~/swh-storage$ curl http://127.0.0.1:5002
Software Heritage storage server

You have reached the Software Heritage storage server.
See its documentation and API for more information

``` ### And then what? In your upper layer ([loader-git](https://forge.softwareheritage.org/source/swh-loader-git/), [loader-svn](https://forge.softwareheritage.org/source/swh-loader-svn/), etc...), you can define a remote storage with this snippet of yaml configuration. ``` storage: cls: remote url: http://localhost:5002/ ``` You could directly define a postgresql storage with the following snippet: ``` storage: cls: postgresql db: service=swh-dev objstorage: cls: pathslicing root: /home/storage/swh-storage/ slicing: 0:2/2:4/4:6 ``` ## Cassandra As an alternative to PostgreSQL, swh-storage can use Cassandra as a database backend. It can be used like this: ``` storage: cls: cassandra hosts: - localhost objstorage: cls: pathslicing root: /home/storage/swh-storage/ slicing: 0:2/2:4/4:6 ``` The Cassandra swh-storage implementation supports both Cassandra >= 4.0-alpha2 and ScyllaDB >= 4.4 (and possibly earlier versions, but this is untested). While the main code supports both transparently, running tests or configuring the schema requires specific code when using ScyllaDB, enabled by setting the `SWH_USE_SCYLLADB=1` environment variable. diff --git a/requirements-test.txt b/requirements-test.txt index 8662be4f..a33e143a 100644 --- a/requirements-test.txt +++ b/requirements-test.txt @@ -1,15 +1,16 @@ hypothesis >= 3.11.0 -pytest +pytest < 7.0.0 # v7.0.0 removed _pytest.tmpdir.TempdirFactory, which is used by some of the pytest plugins we use + pytest-mock # pytz is in fact a dep of swh.model[testing] and should not be necessary, but # the dep on swh.model in the main requirements-swh.txt file shadows this one # adding the [testing] extra. swh.model[testing] >= 0.0.50 pytz pytest-redis pytest-xdist types-python-dateutil types-pytz types-pyyaml types-redis types-requests diff --git a/swh.storage.egg-info/PKG-INFO b/swh.storage.egg-info/PKG-INFO index 8402d5fc..a4dc5adc 100644 --- a/swh.storage.egg-info/PKG-INFO +++ b/swh.storage.egg-info/PKG-INFO @@ -1,250 +1,250 @@ Metadata-Version: 2.1 Name: swh.storage -Version: 0.43.0 +Version: 0.43.1 Summary: Software Heritage storage manager Home-page: https://forge.softwareheritage.org/diffusion/DSTO/ Author: Software Heritage developers Author-email: swh-devel@inria.fr License: UNKNOWN Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest Project-URL: Funding, https://www.softwareheritage.org/donate Project-URL: Source, https://forge.softwareheritage.org/source/swh-storage Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-storage/ Platform: UNKNOWN Classifier: Programming Language :: Python :: 3 Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3) Classifier: Operating System :: OS Independent Classifier: Development Status :: 5 - Production/Stable Requires-Python: >=3.7 Description-Content-Type: text/markdown Provides-Extra: testing Provides-Extra: journal License-File: LICENSE License-File: AUTHORS swh-storage =========== Abstraction layer over the archive, allowing to access all stored source code artifacts as well as their metadata. See the [documentation](https://docs.softwareheritage.org/devel/swh-storage/index.html) for more details. ## Quick start ### Dependencies Python tests for this module include tests that cannot be run without a local Postgresql database, so you need the Postgresql server executable on your machine (no need to have a running Postgresql server). They also expect a cassandra server. 
#### Debian-like host ``` $ sudo apt install libpq-dev postgresql-11 cassandra ``` #### Non Debian-like host The tests expects the path to `cassandra` to either be unspecified, it is then looked up at `/usr/sbin/cassandra`, either specified through the environment variable `SWH_CASSANDRA_BIN`. Optionally, you can avoid running the cassandra tests. ``` (swh) :~/swh-storage$ tox -- -m 'not cassandra' ``` ### Installation It is strongly recommended to use a virtualenv. In the following, we consider you work in a virtualenv named `swh`. See the [developer setup guide](https://docs.softwareheritage.org/devel/developer-setup.html#developer-setup) for a more details on how to setup a working environment. You can install the package directly from [pypi](https://pypi.org/p/swh.storage): ``` (swh) :~$ pip install swh.storage [...] ``` Or from sources: ``` (swh) :~$ git clone https://forge.softwareheritage.org/source/swh-storage.git [...] (swh) :~$ cd swh-storage (swh) :~/swh-storage$ pip install . [...] ``` Then you can check it's properly installed: ``` (swh) :~$ swh storage --help Usage: swh storage [OPTIONS] COMMAND [ARGS]... Software Heritage Storage tools. Options: -h, --help Show this message and exit. Commands: rpc-serve Software Heritage Storage RPC server. ``` ## Tests The best way of running Python tests for this module is to use [tox](https://tox.readthedocs.io/). ``` (swh) :~$ pip install tox ``` ### tox From the sources directory, simply use tox: ``` (swh) :~/swh-storage$ tox [...] ========= 315 passed, 6 skipped, 15 warnings in 40.86 seconds ========== _______________________________ summary ________________________________ flake8: commands succeeded py3: commands succeeded congratulations :) ``` Note: it is possible to set the `JAVA_HOME` environment variable to specify the version of the JVM to be used by Cassandra. For example, at the time of writing this, Cassandra does not support java 14, so one may want to use for example java 11: ``` (swh) :~/swh-storage$ export JAVA_HOME=/usr/lib/jvm/java-14-openjdk-amd64/bin/java (swh) :~/swh-storage$ tox [...] ``` ## Development The storage server can be locally started. It requires a configuration file and a running Postgresql database. ### Sample configuration A typical configuration `storage.yml` file is: ``` storage: cls: postgresql db: "dbname=softwareheritage-dev user= password=" objstorage: cls: pathslicing root: /tmp/swh-storage/ slicing: 0:2/2:4/4:6 ``` which means, this uses: - a local storage instance whose db connection is to `softwareheritage-dev` local instance, - the objstorage uses a local objstorage instance whose: - `root` path is /tmp/swh-storage, - slicing scheme is `0:2/2:4/4:6`. This means that the identifier of the content (sha1) which will be stored on disk at first level with the first 2 hex characters, the second level with the next 2 hex characters and the third level with the next 2 hex characters. And finally the complete hash file holding the raw content. For example: 00062f8bd330715c4f819373653d97b3cd34394c will be stored at 00/06/2f/00062f8bd330715c4f819373653d97b3cd34394c Note that the `root` path should exist on disk before starting the server. ### Starting the storage server If the python package has been properly installed (e.g. in a virtual env), you should be able to use the command: ``` (swh) :~/swh-storage$ swh storage rpc-serve storage.yml ``` This runs a local swh-storage api at 5002 port. ``` (swh) :~/swh-storage$ curl http://127.0.0.1:5002 Software Heritage storage server

You have reached the Software Heritage storage server.
See its documentation and API for more information

``` ### And then what? In your upper layer ([loader-git](https://forge.softwareheritage.org/source/swh-loader-git/), [loader-svn](https://forge.softwareheritage.org/source/swh-loader-svn/), etc...), you can define a remote storage with this snippet of yaml configuration. ``` storage: cls: remote url: http://localhost:5002/ ``` You could directly define a postgresql storage with the following snippet: ``` storage: cls: postgresql db: service=swh-dev objstorage: cls: pathslicing root: /home/storage/swh-storage/ slicing: 0:2/2:4/4:6 ``` ## Cassandra As an alternative to PostgreSQL, swh-storage can use Cassandra as a database backend. It can be used like this: ``` storage: cls: cassandra hosts: - localhost objstorage: cls: pathslicing root: /home/storage/swh-storage/ slicing: 0:2/2:4/4:6 ``` The Cassandra swh-storage implementation supports both Cassandra >= 4.0-alpha2 and ScyllaDB >= 4.4 (and possibly earlier versions, but this is untested). While the main code supports both transparently, running tests or configuring the schema requires specific code when using ScyllaDB, enabled by setting the `SWH_USE_SCYLLADB=1` environment variable. diff --git a/swh.storage.egg-info/requires.txt b/swh.storage.egg-info/requires.txt index 6ca9cbe9..75004d1a 100644 --- a/swh.storage.egg-info/requires.txt +++ b/swh.storage.egg-info/requires.txt @@ -1,33 +1,33 @@ aiohttp cassandra-driver!=3.21.0,>=3.19.0 click deprecated flask iso8601 mypy_extensions psycopg2 redis tenacity>=6.2 typing-extensions swh.core[db,http]>=0.14.0 swh.counters>=v0.8.0 swh.model>=4.4.0 swh.objstorage>=0.2.2 [journal] swh.journal>=0.9 [testing] hypothesis>=3.11.0 -pytest +pytest<7.0.0 pytest-mock swh.model[testing]>=0.0.50 pytz pytest-redis pytest-xdist types-python-dateutil types-pytz types-pyyaml types-redis types-requests swh.journal>=0.9 diff --git a/swh/storage/algos/revisions_walker.py b/swh/storage/algos/revisions_walker.py index d1683b08..b73bb643 100644 --- a/swh/storage/algos/revisions_walker.py +++ b/swh/storage/algos/revisions_walker.py @@ -1,553 +1,566 @@ -# Copyright (C) 2018-2021 The Software Heritage developers +# Copyright (C) 2018-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information +from __future__ import annotations + from abc import ABCMeta, abstractmethod from collections import deque +import dataclasses import heapq +from typing import TYPE_CHECKING, Any, Dict, Optional, Set, TypeVar + +from swh.model.model import Sha1Git + +if TYPE_CHECKING: + from swh.storage.interface import StorageInterface + + +@dataclasses.dataclass +class State: + done: Set[Sha1Git] = dataclasses.field(default_factory=set) + revs_to_visit: Any = dataclasses.field(default_factory=list) + last_rev: Optional[Dict] = None + num_revs: int = 0 + missing_revs: Set[Sha1Git] = dataclasses.field(default_factory=set) + _revs_walker_classes = {} class _RevisionsWalkerMetaClass(ABCMeta): def __new__(cls, clsname, bases, attrs): newclass = super().__new__(cls, clsname, bases, attrs) if "rw_type" in attrs: _revs_walker_classes[attrs["rw_type"]] = newclass return newclass +TWalker = TypeVar("TWalker", bound="RevisionsWalker") + + class RevisionsWalker(metaclass=_RevisionsWalkerMetaClass): """ Abstract base class encapsulating the logic to walk across a revisions history starting from a given one. 
It defines an iterator returning the revisions according to a specific ordering implemented in derived classes. The iteration step performs the following operations: 1) Check if the iteration is finished by calling method :meth:`is_finished` and raises :exc:`StopIteration` if it it is the case 2) Get the next unseen revision by calling method :meth:`get_next_rev_id` 3) Process parents of that revision by calling method :meth:`process_parent_revs` for the next iteration steps 4) Check if the revision should be returned by calling method :meth:`should_return` and returns it if it is the case In order to easily instantiate a specific type of revisions walker, it is recommended to use the factory function :func:`get_revisions_walker`. Args: - storage (swh.storage.interface.StorageInterface): instance of swh storage - (either local or remote) - rev_start (bytes): a revision identifier - max_revs (Optional[int]): maximum number of revisions to return - state (Optional[dict]): previous state of that revisions walker + storage: instance of swh storage (either local or remote) + rev_start: a revision identifier + max_revs: maximum number of revisions to return + state: previous state of that revisions walker + ignore_displayname: return the original author/committer's full name even if + it's masked by a displayname. """ - def __init__(self, storage, rev_start, max_revs=None, state=None): - self._revs_to_visit = [] - self._done = set() - self._revs = {} - self._last_rev = None - self._num_revs = 0 + def __init__( + self, + storage: StorageInterface, + rev_start: Sha1Git, + max_revs: Optional[int] = None, + state: Optional[State] = None, + ignore_displayname: bool = False, + ): + self._revs: Dict[Sha1Git, Dict] = {} self._max_revs = max_revs - self._missing_revs = set() - if state: - self._revs_to_visit = state["revs_to_visit"] - self._done = state["done"] - self._last_rev = state["last_rev"] - self._num_revs = state["num_revs"] - self._missing_revs = state["missing_revs"] + self._state = state or State() self.storage = storage + self.ignore_displayname = ignore_displayname self.process_rev(rev_start) @abstractmethod - def process_rev(self, rev_id): + def process_rev(self, rev_id: Sha1Git) -> None: """ Abstract method whose purpose is to process a newly visited revision during the walk. Derived classes must implement it according to the desired method to walk across the revisions history (for instance through a dfs on the revisions DAG). Args: - rev_id (bytes): the newly visited revision identifier + rev_id: the newly visited revision identifier """ pass @abstractmethod - def get_next_rev_id(self): + def get_next_rev_id(self) -> Sha1Git: """ Abstract method whose purpose is to return the next revision during the iteration. Derived classes must implement it according to the desired method to walk across the revisions history. - - Returns: - dict: A dict describing a revision as returned by - :meth:`swh.storage.interface.StorageInterface.revision_get` """ pass - def process_parent_revs(self, rev): + def process_parent_revs(self, rev: Dict) -> None: """ Process the parents of a revision when it is iterated. The default implementation simply calls :meth:`process_rev` for each parent revision in the order they are declared. 
Args: rev (dict): A dict describing a revision as returned by :meth:`swh.storage.interface.StorageInterface.revision_get` """ for parent_id in rev["parents"]: self.process_rev(parent_id) - def should_return(self, rev): + def should_return(self, rev: Dict) -> bool: """ Filter out a revision to return if needed. Default implementation returns all iterated revisions. Args: rev (dict): A dict describing a revision as returned by :meth:`swh.storage.interface.StorageInterface.revision_get` Returns: bool: Whether to return the revision in the iteration """ return True - def is_finished(self): + def is_finished(self) -> bool: """ Determine if the iteration is finished. This method is called at the beginning of each iteration loop. Returns: bool: Whether the iteration is finished """ - if self._max_revs is not None and self._num_revs >= self._max_revs: + if self._max_revs is not None and self._state.num_revs >= self._max_revs: return True - if not self._revs_to_visit: + if not self._state.revs_to_visit: return True return False - def _get_rev(self, rev_id): + def _get_rev(self, rev_id: Sha1Git) -> Optional[Dict]: rev = self._revs.get(rev_id) if rev is None: # cache some revisions in advance to avoid sending too much # requests to storage and thus speedup the revisions walk - for rev in self.storage.revision_log([rev_id], limit=100): + for rev in self.storage.revision_log( + [rev_id], limit=100, ignore_displayname=self.ignore_displayname + ): # revision data is missing, returned history will be truncated if rev is None: continue self._revs[rev["id"]] = rev return self._revs.get(rev_id) - def missing_revisions(self): + def missing_revisions(self) -> Set[Sha1Git]: """ Return a set of revision identifiers whose associated data were found missing into the archive content while walking on the revisions graph. Returns: Set[bytes]: a set of revision identifiers """ - return self._missing_revs + return self._state.missing_revs - def is_history_truncated(self): + def is_history_truncated(self) -> bool: """ Return if the revision history generated so far has been truncated of not. A revision history might end up truncated if some revision data were found missing into the archive content. Returns: bool: Whether the history got truncated or not """ return len(self.missing_revisions()) > 0 - def export_state(self): + def export_state(self) -> State: """ Export the internal state of that revision walker to a dict. Its purpose is to continue the iteration in a pagination context. 
Returns: - dict: A dict containing the internal state of that revisions walker + The internal state of that revisions walker """ - return { - "revs_to_visit": self._revs_to_visit, - "done": self._done, - "last_rev": self._last_rev, - "num_revs": self._num_revs, - "missing_revs": self._missing_revs, - } + return self._state - def __next__(self): + def __next__(self) -> Dict: if self.is_finished(): raise StopIteration - while self._revs_to_visit: + while self._state.revs_to_visit: rev_id = self.get_next_rev_id() - if rev_id in self._done: + if rev_id in self._state.done: continue - self._done.add(rev_id) + self._state.done.add(rev_id) rev = self._get_rev(rev_id) # revision data is missing, returned history will be truncated if rev is None: - self._missing_revs.add(rev_id) + self._state.missing_revs.add(rev_id) continue self.process_parent_revs(rev) if self.should_return(rev): - self._num_revs += 1 - self._last_rev = rev + self._state.num_revs += 1 + self._state.last_rev = rev return rev raise StopIteration - def __iter__(self): + def __iter__(self: TWalker) -> TWalker: return self class CommitterDateRevisionsWalker(RevisionsWalker): """ Revisions walker that returns revisions in reverse chronological order according to committer date (same behaviour as ``git log``) """ rw_type = "committer_date" - def process_rev(self, rev_id): + def process_rev(self, rev_id: Sha1Git) -> None: """ Add the revision to a priority queue according to the committer date. Args: rev_id (bytes): the newly visited revision identifier """ - if rev_id not in self._done: + if rev_id not in self._state.done: rev = self._get_rev(rev_id) if rev is not None: commit_time = ( rev["committer_date"]["timestamp"]["seconds"] if rev["committer_date"] # allows to avoid failure with a revision without commit date # and iterate on such revision before its parents - else len(self._revs_to_visit) + else len(self._state.revs_to_visit) ) - heapq.heappush(self._revs_to_visit, (-commit_time, rev_id)) + heapq.heappush(self._state.revs_to_visit, (-commit_time, rev_id)) else: - self._missing_revs.add(rev_id) + self._state.missing_revs.add(rev_id) - def get_next_rev_id(self): + def get_next_rev_id(self) -> Sha1Git: """ Return the smallest revision from the priority queue, i.e. the one with highest committer date. Returns: dict: A dict describing a revision as returned by :meth:`swh.storage.interface.StorageInterface.revision_get` """ - _, rev_id = heapq.heappop(self._revs_to_visit) + _, rev_id = heapq.heappop(self._state.revs_to_visit) return rev_id class BFSRevisionsWalker(RevisionsWalker): """ Revisions walker that returns revisions in the same order as when performing a breadth-first search on the revisions DAG. """ rw_type = "bfs" def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) - self._revs_to_visit = deque(self._revs_to_visit) + self._state.revs_to_visit = deque(self._state.revs_to_visit) - def process_rev(self, rev_id): + def process_rev(self, rev_id: Sha1Git) -> None: """ Append the revision to a queue. Args: rev_id (bytes): the newly visited revision identifier """ - if rev_id not in self._done: - self._revs_to_visit.append(rev_id) + if rev_id not in self._state.done: + self._state.revs_to_visit.append(rev_id) - def get_next_rev_id(self): + def get_next_rev_id(self) -> Sha1Git: """ Return the next revision from the queue. 
Returns: dict: A dict describing a revision as returned by :meth:`swh.storage.interface.StorageInterface.revision_get` """ - return self._revs_to_visit.popleft() + return self._state.revs_to_visit.popleft() class DFSPostRevisionsWalker(RevisionsWalker): """ Revisions walker that returns revisions in the same order as when performing a depth-first search in post-order on the revisions DAG (i.e. after visiting a merge commit, the merged commit will be visited before the base it was merged on). """ rw_type = "dfs_post" - def process_rev(self, rev_id): + def process_rev(self, rev_id: Sha1Git) -> None: """ Append the revision to a stack. Args: rev_id (bytes): the newly visited revision identifier """ - if rev_id not in self._done: - self._revs_to_visit.append(rev_id) + if rev_id not in self._state.done: + self._state.revs_to_visit.append(rev_id) - def get_next_rev_id(self): + def get_next_rev_id(self) -> Sha1Git: """ Return the next revision from the stack. Returns: dict: A dict describing a revision as returned by :meth:`swh.storage.interface.StorageInterface.revision_get` """ - return self._revs_to_visit.pop() + return self._state.revs_to_visit.pop() class DFSRevisionsWalker(DFSPostRevisionsWalker): """ Revisions walker that returns revisions in the same order as when performing a depth-first search in pre-order on the revisions DAG (i.e. after visiting a merge commit, the base commit it was merged on will be visited before the merged commit). """ rw_type = "dfs" - def process_parent_revs(self, rev): + def process_parent_revs(self, rev: Dict) -> None: """ Process the parents of a revision when it is iterated in the reversed order they are declared. Args: rev (dict): A dict describing a revision as returned by :meth:`swh.storage.interface.StorageInterface.revision_get` """ for parent_id in reversed(rev["parents"]): self.process_rev(parent_id) class PathRevisionsWalker(CommitterDateRevisionsWalker): """ Revisions walker that returns revisions where a specific path in the source tree has been modified, in other terms it allows to get the history for a specific file or directory. It has a behaviour similar to what ``git log`` offers by default, meaning the returned history is simplified in order to only show relevant revisions (see the `History Simplification `_ section of the associated manual for more details). Please note that to avoid walking the entire history, the iteration will stop once a revision where the path has been added is found. .. warning:: Due to client-side implementation, performances are not optimal when the total numbers of revisions to walk is large. This should only be used when the total number of revisions does not exceed a couple of thousands. Args: storage (swh.storage.interface.StorageInterface): instance of swh storage (either local or remote) rev_start (bytes): a revision identifier path (str): the path in the source tree to retrieve the history max_revs (Optional[int]): maximum number of revisions to return state (Optional[dict]): previous state of that revisions walker """ rw_type = "path" def __init__(self, storage, rev_start, path, **kwargs): super().__init__(storage, rev_start, **kwargs) paths = path.strip("/").split("/") self._path = list(map(lambda p: p.encode("utf-8"), paths)) self._rev_dir_path = {} def _get_path_id(self, rev_id): """ Return the path checksum identifier in the source tree of the provided revision. If the path corresponds to a directory, the value computed by :meth:`swh.model.Directory.compute_hash` will be returned. 
If the path corresponds to a file, its sha1 checksum will be returned. Args: rev_id (bytes): a revision identifier Returns: bytes: the path identifier """ rev = self._get_rev(rev_id) rev_dir_id = rev["directory"] if rev_dir_id not in self._rev_dir_path: try: dir_info = self.storage.directory_entry_get_by_path( rev_dir_id, self._path ) self._rev_dir_path[rev_dir_id] = dir_info["target"] except Exception: self._rev_dir_path[rev_dir_id] = None return self._rev_dir_path[rev_dir_id] def is_finished(self): """ Check if the revisions iteration is finished. This checks for the specified path's existence in the last returned revision's parents' source trees. If not, the iteration is considered finished. Returns: bool: Whether to return the revision in the iteration """ if self._path and self._last_rev: last_rev_parents = self._last_rev["parents"] last_rev_parents_path_ids = [ self._get_path_id(p_rev) for p_rev in last_rev_parents ] no_path = all([path_id is None for path_id in last_rev_parents_path_ids]) if no_path: return True return super().is_finished() def process_parent_revs(self, rev): """ Process parents when a new revision is iterated. It enables to get a simplified revisions history in the same manner as ``git log``. When a revision has multiple parents, the following process is applied. If the revision was a merge, and has the same path identifier to one parent, follow only that parent (even if there are several parents with the same path identifier, follow only one of them.) Otherwise, follow all parents. Args: rev (dict): A dict describing a revision as returned by :meth:`swh.storage.interface.StorageInterface.revision_get` """ rev_path_id = self._get_path_id(rev["id"]) if rev_path_id: if len(rev["parents"]) == 1: self.process_rev(rev["parents"][0]) else: parent_rev_path_ids = [ self._get_path_id(p_rev) for p_rev in rev["parents"] ] different_trees = all( [path_id != rev_path_id for path_id in parent_rev_path_ids] ) for i, p_rev in enumerate(rev["parents"]): if different_trees or parent_rev_path_ids[i] == rev_path_id: self.process_rev(p_rev) if not different_trees: break else: super().process_parent_revs(rev) def should_return(self, rev): """ Check if a revision should be returned when iterating. It verifies that the specified path has been modified by the revision but also that all parents have a path identifier different from the revision one in order to get a simplified history. Args: rev (dict): A dict describing a revision as returned by :meth:`swh.storage.interface.StorageInterface.revision_get` Returns: bool: Whether to return the revision in the iteration """ rev_path_id = self._get_path_id(rev["id"]) if not rev["parents"]: return rev_path_id is not None parent_rev_path_ids = [self._get_path_id(p_rev) for p_rev in rev["parents"]] different_trees = all( [path_id != rev_path_id for path_id in parent_rev_path_ids] ) if rev_path_id != parent_rev_path_ids[0] and different_trees: return True return False def get_revisions_walker(rev_walker_type, *args, **kwargs): """ Instantiate a revisions walker of a given type. The following code snippet demonstrates how to use a revisions walker for processing a whole revisions history:: from swh.storage import get_storage storage = get_storage(...) 
revs_walker = get_revisions_walker('committer_date', storage, rev_id) for rev in revs_walker: # process revision rev It is also possible to walk a revisions history in a paginated way as illustrated below:: def get_revs_history_page(rw_type, storage, rev_id, page_num, page_size, rw_state): max_revs = (page_num + 1) * page_size revs_walker = get_revisions_walker(rw_type, storage, rev_id, max_revs=max_revs, state=rw_state) revs = list(revs_walker) rw_state = revs_walker.export_state() return revs rev_start = ... per_page = 50 rw_state = {} for page in range(0, 10): revs_page = get_revs_history_page('dfs', storage, rev_start, page, per_page, rw_state) # process revisions page Args: rev_walker_type (str): the type of revisions walker to return, possible values are: *committer_date*, *dfs*, *dfs_post*, *bfs* and *path* args (list): position arguments to pass to the revisions walker constructor kwargs (dict): keyword arguments to pass to the revisions walker constructor """ if rev_walker_type not in _revs_walker_classes: raise Exception('No revisions walker found for type "%s"' % rev_walker_type) revs_walker_class = _revs_walker_classes[rev_walker_type] return revs_walker_class(*args, **kwargs)
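The docstring example above still initializes its pagination state as a plain dict; with this change, the walker keeps its progress in the new `State` dataclass and `export_state()` returns that object, which is then passed back through the `state` argument. A minimal sketch of paginated use under that assumption follows; the `storage` configuration, the revision id and the page sizes are hypothetical placeholders, not part of the diff:

```
from swh.storage import get_storage
from swh.storage.algos.revisions_walker import get_revisions_walker

# Hypothetical setup: point these at a real storage instance and revision id.
storage = get_storage(cls="remote", url="http://localhost:5002/")
rev_start = bytes.fromhex("aaf3423a17976e44c57334de6e20a4b24c8adf25")

page_size = 50
state = None  # export_state() now returns a State object instead of a dict
for page in range(10):
    walker = get_revisions_walker(
        "committer_date",
        storage,
        rev_start,
        max_revs=(page + 1) * page_size,
        state=state,
        # new flag added by this change: return the original author/committer
        # full names even when a displayname masks them
        ignore_displayname=True,
    )
    revs = list(walker)            # at most page_size new revisions per page
    state = walker.export_state()  # State carrying done/revs_to_visit/num_revs
    if not revs:                   # history exhausted (or truncated)
        break
    # ... process revs ...
```

Because `num_revs` is carried inside `State`, setting `max_revs=(page + 1) * page_size` on each call yields roughly one page of new revisions per iteration, mirroring the original dict-based example.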