diff --git a/PKG-INFO b/PKG-INFO
index 378ac0f..4cf3c64 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,225 +1,225 @@
 Metadata-Version: 2.1
 Name: swh.loader.cvs
-Version: 0.2.2
+Version: 0.3.0
 Summary: Software Heritage CVS Loader
 Home-page: https://forge.softwareheritage.org/diffusion/swh-loader-cvs
 Author: Software Heritage developers
 Author-email: swh-devel@inria.fr
 License: UNKNOWN
 Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
 Project-URL: Funding, https://www.softwareheritage.org/donate
 Project-URL: Source, https://forge.softwareheritage.org/source/swh-loader-cvs
 Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-loader-cvs
 Platform: UNKNOWN
 Classifier: Programming Language :: Python :: 3
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: GNU Affero General Public License v3
 Classifier: Operating System :: OS Independent
 Classifier: Development Status :: 3 - Alpha
 Requires-Python: >=3.7
 Description-Content-Type: text/x-rst
 Provides-Extra: testing
 License-File: LICENSE
 License-File: AUTHORS
 
 Software Heritage - CVS loader
 ==============================
 
 The Software Heritage CVS Loader imports the history of CVS repositories
 into the SWH dataset.
 
 The main entry point is:
 
 -  ``swh.loader.cvs.loader.CvsLoader`` for the main cvs loader
    which ingests content out of a local cvs repository
 
 
 Features
 --------
 
 The CVS loader can access CVS repositories via rsync or via the CVS
 pserver protocol, with optional support for tunnelling pserver via SSH.
 
 The CVS loader does *not* require the cvs program to be installed.
 However, the loader's test suite does require cvs to be installed.
 
 Access via rsync requires the rsync program to be installed. The CVS
 loader will then invoke rsync to obtain a temporary local copy of the
 entire CVS repository. It will then walk the local copy of the CVS
 repository and parse the history of each RCS file with a built-in RCS
 parser. This will usually be the fastest method for importing a given
 CVS repository. However, most CVS servers do not offer repository access
 via rsync, and CVS repositories which see active commits may see
 conversion problems because the CVS repository format was not designed
 for lock-less read access.
 
 Access via the plaintext CVS pserver protocol requires no external
 dependencies to be installed, and is compatible with regular CVS
 servers. This method will use read-locks on the server side and should
 therefore be safe to use with active CVS repositories. The CVS loader
 will use a built-in minimal CVS client written in Python to fetch the
 output of the cvs rlog command executed on the CVS server. This output
 will be processed to obtain repository history information. All versions
 of all files will then be fetched from the server and injected into the
 SWH archive.
 
 Access via pserver over SSH requires OpenSSH to be installed. Apart from
 using SSH as a transport layer the conversion process is the same as in
 the plaintext pserver case. The SSH client will be instructed to trust
 SSH host key fingerprints upon first use. If a CVS server changes its SSH
 fingerprint then manual intervention may be required in order for future
 visits to be successful.
 
 Regardless of access protocol, the CVS loader uses heuristics to convert
 the per-file history stored in CVS into changesets. These changesets
 correspond to snapshots in the SWH database model. A given CVS
 repository should always yield a consistent series of changesets across
 multiple visits.
 
 The following URL protocol schemes are recognized by the loader:
 
 -  rsync://
 -  pserver://
 -  ssh://
 
 After the protocol scheme, the CVS server hostname must be specified,
 with an optional user:password field delimited from the hostname with
 the ‘@’ character::
 
    pserver://anonymous:password@cvs.example.com/
 
 After the hostname, the server-side CVS root path must be specified. The
 path will usually contain a CVSROOT directory on the server, though this
 directory may be hidden from clients::
 
    pserver://anonymous:password@cvs.example.com/var/cvs/
 
 The final component of the URL identifies the name of the CVS module
 which should be ingested into the SWH archive::
 
    pserver://anonymous:password@cvs.example.com/var/cvs/project1
 
 As a concrete example, this URL points to the historical CVS repository
 of the a2ps project. In this case, the cvsroot path is /sources/a2ps and
 the CVS module of the project is called a2ps::
 
    pserver://anonymous:anonymous@cvs.savannah.gnu.org/sources/a2ps/a2ps
 
 In order to obtain the history of this repository the CVS loader will
 perform the CVS pserver protocol exchange which is also performed by::
 
    cvs -d :pserver:anonymous@cvs.savannah.gnu.org/sources/a2ps rlog a2ps
 
 Known Limitations
 -----------------
 
 CVS repositories which see active commits should be converted with care.
 It is possible to end up with a partial conversion of the latest commit
 if repository data is fetched via rsync while a commit is in progress.
 The pserver protocol is the safer option in such cases.
 
 Only the history of the main CVS branch is converted. CVS vendor branch imports
 and merges which modify the main branch are modeled as two distinct
 commits to the main branch. Other branches will not be represented in
 the conversion result at all.
 
 CVS labels are not converted into corresponding SWH tags/releases yet.
 
 The converter does not yet support incremental fetching of CVS history.
 The entire history will be fetched and processed during every visit. By
 design, CVS does not fully support a concept of changesets that span
 multiple files and, as such, importing an evolving CVS history
 incrementally is not a trivial problem. Regardless, some improvements
 could be made relatively easily, as noted below.
 
 CVS repositories copied with rsync could be cached locally, such that
 rsync will only download RCS files which have changed since the last
 visit. At present the local copy of the repository is fetched to a
 temporary directory and is deleted once the conversion process is done.
 
 It might help to store persistent meta-data about blobs imported from
 CVS. If such meta-data could be searched via a given CVS repository
 name, a path, and an RCS revision number then redundant downloads of
 file versions over the pserver protocol could be detected and skipped.
 
 The minimal CVS client does not yet support the optional gzip extension
 offered by the CVS pserver protocol. If this was supported then files
 downloaded from a CVS server could be compressed while in transit.
 
 The built-in minimal CVS client has not been tested against many
 versions of CVS. It should work fine against CVS 1.11 and 1.12 servers.
 More work may be needed to improve compatibility with older versions of
 CVS.
 
 Acknowledgements
 ----------------
 
 This software contains code derived from *cvs2gitdump* written by
 YASUOKA Masahiko, and from the *rcsparse* library written by Simon
 Schubert.
 
 This software contains code derived from ViewVC: https://www.viewvc.org/
 
 Licensing information
 ---------------------
 
 Parts of the software written by SWH developers are licensed under
 GPLv3. See the file LICENSE
 
 cvs2gitdump by YASUOKA Masahiko is licensed under ISC. See the top of
 the file swh/loader/cvs/cvs2gitdump/cvs2gitdump.py
 
 rcsparse by Simon Schubert is licensed under AGPLv3. See the file
 swh/loader/cvs/rcsparse/COPYRIGHT
 
 ViewVC is licensed under the 2-clause BSD licence. See the file
 swh/loader/cvs/rlog.py
 
 Running Tests
 =============
 
 The loader's test suite requires cvs to be installed.
 
 Because the rcsparse library is implemented in C and accessed via Python
 bindings, the CVS loader must be compiled and installed before tests can
 be run and the *build* directory must be passed as an argument to
 pytest::
 
    $ ./setup.py build install
    $ pytest ./build
 
 The test suite uses internal protocol schemes which cannot be reached
 from "Save Code Now". These are:
 
  - fake://
  - file://
 
 The fake:// scheme corresponds to pserver:// and ssh://. The test suite
 will spawn a 'cvs server' process locally and the loader will connect
 to this server via a pipe and communicate using the pserver protocol.
 Real ssh:// access lacks test coverage at present and would require
 sshd to become part of the test setup.
 
 The file:// scheme corresponds to rsync:// and behaves as if the rsync
 program had already created a local copy of the repository. Real rsync://
 access lacks test coverage at present and would require an rsyncd server
 to become part of the test setup.
 
 CLI run
 =======
 
 With the configuration:
 
 /tmp/loader_cvs.yml::
 
    storage:
      cls: remote
      args:
        url: http://localhost:5002/
 
 Run::
 
    swh loader --config-file /tmp/loader_cvs.yml \
        run cvs <cvs-url>
 
 
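The URL layout documented in the README text above (protocol scheme, optional user:password field, hostname, server-side CVS root path, module name) can be sketched with the standard library as follows. This is only an illustration under stated assumptions: `CvsUrl` and `split_cvs_url` are hypothetical names that do not exist in swh.loader.cvs, the loader itself imports `parse_url` from urllib3 rather than `urllib.parse`, and the sketch assumes the module name is always the last path component.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class CvsUrl(NamedTuple):
    """Hypothetical container for the URL parts; not part of swh.loader.cvs."""

    scheme: str
    username: Optional[str]
    password: Optional[str]
    host: str
    cvsroot: str
    module: str


def split_cvs_url(url: str) -> CvsUrl:
    """Split a CVS origin URL into the components described in the README."""
    parts = urlsplit(url)
    # Only the three schemes listed in the README are accepted here.
    if parts.scheme not in ("rsync", "pserver", "ssh"):
        raise ValueError(f"unsupported protocol scheme: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("missing CVS server hostname")
    # Assumption: the final path component is the module name and
    # everything before it is the server-side CVS root path.
    cvsroot, _, module = parts.path.rstrip("/").rpartition("/")
    if not cvsroot or not module:
        raise ValueError("URL must contain a CVS root path and a module name")
    return CvsUrl(
        parts.scheme, parts.username, parts.password, parts.hostname, cvsroot, module
    )


a2ps = split_cvs_url(
    "pserver://anonymous:anonymous@cvs.savannah.gnu.org/sources/a2ps/a2ps"
)
print(a2ps.cvsroot, a2ps.module)
```

For the a2ps example from the README, the CVS root path comes out as /sources/a2ps and the module as a2ps, matching the description in the text.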
diff --git a/pytest.ini b/pytest.ini
index 3c9dea1..378d23a 100644
--- a/pytest.ini
+++ b/pytest.ini
@@ -1,5 +1,7 @@
 [pytest]
 
 norecursedirs = build docs .*
 markers =
     fs: execute tests that write to the filesystem
+
+asyncio_mode = strict
diff --git a/requirements.txt b/requirements.txt
index 1b5e575..481a213 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,5 +1,7 @@
 # Add here external Python modules dependencies, one per line. Module names
 # should match https://pypi.python.org/pypi names. For the full spec or
 # dependency lines, see https://pip.readthedocs.org/en/1.1/requirements.html
 
+sentry-sdk
 tenacity
+
diff --git a/swh.loader.cvs.egg-info/PKG-INFO b/swh.loader.cvs.egg-info/PKG-INFO
index 378ac0f..4cf3c64 100644
--- a/swh.loader.cvs.egg-info/PKG-INFO
+++ b/swh.loader.cvs.egg-info/PKG-INFO
@@ -1,225 +1,225 @@
 Metadata-Version: 2.1
 Name: swh.loader.cvs
-Version: 0.2.2
+Version: 0.3.0
 Summary: Software Heritage CVS Loader
 Home-page: https://forge.softwareheritage.org/diffusion/swh-loader-cvs
 Author: Software Heritage developers
 Author-email: swh-devel@inria.fr
 License: UNKNOWN
 Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
 Project-URL: Funding, https://www.softwareheritage.org/donate
 Project-URL: Source, https://forge.softwareheritage.org/source/swh-loader-cvs
 Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-loader-cvs
 Platform: UNKNOWN
 Classifier: Programming Language :: Python :: 3
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: GNU Affero General Public License v3
 Classifier: Operating System :: OS Independent
 Classifier: Development Status :: 3 - Alpha
 Requires-Python: >=3.7
 Description-Content-Type: text/x-rst
 Provides-Extra: testing
 License-File: LICENSE
 License-File: AUTHORS
 
 Software Heritage - CVS loader
 ==============================
 
 The Software Heritage CVS Loader imports the history of CVS repositories
 into the SWH dataset.
 
 The main entry point is:
 
 -  ``swh.loader.cvs.loader.CvsLoader`` for the main cvs loader
    which ingests content out of a local cvs repository
 
 
 Features
 --------
 
 The CVS loader can access CVS repositories via rsync or via the CVS
 pserver protocol, with optional support for tunnelling pserver via SSH.
 
 The CVS loader does *not* require the cvs program to be installed.
 However, the loader's test suite does require cvs to be installed.
 
 Access via rsync requires the rsync program to be installed. The CVS
 loader will then invoke rsync to obtain a temporary local copy of the
 entire CVS repository. It will then walk the local copy of the CVS
 repository and parse the history of each RCS file with a built-in RCS
 parser. This will usually be the fastest method for importing a given
 CVS repository. However, most CVS servers do not offer repository access
 via rsync, and CVS repositories which see active commits may see
 conversion problems because the CVS repository format was not designed
 for lock-less read access.
 
 Access via the plaintext CVS pserver protocol requires no external
 dependencies to be installed, and is compatible with regular CVS
 servers. This method will use read-locks on the server side and should
 therefore be safe to use with active CVS repositories. The CVS loader
 will use a built-in minimal CVS client written in Python to fetch the
 output of the cvs rlog command executed on the CVS server. This output
 will be processed to obtain repository history information. All versions
 of all files will then be fetched from the server and injected into the
 SWH archive.
 
 Access via pserver over SSH requires OpenSSH to be installed. Apart from
 using SSH as a transport layer the conversion process is the same as in
 the plaintext pserver case. The SSH client will be instructed to trust
 SSH host key fingerprints upon first use. If a CVS server changes its SSH
 fingerprint then manual intervention may be required in order for future
 visits to be successful.
 
 Regardless of access protocol, the CVS loader uses heuristics to convert
 the per-file history stored in CVS into changesets. These changesets
 correspond to snapshots in the SWH database model. A given CVS
 repository should always yield a consistent series of changesets across
 multiple visits.
 
 The following URL protocol schemes are recognized by the loader:
 
 -  rsync://
 -  pserver://
 -  ssh://
 
 After the protocol scheme, the CVS server hostname must be specified,
 with an optional user:password field delimited from the hostname with
 the ‘@’ character::
 
    pserver://anonymous:password@cvs.example.com/
 
 After the hostname, the server-side CVS root path must be specified. The
 path will usually contain a CVSROOT directory on the server, though this
 directory may be hidden from clients::
 
    pserver://anonymous:password@cvs.example.com/var/cvs/
 
 The final component of the URL identifies the name of the CVS module
 which should be ingested into the SWH archive::
 
    pserver://anonymous:password@cvs.example.com/var/cvs/project1
 
 As a concrete example, this URL points to the historical CVS repository
 of the a2ps project. In this case, the cvsroot path is /sources/a2ps and
 the CVS module of the project is called a2ps::
 
    pserver://anonymous:anonymous@cvs.savannah.gnu.org/sources/a2ps/a2ps
 
 In order to obtain the history of this repository the CVS loader will
 perform the CVS pserver protocol exchange which is also performed by::
 
    cvs -d :pserver:anonymous@cvs.savannah.gnu.org/sources/a2ps rlog a2ps
 
 Known Limitations
 -----------------
 
 CVS repositories which see active commits should be converted with care.
 It is possible to end up with a partial conversion of the latest commit
 if repository data is fetched via rsync while a commit is in progress.
 The pserver protocol is the safer option in such cases.
 
 Only the history of the main CVS branch is converted. CVS vendor branch imports
 and merges which modify the main branch are modeled as two distinct
 commits to the main branch. Other branches will not be represented in
 the conversion result at all.
 
 CVS labels are not converted into corresponding SWH tags/releases yet.
 
 The converter does not yet support incremental fetching of CVS history.
 The entire history will be fetched and processed during every visit. By
 design, CVS does not fully support a concept of changesets that span
 multiple files and, as such, importing an evolving CVS history
 incrementally is not a trivial problem. Regardless, some improvements
 could be made relatively easily, as noted below.
 
 CVS repositories copied with rsync could be cached locally, such that
 rsync will only download RCS files which have changed since the last
 visit. At present the local copy of the repository is fetched to a
 temporary directory and is deleted once the conversion process is done.
 
 It might help to store persistent meta-data about blobs imported from
 CVS. If such meta-data could be searched via a given CVS repository
 name, a path, and an RCS revision number then redundant downloads of
 file versions over the pserver protocol could be detected and skipped.
 
 The minimal CVS client does not yet support the optional gzip extension
 offered by the CVS pserver protocol. If this was supported then files
 downloaded from a CVS server could be compressed while in transit.
 
 The built-in minimal CVS client has not been tested against many
 versions of CVS. It should work fine against CVS 1.11 and 1.12 servers.
 More work may be needed to improve compatibility with older versions of
 CVS.
 
 Acknowledgements
 ----------------
 
 This software contains code derived from *cvs2gitdump* written by
 YASUOKA Masahiko, and from the *rcsparse* library written by Simon
 Schubert.
 
 This software contains code derived from ViewVC: https://www.viewvc.org/
 
 Licensing information
 ---------------------
 
 Parts of the software written by SWH developers are licensed under
 GPLv3. See the file LICENSE
 
 cvs2gitdump by YASUOKA Masahiko is licensed under ISC. See the top of
 the file swh/loader/cvs/cvs2gitdump/cvs2gitdump.py
 
 rcsparse by Simon Schubert is licensed under AGPLv3. See the file
 swh/loader/cvs/rcsparse/COPYRIGHT
 
 ViewVC is licensed under the 2-clause BSD licence. See the file
 swh/loader/cvs/rlog.py
 
 Running Tests
 =============
 
 The loader's test suite requires cvs to be installed.
 
 Because the rcsparse library is implemented in C and accessed via Python
 bindings, the CVS loader must be compiled and installed before tests can
 be run and the *build* directory must be passed as an argument to
 pytest::
 
    $ ./setup.py build install
    $ pytest ./build
 
 The test suite uses internal protocol schemes which cannot be reached
 from "Save Code Now". These are:
 
  - fake://
  - file://
 
 The fake:// scheme corresponds to pserver:// and ssh://. The test suite
 will spawn a 'cvs server' process locally and the loader will connect
 to this server via a pipe and communicate using the pserver protocol.
 Real ssh:// access lacks test coverage at present and would require
 sshd to become part of the test setup.
 
 The file:// scheme corresponds to rsync:// and behaves as if the rsync
 program had already created a local copy of the repository. Real rsync://
 access lacks test coverage at present and would require an rsyncd server
 to become part of the test setup.
 
 CLI run
 =======
 
 With the configuration:
 
 /tmp/loader_cvs.yml::
 
    storage:
      cls: remote
      args:
        url: http://localhost:5002/
 
 Run::
 
    swh loader --config-file /tmp/loader_cvs.yml \
        run cvs <cvs-url>
 
 
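The README text above says the loader uses heuristics to turn per-file CVS history into changesets and that repeated visits should yield a consistent series. A minimal sketch of one such heuristic follows: file revisions by the same author with the same log message are grouped when their timestamps fall within a fuzz window. Every name here (`FileRev`, `Changeset`, `group_into_changesets`, `FUZZ_SEC`) and the 300-second window are assumptions made for illustration; the actual logic lives in the cvs2gitdump-derived code and may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

FUZZ_SEC = 300  # assumed window for this sketch only, not the loader's value


@dataclass(frozen=True)
class FileRev:
    """One revision of one file, as found in per-file RCS history."""

    path: str
    rev: str
    author: str
    log: str
    timestamp: int


@dataclass
class Changeset:
    author: str
    log: str
    max_time: int
    revs: List[FileRev] = field(default_factory=list)


def group_into_changesets(
    file_revs: List[FileRev], fuzz: int = FUZZ_SEC
) -> List[Changeset]:
    """Group file revisions into changesets by author, log message and time."""
    changesets: List[Changeset] = []
    open_sets: Dict[Tuple[str, str], Changeset] = {}
    # Sorting on a total order keeps the result independent of input order,
    # which is what makes repeated runs produce the same series.
    for fr in sorted(file_revs, key=lambda r: (r.timestamp, r.path, r.rev)):
        key = (fr.author, fr.log)
        cs = open_sets.get(key)
        # A changeset holds at most one revision of any given file.
        same_file = cs is not None and any(r.path == fr.path for r in cs.revs)
        if cs is None or same_file or fr.timestamp - cs.max_time > fuzz:
            cs = Changeset(fr.author, fr.log, fr.timestamp)
            open_sets[key] = cs
            changesets.append(cs)
        cs.revs.append(fr)
        cs.max_time = fr.timestamp
    return changesets


history = [
    FileRev("a.c", "1.2", "alice", "fix", 100),
    FileRev("a.c", "1.3", "bob", "tweak", 120),
    FileRev("b.c", "1.5", "alice", "fix", 130),
    FileRev("a.c", "1.4", "alice", "fix", 1000),
]
for cs in group_into_changesets(history):
    print(cs.author, cs.log, [(r.path, r.rev) for r in cs.revs])
```

In this sketch the window slides with each revision added to a changeset, so a long-running commit stays in one changeset as long as consecutive file timestamps remain close together.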
diff --git a/swh.loader.cvs.egg-info/requires.txt b/swh.loader.cvs.egg-info/requires.txt
index cb67ee8..e131fa9 100644
--- a/swh.loader.cvs.egg-info/requires.txt
+++ b/swh.loader.cvs.egg-info/requires.txt
@@ -1,11 +1,12 @@
+sentry-sdk
 tenacity
 swh.core[http]>=0.3
 swh.storage>=0.11.3
 swh.model>=0.4.0
 swh.scheduler>=0.0.39
 swh.loader.core>=3.0.0
 
 [testing]
 pytest
 pytest-mock
 swh.scheduler[testing]
diff --git a/swh/loader/cvs/loader.py b/swh/loader/cvs/loader.py
index fea1504..a06ff0a 100644
--- a/swh/loader/cvs/loader.py
+++ b/swh/loader/cvs/loader.py
@@ -1,650 +1,654 @@
-# Copyright (C) 2015-2021  The Software Heritage developers
+# Copyright (C) 2015-2022  The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU Affero General Public License version 3, or any later version
 # See top-level LICENSE file for more information
 
 """Loader in charge of injecting either new or existing cvs repositories to
 swh-storage.
 
 """
 from datetime import datetime
 import os
 import os.path
 import subprocess
 import tempfile
 import time
 from typing import Any, BinaryIO, Dict, Iterator, List, Optional, Sequence, Tuple, cast
 
+import sentry_sdk
 from tenacity import retry
 from tenacity.retry import retry_if_exception_type
 from tenacity.stop import stop_after_attempt
 from urllib3.util import parse_url
 
 from swh.loader.core.loader import BaseLoader
 from swh.loader.core.utils import clean_dangling_folders
 from swh.loader.cvs.cvs2gitdump.cvs2gitdump import (
     CHANGESET_FUZZ_SEC,
     ChangeSetKey,
     CvsConv,
     FileRevision,
     RcsKeywords,
     file_path,
 )
 from swh.loader.cvs.cvsclient import CVSClient
 import swh.loader.cvs.rcsparse as rcsparse
 from swh.loader.cvs.rlog import RlogConv
 from swh.loader.exception import NotFound
 from swh.model import from_disk, hashutil
 from swh.model.model import (
     Content,
     Directory,
     Person,
     Revision,
     RevisionType,
     Sha1Git,
     SkippedContent,
     Snapshot,
     SnapshotBranch,
     TargetType,
     TimestampWithTimezone,
 )
 from swh.storage.algos.snapshot import snapshot_get_latest
 from swh.storage.interface import StorageInterface
 
 DEFAULT_BRANCH = b"HEAD"
 
 TEMPORARY_DIR_PREFIX_PATTERN = "swh.loader.cvs."
 
 
 def rsync_retry():
     return retry(
         retry=retry_if_exception_type(subprocess.CalledProcessError),
         stop=stop_after_attempt(max_attempt_number=4),
         reraise=True,
     )
 
 
 class BadPathException(Exception):
     pass
 
 
 class CvsLoader(BaseLoader):
     """Swh cvs loader.
 
     The repository is local.  The loader deals with
     update on an already previously loaded repository.
 
     """
 
     visit_type = "cvs"
 
     cvs_module_name: str
     cvsclient: CVSClient
 
     # remote CVS repository access (history is parsed from CVS rlog):
     rlog_file: BinaryIO
 
     swh_revision_gen: Iterator[
         Tuple[List[Content], List[SkippedContent], List[Directory], Revision]
     ]
 
     def __init__(
         self,
         storage: StorageInterface,
         url: str,
         origin_url: Optional[str] = None,
         visit_date: Optional[datetime] = None,
         cvsroot_path: Optional[str] = None,
         temp_directory: str = "/tmp",
         **kwargs: Any,
     ):
         self.cvsroot_url = url
         # origin url as unique identifier for origin in swh archive
         origin_url = origin_url if origin_url else self.cvsroot_url
         super().__init__(storage=storage, origin_url=origin_url, **kwargs)
         self.temp_directory = temp_directory
 
         # internal state used to store swh objects
         self._contents: List[Content] = []
         self._skipped_contents: List[SkippedContent] = []
         self._directories: List[Directory] = []
         self._revisions: List[Revision] = []
         # internal state, current visit
         self._last_revision: Optional[Revision] = None
         self._visit_status = "full"
         self.visit_date = visit_date or self.visit_date
         self.cvsroot_path = cvsroot_path
         self.custom_id_keyword = None
         self.excluded_keywords: List[str] = []
 
         self.snapshot: Optional[Snapshot] = None
         self.last_snapshot: Optional[Snapshot] = snapshot_get_latest(
             self.storage, self.origin.url
         )
 
     def compute_swh_revision(
         self, k: ChangeSetKey, logmsg: Optional[bytes]
     ) -> Tuple[Revision, from_disk.Directory]:
         """Compute swh hash data per CVS changeset.
 
         Returns:
             tuple (rev, swh_directory)
             - rev: current SWH revision computed from checked out work tree
             - swh_directory: dictionary of path, swh hash data with type
 
         """
         # Compute SWH revision from the on-disk state
         swh_dir = from_disk.Directory.from_disk(path=os.fsencode(self.worktree_path))
         parents: Tuple[Sha1Git, ...]
         if self._last_revision:
             parents = (self._last_revision.id,)
         else:
             parents = ()
         revision = self.build_swh_revision(k, logmsg, swh_dir.hash, parents)
-        self.log.info("SWH revision ID: %s", hashutil.hash_to_hex(revision.id))
+        self.log.debug("SWH revision ID: %s", hashutil.hash_to_hex(revision.id))
         self._last_revision = revision
         return (revision, swh_dir)
 
     def file_path_is_safe(self, wtpath):
         if "%s..%s" % (os.path.sep, os.path.sep) in wtpath:
             # Paths with back-references should not appear
             # in CVS protocol messages or CVS rlog output
             return False
         elif (
             os.path.commonpath([self.tempdir_path, os.path.normpath(wtpath)])
             != self.tempdir_path
         ):
             # The path must be a child of our temporary directory.
             return False
         else:
             return True
 
     def checkout_file_with_rcsparse(
         self, k: ChangeSetKey, f: FileRevision, rcsfile: rcsparse.rcsfile
     ) -> None:
         assert self.cvsroot_path
         assert self.server_style_cvsroot
         path = file_path(self.cvsroot_path, f.path)
         wtpath = os.path.join(self.tempdir_path, path)
         if not self.file_path_is_safe(wtpath):
             raise BadPathException(f"unsafe path found in RCS file: {f.path}")
-        self.log.info("rev %s state %s file %s", f.rev, f.state, f.path)
+        self.log.debug("rev %s state %s file %s", f.rev, f.state, f.path)
         if f.state == "dead":
             # remove this file from work tree
             try:
                 os.remove(wtpath)
             except FileNotFoundError:
                 pass
         else:
             # create, or update, this file in the work tree
             if not rcsfile:
                 rcsfile = rcsparse.rcsfile(f.path)
             rcs = RcsKeywords()
 
             # We try our best to generate the same commit hashes over both pserver
             # and rsync. To avoid differences in file content due to expansion of
             # RCS keywords which contain absolute file paths (such as "Header"),
             # attempt to expand such paths in the same way as a regular CVS server
             # would expand them.
             # Whether this will avoid content differences depends on pserver and
             # rsync servers exposing the same server-side path to the CVS repository.
             # However, this is the best we can do, and only matters if an origin can
             # be fetched over both pserver and rsync. Each will still be treated as
             # a distinct origin, but will hopefully point at the same SWH snapshot.
             # In any case, an absolute path based on the origin URL looks nicer than
             # an absolute path based on a temporary directory used by the CVS loader.
             server_style_path = f.path.replace(
                 self.cvsroot_path, self.server_style_cvsroot
             )
             if server_style_path[0] != "/":
                 server_style_path = "/" + server_style_path
 
             if self.custom_id_keyword is not None:
                 rcs.add_id_keyword(self.custom_id_keyword)
             contents = rcs.expand_keyword(
                 server_style_path, rcsfile, f.rev, self.excluded_keywords
             )
             os.makedirs(os.path.dirname(wtpath), exist_ok=True)
             outfile = open(wtpath, mode="wb")
             outfile.write(contents)
             outfile.close()
 
     def checkout_file_with_cvsclient(
         self, k: ChangeSetKey, f: FileRevision, cvsclient: CVSClient
     ):
         assert self.cvsroot_path
         path = file_path(self.cvsroot_path, f.path)
         wtpath = os.path.join(self.tempdir_path, path)
         if not self.file_path_is_safe(wtpath):
             raise BadPathException(f"unsafe path found in cvs rlog output: {f.path}")
-        self.log.info("rev %s state %s file %s", f.rev, f.state, f.path)
+        self.log.debug("rev %s state %s file %s", f.rev, f.state, f.path)
         if f.state == "dead":
             # remove this file from work tree
             try:
                 os.remove(wtpath)
             except FileNotFoundError:
                 pass
         else:
             dirname = os.path.dirname(wtpath)
             os.makedirs(dirname, exist_ok=True)
             self.log.debug("checkout to %s\n", wtpath)
             fp = cvsclient.checkout(path, f.rev, dirname, expand_keywords=True)
             os.rename(fp.name, wtpath)
             try:
                 fp.close()
             except FileNotFoundError:
                 # Well, we have just renamed the file...
                 pass
 
     def process_cvs_changesets(
         self,
         cvs_changesets: List[ChangeSetKey],
         use_rcsparse: bool,
     ) -> Iterator[
         Tuple[List[Content], List[SkippedContent], List[Directory], Revision]
     ]:
         """Process CVS revisions.
 
         At each CVS revision, check out contents and compute swh hashes.
 
         Yields:
             tuple (contents, skipped-contents, directories, revision) of dict as a
             dictionary with keys, sha1_git, sha1, etc...
 
         """
         for k in cvs_changesets:
             tstr = time.strftime("%c", time.gmtime(k.max_time))
-            self.log.info(
+            self.log.debug(
                 "changeset from %s by %s on branch %s", tstr, k.author, k.branch
             )
             logmsg: Optional[bytes] = b""
             # Check out all files of this revision and get a log message.
             #
             # The log message is obtained from the first file in the changeset.
             # The message will usually be the same for all affected files, and
             # the SWH archive will only store one version of the log message.
             for f in k.revs:
                 rcsfile = None
                 if use_rcsparse:
                     if rcsfile is None:
                         rcsfile = rcsparse.rcsfile(f.path)
                     if not logmsg:
                         logmsg = rcsfile.getlog(k.revs[0].rev)
                     self.checkout_file_with_rcsparse(k, f, rcsfile)
                 else:
                     if not logmsg:
                         logmsg = self.rlog.getlog(self.rlog_file, f.path, k.revs[0].rev)
                     self.checkout_file_with_cvsclient(k, f, self.cvsclient)
 
             # TODO: prune empty directories?
             (revision, swh_dir) = self.compute_swh_revision(k, logmsg)
             (contents, skipped_contents, directories) = from_disk.iter_directory(
                 swh_dir
             )
             yield contents, skipped_contents, directories, revision
 
     def pre_cleanup(self) -> None:
         """Cleanup potential dangling files from prior runs (e.g. OOM killed
         tasks)
 
         """
         clean_dangling_folders(
             self.temp_directory,
             pattern_check=TEMPORARY_DIR_PREFIX_PATTERN,
             log=self.log,
         )
 
     def cleanup(self) -> None:
-        self.log.info("cleanup")
+        self.log.debug("cleanup")
 
     def configure_custom_id_keyword(self, cvsconfig):
         """Parse CVSROOT/config and look for a custom keyword definition.
         There are two different configuration directives in use for this purpose.
 
         The first variant stems from a patch which was never accepted into
         upstream CVS and uses the tag directive: tag=MyName
         With this, the "MyName" keyword becomes an alias for the "Id" keyword.
         This variant is prevalent in CVS versions shipped on BSD.
 
         The second variant stems from upstream CVS 1.12 and looks like:
         LocalKeyword=MyName=SomeKeyword
         KeywordExpand=iMyName
         We only support "SomeKeyword" if it specifies "Id" or "CVSHeader", for now.
         The KeywordExpand directive can be used to suppress expansion of keywords
         by listing keywords after an initial "e" character ("exclude", as opposed
         to an "include" list which uses an initial "i" character).
         For example, this disables expansion of the Date and Name keywords:
         KeywordExpand=eDate,Name
         """
         for line in cvsconfig.readlines():
             line = line.strip()
             try:
                 (config_key, value) = line.split("=", 1)
             except ValueError:
                 continue
             config_key = config_key.strip()
             value = value.strip()
             if config_key == "tag":
                 self.custom_id_keyword = value
             elif config_key == "LocalKeyword":
                 try:
                     (custom_kwname, kwname) = value.split("=", 1)
                 except ValueError:
                     continue
                 if kwname.strip() in ("Id", "CVSHeader"):
                     self.custom_id_keyword = custom_kwname.strip()
             elif config_key == "KeywordExpand" and value.startswith("e"):
                 excluded_keywords = value[1:].split(",")
                 for k in excluded_keywords:
                     self.excluded_keywords.append(k.strip())
 
     @rsync_retry()
     def execute_rsync(
         self, rsync_cmd: List[str], **run_opts
     ) -> subprocess.CompletedProcess:
         rsync = subprocess.run(rsync_cmd, **run_opts)
         rsync.check_returncode()
         return rsync
 
     def fetch_cvs_repo_with_rsync(self, host: str, path: str) -> None:
         # URL *must* end with a trailing slash in order to get CVSROOT listed
         url = "rsync://%s%s/" % (host, os.path.dirname(path))
         rsync = self.execute_rsync(
             ["rsync", url], capture_output=True, encoding="ascii"
         )
         have_cvsroot = False
         have_module = False
         for line in rsync.stdout.split("\n"):
             self.log.debug("rsync server: %s", line)
             if line.endswith(" CVSROOT"):
                 have_cvsroot = True
             elif line.endswith(" %s" % self.cvs_module_name):
                 have_module = True
             if have_module and have_cvsroot:
                 break
         if not have_module:
             raise NotFound(f"CVS module {self.cvs_module_name} not found at {url}")
         if not have_cvsroot:
             raise NotFound(f"No CVSROOT directory found at {url}")
 
         # Fetch the CVSROOT directory and the desired CVS module.
         assert self.cvsroot_path
         for d in ("CVSROOT", self.cvs_module_name):
             target_dir = os.path.join(self.cvsroot_path, d)
             os.makedirs(target_dir, exist_ok=True)
             # Append trailing path separators ("/" in the URL and os.path.sep in the
             # local target directory path) to ensure that rsync will place files
             # directly within our target directory.
             self.execute_rsync(
                 ["rsync", "-az", url + d + "/", target_dir + os.path.sep]
             )
 
     def prepare(self) -> None:
         self._last_revision = None
         self.tempdir_path = tempfile.mkdtemp(
             suffix="-%s" % os.getpid(),
             prefix=TEMPORARY_DIR_PREFIX_PATTERN,
             dir=self.temp_directory,
         )
         url = parse_url(self.origin.url)
         self.log.debug(
             "prepare; origin_url=%s scheme=%s path=%s",
             self.origin.url,
             url.scheme,
             url.path,
         )
         if not url.path:
             raise NotFound(f"Invalid CVS origin URL '{self.origin.url}'")
         self.cvs_module_name = os.path.basename(url.path)
         self.server_style_cvsroot = os.path.dirname(url.path)
         self.worktree_path = os.path.join(self.tempdir_path, self.cvs_module_name)
         if url.scheme == "file" or url.scheme == "rsync":
             # local CVS repository conversion
             if not self.cvsroot_path:
                 self.cvsroot_path = tempfile.mkdtemp(
                     suffix="-%s" % os.getpid(),
                     prefix=TEMPORARY_DIR_PREFIX_PATTERN,
                     dir=self.temp_directory,
                 )
             if url.scheme == "file":
                 if not os.path.exists(url.path):
                     raise NotFound
             elif url.scheme == "rsync":
                 self.fetch_cvs_repo_with_rsync(url.host, url.path)
 
             have_rcsfile = False
             have_cvsroot = False
             for root, dirs, files in os.walk(self.cvsroot_path):
                 if "CVSROOT" in dirs:
                     have_cvsroot = True
                     dirs.remove("CVSROOT")
                     continue
                 for f in files:
                     filepath = os.path.join(root, f)
                     if f[-2:] == ",v":
                         rcsfile = rcsparse.rcsfile(filepath)  # noqa: F841
                         self.log.debug(
                             "Looks like we have data to convert; "
                             "found a valid RCS file at %s",
                             filepath,
                         )
                         have_rcsfile = True
                         break
                 if have_rcsfile:
                     break
 
             if not have_rcsfile:
                 raise NotFound(
                     f"Directory {self.cvsroot_path} does not contain any valid "
                     "RCS files",
                 )
             if not have_cvsroot:
                 self.log.warning(
                     "The CVS repository at '%s' lacks a CVSROOT directory; "
                     "we might be ingesting an incomplete copy of the repository",
                     self.cvsroot_path,
                 )
 
             # The file CVSROOT/config will usually contain ASCII data only.
             # We allow UTF-8 just in case. Other encodings may result in an
             # error and will require manual intervention, for now.
             cvsconfig_path = os.path.join(self.cvsroot_path, "CVSROOT", "config")
             cvsconfig = open(cvsconfig_path, mode="r", encoding="utf-8")
             self.configure_custom_id_keyword(cvsconfig)
             cvsconfig.close()
 
             # Unfortunately, there is no way to convert CVS history in an
             # incremental fashion because the data is not indexed by any kind
             # of changeset ID. We need to walk the history of each and every
             # RCS file in the repository during every visit, even if no new
             # changes will be added to the SWH archive afterwards.
             # "CVS’s repository is the software equivalent of a telephone book
             # sorted by telephone number."
             # https://corecursive.com/software-that-doesnt-suck-with-jim-blandy/
             #
             # An implicit assumption made here is that self.cvs_changesets will
             # fit into memory in its entirety. If it won't fit then the CVS walker
             # will need to be modified such that it spools the list of changesets
             # to disk instead.
             cvs = CvsConv(self.cvsroot_path, RcsKeywords(), False, CHANGESET_FUZZ_SEC)
-            self.log.info("Walking CVS module %s", self.cvs_module_name)
+            self.log.debug("Walking CVS module %s", self.cvs_module_name)
             cvs.walk(self.cvs_module_name)
             cvs_changesets = sorted(cvs.changesets)
-            self.log.info(
+            self.log.debug(
                 "CVS changesets found in %s: %d",
                 self.cvs_module_name,
                 len(cvs_changesets),
             )
             self.swh_revision_gen = self.process_cvs_changesets(
                 cvs_changesets, use_rcsparse=True
             )
         elif url.scheme == "pserver" or url.scheme == "fake" or url.scheme == "ssh":
             # remote CVS repository conversion
             if not self.cvsroot_path:
                 self.cvsroot_path = os.path.dirname(url.path)
             self.cvsclient = CVSClient(url)
             cvsroot_path = os.path.dirname(url.path)
-            self.log.info(
+            self.log.debug(
                 "Fetching CVS rlog from %s:%s/%s",
                 url.host,
                 cvsroot_path,
                 self.cvs_module_name,
             )
             self.rlog = RlogConv(cvsroot_path, CHANGESET_FUZZ_SEC)
             main_rlog_file = self.cvsclient.fetch_rlog()
             self.rlog.parse_rlog(main_rlog_file)
             # Find file deletion events only visible in Attic directories.
             main_changesets = self.rlog.changesets
             attic_paths = []
             attic_rlog_files = []
             assert self.cvsroot_path
             for k in main_changesets:
                 for changed_file in k.revs:
                     path = file_path(self.cvsroot_path, changed_file.path)
                     if path.startswith(self.cvsroot_path):
                         path = path[
                             len(os.path.commonpath([self.cvsroot_path, path])) + 1 :
                         ]
                     parent_path = os.path.dirname(path)
 
                     if parent_path.split("/")[-1] == "Attic":
                         continue
                     attic_path = parent_path + "/Attic"
                     if attic_path in attic_paths:
                         continue
                     attic_paths.append(attic_path)  # avoid multiple visits
                     # Try to fetch more rlog data from this Attic directory.
                     attic_rlog_file = self.cvsclient.fetch_rlog(
                         path=attic_path,
                         state="dead",
                     )
                     if attic_rlog_file:
                         attic_rlog_files.append(attic_rlog_file)
             if len(attic_rlog_files) == 0:
                 self.rlog_file = main_rlog_file
             else:
                 # Combine all the rlog pieces we found and re-parse.
                 fp = tempfile.TemporaryFile()
                 for attic_rlog_file in attic_rlog_files:
                     for line in attic_rlog_file.readlines():
                         fp.write(line)
                     attic_rlog_file.close()
                 main_rlog_file.seek(0)
                 for line in main_rlog_file.readlines():
                     fp.write(line)
                 main_rlog_file.close()
                 fp.seek(0)
                 self.rlog.parse_rlog(cast(BinaryIO, fp))
                 self.rlog_file = cast(BinaryIO, fp)
             cvs_changesets = sorted(self.rlog.changesets)
-            self.log.info(
+            self.log.debug(
                 "CVS changesets found for %s: %d",
                 self.cvs_module_name,
                 len(cvs_changesets),
             )
             self.swh_revision_gen = self.process_cvs_changesets(
                 cvs_changesets, use_rcsparse=False
             )
         else:
             raise NotFound(f"Invalid CVS origin URL '{self.origin.url}'")
 
     def fetch_data(self) -> bool:
         """Fetch the next CVS revision."""
         try:
             data = next(self.swh_revision_gen)
         except StopIteration:
             assert self._last_revision is not None
             self.snapshot = self.generate_and_load_snapshot(self._last_revision)
-            self.log.info("SWH snapshot ID: %s", hashutil.hash_to_hex(self.snapshot.id))
+            self.log.debug(
+                "SWH snapshot ID: %s", hashutil.hash_to_hex(self.snapshot.id)
+            )
             self.flush()
             self.loaded_snapshot_id = self.snapshot.id
             return False
         except Exception:
             self.log.exception("Exception in fetch_data:")
+            sentry_sdk.capture_exception()
             self._visit_status = "failed"
             return False  # Stopping iteration
         self._contents, self._skipped_contents, self._directories, rev = data
         self._revisions = [rev]
         return True
 
     def build_swh_revision(
         self,
         k: ChangeSetKey,
         logmsg: Optional[bytes],
         dir_id: bytes,
         parents: Sequence[bytes],
     ) -> Revision:
         """Given a CVS revision, build a swh revision.
 
         Args:
             k: changeset data
             logmsg: the changeset's log message
             dir_id: the tree's hash identifier
             parents: identifiers of the revision's parents
 
         Returns:
             The swh revision object.
 
         """
         author = Person.from_fullname(k.author.encode("UTF-8"))
         date = TimestampWithTimezone.from_dict(k.max_time)
 
         return Revision(
             type=RevisionType.CVS,
             date=date,
             committer_date=date,
             directory=dir_id,
             message=logmsg,
             author=author,
             committer=author,
             synthetic=True,
             extra_headers=[],
             parents=tuple(parents),
         )
 
     def generate_and_load_snapshot(self, revision: Revision) -> Snapshot:
         """Create the snapshot from an existing revision.
 
         Args:
             revision: Last revision seen
 
         Returns:
             The newly created snapshot
 
         """
         snap = Snapshot(
             branches={
                 DEFAULT_BRANCH: SnapshotBranch(
                     target=revision.id, target_type=TargetType.REVISION
                 )
             }
         )
         self.log.debug("snapshot: %s", snap)
         self.storage.snapshot_add([snap])
         return snap
 
     def store_data(self) -> None:
         "Add our current CVS changeset to the archive."
         self.storage.skipped_content_add(self._skipped_contents)
         self.storage.content_add(self._contents)
         self.storage.directory_add(self._directories)
         self.storage.revision_add(self._revisions)
         self.flush()
         self._skipped_contents = []
         self._contents = []
         self._directories = []
         self._revisions = []
 
     def load_status(self) -> Dict[str, Any]:
         if self.snapshot is None:
             load_status = "failed"
         elif self.last_snapshot == self.snapshot:
             load_status = "uneventful"
         else:
             load_status = "eventful"
         return {
             "status": load_status,
         }
 
     def visit_status(self) -> str:
         return self._visit_status