diff --git a/PKG-INFO b/PKG-INFO index 353ab2b4..8d55c604 100644 --- a/PKG-INFO +++ b/PKG-INFO @@ -1,246 +1,246 @@ Metadata-Version: 2.1 Name: swh.storage -Version: 1.5.1 +Version: 1.6.0 Summary: Software Heritage storage manager Home-page: https://forge.softwareheritage.org/diffusion/DSTO/ Author: Software Heritage developers Author-email: swh-devel@inria.fr Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest Project-URL: Funding, https://www.softwareheritage.org/donate Project-URL: Source, https://forge.softwareheritage.org/source/swh-storage Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-storage/ Classifier: Programming Language :: Python :: 3 Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3) Classifier: Operating System :: OS Independent Classifier: Development Status :: 5 - Production/Stable Requires-Python: >=3.7 Description-Content-Type: text/markdown Provides-Extra: testing Provides-Extra: journal License-File: LICENSE License-File: AUTHORS swh-storage =========== Abstraction layer over the archive, allowing to access all stored source code artifacts as well as their metadata. See the [documentation](https://docs.softwareheritage.org/devel/swh-storage/index.html) for more details. ## Quick start ### Dependencies Python tests for this module include tests that cannot be run without a local Postgresql database, so you need the Postgresql server executable on your machine (no need to have a running Postgresql server). They also expect a cassandra server. #### Debian-like host ``` $ sudo apt install libpq-dev postgresql-11 cassandra ``` #### Non Debian-like host The tests expects the path to `cassandra` to either be unspecified, it is then looked up at `/usr/sbin/cassandra`, either specified through the environment variable `SWH_CASSANDRA_BIN`. Optionally, you can avoid running the cassandra tests. ``` (swh) :~/swh-storage$ tox -- -m 'not cassandra' ``` ### Installation It is strongly recommended to use a virtualenv. In the following, we consider you work in a virtualenv named `swh`. See the [developer setup guide](https://docs.softwareheritage.org/devel/developer-setup.html#developer-setup) for a more details on how to setup a working environment. You can install the package directly from [pypi](https://pypi.org/p/swh.storage): ``` (swh) :~$ pip install swh.storage [...] ``` Or from sources: ``` (swh) :~$ git clone https://forge.softwareheritage.org/source/swh-storage.git [...] (swh) :~$ cd swh-storage (swh) :~/swh-storage$ pip install . [...] ``` Then you can check it's properly installed: ``` (swh) :~$ swh storage --help Usage: swh storage [OPTIONS] COMMAND [ARGS]... Software Heritage Storage tools. Options: -h, --help Show this message and exit. Commands: rpc-serve Software Heritage Storage RPC server. ``` ## Tests The best way of running Python tests for this module is to use [tox](https://tox.readthedocs.io/). ``` (swh) :~$ pip install tox ``` ### tox From the sources directory, simply use tox: ``` (swh) :~/swh-storage$ tox [...] ========= 315 passed, 6 skipped, 15 warnings in 40.86 seconds ========== _______________________________ summary ________________________________ flake8: commands succeeded py3: commands succeeded congratulations :) ``` Note: it is possible to set the `JAVA_HOME` environment variable to specify the version of the JVM to be used by Cassandra. For example, at the time of writing this, Cassandra does not support java 14, so one may want to use for example java 11: ``` (swh) :~/swh-storage$ export JAVA_HOME=/usr/lib/jvm/java-14-openjdk-amd64/bin/java (swh) :~/swh-storage$ tox [...] ``` ## Development The storage server can be locally started. It requires a configuration file and a running Postgresql database. ### Sample configuration A typical configuration `storage.yml` file is: ``` storage: cls: postgresql db: "dbname=softwareheritage-dev user= password=" objstorage: cls: pathslicing root: /tmp/swh-storage/ slicing: 0:2/2:4/4:6 ``` which means, this uses: - a local storage instance whose db connection is to `softwareheritage-dev` local instance, - the objstorage uses a local objstorage instance whose: - `root` path is /tmp/swh-storage, - slicing scheme is `0:2/2:4/4:6`. This means that the identifier of the content (sha1) which will be stored on disk at first level with the first 2 hex characters, the second level with the next 2 hex characters and the third level with the next 2 hex characters. And finally the complete hash file holding the raw content. For example: 00062f8bd330715c4f819373653d97b3cd34394c will be stored at 00/06/2f/00062f8bd330715c4f819373653d97b3cd34394c Note that the `root` path should exist on disk before starting the server. ### Starting the storage server If the python package has been properly installed (e.g. in a virtual env), you should be able to use the command: ``` (swh) :~/swh-storage$ swh storage rpc-serve storage.yml ``` This runs a local swh-storage api at 5002 port. ``` (swh) :~/swh-storage$ curl http://127.0.0.1:5002 Software Heritage storage server

You have reached the Software Heritage storage server.
See its documentation and API for more information

``` ### And then what? In your upper layer ([loader-git](https://forge.softwareheritage.org/source/swh-loader-git/), [loader-svn](https://forge.softwareheritage.org/source/swh-loader-svn/), etc...), you can define a remote storage with this snippet of yaml configuration. ``` storage: cls: remote url: http://localhost:5002/ ``` You could directly define a postgresql storage with the following snippet: ``` storage: cls: postgresql db: service=swh-dev objstorage: cls: pathslicing root: /home/storage/swh-storage/ slicing: 0:2/2:4/4:6 ``` ## Cassandra As an alternative to PostgreSQL, swh-storage can use Cassandra as a database backend. It can be used like this: ``` storage: cls: cassandra hosts: - localhost objstorage: cls: pathslicing root: /home/storage/swh-storage/ slicing: 0:2/2:4/4:6 ``` The Cassandra swh-storage implementation supports both Cassandra >= 4.0-alpha2 and ScyllaDB >= 4.4 (and possibly earlier versions, but this is untested). While the main code supports both transparently, running tests or configuring the schema requires specific code when using ScyllaDB, enabled by setting the `SWH_USE_SCYLLADB=1` environment variable. diff --git a/debian/changelog b/debian/changelog index 3f597ef3..730ea27a 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,2946 +1,2951 @@ -swh-storage (1.5.1-1~swh1~bpo10+1) buster-swh; urgency=medium +swh-storage (1.6.0-1~swh1) unstable-swh; urgency=medium - * Rebuild for buster-swh + * New upstream release 1.6.0 - (tagged by Valentin Lorentz + on 2022-08-16 14:21:35 +0200) + * Upstream changes: - v1.6.0 - * Add anchor extrinsic-metadata- + original-artifacts-json - * Convert psycopg2 errors to + TransientRemoteException instead of RemoteException - * retry: + Add constant 10s wait when retrying transient exceptions - -- Software Heritage autobuilder (on jenkins-debian1) Fri, 05 Aug 2022 12:27:52 +0000 + -- Software Heritage autobuilder (on jenkins-debian1) Tue, 16 Aug 2022 12:36:51 +0000 swh-storage (1.5.1-1~swh1) unstable-swh; urgency=medium * New upstream release 1.5.1 - (tagged by Valentin Lorentz on 2022-08-05 13:59:15 +0200) * Upstream changes: - v1.5.1 - * Fix flaky tests - * cassandra: Make origin_visit_status_get_random's interval consistent with postgresql -- Software Heritage autobuilder (on jenkins-debian1) Fri, 05 Aug 2022 12:17:23 +0000 swh-storage (1.5.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.5.0 - (tagged by Valentin Lorentz on 2022-08-05 10:28:27 +0200) * Upstream changes: - v1.5.0 - * postgresql: Stop logging and sending timeouts to Sentry - * cassandra: Fix warnings - * Fix flaky test - * cassandra: document UPSERT behavior in a test -- Software Heritage autobuilder (on jenkins-debian1) Fri, 05 Aug 2022 08:44:42 +0000 swh-storage (1.4.2-1~swh1) unstable-swh; urgency=medium * New upstream release 1.4.2 - (tagged by Antoine R. Dumont (@ardumont) on 2022-08-04 09:57:35 +0200) * Upstream changes: - v1.4.2 - postgresql: Increase some timeouts to get origin visits - backfill: Add support for directories with duplicated entries - cli: move an import statement in the cli command - do not always auto-create an OriginVisitStatus object in origin_visit_add() - Add a Storage.flavor property to the postgresql backend - Update pytest_plugin for swh.core 2.10 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 04 Aug 2022 08:13:25 +0000 swh-storage (1.4.1-1~swh1) unstable-swh; urgency=medium * New upstream release 1.4.1 - (tagged by Antoine R. Dumont (@ardumont) on 2022-06-03 15:52:27 +0200) * Upstream changes: - v1.4.1 - Set current_version attribute to postgresql datastore - origin/master pytest_plugin: use the stock pytest_postgresql postgresql factory - Add missing __init__.py in proxies/ - docs: Describe metadata formats more precisely, and mention github and gitlab's - add strict asyncio_mode in pytest.ini -- Software Heritage autobuilder (on jenkins-debian1) Fri, 03 Jun 2022 15:25:19 +0000 swh-storage (1.4.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.4.0 - (tagged by Valentin Lorentz on 2022-05-03 11:43:34 +0200) * Upstream changes: - v1.4.0 - * Add function storage.algos.directory.directory_get - * Fix deprecation warnings -- Software Heritage autobuilder (on jenkins-debian1) Tue, 03 May 2022 09:55:07 +0000 swh-storage (1.3.2-1~swh1) unstable-swh; urgency=medium * New upstream release 1.3.2 - (tagged by Antoine R. Dumont (@ardumont) on 2022-04-25 14:22:19 +0200) * Upstream changes: - v1.3.2 - retry: re-raise original exception instead of a RetryError - Use logger everywhere in tenacious.py - pre-commit: Remove codespell commit-msg hook -- Software Heritage autobuilder (on jenkins-debian1) Mon, 25 Apr 2022 12:33:52 +0000 swh-storage (1.3.1-1~swh1) unstable-swh; urgency=medium * New upstream release 1.3.1 - (tagged by Antoine Lambert on 2022-04-12 14:36:22 +0200) * Upstream changes: - version 1.3.1 -- Software Heritage autobuilder (on jenkins-debian1) Tue, 12 Apr 2022 12:48:04 +0000 swh-storage (1.3.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.3.0 - (tagged by Antoine R. Dumont (@ardumont) on 2022-04-11 14:23:42 +0200) * Upstream changes: - v1.3.0 - interface: Add new method origin_visit_get_with_statuses - Make postgresql's Storage client options configurable from config - requirements-test: Remove pytest pinning to < 7 - Add .git-blame-ignore-revs file with automatic reformatting commits - python: Reformat code with black 22.3.0 - pre-commit, tox: Bump black from 19.10b0 to 22.3.0 -- Software Heritage autobuilder (on jenkins-debian1) Mon, 11 Apr 2022 12:37:01 +0000 swh-storage (1.2.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.2.0 - (tagged by Nicolas Dandrimont on 2022-03-23 16:49:10 +0100) * Upstream changes: - Release swh.storage 1.2.0 - Compatibility with swh.model 6 -- Software Heritage autobuilder (on jenkins-debian1) Wed, 23 Mar 2022 16:00:33 +0000 swh-storage (1.1.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.1.0 - (tagged by Valentin Lorentz on 2022-03-16 16:09:49 +0100) * Upstream changes: - v1.1.0 - * postgresql: Increase timeouts that often fail - * origin_visit_get_latest: Order by visit id instead of date - * postgresql: Remove unused listener code from db.py - * backfill: Make integer_ranges() work on str args + add typing to RANGE_GENERATORS - * backfill: Add missing raw_manifest to directories -- Software Heritage autobuilder (on jenkins-debian1) Wed, 23 Mar 2022 15:01:47 +0000 swh-storage (1.0.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.0.0 - (tagged by David Douard on 2022-02-24 11:32:18 +0100) * Upstream changes: - v1.0.0 - add support for `swh db upgrade`, aka updated for swh core 2.0.0 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 24 Feb 2022 11:13:57 +0000 swh-storage (0.43.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.43.1 - (tagged by Valentin Lorentz on 2022-02-08 10:21:59 +0100) * Upstream changes: - v0.43.1 - * Add typing to revision_walker.py and make the state a dataclass - * revision_walker: Add support for ignore_displayname. -- Software Heritage autobuilder (on jenkins-debian1) Tue, 08 Feb 2022 09:31:33 +0000 swh-storage (0.43.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.43.0 - (tagged by Nicolas Dandrimont on 2022-02-02 18:46:15 +0100) * Upstream changes: - Release swh.storage 0.43.0 - Add a displayname field to the person table in PostgreSQL - Use the displayname field in returned results for revision_get, - revision_log and release_get -- Software Heritage autobuilder (on jenkins-debian1) Wed, 02 Feb 2022 17:56:33 +0000 swh-storage (0.42.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.42.0 - (tagged by Valentin Lorentz on 2022-01-25 16:44:35 +0100) * Upstream changes: - v0.42.0 - * Remove 'offset' and 'negative_utc' for the TimestampWithTimezone constructor call - * Stop using the deprecated 'TimestampWithTimezone.offset' attribute - * Fix directory_add to actually insert the manifest + add directory_get_raw_manifest -- Software Heritage autobuilder (on jenkins-debian1) Tue, 25 Jan 2022 15:54:46 +0000 swh-storage (0.41.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.41.1 - (tagged by David Douard on 2022-01-04 15:46:41 +0100) * Upstream changes: - v0.41.1 -- Software Heritage autobuilder (on jenkins-debian1) Tue, 04 Jan 2022 14:55:33 +0000 swh-storage (0.41.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.41.0 - (tagged by Nicolas Dandrimont on 2021-12-22 15:24:31 +0100) * Upstream changes: - Release swh.storage v0.41.0 - add support for storing odd-shaped git objects enabled in swh.model 4.0.0 - drop workarounds for old versions of tenacity -- Software Heritage autobuilder (on jenkins-debian1) Wed, 22 Dec 2021 15:22:18 +0000 swh-storage (0.40.0-2~swh1) unstable-swh; urgency=medium * Update dependencies in d/control. -- David Douard Tue, 16 Nov 2021 14:37:20 +0100 swh-storage (0.40.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.40.0 - (tagged by David Douard on 2021-11-16 10:11:25 +0100) * Upstream changes: - v0.40.0 - Add support for a redis-based reporting for invalid mirrorred objects - Remove now useless fixers in storage/fixer.py - Add a new --type option to 'swh strorage repay' - Update extrinsic metadata specs -- Software Heritage autobuilder (on jenkins-debian1) Tue, 16 Nov 2021 09:45:39 +0000 swh-storage (0.39.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.39.0 - (tagged by Antoine Lambert on 2021-10-28 17:09:40 +0200) * Upstream changes: - version 0.39.0 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 29 Oct 2021 09:17:36 +0000 swh-storage (0.38.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.38.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-10-11 13:28:27 +0200) * Upstream changes: - v0.38.0 - buffer: add some debug logging for number of objects sent - buffer: add a threshold for the estimated size of revision and release batches - buffer: add a threshold for the number of revision parents in one batch - buffer: add a threshold for the number of directory entries in one batch - filter: add filtering for release_add - filter: do not call the underlying functions if there's nothing to add - buffer: Ensure that we don't send data from empty buffers -- Software Heritage autobuilder (on jenkins-debian1) Mon, 11 Oct 2021 14:58:23 +0000 swh-storage (0.37.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.37.1 - (tagged by David Douard on 2021-09-29 11:31:33 +0200) * Upstream changes: - v0.37.1 -- Software Heritage autobuilder (on jenkins-debian1) Wed, 29 Sep 2021 10:12:31 +0000 swh-storage (0.37.0-1~swh2) unstable-swh; urgency=medium * Bump new release -- Antoine R. Dumont (@ardumont) Thu, 16 Sep 2021 09:51:51 +0200 swh-storage (0.37.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.37.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-09-15 18:44:19 +0200) * Upstream changes: - v0.37.0 - Allow filtering extids per extid_version/extid_type when reading - migrate_extrinsic_metadata: Fix edge cases (missing f- stringification, remaining pypi - issues, ...) - cassandra: Make directory_ls fetch contents in batch instead of one-by-one - cassandra: Add option to select (hopefully) more efficient batch insertion algos - cassandra: Remove stat_counters. - cassandra: generate statsd metrics on method calls - content_get: Fetch rows concurrently - directory_entry_add_batch: Remove the temporary prepared statement entirely - directory_entry_add_batch: Reduce churn of prepared statements - postgresql: Fix a column order mismatch between the query and object builder - Add counting storage proxy -- Software Heritage autobuilder (on jenkins-debian1) Thu, 16 Sep 2021 06:42:23 +0000 swh-storage (0.36.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.36.0 - (tagged by Vincent SELLIER on 2021-08-24 16:51:07 +0200) * Upstream changes: - v0.36.0 - changelog: - Add cvs as supported revision_type - Add test for origin_visit_get_latest in presence of mismatched id and date orders - cassandra: Bump next_visit_id when origin_visit_add is called by a replayer - cassandra: Make content_missing query in batches - backfill: add extra where clause to use the right index for extid requests -- Software Heritage autobuilder (on jenkins-debian1) Tue, 24 Aug 2021 15:01:32 +0000 swh-storage (0.35.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.35.1 - (tagged by Valentin Lorentz on 2021-08-20 11:53:33 +0200) * Upstream changes: - v0.35.1 - * cassandra: Fix crash when using _missing() functions with more than 100 ids with ScyllaDB. -- Software Heritage autobuilder (on jenkins-debian1) Fri, 20 Aug 2021 10:01:16 +0000 swh-storage (0.35.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.35.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-07-28 10:36:09 +0200) * Upstream changes: - v0.35.0 - Implement storage of the ExtID.extid_version field -- Software Heritage autobuilder (on jenkins-debian1) Wed, 28 Jul 2021 08:43:06 +0000 swh-storage (0.34.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.34.0 - (tagged by Vincent SELLIER on 2021-07-07 18:22:00 +0200) * Upstream changes: - v0.34.0 - cassandra: allow to configure the consistency level -- Software Heritage autobuilder (on jenkins-debian1) Wed, 07 Jul 2021 16:58:42 +0000 swh-storage (0.33.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.33.0 - (tagged by Valentin Lorentz on 2021-07-05 16:48:16 +0200) * Upstream changes: - v0.33.0 - * Add endpoint raw_extrinsic_metadata_get_authorities -- Software Heritage autobuilder (on jenkins-debian1) Mon, 05 Jul 2021 15:00:12 +0000 swh-storage (0.32.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.32.0 - (tagged by Vincent SELLIER on 2021-06-28 15:35:44 +0200) * Upstream changes: - v0.32.0 - * cassandra: Add support for non-ASCII origin 'URLs'. -- Software Heritage autobuilder (on jenkins-debian1) Mon, 28 Jun 2021 16:20:21 +0000 swh-storage (0.31.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.31.0 - (tagged by Valentin Lorentz on 2021-06-25 11:17:50 +0200) * Upstream changes: - v0.31.0 - * cassandra: Add partial support for ScyllaDB - * mypy: Fix errors with release >= v0.900 (but breaks older mypy versions) - * Add endpoints to access REMD by id -- Software Heritage autobuilder (on jenkins-debian1) Fri, 25 Jun 2021 09:26:09 +0000 swh-storage (0.30.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.30.1 - (tagged by Antoine R. Dumont (@ardumont) on 2021-05-21 10:09:02 +0200) * Upstream changes: - v0.30.1 - Finalize the config "local" deprecation in favor of "postgresql" - tests: Make test parameters order deterministic, so they don't crash pytest-xdist - test_cassandra: Improve error when the process is started but not listening -- Software Heritage autobuilder (on jenkins-debian1) Fri, 21 May 2021 08:22:33 +0000 swh-storage (0.30.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.30.0 - (tagged by David Douard on 2021-05-18 16:34:25 +0200) * Upstream changes: - v0.30.0 -- Software Heritage autobuilder (on jenkins-debian1) Tue, 18 May 2021 14:45:21 +0000 swh-storage (0.29.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.29.1 - (tagged by Nicolas Dandrimont on 2021-05-14 18:31:52 +0200) * Upstream changes: - Release swh.storage 0.29.1 - Add missing db migration -- Software Heritage autobuilder (on jenkins-debian1) Fri, 14 May 2021 16:59:42 +0000 swh-storage (0.29.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.29.0 - (tagged by Valentin Lorentz on 2021-05-11 15:04:58 +0200) * Upstream changes: - v0.29.0 - * Make the TenaciousProxyStorage retry when a single object add fails - * Move all proxy storages in swh/storage/proxies/ - * Deprecate the "local" storage cls in favor of "postgresql" - * cassandra: Add tests checking directory_add and snapshot_add are atomic. - * Add endpoint directory_get_entries, to quickly list a directory's entries - * content_get: Add support for queries by sha1_git -- Software Heritage autobuilder (on jenkins-debian1) Tue, 11 May 2021 13:12:42 +0000 swh-storage (0.28.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.28.0 - (tagged by Valentin Lorentz on 2021-05-06 15:52:03 +0200) * Upstream changes: - v0.28.0 - * Normalize all Storage.xxx_add() methods to return a summary - * cassandra: Add 'check_missing' option, to allow updating objects - * cassandra: Add a test of a 'complex' migration, with a PK update - * Add a new TenaciousProxyStorage - * Make postgresql's origin_add not raise an error in case of conflict - * Stop storing authority/fetcher metadata. - * tenacious: Document potential issues about objects being dropped - * Use swh.core 0.14 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 06 May 2021 14:06:51 +0000 swh-storage (0.27.4-1~swh1) unstable-swh; urgency=medium * New upstream release 0.27.4 - (tagged by Antoine Lambert on 2021-04-29 14:38:49 +0200) * Upstream changes: - version 0.27.4 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 29 Apr 2021 13:04:46 +0000 swh-storage (0.27.3-1~swh1) unstable-swh; urgency=medium * New upstream release 0.27.3 - (tagged by Antoine Lambert on 2021-04-09 14:59:36 +0200) * Upstream changes: - version 0.27.3 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 09 Apr 2021 13:06:58 +0000 swh-storage (0.27.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.27.2 - (tagged by David Douard on 2021-04-07 15:06:41 +0200) * Upstream changes: - v0.27.2 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 08 Apr 2021 08:05:43 +0000 swh-storage (0.27.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.27.1 - (tagged by Valentin Lorentz on 2021-03-30 17:47:03 +0200) * Upstream changes: - v0.27.1 - * buffer: Add support for 'extid' -- Software Heritage autobuilder (on jenkins-debian1) Tue, 30 Mar 2021 15:59:01 +0000 swh-storage (0.27.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.27.0 - (tagged by Valentin Lorentz on 2021-03-29 14:33:24 +0200) * Upstream changes: - v0.27.0 - * origin_visit_status_add: Fix inconsistent/incorrect errors when type is None and visit is missing. - * extid: remove unicity on (extid_type, extid) and (target_type, target) -- Software Heritage autobuilder (on jenkins-debian1) Mon, 29 Mar 2021 12:44:14 +0000 swh-storage (0.26.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.26.0 - (tagged by Nicolas Dandrimont on 2021-03-22 14:44:35 +0100) * Upstream changes: - Release swh.storage v0.26.0 - Move raw_extrinsic_metadata deduplication to use a new id column. -- Software Heritage autobuilder (on jenkins-debian1) Mon, 22 Mar 2021 21:53:39 +0000 swh-storage (0.25.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.25.0 - (tagged by Antoine Lambert on 2021-03-18 13:55:10 +0100) * Upstream changes: - version 0.25.0 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 18 Mar 2021 13:02:02 +0000 swh-storage (0.24.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.24.1 - (tagged by Valentin Lorentz on 2021-03-04 23:32:36 +0100) * Upstream changes: - v0.24.1 - * tests: Drop hypothesis < 6 requirement - * Remove the remaining references to the deprecated SWHID class - * postgresql: Ensure a minimum limit for the snapshot branches query -- Software Heritage autobuilder (on jenkins-debian1) Thu, 04 Mar 2021 22:39:03 +0000 swh-storage (0.24.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.24.0 - (tagged by Valentin Lorentz on 2021-03-02 10:00:23 +0100) * Upstream changes: - v0.24.0 - * storage_tests: recompute ids when evolving RawExtrinsicMetadata objects. - * RawExtrinsicMetadata: update to use the API in swh-model 1.0.0 -- Software Heritage autobuilder (on jenkins-debian1) Tue, 02 Mar 2021 09:11:15 +0000 swh-storage (0.23.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.23.2 - (tagged by Antoine Lambert on 2021-02-19 11:47:03 +0100) * Upstream changes: - version 0.23.2 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 19 Feb 2021 10:58:50 +0000 swh-storage (0.23.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.23.1 - (tagged by Antoine R. Dumont (@ardumont) on 2021-02-16 17:19:00 +0100) * Upstream changes: - v0.23.1 - Switch anonymized replayer test to use pytest parametrization -- Software Heritage autobuilder (on jenkins-debian1) Tue, 16 Feb 2021 16:28:25 +0000 swh-storage (0.23.0-1~swh2) unstable-swh; urgency=medium * Fix dependency issue -- Antoine R. Dumont (@ardumont) Tue, 16 Feb 2021 14:34:57 +0100 swh-storage (0.23.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.23.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-02-15 15:20:21 +0100) * Upstream changes: - v0.23.0 - storage: Refactor OriginVisitStatus instantiation - db: Unify sql joins on origin_visit_status using "USING" - storage.postgresql: Use origin_visit_status.type value as source - test_replay: Fix hang since confluent-kafka 1.6 release - postgresql: Fix dbversion() to return the max version instead of a random one. - buffer: ensure objects are flushed in topological order - Return an accurate summary from buffer's flush() method - buffer: add support for snapshots - buffer: add type annotations for tests -- Software Heritage autobuilder (on jenkins-debian1) Mon, 15 Feb 2021 14:39:04 +0000 swh-storage (0.22.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.22.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-02-03 12:09:29 +0100) * Upstream changes: - v0.22.0 - storage: Make origin_get_latest_visit_status return OriginVisitStatus - storage: Change origin_visit_status_get_random interface to return visit_status - Write introduction to swh-storage -- Software Heritage autobuilder (on jenkins-debian1) Wed, 03 Feb 2021 11:15:27 +0000 swh-storage (0.21.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.21.1 - (tagged by Vincent SELLIER on 2021-01-28 14:11:26 +0100) * Upstream changes: - v0.21.1 - * Correctly return origin_visit_status.type value everywhere -- Software Heritage autobuilder (on jenkins-debian1) Thu, 28 Jan 2021 13:19:24 +0000 swh-storage (0.21.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.21.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-01-20 15:42:40 +0100) * Upstream changes: - v0.21.0 - db: Allow new status values not_found, failed to OriginVisitStatus -- Software Heritage autobuilder (on jenkins-debian1) Wed, 20 Jan 2021 14:52:20 +0000 swh-storage (0.20.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.20.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-01-20 10:24:00 +0100) * Upstream changes: - v0.20.0 - storage: Add persistence of the field OriginVisitStatus.type - backfiller: Add type to the origin_visit_status topic - tests: Make test_content_add_race fail for the right reason. -- Software Heritage autobuilder (on jenkins-debian1) Wed, 20 Jan 2021 09:29:54 +0000 swh-storage (0.19.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.19.0 - (tagged by Vincent SELLIER on 2021-01-14 11:09:17 +0100) * Upstream changes: - v0.19.0 - * 2021-01-12 Adapt cassandra storage to ignore the new OriginVisitStatus.type field - * 2021- 01-08 Allow to use the JAVA_HOME environment for cassandra tests - * 2021-01-13 Enforce hypothesis <6 to prevent test breakage - * 2021-01-08 Make the CREATE_TABLES_QUERIES in cassandra/schema.py an explicit list - * 2020-12-18 Add a cli section in the doc - * 2020-11-24 storage.backfill: Allow cli run for origin_visit_status as well - * 2020-11-24 conftest: Reference swh.core.db.pytest_plugin -- Software Heritage autobuilder (on jenkins-debian1) Thu, 14 Jan 2021 10:18:31 +0000 swh-storage (0.18.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.18.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-11-23 14:46:41 +0100) * Upstream changes: - v0.18.0 - requirements-test.txt: Drop no longer needed pytest-postgresql requirement - backfill: Reverse flawed logic in SnapshotBranch generation - migrate_extrinsic_metadata: don't crash when deb revisions aren't referenced by any snapshot -- Software Heritage autobuilder (on jenkins-debian1) Mon, 23 Nov 2020 13:52:32 +0000 swh-storage (0.17.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.17.2 - (tagged by Nicolas Dandrimont on 2020-11-13 11:56:37 +0100) * Upstream changes: - Release swh.storage 0.17.2 - Future- proof get_journal_writer by setting the value_sanitizer argument - migrate_extrinsic_metadata improvements - backfill: only flush on every batch -- Software Heritage autobuilder (on jenkins-debian1) Fri, 13 Nov 2020 11:05:35 +0000 swh-storage (0.17.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.17.1 - (tagged by Antoine Lambert on 2020-11-05 13:50:35 +0100) * Upstream changes: - version 0.17.1 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 05 Nov 2020 12:56:53 +0000 swh-storage (0.17.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.17.0 - (tagged by Nicolas Dandrimont on 2020-11-03 18:09:53 +0100) * Upstream changes: - Release swh.storage v0.17.0 - Migrate all raw extrinsic metadata attributes from id to target - Add an `algos` function to resolve branch aliases - Prepare updates to make swh.journal more generic - Improve api server initialization - Various updates to the migrate_extrinsic_metadata script, notably writing - most metadata on directories instead of revisions -- Software Heritage autobuilder (on jenkins-debian1) Tue, 03 Nov 2020 17:20:45 +0000 swh-storage (0.16.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.16.0 - (tagged by Nicolas Dandrimont on 2020-10-09 18:23:24 +0200) * Upstream changes: - Release swh.storage v0.16.0 - Updates to the intrinsic metadata migration script - Various improvements to the buffer storage - Update swh storage backfill to use common configuration keys -- Software Heritage autobuilder (on jenkins-debian1) Fri, 09 Oct 2020 16:33:11 +0000 swh-storage (0.15.3-1~swh1) unstable-swh; urgency=medium * New upstream release 0.15.3 - (tagged by Nicolas Dandrimont on 2020-09-24 20:14:39 +0200) * Upstream changes: - Release swh.storage v0.15.3 - hopefully fix the documentation build -- Software Heritage autobuilder (on jenkins-debian1) Thu, 24 Sep 2020 18:24:14 +0000 swh-storage (0.15.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.15.2 - (tagged by Nicolas Dandrimont on 2020-09-24 19:22:11 +0200) * Upstream changes: - Release swh.storage v0.15.2 - no change rebuild to clean up jenkins fsckup accumulating old files. -- Software Heritage autobuilder (on jenkins-debian1) Thu, 24 Sep 2020 17:28:22 +0000 swh-storage (0.15.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.15.1 - (tagged by Nicolas Dandrimont on 2020-09-24 18:34:54 +0200) * Upstream changes: - Release swh.storage v0.15.1 - Restore buffer proxy behavior with default arguments -- Software Heritage autobuilder (on jenkins-debian1) Thu, 24 Sep 2020 16:44:22 +0000 swh-storage (0.15.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.15.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-09-24 16:54:07 +0200) * Upstream changes: - v0.15.0 - Support different database flavors in the SQL scripts - Add the SQL commands used to set up the logical replication publication - Output a warning when the version of the database is different than expected - Improve code quality and doc in BufferedProxyStorage - Adapt cli declaration entrypoint to swh.core 0.3 - Add warning about skipped_content (sneaking into the 'content' topics) - graph- replayer: fix to prevent wrong warning - pre-commit: Add isort hook and reorder imports with isort - pytest_plugin: Change dbname to storage to avoid clash in tests - pytest_plugin: Use psql to load SQL files instead of connecting with psycopg2 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 24 Sep 2020 15:03:58 +0000 swh-storage (0.14.3-1~swh1) unstable-swh; urgency=medium * New upstream release 0.14.3 - (tagged by David Douard on 2020-09-17 16:58:59 +0200) * Upstream changes: - v0.14.3 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 17 Sep 2020 16:53:56 +0000 swh-storage (0.14.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.14.2 - (tagged by David Douard on 2020-09-11 15:31:22 +0200) * Upstream changes: - v0.14.2 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 11 Sep 2020 13:37:11 +0000 swh-storage (0.14.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.14.1 - (tagged by Antoine R. Dumont (@ardumont) on 2020-09-04 15:43:51 +0200) * Upstream changes: - v0.14.1 - algos.diff: Add missed revision_get conversion -- Software Heritage autobuilder (on jenkins-debian1) Fri, 04 Sep 2020 13:52:17 +0000 swh-storage (0.14.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.14.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-09-04 12:23:52 +0200) * Upstream changes: - v0.14.0 - Refactor revision_get storage API to return Revision objects - cassandra: Discard Content ctime field in content_get_partition -- Software Heritage autobuilder (on jenkins-debian1) Fri, 04 Sep 2020 10:59:54 +0000 swh-storage (0.13.3-1~swh1) unstable-swh; urgency=medium * New upstream release 0.13.3 - (tagged by Antoine R. Dumont (@ardumont) on 2020-09-01 14:34:57 +0200) * Upstream changes: - v0.13.3 - storage*: release_get(...) -> List[Optional[Release]] - Make StorageInterface a Protocol. - Add a validating storage proxy, to check ids before insertion. - Add a --check-config option for cli commands - Remove the deprecated config-path option from `swh storage rpc-serve` command - Add support for a new "check_config" config option in get_storage() - Check for db version mismatch in PgStorage.check_config() - Add a check_dbversion() method to the Db class - Fix pytest_plugin's database janitor: do not truncate the dbversion table - algos.snapshot: Add visits_and_snapshots_get_from_revision - storage/interface: Remove deprecated diff endpoints - storage_tests: Remove duplicated postgresql-specific tests. - Move postgresql-related files to swh/storage/postgresql/ -- Software Heritage autobuilder (on jenkins-debian1) Tue, 01 Sep 2020 12:40:29 +0000 swh-storage (0.13.2-1~swh2) unstable-swh; urgency=medium * Add mypy-extensions to build-dependencies -- Nicolas Dandrimont Fri, 21 Aug 2020 12:17:05 +0200 swh-storage (0.13.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.13.2 - (tagged by Valentin Lorentz on 2020-08-20 08:59:39 +0200) * Upstream changes: - v0.13.2 - * pg: Fix crash in snapshot_get when the snapshot does not exist. - * cassandra: fix signatures - * in_memory: rewrite as a backend for the cassandra storage - * remove endpoint snapshot_get_by_origin_visit. - * pg: rewrite converters to work with model objects -- Software Heritage autobuilder (on jenkins-debian1) Thu, 20 Aug 2020 07:18:50 +0000 swh-storage (0.13.1-1~swh3) unstable-swh; urgency=medium * Update dependencies -- Antoine R. Dumont (@ardumont) Fri, 07 Aug 2020 21:17:01 +0000 swh-storage (0.13.1-1~swh2) unstable-swh; urgency=medium * Update dependencies -- Antoine R. Dumont (@ardumont) Fri, 07 Aug 2020 21:02:01 +0000 swh-storage (0.13.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.13.1 - (tagged by Valentin Lorentz on 2020-08-07 18:14:32 +0200) * Upstream changes: - v0.13.1 - * Make snapshot_get_branches return a TypedDict containing SnapshotBranch objects. -- Software Heritage autobuilder (on jenkins-debian1) Fri, 07 Aug 2020 16:23:01 +0000 swh-storage (0.13.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.13.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-08-07 12:38:47 +0200) * Upstream changes: - v0.13.0 - storage*: Rename and type content_get(List[Sha1]) -> List[Optional[Content]] - storage*: Rename content_get_data(Sha1) -> Optional[bytes] - Simplify as Content.ctime None is popped out of a to_dict call in recent model - cassandra.storage: Use next token for pagination instead of computing it -- Software Heritage autobuilder (on jenkins-debian1) Fri, 07 Aug 2020 10:49:28 +0000 swh-storage (0.12.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.12.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-08-06 08:50:17 +0200) * Upstream changes: - v0.12.0 - Type storage endpoints - Drop content_get_range endpoint in favor of content_get_partition -- Software Heritage autobuilder (on jenkins-debian1) Thu, 06 Aug 2020 06:55:26 +0000 swh-storage (0.11.10-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.10 - (tagged by Antoine R. Dumont (@ardumont) on 2020-08-04 14:10:21 +0200) * Upstream changes: - v0.11.10 - tests: Improve coverage on directory_ls endpoints - storage*: Type content_find(...) -> List[Content] - storage*: Type {cnt,dir,rev,rel,snp}_get_random(...) -> Sha1Git -- Software Heritage autobuilder (on jenkins-debian1) Tue, 04 Aug 2020 12:15:21 +0000 swh-storage (0.11.9-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.9 - (tagged by Antoine R. Dumont (@ardumont) on 2020-08-03 11:55:10 +0200) * Upstream changes: - v0.11.9 - storage*: Drop origin-get- range in favor of origin-list - storage*: Do not allow unknown visit status in origin_visit*_get_latest - storage*: Add type annotation to origin_count - Reuse swh.core stream_results function -- Software Heritage autobuilder (on jenkins-debian1) Mon, 03 Aug 2020 10:02:56 +0000 swh-storage (0.11.8-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.8 - (tagged by Valentin Lorentz on 2020-07-31 14:57:09 +0200) * Upstream changes: - v0.11.8 - * test_replay: update for swh.journal 0.4.1. - * Add support for metadata-related object types to the backfiller and replayer. - * pg: Rewrite _origin_query to force the query planner to filter on URLs before filtering on visits. - * Make raw_extrinsic_metadata_get return PagedResult instead of Dict. - * Rename argument 'object_type' of raw_extrinsic_metadata_get to 'type'. -- Software Heritage autobuilder (on jenkins-debian1) Fri, 31 Jul 2020 13:17:40 +0000 swh-storage (0.11.6-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.6 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-30 16:20:48 +0200) * Upstream changes: - v0.11.6 - storage*: Adapt origin_list(...) -> PagedResult[Origin] - algos.snapshot: Open snapshot_id_get_from_revision - storage*: add origin_visit_status_get(...) -> PagedResult[OriginVisitStatus] - Add type annotations on get_storage. - buffer: Pass lists to backend functions, not iterables. - storage*: Simplify next-page- token computation - filter: Fix types passed to the proxied storage. - Fix upcoming type warning with swh.core > v0.1.2. - Make API endpoints take Lists instead of Iterables as arguments - storage*: use an enum to explicit the order in origin_visit_get - storage*: origin_visit_get(...) -> PagedResult[OriginVisit] - Write metadata + metadata authorities/fetchers to the journal. -- Software Heritage autobuilder (on jenkins-debian1) Thu, 30 Jul 2020 14:29:10 +0000 swh-storage (0.11.5-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.5 - (tagged by Valentin Lorentz on 2020-07-28 09:55:34 +0200) * Upstream changes: - v0.11.5 - in_memory: fix tie-breaking when two visits have the same date. -- Software Heritage autobuilder (on jenkins-debian1) Tue, 28 Jul 2020 08:10:21 +0000 swh-storage (0.11.4-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.4 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-27 16:08:42 +0200) * Upstream changes: - v0.11.4 - Rename object_metadata to raw_extrinsic_metadata - metadata_{authority,fetcher}_add: Fix crash when the iterable argument is empty - storage*: origin_visit_get_by -> Optional[OriginVisit] - storage*: origin_visit_find_by_date -> Optional[OriginVisit] - storage*: type origin_visit_get_latest endpoint result - algos.origin: Simplify origin_get_latest_visit_status function -- Software Heritage autobuilder (on jenkins-debian1) Mon, 27 Jul 2020 14:16:18 +0000 swh-storage (0.11.3-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.3 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-27 08:01:03 +0200) * Upstream changes: - v0.11.3 - storage*: origin_get(Iterable[str]) -> Iterable[Optional[Origin]] - storage*.origin_visit_get_random: Read model objects -- Software Heritage autobuilder (on jenkins-debian1) Mon, 27 Jul 2020 06:08:55 +0000 swh-storage (0.11.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.2 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-23 12:09:51 +0200) * Upstream changes: - v0.11.2 - pgstorage: Drop unnecessary indirection from reading origin_visit - pytest-plugin: Make sample_data return data model objects - tests: Use only model objects for testing - Drop validate storage proxy -- Software Heritage autobuilder (on jenkins-debian1) Thu, 23 Jul 2020 10:18:15 +0000 swh-storage (0.11.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.1 - (tagged by Valentin Lorentz on 2020-07-20 13:01:20 +0200) * Upstream changes: - v0.11.1 - * Use model objects in tests - * Rename 'deposit' authority type to 'deposit_client'. -- Software Heritage autobuilder (on jenkins-debian1) Mon, 20 Jul 2020 11:14:39 +0000 swh-storage (0.11.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.11.0 - (tagged by Valentin Lorentz on 2020-07-20 11:01:10 +0200) * Upstream changes: - v0.11.0 - * Make metadata-related endpoints consistent with other endpoints by using Iterables of swh- model objects instead of a dict. - * Update tests to use model objects -- Software Heritage autobuilder (on jenkins-debian1) Mon, 20 Jul 2020 09:12:25 +0000 swh-storage (0.10.6-1~swh1) unstable-swh; urgency=medium * New upstream release 0.10.6 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-16 15:31:19 +0200) * Upstream changes: - v0.10.6 - pytest_plugin: Ensure fixture instantiates correctly -- Software Heritage autobuilder (on jenkins-debian1) Thu, 16 Jul 2020 13:36:34 +0000 swh-storage (0.10.5-1~swh1) unstable-swh; urgency=medium * New upstream release 0.10.5 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-16 14:24:50 +0200) * Upstream changes: - v0.10.5 - pytest_plugin: Do not expose the validate proxy storage - pytest-plugin: Expose a sample_data_model fixture - tests: Start using model objects and drop validate proxy when possible -- Software Heritage autobuilder (on jenkins-debian1) Thu, 16 Jul 2020 12:34:44 +0000 swh-storage (0.10.4-1~swh1) unstable-swh; urgency=medium * New upstream release 0.10.4 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-16 11:25:25 +0200) * Upstream changes: - v0.10.4 - pytest_plugin: Avoid fixture client to declare optional dependency - Allow cassandra binary path to be configured through env variable - 158: Make schema and migration converge so the migration works -- Software Heritage autobuilder (on jenkins-debian1) Thu, 16 Jul 2020 09:37:24 +0000 swh-storage (0.10.3-1~swh1) unstable-swh; urgency=medium * New upstream release 0.10.3 - (tagged by Antoine Lambert on 2020-07-10 16:26:27 +0200) * Upstream changes: - version 0.10.3 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 10 Jul 2020 14:40:28 +0000 swh-storage (0.10.2-1~swh2) unstable-swh; urgency=medium * Fix debian rules to avoid double pytest-plugin loading clash -- Antoine R. Dumont (@ardumont) Fri, 10 Jul 2020 09:21:14 +0200 swh-storage (0.10.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.10.2 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-10 08:30:37 +0200) * Upstream changes: - v0.10.2 - tests: Do no expose the pytest- plugin through setuptools entry - Convert ImmutableDict to dict before passing it to json.dumps - docs: Rework dia -> pdf pipeline for inkscape 1.0 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 10 Jul 2020 06:52:42 +0000 swh-storage (0.10.1-1~swh2) unstable-swh; urgency=medium * Update runtime dependencies -- Antoine R. Dumont (@ardumont) Wed, 08 Jul 2020 14:56:01 +0200 swh-storage (0.10.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.10.1 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-08 14:32:52 +0200) * Upstream changes: - v0.10.1 - extract-pytest-fixture Move shareable fixtures out of conftest into a dedicated pytest plugin - Migrate from vcversioner to setuptools-scm -- Software Heritage autobuilder (on jenkins-debian1) Wed, 08 Jul 2020 12:39:15 +0000 swh-storage (0.10.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.10.0 - (tagged by David Douard on 2020-07-08 09:20:49 +0200) * Upstream changes: - v0.10.0 -- Software Heritage autobuilder (on jenkins-debian1) Wed, 08 Jul 2020 10:11:09 +0000 swh-storage (0.9.3-1~swh1) unstable-swh; urgency=medium * New upstream release 0.9.3 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-06 09:55:56 +0200) * Upstream changes: - v0.9.3 - storage: Send metrics from the origin_add endpoint -- Software Heritage autobuilder (on jenkins-debian1) Mon, 06 Jul 2020 08:06:13 +0000 swh-storage (0.9.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.9.2 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-03 18:48:39 +0200) * Upstream changes: - v0.9.2 - pg-storage: Add missing cur parameter passing -- Software Heritage autobuilder (on jenkins-debian1) Fri, 03 Jul 2020 16:54:13 +0000 swh-storage (0.9.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.9.1 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-03 16:50:45 +0200) * Upstream changes: - v0.9.1 - storage.db: Drop db.origin_visit_upsert behavior -- Software Heritage autobuilder (on jenkins-debian1) Fri, 03 Jul 2020 15:00:32 +0000 swh-storage (0.9.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.9.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-07-01 09:53:34 +0200) * Upstream changes: - v0.9.0 - storage*: Drop intermediary conversion step into OriginVisit - pg: use 'on conflict do nothing' strategy for duplicate metadata rows. - Make the code location of metadata endpoints consistent across backends. - Add content_metadata_{add,get}. - Add context columns to object_metadata table and object_metadata_{add,get}. - Generalize origin_metadata to allow support for other object types in the future. - Work around the segmentation faults caused by pytest-coverage + multiprocessing. -- Software Heritage autobuilder (on jenkins-debian1) Wed, 01 Jul 2020 08:02:08 +0000 swh-storage (0.8.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.8.1 - (tagged by David Douard on 2020-06-30 10:08:21 +0200) * Upstream changes: - v0.8.1 -- Software Heritage autobuilder (on jenkins-debian1) Tue, 30 Jun 2020 08:36:45 +0000 swh-storage (0.8.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.8.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-06-29 09:33:12 +0200) * Upstream changes: - v0.8.0 - Iterate over paginated visits in batches to retrieve latest visit/snapshot - storage*: Open order parameter to origin-visit-get endpoint - tests/replayer/storage*: Drop obsolete origin visit fields - Relax checks on journal writes regarding origin-visit* - replayer: Fix isoformat datetime string for origin-visit - Deprecate the origin_add_one() endpoint - test_storage: Add missing tests on origin_visit_get method -- Software Heritage autobuilder (on jenkins-debian1) Mon, 29 Jun 2020 07:44:00 +0000 swh-storage (0.7.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.7.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-06-22 15:42:25 +0200) * Upstream changes: - v0.7.0 - test_origin: Rename appropriately tests - algos: Improve origin visit get latest visit status algorithm - test_snapshot: Do not use origin_visit_add returned result - algos.snapshot: Fix edge case when snapshot is not resolved - Ensure ids are correct in tests' storage_data - Fix tests' storage_data revisions - SQL: replace the hash(url) index by a unique btree(url) on the origin table - Make sure the pagination in swh_snapshot_get_by_id uses the proper indexes -- Software Heritage autobuilder (on jenkins-debian1) Mon, 22 Jun 2020 14:09:33 +0000 swh-storage (0.6.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.6.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-06-19 11:29:42 +0200) * Upstream changes: - v0.6.0 - Move deprecated endpoint snapshot_get_latest from api endpoint to algos - algos.origin: Open origin-get-latest-visit-status function - storage*: Allow origin-visit-get-latest to filter on type - test_origin: Align storage initialization within tests -- Software Heritage autobuilder (on jenkins-debian1) Fri, 19 Jun 2020 12:45:32 +0000 swh-storage (0.5.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.5.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-06-17 16:03:15 +0200) * Upstream changes: - v0.5.0 - test_storage: Fix flakiness in round to milliseconds test util method - storage*: Add origin- visit-status-get-latest endpoint - Fix/update the backfiller - validate: accept model objects as well as dicts on all add endpoints - cql: Fix blackified strings - storage: Add missing cur parameter - Fix db_to_author() converter to return None is all fields are None -- Software Heritage autobuilder (on jenkins-debian1) Wed, 17 Jun 2020 14:19:37 +0000 swh-storage (0.4.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.4.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-06-16 09:50:25 +0200) * Upstream changes: - v0.4.0 - ardumont/master storage*: Drop leftover code - storage*: Drop origin_visit_upsert endpoint - storage*: Remove origin-visit-update endpoint - replay: Replay origin-visit and origin-visit-status - in_memory: Make origin- visit-status-add respect "on conflict ignore" policy - test_storage: Add journal behavior coverage for origin-visit-*add - Start migrating the validate proxy toward using BaseModel objects - storage*: Do not write twice origin-visit-status in journal -- Software Heritage autobuilder (on jenkins-debian1) Tue, 16 Jun 2020 07:58:23 +0000 swh-storage (0.3.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.3.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-06-12 09:08:23 +0200) * Upstream changes: - v0.3.0 - origin-visit-add storage*: Align origin-visit-add to take iterable of OriginVisit objects -- Software Heritage autobuilder (on jenkins-debian1) Fri, 12 Jun 2020 07:22:03 +0000 swh-storage (0.2.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.2.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-06-10 11:51:30 +0200) * Upstream changes: - v0.2.0 - origin-visit-upsert: Write visit status objects to the journal - origin-visit-update: Write visit status objects to the journal - origin-visit-add: Write visit status to the journal - Add pagination to origin_metadata_get. - Deduplicate origin-metadata when they have the same authority + discovery_date + fetcher. - Open `origin_visit_status_add` endpoint to add origin visit statuses - Add a replayer test for anonymized journal topics - Small refactoring of the InMemoryStorage to make it more consistent -- Software Heritage autobuilder (on jenkins-debian1) Wed, 10 Jun 2020 10:02:45 +0000 swh-storage (0.1.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.1.1 - (tagged by Nicolas Dandrimont on 2020-06-04 16:49:22 +0200) * Upstream changes: - Release swh.storage v0.1.1 - Work around tests hanging during Debian build -- Software Heritage autobuilder (on jenkins-debian1) Thu, 04 Jun 2020 14:56:54 +0000 swh-storage (0.1.0-2~swh1) unstable-swh; urgency=medium * Update dependencies. -- David Douard Thu, 04 Jun 2020 13:40:52 +0200 swh-storage (0.1.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.1.0 - (tagged by David Douard on 2020-06-04 12:08:46 +0200) * Upstream changes: - v0.1.0 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 04 Jun 2020 10:28:43 +0000 swh-storage (0.0.193-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.193 - (tagged by Antoine R. Dumont (@ardumont) on 2020-05-28 14:28:54 +0200) * Upstream changes: - v0.0.193 - pg: Write origin visit updates & status, read from origin_visit_status - Make content.blake2s256 not null. - Remove unused SQL functions. - README: Update necessary dependencies for test purposes - Add a pre-commit hook to check there are version bumps in sql/upgrades/*.sql - Add missing dbversion bump in 150.sql. - Add artifact metadata to the extrinsic metadata storage specification. - Add not null constraints to metadata_authority/origin_metadata - Realign schema with latest 149 migration script -- Software Heritage autobuilder (on jenkins-debian1) Thu, 28 May 2020 12:37:58 +0000 swh-storage (0.0.192-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.192 - (tagged by Valentin Lorentz on 2020-05-19 18:42:00 +0200) * Upstream changes: - v0.0.192 - * origin_metadata_add: Reject non-bytes types for 'metadata'. -- Software Heritage autobuilder (on jenkins-debian1) Tue, 19 May 2020 16:54:00 +0000 swh-storage (0.0.191-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.191 - (tagged by Valentin Lorentz on 2020-05-19 13:43:35 +0200) * Upstream changes: - v0.0.191 - * Implement the new extrinsic metadata specification/vocabulary. -- Software Heritage autobuilder (on jenkins-debian1) Tue, 19 May 2020 11:52:00 +0000 swh-storage (0.0.190-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.190 - (tagged by Antoine R. Dumont (@ardumont) on 2020-05-18 14:10:39 +0200) * Upstream changes: - v0.0.190 - storage: metadata_provider: Ensure idempotency when creating provider - journal: add a skipped_content topic dedicated to SkippedContent objects - Add missing return annotations on JournalWriter methods - Improve a bit the exception message of JournalWriter.content_update - Refactor the JournalWriter class to normalize its methods - tests: fix test_replay; do only use aware datetime objects - test_kafka_writer: Add missing object type skipped_content -- Software Heritage autobuilder (on jenkins-debian1) Mon, 18 May 2020 12:18:09 +0000 swh-storage (0.0.189-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.189 - (tagged by Antoine R. Dumont (@ardumont) on 2020-04-30 14:50:54 +0200) * Upstream changes: - v0.0.189 - pg: Write both origin visit updates & status, read from origin_visit - pg-storage: Add new created state - setup.py: add documentation link - metadata spec: Fix title hierarchy - tests: Use aware datetimes instead of naive ones. - cassandra: Adapt internal implementations to use origin visit status - in_memory: Adapt internal implementations to use origin visit status -- Software Heritage autobuilder (on jenkins-debian1) Thu, 30 Apr 2020 12:58:57 +0000 swh-storage (0.0.188-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.188 - (tagged by David Douard on 2020-04-28 13:44:20 +0200) * Upstream changes: - v0.0.188 -- Software Heritage autobuilder (on jenkins-debian1) Tue, 28 Apr 2020 11:52:08 +0000 swh-storage (0.0.187-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.187 - (tagged by Antoine R. Dumont (@ardumont) on 2020-04-14 18:13:08 +0200) * Upstream changes: - v0.0.187 - storage.interface: Actually define the remote flush operation -- Software Heritage autobuilder (on jenkins-debian1) Tue, 14 Apr 2020 16:23:41 +0000 swh-storage (0.0.186-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.186 - (tagged by Nicolas Dandrimont on 2020-04-14 17:09:22 +0200) * Upstream changes: - Release swh.storage v0.0.186 - Drop backwards-compatibility code with swh.journal < 0.0.30 -- Software Heritage autobuilder (on jenkins-debian1) Tue, 14 Apr 2020 15:20:57 +0000 swh-storage (0.0.185-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.185 - (tagged by Antoine R. Dumont (@ardumont) on 2020-04-14 14:15:32 +0200) * Upstream changes: - v0.0.185 - storage.filter: Remove internal state - test: update storage tests to (future) swh.journal 0.0.30 -- Software Heritage autobuilder (on jenkins-debian1) Tue, 14 Apr 2020 12:22:06 +0000 swh-storage (0.0.184-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.184 - (tagged by Antoine R. Dumont (@ardumont) on 2020-04-10 16:07:32 +0200) * Upstream changes: - v0.0.184 - storage*: Add flush endpoints to storage implems (backend, proxy) - test_retry: Add missing skipped_content_add tests -- Software Heritage autobuilder (on jenkins-debian1) Fri, 10 Apr 2020 14:14:20 +0000 swh-storage (0.0.183-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.183 - (tagged by Antoine R. Dumont (@ardumont) on 2020-04-09 12:35:53 +0200) * Upstream changes: - v0.0.183 - proxy storage: Add a clear_buffers endpoint - buffer proxy storage: Filter out duplicate objects prior to storage write - storage: Prevent erroneous HashCollisions by using the same ctime for all rows. - Enable black - origin_visit_update: ensure it raises a StorageArgumentException - Adapt cassandra backend to validating model types - tests: many refactoring improvements - tests: Shut down cassandra connection before closing the fixture down - Add more type annotations -- Software Heritage autobuilder (on jenkins-debian1) Thu, 09 Apr 2020 10:46:29 +0000 swh-storage (0.0.182-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.182 - (tagged by Antoine R. Dumont (@ardumont) on 2020-03-27 07:02:13 +0100) * Upstream changes: - v0.0.182 - storage*: Update origin_visit_update to make status parameter mandatory - test: Adapt origin validation test according to latest model changes - Respec discovery_date as a Python datetime instead of an ISO string. - origin_visit_add: Add missing db/cur argument to call to origin_get. -- Software Heritage autobuilder (on jenkins-debian1) Fri, 27 Mar 2020 06:13:17 +0000 swh-storage (0.0.181-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.181 - (tagged by Antoine R. Dumont (@ardumont) on 2020-03-25 09:50:49 +0100) * Upstream changes: - v0.0.181 - storage*: Hex encode content hashes in HashCollision exception - Add format of discovery_date in the metadata specification. - Store the value of token(partition_key) in skipped_content_by_* table, instead of three hashes. - Store the value of token(partition_key) in content_by_* table, instead of three hashes. -- Software Heritage autobuilder (on jenkins-debian1) Wed, 25 Mar 2020 09:03:43 +0000 swh-storage (0.0.180-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.180 - (tagged by Nicolas Dandrimont on 2020-03-18 18:24:41 +0100) * Upstream changes: - Release swh.storage v0.0.180 - Stop counting origin additions multiple times in statsd -- Software Heritage autobuilder (on jenkins-debian1) Wed, 18 Mar 2020 17:45:36 +0000 swh-storage (0.0.179-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.179 - (tagged by Nicolas Dandrimont on 2020-03-18 16:05:13 +0100) * Upstream changes: - Release swh.storage v0.0.179. - fix requirements-swh.txt to use proper version restriction - reduce the transaction load for content writes and reads -- Software Heritage autobuilder (on jenkins-debian1) Wed, 18 Mar 2020 15:50:50 +0000 swh-storage (0.0.178-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.178 - (tagged by Antoine R. Dumont (@ardumont) on 2020-03-16 12:51:28 +0100) * Upstream changes: - v0.0.178 - origin_visit_add: Adapt endpoint signature to return OriginVisit - origin_visit_upsert: Use OriginVisit object as input - storage/writer: refactor JournalWriter.content_add to send model objects -- Software Heritage autobuilder (on jenkins-debian1) Mon, 16 Mar 2020 11:59:18 +0000 swh-storage (0.0.177-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.177 - (tagged by Antoine R. Dumont (@ardumont) on 2020-03-10 11:37:33 +0100) * Upstream changes: - v0.0.177 - storage: Identify and provide the collision hashes in exception - Guarantee the order of results for revision_get and release_get - tests: Improve test speed - sql: do not attempt to create the plpgsql lang if already exists - Update requirement on swh.core for RPCClient method overrides -- Software Heritage autobuilder (on jenkins-debian1) Tue, 10 Mar 2020 10:48:11 +0000 swh-storage (0.0.176-1~swh2) unstable-swh; urgency=medium * Update build dependencies -- Antoine R. Dumont (@ardumont) Mon, 02 Mar 2020 14:36:00 +0100 swh-storage (0.0.176-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.176 - (tagged by Valentin Lorentz on 2020-02-28 14:44:10 +0100) * Upstream changes: - v0.0.176 - * Accept cassandra-driver >= 3.22. - * Make the RPC client and objstorage helper fetch Content.data from lazy - contents. - * Move ctime out of the validation proxy. -- Software Heritage autobuilder (on jenkins-debian1) Fri, 28 Feb 2020 15:21:27 +0000 swh-storage (0.0.175-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.175 - (tagged by Antoine Lambert on 2020-02-20 13:51:40 +0100) * Upstream changes: - version 0.0.175 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 20 Feb 2020 13:18:34 +0000 swh-storage (0.0.174-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.174 - (tagged by Valentin Lorentz on 2020-02-19 14:18:59 +0100) * Upstream changes: - v0.0.174 - * Fix inconsistent behavior of skipped_content_missing across backends. - * Fix FilteringProxy to not drop skipped-contents with a missing sha1_git. - * Make storage proxies use swh-model objects instead of dicts. - * Add support for (de)serializing swh-model in RPC calls. -- Software Heritage autobuilder (on jenkins-debian1) Wed, 19 Feb 2020 15:00:32 +0000 swh-storage (0.0.172-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.172 - (tagged by Valentin Lorentz on 2020-02-12 14:00:04 +0100) * Upstream changes: - v0.0.172 - * Unify exception raised by invalid input to API endpoints. - * Add a validation proxy for _add() methods. This proxy is *required* - in front of all backends whose _add() methods may be called or they'll - crash at runtime. - * Fix RecursionError when storage proxies are deepcopied or unpickled. - * storages: Refactor objstorage operations with a dedicated collaborator - * storages: Refactor journal operations with a dedicated writer collab -- Software Heritage autobuilder (on jenkins-debian1) Wed, 12 Feb 2020 13:13:47 +0000 swh-storage (0.0.171-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.171 - (tagged by Valentin Lorentz on 2020-02-06 14:46:05 +0100) * Upstream changes: - v0.0.171 - * Split 'content_add' method into 'content_add' and 'skipped_content_add'. - * Increase Cassandra requests timeout to 1 second. -- Software Heritage autobuilder (on jenkins-debian1) Thu, 06 Feb 2020 14:07:37 +0000 swh-storage (0.0.170-1~swh3) unstable-swh; urgency=medium * Update build dependencies -- Antoine R. Dumont (@ardumont) Mon, 03 Feb 2020 17:30:38 +0100 swh-storage (0.0.170-1~swh2) unstable-swh; urgency=medium * Update build dependencies -- Antoine R. Dumont (@ardumont) Mon, 03 Feb 2020 16:00:39 +0100 swh-storage (0.0.170-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.170 - (tagged by Antoine R. Dumont (@ardumont) on 2020-02-03 14:11:53 +0100) * Upstream changes: - v0.0.170 - swh.storage.cassandra: Add Cassandra backend implementation -- Software Heritage autobuilder (on jenkins-debian1) Mon, 03 Feb 2020 13:23:48 +0000 swh-storage (0.0.169-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.169 - (tagged by Antoine R. Dumont (@ardumont) on 2020-01-30 13:40:00 +0100) * Upstream changes: - v0.0.169 - retry: Add retry behavior on pipeline storage with flushing failure -- Software Heritage autobuilder (on jenkins-debian1) Thu, 30 Jan 2020 13:26:23 +0000 swh-storage (0.0.168-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.168 - (tagged by Valentin Lorentz on 2020-01-30 11:19:31 +0100) * Upstream changes: - v0.0.168 - * Implement content_update for the in-mem storage. - * Remove cur/db arguments from the in- mem storage. - * Move Storage documentation and endpoint paths to a new StorageInterface class - * Rename in_memory.Storage to in_memory.InMemoryStorage. - * CONTRIBUTORS: add Daniele Serafini -- Software Heritage autobuilder (on jenkins-debian1) Thu, 30 Jan 2020 10:25:30 +0000 swh-storage (0.0.167-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.167 - (tagged by Antoine R. Dumont (@ardumont) on 2020-01-24 14:55:57 +0100) * Upstream changes: - v0.0.167 - pgstorage: Empty temp tables instead of dropping them -- Software Heritage autobuilder (on jenkins-debian1) Fri, 24 Jan 2020 14:01:57 +0000 swh-storage (0.0.166-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.166 - (tagged by Antoine R. Dumont (@ardumont) on 2020-01-24 09:51:52 +0100) * Upstream changes: - v0.0.166 - storage: Add endpoint to get missing content (by sha1_git) and missing snapshot - Remove redundant config checks in load_and_check_config - Remove 'id' and 'object_id' from the output of object_find_by_sha1_git - Make origin_visit_get_random return None instead of {} if there are no results - docs: Fix sphinx warnings -- Software Heritage autobuilder (on jenkins-debian1) Fri, 24 Jan 2020 09:00:12 +0000 swh-storage (0.0.165-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.165 - (tagged by Antoine R. Dumont (@ardumont) on 2020-01-17 14:04:53 +0100) * Upstream changes: - v0.0.165 - storage.retry: Fix objects loading when using generator parameters -- Software Heritage autobuilder (on jenkins-debian1) Fri, 17 Jan 2020 13:09:39 +0000 swh-storage (0.0.164-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.164 - (tagged by Antoine Lambert on 2020-01-16 17:54:40 +0100) * Upstream changes: - version 0.0.164 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 16 Jan 2020 17:05:02 +0000 swh-storage (0.0.163-1~swh2) unstable-swh; urgency=medium * Fix test dependency -- Antoine R. Dumont (@ardumont) Tue, 14 Jan 2020 17:26:08 +0100 swh-storage (0.0.163-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.163 - (tagged by Antoine R. Dumont (@ardumont) on 2020-01-14 17:12:03 +0100) * Upstream changes: - v0.0.163 - retry: Improve proxy storage for add endpoints - in_memory: Make directory_get_random return None when storage empty - storage: Change content_get_metadata api to return Dict[bytes, List[Dict]] - storage: Add content_get_partition endpoint to replace content_get_range - storage: Add endpoint origin_list to replace origin_get_range -- Software Heritage autobuilder (on jenkins-debian1) Tue, 14 Jan 2020 16:17:45 +0000 swh-storage (0.0.162-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.162 - (tagged by Valentin Lorentz on 2019-12-16 14:37:44 +0100) * Upstream changes: - v0.0.162 - Add {content,directory,revision,release,snapshot}_get_random. -- Software Heritage autobuilder (on jenkins-debian1) Mon, 16 Dec 2019 13:41:39 +0000 swh-storage (0.0.161-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.161 - (tagged by Antoine R. Dumont (@ardumont) on 2019-12-10 15:03:28 +0100) * Upstream changes: - v0.0.161 - storage: Add endpoint to randomly pick an origin -- Software Heritage autobuilder (on jenkins-debian1) Tue, 10 Dec 2019 14:08:15 +0000 swh-storage (0.0.160-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.160 - (tagged by Antoine R. Dumont (@ardumont) on 2019-12-06 11:15:48 +0100) * Upstream changes: - v0.0.160 - storage.buffer: Buffer release objects as well - storage.tests: Unify tests sample data - Implement origin lookup by sha1 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 06 Dec 2019 10:23:44 +0000 swh-storage (0.0.159-1~swh2) unstable-swh; urgency=medium * Force fast hypothesis profile when running tests -- Antoine R. Dumont (@ardumont) Tue, 26 Nov 2019 17:08:16 +0100 swh-storage (0.0.159-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.159 - (tagged by Antoine R. Dumont (@ardumont) on 2019-11-22 11:05:41 +0100) * Upstream changes: - v0.0.159 - Add 'pipeline' storage "class" for more readable configurations. - tests: Improve tests environments configuration - Fix a few typos reported by codespell - Add a pre-commit-hooks.yaml config file - Remove utils/(dump|fix)_revisions scripts -- Software Heritage autobuilder (on jenkins-debian1) Fri, 22 Nov 2019 10:10:31 +0000 swh-storage (0.0.158-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.158 - (tagged by Antoine R. Dumont (@ardumont) on 2019-11-14 13:33:00 +0100) * Upstream changes: - v0.0.158 - Drop schemata module (migrated back to swh-lister) -- Software Heritage autobuilder (on jenkins-debian1) Thu, 14 Nov 2019 12:37:18 +0000 swh-storage (0.0.157-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.157 - (tagged by Nicolas Dandrimont on 2019-11-13 13:22:39 +0100) * Upstream changes: - Release swh.storage 0.0.157 - schemata.distribution: Fix bogus NotImplementedError on Area.index_uris -- Software Heritage autobuilder (on jenkins-debian1) Wed, 13 Nov 2019 12:27:07 +0000 swh-storage (0.0.156-1~swh2) unstable-swh; urgency=medium * Add version constraint on psycopg2 -- Nicolas Dandrimont Wed, 30 Oct 2019 18:21:34 +0100 swh-storage (0.0.156-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.156 - (tagged by Valentin Lorentz on 2019-10-30 15:12:10 +0100) * Upstream changes: - v0.0.156 - * Stop supporting origin ids in API (except in origin_get_range). - * Make visit['origin'] a string everywhere (instead of a dict). -- Software Heritage autobuilder (on jenkins-debian1) Wed, 30 Oct 2019 14:29:28 +0000 swh-storage (0.0.155-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.155 - (tagged by David Douard on 2019-10-30 12:14:14 +0100) * Upstream changes: - v0.0.155 -- Software Heritage autobuilder (on jenkins-debian1) Wed, 30 Oct 2019 11:18:37 +0000 swh-storage (0.0.154-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.154 - (tagged by Antoine R. Dumont (@ardumont) on 2019-10-17 13:47:57 +0200) * Upstream changes: - v0.0.154 - Fix tests in debian build -- Software Heritage autobuilder (on jenkins-debian1) Thu, 17 Oct 2019 11:52:46 +0000 swh-storage (0.0.153-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.153 - (tagged by Antoine R. Dumont (@ardumont) on 2019-10-17 13:21:00 +0200) * Upstream changes: - v0.0.153 - Deploy new test fixture -- Software Heritage autobuilder (on jenkins-debian1) Thu, 17 Oct 2019 11:26:12 +0000 swh-storage (0.0.152-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.152 - (tagged by Antoine R. Dumont (@ardumont) on 2019-10-08 16:55:43 +0200) * Upstream changes: - v0.0.152 - swh.storage.buffer: Add buffering proxy storage implementation - swh.storage.filter: Add filtering storage implementation - swh.storage.tests: Improve db transaction handling - swh.storage.tests: Add more tests - swh.storage.storage: introduce a db() context manager -- Software Heritage autobuilder (on jenkins-debian1) Tue, 08 Oct 2019 15:03:16 +0000 swh-storage (0.0.151-1~swh2) unstable-swh; urgency=medium * Add missing build-dependency on python3-swh.journal -- Nicolas Dandrimont Tue, 01 Oct 2019 18:28:19 +0200 swh-storage (0.0.151-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.151 - (tagged by Stefano Zacchiroli on 2019-10-01 10:04:36 +0200) * Upstream changes: - v0.0.151 - * tox: anticipate mypy run to just after flake8 - * mypy.ini: be less flaky w.r.t. the packages installed in tox - * storage.py: ignore typing of optional get_journal_writer import - * mypy: ignore swh.journal to work-around dependency loop - * init.py: switch to documented way of extending path - * typing: minimal changes to make a no- op mypy run pass - * Write objects to the journal only if they don't exist yet. - * Use origin URLs for skipped_content['origin'] instead of origin ids. - * Properly mock get_journal_writer for the remote-pg-storage tests. - * journal_writer: use journal writer from swh.journal - * fix typos in docstrings and sample paths - * storage.origin_visit_add: Remove deprecated 'ts' parameter - * click "required" param wants bool, not int -- Software Heritage autobuilder (on jenkins-debian1) Tue, 01 Oct 2019 08:09:53 +0000 swh-storage (0.0.150-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.150 - (tagged by Antoine R. Dumont (@ardumont) on 2019-09-04 16:09:59 +0200) * Upstream changes: - v0.0.150 - tests/test_storage: Remove failing assertion after swh-model update - tests/test_storage: Fix tests execution with psycopg2 < 2.8 -- Software Heritage autobuilder (on jenkins-debian1) Wed, 04 Sep 2019 14:16:09 +0000 swh-storage (0.0.149-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.149 - (tagged by Antoine R. Dumont (@ardumont) on 2019-09-03 14:00:57 +0200) * Upstream changes: - v0.0.149 - Add support for origin_url in origin_metadata_* - Make origin_add/origin_visit_update validate their input - Make snapshot_add validate its input - Make revision_add and release_add validate their input - Make directory_add validate its input - Make content_add validate its input using swh-model -- Software Heritage autobuilder (on jenkins-debian1) Tue, 03 Sep 2019 12:27:51 +0000 swh-storage (0.0.148-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.148 - (tagged by Valentin Lorentz on 2019-08-23 10:33:02 +0200) * Upstream changes: - v0.0.148 - Tests improvements: - * Remove 'next_branch' from test input data. - * Fix off-by-one error when using origin_visit_upsert on with an unknown visit id. - * Use explicit arguments for origin_visit_add. - * Remove test_content_missing__marked_missing, it makes no sense. - Drop person ids: - * Stop leaking person ids. - * Remove person_get endpoint. - Logging fixes: - * Enforce log level for the werkzeug logger. - * Eliminate warnings about %TYPE. - * api: use RPCServerApp and RPCClient instead of deprecated classes - Other: - * Add support for skipped content in in- memory storage -- Software Heritage autobuilder (on jenkins-debian1) Fri, 23 Aug 2019 08:48:21 +0000 swh-storage (0.0.147-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.147 - (tagged by Valentin Lorentz on 2019-07-18 12:11:37 +0200) * Upstream changes: - Make origin_get ignore the `type` argument -- Software Heritage autobuilder (on jenkins-debian1) Thu, 18 Jul 2019 10:16:16 +0000 swh-storage (0.0.146-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.146 - (tagged by Valentin Lorentz on 2019-07-18 10:46:21 +0200) * Upstream changes: - Progress toward getting rid of origin ids - * Less dependency on origin ids in the in-mem storage - * add the SWH_STORAGE_IN_MEMORY_ENABLE_ORIGIN_IDS env var - * Remove legacy behavior of snapshot_add -- Software Heritage autobuilder (on jenkins-debian1) Thu, 18 Jul 2019 08:52:09 +0000 swh-storage (0.0.145-1~swh3) unstable-swh; urgency=medium * Properly rebuild for unstable-swh -- Nicolas Dandrimont Thu, 11 Jul 2019 14:03:30 +0200 swh-storage (0.0.145-1~swh2) buster-swh; urgency=medium * Remove useless swh.scheduler dependency -- Nicolas Dandrimont Thu, 11 Jul 2019 13:53:45 +0200 swh-storage (0.0.145-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.145 - (tagged by Valentin Lorentz on 2019-07-02 12:00:53 +0200) * Upstream changes: - v0.0.145 - Add an 'origin_visit_find_by_date' endpoint. - Add support for origin urls in all endpoints -- Software Heritage autobuilder (on jenkins-debian1) Tue, 02 Jul 2019 10:19:19 +0000 swh-storage (0.0.143-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.143 - (tagged by Valentin Lorentz on 2019-06-05 13:18:14 +0200) * Upstream changes: - Add test for snapshot/release counters. -- Software Heritage autobuilder (on jenkins-debian1) Mon, 01 Jul 2019 12:38:40 +0000 swh-storage (0.0.142-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.142 - (tagged by Valentin Lorentz on 2019-06-11 15:24:49 +0200) * Upstream changes: - Mark network tests, so they can be disabled. -- Software Heritage autobuilder (on jenkins-debian1) Tue, 11 Jun 2019 13:44:19 +0000 swh-storage (0.0.141-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.141 - (tagged by Valentin Lorentz on 2019-06-06 17:05:03 +0200) * Upstream changes: - Add support for using URL instead of ID in snapshot_get_latest. -- Software Heritage autobuilder (on jenkins-debian1) Tue, 11 Jun 2019 10:36:32 +0000 swh-storage (0.0.140-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.140 - (tagged by mihir(faux__) on 2019-03-24 21:47:31 +0530) * Upstream changes: - Changes the output of content_find method to a list in case of hash collisions and makes the sql query on python side and added test duplicate input, colliding sha256 and colliding blake2s256 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 16 May 2019 12:09:04 +0000 swh-storage (0.0.139-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.139 - (tagged by Nicolas Dandrimont on 2019-04-18 17:57:57 +0200) * Upstream changes: - Release swh.storage v0.0.139 - Backwards- compatibility improvements for snapshot_add - Better transactionality in revision_add/release_add - Fix backwards metric names - Handle shallow histories properly -- Software Heritage autobuilder (on jenkins-debian1) Thu, 18 Apr 2019 16:08:28 +0000 swh-storage (0.0.138-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.138 - (tagged by Valentin Lorentz on 2019-04-09 16:40:49 +0200) * Upstream changes: - Use the db_transaction decorator on all _add() methods. - So they gracefully release the connection on error instead - of relying on reference-counting to call the Db's `__del__` - (which does not happen in Hypothesis tests) because a ref - to it is kept via the traceback object. -- Software Heritage autobuilder (on jenkins-debian1) Tue, 09 Apr 2019 16:50:48 +0000 swh-storage (0.0.137-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.137 - (tagged by Valentin Lorentz on 2019-04-08 15:40:24 +0200) * Upstream changes: - Make test_origin_get_range run faster. -- Software Heritage autobuilder (on jenkins-debian1) Mon, 08 Apr 2019 13:56:16 +0000 swh-storage (0.0.135-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.135 - (tagged by Valentin Lorentz on 2019-04-04 20:42:32 +0200) * Upstream changes: - Make content_add_metadata require a ctime argument. - This makes Python set the ctime instead of pgsql. -- Software Heritage autobuilder (on jenkins-debian1) Fri, 05 Apr 2019 14:43:28 +0000 swh-storage (0.0.134-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.134 - (tagged by Valentin Lorentz on 2019-04-03 13:38:58 +0200) * Upstream changes: - Don't leak origin ids to the journal. -- Software Heritage autobuilder (on jenkins-debian1) Thu, 04 Apr 2019 10:16:09 +0000 swh-storage (0.0.132-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.132 - (tagged by Valentin Lorentz on 2019-04-01 11:50:30 +0200) * Upstream changes: - Use sha1 instead of bigint as FK from origin_visit to snapshot (part 1: add new column) -- Software Heritage autobuilder (on jenkins-debian1) Mon, 01 Apr 2019 13:30:48 +0000 swh-storage (0.0.131-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.131 - (tagged by Nicolas Dandrimont on 2019-03-28 17:24:44 +0100) * Upstream changes: - Release swh.storage v0.0.131 - Add statsd metrics to storage RPC backend - Clean up snapshot_add/origin_visit_update - Uniformize RPC backend to use POSTs everywhere -- Software Heritage autobuilder (on jenkins-debian1) Thu, 28 Mar 2019 16:34:07 +0000 swh-storage (0.0.130-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.130 - (tagged by Valentin Lorentz on 2019-02-26 10:50:44 +0100) * Upstream changes: - Add an helper function to list all origins in the storage. -- Software Heritage autobuilder (on jenkins-debian1) Wed, 13 Mar 2019 14:01:04 +0000 swh-storage (0.0.129-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.129 - (tagged by Valentin Lorentz on 2019-02-27 10:42:29 +0100) * Upstream changes: - Double the timeout of revision_get. - Metadata indexers often hit the limit. -- Software Heritage autobuilder (on jenkins-debian1) Fri, 01 Mar 2019 10:11:28 +0000 swh-storage (0.0.128-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.128 - (tagged by Antoine R. Dumont (@ardumont) on 2019-02-21 14:59:22 +0100) * Upstream changes: - v0.0.128 - api.server: Fix wrong exception type - storage.cli: Fix cli entry point name to the expected name (setup.py) -- Software Heritage autobuilder (on jenkins-debian1) Thu, 21 Feb 2019 14:07:23 +0000 swh-storage (0.0.127-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.127 - (tagged by Antoine R. Dumont (@ardumont) on 2019-02-21 13:34:19 +0100) * Upstream changes: - v0.0.127 - api.wsgi: Open wsgi entrypoint and check config at startup time - api.server: Make the api server load and check its configuration - swh.storage.cli: Migrate the api server startup in swh.storage.cli -- Software Heritage autobuilder (on jenkins-debian1) Thu, 21 Feb 2019 12:59:48 +0000 swh-storage (0.0.126-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.126 - (tagged by Valentin Lorentz on 2019-02-21 10:18:26 +0100) * Upstream changes: - Double the timeout of snapshot_get_latest. - Metadata indexers often hit the limit. -- Software Heritage autobuilder (on jenkins-debian1) Thu, 21 Feb 2019 11:24:52 +0000 swh-storage (0.0.125-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.125 - (tagged by Antoine R. Dumont (@ardumont) on 2019-02-14 10:13:31 +0100) * Upstream changes: - v0.0.125 - api/server: Do not read configuration at each request -- Software Heritage autobuilder (on jenkins-debian1) Thu, 14 Feb 2019 16:57:01 +0000 swh-storage (0.0.124-1~swh3) unstable-swh; urgency=low * New upstream release, fixing the distribution this time -- Antoine R. Dumont (@ardumont) Thu, 14 Feb 2019 17:51:29 +0100 swh-storage (0.0.124-1~swh2) unstable; urgency=medium * New upstream release for dependency fix reasons -- Antoine R. Dumont (@ardumont) Thu, 14 Feb 2019 09:27:55 +0100 swh-storage (0.0.124-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.124 - (tagged by Antoine Lambert on 2019-02-12 14:40:53 +0100) * Upstream changes: - version 0.0.124 -- Software Heritage autobuilder (on jenkins-debian1) Tue, 12 Feb 2019 13:46:08 +0000 swh-storage (0.0.123-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.123 - (tagged by Antoine R. Dumont (@ardumont) on 2019-02-08 15:06:49 +0100) * Upstream changes: - v0.0.123 - Make Storage.origin_get support a list of origins, like other - Storage.*_get methods. - Stop using _to_bytes functions. - Use the BaseDb (and friends) from swh-core -- Software Heritage autobuilder (on jenkins-debian1) Fri, 08 Feb 2019 14:14:18 +0000 swh-storage (0.0.122-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.122 - (tagged by Antoine Lambert on 2019-01-28 11:57:27 +0100) * Upstream changes: - version 0.0.122 -- Software Heritage autobuilder (on jenkins-debian1) Mon, 28 Jan 2019 11:02:45 +0000 swh-storage (0.0.121-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.121 - (tagged by Antoine Lambert on 2019-01-28 11:31:48 +0100) * Upstream changes: - version 0.0.121 -- Software Heritage autobuilder (on jenkins-debian1) Mon, 28 Jan 2019 10:36:40 +0000 swh-storage (0.0.120-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.120 - (tagged by Antoine Lambert on 2019-01-17 12:04:27 +0100) * Upstream changes: - version 0.0.120 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 17 Jan 2019 11:12:47 +0000 swh-storage (0.0.119-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.119 - (tagged by Antoine R. Dumont (@ardumont) on 2019-01-11 11:57:13 +0100) * Upstream changes: - v0.0.119 - listener: Notify Kafka when an origin visit is updated -- Software Heritage autobuilder (on jenkins-debian1) Fri, 11 Jan 2019 11:02:07 +0000 swh-storage (0.0.118-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.118 - (tagged by Antoine Lambert on 2019-01-09 16:59:15 +0100) * Upstream changes: - version 0.0.118 -- Software Heritage autobuilder (on jenkins-debian1) Wed, 09 Jan 2019 18:51:34 +0000 swh-storage (0.0.117-1~swh1) unstable-swh; urgency=medium * v0.0.117 * listener: Adapt decoding behavior depending on the object type -- Antoine R. Dumont (@ardumont) Thu, 20 Dec 2018 14:48:44 +0100 swh-storage (0.0.116-1~swh1) unstable-swh; urgency=medium * v0.0.116 * Update requirements to latest swh.core -- Antoine R. Dumont (@ardumont) Fri, 14 Dec 2018 15:57:04 +0100 swh-storage (0.0.115-1~swh1) unstable-swh; urgency=medium * version 0.0.115 -- Antoine Lambert Fri, 14 Dec 2018 15:47:52 +0100 swh-storage (0.0.114-1~swh1) unstable-swh; urgency=medium * version 0.0.114 -- Antoine Lambert Wed, 05 Dec 2018 10:59:49 +0100 swh-storage (0.0.113-1~swh1) unstable-swh; urgency=medium * v0.0.113 * in-memory storage: Add recursive argument to directory_ls endpoint -- Antoine R. Dumont (@ardumont) Fri, 30 Nov 2018 11:56:44 +0100 swh-storage (0.0.112-1~swh1) unstable-swh; urgency=medium * v0.0.112 * in-memory storage: Align with existing storage * docstring: Improvements and adapt according to api * doc: update index to match new swh-doc format * Increase test coverage for stat_counters + fix its bugs. -- Antoine R. Dumont (@ardumont) Fri, 30 Nov 2018 10:28:02 +0100 swh-storage (0.0.111-1~swh1) unstable-swh; urgency=medium * v0.0.111 * Move generative tests in their own module * Open in-memory storage implementation -- Antoine R. Dumont (@ardumont) Wed, 21 Nov 2018 08:55:14 +0100 swh-storage (0.0.110-1~swh1) unstable-swh; urgency=medium * v0.0.110 * storage: Open content_get_range endpoint * tests: Start using hypothesis for tests generation * Improvements: Remove SQLisms from the tests and API * docs: Document metadata providers -- Antoine R. Dumont (@ardumont) Fri, 16 Nov 2018 11:53:14 +0100 swh-storage (0.0.109-1~swh1) unstable-swh; urgency=medium * version 0.0.109 -- Antoine Lambert Mon, 12 Nov 2018 14:11:09 +0100 swh-storage (0.0.108-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.108 * Add a function to get a full snapshot from the paginated view -- Nicolas Dandrimont Thu, 18 Oct 2018 18:32:10 +0200 swh-storage (0.0.107-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.107 * Enable pagination of snapshot branches * Drop occurrence-related tables * Drop entity-related tables -- Nicolas Dandrimont Wed, 17 Oct 2018 15:06:07 +0200 swh-storage (0.0.106-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.106 * Fix origin_visit_get_latest_snapshot logic * Improve directory iterator * Drop backwards compatibility between snapshots and occurrences * Drop the occurrence table -- Nicolas Dandrimont Mon, 08 Oct 2018 17:03:54 +0200 swh-storage (0.0.105-1~swh1) unstable-swh; urgency=medium * v0.0.105 * Increase directory_ls endpoint to 20 seconds * Add snapshot to the stats endpoint * Improve documentation -- Antoine R. Dumont (@ardumont) Mon, 10 Sep 2018 11:36:27 +0200 swh-storage (0.0.104-1~swh1) unstable-swh; urgency=medium * version 0.0.104 -- Antoine Lambert Wed, 29 Aug 2018 15:55:37 +0200 swh-storage (0.0.103-1~swh1) unstable-swh; urgency=medium * v0.0.103 * swh.storage.storage: origin_add returns updated list of dict with id -- Antoine R. Dumont (@ardumont) Mon, 30 Jul 2018 11:47:53 +0200 swh-storage (0.0.102-1~swh1) unstable-swh; urgency=medium * Release swh-storage v0.0.102 * Stop using temporary tables for read-only queries * Add timeouts for some read-only queries -- Nicolas Dandrimont Tue, 05 Jun 2018 14:06:54 +0200 swh-storage (0.0.101-1~swh1) unstable-swh; urgency=medium * v0.0.101 * swh.storage.api.client: Permit to specify the query timeout option -- Antoine R. Dumont (@ardumont) Thu, 24 May 2018 12:13:51 +0200 swh-storage (0.0.100-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.100 * remote api: only instantiate storage once per import * add thread-awareness to the storage implementation * properly cleanup after tests * parallelize objstorage and storage additions -- Nicolas Dandrimont Sat, 12 May 2018 18:12:40 +0200 swh-storage (0.0.99-1~swh1) unstable-swh; urgency=medium * v0.0.99 * storage: Add methods to compute directories/revisions diff * Add a new table for "bucketed" object counts * doc: update table clusters in SQL diagram * swh.storage.content_missing: Improve docstring -- Antoine R. Dumont (@ardumont) Tue, 20 Feb 2018 13:32:25 +0100 swh-storage (0.0.98-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.98 * Switch backwards compatibility for snapshots off -- Nicolas Dandrimont Tue, 06 Feb 2018 15:27:15 +0100 swh-storage (0.0.97-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.97 * refactor database initialization * use a separate thread instead of a temporary file for COPY operations * add more snapshot-related endpoints -- Nicolas Dandrimont Tue, 06 Feb 2018 14:07:07 +0100 swh-storage (0.0.96-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.96 * Add snapshot models * Add support for hg revision type -- Nicolas Dandrimont Tue, 19 Dec 2017 16:25:57 +0100 swh-storage (0.0.95-1~swh1) unstable-swh; urgency=medium * v0.0.95 * swh.storage: Rename indexer_configuration to tool * swh.storage: Migrate indexer model to its own model -- Antoine R. Dumont (@ardumont) Thu, 07 Dec 2017 09:56:31 +0100 swh-storage (0.0.94-1~swh1) unstable-swh; urgency=medium * v0.0.94 * Open searching origins methods to storage -- Antoine R. Dumont (@ardumont) Tue, 05 Dec 2017 12:32:57 +0100 swh-storage (0.0.93-1~swh1) unstable-swh; urgency=medium * v0.0.93 * swh.storage: Open indexer_configuration_add endpoint * swh-data: Update content mimetype indexer configuration * origin_visit_get: make order repeatable * db: Make unique indices actually unique and vice versa * Add origin_metadata endpoints (add, get, etc...) * cleanup: Remove unused content provenance cache tables -- Antoine R. Dumont (@ardumont) Fri, 24 Nov 2017 11:14:11 +0100 swh-storage (0.0.92-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.92 * make swh.storage.schemata work on SQLAlchemy 1.0 -- Nicolas Dandrimont Thu, 12 Oct 2017 19:51:24 +0200 swh-storage (0.0.91-1~swh1) unstable-swh; urgency=medium * Release swh.storage version 0.0.91 * Update packaging runes -- Nicolas Dandrimont Thu, 12 Oct 2017 18:41:46 +0200 swh-storage (0.0.90-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.90 * Remove leaky dependency on python3-kafka -- Nicolas Dandrimont Wed, 11 Oct 2017 18:53:22 +0200 swh-storage (0.0.89-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.89 * Add new package for ancillary schemata * Add new metadata-related entry points * Update for new swh.model -- Nicolas Dandrimont Wed, 11 Oct 2017 17:39:29 +0200 swh-storage (0.0.88-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.88 * Move the archiver to its own module * Prepare building for stretch -- Nicolas Dandrimont Fri, 30 Jun 2017 14:52:12 +0200 swh-storage (0.0.87-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.87 * update tasks to new swh.scheduler api -- Nicolas Dandrimont Mon, 12 Jun 2017 17:54:11 +0200 swh-storage (0.0.86-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.86 * archiver updates -- Nicolas Dandrimont Tue, 06 Jun 2017 18:43:43 +0200 swh-storage (0.0.85-1~swh1) unstable-swh; urgency=medium * v0.0.85 * Improve license endpoint's unknown license policy -- Antoine R. Dumont (@ardumont) Tue, 06 Jun 2017 17:55:40 +0200 swh-storage (0.0.84-1~swh1) unstable-swh; urgency=medium * v0.0.84 * Update indexer endpoints to use indexer configuration id * Add indexer configuration endpoint -- Antoine R. Dumont (@ardumont) Fri, 02 Jun 2017 16:16:47 +0200 swh-storage (0.0.83-1~swh1) unstable-swh; urgency=medium * v0.0.83 * Add blake2s256 new hash computation on content -- Antoine R. Dumont (@ardumont) Fri, 31 Mar 2017 12:27:09 +0200 swh-storage (0.0.82-1~swh1) unstable-swh; urgency=medium * v0.0.82 * swh.storage.listener: Subscribe to new origin notifications * sql/swh-func: improve equality check on the three columns for swh_content_missing * swh.storage: add length to directory listing primitives * refactoring: Migrate from swh.core.hashutil to swh.model.hashutil * swh.storage.archiver.updater: Create a content updater journal client * vault: add a git fast-import cooker * vault: generic cache to allow multiple cooker types and formats -- Antoine R. Dumont (@ardumont) Tue, 21 Mar 2017 14:50:16 +0100 swh-storage (0.0.81-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.81 * archiver improvements for mass injection in azure -- Nicolas Dandrimont Thu, 09 Mar 2017 11:15:28 +0100 swh-storage (0.0.80-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.80 * archiver improvements related to the mass injection of contents in azure * updates to the vault cooker -- Nicolas Dandrimont Tue, 07 Mar 2017 15:12:35 +0100 swh-storage (0.0.79-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.79 * archiver: keep counts of objects in each archive * converters: normalize timestamps using swh.model -- Nicolas Dandrimont Tue, 14 Feb 2017 19:37:36 +0100 swh-storage (0.0.78-1~swh1) unstable-swh; urgency=medium * v0.0.78 * Refactoring some common code into swh.core + adaptation api calls in * swh.objstorage and swh.storage (storage and vault) -- Antoine R. Dumont (@ardumont) Thu, 26 Jan 2017 15:08:03 +0100 swh-storage (0.0.77-1~swh1) unstable-swh; urgency=medium * v0.0.77 * Paginate results for origin_visits endpoint -- Antoine R. Dumont (@ardumont) Thu, 19 Jan 2017 14:41:49 +0100 swh-storage (0.0.76-1~swh1) unstable-swh; urgency=medium * v0.0.76 * Unify storage and objstorage configuration and instantiation functions -- Antoine R. Dumont (@ardumont) Thu, 15 Dec 2016 18:25:58 +0100 swh-storage (0.0.75-1~swh1) unstable-swh; urgency=medium * v0.0.75 * Add information on indexer tools (T610) -- Antoine R. Dumont (@ardumont) Fri, 02 Dec 2016 18:21:36 +0100 swh-storage (0.0.74-1~swh1) unstable-swh; urgency=medium * v0.0.74 * Use strict equality for content ctags' symbols search -- Antoine R. Dumont (@ardumont) Tue, 29 Nov 2016 17:25:29 +0100 swh-storage (0.0.73-1~swh1) unstable-swh; urgency=medium * v0.0.73 * Improve ctags search query for edge cases -- Antoine R. Dumont (@ardumont) Mon, 28 Nov 2016 16:34:55 +0100 swh-storage (0.0.72-1~swh1) unstable-swh; urgency=medium * v0.0.72 * Permit pagination on content_ctags_search api endpoint -- Antoine R. Dumont (@ardumont) Thu, 24 Nov 2016 14:19:29 +0100 swh-storage (0.0.71-1~swh1) unstable-swh; urgency=medium * v0.0.71 * Open full-text search endpoint on ctags -- Antoine R. Dumont (@ardumont) Wed, 23 Nov 2016 17:33:51 +0100 swh-storage (0.0.70-1~swh1) unstable-swh; urgency=medium * v0.0.70 * Add new license endpoints (add/get) * Update ctags endpoints to align update conflict policy -- Antoine R. Dumont (@ardumont) Thu, 10 Nov 2016 17:27:49 +0100 swh-storage (0.0.69-1~swh1) unstable-swh; urgency=medium * v0.0.69 * storage: Open ctags entry points (missing, add, get) * storage: allow adding several origins at once -- Antoine R. Dumont (@ardumont) Thu, 20 Oct 2016 16:07:07 +0200 swh-storage (0.0.68-1~swh1) unstable-swh; urgency=medium * v0.0.68 * indexer: Open mimetype/language get endpoints * indexer: Add the mimetype/language add function with conflict_update flag * archiver: Extend worker-to-backend to transmit messages to another * queue (once done) -- Antoine R. Dumont (@ardumont) Thu, 13 Oct 2016 15:30:21 +0200 swh-storage (0.0.67-1~swh1) unstable-swh; urgency=medium * v0.0.67 * Fix provenance storage init function -- Antoine R. Dumont (@ardumont) Wed, 12 Oct 2016 02:24:12 +0200 swh-storage (0.0.66-1~swh1) unstable-swh; urgency=medium * v0.0.66 * Improve provenance configuration format -- Antoine R. Dumont (@ardumont) Wed, 12 Oct 2016 01:39:26 +0200 swh-storage (0.0.65-1~swh1) unstable-swh; urgency=medium * v0.0.65 * Open api entry points for swh.indexer about content mimetype and * language * Update schema graph to latest version -- Antoine R. Dumont (@ardumont) Sat, 08 Oct 2016 10:00:30 +0200 swh-storage (0.0.64-1~swh1) unstable-swh; urgency=medium * v0.0.64 * Fix: Missing incremented version 5 for archiver.dbversion * Retrieve information on a content cached * sql/swh-func: content cache populates lines in deterministic order -- Antoine R. Dumont (@ardumont) Thu, 29 Sep 2016 21:50:59 +0200 swh-storage (0.0.63-1~swh1) unstable-swh; urgency=medium * v0.0.63 * Make the 'worker to backend' destination agnostic (message parameter) * Improve 'unknown sha1' policy (archiver db can lag behind swh db) * Improve 'force copy' policy -- Antoine R. Dumont (@ardumont) Fri, 23 Sep 2016 12:29:50 +0200 swh-storage (0.0.62-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.62 * Updates to the provenance cache to reduce churn on the main tables -- Nicolas Dandrimont Thu, 22 Sep 2016 18:54:52 +0200 swh-storage (0.0.61-1~swh1) unstable-swh; urgency=medium * v0.0.61 * Handle copies of unregistered sha1 in archiver db * Fix copy to only the targeted destination * Update to latest python3-swh.core dependency -- Antoine R. Dumont (@ardumont) Thu, 22 Sep 2016 13:44:05 +0200 swh-storage (0.0.60-1~swh1) unstable-swh; urgency=medium * v0.0.60 * Update archiver dependencies -- Antoine R. Dumont (@ardumont) Tue, 20 Sep 2016 16:46:48 +0200 swh-storage (0.0.59-1~swh1) unstable-swh; urgency=medium * v0.0.59 * Unify configuration property between director/worker * Deal with potential missing contents in the archiver db * Improve get_contents_error implementation * Remove dead code in swh.storage.db about archiver -- Antoine R. Dumont (@ardumont) Sat, 17 Sep 2016 12:50:14 +0200 swh-storage (0.0.58-1~swh1) unstable-swh; urgency=medium * v0.0.58 * ArchiverDirectorToBackend reads sha1 from stdin and sends chunks of sha1 * for archival. -- Antoine R. Dumont (@ardumont) Fri, 16 Sep 2016 22:17:14 +0200 swh-storage (0.0.57-1~swh1) unstable-swh; urgency=medium * v0.0.57 * Update swh.storage.archiver -- Antoine R. Dumont (@ardumont) Thu, 15 Sep 2016 16:30:11 +0200 swh-storage (0.0.56-1~swh1) unstable-swh; urgency=medium * v0.0.56 * Vault: Add vault implementation (directory cooker & cache * implementation + its api) * Archiver: Add another archiver implementation (direct to backend) -- Antoine R. Dumont (@ardumont) Thu, 15 Sep 2016 10:56:35 +0200 swh-storage (0.0.55-1~swh1) unstable-swh; urgency=medium * v0.0.55 * Fix origin_visit endpoint -- Antoine R. Dumont (@ardumont) Thu, 08 Sep 2016 15:21:28 +0200 swh-storage (0.0.54-1~swh1) unstable-swh; urgency=medium * v0.0.54 * Open origin_visit_get_by entry point -- Antoine R. Dumont (@ardumont) Mon, 05 Sep 2016 12:36:34 +0200 swh-storage (0.0.53-1~swh1) unstable-swh; urgency=medium * v0.0.53 * Add cache about content provenance * debian: fix python3-swh.storage.archiver runtime dependency * debian: create new package python3-swh.storage.provenance -- Antoine R. Dumont (@ardumont) Fri, 02 Sep 2016 11:14:09 +0200 swh-storage (0.0.52-1~swh1) unstable-swh; urgency=medium * v0.0.52 * Package python3-swh.storage.archiver -- Antoine R. Dumont (@ardumont) Thu, 25 Aug 2016 14:55:23 +0200 swh-storage (0.0.51-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.51 * Add new metadata column to origin_visit * Update swh-add-directory script for updated API -- Nicolas Dandrimont Wed, 24 Aug 2016 14:36:03 +0200 swh-storage (0.0.50-1~swh1) unstable-swh; urgency=medium * v0.0.50 * Add a function to pull (only) metadata for a list of contents * Update occurrence_add api entry point to properly deal with origin_visit * Add origin_visit api entry points to create/update origin_visit -- Antoine R. Dumont (@ardumont) Tue, 23 Aug 2016 16:29:26 +0200 swh-storage (0.0.49-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.49 * Proper dependency on python3-kafka -- Nicolas Dandrimont Fri, 19 Aug 2016 13:45:52 +0200 swh-storage (0.0.48-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.48 * Updates to the archiver * Notification support for new object creations -- Nicolas Dandrimont Fri, 19 Aug 2016 12:13:50 +0200 swh-storage (0.0.47-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.47 * Update storage archiver to new schemaless schema -- Nicolas Dandrimont Fri, 22 Jul 2016 16:59:19 +0200 swh-storage (0.0.46-1~swh1) unstable-swh; urgency=medium * v0.0.46 * Update archiver bootstrap -- Antoine R. Dumont (@ardumont) Wed, 20 Jul 2016 19:04:42 +0200 swh-storage (0.0.45-1~swh1) unstable-swh; urgency=medium * v0.0.45 * Separate swh.storage.archiver's db from swh.storage.storage -- Antoine R. Dumont (@ardumont) Tue, 19 Jul 2016 15:05:36 +0200 swh-storage (0.0.44-1~swh1) unstable-swh; urgency=medium * v0.0.44 * Open listing visits per origin api -- Quentin Campos Fri, 08 Jul 2016 11:27:10 +0200 swh-storage (0.0.43-1~swh1) unstable-swh; urgency=medium * v0.0.43 * Extract objstorage to its own package swh.objstorage -- Quentin Campos Mon, 27 Jun 2016 14:57:12 +0200 swh-storage (0.0.42-1~swh1) unstable-swh; urgency=medium * Add an object storage multiplexer to allow transition between multiple versions of object storages. -- Quentin Campos Tue, 21 Jun 2016 15:03:52 +0200 swh-storage (0.0.41-1~swh1) unstable-swh; urgency=medium * Refactoring of the object storage in order to allow multiple versions of it, as well as a multiplexer for version transition. -- Quentin Campos Thu, 16 Jun 2016 15:54:16 +0200 swh-storage (0.0.40-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.40: * Refactor objstorage to allow for different implementations * Updates to the checker functionality * Bump swh.core dependency to v0.0.20 -- Nicolas Dandrimont Tue, 14 Jun 2016 17:25:42 +0200 swh-storage (0.0.39-1~swh1) unstable-swh; urgency=medium * v0.0.39 * Add run_from_webserver function for objstorage api server * Add unique identifier message on default api server route endpoints -- Antoine R. Dumont (@ardumont) Fri, 20 May 2016 15:27:34 +0200 swh-storage (0.0.38-1~swh1) unstable-swh; urgency=medium * v0.0.38 * Add an http api for object storage * Implement an archiver to perform backup copies -- Quentin Campos Fri, 20 May 2016 14:40:14 +0200 swh-storage (0.0.37-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.37 * Add fullname to person table * Add svn as a revision type -- Nicolas Dandrimont Fri, 08 Apr 2016 16:44:24 +0200 swh-storage (0.0.36-1~swh1) unstable-swh; urgency=medium * Release swh.storage v0.0.36 * Add json-schema documentation for the jsonb fields * Overhaul entity handling -- Nicolas Dandrimont Wed, 16 Mar 2016 17:27:17 +0100 swh-storage (0.0.35-1~swh1) unstable-swh; urgency=medium * Release swh-storage v0.0.35 * Factor in temporary tables with only an id (db v059) * Allow generic object search by sha1_git (db v060) -- Nicolas Dandrimont Thu, 25 Feb 2016 16:21:01 +0100 swh-storage (0.0.34-1~swh1) unstable-swh; urgency=medium * Release swh.storage version 0.0.34 * occurrence improvements * commit metadata improvements -- Nicolas Dandrimont Fri, 19 Feb 2016 18:20:07 +0100 swh-storage (0.0.33-1~swh1) unstable-swh; urgency=medium * Bump swh.storage to version 0.0.33 -- Nicolas Dandrimont Fri, 05 Feb 2016 11:17:00 +0100 swh-storage (0.0.32-1~swh1) unstable-swh; urgency=medium * v0.0.32 * Let the person's id flow * sql/upgrades/051: 050->051 schema change * sql/upgrades/050: 049->050 schema change - Clean up obsolete functions * sql/upgrades/049: Final take for 048->049 schema change. * sql: Use a new schema for occurrences -- Antoine R. Dumont (@ardumont) Fri, 29 Jan 2016 17:44:27 +0100 swh-storage (0.0.31-1~swh1) unstable-swh; urgency=medium * v0.0.31 * Deal with occurrence_history.branch, occurrence.branch, release.name as bytes -- Antoine R. Dumont (@ardumont) Wed, 27 Jan 2016 15:45:53 +0100 swh-storage (0.0.30-1~swh1) unstable-swh; urgency=medium * Prepare swh.storage v0.0.30 release * type-agnostic occurrences and revisions -- Nicolas Dandrimont Tue, 26 Jan 2016 07:36:43 +0100 swh-storage (0.0.29-1~swh1) unstable-swh; urgency=medium * v0.0.29 * New: * Upgrade sql schema to 041→043 * Deal with communication downtime between clients and storage * Open occurrence_get(origin_id) to retrieve latest occurrences per origin * Open release_get_by to retrieve a release by origin * Open directory_get to retrieve information on directory by id * Open entity_get to retrieve information on entity + hierarchy from its uuid * Open directory_get that retrieve information on directory per id * Update: * directory_get/directory_ls: Rename to directory_ls * revision_log: update to retrieve logs from multiple root revisions * revision_get_by: branch name filtering is now optional -- Antoine R. Dumont (@ardumont) Wed, 20 Jan 2016 16:15:50 +0100 swh-storage (0.0.28-1~swh1) unstable-swh; urgency=medium * v0.0.28 * Open entity_get api -- Antoine R. Dumont (@ardumont) Fri, 15 Jan 2016 16:37:27 +0100 swh-storage (0.0.27-1~swh1) unstable-swh; urgency=medium * v0.0.27 * Open directory_entry_get_by_path api * Improve get_revision_by api performance * sql/swh-schema: add index on origin(type, url) --> improve origin lookup api * Bump to 039 db version -- Antoine R. Dumont (@ardumont) Fri, 15 Jan 2016 12:42:47 +0100 swh-storage (0.0.26-1~swh1) unstable-swh; urgency=medium * v0.0.26 * Open revision_get_by to retrieve a revision by occurrence criterion filtering * sql/upgrades/036: add 035→036 upgrade script -- Antoine R. Dumont (@ardumont) Wed, 13 Jan 2016 12:46:44 +0100 swh-storage (0.0.25-1~swh1) unstable-swh; urgency=medium * v0.0.25 * Limit results in swh_revision_list* * Create the package to align the current db production version on https://archive.softwareheritage.org/ -- Antoine R. Dumont (@ardumont) Fri, 08 Jan 2016 11:33:08 +0100 swh-storage (0.0.24-1~swh1) unstable-swh; urgency=medium * Prepare swh.storage release v0.0.24 * Add a limit argument to revision_log -- Nicolas Dandrimont Wed, 06 Jan 2016 15:12:53 +0100 swh-storage (0.0.23-1~swh1) unstable-swh; urgency=medium * v0.0.23 * Protect against overflow, wrapped in ValueError for client * Fix relative path import for remote storage. * api to retrieve revision_log is now 'parents' aware -- Antoine R. Dumont (@ardumont) Wed, 06 Jan 2016 11:30:58 +0100 swh-storage (0.0.22-1~swh1) unstable-swh; urgency=medium * Release v0.0.22 * Fix relative import for remote storage -- Nicolas Dandrimont Wed, 16 Dec 2015 16:04:48 +0100 swh-storage (0.0.21-1~swh1) unstable-swh; urgency=medium * Prepare release v0.0.21 * Protect the storage api client from overflows * Add a get_storage function mapping to local or remote storage -- Nicolas Dandrimont Wed, 16 Dec 2015 13:34:46 +0100 swh-storage (0.0.20-1~swh1) unstable-swh; urgency=medium * v0.0.20 * allow numeric timestamps with offset * Open revision_log api * start migration to swh.model -- Antoine R. Dumont (@ardumont) Mon, 07 Dec 2015 15:20:36 +0100 swh-storage (0.0.19-1~swh1) unstable-swh; urgency=medium * v0.0.19 * Improve directory listing with content data * Open person_get * Open release_get data reading * Improve origin_get api * Effort to unify api output on dict (for read) * Migrate backend to 032 -- Antoine R. Dumont (@ardumont) Fri, 27 Nov 2015 13:33:34 +0100 swh-storage (0.0.18-1~swh1) unstable-swh; urgency=medium * v0.0.18 * Improve origin_get to permit retrieval per id * Update directory_get implementation (add join from * directory_entry_file to content) * Open release_get : [sha1] -> [Release] -- Antoine R. Dumont (@ardumont) Thu, 19 Nov 2015 11:18:35 +0100 swh-storage (0.0.17-1~swh1) unstable-swh; urgency=medium * Prepare deployment of swh.storage v0.0.17 * Add some entity related entry points -- Nicolas Dandrimont Tue, 03 Nov 2015 16:40:59 +0100 swh-storage (0.0.16-1~swh1) unstable-swh; urgency=medium * v0.0.16 * Add metadata column in revision (db version 29) * cache http connection for remote storage client -- Antoine R. Dumont (@ardumont) Thu, 29 Oct 2015 10:29:00 +0100 swh-storage (0.0.15-1~swh1) unstable-swh; urgency=medium * Prepare deployment of swh.storage v0.0.15 * Allow population of fetch_history * Update organizations / projects as entities * Use schema v028 for directory addition -- Nicolas Dandrimont Tue, 27 Oct 2015 11:43:39 +0100 swh-storage (0.0.14-1~swh1) unstable-swh; urgency=medium * Prepare swh.storage v0.0.14 deployment -- Nicolas Dandrimont Fri, 16 Oct 2015 15:34:08 +0200 swh-storage (0.0.13-1~swh1) unstable-swh; urgency=medium * Prepare deploying swh.storage v0.0.13 -- Nicolas Dandrimont Fri, 16 Oct 2015 14:51:44 +0200 swh-storage (0.0.12-1~swh1) unstable-swh; urgency=medium * Prepare deploying swh.storage v0.0.12 -- Nicolas Dandrimont Tue, 13 Oct 2015 12:39:18 +0200 swh-storage (0.0.11-1~swh1) unstable-swh; urgency=medium * Preparing deployment of swh.storage v0.0.11 -- Nicolas Dandrimont Fri, 09 Oct 2015 17:44:51 +0200 swh-storage (0.0.10-1~swh1) unstable-swh; urgency=medium * Prepare deployment of swh.storage v0.0.10 -- Nicolas Dandrimont Tue, 06 Oct 2015 17:37:00 +0200 swh-storage (0.0.9-1~swh1) unstable-swh; urgency=medium * Prepare deployment of swh.storage v0.0.9 -- Nicolas Dandrimont Thu, 01 Oct 2015 19:03:00 +0200 swh-storage (0.0.8-1~swh1) unstable-swh; urgency=medium * Prepare deployment of swh.storage v0.0.8 -- Nicolas Dandrimont Thu, 01 Oct 2015 11:32:46 +0200 swh-storage (0.0.7-1~swh1) unstable-swh; urgency=medium * Prepare deployment of swh.storage v0.0.7 -- Nicolas Dandrimont Tue, 29 Sep 2015 16:52:54 +0200 swh-storage (0.0.6-1~swh1) unstable-swh; urgency=medium * Prepare deployment of swh.storage v0.0.6 -- Nicolas Dandrimont Tue, 29 Sep 2015 16:43:24 +0200 swh-storage (0.0.5-1~swh1) unstable-swh; urgency=medium * Prepare deploying swh.storage v0.0.5 -- Nicolas Dandrimont Tue, 29 Sep 2015 16:27:00 +0200 swh-storage (0.0.1-1~swh1) unstable-swh; urgency=medium * Initial release * swh.storage.api: Properly escape arbitrary byte sequences in arguments -- Nicolas Dandrimont Tue, 22 Sep 2015 17:02:34 +0200 diff --git a/docs/extrinsic-metadata-specification.rst b/docs/extrinsic-metadata-specification.rst index 8d557ab8..a9c7fa90 100644 --- a/docs/extrinsic-metadata-specification.rst +++ b/docs/extrinsic-metadata-specification.rst @@ -1,353 +1,355 @@ :orphan: .. _extrinsic-metadata-specification: Extrinsic metadata specification ================================ :term:`Extrinsic metadata` is information about software that is not part of the source code itself but still closely related to the software. Typical sources for extrinsic metadata are: the hosting place of a repository, which can offer metadata via its web view or API; external registries like collaborative curation initiatives; and out-of-band information available at source code archival time. Since they are not part of the source code, a dedicated mechanism to fetch and store them is needed. This specification assumes the reader is familiar with Software Heritage's :ref:`architecture` and :ref:`data-model`. Metadata sources ---------------- Authorities ^^^^^^^^^^^ Metadata authorities are entities that provide metadata about an :term:`origin`. Metadata authorities include: code hosting places, :term:`deposit` submitters, and registries (eg. Wikidata). An authority is uniquely defined by these properties: * its type, representing the kind of authority, which is one of these values: * ``deposit_client``, for metadata pushed to Software Heritage at the same time as a software artifact * ``forge``, for metadata pulled from the same source as the one hosting the software artifacts (which includes package managers) * ``registry``, for metadata pulled from a third-party * its URL, which unambiguously identifies an instance of the authority type. Examples: =============== ================================= type url =============== ================================= deposit_client https://hal.archives-ouvertes.fr/ deposit_client https://hal.inria.fr/ deposit_client https://software.intel.com/ forge https://gitlab.com/ forge https://gitlab.inria.fr/ forge https://0xacab.org/ forge https://github.com/ registry https://www.wikidata.org/ registry https://swmath.org/ registry https://ascl.net/ =============== ================================= Metadata fetchers ^^^^^^^^^^^^^^^^^ Metadata fetchers are software components used to fetch metadata from a metadata authority, and ingest them into the Software Heritage archive. A metadata fetcher is uniquely defined by these properties: * its type * its version Examples: * :term:`loaders `, which may either discover metadata as a side-effect of loading source code, or be dedicated to fetching metadata. * :term:`listers `, which may discover metadata as a side-effect of discovering origins. * :term:`deposit` submitters, which push metadata to SWH from a third-party; usually at the same time as a :term:`software artifact` * crawlers, which fetch metadata from an authority in a way that is none of the above (eg. by querying a specific API of the origin's forge). Storage API ----------- Authorities and metadata fetchers ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Data model ~~~~~~~~~~ The :term:`storage` API uses these structures to represent metadata authorities and metadata fetchers (simplified Python code):: class MetadataAuthorityType(Enum): DEPOSIT_CLIENT = "deposit_client" FORGE = "forge" REGISTRY = "registry" class MetadataAuthority(BaseModel): """Represents an entity that provides metadata about an origin or software artifact.""" object_type = "metadata_authority" type: MetadataAuthorityType url: str class MetadataFetcher(BaseModel): """Represents a software component used to fetch metadata from a metadata authority, and ingest them into the Software Heritage archive.""" object_type = "metadata_fetcher" name: str version: str Storage API ~~~~~~~~~~~ * ``metadata_authority_add(authorities: List[MetadataAuthority])`` which adds a list of ``MetadataAuthority`` to the storage. * ``metadata_authority_get(type: MetadataAuthorityType, url: str) -> Optional[MetadataAuthority]`` which looks up a known authority (there is at most one) and if it is known, returns the corresponding ``MetadataAuthority`` * ``metadata_fetcher_add(fetchers: List[MetadataFetcher])`` which adds a list of ``MetadataFetcher`` to the storage. * ``metadata_fetcher_get(name: str, version: str) -> Optional[MetadataFetcher]`` which looks up a known fetcher (there is at most one) and if it is known, returns the corresponding ``MetadataFetcher`` Artifact metadata ^^^^^^^^^^^^^^^^^ Data model ~~~~~~~~~~ The storage database stores metadata on origins, and all software artifacts supported by the data model. They are represented using this structure (simplified Python code):: class RawExtrinsicMetadata(HashableObject, BaseModel): object_type = "raw_extrinsic_metadata" # target object target: ExtendedSWHID # source discovery_date: datetime.datetime authority: MetadataAuthority fetcher: MetadataFetcher # the metadata itself format: str metadata: bytes # context origin: Optional[str] = None visit: Optional[int] = None snapshot: Optional[CoreSWHID] = None release: Optional[CoreSWHID] = None revision: Optional[CoreSWHID] = None path: Optional[bytes] = None directory: Optional[CoreSWHID] = None id: Sha1Git The ``target`` may be: * a regular :ref:`core SWHID `, * a SWHID-like string with type ``ori`` and the SHA1 of an origin URL * a SWHID-like string with type ``emd`` and the SHA1 of an other ``RawExtrinsicMetadata`` object (to represent metadata on metadata objects) ``id`` is a sha1 hash of the ``RawExtrinsicMetadata`` object itself; it may be used in other ``RawExtrinsicMetadata`` as target. ``discovery_date`` is a Python datetime. ``authority`` must be a dict containing keys ``type`` and ``url``, and ``fetcher`` a dict containing keys ``name`` and ``version``. The authority and fetcher must be known to the storage before using this endpoint. ``format`` is a text field indicating the format of the content of the ``metadata`` byte string, see `extrinsic-metadata-formats`_. ``metadata`` is a byte array. Its format is specific to each authority; and is treated as an opaque value by the storage. Unifying these various formats into a common language is outside the scope of this specification. Finally, the remaining fields allow metadata can be given on a specific artifact within a specified context (for example: a directory in a specific revision from a specific visit on a specific origin) which will be stored along the metadata itself. For example, two origins may develop the same file independently; the information about authorship, licensing or even description may vary about the same artifact in a different context. This is why it is important to qualify the metadata with the complete context for which it is intended, if any. The allowed context fields for each ``target`` type are: * for ``emd`` (extrinsic metadata) and ``ori`` (origin): none * for ``snp`` (snapshot): ``origin`` (a URL) and ``visit`` (an integer) * for ``rel`` (release): those above, plus ``snapshot`` (the core SWHID of a snapshot) * for ``rev`` (revision): all those above, plus ``release`` (the core SWHID of a release) * for ``dir`` (directory): all those above, plus ``revision`` (the core SWHID of a revision) and ``path`` (a byte string), representing the path to this directory from the root of the ``revision`` * for ``cnt`` (content): all those above, plus ``directory`` (the core SWHID of a directory) All keys are optional, but should be provided whenever possible. The dictionary may be empty, if metadata is fully independent from context. In all cases, ``visit`` should only be provided if ``origin`` is (as visit ids are only unique with respect to an origin). Storage API ~~~~~~~~~~~ The storage API offers three endpoints to manipulate origin metadata: * Adding metadata:: raw_extrinsic_metadata_add(metadata: List[RawExtrinsicMetadata]) which adds a list of ``RawExtrinsicMetadata`` objects, whose ``metadata`` field is a byte string obtained from a given authority and associated to the ``target``. * Getting all metadata:: raw_extrinsic_metadata_get( target: ExtendedSWHID, authority: MetadataAuthority, after: Optional[datetime.datetime] = None, page_token: Optional[bytes] = None, limit: int = 1000, ) -> PagedResult[RawExtrinsicMetadata]: returns a list of ``RawExtrinsicMetadata`` with the given ``target`` and from the given ``authority``. If ``after`` is provided, only objects whose discovery date is more recent are returnered. ``PagedResult`` is a structure containing the results themselves, and a ``next_page_token`` used to fetch the next list of results, if any .. _extrinsic-metadata-formats: Extrinsic metadata formats -------------------------- Formats are identified by an opaque string. When possible, it should be the MIME type already in use to describe the metadata format outside Software Heritage. Otherwise it should be unambiguous, printable ASCII without spaces, and human-readable. Here is a list of all the metadata format stored: ``application/vnd.github.v3+json`` The metadata is the response of an API call to GitHub. ``gitlab-project-json`` The metadata is the response of an API call to a Gitlab instance. ``pypi-project-json`` The metadata is a release entry from a PyPI project's JSON file, extracted and re-serialized. ``replicate-npm-package-json`` ditto, but from a replicate.npmjs.com project ``nixguix-sources-json`` ditto, but from https://nix-community.github.io/nixpkgs-swh/ ``original-artifacts-json`` tarball data, see below ``sword-v2-atom-codemeta`` XML Atom document, with Codemeta metadata, as sent by a deposit client, see the :ref:`Deposit protocol reference `. ``sword-v2-atom-codemeta-v2`` Deprecated alias of ``sword-v2-atom-codemeta`` ``sword-v2-atom-codemeta-v2-in-json`` Deprecated, JSON serialization of a ``sword-v2-atom-codemeta`` document. ``xml-deposit-info`` Information about a deposit, to identify the provenance of a metadata object sent via swh-deposit, see below Details on some of these formats: +.. _extrinsic-metadata-original-artifacts-json: + original-artifacts-json ^^^^^^^^^^^^^^^^^^^^^^^ This is a loosely defined format, originally used as a ``metadata`` column on the ``revision`` table that changed over the years. It is a JSON array, and each entry is a JSON object representing an archive (tarball, zipball, ...) that was unpackaged by the SWH loader before loading its content in Software Heritage. When writing this specification, it was stabilized to this format:: [ { "length": , "filename": "", "checksums": { "sha1": "", "sha256": "", }, "url": "" }, ... ] Older ``original-artifacts-json`` were migrated to use this format, but may be missing some of the keys. xml-deposit-info ^^^^^^^^^^^^^^^^ Deposits with code objects are loaded as their own origin, so we can look them up in the deposit database from their metadata (which hold the origin as a context). This is not true for metadata-only deposits, because we don't create an origin for them; so we need to store this information somewhere. The naive solution would be to insert them in the Atom entry provided by the client, but it means altering a document before we archive it, which potentially corrupts it or loses part of the data. Therefore, on each metadata-only deposit, the deposit creates an extra "metametadata" object, with the original metadata object as target, and using this format:: {{ deposit.id }} {{ deposit.client.provider_url }} {{ deposit.collection.name }} diff --git a/requirements-swh.txt b/requirements-swh.txt index 7d53946f..cb0758cc 100644 --- a/requirements-swh.txt +++ b/requirements-swh.txt @@ -1,4 +1,4 @@ -swh.core[db,http] >= 2.10 +swh.core[db,http] >= 2.14 swh.counters >= v0.8.0 swh.model >= 6.3.0 swh.objstorage >= 0.2.2 diff --git a/swh.storage.egg-info/PKG-INFO b/swh.storage.egg-info/PKG-INFO index 353ab2b4..8d55c604 100644 --- a/swh.storage.egg-info/PKG-INFO +++ b/swh.storage.egg-info/PKG-INFO @@ -1,246 +1,246 @@ Metadata-Version: 2.1 Name: swh.storage -Version: 1.5.1 +Version: 1.6.0 Summary: Software Heritage storage manager Home-page: https://forge.softwareheritage.org/diffusion/DSTO/ Author: Software Heritage developers Author-email: swh-devel@inria.fr Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest Project-URL: Funding, https://www.softwareheritage.org/donate Project-URL: Source, https://forge.softwareheritage.org/source/swh-storage Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-storage/ Classifier: Programming Language :: Python :: 3 Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3) Classifier: Operating System :: OS Independent Classifier: Development Status :: 5 - Production/Stable Requires-Python: >=3.7 Description-Content-Type: text/markdown Provides-Extra: testing Provides-Extra: journal License-File: LICENSE License-File: AUTHORS swh-storage =========== Abstraction layer over the archive, allowing to access all stored source code artifacts as well as their metadata. See the [documentation](https://docs.softwareheritage.org/devel/swh-storage/index.html) for more details. ## Quick start ### Dependencies Python tests for this module include tests that cannot be run without a local Postgresql database, so you need the Postgresql server executable on your machine (no need to have a running Postgresql server). They also expect a cassandra server. #### Debian-like host ``` $ sudo apt install libpq-dev postgresql-11 cassandra ``` #### Non Debian-like host The tests expects the path to `cassandra` to either be unspecified, it is then looked up at `/usr/sbin/cassandra`, either specified through the environment variable `SWH_CASSANDRA_BIN`. Optionally, you can avoid running the cassandra tests. ``` (swh) :~/swh-storage$ tox -- -m 'not cassandra' ``` ### Installation It is strongly recommended to use a virtualenv. In the following, we consider you work in a virtualenv named `swh`. See the [developer setup guide](https://docs.softwareheritage.org/devel/developer-setup.html#developer-setup) for a more details on how to setup a working environment. You can install the package directly from [pypi](https://pypi.org/p/swh.storage): ``` (swh) :~$ pip install swh.storage [...] ``` Or from sources: ``` (swh) :~$ git clone https://forge.softwareheritage.org/source/swh-storage.git [...] (swh) :~$ cd swh-storage (swh) :~/swh-storage$ pip install . [...] ``` Then you can check it's properly installed: ``` (swh) :~$ swh storage --help Usage: swh storage [OPTIONS] COMMAND [ARGS]... Software Heritage Storage tools. Options: -h, --help Show this message and exit. Commands: rpc-serve Software Heritage Storage RPC server. ``` ## Tests The best way of running Python tests for this module is to use [tox](https://tox.readthedocs.io/). ``` (swh) :~$ pip install tox ``` ### tox From the sources directory, simply use tox: ``` (swh) :~/swh-storage$ tox [...] ========= 315 passed, 6 skipped, 15 warnings in 40.86 seconds ========== _______________________________ summary ________________________________ flake8: commands succeeded py3: commands succeeded congratulations :) ``` Note: it is possible to set the `JAVA_HOME` environment variable to specify the version of the JVM to be used by Cassandra. For example, at the time of writing this, Cassandra does not support java 14, so one may want to use for example java 11: ``` (swh) :~/swh-storage$ export JAVA_HOME=/usr/lib/jvm/java-14-openjdk-amd64/bin/java (swh) :~/swh-storage$ tox [...] ``` ## Development The storage server can be locally started. It requires a configuration file and a running Postgresql database. ### Sample configuration A typical configuration `storage.yml` file is: ``` storage: cls: postgresql db: "dbname=softwareheritage-dev user= password=" objstorage: cls: pathslicing root: /tmp/swh-storage/ slicing: 0:2/2:4/4:6 ``` which means, this uses: - a local storage instance whose db connection is to `softwareheritage-dev` local instance, - the objstorage uses a local objstorage instance whose: - `root` path is /tmp/swh-storage, - slicing scheme is `0:2/2:4/4:6`. This means that the identifier of the content (sha1) which will be stored on disk at first level with the first 2 hex characters, the second level with the next 2 hex characters and the third level with the next 2 hex characters. And finally the complete hash file holding the raw content. For example: 00062f8bd330715c4f819373653d97b3cd34394c will be stored at 00/06/2f/00062f8bd330715c4f819373653d97b3cd34394c Note that the `root` path should exist on disk before starting the server. ### Starting the storage server If the python package has been properly installed (e.g. in a virtual env), you should be able to use the command: ``` (swh) :~/swh-storage$ swh storage rpc-serve storage.yml ``` This runs a local swh-storage api at 5002 port. ``` (swh) :~/swh-storage$ curl http://127.0.0.1:5002 Software Heritage storage server

You have reached the Software Heritage storage server.
See its documentation and API for more information

``` ### And then what? In your upper layer ([loader-git](https://forge.softwareheritage.org/source/swh-loader-git/), [loader-svn](https://forge.softwareheritage.org/source/swh-loader-svn/), etc...), you can define a remote storage with this snippet of yaml configuration. ``` storage: cls: remote url: http://localhost:5002/ ``` You could directly define a postgresql storage with the following snippet: ``` storage: cls: postgresql db: service=swh-dev objstorage: cls: pathslicing root: /home/storage/swh-storage/ slicing: 0:2/2:4/4:6 ``` ## Cassandra As an alternative to PostgreSQL, swh-storage can use Cassandra as a database backend. It can be used like this: ``` storage: cls: cassandra hosts: - localhost objstorage: cls: pathslicing root: /home/storage/swh-storage/ slicing: 0:2/2:4/4:6 ``` The Cassandra swh-storage implementation supports both Cassandra >= 4.0-alpha2 and ScyllaDB >= 4.4 (and possibly earlier versions, but this is untested). While the main code supports both transparently, running tests or configuring the schema requires specific code when using ScyllaDB, enabled by setting the `SWH_USE_SCYLLADB=1` environment variable. diff --git a/swh.storage.egg-info/requires.txt b/swh.storage.egg-info/requires.txt index eb2ce030..013e982a 100644 --- a/swh.storage.egg-info/requires.txt +++ b/swh.storage.egg-info/requires.txt @@ -1,33 +1,33 @@ cassandra-driver!=3.21.0,>=3.19.0 click deprecated flask iso8601 mypy_extensions psycopg2 redis tenacity>=6.2 typing-extensions -swh.core[db,http]>=2.10 +swh.core[db,http]>=2.14 swh.counters>=v0.8.0 swh.model>=6.3.0 swh.objstorage>=0.2.2 [journal] swh.journal>=0.9 [testing] hypothesis>=3.11.0 pytest pytest-mock swh.model[testing]>=0.0.50 pytz pytest-redis pytest-xdist types-python-dateutil types-pytz types-pyyaml types-redis types-requests types-toml swh.journal>=0.9 diff --git a/swh/storage/api/server.py b/swh/storage/api/server.py index 22c69ec0..db416280 100644 --- a/swh/storage/api/server.py +++ b/swh/storage/api/server.py @@ -1,195 +1,205 @@ # Copyright (C) 2015-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import logging import os from typing import Any, Dict, Optional -from psycopg2.errors import QueryCanceled +from psycopg2.errors import OperationalError, QueryCanceled from swh.core import config from swh.core.api import RPCServerApp from swh.core.api import encode_data_server as encode_data from swh.core.api import error_handler, serializers from swh.storage import get_storage as get_swhstorage from ..exc import StorageArgumentException from ..interface import StorageInterface from ..metrics import send_metric, timed from .serializers import DECODERS, ENCODERS def get_storage(): global storage if not storage: storage = get_swhstorage(**app.config["storage"]) return storage class StorageServerApp(RPCServerApp): extra_type_decoders = DECODERS extra_type_encoders = ENCODERS method_decorators = [timed] def _process_metrics(self, metrics, endpoint): for metric, count in metrics.items(): send_metric(metric=metric, count=count, method_name=endpoint) def post_content_add(self, ret, kw): self._process_metrics(ret, "content_add") def post_content_add_metadata(self, ret, kw): self._process_metrics(ret, "content_add_metadata") def post_skipped_content_add(self, ret, kw): self._process_metrics(ret, "skipped_content_add") def post_directory_add(self, ret, kw): self._process_metrics(ret, "directory_add") def post_revision_add(self, ret, kw): self._process_metrics(ret, "revision_add") def post_release_add(self, ret, kw): self._process_metrics(ret, "release_add") def post_snapshot_add(self, ret, kw): self._process_metrics(ret, "snapshot_add") def post_origin_visit_status_add(self, ret, kw): self._process_metrics(ret, "origin_visit_status_add") def post_origin_add(self, ret, kw): self._process_metrics(ret, "origin_add") def post_raw_extrinsic_metadata_add(self, ret, kw): self._process_metrics(ret, "raw_extrinsic_metadata_add") def post_metadata_fetcher_add(self, ret, kw): self._process_metrics(ret, "metadata_fetcher_add") def post_metadata_authority_add(self, ret, kw): self._process_metrics(ret, "metadata_authority_add") def post_extid_add(self, ret, kw): self._process_metrics(ret, "extid_add") def post_origin_visit_add(self, ret, kw): nb_visits = len(ret) send_metric( "origin_visit:add", count=nb_visits, # method_name should be "origin_visit_add", but changing it now would break # existing metrics method_name="origin_visit", ) app = StorageServerApp( __name__, backend_class=StorageInterface, backend_factory=get_storage ) storage = None @app.errorhandler(StorageArgumentException) def argument_error_handler(exception): return error_handler(exception, encode_data, status_code=400) -@app.errorhandler(QueryCanceled) -def querycanceled_error_handler(exception): +@app.errorhandler(OperationalError) +def operationalerror_exception_handler(exception): # Same as error_handler(exception, encode_data); but does not log or send to Sentry. # These errors are noisy, and are better logged on the caller's side after it - # retried a few times + # retried a few times. + # Additionally, we return 503 instead of 500, telling clients they should retry. + response = encode_data(serializers.exception_to_dict(exception)) + response.status_code = 503 + return response + + +@app.errorhandler(QueryCanceled) +def querycancelled_exception_handler(exception): + # Ditto, but 500 instead of 503, because this is usually caused by the query + # size instead of a transient failure response = encode_data(serializers.exception_to_dict(exception)) response.status_code = 500 return response @app.errorhandler(Exception) def default_error_handler(exception): return error_handler(exception, encode_data) @app.route("/") @timed def index(): return """ Software Heritage storage server

You have reached the Software Heritage storage server.
See its documentation and API for more information

""" @app.route("/stat/counters", methods=["GET"]) @timed def stat_counters(): return encode_data(get_storage().stat_counters()) @app.route("/stat/refresh", methods=["GET"]) @timed def refresh_stat_counters(): return encode_data(get_storage().refresh_stat_counters()) api_cfg = None def load_and_check_config(config_path: Optional[str]) -> Dict[str, Any]: """Check the minimal configuration is set to run the api or raise an error explanation. Args: config_path: Path to the configuration file to load Raises: Error if the setup is not as expected Returns: configuration as a dict """ if not config_path: raise EnvironmentError("Configuration file must be defined") if not os.path.exists(config_path): raise FileNotFoundError(f"Configuration file {config_path} does not exist") cfg = config.read(config_path) if "storage" not in cfg: raise KeyError("Missing 'storage' configuration") return cfg def make_app_from_configfile() -> StorageServerApp: """Run the WSGI app from the webserver, loading the configuration from a configuration file. SWH_CONFIG_FILENAME environment variable defines the configuration path to load. """ global api_cfg if not api_cfg: config_path = os.environ.get("SWH_CONFIG_FILENAME") api_cfg = load_and_check_config(config_path) app.config.update(api_cfg) handler = logging.StreamHandler() app.logger.addHandler(handler) return app if __name__ == "__main__": print("Deprecated. Use swh-storage") diff --git a/swh/storage/proxies/retry.py b/swh/storage/proxies/retry.py index 31dff2f2..99756ae2 100644 --- a/swh/storage/proxies/retry.py +++ b/swh/storage/proxies/retry.py @@ -1,82 +1,101 @@ # Copyright (C) 2019-2021 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import logging import traceback -from tenacity import retry, stop_after_attempt, wait_random_exponential +from tenacity import RetryCallState, retry, stop_after_attempt, wait_random_exponential +from tenacity.wait import wait_base +from swh.core.api import TransientRemoteException from swh.storage import get_storage from swh.storage.exc import StorageArgumentException from swh.storage.interface import StorageInterface logger = logging.getLogger(__name__) -def should_retry_adding(retry_state) -> bool: +def should_retry_adding(retry_state: RetryCallState) -> bool: """Retry if the error/exception is (probably) not about a caller error""" attempt = retry_state.outcome + assert attempt if attempt.failed: error = attempt.exception() if isinstance(error, StorageArgumentException): # Exception is due to an invalid argument return False elif isinstance(error, KeyboardInterrupt): return False else: # Other exception module = getattr(error, "__module__", None) if module: error_name = error.__module__ + "." + error.__class__.__name__ else: error_name = error.__class__.__name__ logger.warning( "Retrying RPC call", exc_info=False, extra={ "swh_type": "storage_retry", "swh_exception_type": error_name, "swh_exception": traceback.format_exc(), }, ) return True else: # No exception return False +class wait_transient_exceptions(wait_base): + """Wait longer when servers return HTTP 503.""" + + def __init__(self, wait: float) -> None: + self.wait = wait + + def __call__(self, retry_state: RetryCallState) -> float: + attempt = retry_state.outcome + assert attempt + + if attempt.failed and isinstance(attempt.exception(), TransientRemoteException): + return self.wait + else: + return 0.0 + + swh_retry = retry( retry=should_retry_adding, - wait=wait_random_exponential(multiplier=1, max=10), + wait=wait_random_exponential(multiplier=1, max=10) + wait_transient_exceptions(10), stop=stop_after_attempt(3), reraise=True, ) def retry_function(storage, attribute_name): @swh_retry def newf(*args, **kwargs): return getattr(storage, attribute_name)(*args, **kwargs) return newf class RetryingProxyStorage: """Storage implementation which retries adding objects when it specifically fails (hash collision, integrity error). """ def __init__(self, storage): self.storage: StorageInterface = get_storage(**storage) for attribute_name in dir(StorageInterface): if attribute_name.startswith("_"): continue attribute = getattr(self.storage, attribute_name) if hasattr(attribute, "__call__"): setattr( self, attribute_name, retry_function(self.storage, attribute_name) ) diff --git a/swh/storage/tests/test_api_client.py b/swh/storage/tests/test_api_client.py index 5c8a6390..e3663ac7 100644 --- a/swh/storage/tests/test_api_client.py +++ b/swh/storage/tests/test_api_client.py @@ -1,98 +1,140 @@ -# Copyright (C) 2015-2020 The Software Heritage developers +# Copyright (C) 2015-2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information +import psycopg2.errors import pytest +from swh.core.api import RemoteException, TransientRemoteException import swh.storage from swh.storage import get_storage import swh.storage.api.server as server from swh.storage.tests.storage_tests import ( TestStorageGeneratedData as _TestStorageGeneratedData, ) from swh.storage.tests.storage_tests import TestStorage as _TestStorage # tests are executed using imported classes (TestStorage and # TestStorageGeneratedData) using overloaded swh_storage fixture # below @pytest.fixture def app_server(): server.storage = swh.storage.get_storage( cls="memory", journal_writer={"cls": "memory"} ) yield server @pytest.fixture def app(app_server): return app_server.app @pytest.fixture def swh_rpc_client_class(): def storage_factory(**kwargs): storage_config = { "cls": "remote", **kwargs, } return get_storage(**storage_config) return storage_factory @pytest.fixture def swh_storage(swh_rpc_client, app_server): # This version of the swh_storage fixture uses the swh_rpc_client fixture # to instantiate a RemoteStorage (see swh_rpc_client_class above) that # proxies, via the swh.core RPC mechanism, the local (in memory) storage # configured in the app_server fixture above. # # Also note that, for the sake of # making it easier to write tests, the in-memory journal writer of the # in-memory backend storage is attached to the RemoteStorage as its # journal_writer attribute. storage = swh_rpc_client journal_writer = getattr(storage, "journal_writer", None) storage.journal_writer = app_server.storage.journal_writer storage.objstorage = app_server.storage.objstorage yield storage storage.journal_writer = journal_writer class TestStorageApi(_TestStorage): @pytest.mark.skip( 'The "person" table of the pgsql is a legacy thing, and not ' "supported by the cassandra backend." ) def test_person_fullname_unicity(self): pass @pytest.mark.skip("content_update is not yet implemented for Cassandra") def test_content_update(self): pass @pytest.mark.skip("Not supported by Cassandra") def test_origin_count(self): pass + def test_exception(self, app_server, swh_storage, mocker): + """Checks the client re-raises unknown exceptions as a :exc:`RemoteException`""" + assert swh_storage.revision_get(["\x01" * 20]) == [None] + mocker.patch.object( + app_server.storage._cql_runner, + "revision_get", + side_effect=ValueError("crash"), + ) + with pytest.raises(RemoteException) as e: + swh_storage.revision_get(["\x01" * 20]) + assert not isinstance(e, TransientRemoteException) + + def test_operationalerror_exception(self, app_server, swh_storage, mocker): + """Checks the client re-raises as a :exc:`TransientRemoteException` + rather than the base :exc:`RemoteException`; so the retrying proxy + retries for longer.""" + assert swh_storage.revision_get(["\x01" * 20]) == [None] + mocker.patch.object( + app_server.storage._cql_runner, + "revision_get", + side_effect=psycopg2.errors.AdminShutdown("cluster is shutting down"), + ) + with pytest.raises(RemoteException) as excinfo: + swh_storage.revision_get(["\x01" * 20]) + assert isinstance(excinfo.value, TransientRemoteException) + + def test_querycancelled_exception(self, app_server, swh_storage, mocker): + """Checks the client re-raises as a :exc:`TransientRemoteException` + rather than the base :exc:`RemoteException`; so the retrying proxy + retries for longer.""" + assert swh_storage.revision_get(["\x01" * 20]) == [None] + mocker.patch.object( + app_server.storage._cql_runner, + "revision_get", + side_effect=psycopg2.errors.QueryCanceled("too big!"), + ) + with pytest.raises(RemoteException) as excinfo: + swh_storage.revision_get(["\x01" * 20]) + assert not isinstance(excinfo.value, TransientRemoteException) + class TestStorageApiGeneratedData(_TestStorageGeneratedData): @pytest.mark.skip("Not supported by Cassandra") def test_origin_count(self): pass @pytest.mark.skip("Not supported by Cassandra") def test_origin_count_with_visit_no_visits(self): pass @pytest.mark.skip("Not supported by Cassandra") def test_origin_count_with_visit_with_visits_and_snapshot(self): pass @pytest.mark.skip("Not supported by Cassandra") def test_origin_count_with_visit_with_visits_no_snapshot(self): pass diff --git a/swh/storage/tests/test_retry.py b/swh/storage/tests/test_retry.py index 75f7586e..dde53861 100644 --- a/swh/storage/tests/test_retry.py +++ b/swh/storage/tests/test_retry.py @@ -1,219 +1,267 @@ # Copyright (C) 2020 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information from unittest.mock import call import attr import psycopg2 import pytest +from swh.core.api import TransientRemoteException from swh.storage.exc import HashCollision, StorageArgumentException from swh.storage.utils import now @pytest.fixture def monkeypatch_sleep(monkeypatch, swh_storage): """In test context, we don't want to wait, make test faster""" from swh.storage.proxies.retry import RetryingProxyStorage for method_name, method in RetryingProxyStorage.__dict__.items(): if "_add" in method_name or "_update" in method_name: monkeypatch.setattr(method.retry, "sleep", lambda x: None) return monkeypatch @pytest.fixture def fake_hash_collision(sample_data): return HashCollision("sha1", "38762cf7f55934b34d179ae6a4c80cadccbb7f0a", []) @pytest.fixture def swh_storage_backend_config(): yield { "cls": "pipeline", "steps": [ {"cls": "retry"}, {"cls": "memory"}, ], } def test_retrying_proxy_storage_content_add(swh_storage, sample_data): """Standard content_add works as before""" sample_content = sample_data.content content = swh_storage.content_get_data(sample_content.sha1) assert content is None s = swh_storage.content_add([sample_content]) assert s == { "content:add": 1, "content:add:bytes": sample_content.length, } content = swh_storage.content_get_data(sample_content.sha1) assert content == sample_content.data def test_retrying_proxy_storage_content_add_with_retry( monkeypatch_sleep, swh_storage, sample_data, mocker, fake_hash_collision, ): """Multiple retries for hash collision and psycopg2 error but finally ok""" mock_memory = mocker.patch("swh.storage.in_memory.InMemoryStorage.content_add") mock_memory.side_effect = [ # first try goes ko fake_hash_collision, # second try goes ko psycopg2.IntegrityError("content already inserted"), # ok then! {"content:add": 1}, ] sample_content = sample_data.content + sleep = mocker.patch("time.sleep") content = swh_storage.content_get_data(sample_content.sha1) assert content is None s = swh_storage.content_add([sample_content]) assert s == {"content:add": 1} mock_memory.assert_has_calls( [ call([sample_content]), call([sample_content]), call([sample_content]), ] ) + assert len(sleep.mock_calls) == 2 + (_name, args1, _kwargs) = sleep.mock_calls[0] + (_name, args2, _kwargs) = sleep.mock_calls[1] + assert 0 < args1[0] < 1 + assert 0 < args2[0] < 2 + + +def test_retrying_proxy_storage_content_add_with_retry_of_transient( + monkeypatch_sleep, + swh_storage, + sample_data, + mocker, +): + """Multiple retries for hash collision and psycopg2 error but finally ok + after many attempts""" + mock_memory = mocker.patch("swh.storage.in_memory.InMemoryStorage.content_add") + mock_memory.side_effect = [ + TransientRemoteException("temporary failure"), + TransientRemoteException("temporary failure"), + # ok then! + {"content:add": 1}, + ] + + sample_content = sample_data.content + + content = swh_storage.content_get_data(sample_content.sha1) + assert content is None + + sleep = mocker.patch("time.sleep") + s = swh_storage.content_add([sample_content]) + assert s == {"content:add": 1} + + mock_memory.assert_has_calls( + [ + call([sample_content]), + call([sample_content]), + call([sample_content]), + ] + ) + + assert len(sleep.mock_calls) == 2 + (_name, args1, _kwargs) = sleep.mock_calls[0] + (_name, args2, _kwargs) = sleep.mock_calls[1] + assert 10 < args1[0] < 11 + assert 10 < args2[0] < 12 + def test_retrying_proxy_swh_storage_content_add_failure( swh_storage, sample_data, mocker ): """Unfiltered errors are raising without retry""" mock_memory = mocker.patch("swh.storage.in_memory.InMemoryStorage.content_add") mock_memory.side_effect = StorageArgumentException("Refuse to add content always!") sample_content = sample_data.content content = swh_storage.content_get_data(sample_content.sha1) assert content is None with pytest.raises(StorageArgumentException, match="Refuse to add"): swh_storage.content_add([sample_content]) assert mock_memory.call_count == 1 def test_retrying_proxy_storage_content_add_metadata(swh_storage, sample_data): """Standard content_add_metadata works as before""" sample_content = sample_data.content content = attr.evolve(sample_content, data=None) pk = content.sha1 content_metadata = swh_storage.content_get([pk]) assert content_metadata == [None] s = swh_storage.content_add_metadata([attr.evolve(content, ctime=now())]) assert s == { "content:add": 1, } content_metadata = swh_storage.content_get([pk]) assert len(content_metadata) == 1 assert content_metadata[0].sha1 == pk def test_retrying_proxy_storage_content_add_metadata_with_retry( monkeypatch_sleep, swh_storage, sample_data, mocker, fake_hash_collision ): """Multiple retries for hash collision and psycopg2 error but finally ok""" mock_memory = mocker.patch( "swh.storage.in_memory.InMemoryStorage.content_add_metadata" ) mock_memory.side_effect = [ # first try goes ko fake_hash_collision, # second try goes ko psycopg2.IntegrityError("content_metadata already inserted"), # ok then! {"content:add": 1}, ] sample_content = sample_data.content content = attr.evolve(sample_content, data=None) s = swh_storage.content_add_metadata([content]) assert s == {"content:add": 1} mock_memory.assert_has_calls( [ call([content]), call([content]), call([content]), ] ) def test_retrying_proxy_swh_storage_content_add_metadata_failure( swh_storage, sample_data, mocker ): """Unfiltered errors are raising without retry""" mock_memory = mocker.patch( "swh.storage.in_memory.InMemoryStorage.content_add_metadata" ) mock_memory.side_effect = StorageArgumentException( "Refuse to add content_metadata!" ) sample_content = sample_data.content content = attr.evolve(sample_content, data=None) with pytest.raises(StorageArgumentException, match="Refuse to add"): swh_storage.content_add_metadata([content]) assert mock_memory.call_count == 1 def test_retrying_proxy_swh_storage_keyboardinterrupt(swh_storage, sample_data, mocker): """Unfiltered errors are raising without retry""" mock_memory = mocker.patch("swh.storage.in_memory.InMemoryStorage.content_add") mock_memory.side_effect = KeyboardInterrupt() sample_content = sample_data.content content = swh_storage.content_get_data(sample_content.sha1) assert content is None with pytest.raises(KeyboardInterrupt): swh_storage.content_add([sample_content]) assert mock_memory.call_count == 1 def test_retrying_proxy_swh_storage_content_add_metadata_retry_failed( swh_storage, sample_data, mocker ): """When retrying fails every time, the last exception gets re-raised instead of a RetryError""" mock_memory = mocker.patch( "swh.storage.in_memory.InMemoryStorage.content_add_metadata" ) mock_memory.side_effect = [ ValueError(f"This will get raised eventually (attempt {i})") for i in range(1, 10) ] sample_content = sample_data.content content = attr.evolve(sample_content, data=None) with pytest.raises(ValueError, match="(attempt 3)"): swh_storage.content_add_metadata([content]) assert mock_memory.call_count == 3