"
objstorage:
cls: pathslicing
args:
root: /tmp/swh-storage/
slicing: 0:2/2:4/4:6
```
This configuration means that the server uses:
- a local storage instance whose database connection points to the local
  `softwareheritage-dev` instance,
- a local objstorage instance whose:
  - `root` path is `/tmp/swh-storage/`,
  - slicing scheme is `0:2/2:4/4:6`. The content's identifier (its sha1)
    determines where the raw content is stored on disk: the first directory
    level is named after the first 2 hex characters of the identifier, the
    second level after the next 2, and the third level after the next 2;
    the file itself is named after the complete hash. For example,
    `00062f8bd330715c4f819373653d97b3cd34394c` is stored at
    `00/06/2f/00062f8bd330715c4f819373653d97b3cd34394c`.
Note that the `root` path should exist on disk before starting the server.
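To make the slicing concrete, here is a minimal sketch (not part of the
package; the `slice_path` helper is hypothetical) of how a `pathslicing`
configuration maps a sha1 to its on-disk path:
```python
import os


def slice_path(root: str, slicing: str, hex_id: str) -> str:
    """Compute the on-disk path of hex_id under a pathslicing layout.

    Each "start:end" pair in the slicing spec selects a slice of the hex
    identifier and becomes one directory level; the full hash is the
    file name.
    """
    levels = []
    for bounds in slicing.split("/"):
        start, end = (int(b) if b else None for b in bounds.split(":"))
        levels.append(hex_id[start:end])
    return os.path.join(root, *levels, hex_id)


print(slice_path("/tmp/swh-storage/", "0:2/2:4/4:6",
                 "00062f8bd330715c4f819373653d97b3cd34394c"))
# -> /tmp/swh-storage/00/06/2f/00062f8bd330715c4f819373653d97b3cd34394c
```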
### Starting the storage server
If the Python package has been properly installed (e.g. in a virtualenv), you
should be able to use the command:
```
(swh) :~/swh-storage$ swh storage rpc-serve storage.yml
```
This runs a local swh-storage API on port 5002.
```
(swh) :~/swh-storage$ curl http://127.0.0.1:5002
Software Heritage storage server
You have reached the
Software Heritage
storage server.
See its
documentation
and API for more information
```
### And then what?
In your upper layer
([loader-git](https://forge.softwareheritage.org/source/swh-loader-git/),
[loader-svn](https://forge.softwareheritage.org/source/swh-loader-svn/),
etc.), you can define a remote storage with the following yaml configuration
snippet:
```
storage:
  cls: remote
  args:
    url: http://localhost:5002/
```
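The same remote storage can also be instantiated programmatically. A minimal
sketch, assuming `swh.storage.get_storage` forwards its keyword arguments to
the selected backend (so the `url` keyword plays the role of the `args: url:`
key above):
```python
from swh.storage import get_storage

# Build a client for the storage server started earlier; `url` is
# assumed to be forwarded to the remote backend's constructor.
storage = get_storage(cls="remote", url="http://localhost:5002/")

# Sanity-check the connection (read-only check).
print(storage.check_config(check_write=False))
```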
Alternatively, you can define a local storage directly with the following
snippet:
```
storage:
  cls: local
  args:
    db: service=swh-dev
    objstorage:
      cls: pathslicing
      args:
        root: /home/storage/swh-storage/
        slicing: 0:2/2:4/4:6
```
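Either snippet can live in a yaml file and be loaded at runtime. A minimal
sketch, assuming a `storage.yml` file with the `storage:` layout above and
that `get_storage` still understands the nested `args` mapping:
```python
import yaml

from swh.storage import get_storage

# Load the yaml configuration and instantiate the storage it describes.
with open("storage.yml") as f:
    config = yaml.safe_load(f)

storage = get_storage(**config["storage"])
```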
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
Provides-Extra: schemata
Provides-Extra: journal
diff --git a/swh.storage.egg-info/SOURCES.txt b/swh.storage.egg-info/SOURCES.txt
index 623fcdc1..6827b03e 100644
--- a/swh.storage.egg-info/SOURCES.txt
+++ b/swh.storage.egg-info/SOURCES.txt
@@ -1,266 +1,293 @@
+.gitignore
+.pre-commit-config.yaml
+AUTHORS
+CODE_OF_CONDUCT.md
+CONTRIBUTORS
+LICENSE
MANIFEST.in
Makefile
Makefile.local
README.md
+mypy.ini
pyproject.toml
pytest.ini
+requirements-swh-journal.txt
+requirements-swh.txt
+requirements-test.txt
+requirements.txt
setup.cfg
setup.py
tox.ini
version.txt
./requirements-swh-journal.txt
./requirements-swh.txt
./requirements-test.txt
./requirements.txt
bin/swh-storage-add-dir
+docs/.gitignore
+docs/Makefile
+docs/Makefile.local
+docs/archive-copies.rst
+docs/conf.py
+docs/extrinsic-metadata-specification.rst
+docs/index.rst
+docs/sql-storage.rst
+docs/_static/.placeholder
+docs/_templates/.placeholder
+docs/images/.gitignore
+docs/images/Makefile
+docs/images/swh-archive-copies.dia
sql/.gitignore
sql/Makefile
sql/TODO
sql/clusters.dot
sql/bin/db-upgrade
sql/bin/dot_add_content
+sql/doc/json
sql/doc/json/.gitignore
sql/doc/json/Makefile
sql/doc/json/entity.lister_metadata.schema.json
sql/doc/json/entity.metadata.schema.json
sql/doc/json/entity_history.lister_metadata.schema.json
sql/doc/json/entity_history.metadata.schema.json
sql/doc/json/fetch_history.result.schema.json
sql/doc/json/list_history.result.schema.json
sql/doc/json/listable_entity.list_params.schema.json
sql/doc/json/origin_visit.metadata.json
sql/doc/json/tool.tool_configuration.schema.json
sql/json/.gitignore
sql/json/Makefile
sql/json/entity.lister_metadata.schema.json
sql/json/entity.metadata.schema.json
sql/json/entity_history.lister_metadata.schema.json
sql/json/entity_history.metadata.schema.json
sql/json/fetch_history.result.schema.json
sql/json/list_history.result.schema.json
sql/json/listable_entity.list_params.schema.json
sql/json/origin_visit.metadata.json
sql/json/tool.tool_configuration.schema.json
sql/upgrades/015.sql
sql/upgrades/016.sql
sql/upgrades/017.sql
sql/upgrades/018.sql
sql/upgrades/019.sql
sql/upgrades/020.sql
sql/upgrades/021.sql
sql/upgrades/022.sql
sql/upgrades/023.sql
sql/upgrades/024.sql
sql/upgrades/025.sql
sql/upgrades/026.sql
sql/upgrades/027.sql
sql/upgrades/028.sql
sql/upgrades/029.sql
sql/upgrades/030.sql
sql/upgrades/032.sql
sql/upgrades/033.sql
sql/upgrades/034.sql
sql/upgrades/035.sql
sql/upgrades/036.sql
sql/upgrades/037.sql
sql/upgrades/038.sql
sql/upgrades/039.sql
sql/upgrades/040.sql
sql/upgrades/041.sql
sql/upgrades/042.sql
sql/upgrades/043.sql
sql/upgrades/044.sql
sql/upgrades/045.sql
sql/upgrades/046.sql
sql/upgrades/047.sql
sql/upgrades/048.sql
sql/upgrades/049.sql
sql/upgrades/050.sql
sql/upgrades/051.sql
sql/upgrades/052.sql
sql/upgrades/053.sql
sql/upgrades/054.sql
sql/upgrades/055.sql
sql/upgrades/056.sql
sql/upgrades/057.sql
sql/upgrades/058.sql
sql/upgrades/059.sql
sql/upgrades/060.sql
sql/upgrades/061.sql
sql/upgrades/062.sql
sql/upgrades/063.sql
sql/upgrades/064.sql
sql/upgrades/065.sql
sql/upgrades/066.sql
sql/upgrades/067.sql
sql/upgrades/068.sql
sql/upgrades/069.sql
sql/upgrades/070.sql
sql/upgrades/071.sql
sql/upgrades/072.sql
sql/upgrades/073.sql
sql/upgrades/074.sql
sql/upgrades/075.sql
sql/upgrades/076.sql
sql/upgrades/077.sql
sql/upgrades/078.sql
sql/upgrades/079.sql
sql/upgrades/080.sql
sql/upgrades/081.sql
sql/upgrades/082.sql
sql/upgrades/083.sql
sql/upgrades/084.sql
sql/upgrades/085.sql
sql/upgrades/086.sql
sql/upgrades/087.sql
sql/upgrades/088.sql
sql/upgrades/089.sql
sql/upgrades/090.sql
sql/upgrades/091.sql
sql/upgrades/092.sql
sql/upgrades/093.sql
sql/upgrades/094.sql
sql/upgrades/095.sql
sql/upgrades/096.sql
sql/upgrades/097.sql
sql/upgrades/098.sql
sql/upgrades/099.sql
sql/upgrades/100.sql
sql/upgrades/101.sql
sql/upgrades/102.sql
sql/upgrades/103.sql
sql/upgrades/104.sql
sql/upgrades/105.sql
sql/upgrades/106.sql
sql/upgrades/107.sql
sql/upgrades/108.sql
sql/upgrades/109.sql
sql/upgrades/110.sql
sql/upgrades/111.sql
sql/upgrades/112.sql
sql/upgrades/113.sql
sql/upgrades/114.sql
sql/upgrades/115.sql
sql/upgrades/116.sql
sql/upgrades/117.sql
sql/upgrades/118.sql
sql/upgrades/119.sql
sql/upgrades/120.sql
sql/upgrades/121.sql
sql/upgrades/122.sql
sql/upgrades/123.sql
sql/upgrades/124.sql
sql/upgrades/125.sql
sql/upgrades/126.sql
sql/upgrades/127.sql
sql/upgrades/128.sql
sql/upgrades/129.sql
sql/upgrades/130.sql
sql/upgrades/131.sql
sql/upgrades/132.sql
sql/upgrades/133.sql
sql/upgrades/134.sql
sql/upgrades/135.sql
sql/upgrades/136.sql
sql/upgrades/137.sql
sql/upgrades/138.sql
sql/upgrades/139.sql
sql/upgrades/140.sql
sql/upgrades/141.sql
sql/upgrades/142.sql
sql/upgrades/143.sql
sql/upgrades/144.sql
sql/upgrades/145.sql
sql/upgrades/146.sql
sql/upgrades/147.sql
sql/upgrades/148.sql
sql/upgrades/149.sql
sql/upgrades/150.sql
sql/upgrades/151.sql
sql/upgrades/152.sql
sql/upgrades/153.sql
sql/upgrades/154.sql
sql/upgrades/155.sql
sql/upgrades/156.sql
sql/upgrades/157.sql
sql/upgrades/158.sql
swh/__init__.py
swh.storage.egg-info/PKG-INFO
swh.storage.egg-info/SOURCES.txt
swh.storage.egg-info/dependency_links.txt
swh.storage.egg-info/entry_points.txt
swh.storage.egg-info/requires.txt
swh.storage.egg-info/top_level.txt
swh/storage/__init__.py
swh/storage/backfill.py
swh/storage/buffer.py
swh/storage/cli.py
swh/storage/common.py
swh/storage/converters.py
swh/storage/db.py
swh/storage/exc.py
swh/storage/extrinsic_metadata.py
swh/storage/filter.py
swh/storage/fixer.py
swh/storage/in_memory.py
swh/storage/interface.py
swh/storage/metrics.py
swh/storage/objstorage.py
swh/storage/py.typed
+swh/storage/pytest_plugin.py
swh/storage/replay.py
swh/storage/retry.py
swh/storage/storage.py
swh/storage/utils.py
swh/storage/validate.py
swh/storage/writer.py
swh/storage/algos/__init__.py
swh/storage/algos/diff.py
swh/storage/algos/dir_iterators.py
swh/storage/algos/origin.py
swh/storage/algos/revisions_walker.py
swh/storage/algos/snapshot.py
swh/storage/api/__init__.py
swh/storage/api/client.py
swh/storage/api/serializers.py
swh/storage/api/server.py
swh/storage/cassandra/__init__.py
swh/storage/cassandra/common.py
swh/storage/cassandra/converters.py
swh/storage/cassandra/cql.py
swh/storage/cassandra/schema.py
swh/storage/cassandra/storage.py
swh/storage/sql/10-swh-init.sql
swh/storage/sql/20-swh-enums.sql
swh/storage/sql/30-swh-schema.sql
swh/storage/sql/40-swh-func.sql
swh/storage/sql/60-swh-indexes.sql
swh/storage/tests/__init__.py
swh/storage/tests/conftest.py
swh/storage/tests/generate_data_test.py
swh/storage/tests/storage_data.py
swh/storage/tests/test_api_client.py
swh/storage/tests/test_backfill.py
swh/storage/tests/test_buffer.py
swh/storage/tests/test_cassandra.py
swh/storage/tests/test_cassandra_converters.py
swh/storage/tests/test_cli.py
swh/storage/tests/test_converters.py
swh/storage/tests/test_db.py
swh/storage/tests/test_exception.py
swh/storage/tests/test_filter.py
swh/storage/tests/test_in_memory.py
swh/storage/tests/test_init.py
swh/storage/tests/test_kafka_writer.py
swh/storage/tests/test_metrics.py
swh/storage/tests/test_replay.py
swh/storage/tests/test_retry.py
swh/storage/tests/test_revision_bw_compat.py
swh/storage/tests/test_server.py
swh/storage/tests/test_storage.py
swh/storage/tests/test_utils.py
swh/storage/tests/algos/__init__.py
swh/storage/tests/algos/test_diff.py
swh/storage/tests/algos/test_dir_iterator.py
swh/storage/tests/algos/test_origin.py
swh/storage/tests/algos/test_revisions_walker.py
-swh/storage/tests/algos/test_snapshot.py
\ No newline at end of file
+swh/storage/tests/algos/test_snapshot.py
+swh/storage/tests/data/storage.yml
\ No newline at end of file
diff --git a/swh.storage.egg-info/entry_points.txt b/swh.storage.egg-info/entry_points.txt
index a3379a55..c1dba848 100644
--- a/swh.storage.egg-info/entry_points.txt
+++ b/swh.storage.egg-info/entry_points.txt
@@ -1,6 +1,8 @@
[console_scripts]
swh-storage=swh.storage.cli:main
[swh.cli.subcommands]
storage=swh.storage.cli:storage
+ [pytest11]
+ pytest_swh_storage=swh.storage.pytest_plugin
\ No newline at end of file
diff --git a/swh/storage/tests/conftest.py b/swh/storage/pytest_plugin.py
similarity index 74%
copy from swh/storage/tests/conftest.py
copy to swh/storage/pytest_plugin.py
index 52d9b4f9..1b010923 100644
--- a/swh/storage/tests/conftest.py
+++ b/swh/storage/pytest_plugin.py
@@ -1,272 +1,208 @@
-# Copyright (C) 2019 The Software Heritage developers
+# Copyright (C) 2019-2020 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import glob
-import pytest
-import multiprocessing.util
-from typing import Union
from os import path, environ
-from hypothesis import settings
-from typing import Dict
+from typing import Dict, Union
-try:
- import pytest_cov.embed
-except ImportError:
- pytest_cov = None
+import pytest
+
+import swh.storage
from pytest_postgresql import factories
from pytest_postgresql.janitor import DatabaseJanitor, psycopg2, Version
-import swh.storage
-
from swh.core.utils import numfile_sortkey as sortkey
-from swh.model.tests.generate_testdata import gen_contents, gen_origins
-from swh.model.model import (
- Content,
- Directory,
- Origin,
- OriginVisit,
- Release,
- Revision,
- SkippedContent,
- Snapshot,
-)
-
-
-OBJECT_FACTORY = {
- "content": Content.from_dict,
- "directory": Directory.from_dict,
- "origin": Origin.from_dict,
- "origin_visit": OriginVisit.from_dict,
- "release": Release.from_dict,
- "revision": Revision.from_dict,
- "skipped_content": SkippedContent.from_dict,
- "snapshot": Snapshot.from_dict,
-}
+from swh.storage import get_storage
+from swh.storage.tests.storage_data import data
+
SQL_DIR = path.join(path.dirname(swh.storage.__file__), "sql")
environ["LC_ALL"] = "C.UTF-8"
DUMP_FILES = path.join(SQL_DIR, "*.sql")
-# define tests profile. Full documentation is at:
-# https://hypothesis.readthedocs.io/en/latest/settings.html#settings-profiles
-settings.register_profile("fast", max_examples=5, deadline=5000)
-settings.register_profile("slow", max_examples=20, deadline=5000)
-
-
-if pytest_cov is not None:
- # pytest_cov + multiprocessing can cause a segmentation fault when starting
- # the child process ; so we're
- # removing pytest-coverage's hook that runs when a child process starts.
- # This means code run in child processes won't be counted in the coverage
- # report, but this is not an issue because the only code that runs only in
- # child processes is the RPC server.
- for (key, value) in multiprocessing.util._afterfork_registry.items():
- if value is pytest_cov.embed.multiprocessing_start:
- del multiprocessing.util._afterfork_registry[key]
- break
- else:
- assert False, "missing pytest_cov.embed.multiprocessing_start?"
-
@pytest.fixture
def swh_storage_backend_config(postgresql_proc, swh_storage_postgresql):
yield {
"cls": "local",
"db": "postgresql://{user}@{host}:{port}/{dbname}".format(
host=postgresql_proc.host,
port=postgresql_proc.port,
user="postgres",
dbname="tests",
),
"objstorage": {"cls": "memory", "args": {}},
"journal_writer": {"cls": "memory",},
}
@pytest.fixture
def swh_storage(swh_storage_backend_config):
- return swh.storage.get_storage(cls="validate", storage=swh_storage_backend_config)
-
-
-@pytest.fixture
-def swh_contents(swh_storage):
- contents = gen_contents(n=20)
- swh_storage.content_add([c for c in contents if c["status"] != "absent"])
- swh_storage.skipped_content_add([c for c in contents if c["status"] == "absent"])
- return contents
-
-
-@pytest.fixture
-def swh_origins(swh_storage):
- origins = gen_origins(n=100)
- swh_storage.origin_add(origins)
- return origins
+ return get_storage(cls="validate", storage=swh_storage_backend_config)
# the postgres_fact factory fixture below is mostly a copy of the code
# from pytest-postgresql. We need a custom version here to be able to
# specify our version of the DBJanitor we use.
def postgresql_fact(process_fixture_name, db_name=None, dump_files=DUMP_FILES):
@pytest.fixture
def postgresql_factory(request):
"""
Fixture factory for PostgreSQL.
:param FixtureRequest request: fixture request object
:rtype: psycopg2.connection
:returns: postgresql client
"""
config = factories.get_config(request)
if not psycopg2:
raise ImportError("No module named psycopg2. Please install it.")
proc_fixture = request.getfixturevalue(process_fixture_name)
# _, config = try_import('psycopg2', request)
pg_host = proc_fixture.host
pg_port = proc_fixture.port
pg_user = proc_fixture.user
pg_options = proc_fixture.options
pg_db = db_name or config["dbname"]
with SwhDatabaseJanitor(
pg_user,
pg_host,
pg_port,
pg_db,
proc_fixture.version,
dump_files=dump_files,
):
connection = psycopg2.connect(
dbname=pg_db,
user=pg_user,
host=pg_host,
port=pg_port,
options=pg_options,
)
yield connection
connection.close()
return postgresql_factory
swh_storage_postgresql = postgresql_fact("postgresql_proc")
# This version of the DatabaseJanitor implements a different setup/teardown
# behavior than the stock one: instead of dropping, creating and
# initializing the database for each test, it creates and initializes the db
# only once, then truncates the tables. This is needed to have acceptable
# test performance.
class SwhDatabaseJanitor(DatabaseJanitor):
def __init__(
self,
user: str,
host: str,
port: str,
db_name: str,
version: Union[str, float, Version],
dump_files: str = DUMP_FILES,
) -> None:
super().__init__(user, host, port, db_name, version)
self.dump_files = sorted(glob.glob(dump_files), key=sortkey)
def db_setup(self):
with psycopg2.connect(
dbname=self.db_name, user=self.user, host=self.host, port=self.port,
) as cnx:
with cnx.cursor() as cur:
for fname in self.dump_files:
with open(fname) as fobj:
sql = fobj.read().replace("concurrently", "").strip()
if sql:
cur.execute(sql)
cnx.commit()
def db_reset(self):
with psycopg2.connect(
dbname=self.db_name, user=self.user, host=self.host, port=self.port,
) as cnx:
with cnx.cursor() as cur:
cur.execute(
"SELECT table_name FROM information_schema.tables "
"WHERE table_schema = %s",
("public",),
)
tables = set(table for (table,) in cur.fetchall())
for table in tables:
cur.execute("truncate table %s cascade" % table)
cur.execute(
"SELECT sequence_name FROM information_schema.sequences "
"WHERE sequence_schema = %s",
("public",),
)
seqs = set(seq for (seq,) in cur.fetchall())
for seq in seqs:
cur.execute("ALTER SEQUENCE %s RESTART;" % seq)
cnx.commit()
def init(self):
with self.cursor() as cur:
cur.execute(
"SELECT COUNT(1) FROM pg_database WHERE datname=%s;", (self.db_name,)
)
db_exists = cur.fetchone()[0] == 1
if db_exists:
cur.execute(
"UPDATE pg_database SET datallowconn=true " "WHERE datname = %s;",
(self.db_name,),
)
if db_exists:
self.db_reset()
else:
with self.cursor() as cur:
cur.execute('CREATE DATABASE "{}";'.format(self.db_name))
self.db_setup()
def drop(self):
pid_column = "pid"
with self.cursor() as cur:
cur.execute(
"UPDATE pg_database SET datallowconn=false " "WHERE datname = %s;",
(self.db_name,),
)
cur.execute(
"SELECT pg_terminate_backend(pg_stat_activity.{})"
"FROM pg_stat_activity "
"WHERE pg_stat_activity.datname = %s;".format(pid_column),
(self.db_name,),
)
@pytest.fixture
def sample_data() -> Dict:
"""Pre-defined sample storage object data to manipulate
Returns:
Dict of data (keys: content, directory, revision, release, person,
origin)
"""
- from .storage_data import data
-
return {
"content": [data.cont, data.cont2],
"content_metadata": [data.cont3],
"skipped_content": [data.skipped_cont, data.skipped_cont2],
"person": [data.person],
"directory": [data.dir2, data.dir],
"revision": [data.revision, data.revision2, data.revision3],
"release": [data.release, data.release2, data.release3],
"snapshot": [data.snapshot],
"origin": [data.origin, data.origin2],
"fetcher": [data.metadata_fetcher],
"authority": [data.metadata_authority],
"origin_metadata": [data.origin_metadata, data.origin_metadata2],
}
diff --git a/swh/storage/tests/conftest.py b/swh/storage/tests/conftest.py
index 52d9b4f9..7598d9a1 100644
--- a/swh/storage/tests/conftest.py
+++ b/swh/storage/tests/conftest.py
@@ -1,272 +1,75 @@
-# Copyright (C) 2019 The Software Heritage developers
+# Copyright (C) 2019-2020 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
-import glob
import pytest
import multiprocessing.util
-from typing import Union
-from os import path, environ
from hypothesis import settings
-from typing import Dict
try:
import pytest_cov.embed
except ImportError:
pytest_cov = None
-from pytest_postgresql import factories
-from pytest_postgresql.janitor import DatabaseJanitor, psycopg2, Version
-
-import swh.storage
-
-from swh.core.utils import numfile_sortkey as sortkey
from swh.model.tests.generate_testdata import gen_contents, gen_origins
from swh.model.model import (
Content,
Directory,
Origin,
OriginVisit,
Release,
Revision,
SkippedContent,
Snapshot,
)
+from swh.storage.pytest_plugin import * # noqa # for retro compatibility
OBJECT_FACTORY = {
"content": Content.from_dict,
"directory": Directory.from_dict,
"origin": Origin.from_dict,
"origin_visit": OriginVisit.from_dict,
"release": Release.from_dict,
"revision": Revision.from_dict,
"skipped_content": SkippedContent.from_dict,
"snapshot": Snapshot.from_dict,
}
-SQL_DIR = path.join(path.dirname(swh.storage.__file__), "sql")
-
-environ["LC_ALL"] = "C.UTF-8"
-
-DUMP_FILES = path.join(SQL_DIR, "*.sql")
# define tests profile. Full documentation is at:
# https://hypothesis.readthedocs.io/en/latest/settings.html#settings-profiles
settings.register_profile("fast", max_examples=5, deadline=5000)
settings.register_profile("slow", max_examples=20, deadline=5000)
if pytest_cov is not None:
# pytest_cov + multiprocessing can cause a segmentation fault when starting
# the child process ; so we're
# removing pytest-coverage's hook that runs when a child process starts.
# This means code run in child processes won't be counted in the coverage
# report, but this is not an issue because the only code that runs only in
# child processes is the RPC server.
for (key, value) in multiprocessing.util._afterfork_registry.items():
if value is pytest_cov.embed.multiprocessing_start:
del multiprocessing.util._afterfork_registry[key]
break
else:
assert False, "missing pytest_cov.embed.multiprocessing_start?"
-@pytest.fixture
-def swh_storage_backend_config(postgresql_proc, swh_storage_postgresql):
- yield {
- "cls": "local",
- "db": "postgresql://{user}@{host}:{port}/{dbname}".format(
- host=postgresql_proc.host,
- port=postgresql_proc.port,
- user="postgres",
- dbname="tests",
- ),
- "objstorage": {"cls": "memory", "args": {}},
- "journal_writer": {"cls": "memory",},
- }
-
-
-@pytest.fixture
-def swh_storage(swh_storage_backend_config):
- return swh.storage.get_storage(cls="validate", storage=swh_storage_backend_config)
-
-
@pytest.fixture
def swh_contents(swh_storage):
contents = gen_contents(n=20)
swh_storage.content_add([c for c in contents if c["status"] != "absent"])
swh_storage.skipped_content_add([c for c in contents if c["status"] == "absent"])
return contents
@pytest.fixture
def swh_origins(swh_storage):
origins = gen_origins(n=100)
swh_storage.origin_add(origins)
return origins
-
-
-# the postgres_fact factory fixture below is mostly a copy of the code
-# from pytest-postgresql. We need a custom version here to be able to
-# specify our version of the DBJanitor we use.
-def postgresql_fact(process_fixture_name, db_name=None, dump_files=DUMP_FILES):
- @pytest.fixture
- def postgresql_factory(request):
- """
- Fixture factory for PostgreSQL.
-
- :param FixtureRequest request: fixture request object
- :rtype: psycopg2.connection
- :returns: postgresql client
- """
- config = factories.get_config(request)
- if not psycopg2:
- raise ImportError("No module named psycopg2. Please install it.")
- proc_fixture = request.getfixturevalue(process_fixture_name)
-
- # _, config = try_import('psycopg2', request)
- pg_host = proc_fixture.host
- pg_port = proc_fixture.port
- pg_user = proc_fixture.user
- pg_options = proc_fixture.options
- pg_db = db_name or config["dbname"]
- with SwhDatabaseJanitor(
- pg_user,
- pg_host,
- pg_port,
- pg_db,
- proc_fixture.version,
- dump_files=dump_files,
- ):
- connection = psycopg2.connect(
- dbname=pg_db,
- user=pg_user,
- host=pg_host,
- port=pg_port,
- options=pg_options,
- )
- yield connection
- connection.close()
-
- return postgresql_factory
-
-
-swh_storage_postgresql = postgresql_fact("postgresql_proc")
-
-
-# This version of the DatabaseJanitor implement a different setup/teardown
-# behavior than than the stock one: instead of dropping, creating and
-# initializing the database for each test, it create and initialize the db only
-# once, then it truncate the tables. This is needed to have acceptable test
-# performances.
-class SwhDatabaseJanitor(DatabaseJanitor):
- def __init__(
- self,
- user: str,
- host: str,
- port: str,
- db_name: str,
- version: Union[str, float, Version],
- dump_files: str = DUMP_FILES,
- ) -> None:
- super().__init__(user, host, port, db_name, version)
- self.dump_files = sorted(glob.glob(dump_files), key=sortkey)
-
- def db_setup(self):
- with psycopg2.connect(
- dbname=self.db_name, user=self.user, host=self.host, port=self.port,
- ) as cnx:
- with cnx.cursor() as cur:
- for fname in self.dump_files:
- with open(fname) as fobj:
- sql = fobj.read().replace("concurrently", "").strip()
- if sql:
- cur.execute(sql)
- cnx.commit()
-
- def db_reset(self):
- with psycopg2.connect(
- dbname=self.db_name, user=self.user, host=self.host, port=self.port,
- ) as cnx:
- with cnx.cursor() as cur:
- cur.execute(
- "SELECT table_name FROM information_schema.tables "
- "WHERE table_schema = %s",
- ("public",),
- )
- tables = set(table for (table,) in cur.fetchall())
- for table in tables:
- cur.execute("truncate table %s cascade" % table)
-
- cur.execute(
- "SELECT sequence_name FROM information_schema.sequences "
- "WHERE sequence_schema = %s",
- ("public",),
- )
- seqs = set(seq for (seq,) in cur.fetchall())
- for seq in seqs:
- cur.execute("ALTER SEQUENCE %s RESTART;" % seq)
- cnx.commit()
-
- def init(self):
- with self.cursor() as cur:
- cur.execute(
- "SELECT COUNT(1) FROM pg_database WHERE datname=%s;", (self.db_name,)
- )
- db_exists = cur.fetchone()[0] == 1
- if db_exists:
- cur.execute(
- "UPDATE pg_database SET datallowconn=true " "WHERE datname = %s;",
- (self.db_name,),
- )
-
- if db_exists:
- self.db_reset()
- else:
- with self.cursor() as cur:
- cur.execute('CREATE DATABASE "{}";'.format(self.db_name))
- self.db_setup()
-
- def drop(self):
- pid_column = "pid"
- with self.cursor() as cur:
- cur.execute(
- "UPDATE pg_database SET datallowconn=false " "WHERE datname = %s;",
- (self.db_name,),
- )
- cur.execute(
- "SELECT pg_terminate_backend(pg_stat_activity.{})"
- "FROM pg_stat_activity "
- "WHERE pg_stat_activity.datname = %s;".format(pid_column),
- (self.db_name,),
- )
-
-
-@pytest.fixture
-def sample_data() -> Dict:
- """Pre-defined sample storage object data to manipulate
-
- Returns:
- Dict of data (keys: content, directory, revision, release, person,
- origin)
-
- """
- from .storage_data import data
-
- return {
- "content": [data.cont, data.cont2],
- "content_metadata": [data.cont3],
- "skipped_content": [data.skipped_cont, data.skipped_cont2],
- "person": [data.person],
- "directory": [data.dir2, data.dir],
- "revision": [data.revision, data.revision2, data.revision3],
- "release": [data.release, data.release2, data.release3],
- "snapshot": [data.snapshot],
- "origin": [data.origin, data.origin2],
- "fetcher": [data.metadata_fetcher],
- "authority": [data.metadata_authority],
- "origin_metadata": [data.origin_metadata, data.origin_metadata2],
- }
diff --git a/swh/storage/tests/data/storage.yml b/swh/storage/tests/data/storage.yml
new file mode 100644
index 00000000..97f9f67b
--- /dev/null
+++ b/swh/storage/tests/data/storage.yml
@@ -0,0 +1,13 @@
+storage:
+ cls: local
+ args:
+ db: dbname=%s
+
+ objstorage:
+ cls: pathslicing
+ args:
+ root: TMPDIR
+ slicing: "0:1/1:5"
+
+ journal_writer:
+ cls: inmemory