diff --git a/PKG-INFO b/PKG-INFO
index ae62cd1..f9ce448 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,73 +1,73 @@
Metadata-Version: 2.1
Name: swh.objstorage
-Version: 0.2.1
+Version: 0.2.2
Summary: Software Heritage Object Storage
Home-page: https://forge.softwareheritage.org/diffusion/DOBJS
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-objstorage
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-objstorage/
Description: swh-objstorage
==============
Content-addressable object storage for the Software Heritage project.
Quick start
-----------
The easiest way to try swh-objstorage is to install it in a
virtualenv. Here, we will be using
[[https://virtualenvwrapper.readthedocs.io|virtualenvwrapper]] but any virtual
env tool should work the same.
In the example below we will create a new objstorage using the
[[https://docs.softwareheritage.org/devel/apidoc/swh.objstorage.html#module-swh.objstorage.objstorage_pathslicing|pathslicer]]
backend.
```
~/swh$ mkvirtualenv -p /usr/bin/python3 -i swh.objstorage swh-objstorage
[...]
(swh-objstorage) ~/swh$ cat >local.yml <<EOF
objstorage:
cls: pathslicing
args:
root: /tmp/objstorage
slicing: 0:2/2:4/4:6
EOF
(swh-objstorage) ~/swh$ mkdir /tmp/objstorage
(swh-objstorage) ~/swh$ swh-objstorage -C local.yml serve -p 15003
INFO:swh.core.config:Loading config file local.yml
======== Running on http://0.0.0.0:15003 ========
(Press CTRL+C to quit)
```
Now we have an API listening on http://0.0.0.0:15003 that we can use to store and
retrieve objects. In another terminal:
```
~/swh$ workon swh-objstorage
(swh-objstorage) ~/swh$ cat >remote.yml <<EOF
objstorage:
cls: remote
args:
url: http://127.0.0.1:15003
EOF
(swh-objstorage) ~/swh$ swh-objstorage -C remote.yml import .
INFO:swh.core.config:Loading config file remote.yml
Imported 1369 files for a volume of 722837 bytes in 2 seconds
```
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
diff --git a/swh.objstorage.egg-info/PKG-INFO b/swh.objstorage.egg-info/PKG-INFO
index ae62cd1..f9ce448 100644
--- a/swh.objstorage.egg-info/PKG-INFO
+++ b/swh.objstorage.egg-info/PKG-INFO
@@ -1,73 +1,73 @@
Metadata-Version: 2.1
Name: swh.objstorage
-Version: 0.2.1
+Version: 0.2.2
Summary: Software Heritage Object Storage
Home-page: https://forge.softwareheritage.org/diffusion/DOBJS
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-objstorage
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-objstorage/
Description: swh-objstorage
==============
Content-addressable object storage for the Software Heritage project.
Quick start
-----------
The easiest way to try swh-objstorage is to install it in a
virtualenv. Here, we will be using
[[https://virtualenvwrapper.readthedocs.io|virtualenvwrapper]] but any virtual
env tool should work the same.
In the example below we will create a new objstorage using the
[[https://docs.softwareheritage.org/devel/apidoc/swh.objstorage.html#module-swh.objstorage.objstorage_pathslicing|pathslicer]]
backend.
```
~/swh$ mkvirtualenv -p /usr/bin/python3 -i swh.objstorage swh-objstorage
[...]
(swh-objstorage) ~/swh$ cat >local.yml <<EOF
objstorage:
cls: pathslicing
args:
root: /tmp/objstorage
slicing: 0:2/2:4/4:6
EOF
(swh-objstorage) ~/swh$ mkdir /tmp/objstorage
(swh-objstorage) ~/swh$ swh-objstorage -C local.yml serve -p 15003
INFO:swh.core.config:Loading config file local.yml
======== Running on http://0.0.0.0:15003 ========
(Press CTRL+C to quit)
```
Now we have an API listening on http://0.0.0.0:15003 that we can use to store and
retrieve objects. In another terminal:
```
~/swh$ workon swh-objstorage
(swh-objstorage) ~/swh$ cat >remote.yml <<EOF
objstorage:
cls: remote
args:
url: http://127.0.0.1:15003
EOF
(swh-objstorage) ~/swh$ swh-objstorage -C remote.yml import .
INFO:swh.core.config:Loading config file remote.yml
Imported 1369 files for a volume of 722837 bytes in 2 seconds
```
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
diff --git a/swh/objstorage/api/server.py b/swh/objstorage/api/server.py
index aecb036..a906c27 100644
--- a/swh/objstorage/api/server.py
+++ b/swh/objstorage/api/server.py
@@ -1,278 +1,271 @@
# Copyright (C) 2015-2020 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import json
import os
import aiohttp.web
from swh.core.api.asynchronous import RPCServerApp, decode_request
from swh.core.api.asynchronous import encode_data_server as encode_data
from swh.core.api.serializers import SWHJSONDecoder, msgpack_loads
from swh.core.config import read as config_read
from swh.core.statsd import statsd
from swh.model import hashutil
from swh.objstorage.exc import Error, ObjNotFoundError
from swh.objstorage.factory import get_objstorage
from swh.objstorage.objstorage import DEFAULT_LIMIT
def timed(f):
async def w(*a, **kw):
with statsd.timed(
"swh_objstorage_request_duration_seconds", tags={"endpoint": f.__name__}
):
return await f(*a, **kw)
return w
@timed
async def index(request):
return aiohttp.web.Response(body="SWH Objstorage API server")
@timed
async def check_config(request):
req = await decode_request(request)
return encode_data(request.app["objstorage"].check_config(**req))
@timed
async def contains(request):
req = await decode_request(request)
return encode_data(request.app["objstorage"].__contains__(**req))
@timed
async def add_bytes(request):
req = await decode_request(request)
statsd.increment(
"swh_objstorage_in_bytes_total",
len(req["content"]),
tags={"endpoint": "add_bytes"},
)
return encode_data(request.app["objstorage"].add(**req))
@timed
async def add_batch(request):
req = await decode_request(request)
return encode_data(request.app["objstorage"].add_batch(**req))
@timed
async def get_bytes(request):
req = await decode_request(request)
ret = request.app["objstorage"].get(**req)
statsd.increment(
"swh_objstorage_out_bytes_total", len(ret), tags={"endpoint": "get_bytes"}
)
return encode_data(ret)
@timed
async def get_batch(request):
req = await decode_request(request)
return encode_data(request.app["objstorage"].get_batch(**req))
@timed
async def check(request):
req = await decode_request(request)
return encode_data(request.app["objstorage"].check(**req))
@timed
async def delete(request):
req = await decode_request(request)
return encode_data(request.app["objstorage"].delete(**req))
# Management methods
@timed
async def get_random_contents(request):
req = await decode_request(request)
return encode_data(request.app["objstorage"].get_random(**req))
# Streaming methods
@timed
async def add_stream(request):
hex_id = request.match_info["hex_id"]
obj_id = hashutil.hash_to_bytes(hex_id)
check_pres = request.query.get("check_presence", "").lower() == "true"
objstorage = request.app["objstorage"]
if check_pres and obj_id in objstorage:
return encode_data(obj_id)
# XXX this really should go in a decode_stream_request coroutine in
# swh.core, but since py35 does not support async generators, it cannot
# easily be made for now
content_type = request.headers.get("Content-Type")
if content_type == "application/x-msgpack":
decode = msgpack_loads
elif content_type == "application/json":
decode = lambda x: json.loads(x, cls=SWHJSONDecoder) # noqa
else:
raise ValueError("Wrong content type `%s` for API request" % content_type)
buffer = b""
with objstorage.chunk_writer(obj_id) as write:
while not request.content.at_eof():
data, eot = await request.content.readchunk()
buffer += data
if eot:
write(decode(buffer))
buffer = b""
return encode_data(obj_id)
@timed
async def get_stream(request):
hex_id = request.match_info["hex_id"]
obj_id = hashutil.hash_to_bytes(hex_id)
response = aiohttp.web.StreamResponse()
await response.prepare(request)
for chunk in request.app["objstorage"].get_stream(obj_id, 2 << 20):
await response.write(chunk)
await response.write_eof()
return response
@timed
async def list_content(request):
last_obj_id = request.query.get("last_obj_id")
if last_obj_id:
last_obj_id = bytes.fromhex(last_obj_id)
limit = int(request.query.get("limit", DEFAULT_LIMIT))
response = aiohttp.web.StreamResponse()
response.enable_chunked_encoding()
await response.prepare(request)
for obj_id in request.app["objstorage"].list_content(last_obj_id, limit=limit):
await response.write(obj_id)
await response.write_eof()
return response
def make_app(config):
"""Initialize the remote api application.
"""
client_max_size = config.get("client_max_size", 1024 * 1024 * 1024)
app = RPCServerApp(client_max_size=client_max_size)
app.client_exception_classes = (ObjNotFoundError, Error)
# retro compatibility configuration settings
app["config"] = config
- _cfg = config["objstorage"]
- app["objstorage"] = get_objstorage(_cfg["cls"], _cfg["args"])
+ app["objstorage"] = get_objstorage(**config["objstorage"])
app.router.add_route("GET", "/", index)
app.router.add_route("POST", "/check_config", check_config)
app.router.add_route("POST", "/content/contains", contains)
app.router.add_route("POST", "/content/add", add_bytes)
app.router.add_route("POST", "/content/add/batch", add_batch)
app.router.add_route("POST", "/content/get", get_bytes)
app.router.add_route("POST", "/content/get/batch", get_batch)
app.router.add_route("POST", "/content/get/random", get_random_contents)
app.router.add_route("POST", "/content/check", check)
app.router.add_route("POST", "/content/delete", delete)
app.router.add_route("GET", "/content", list_content)
app.router.add_route("POST", "/content/add_stream/{hex_id}", add_stream)
app.router.add_route("GET", "/content/get_stream/{hex_id}", get_stream)
return app
def load_and_check_config(config_file):
"""Check the minimal configuration is set to run the api or raise an
error explanation.
Args:
config_file (str): Path to the configuration file to load
Raises:
Error if the setup is not as expected
Returns:
configuration as a dict
"""
if not config_file:
raise EnvironmentError("Configuration file must be defined")
if not os.path.exists(config_file):
raise FileNotFoundError("Configuration file %s does not exist" % (config_file,))
cfg = config_read(config_file)
return validate_config(cfg)
def validate_config(cfg):
"""Check the minimal configuration is set to run the api or raise an
explanatory error.
Args:
cfg (dict): Loaded configuration.
Raises:
Error if the setup is not as expected
Returns:
configuration as a dict
"""
if "objstorage" not in cfg:
raise KeyError("Invalid configuration; missing objstorage config entry")
missing_keys = []
vcfg = cfg["objstorage"]
- for key in ("cls", "args"):
- v = vcfg.get(key)
- if v is None:
- missing_keys.append(key)
-
- if missing_keys:
- raise KeyError(
- "Invalid configuration; missing %s config entry"
- % (", ".join(missing_keys),)
- )
-
- cls = vcfg.get("cls")
+ if "cls" not in vcfg:
+ raise KeyError("Invalid configuration; missing cls config entry")
+
+ cls = vcfg["cls"]
if cls == "pathslicing":
- args = vcfg["args"]
+ # Backwards-compatibility: either get the deprecated `args` from the
+ # objstorage config, or use the full config itself to check for keys
+ args = vcfg.get("args", vcfg)
for key in ("root", "slicing"):
v = args.get(key)
if v is None:
missing_keys.append(key)
if missing_keys:
raise KeyError(
- "Invalid configuration; missing args.%s config entry"
+ "Invalid configuration; missing %s config entry"
% (", ".join(missing_keys),)
)
return cfg
def make_app_from_configfile():
"""Load configuration and then build application to run
"""
config_file = os.environ.get("SWH_CONFIG_FILENAME")
config = load_and_check_config(config_file)
return make_app(config=config)
if __name__ == "__main__":
print("Deprecated. Use swh-objstorage")
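The backwards-compatible lookup in `validate_config` above can be exercised without a running server. The following is a minimal standalone sketch that mirrors the `+` lines of this hunk; `check_objstorage_config` is a hypothetical name used only for illustration, and nothing is imported from `swh`.

```python
def check_objstorage_config(cfg):
    """Standalone sketch of the configuration check shown in the diff above."""
    if "objstorage" not in cfg:
        raise KeyError("Invalid configuration; missing objstorage config entry")
    vcfg = cfg["objstorage"]
    if "cls" not in vcfg:
        raise KeyError("Invalid configuration; missing cls config entry")
    if vcfg["cls"] == "pathslicing":
        # Either read the deprecated nested "args" dict, or fall back to the
        # objstorage entry itself when the keys are given at the top level.
        args = vcfg.get("args", vcfg)
        missing_keys = [key for key in ("root", "slicing") if args.get(key) is None]
        if missing_keys:
            raise KeyError(
                "Invalid configuration; missing %s config entry"
                % (", ".join(missing_keys),)
            )
    return cfg


# New flat layout: backend arguments sit next to "cls".
flat_cfg = {
    "objstorage": {
        "cls": "pathslicing",
        "root": "/tmp/objstorage",
        "slicing": "0:2/2:4/4:6",
    }
}

# Deprecated layout: backend arguments nested under "args".
legacy_cfg = {
    "objstorage": {
        "cls": "pathslicing",
        "args": {"root": "/tmp/objstorage", "slicing": "0:2/2:4/4:6"},
    }
}

check_objstorage_config(flat_cfg)
check_objstorage_config(legacy_cfg)

# A pathslicing entry without "slicing" is rejected in both layouts.
try:
    check_objstorage_config(
        {"objstorage": {"cls": "pathslicing", "root": "/tmp/objstorage"}}
    )
except KeyError as exc:
    print(exc)
```

Both layouts pass the same check, which is why the error message no longer carries the `args.` prefix: the missing key may live at either level.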
diff --git a/swh/objstorage/cli.py b/swh/objstorage/cli.py
index 3180b88..5eff218 100644
--- a/swh/objstorage/cli.py
+++ b/swh/objstorage/cli.py
@@ -1,130 +1,131 @@
# Copyright (C) 2015-2020 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import logging
# WARNING: do not import unnecessary things here to keep cli startup time under
# control
import os
import time
import click
-from swh.core.cli import CONTEXT_SETTINGS, swh as swh_cli_group
+from swh.core.cli import CONTEXT_SETTINGS
+from swh.core.cli import swh as swh_cli_group
@swh_cli_group.group(name="objstorage", context_settings=CONTEXT_SETTINGS)
@click.option(
"--config-file",
"-C",
default=None,
type=click.Path(exists=True, dir_okay=False,),
help="Configuration file.",
)
@click.pass_context
def objstorage_cli_group(ctx, config_file):
"""Software Heritage Objstorage tools.
"""
from swh.core import config
if not config_file:
config_file = os.environ.get("SWH_CONFIG_FILENAME")
if config_file:
if not os.path.exists(config_file):
raise ValueError("%s does not exist" % config_file)
conf = config.read(config_file)
else:
conf = {}
ctx.ensure_object(dict)
ctx.obj["config"] = conf
# for BW compat
cli = objstorage_cli_group
@objstorage_cli_group.command("rpc-serve")
@click.option(
"--host",
default="0.0.0.0",
metavar="IP",
show_default=True,
help="Host ip address to bind the server on",
)
@click.option(
"--port",
"-p",
default=5003,
type=click.INT,
metavar="PORT",
show_default=True,
help="Binding port of the server",
)
@click.pass_context
def serve(ctx, host, port):
"""Run a standalone objstorage server.
This is not meant to be run on production systems.
"""
import aiohttp.web
from swh.objstorage.api.server import make_app, validate_config
app = make_app(validate_config(ctx.obj["config"]))
if ctx.obj["log_level"] == "DEBUG":
app.update(debug=True)
aiohttp.web.run_app(app, host=host, port=int(port))
@objstorage_cli_group.command("import")
@click.argument("directory", required=True, nargs=-1)
@click.pass_context
def import_directories(ctx, directory):
"""Import a local directory in an existing objstorage.
"""
from swh.objstorage.factory import get_objstorage
objstorage = get_objstorage(**ctx.obj["config"]["objstorage"])
nobj = 0
volume = 0
t0 = time.time()
for dirname in directory:
for root, _dirs, files in os.walk(dirname):
for name in files:
path = os.path.join(root, name)
with open(path, "rb") as f:
objstorage.add(f.read())
volume += os.stat(path).st_size
nobj += 1
click.echo(
"Imported %d files for a volume of %s bytes in %d seconds"
% (nobj, volume, time.time() - t0)
)
@objstorage_cli_group.command("fsck")
@click.pass_context
def fsck(ctx):
"""Check the objstorage is not corrupted.
"""
from swh.objstorage.factory import get_objstorage
objstorage = get_objstorage(**ctx.obj["config"]["objstorage"])
for obj_id in objstorage:
try:
objstorage.check(obj_id)
except objstorage.Error as err:
logging.error(err)
def main():
return cli(auto_envvar_prefix="SWH_OBJSTORAGE")
if __name__ == "__main__":
main()
diff --git a/swh/objstorage/factory.py b/swh/objstorage/factory.py
index a095880..0ffe652 100644
--- a/swh/objstorage/factory.py
+++ b/swh/objstorage/factory.py
@@ -1,107 +1,118 @@
# Copyright (C) 2016-2020 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
from typing import Callable, Dict, Union
+import warnings
from swh.objstorage.api.client import RemoteObjStorage
from swh.objstorage.backends.generator import RandomGeneratorObjStorage
from swh.objstorage.backends.in_memory import InMemoryObjStorage
from swh.objstorage.backends.pathslicing import PathSlicingObjStorage
from swh.objstorage.backends.seaweed import WeedObjStorage
from swh.objstorage.multiplexer import MultiplexerObjStorage, StripingObjStorage
from swh.objstorage.multiplexer.filter import add_filters
from swh.objstorage.objstorage import ID_HASH_LENGTH, ObjStorage # noqa
__all__ = ["get_objstorage", "ObjStorage"]
_STORAGE_CLASSES: Dict[str, Union[type, Callable[..., type]]] = {
"pathslicing": PathSlicingObjStorage,
"remote": RemoteObjStorage,
"memory": InMemoryObjStorage,
"weed": WeedObjStorage,
"random": RandomGeneratorObjStorage,
}
_STORAGE_CLASSES_MISSING = {}
try:
from swh.objstorage.backends.azure import (
AzureCloudObjStorage,
PrefixedAzureCloudObjStorage,
)
_STORAGE_CLASSES["azure"] = AzureCloudObjStorage
_STORAGE_CLASSES["azure-prefixed"] = PrefixedAzureCloudObjStorage
except ImportError as e:
_STORAGE_CLASSES_MISSING["azure"] = e.args[0]
_STORAGE_CLASSES_MISSING["azure-prefixed"] = e.args[0]
try:
from swh.objstorage.backends.rados import RADOSObjStorage
_STORAGE_CLASSES["rados"] = RADOSObjStorage
except ImportError as e:
_STORAGE_CLASSES_MISSING["rados"] = e.args[0]
try:
from swh.objstorage.backends.libcloud import (
AwsCloudObjStorage,
OpenStackCloudObjStorage,
)
_STORAGE_CLASSES["s3"] = AwsCloudObjStorage
_STORAGE_CLASSES["swift"] = OpenStackCloudObjStorage
except ImportError as e:
_STORAGE_CLASSES_MISSING["s3"] = e.args[0]
_STORAGE_CLASSES_MISSING["swift"] = e.args[0]
-def get_objstorage(cls, args):
+def get_objstorage(cls: str, args=None, **kwargs):
""" Create an ObjStorage using the given implementation class.
Args:
- cls (str): objstorage class unique key contained in the
+ cls: objstorage class unique key contained in the
_STORAGE_CLASSES dict.
- args (dict): arguments for the required class of objstorage
- that must match exactly the one in the `__init__` method of the
- class.
+ kwargs: arguments for the required class of objstorage
+ that must match exactly the one in the `__init__` method of the
+ class.
Returns:
subclass of ObjStorage that matches the given `cls` argument.
Raises:
ValueError: if the given storage class is not a valid objstorage
key.
"""
if cls in _STORAGE_CLASSES:
- return _STORAGE_CLASSES[cls](**args)
+ if args is not None:
+ warnings.warn(
+ 'Explicit "args" key is deprecated for objstorage initialization, '
+ "use class arguments keys directly instead.",
+ DeprecationWarning,
+ )
+ # TODO: when removing this, drop the "args" backwards compatibility
+ # from swh.objstorage.api.server configuration checker
+ kwargs = args
+
+ return _STORAGE_CLASSES[cls](**kwargs)
else:
raise ValueError(
"Storage class {} is not available: {}".format(
cls, _STORAGE_CLASSES_MISSING.get(cls, "unknown name")
)
)
def _construct_filtered_objstorage(storage_conf, filters_conf):
return add_filters(get_objstorage(**storage_conf), filters_conf)
_STORAGE_CLASSES["filtered"] = _construct_filtered_objstorage
def _construct_multiplexer_objstorage(objstorages):
storages = [get_objstorage(**conf) for conf in objstorages]
return MultiplexerObjStorage(storages)
_STORAGE_CLASSES["multiplexer"] = _construct_multiplexer_objstorage
def _construct_striping_objstorage(objstorages):
storages = [get_objstorage(**conf) for conf in objstorages]
return StripingObjStorage(storages)
_STORAGE_CLASSES["striping"] = _construct_striping_objstorage
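The deprecation path added to `get_objstorage` can be illustrated in isolation. This is a sketch under stated assumptions: `FakePathSlicing`, `_REGISTRY` and `get_backend` are hypothetical stand-ins for the real backend class, `_STORAGE_CLASSES` and `get_objstorage`, so the snippet runs without `swh.objstorage` installed.

```python
import warnings


class FakePathSlicing:
    """Hypothetical stand-in for a backend class; not part of swh.objstorage."""

    def __init__(self, root, slicing):
        self.root = root
        self.slicing = slicing


# Hypothetical registry playing the role of _STORAGE_CLASSES.
_REGISTRY = {"pathslicing": FakePathSlicing}


def get_backend(cls, args=None, **kwargs):
    """Mirror of the dispatch logic in the diff above, on the fake registry."""
    if cls not in _REGISTRY:
        raise ValueError("Storage class {} is not available".format(cls))
    if args is not None:
        warnings.warn(
            'Explicit "args" key is deprecated for objstorage initialization, '
            "use class arguments keys directly instead.",
            DeprecationWarning,
        )
        # As in the diff: the legacy dict replaces any keyword arguments.
        kwargs = args
    return _REGISTRY[cls](**kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # New style: backend arguments passed directly as keywords.
    new_style = get_backend(
        cls="pathslicing", root="/tmp/objstorage", slicing="0:2/2:4/4:6"
    )
    # Old style: backend arguments wrapped in an "args" dict (emits a warning).
    old_style = get_backend(
        "pathslicing", {"root": "/tmp/objstorage", "slicing": "0:2/2:4/4:6"}
    )
```

Note that, as written in the diff, keyword arguments passed alongside a non-`None` `args` are discarded rather than merged, so callers should use one form or the other.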
diff --git a/swh/objstorage/tests/test_multiplexer_filter.py b/swh/objstorage/tests/test_multiplexer_filter.py
index bf62208..03142a2 100644
--- a/swh/objstorage/tests/test_multiplexer_filter.py
+++ b/swh/objstorage/tests/test_multiplexer_filter.py
@@ -1,331 +1,332 @@
# Copyright (C) 2015-2020 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import random
import shutil
from string import ascii_lowercase
import tempfile
import unittest
from swh.model import hashutil
from swh.objstorage.exc import Error, ObjNotFoundError
from swh.objstorage.factory import get_objstorage
from swh.objstorage.multiplexer.filter import id_prefix, id_regex, read_only
from swh.objstorage.objstorage import compute_hash
def get_random_content():
return bytes("".join(random.sample(ascii_lowercase, 10)), "utf8")
class MixinTestReadFilter(unittest.TestCase):
# Read only filter should not allow writing
def setUp(self):
super().setUp()
self.tmpdir = tempfile.mkdtemp()
pstorage = {
"cls": "pathslicing",
- "args": {"root": self.tmpdir, "slicing": "0:5"},
+ "root": self.tmpdir,
+ "slicing": "0:5",
}
base_storage = get_objstorage(**pstorage)
base_storage.id = compute_hash
self.storage = get_objstorage(
- "filtered", {"storage_conf": pstorage, "filters_conf": [read_only()]}
+ "filtered", storage_conf=pstorage, filters_conf=[read_only()]
)
self.valid_content = b"pre-existing content"
self.invalid_content = b"invalid_content"
self.true_invalid_content = b"Anything that is not correct"
self.absent_content = b"non-existent content"
# Create a valid content.
self.valid_id = base_storage.add(self.valid_content)
# Create an invalid id and add a content with it.
self.invalid_id = base_storage.id(self.true_invalid_content)
base_storage.add(self.invalid_content, obj_id=self.invalid_id)
# Compute an id for a non-existing content.
self.absent_id = base_storage.id(self.absent_content)
def tearDown(self):
super().tearDown()
shutil.rmtree(self.tmpdir)
def test_can_contains(self):
self.assertTrue(self.valid_id in self.storage)
self.assertTrue(self.invalid_id in self.storage)
self.assertFalse(self.absent_id in self.storage)
def test_can_iter(self):
self.assertIn(self.valid_id, iter(self.storage))
self.assertIn(self.invalid_id, iter(self.storage))
def test_can_len(self):
self.assertEqual(2, len(self.storage))
def test_can_get(self):
self.assertEqual(self.valid_content, self.storage.get(self.valid_id))
self.assertEqual(self.invalid_content, self.storage.get(self.invalid_id))
def test_can_check(self):
with self.assertRaises(ObjNotFoundError):
self.storage.check(self.absent_id)
with self.assertRaises(Error):
self.storage.check(self.invalid_id)
self.storage.check(self.valid_id)
def test_can_get_random(self):
self.assertEqual(1, len(list(self.storage.get_random(1))))
self.assertEqual(
len(list(self.storage)), len(set(self.storage.get_random(1000)))
)
def test_cannot_add(self):
new_id = self.storage.add(b"New content")
result = self.storage.add(self.valid_content, self.valid_id)
self.assertIsNone(new_id, self.storage)
self.assertIsNone(result)
def test_cannot_restore(self):
result = self.storage.restore(self.valid_content, self.valid_id)
self.assertIsNone(result)
class MixinTestIdFilter:
""" Mixin class that tests the filters based on filter.IdFilter
Methods "make_valid", "make_invalid" and "filter_storage" must be
implemented by subclasses.
"""
def setUp(self):
super().setUp()
# Use a hack here : as the mock uses the content as id, it is easy to
# create contents that are filtered or not.
self.prefix = "71"
self.tmpdir = tempfile.mkdtemp()
# Make the storage filtered
self.sconf = {
"cls": "pathslicing",
"args": {"root": self.tmpdir, "slicing": "0:5"},
}
storage = get_objstorage(**self.sconf)
self.base_storage = storage
self.storage = self.filter_storage(self.sconf)
# Set the id calculators
storage.id = compute_hash
# Present content with valid id
self.present_valid_content = self.ensure_valid(b"yroqdtotji")
self.present_valid_id = storage.id(self.present_valid_content)
# Present content with invalid id
self.present_invalid_content = self.ensure_invalid(b"glxddlmmzb")
self.present_invalid_id = storage.id(self.present_invalid_content)
# Missing content with valid id
self.missing_valid_content = self.ensure_valid(b"rmzkdclkez")
self.missing_valid_id = storage.id(self.missing_valid_content)
# Missing content with invalid id
self.missing_invalid_content = self.ensure_invalid(b"hlejfuginh")
self.missing_invalid_id = storage.id(self.missing_invalid_content)
# Present corrupted content with valid id
self.present_corrupted_valid_content = self.ensure_valid(b"cdsjwnpaij")
self.true_present_corrupted_valid_content = self.ensure_valid(b"mgsdpawcrr")
self.present_corrupted_valid_id = storage.id(
self.true_present_corrupted_valid_content
)
# Present corrupted content with invalid id
self.present_corrupted_invalid_content = self.ensure_invalid(b"pspjljnrco")
self.true_present_corrupted_invalid_content = self.ensure_invalid(b"rjocbnnbso")
self.present_corrupted_invalid_id = storage.id(
self.true_present_corrupted_invalid_content
)
# Missing (potentially) corrupted content with valid id
self.missing_corrupted_valid_content = self.ensure_valid(b"zxkokfgtou")
self.true_missing_corrupted_valid_content = self.ensure_valid(b"royoncooqa")
self.missing_corrupted_valid_id = storage.id(
self.true_missing_corrupted_valid_content
)
# Missing (potentially) corrupted content with invalid id
self.missing_corrupted_invalid_content = self.ensure_invalid(b"hxaxnrmnyk")
self.true_missing_corrupted_invalid_content = self.ensure_invalid(b"qhbolyuifr")
self.missing_corrupted_invalid_id = storage.id(
self.true_missing_corrupted_invalid_content
)
# Add the content that are supposed to be present
self.storage.add(self.present_valid_content)
self.storage.add(self.present_invalid_content)
self.storage.add(
self.present_corrupted_valid_content, obj_id=self.present_corrupted_valid_id
)
self.storage.add(
self.present_corrupted_invalid_content,
obj_id=self.present_corrupted_invalid_id,
)
def tearDown(self):
super().tearDown()
shutil.rmtree(self.tmpdir)
def filter_storage(self, sconf):
raise NotImplementedError(
"Id_filter test class must have a filter_storage method"
)
def ensure_valid(self, content=None):
if content is None:
content = get_random_content()
while not self.storage.is_valid(self.base_storage.id(content)):
content = get_random_content()
return content
def ensure_invalid(self, content=None):
if content is None:
content = get_random_content()
while self.storage.is_valid(self.base_storage.id(content)):
content = get_random_content()
return content
def test_contains(self):
# Both contents are present, but the invalid one should be ignored.
self.assertTrue(self.present_valid_id in self.storage)
self.assertFalse(self.present_invalid_id in self.storage)
self.assertFalse(self.missing_valid_id in self.storage)
self.assertFalse(self.missing_invalid_id in self.storage)
self.assertTrue(self.present_corrupted_valid_id in self.storage)
self.assertFalse(self.present_corrupted_invalid_id in self.storage)
self.assertFalse(self.missing_corrupted_valid_id in self.storage)
self.assertFalse(self.missing_corrupted_invalid_id in self.storage)
def test_iter(self):
self.assertIn(self.present_valid_id, iter(self.storage))
self.assertNotIn(self.present_invalid_id, iter(self.storage))
self.assertNotIn(self.missing_valid_id, iter(self.storage))
self.assertNotIn(self.missing_invalid_id, iter(self.storage))
self.assertIn(self.present_corrupted_valid_id, iter(self.storage))
self.assertNotIn(self.present_corrupted_invalid_id, iter(self.storage))
self.assertNotIn(self.missing_corrupted_valid_id, iter(self.storage))
self.assertNotIn(self.missing_corrupted_invalid_id, iter(self.storage))
def test_len(self):
# Four contents are present, but only two should be valid.
self.assertEqual(2, len(self.storage))
def test_get(self):
self.assertEqual(
self.present_valid_content, self.storage.get(self.present_valid_id)
)
with self.assertRaises(ObjNotFoundError):
self.storage.get(self.present_invalid_id)
with self.assertRaises(ObjNotFoundError):
self.storage.get(self.missing_valid_id)
with self.assertRaises(ObjNotFoundError):
self.storage.get(self.missing_invalid_id)
self.assertEqual(
self.present_corrupted_valid_content,
self.storage.get(self.present_corrupted_valid_id),
)
with self.assertRaises(ObjNotFoundError):
self.storage.get(self.present_corrupted_invalid_id)
with self.assertRaises(ObjNotFoundError):
self.storage.get(self.missing_corrupted_valid_id)
with self.assertRaises(ObjNotFoundError):
self.storage.get(self.missing_corrupted_invalid_id)
def test_check(self):
self.storage.check(self.present_valid_id)
with self.assertRaises(ObjNotFoundError):
self.storage.check(self.present_invalid_id)
with self.assertRaises(ObjNotFoundError):
self.storage.check(self.missing_valid_id)
with self.assertRaises(ObjNotFoundError):
self.storage.check(self.missing_invalid_id)
with self.assertRaises(Error):
self.storage.check(self.present_corrupted_valid_id)
with self.assertRaises(ObjNotFoundError):
self.storage.check(self.present_corrupted_invalid_id)
with self.assertRaises(ObjNotFoundError):
self.storage.check(self.missing_corrupted_valid_id)
with self.assertRaises(ObjNotFoundError):
self.storage.check(self.missing_corrupted_invalid_id)
def test_get_random(self):
self.assertEqual(0, len(list(self.storage.get_random(0))))
random_content = list(self.storage.get_random(1000))
self.assertIn(self.present_valid_id, random_content)
self.assertNotIn(self.present_invalid_id, random_content)
self.assertNotIn(self.missing_valid_id, random_content)
self.assertNotIn(self.missing_invalid_id, random_content)
self.assertIn(self.present_corrupted_valid_id, random_content)
self.assertNotIn(self.present_corrupted_invalid_id, random_content)
self.assertNotIn(self.missing_corrupted_valid_id, random_content)
self.assertNotIn(self.missing_corrupted_invalid_id, random_content)
def test_add(self):
# Add valid and invalid contents to the storage and check their
# presence with the unfiltered storage.
valid_content = self.ensure_valid(b"ulepsrjbgt")
valid_id = self.base_storage.id(valid_content)
invalid_content = self.ensure_invalid(b"znvghkjked")
invalid_id = self.base_storage.id(invalid_content)
self.storage.add(valid_content)
self.storage.add(invalid_content)
self.assertTrue(valid_id in self.base_storage)
self.assertFalse(invalid_id in self.base_storage)
def test_restore(self):
# Add corrupted content to the storage and then try to restore it
valid_content = self.ensure_valid(b"ulepsrjbgt")
valid_id = self.base_storage.id(valid_content)
corrupted_content = self.ensure_valid(b"ltjkjsloyb")
corrupted_id = self.base_storage.id(corrupted_content)
self.storage.add(corrupted_content, obj_id=valid_id)
with self.assertRaises(ObjNotFoundError):
self.storage.check(corrupted_id)
with self.assertRaises(Error):
self.storage.check(valid_id)
self.storage.restore(valid_content)
self.storage.check(valid_id)
class TestPrefixFilter(MixinTestIdFilter, unittest.TestCase):
def setUp(self):
self.prefix = b"71"
super().setUp()
def ensure_valid(self, content):
obj_id = compute_hash(content)
hex_obj_id = hashutil.hash_to_hex(obj_id)
self.assertTrue(hex_obj_id.startswith(self.prefix))
return content
def ensure_invalid(self, content):
obj_id = compute_hash(content)
hex_obj_id = hashutil.hash_to_hex(obj_id)
self.assertFalse(hex_obj_id.startswith(self.prefix))
return content
def filter_storage(self, sconf):
return get_objstorage(
"filtered",
{"storage_conf": sconf, "filters_conf": [id_prefix(self.prefix)]},
)
class TestRegexFilter(MixinTestIdFilter, unittest.TestCase):
def setUp(self):
self.regex = r"[a-f][0-9].*"
super().setUp()
def filter_storage(self, sconf):
return get_objstorage(
"filtered", {"storage_conf": sconf, "filters_conf": [id_regex(self.regex)]}
)
diff --git a/swh/objstorage/tests/test_objstorage_api.py b/swh/objstorage/tests/test_objstorage_api.py
index b3abc12..7abde98 100644
--- a/swh/objstorage/tests/test_objstorage_api.py
+++ b/swh/objstorage/tests/test_objstorage_api.py
@@ -1,52 +1,50 @@
# Copyright (C) 2015-2020 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import shutil
import tempfile
import unittest
import pytest
from swh.core.api.tests.server_testing import ServerTestFixtureAsync
from swh.objstorage.api.server import make_app
from swh.objstorage.factory import get_objstorage
from swh.objstorage.tests.objstorage_testing import ObjStorageTestFixture
class TestRemoteObjStorage(
ServerTestFixtureAsync, ObjStorageTestFixture, unittest.TestCase
):
""" Test the remote archive API.
"""
def setUp(self):
self.tmpdir = tempfile.mkdtemp()
self.config = {
"objstorage": {
"cls": "pathslicing",
- "args": {
- "root": self.tmpdir,
- "slicing": "0:1/0:5",
- "allow_delete": True,
- },
+ "root": self.tmpdir,
+ "slicing": "0:1/0:5",
+ "allow_delete": True,
},
"client_max_size": 8 * 1024 * 1024,
}
self.app = make_app(self.config)
super().setUp()
self.storage = get_objstorage("remote", {"url": self.url()})
def tearDown(self):
super().tearDown()
shutil.rmtree(self.tmpdir)
@pytest.mark.skip("makes no sense to test this for the remote api")
def test_delete_not_allowed(self):
pass
@pytest.mark.skip("makes no sense to test this for the remote api")
def test_delete_not_allowed_by_default(self):
pass
diff --git a/swh/objstorage/tests/test_server.py b/swh/objstorage/tests/test_server.py
index 638199c..179864d 100644
--- a/swh/objstorage/tests/test_server.py
+++ b/swh/objstorage/tests/test_server.py
@@ -1,115 +1,115 @@
# Copyright (C) 2019 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import copy
import pytest
import yaml
from swh.objstorage.api.server import load_and_check_config
def prepare_config_file(tmpdir, content, name="config.yml"):
"""Prepare configuration file in `$tmpdir/name` with content `content`.
Args:
tmpdir (LocalPath): root directory
content (str/dict): Content of the file either as string or as a dict.
If a dict, converts the dict into a yaml string.
name (str): configuration filename
Returns:
path (str) of the configuration file prepared.
"""
config_path = tmpdir / name
if isinstance(content, dict): # convert if needed
content = yaml.dump(content)
config_path.write_text(content, encoding="utf-8")
# pytest on python3.5 does not support LocalPath manipulation, so
# convert path to string
return str(config_path)
def test_load_and_check_config_no_configuration():
"""Missing or nonexistent configuration file raises"""
- with pytest.raises(EnvironmentError) as e:
+ with pytest.raises(EnvironmentError, match="Configuration file must be defined"):
load_and_check_config(None)
- assert e.value.args[0] == "Configuration file must be defined"
-
config_path = "/indexer/inexistent/config.yml"
- with pytest.raises(FileNotFoundError) as e:
+ with pytest.raises(FileNotFoundError, match=f"{config_path} does not exist"):
load_and_check_config(config_path)
- assert e.value.args[0] == "Configuration file %s does not exist" % (config_path,)
-
def test_load_and_check_config_invalid_configuration_toplevel(tmpdir):
"""Invalid configuration raises"""
config = {"something": "useless"}
config_path = prepare_config_file(tmpdir, content=config)
- with pytest.raises(KeyError) as e:
+ with pytest.raises(KeyError, match="missing objstorage config entry"):
load_and_check_config(config_path)
- assert e.value.args[0] == "Invalid configuration; missing objstorage config entry"
-
def test_load_and_check_config_invalid_configuration(tmpdir):
"""Invalid configuration raises"""
- for data, missing_keys in [
- ({"objstorage": {"something": "useless"}}, ["cls", "args"]),
- ({"objstorage": {"cls": "something"}}, ["args"]),
- ]:
- config_path = prepare_config_file(tmpdir, content=data)
- with pytest.raises(KeyError) as e:
- load_and_check_config(config_path)
-
- assert e.value.args[0] == "Invalid configuration; missing %s config entry" % (
- ", ".join(missing_keys),
- )
+ config_path = prepare_config_file(
+ tmpdir, content={"objstorage": {"something": "useless"}}
+ )
+ with pytest.raises(KeyError, match="missing cls config entry"):
+ load_and_check_config(config_path)
def test_load_and_check_config_invalid_configuration_level2(tmpdir):
"""Invalid configuration at 2nd level raises"""
config = {
"objstorage": {
"cls": "pathslicing",
"args": {"root": "root", "slicing": "slicing",},
"client_max_size": "10",
}
}
for key in ("root", "slicing"):
c = copy.deepcopy(config)
c["objstorage"]["args"].pop(key)
config_path = prepare_config_file(tmpdir, c)
- with pytest.raises(KeyError) as e:
+ with pytest.raises(KeyError, match=f"missing {key} config entry"):
load_and_check_config(config_path)
- assert (
- e.value.args[0]
- == "Invalid configuration; missing args.%s config entry" % key
- )
-
-def test_load_and_check_config_fine(tmpdir):
+@pytest.mark.parametrize(
+ "config",
+ [
+ pytest.param(
+ {
+ "objstorage": {
+ "cls": "pathslicing",
+ "args": {"root": "root", "slicing": "slicing"},
+ }
+ },
+ id="pathslicing-bw-compat",
+ ),
+ pytest.param(
+ {
+ "objstorage": {
+ "cls": "pathslicing",
+ "root": "root",
+ "slicing": "slicing",
+ }
+ },
+ id="pathslicing",
+ ),
+ pytest.param(
+ {"client_max_size": "10", "objstorage": {"cls": "memory", "args": {}}},
+ id="empty-args-bw-compat",
+ ),
+ pytest.param(
+ {"client_max_size": "10", "objstorage": {"cls": "memory"}}, id="empty-args"
+ ),
+ ],
+)
+def test_load_and_check_config(tmpdir, config):
"""pathslicing configuration fine loads ok"""
- config = {
- "objstorage": {
- "cls": "pathslicing",
- "args": {"root": "root", "slicing": "slicing",},
- }
- }
-
- config_path = prepare_config_file(tmpdir, config)
- cfg = load_and_check_config(config_path)
- assert cfg == config
-
-
-def test_load_and_check_config_fine2(tmpdir):
- config = {"client_max_size": "10", "objstorage": {"cls": "remote", "args": {}}}
config_path = prepare_config_file(tmpdir, config)
cfg = load_and_check_config(config_path)
assert cfg == config
diff --git a/tox.ini b/tox.ini
index b4b54bc..af7d070 100644
--- a/tox.ini
+++ b/tox.ini
@@ -1,35 +1,35 @@
[tox]
envlist=flake8,py3,mypy
[testenv]
extras =
testing
deps =
pytest-cov
dev: pdbpp
commands =
pytest --cov={envsitepackagesdir}/swh/objstorage \
{envsitepackagesdir}/swh/objstorage \
--cov-branch {posargs}
[testenv:black]
skip_install = true
deps =
- black
+ black==19.10b0
commands =
{envpython} -m black --check swh
[testenv:flake8]
skip_install = true
deps =
flake8
commands =
{envpython} -m flake8
[testenv:mypy]
extras =
testing
deps =
mypy
commands =
mypy swh
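Note on the rewritten assertions in test_server.py: pytest.raises(..., match=...) applies re.search to str() of the raised exception, where the old code compared e.value.args[0] for equality. This has two consequences. str() of a KeyError is the repr of its argument, so the message is wrapped in quotes. A path interpolated into the pattern is read as a regular expression. The sketch below reproduces that check with the standard library only. The helper name `matches` and the sample messages are illustrative assumptions, not code from swh.objstorage.

```python
import re


def matches(pattern, exc):
    """Mimic the check behind pytest.raises(match=...): re.search on str(exc)."""
    return re.search(pattern, str(exc)) is not None


# str() of a KeyError is the repr of its argument, so the message is quoted.
key_error = KeyError("Invalid configuration; missing cls config entry")
assert str(key_error).startswith("'")

# A substring pattern still matches, because re.search is unanchored.
assert matches("missing cls config entry", key_error)

# An anchored pattern fails because of the leading quote.
assert not matches("^Invalid configuration", key_error)

# A path interpolated into the pattern is treated as a regex: "." matches any
# character, so a slightly different message can match unless the path is escaped.
config_path = "/indexer/inexistent/config.yml"
other = FileNotFoundError(
    "Configuration file /indexer/inexistent/configXyml does not exist"
)
assert matches(f"{config_path} does not exist", other)
assert not matches(f"{re.escape(config_path)} does not exist", other)
```

For the paths used in these tests the unescaped pattern is harmless. re.escape only matters when a test must distinguish between near-identical paths.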