Swh components: make a default configuration file available for each of them
Closed, Migrated

Description

I understand that removing implicit configuration is the right way to go, but easy access to default YAML configuration files,
kept in sync with each configuration change, would be a great help here.

Indeed, I went back to working on the NPM loader and got stuck on the following swh-objstorage error,
which appeared suddenly:

Traceback (most recent call last):
  File "/home/antoine/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 894, in load
    self.store_data()
  File "/home/antoine/swh/swh-environment/swh-loader-npm/swh/loader/npm/loader.py", line 245, in store_data
    self.flush()
  File "/home/antoine/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 704, in flush
    self.send_batch_contents(contents)
  File "/home/antoine/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 647, in send_batch_contents
    packet_size_bytes=packet_size_bytes)
  File "/home/antoine/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 46, in send_in_packets
    sender(formatted_objects)
  File "/usr/lib/python3/dist-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/usr/lib/python3/dist-packages/retrying.py", line 206, in call
    return attempt.get(self._wrap_exception)
  File "/usr/lib/python3/dist-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
    raise value
  File "/usr/lib/python3/dist-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/home/antoine/swh/swh-environment/swh-loader-core/swh/loader/core/loader.py", line 400, in send_contents
    self.storage.content_add(content_list)
  File "/home/antoine/swh/swh-environment/swh-storage/swh/storage/api/client.py", line 23, in content_add
    return self.post('content/add', {'content': content})
  File "/home/antoine/swh/swh-environment/swh-core/swh/core/api/__init__.py", line 185, in post
    return self._decode_response(response)
  File "/home/antoine/swh/swh-environment/swh-core/swh/core/api/__init__.py", line 220, in _decode_response
    raise pickle.loads(decode_response(response))
swh.core.api.RemoteException: Unexpected status code for API request: 413 (b'413: Request Entity Too Large')

It took me some time to understand that the configuration had changed and that my own configuration should have been
updated to the following (I was missing the client_max_size entry):

objstorage:
  cls: 'pathslicing'
  args: 
    root: '/home/antoine/swh/objects/'
    slicing: '0:2/2:4/4:6'

client_max_size: 8388608

Having the expected configuration clearly exposed somewhere would have greatly helped me here.
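
To illustrate how a published reference configuration could have surfaced the problem, here is a minimal sketch that compares a local configuration against a reference one and lists the entries that are missing. The function name `missing_keys` and the two dictionaries are hypothetical. They mirror the YAML above but are not part of any swh package, and a real check would load both sides from YAML files.

```python
def missing_keys(reference, actual, prefix=""):
    """Return dotted paths that exist in `reference` but not in `actual`."""
    missing = []
    for key, ref_value in reference.items():
        path = f"{prefix}{key}"
        if key not in actual:
            missing.append(path)
        elif isinstance(ref_value, dict) and isinstance(actual[key], dict):
            # Recurse into nested sections such as objstorage.args
            missing.extend(missing_keys(ref_value, actual[key], prefix=path + "."))
    return missing


# Hypothetical reference configuration, mirroring the YAML shown above.
reference_config = {
    "objstorage": {
        "cls": "pathslicing",
        "args": {
            "root": "/home/antoine/swh/objects/",
            "slicing": "0:2/2:4/4:6",
        },
    },
    "client_max_size": 8388608,
}

# The same configuration before the update, without client_max_size.
local_config = {
    "objstorage": {
        "cls": "pathslicing",
        "args": {
            "root": "/home/antoine/swh/objects/",
            "slicing": "0:2/2:4/4:6",
        },
    },
}

print(missing_keys(reference_config, local_config))
```

Run against the two dictionaries above, the check reports `client_max_size` as the only missing entry.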

So I think exposing default configuration for each swh component, maybe in a separate repository,
could be useful.

Event Timeline

anlambert triaged this task as Normal priority. Feb 27 2019, 4:43 PM
anlambert created this task.
anlambert renamed this task from "Swh components: make a default configuration file available in each repo" to "Swh components: make a default configuration file available for each of them". Feb 27 2019, 5:19 PM
anlambert updated the task description.

In the current state of our dev environment, I'd expect "good" default configuration files for development to live in the swh-docker-dev repository. They also have the nice property that people regularly use them to run their tests locally, so we have a better chance of noticing when they break.

I don't really know if they should stay there or if they should be moved back to the individual repositories.

In the specific case of client_max_size, I think the commit removing the default configuration from the objstorage api server should have made sure to keep the previously set default value (which was explicitly set to 1GB in rDOBJSe70f5329fd7b32a7b402fb98101abe4f384acc29) rather than drop it completely.
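
As a rough sketch of what keeping the previous default could look like, the snippet below falls back to a built-in value when the entry is absent and lets an explicit setting win. The names `DEFAULT_CLIENT_MAX_SIZE` and `with_defaults` are hypothetical, and the exact byte value (1 GiB, taken here as 1024 ** 3) is an assumption based on the "1GB" mentioned above. This is not the actual objstorage server code.

```python
# Assumed fallback: "1GB" interpreted as 1 GiB. The real constant may differ.
DEFAULT_CLIENT_MAX_SIZE = 1024 ** 3


def with_defaults(config):
    """Return a copy of `config` with client_max_size filled in if missing."""
    merged = dict(config)
    merged.setdefault("client_max_size", DEFAULT_CLIENT_MAX_SIZE)
    return merged


# An explicit value is kept; a missing one falls back to the default.
print(with_defaults({"client_max_size": 8388608})["client_max_size"])
print(with_defaults({})["client_max_size"])
```

With such a fallback, a configuration file written before the change would keep working without the new entry.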

> In the current state of our dev environment, I'd expect "good" default configuration files for development live in the swh-docker-dev repository. They also have the nice property that people are regularly using them to run their tests locally, so we have a better chance of noticing when they break.

I saw that configuration files were available in the docker repository but their content is quite specific to that environment (hostnames, database names, ...).
For those who do not use docker, having a working default configuration available would be of interest.

> I don't really know if they should stay there or if they should be moved back to the individual repositories.

On reflection, having a default configuration file in each repository seems the better option, as it would make it easier
to keep the file in sync with each configuration update.
It could also serve as a simple reference for all available configuration entries of each component.
Nevertheless, this is only my opinion and it should be discussed with the other team members.