swh/storage/archiver/copier.py
- This file was added.
# Copyright (C) 2015 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information

from swh.core import hashutil

from ..objstorage.api.client import RemoteObjStorage


class ArchiverCopier():
    """ This archiver copies some files into a remote objstorage
    in order to get a backup.

    Attributes:
        content_ids: A list of sha1s that represents the content this copier
            has to archive.
ardumont: a list of sha1`s`
this copier `has`
        server (RemoteArchive): The remote object storage that is used to
            back up content.
        master_storage (Storage): The master storage that contains the data
            the copier needs to archive.
    """

    def __init__(self, destination, content, master_storage):
        """ Create a Copier for the archiver

        Args:
            destination: A tuple (archive_name, archive_url) that represents a
                remote object storage as in the 'archives' table.
            content: A list of sha1s that represents the content this copier
                has to archive.
            master_storage (Storage): The master storage of the system that
                contains the data to archive.
        """
        _name, self.url = destination
        self.content_ids = content
        self.server = RemoteObjStorage(self.url)
        self.master_storage = master_storage

    def run(self):
        """ Do the copy on the backup storage.

        Run the archiver copier in order to copy the required content
        into the current destination.
        The content which corresponds to the sha1 in self.content_ids
        will be fetched from the master_storage and then copied into
        the backup object storage.

        Returns:
            A boolean that indicates if the whole content has been copied.
        """
        self.content_ids = list(map(lambda x: hashutil.hex_to_hash(x[2:]),
[Done] ardumont: I think it might work without needing to consume the map. Can you refresh my memory, why do you need to strip the first 2 characters (is it '\\x')?
                                    self.content_ids))
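A minimal sketch of what the `x[2:]` slice would do if the ids arrive as hex strings carrying a two-character backslash-x prefix, as ardumont guesses. That input format is an assumption; the thread does not confirm it. `bytes.fromhex` from the standard library stands in for `hashutil.hex_to_hash`, and `hex_ids_to_hashes` is a hypothetical helper name. A list comprehension replaces `list(map(lambda ...))`.

```python
def hex_ids_to_hashes(hex_ids):
    """Convert prefixed hex strings into raw hash bytes.

    Assumption: each id is a backslash, an 'x', then hex digits.
    bytes.fromhex is a stand-in for hashutil.hex_to_hash.
    """
    # Drop the two prefix characters, then decode the remaining hex digits.
    return [bytes.fromhex(hex_id[2:]) for hex_id in hex_ids]


# Illustrative sha1 in hex, preceded by the assumed backslash-x prefix.
sample = ['\\x' + '34973274ccef6ab4dfaaf86599792fa9c3fe4689']
hashes = hex_ids_to_hashes(sample)
```

A sha1 is 20 bytes long, so each converted id in `hashes` has length 20.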
        contents = self.master_storage.content_get(self.content_ids)
        try:
            for content in contents:
                content_data = content['data']
                self.server.content_add(content_data)
[Done] ardumont:
```
self.server.content_add(map(lambda c: c['data'], contents))
```
?
[Done] qcampos: Just checked: `RemoteObjStorage::content_add` is linked to `ObjStorage::add_bytes`, which only takes a single content.
        except:
            return False
[Done] ardumont: You may want to avoid writing your doubts in the code and use Differential for that (as you also did below). I like to use TODO (you'll need to think more and improve the code), FIXME (you saw something ugly but this is not the time to fix it), or HACK (you had to do something horribly ugly but did not see any other way around it), followed by a concise description of what this is all about.
        return True
[Not Done] qcampos: If there is an error during the process, some files may already have been copied. What would be the best option? Mark them as 'present', or just ignore them and reschedule the whole batch after the 'ongoing' maximum delay has elapsed?
[Not Done] ardumont: As a first approximation, considering nothing is done is reasonable. (If nothing prevents later to add again some existing content)
[Not Done] ardumont:
> (If nothing prevents later to add again some existing content)
What I meant was, as long as you can write and overwrite existing contents again, I think it's ok.
[Not Done] qcampos: Shouldn't create any problem (ATM), as the object storage just writes files without checking whether they already exist.
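The outcome of this thread can be sketched as follows: any error marks the whole batch as failed, and a later retry simply rewrites what was already copied. `InMemoryObjStorage` and `copy_batch` are hypothetical names used for illustration only; the real `RemoteObjStorage` is not reproduced here, and the overwrite-without-checking behaviour is taken from qcampos's description above.

```python
class InMemoryObjStorage:
    """Hypothetical stand-in for the backup object storage.

    As described in the thread, it writes without checking whether
    the content already exists, so adding the same content twice is harmless.
    """

    def __init__(self, fail_on=None):
        self.objects = {}
        self.fail_on = fail_on  # content that triggers a simulated failure

    def content_add(self, data):
        if data == self.fail_on:
            raise IOError('simulated remote failure')
        self.objects[data] = data  # silently overwrites an existing entry


def copy_batch(contents, server):
    """Copy every content; report the whole batch as failed on any error."""
    try:
        for content in contents:
            server.content_add(content['data'])
    except Exception:
        # Some contents may already have been copied. The caller ignores
        # that and reschedules the whole batch later.
        return False
    return True


batch = [{'data': b'foo'}, {'data': b'bar'}, {'data': b'baz'}]
server = InMemoryObjStorage(fail_on=b'bar')
first_attempt = copy_batch(batch, server)   # fails after b'foo' is copied

server.fail_on = None                       # the remote recovers
second_attempt = copy_batch(batch, server)  # b'foo' is written again
```

The sketch catches `Exception` rather than using a bare `except:`, so `KeyboardInterrupt` and `SystemExit` still propagate.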
[Done] ardumont: Well, it depends on how you see the asynchronous part working... If archive-director spawns asynchronous archive-workers which also spawn asynchronous archive-copiers, it could be simpler to do what you propose (database access here)... But if the async activity stops at archive-workers, meaning only archive-director spawns async archive-workers.
[Done] qcampos: Now that I think about it, the specification is kinda blurry about what precisely is asynchronous. But I understood it as Director spawning async workers that execute copier's code.
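The reading qcampos describes, where only the director dispatches work asynchronously and each worker calls the copier as a plain function, might look like the sketch below. All three function names are hypothetical, the copier body is a placeholder, and a standard-library thread pool stands in for whatever task system the archiver actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def run_copier(batch):
    """Hypothetical synchronous copier; True means the batch was copied.

    Placeholder logic: a batch counts as copied when every item is bytes.
    """
    return all(isinstance(item, bytes) for item in batch)


def run_worker(batch):
    """Hypothetical worker: runs the copier inline and reports the result."""
    copied = run_copier(batch)  # plain call, no second asynchronous layer
    return (len(batch), copied)


def run_director(batches, max_workers=2):
    """Hypothetical director: the only place where work is dispatched async."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the input batches.
        return list(pool.map(run_worker, batches))


results = run_director([[b'a', b'b'], [b'c'], ['not-bytes']])
```

With this layout the worker receives the copier's boolean directly, so any follow-up on success or failure can happen in the worker without coordinating a second set of asynchronous tasks.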