diff --git a/PKG-INFO b/PKG-INFO
index d786f77..181930d 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,10 +1,10 @@
Metadata-Version: 1.0
Name: swh.loader.dir
-Version: 0.0.26
+Version: 0.0.27
Summary: Software Heritage Directory Loader
Home-page: https://forge.softwareheritage.org/diffusion/DLDDIR
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Description: UNKNOWN
Platform: UNKNOWN
diff --git a/README b/README
deleted file mode 100644
index 90b283b..0000000
--- a/README
+++ /dev/null
@@ -1,96 +0,0 @@
-SWH-loader-dir
-==============
-
-The Software Heritage Directory Loader is a tool and a library to walk a local
-directory and inject into the SWH dataset all unknown contained files.
-
-
-Directory loader
-================
-
-
-### Configuration
-
-This is the loader's (or task's) configuration file.
-
-loader/dir.yml:
-
- storage:
- cls: remote
- args:
- url: http://localhost:5002/
-
- send_contents: True
- send_directories: True
- send_revisions: True
- send_releases: True
- send_occurrences: True
- # nb of max contents to send for storage
- content_packet_size: 100
- # 100 Mib of content data
- content_packet_block_size_bytes: 104857600
- # limit for swh content storage for one blob (beyond that limit, the
- # content's data is not sent for storage)
- content_packet_size_bytes: 1073741824
- directory_packet_size: 250
- revision_packet_size: 100
- release_packet_size: 100
- occurrence_packet_size: 100
-
-Present in possible locations:
-- ~/.config/swh/loader/dir.ini
-- ~/.swh/loader/dir.ini
-- /etc/softwareheritage/loader/dir.ini
-
-
-#### Toplevel
-
-Load directory directly from code or toplevel:
-
- from swh.loader.dir.loader import DirLoader
-
- dir_path = '/path/to/directory
-
- # Fill in those
- origin = {'url': 'some-origin', 'type': 'dir'}
- visit_date = 'Tue, 3 May 2017 17:16:32 +0200'
- release = None
- revision = {}
- occurrence = {}
-
- DirLoader().load(dir_path, origin, visit_date, revision, release, [occurrence])
-
-
-#### Celery
-
-Load directory using celery.
-
-Providing you have a properly configured celery up and running
-
-worker.ini needs to be updated with the following keys:
-
- task_modules = swh.loader.dir.tasks
- task_queues = swh_loader_dir
-
-cf. https://forge.softwareheritage.org/diffusion/DCORE/browse/master/README.md
-for more details
-
-You can send the following message to the task queue:
-
- from swh.loader.dir.tasks import LoadDirRepository
-
- # Fill in those
- origin = {'url': 'some-origin', 'type': 'dir'}
- visit_date = 'Tue, 3 May 2017 17:16:32 +0200'
- release = None
- revision = {}
- occurrence = {}
-
- # Send message to the task queue
- LoaderDirRepository().run(('/path/to/dir, origin, visit_date, revision, release, [occurrence]))
-
-
-Directory producer
-==================
-
-None
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..0047603
--- /dev/null
+++ b/README.md
@@ -0,0 +1,96 @@
+SWH-loader-dir
+===================
+
+The Software Heritage Directory Loader is a tool and a library.
+
+Its sole purpose is to walk a local directory and inject into the SWH
+dataset every file from that directory tree that SWH does not already know.
+
+
+## Configuration
+
+The loader reads its configuration from `loader/dir.yml` in one of
+`/etc/softwareheritage`, `~/.config/swh` or `~/.swh`.
+
+It should look similar to the following (adapt the values to your needs):
+
+``` yaml
+storage:
+  cls: remote
+  args:
+    url: http://localhost:5002/
+
+send_contents: True
+send_directories: True
+send_revisions: True
+send_releases: True
+send_occurrences: True
+# maximum number of contents per packet sent to storage
+content_packet_size: 100
+# at most 100 MiB of content data per packet
+content_packet_block_size_bytes: 104857600
+# limit for swh content storage for one blob (beyond that limit, the
+# content's data is not sent for storage)
+content_packet_size_bytes: 1073741824
+directory_packet_size: 250
+revision_packet_size: 100
+release_packet_size: 100
+occurrence_packet_size: 100
+```
+
+## Run
+
+To run the loader, you can use either:
+
+- the Python 3 toplevel (or your own code)
+- celery
+
+### Toplevel
+
+Load a directory directly from your code or from the toplevel:
+
+```python
+from swh.loader.dir.loader import DirLoader
+
+dir_path = '/path/to/directory'
+
+# Fill these in (see DirLoader's docstring for the expected keys)
+origin = {'url': 'some-origin', 'type': 'dir'}
+visit_date = 'Tue, 3 May 2017 17:16:32 +0200'
+release = None
+revision = {}
+occurrence = {}
+
+DirLoader().load(dir_path, origin, visit_date, revision, release, [occurrence])
+```
+
+### Celery
+
+To use celery, add the following entries to `worker.yml` in one of
+`/etc/softwareheritage`, `~/.config/swh` or `~/.swh`:
+
+``` yaml
+task_modules:
+ - swh.loader.dir.tasks
+task_queues:
+ - swh_loader_dir
+```
+
+cf. [swh-core's documentation](https://forge.softwareheritage.org/diffusion/DCORE/browse/master/README.md) for
+more details.
+
+You can then send the following message to the task queue:
+
+```python
+from swh.loader.dir.tasks import LoadDirRepository
+
+# Fill these in (see DirLoader's docstring for the expected keys)
+origin = {'url': 'some-origin', 'type': 'dir'}
+visit_date = 'Tue, 3 May 2017 17:16:32 +0200'
+release = None
+revision = {}
+occurrence = {}
+
+# Send message to the task queue
+LoadDirRepository().run(('/path/to/dir', origin, visit_date, revision, release, [occurrence]))
+```
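The sample `loader/dir.yml` in the README above sets three `content_packet_*` limits. Reading their comments as "at most `content_packet_size` contents per packet", "at most `content_packet_block_size_bytes` of data per packet" and "no data sent for a single blob larger than `content_packet_size_bytes`", the grouping they imply can be sketched as below. This is an interpretation of the comments, not swh.loader.core's actual batching code; `group_contents` is a hypothetical helper and contents are reduced to `(name, length)` pairs.

```python
from typing import Iterable, Iterator, List, Tuple

# Defaults copied from the sample loader/dir.yml.
CONTENT_PACKET_SIZE = 100                    # max number of contents per packet
CONTENT_PACKET_BLOCK_SIZE_BYTES = 104857600  # 100 MiB of data per packet
CONTENT_PACKET_SIZE_BYTES = 1073741824       # per-blob limit (1 GiB)


def group_contents(
    contents: Iterable[Tuple[str, int]],
    max_count: int = CONTENT_PACKET_SIZE,
    max_bytes: int = CONTENT_PACKET_BLOCK_SIZE_BYTES,
    max_blob_bytes: int = CONTENT_PACKET_SIZE_BYTES,
) -> Iterator[List[Tuple[str, int, bool]]]:
    """Group (name, length) pairs into packets (hypothetical helper).

    Each item is emitted as (name, length, send_data); send_data is False
    for blobs above max_blob_bytes, whose data would not be sent.
    """
    packet: List[Tuple[str, int, bool]] = []
    packet_bytes = 0
    for name, length in contents:
        send_data = length <= max_blob_bytes
        weight = length if send_data else 0  # oversized blobs carry no data
        # Flush the current packet if adding this item would exceed a limit.
        if packet and (len(packet) >= max_count
                       or packet_bytes + weight > max_bytes):
            yield packet
            packet, packet_bytes = [], 0
        packet.append((name, length, send_data))
        packet_bytes += weight
    if packet:
        yield packet
```

For instance, 250 one-byte files are split by count alone: `[len(p) for p in group_contents([('f%d' % i, 1) for i in range(250)])]` gives `[100, 100, 50]`. A single blob larger than `max_bytes` still gets a packet of its own rather than being dropped.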
diff --git a/debian/control b/debian/control
index 53bcbf0..2748828 100644
--- a/debian/control
+++ b/debian/control
@@ -1,28 +1,28 @@
Source: swh-loader-dir
Maintainer: Software Heritage developers <swh-devel@inria.fr>
Section: python
Priority: optional
Build-Depends: debhelper (>= 9),
dh-python,
python3-all,
python3-nose,
python3-setuptools,
python3-swh.core (>= 0.0.14~),
- python3-swh.loader.core (>= 0.0.14~),
+ python3-swh.loader.core (>= 0.0.15~),
python3-swh.model (>= 0.0.15~),
python3-swh.scheduler (>= 0.0.14~),
python3-swh.storage (>= 0.0.83~),
python3-vcversioner
Standards-Version: 3.9.6
Homepage: https://forge.softwareheritage.org/diffusion/DLDDIR/
Package: python3-swh.loader.dir
Architecture: all
Depends: python3-swh.core (>= 0.0.14~),
- python3-swh.loader.core (>= 0.0.14~),
+ python3-swh.loader.core (>= 0.0.15~),
python3-swh.model (>= 0.0.15~),
python3-swh.scheduler (>= 0.0.14~),
python3-swh.storage (>= 0.0.83~),
${misc:Depends},
${python3:Depends}
Description: Software Heritage Directory Loader
diff --git a/docs/.gitignore b/docs/.gitignore
new file mode 100644
index 0000000..f6b5c55
--- /dev/null
+++ b/docs/.gitignore
@@ -0,0 +1,4 @@
+_build/
+apidoc/
+*-stamp
+README.md
diff --git a/docs/Makefile b/docs/Makefile
new file mode 100644
index 0000000..ec260d2
--- /dev/null
+++ b/docs/Makefile
@@ -0,0 +1,6 @@
+include ../../swh-docs/Makefile.sphinx
+
+html: copy_md
+
+copy_md:
+ cp ../README.md README.md
diff --git a/docs/_static/.placeholder b/docs/_static/.placeholder
new file mode 100644
index 0000000..e69de29
diff --git a/docs/_templates/.placeholder b/docs/_templates/.placeholder
new file mode 100644
index 0000000..e69de29
diff --git a/docs/conf.py b/docs/conf.py
new file mode 100644
index 0000000..190deb7
--- /dev/null
+++ b/docs/conf.py
@@ -0,0 +1 @@
+from swh.docs.sphinx.conf import * # NoQA
diff --git a/docs/index.rst b/docs/index.rst
new file mode 100644
index 0000000..2e88ed8
--- /dev/null
+++ b/docs/index.rst
@@ -0,0 +1,15 @@
+Software Heritage - Development Documentation
+=============================================
+
+.. toctree::
+ :maxdepth: 2
+ :caption: Contents:
+
+ README.md
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
diff --git a/requirements-swh.txt b/requirements-swh.txt
index 6eb25c7..99fd197 100644
--- a/requirements-swh.txt
+++ b/requirements-swh.txt
@@ -1,5 +1,5 @@
swh.core >= 0.0.14
swh.model >= 0.0.15
swh.scheduler >= 0.0.14
swh.storage >= 0.0.83
-swh.loader.core >= 0.0.14
+swh.loader.core >= 0.0.15
diff --git a/swh.loader.dir.egg-info/PKG-INFO b/swh.loader.dir.egg-info/PKG-INFO
index d786f77..181930d 100644
--- a/swh.loader.dir.egg-info/PKG-INFO
+++ b/swh.loader.dir.egg-info/PKG-INFO
@@ -1,10 +1,10 @@
Metadata-Version: 1.0
Name: swh.loader.dir
-Version: 0.0.26
+Version: 0.0.27
Summary: Software Heritage Directory Loader
Home-page: https://forge.softwareheritage.org/diffusion/DLDDIR
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Description: UNKNOWN
Platform: UNKNOWN
diff --git a/swh.loader.dir.egg-info/SOURCES.txt b/swh.loader.dir.egg-info/SOURCES.txt
index 3f3d5a5..adf57fb 100644
--- a/swh.loader.dir.egg-info/SOURCES.txt
+++ b/swh.loader.dir.egg-info/SOURCES.txt
@@ -1,32 +1,38 @@
.gitignore
AUTHORS
LICENSE
MANIFEST.in
Makefile
Makefile.local
-README
+README.md
requirements-swh.txt
requirements.txt
setup.py
version.txt
bin/swh-check-missing-objects.py
bin/swh-loader-dir
debian/changelog
debian/compat
debian/control
debian/copyright
debian/rules
debian/source/format
+docs/.gitignore
+docs/Makefile
+docs/conf.py
+docs/index.rst
+docs/_static/.placeholder
+docs/_templates/.placeholder
resources/dir.ini
resources/loader/dir.ini
swh.loader.dir.egg-info/PKG-INFO
swh.loader.dir.egg-info/SOURCES.txt
swh.loader.dir.egg-info/dependency_links.txt
swh.loader.dir.egg-info/requires.txt
swh.loader.dir.egg-info/top_level.txt
swh/loader/dir/__init__.py
swh/loader/dir/converters.py
swh/loader/dir/loader.py
swh/loader/dir/tasks.py
swh/loader/dir/tests/test_converters.py
swh/loader/dir/tests/test_loader.py
\ No newline at end of file
diff --git a/swh.loader.dir.egg-info/requires.txt b/swh.loader.dir.egg-info/requires.txt
index 7d8bb2c..1af9336 100644
--- a/swh.loader.dir.egg-info/requires.txt
+++ b/swh.loader.dir.egg-info/requires.txt
@@ -1,8 +1,8 @@
click
retrying
swh.core>=0.0.14
-swh.loader.core>=0.0.14
+swh.loader.core>=0.0.15
swh.model>=0.0.15
swh.scheduler>=0.0.14
swh.storage>=0.0.83
vcversioner
diff --git a/swh/loader/dir/loader.py b/swh/loader/dir/loader.py
index 2c13464..03137ad 100644
--- a/swh/loader/dir/loader.py
+++ b/swh/loader/dir/loader.py
@@ -1,247 +1,255 @@
# Copyright (C) 2015-2017 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import click
import os
import sys
import uuid
from swh.loader.core import loader
from swh.model import git
from swh.model.git import GitType
from . import converters
class DirLoader(loader.SWHLoader):
"""A bulk loader for a directory.
This will load the content of the directory.
Args:
- - dir_path: source of the directory to import
- - origin: Dictionary origin
- - id: origin's id
- - url: url origin we fetched
- - type: type of the origin
- - revision: Dictionary of information needed, keys are:
- - author_name: revision's author name
- - author_email: revision's author email
- - author_date: timestamp (e.g. 1444054085)
- - author_offset: date offset e.g. -0220, +0100
- - committer_name: revision's committer name
- - committer_email: revision's committer email
- - committer_date: timestamp
- - committer_offset: date offset e.g. -0220, +0100
- - type: type of revision dir, tar
- - message: synthetic message for the revision
- - release: Dictionary of information needed, keys are:
- - name: release name
- - date: release timestamp (e.g. 1444054085)
- - offset: release date offset e.g. -0220, +0100
- - author_name: release author's name
- - author_email: release author's email
- - comment: release's comment message
- - occurrences: List of occurrences as dictionary.
- Information needed, keys are:
- - branch: occurrence's branch name
- - date: validity date (e.g. 2015-01-01 00:00:00+00)
+ dir_path: source of the directory to import
+ origin (dict): dictionary with the following keys:
+
+ - id: origin's id
+ - url: url of the origin we fetched
+ - type: type of the origin
+
+ revision (dict): dictionary with the following keys:
+
+ - author_name: revision's author name
+ - author_email: revision's author email
+ - author_date: timestamp (e.g. 1444054085)
+ - author_offset: date offset e.g. -0220, +0100
+ - committer_name: revision's committer name
+ - committer_email: revision's committer email
+ - committer_date: timestamp
+ - committer_offset: date offset e.g. -0220, +0100
+ - type: type of revision (e.g. dir, tar)
+ - message: synthetic message for the revision
+
+ release (dict): dictionary with the following keys:
+
+ - name: release name
+ - date: release timestamp (e.g. 1444054085)
+ - offset: release date offset e.g. -0220, +0100
+ - author_name: release author's name
+ - author_email: release author's email
+ - comment: release's comment message
+
+ occurrences (list): list of dictionaries with the following keys:
+
+ - branch: occurrence's branch name
+ - date: validity date (e.g. 2015-01-01 00:00:00+00)
"""
CONFIG_BASE_FILENAME = 'loader/dir'
def __init__(self, logging_class='swh.loader.dir.DirLoader',
config=None):
super().__init__(logging_class=logging_class, config=config)
def list_repo_objs(self, dir_path, revision, release):
"""List all objects from dir_path.
Args:
- - dir_path (path): the directory to list
- - revision: revision dictionary representation
- - release: release dictionary representation
+ dir_path: the directory to list
+ revision: revision dictionary representation
+ release: release dictionary representation
Returns:
- a dict containing lists of `Oid`s with keys for each object type:
+ dict: lists of objects keyed by object type, e.g.:
+
- CONTENT
- DIRECTORY
+
"""
def _revision_from(tree_hash, revision):
full_rev = dict(revision)
full_rev['directory'] = tree_hash
full_rev = converters.commit_to_revision(full_rev)
full_rev['id'] = git.compute_revision_sha1_git(full_rev)
return full_rev
def _release_from(revision_hash, release):
full_rel = dict(release)
full_rel['target'] = revision_hash
full_rel['target_type'] = 'revision'
full_rel = converters.annotated_tag_to_release(full_rel)
full_rel['id'] = git.compute_release_sha1_git(full_rel)
return full_rel
log_id = str(uuid.uuid4())
sdir_path = dir_path.decode('utf-8')
self.log.info("Started listing %s" % dir_path, extra={
'swh_type': 'dir_list_objs_start',
'swh_repo': sdir_path,
'swh_id': log_id,
})
objects_per_path = git.compute_hashes_from_directory(dir_path)
tree_hash = objects_per_path[dir_path]['checksums']['sha1_git']
full_rev = _revision_from(tree_hash, revision)
objects = {
GitType.BLOB: list(
git.objects_per_type(GitType.BLOB, objects_per_path)),
GitType.TREE: list(
git.objects_per_type(GitType.TREE, objects_per_path)),
GitType.COMM: [full_rev],
GitType.RELE: []
}
if release and 'name' in release:
full_rel = _release_from(full_rev['id'], release)
objects[GitType.RELE] = [full_rel]
self.log.info("Done listing the objects in %s: %d contents, "
"%d directories, %d revisions, %d releases" % (
sdir_path,
len(objects[GitType.BLOB]),
len(objects[GitType.TREE]),
len(objects[GitType.COMM]),
len(objects[GitType.RELE])
), extra={
'swh_type': 'dir_list_objs_end',
'swh_repo': sdir_path,
'swh_num_blobs': len(objects[GitType.BLOB]),
'swh_num_trees': len(objects[GitType.TREE]),
'swh_num_commits': len(objects[GitType.COMM]),
'swh_num_releases': len(objects[GitType.RELE]),
'swh_id': log_id,
})
return objects
def prepare(self, *args, **kwargs):
self.dir_path, self.origin, self.visit_date, self.revision, self.release, self.occs = args # noqa
if not os.path.exists(self.dir_path):
warn_msg = 'Skipping inexistant directory %s' % self.dir_path
self.log.error(warn_msg,
extra={
'swh_type': 'dir_repo_list_refs',
'swh_repo': self.dir_path,
'swh_num_refs': 0,
})
raise ValueError(warn_msg)
if isinstance(self.dir_path, str):
self.dir_path = self.dir_path.encode(sys.getfilesystemencoding())
def get_origin(self):
return self.origin # set in prepare method
def cleanup(self):
"""Nothing to clean up.
"""
pass
def fetch_data(self):
def _occurrence_from(origin_id, visit, revision_hash, occurrence):
occ = dict(occurrence)
occ.update({
'target': revision_hash,
'target_type': 'revision',
'origin': origin_id,
'visit': visit
})
return occ
def _occurrences_from(origin_id, visit, revision_hash, occurrences):
occs = []
for occurrence in occurrences:
occs.append(_occurrence_from(origin_id,
visit,
revision_hash,
occurrence))
return occs
# to load the repository, walk all objects, compute their hashes
self.objects = self.list_repo_objs(
self.dir_path, self.revision, self.release)
full_rev = self.objects[GitType.COMM][0] # only 1 revision
# Update objects with release and occurrences
self.objects[GitType.REFS] = _occurrences_from(
self.origin_id, self.visit, full_rev['id'], self.occs)
def store_data(self):
objects = self.objects
self.maybe_load_contents(objects[GitType.BLOB])
self.maybe_load_directories(objects[GitType.TREE])
self.maybe_load_revisions(objects[GitType.COMM])
self.maybe_load_releases(objects[GitType.RELE])
self.maybe_load_occurrences(objects[GitType.REFS])
@click.command()
@click.option('--dir-path', required=1, help='Directory path to load')
@click.option('--origin-url', required=1, help='Origin url for that directory')
@click.option('--visit-date', default=None, help='Visit date time override')
def main(dir_path, origin_url, visit_date):
"""Debugging purpose."""
d = DirLoader()
origin = {
'url': origin_url,
'type': 'dir'
}
import datetime
commit_time = int(datetime.datetime.now(
tz=datetime.timezone.utc).timestamp()
)
swh_person = {
'name': 'Software Heritage',
'fullname': 'Software Heritage',
'email': 'robot@softwareheritage.org'
}
revision_message = 'swh-loader-dir: synthetic revision message'
revision_type = 'tar'
revision = {
'date': {
'timestamp': commit_time,
'offset': 0,
},
'committer_date': {
'timestamp': commit_time,
'offset': 0,
},
'author': swh_person,
'committer': swh_person,
'type': revision_type,
'message': revision_message,
'metadata': {},
'synthetic': True,
}
release = None
occurrence = {
'branch': os.path.basename(dir_path),
}
d.load(dir_path, origin, visit_date, revision, release, [occurrence])
if __name__ == '__main__':
main()
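The debugging `main()` above builds a single occurrence named after the directory, and `fetch_data` later stamps every occurrence with the origin id, the visit and the synthetic revision's id. A minimal standalone sketch of those two steps follows; `occurrence_for` and `occurrences_from` are hypothetical free-function versions of that logic, and the `normpath()` call is an addition that `main()` does not make.

```python
import os
from typing import Any, Dict, List


def occurrence_for(dir_path: str) -> Dict[str, Any]:
    """Occurrence as built in main(): the branch is the directory's name.

    normpath() is an addition: it drops a trailing slash, for which
    os.path.basename() would otherwise return an empty branch name.
    """
    return {'branch': os.path.basename(os.path.normpath(dir_path))}


def occurrences_from(origin_id: int, visit: int, revision_hash: bytes,
                     occurrences: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of `occurrences`, each targeting the given revision."""
    stamped = []
    for occurrence in occurrences:
        occ = dict(occurrence)  # shallow copy: the caller's dict is untouched
        occ.update({
            'target': revision_hash,
            'target_type': 'revision',
            'origin': origin_id,
            'visit': visit,
        })
        stamped.append(occ)
    return stamped
```

Copying each dictionary before updating it mirrors `_occurrence_from`: the occurrences passed to `load()` stay reusable across runs.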
diff --git a/swh/loader/dir/tests/test_loader.py b/swh/loader/dir/tests/test_loader.py
index 2c8829a..4c27f05 100644
--- a/swh/loader/dir/tests/test_loader.py
+++ b/swh/loader/dir/tests/test_loader.py
@@ -1,142 +1,319 @@
# Copyright (C) 2015 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import os
import shutil
import subprocess
import tempfile
import unittest
from nose.tools import istest
from swh.loader.dir.loader import DirLoader
from swh.model.git import GitType
-class TestLoader(unittest.TestCase):
+class InitTestLoader(unittest.TestCase):
@classmethod
def setUpClass(cls):
super().setUpClass()
cls.tmp_root_path = tempfile.mkdtemp().encode('utf-8')
start_path = os.path.dirname(__file__).encode('utf-8')
sample_folder_archive = os.path.join(start_path,
b'../../../../..',
b'swh-storage-testdata',
b'dir-folders',
b'sample-folder.tgz')
cls.root_path = os.path.join(cls.tmp_root_path, b'sample-folder')
# uncompress the sample folder
subprocess.check_output(
['tar', 'xvf', sample_folder_archive, '-C', cls.tmp_root_path],
)
@classmethod
def tearDownClass(cls):
super().tearDownClass()
shutil.rmtree(cls.tmp_root_path)
+
+class DirLoaderListRepoObject(InitTestLoader):
+
def setUp(self):
super().setUp()
self.info = {
'storage': {
'cls': 'remote',
'args': {
'url': 'http://localhost:5002/',
}
},
'content_size_limit': 104857600,
'log_db': 'dbname=softwareheritage-log',
'directory_packet_size': 25000,
'content_packet_size': 10000,
'send_contents': True,
'send_directories': True,
'content_packet_size_bytes': 1073741824,
'occurrence_packet_size': 100000,
'send_revisions': True,
'revision_packet_size': 100000,
'content_packet_block_size_bytes': 104857600,
'send_occurrences': True,
'release_packet_size': 100000,
'send_releases': True
}
self.origin = {
'url': 'file:///dev/null',
'type': 'dir',
}
self.occurrence = {
'branch': 'master',
'authority_id': 1,
'validity': '2015-01-01 00:00:00+00',
}
self.revision = {
'author': {
'name': 'swh author',
'email': 'swh@inria.fr',
'fullname': 'swh'
},
'date': {
'timestamp': 1444054085,
'offset': 0
},
'committer': {
'name': 'swh committer',
'email': 'swh@inria.fr',
'fullname': 'swh'
},
'committer_date': {
'timestamp': 1444054085,
'offset': 0,
},
'type': 'tar',
'message': 'synthetic revision',
'metadata': {'foo': 'bar'},
}
self.release = {
'name': 'v0.0.1',
'date': {
'timestamp': 1444054085,
'offset': 0,
},
'author': {
'name': 'swh author',
'fullname': 'swh',
'email': 'swh@inria.fr',
},
'message': 'synthetic release',
}
self.dirloader = DirLoader(config=self.info)
@istest
def load_without_storage(self):
# when
objects = self.dirloader.list_repo_objs(
self.root_path,
self.revision,
self.release)
# then
self.assertEquals(len(objects), 4,
"4 objects types, blob, tree, revision, release")
self.assertEquals(len(objects[GitType.BLOB]), 8,
"8 contents: 3 files + 5 links")
self.assertEquals(len(objects[GitType.TREE]), 5,
- "5 directories: 4 subdirs + 1 empty + 1 main dir")
+ "5 directories: 4 subdirs + 1 empty")
self.assertEquals(len(objects[GitType.COMM]), 1, "synthetic revision")
self.assertEquals(len(objects[GitType.RELE]), 1, "synthetic release")
# print('objects: %s\n objects-per-path: %s\n' %
# (objects.keys(),
# objects_per_path.keys()))
+
+
+class LoaderNoStorageForTest:
+"""Mixin class that inhibits persistence and instead keeps in memory
+the data sent for storage.
+
+ cf. SWHDirLoaderNoStorage
+
+ """
+ def __init__(self):
+ super().__init__()
+ # Init the state
+ self.all_contents = []
+ self.all_directories = []
+ self.all_revisions = []
+ self.all_releases = []
+ self.all_occurrences = []
+
+ def send_origin(self, origin):
+ self.origin = origin
+
+ def send_origin_visit(self, origin_id, ts):
+ self.origin_visit = {
+ 'origin': origin_id,
+ 'ts': ts,
+ 'visit': 1,
+ }
+ return self.origin_visit
+
+ def update_origin_visit(self, origin_id, visit, status):
+ self.status = status
+ self.origin_visit = visit
+
+ def maybe_load_contents(self, all_contents):
+ self.all_contents.extend(all_contents)
+
+ def maybe_load_directories(self, all_directories):
+ self.all_directories.extend(all_directories)
+
+ def maybe_load_revisions(self, all_revisions):
+ self.all_revisions.extend(all_revisions)
+
+ def maybe_load_releases(self, releases):
+ self.all_releases.extend(releases)
+
+ def maybe_load_occurrences(self, all_occurrences):
+ self.all_occurrences.extend(all_occurrences)
+
+ def open_fetch_history(self):
+ return 1
+
+ def close_fetch_history_success(self, fetch_history_id):
+ pass
+
+ def close_fetch_history_failure(self, fetch_history_id):
+ pass
+
+
+TEST_CONFIG = {
+ 'extraction_dir': '/tmp/tests/loader-tar/', # where to extract the tarball
+ 'storage': { # we instantiate it but we don't use it in test context
+ 'cls': 'remote',
+ 'args': {
+ 'url': 'http://127.0.0.1:9999', # somewhere that does not exist
+ }
+ },
+ 'send_contents': False,
+ 'send_directories': False,
+ 'send_revisions': False,
+ 'send_releases': False,
+ 'send_occurrences': False,
+ 'content_packet_size': 100,
+ 'content_packet_block_size_bytes': 104857600,
+ 'content_packet_size_bytes': 1073741824,
+ 'directory_packet_size': 250,
+ 'revision_packet_size': 100,
+ 'release_packet_size': 100,
+ 'occurrence_packet_size': 100,
+}
+
+
+def parse_config_file(base_filename=None, config_filename=None,
+ additional_configs=None, global_config=True):
+ return TEST_CONFIG
+
+
+# Inhibit side-effect loading configuration from disk
+DirLoader.parse_config_file = parse_config_file
+
+
+class SWHDirLoaderNoStorage(LoaderNoStorageForTest, DirLoader):
+ """A DirLoader with no persistence.
+
+ Context:
+Load a directory with a persistence-less directory loader
+
+ """
+ pass
+
+
+class SWHDirLoaderITTest(InitTestLoader):
+ def setUp(self):
+ super().setUp()
+
+ self.loader = SWHDirLoaderNoStorage()
+
+ @istest
+ def load(self):
+"""Loading a new directory should succeed
+
+ """
+ # given
+ origin = {
+ 'url': 'file:///tmp/sample-folder',
+ 'type': 'dir'
+ }
+
+ visit_date = 'Tue, 3 May 2016 17:16:32 +0200'
+
+ import datetime
+ commit_time = int(datetime.datetime.now(
+ tz=datetime.timezone.utc).timestamp()
+ )
+
+ swh_person = {
+ 'name': 'Software Heritage',
+ 'fullname': 'Software Heritage',
+ 'email': 'robot@softwareheritage.org'
+ }
+
+ revision_message = 'swh-loader-tar: synthetic revision message'
+ revision_type = 'tar'
+ revision = {
+ 'date': {
+ 'timestamp': commit_time,
+ 'offset': 0,
+ },
+ 'committer_date': {
+ 'timestamp': commit_time,
+ 'offset': 0,
+ },
+ 'author': swh_person,
+ 'committer': swh_person,
+ 'type': revision_type,
+ 'message': revision_message,
+ 'metadata': {},
+ 'synthetic': True,
+ }
+
+ occurrence = {
+ 'branch': os.path.basename(self.root_path),
+ }
+
+ # when
+ self.loader.load(self.root_path, origin, visit_date, revision, None,
+ [occurrence])
+
+ # then
+ self.assertEquals(len(self.loader.all_contents), 8)
+ self.assertEquals(len(self.loader.all_directories), 5)
+ self.assertEquals(len(self.loader.all_revisions), 1)
+
+ actual_revision = self.loader.all_revisions[0]
+ self.assertEquals(actual_revision['synthetic'],
+ True)
+ self.assertEquals(actual_revision['parents'],
+ [])
+ self.assertEquals(actual_revision['type'],
+ 'tar')
+ self.assertEquals(actual_revision['message'],
+ b'swh-loader-tar: synthetic revision message')
+
+ self.assertEquals(len(self.loader.all_releases), 0)
+ self.assertEquals(len(self.loader.all_occurrences), 1)
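`SWHDirLoaderNoStorage` works because `LoaderNoStorageForTest` is listed before `DirLoader`: the method resolution order finds its `maybe_load_*` overrides first, and its `super().__init__()` still runs `DirLoader`'s initializer. The pattern can be sketched in isolation as follows; `BaseLoader` is a hypothetical stand-in for `DirLoader`, reduced to a single storage hook.

```python
from typing import Any, Dict, List


class BaseLoader:
    """Hypothetical stand-in for DirLoader: store_data() pushes to storage."""

    def __init__(self) -> None:
        self.sent_to_storage = 0

    def maybe_load_contents(self, contents: List[Dict[str, Any]]) -> None:
        self.sent_to_storage += len(contents)  # pretend network I/O

    def store_data(self, objects: Dict[str, List[Dict[str, Any]]]) -> None:
        self.maybe_load_contents(objects['contents'])


class NoStorageMixin:
    """Overrides the storage hook and keeps the data in memory instead."""

    def __init__(self) -> None:
        super().__init__()  # continues along the MRO to BaseLoader.__init__
        self.all_contents: List[Dict[str, Any]] = []

    def maybe_load_contents(self, contents: List[Dict[str, Any]]) -> None:
        self.all_contents.extend(contents)


class LoaderNoStorage(NoStorageMixin, BaseLoader):
    """Mixin first, so its hooks shadow the loader's."""
```

With `loader = LoaderNoStorage()`, calling `loader.store_data({'contents': [{'sha1': 'a'}]})` fills `loader.all_contents` and leaves `loader.sent_to_storage` at 0. Reversing the bases would let `BaseLoader`'s hook win, and since `BaseLoader.__init__` does not call `super()`, the mixin's initializer would never run.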
diff --git a/version.txt b/version.txt
index 67b844c..2ceb9b5 100644
--- a/version.txt
+++ b/version.txt
@@ -1 +1 @@
-v0.0.26-0-gbadc8d3
\ No newline at end of file
+v0.0.27-0-g5583044
\ No newline at end of file
