diff --git a/PKG-INFO b/PKG-INFO
index 401b595..abf9369 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,101 +1,101 @@
Metadata-Version: 2.1
Name: swh.loader.git
-Version: 0.1.1
+Version: 0.1.2
Summary: Software Heritage git loader
Home-page: https://forge.softwareheritage.org/diffusion/DLDG/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-loader-git
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-loader-git/
Description: swh-loader-git
==============
The Software Heritage Git Loader is a tool and a library to walk a local
Git repository and inject into the SWH dataset all contained files that
weren't known before.
License
-------
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public
License along with this program.
Dependencies
------------
### Runtime
- python3
- python3-dulwich
- python3-retrying
- python3-swh.core
- python3-swh.model
- python3-swh.storage
- python3-swh.scheduler
### Test
- python3-nose
Requirements
------------
- implementation language, Python3
- coding guidelines: conform to PEP8
- Git access: via dulwich
Configuration
-------------
You can run the loader from a remote origin (*loader*) or from an
origin on disk (*from_disk*) directly by calling:
```
python3 -m swh.loader.git.{loader,from_disk}
```
### Location
Both tools expect a configuration file.
Either one of the following location:
- /etc/softwareheritage/
- ~/.config/swh/
- ~/.swh/
Note: Will call that location $SWH_CONFIG_PATH
### Configuration sample
Respectively the loader from a remote (`git.yml`) and the loader from
a disk (`git-disk.yml`), $SWH_CONFIG_PATH/loader/git{-disk}.yml:
```
storage:
  cls: remote
  args:
    url: http://localhost:5002/
```
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
diff --git a/debian/changelog b/debian/changelog
index 2149b57..86d9c0b 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,476 +1,478 @@
-swh-loader-git (0.1.1-1~swh1~bpo10+1) buster-swh; urgency=medium
+swh-loader-git (0.1.2-1~swh1) unstable-swh; urgency=medium
- * Rebuild for buster-swh
+ * New upstream release 0.1.2 - (tagged by Antoine Lambert
+ <antoine.lambert@inria.fr> on 2020-06-03 14:54:36 +0200)
+ * Upstream changes: - version 0.1.2
- -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 02 Jun 2020 15:51:35 +0000
+ -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 03 Jun 2020 12:57:56 +0000
swh-loader-git (0.1.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.1.1 - (tagged by Antoine Lambert
<antoine.lambert@inria.fr> on 2020-06-02 17:44:44 +0200)
* Upstream changes: - version 0.1.1
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 02 Jun 2020 15:50:08 +0000
swh-loader-git (0.1.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.1.0 - (tagged by Nicolas Dandrimont
<nicolas@dandrimont.eu> on 2020-05-29 10:33:12 +0200)
* Upstream changes: - Release swh.loader.git v0.1.0 - Use the
previous snapshot instead of any object from the archive to do -
incremental loads - Merge branch filtering behavior between the
local and remote loaders - Add default target branch for
symbolic references
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 29 May 2020 08:37:48 +0000
swh-loader-git (0.0.60-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.60 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2020-04-15 11:52:55
+0200)
* Upstream changes: - v0.0.60 - git.loader: fix failing origin
visit update step - Add a pyproject.toml file to target py37 for
black - Enable black
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 15 Apr 2020 10:05:51 +0000
swh-loader-git (0.0.59-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.59 - (tagged by Antoine Lambert
<antoine.lambert@inria.fr> on 2020-04-06 11:59:27 +0200)
* Upstream changes: - version 0.0.59
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 06 Apr 2020 10:04:59 +0000
swh-loader-git (0.0.58-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.58 - (tagged by Valentin Lorentz
<vlorentz@softwareheritage.org> on 2020-03-02 11:25:43 +0100)
* Upstream changes: - v0.0.58 - * Use origin_visit_get_latest
instead of snapshot_get_latest. - * Use swh-model objects
instead of dicts.
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 02 Mar 2020 10:28:37 +0000
swh-loader-git (0.0.57-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.57 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2020-02-07 03:32:49
+0100)
* Upstream changes: - v0.0.57 - loaders: Remove content size
computation during conversion
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 07 Feb 2020 02:45:56 +0000
swh-loader-git (0.0.56-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.56 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2020-01-28 13:24:24
+0100)
* Upstream changes: - v0.0.56 - git.loader: Migrate from
UnbufferedLoader to DVCSLoader
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 28 Jan 2020 12:27:02 +0000
swh-loader-git (0.0.55-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.55 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-12-12 14:41:10
+0100)
* Upstream changes: - v0.0.55 - loader: Bump dependency on
loader-core
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 12 Dec 2019 13:44:22 +0000
swh-loader-git (0.0.54-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.54 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-12-12 11:43:50
+0100)
* Upstream changes: - v0.0.54 - tasks: Enforce kwargs use in
task message
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 12 Dec 2019 10:46:28 +0000
swh-loader-git (0.0.53-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.53 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-12-10 11:24:30
+0100)
* Upstream changes: - v0.0.53 - tasks: Unify message format
with other loaders - tasks: Use celery's shared_task decorator
- tests: Migrate to pytest-mock's fixture - loader.git: Register
git worker - tasks: Rename task according to production -
git: Unify loaders constructor - Fix a typo reported by
codespell - Add a pre-commit config file - Migrate tox.ini
to extras = xxx instead of deps = .[testing] - De-specify
testenv:py3 - Drop version constraint on pytest < 4 -
Include all requirements in MANIFEST.in - Add support for
symbolic references
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 10 Dec 2019 10:27:32 +0000
swh-loader-git (0.0.52-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.52 - (tagged by Stefano Zacchiroli
<zack@upsilon.cc> on 2019-10-10 12:07:05 +0200)
* Upstream changes: - v0.0.52 - (brown paper bag release) -
* MANIFEST.in: ship py.typed
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 10 Oct 2019 10:12:13 +0000
swh-loader-git (0.0.51-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.51 - (tagged by Stefano Zacchiroli
<zack@upsilon.cc> on 2019-10-10 11:59:01 +0200)
* Upstream changes: - v0.0.51 - * tox.ini: Fix py3 environment
to use packaged tests - * typing: minimal changes to make a no-
op mypy run pass - * test_from_disk.py: avoid shadowing base
classes in tests
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 10 Oct 2019 10:02:08 +0000
swh-loader-git (0.0.50-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.50 - (tagged by Antoine Lambert
<antoine.lambert@inria.fr> on 2019-09-03 13:07:54 +0200)
* Upstream changes: - version 0.0.50
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 03 Sep 2019 11:13:19 +0000
swh-loader-git (0.0.49-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.49 - (tagged by Valentin Lorentz
<vlorentz@softwareheritage.org> on 2019-06-12 15:05:10 +0200)
* Upstream changes: - Use origin URLs instead of numeric ids.
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 19 Jun 2019 10:28:05 +0000
swh-loader-git (0.0.48-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.48 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-01-30 11:18:55
+0100)
* Upstream changes: - v0.0.48 - Bump dependency on swh-
scheduler 0.0.39 - Rewrite celery tasks as a decorated function
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 30 Jan 2019 10:22:22 +0000
swh-loader-git (0.0.43-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.43
* Support the new paginated snapshot branch fetching functions
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 18 Oct 2018 18:49:26 +0200
swh-loader-git (0.0.42-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.42
* Fix critical bug in incremental loading
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 11 Oct 2018 17:19:07 +0200
swh-loader-git (0.0.41-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.41
* Use explicit keyword argument for base_url in the load task
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 11 Oct 2018 16:26:27 +0200
swh-loader-git (0.0.40-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.40
* Improve python packaging
* Make the loader more robust against holes in the history caused by
* buggy imports
* Allow ignoring the history to make a full load
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 09 Oct 2018 16:28:14 +0200
swh-loader-git (0.0.39-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.39
* Avoid walking the history of large git repos, which takes a long
time
* Really save packfiles
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 20 Sep 2018 17:22:17 +0200
swh-loader-git (0.0.38-1~swh1) unstable-swh; urgency=medium
* v0.0.38
* Improve origin_visit initialization step
* Properly sandbox the prepare statement so that if it breaks, we can
* update appropriately the visit with the correct status
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 07 Mar 2018 11:39:30 +0100
swh-loader-git (0.0.37-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.37
* Remove spurious debug print
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 06 Feb 2018 16:00:40 +0100
swh-loader-git (0.0.36-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.36
* Update to use snapshots instead of occurrences
* Use dulwich get_transport_and_path rather than hardcode the tcp
transport
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 06 Feb 2018 14:42:36 +0100
swh-loader-git (0.0.35-1~swh1) unstable-swh; urgency=medium
* v0.0.35
* swh.loader.git.loader: Warn when object is corrupted and continue
* swh.loader.git.loader: Add structured data to the log message
regarding skipping objects
* swh.loader.git.loader: Force further checks on objects
* swh.loader.git.loader: Unify reading object from the repository
* swh.loader.git.loader: Warn when object malformed and continue
* swh.loader.git.loader: Trap missing object id and continue
* swh.loader.git.base: Reuse swh.loader.core base loader
* swh.loader.git.converters: Fix release time conversion issue when no
date provided
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 18 Dec 2017 12:08:01 +0100
swh-loader-git (0.0.34-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git version 0.0.34
* Update packaging runes
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 12 Oct 2017 20:12:11 +0200
swh-loader-git (0.0.33-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.33
* make the updater's parent commit cache more useful
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 15 Sep 2017 18:45:41 +0200
swh-loader-git (0.0.32-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git 0.0.32
* Update tasks to new swh.scheduler API
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Mon, 12 Jun 2017 18:04:50 +0200
swh-loader-git (0.0.31-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.31
* Migrate from swh.core.hashutil to swh.model.hashutil
* Only send objects that are actually missing
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 17 Mar 2017 17:40:17 +0100
swh-loader-git (0.0.30-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.30
* Fix handling of mergetag headers
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 09 Mar 2017 11:30:08 +0100
swh-loader-git (0.0.29-1~swh1) unstable-swh; urgency=medium
* v0.0.29
* GitLoaderFromArchive: Use the same configuration file as
* GitLoader (permit to deploy both as the same unit)
* git reader: Refactor to allow listing revisions as well as contents
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 20 Feb 2017 11:32:24 +0100
swh-loader-git (0.0.28-1~swh1) unstable-swh; urgency=medium
* v0.0.28
* loader: Fix fetch_date override
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 15 Feb 2017 18:43:32 +0100
swh-loader-git (0.0.27-1~swh1) unstable-swh; urgency=medium
* v0.0.27
* Add loader-git from archive
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 14 Feb 2017 18:56:52 +0100
swh-loader-git (0.0.26-1~swh1) unstable-swh; urgency=medium
* v0.0.26
* Add a git loader able to deal with git repository in archive
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 14 Feb 2017 16:24:50 +0100
swh-loader-git (0.0.25-1~swh1) unstable-swh; urgency=medium
* v0.0.25
* Fix to permit to actually pass the fetch date as parameter for
* the loading git disk loader
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 10 Feb 2017 17:34:35 +0100
swh-loader-git (0.0.24-1~swh1) unstable-swh; urgency=medium
* v0.0.24
* Update storage configuration reading
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 15 Dec 2016 18:40:29 +0100
swh-loader-git (0.0.23-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.23
* Make the save_data mechanism generic
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 02 Dec 2016 15:34:05 +0100
swh-loader-git (0.0.22-1~swh1) unstable-swh; urgency=medium
* v0.0.22
* Improve reader to permit to use it as analyzer tool
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 04 Nov 2016 10:37:24 +0100
swh-loader-git (0.0.21-1~swh1) unstable-swh; urgency=medium
* v0.0.21
* Improve the reader git to load all contents from a pack.
* Improve to avoid unnecessary readings from db
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 26 Oct 2016 17:06:12 +0200
swh-loader-git (0.0.20-1~swh1) unstable-swh; urgency=medium
* v0.0.20
* Add new reader git task
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 25 Oct 2016 18:40:17 +0200
swh-loader-git (0.0.19-1~swh1) unstable-swh; urgency=medium
* v0.0.19
* Update git loaders to register origin_visit's state
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 23 Aug 2016 16:34:15 +0200
swh-loader-git (0.0.18-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.18
* Properly handle skipped contents
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 19 Aug 2016 18:12:44 +0200
swh-loader-git (0.0.16-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.16
* Add exist_ok to packfile cache directory creation
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Mon, 01 Aug 2016 15:53:07 +0200
swh-loader-git (0.0.15-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.15
* Absence of remote refs doesn't throw an error in updater
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Wed, 15 Jun 2016 01:20:37 +0200
swh-loader-git (0.0.14-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.14
* Add a disk loader using dulwich
* Rework the loader logic to use a single pattern for both loaders
* Allow caching of packfiles for the remote loader
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 14 Jun 2016 18:10:21 +0200
swh-loader-git (0.0.13-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.13
* Update for latest schema revision
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 08 Apr 2016 16:46:41 +0200
swh-loader-git (0.0.12-1~swh1) unstable-swh; urgency=medium
* Release swh-loader-git v0.0.12
* Update to use new swh.storage api for object listing
* Add a size limit to packfiles
* Return a proper eventfulness for empty repositories
* Do not crawl the pack file if unnecessary
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 25 Feb 2016 18:21:34 +0100
swh-loader-git (0.0.11-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.11
* Implement git updater
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 19 Feb 2016 19:13:22 +0100
swh-loader-git (0.0.10-1~swh1) unstable-swh; urgency=medium
* Prepare swh.loader.git release v0.0.10
* Update for swh.model
* Use new swh.storage
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Mon, 07 Dec 2015 18:59:46 +0100
swh-loader-git (0.0.9-1~swh1) unstable-swh; urgency=medium
* Prepare deployment of swh.loader.git v0.0.9
* Close fetch_history on failure too
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Wed, 04 Nov 2015 10:54:37 +0100
swh-loader-git (0.0.8-1~swh1) unstable-swh; urgency=medium
* Prepare deployment of swh.loader.git v0.0.8
* New database schema (v028)
* Populate fetch_history (T121)
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 27 Oct 2015 18:11:26 +0100
swh-loader-git (0.0.7-1~swh1) unstable-swh; urgency=medium
* Prepare swh.loader.git v0.0.7 deployment
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Mon, 19 Oct 2015 12:37:09 +0200
swh-loader-git (0.0.6-1~swh1) unstable-swh; urgency=medium
* Prepare deployment of swh.loader.git v0.0.6
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 09 Oct 2015 17:50:35 +0200
swh-loader-git (0.0.5-1~swh1) unstable-swh; urgency=medium
* Prepare deployment of swh.loader.git v0.0.5
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 06 Oct 2015 17:42:11 +0200
swh-loader-git (0.0.4-1~swh1) unstable-swh; urgency=medium
* Prepare deployment of swh.loader.git v0.0.4
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 02 Oct 2015 14:54:04 +0200
swh-loader-git (0.0.3-1~swh1) unstable-swh; urgency=medium
* Prepare deployment of swh.loader.git v0.0.3
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 01 Oct 2015 11:36:28 +0200
swh-loader-git (0.0.2-1~swh1) unstable-swh; urgency=medium
* Prepare deploying swh.loader.git v0.0.2
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 29 Sep 2015 17:22:09 +0200
swh-loader-git (0.0.1-1~swh1) unstable-swh; urgency=medium
* Initial release
* Tagging swh.loader.git v0.0.1
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 25 Sep 2015 16:04:00 +0200
diff --git a/requirements-swh.txt b/requirements-swh.txt
index 38ea505..44d1ffe 100644
--- a/requirements-swh.txt
+++ b/requirements-swh.txt
@@ -1,5 +1,5 @@
swh.core >= 0.0.7
swh.loader.core >= 0.0.78
-swh.model >= 0.0.60
+swh.model >= 0.3.0
swh.scheduler >= 0.0.39
swh.storage >= 0.0.108
diff --git a/swh.loader.git.egg-info/PKG-INFO b/swh.loader.git.egg-info/PKG-INFO
index 401b595..abf9369 100644
--- a/swh.loader.git.egg-info/PKG-INFO
+++ b/swh.loader.git.egg-info/PKG-INFO
@@ -1,101 +1,101 @@
Metadata-Version: 2.1
Name: swh.loader.git
-Version: 0.1.1
+Version: 0.1.2
Summary: Software Heritage git loader
Home-page: https://forge.softwareheritage.org/diffusion/DLDG/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-loader-git
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-loader-git/
Description: swh-loader-git
==============
The Software Heritage Git Loader is a tool and a library to walk a local
Git repository and inject into the SWH dataset all contained files that
weren't known before.
License
-------
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public
License along with this program.
Dependencies
------------
### Runtime
- python3
- python3-dulwich
- python3-retrying
- python3-swh.core
- python3-swh.model
- python3-swh.storage
- python3-swh.scheduler
### Test
- python3-nose
Requirements
------------
- implementation language, Python3
- coding guidelines: conform to PEP8
- Git access: via dulwich
Configuration
-------------
You can run the loader from a remote origin (*loader*) or from an
origin on disk (*from_disk*) directly by calling:
```
python3 -m swh.loader.git.{loader,from_disk}
```
### Location
Both tools expect a configuration file.
Either one of the following location:
- /etc/softwareheritage/
- ~/.config/swh/
- ~/.swh/
Note: Will call that location $SWH_CONFIG_PATH
### Configuration sample
Respectively the loader from a remote (`git.yml`) and the loader from
a disk (`git-disk.yml`), $SWH_CONFIG_PATH/loader/git{-disk}.yml:
```
storage:
  cls: remote
  args:
    url: http://localhost:5002/
```
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
diff --git a/swh.loader.git.egg-info/requires.txt b/swh.loader.git.egg-info/requires.txt
index c8de554..1edea3b 100644
--- a/swh.loader.git.egg-info/requires.txt
+++ b/swh.loader.git.egg-info/requires.txt
@@ -1,14 +1,14 @@
dulwich>=0.18.7
retrying
vcversioner
click
swh.core>=0.0.7
swh.loader.core>=0.0.78
-swh.model>=0.0.60
+swh.model>=0.3.0
swh.scheduler>=0.0.39
swh.storage>=0.0.108
[testing]
pytest
pytest-mock
swh.scheduler[testing]
diff --git a/swh/loader/git/converters.py b/swh/loader/git/converters.py
index 0cfb46e..be3abac 100644
--- a/swh/loader/git/converters.py
+++ b/swh/loader/git/converters.py
@@ -1,189 +1,189 @@
# Copyright (C) 2015-2020 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information

"""Convert dulwich objects to dictionaries suitable for swh.storage"""

from typing import Any, Dict, Optional

from swh.model.hashutil import DEFAULT_ALGORITHMS, hash_to_bytes, MultiHash
from swh.model.model import (
    BaseContent,
    Content,
    Directory,
    DirectoryEntry,
    ObjectType,
    Person,
    Release,
    Revision,
    RevisionType,
    SkippedContent,
    TargetType,
    Timestamp,
    TimestampWithTimezone,
)

HASH_ALGORITHMS = DEFAULT_ALGORITHMS - {"sha1_git"}
def dulwich_blob_to_content_id(blob) -> Dict[str, Any]:
    """Convert a dulwich blob to a Software Heritage content id"""
    if blob.type_name != b"blob":
        raise ValueError("Argument is not a blob.")

    size = blob.raw_length()
    data = blob.as_raw_string()
    hashes = MultiHash.from_data(data, HASH_ALGORITHMS).digest()
    hashes["sha1_git"] = blob.sha().digest()
    hashes["length"] = size
    return hashes
def dulwich_blob_to_content(blob, max_content_size=None) -> BaseContent:
    """Convert a dulwich blob to a Software Heritage content

    """
    if blob.type_name != b"blob":
        raise ValueError("Argument is not a blob.")
    hashes = dulwich_blob_to_content_id(blob)
    if max_content_size is not None and hashes["length"] >= max_content_size:
        return SkippedContent(status="absent", reason="Content too large", **hashes,)
    else:
        return Content(data=blob.as_raw_string(), status="visible", **hashes,)
def dulwich_tree_to_directory(tree, log=None) -> Directory:
    """Format a tree as a directory"""
    if tree.type_name != b"tree":
        raise ValueError("Argument is not a tree.")

    entries = []

    entry_mode_map = {
        0o040000: "dir",
        0o160000: "rev",
        0o100644: "file",
        0o100755: "file",
        0o120000: "file",
    }

    for entry in tree.iteritems():
        entries.append(
            DirectoryEntry(
                type=entry_mode_map.get(entry.mode, "file"),
                perms=entry.mode,
                name=entry.path,
                target=hash_to_bytes(entry.sha.decode("ascii")),
            )
        )

-    return Directory(id=tree.sha().digest(), entries=entries,)
+    return Directory(id=tree.sha().digest(), entries=tuple(entries),)
def parse_author(name_email: bytes) -> Person:
    """Parse an author line"""
    return Person.from_fullname(name_email)


def dulwich_tsinfo_to_timestamp(
    timestamp, timezone, timezone_neg_utc
) -> TimestampWithTimezone:
    """Convert the dulwich timestamp information to a structure compatible with
    Software Heritage"""
    return TimestampWithTimezone(
        timestamp=Timestamp(seconds=int(timestamp), microseconds=0,),
        offset=timezone // 60,
        negative_utc=timezone_neg_utc if timezone == 0 else False,
    )
def dulwich_commit_to_revision(commit, log=None) -> Revision:
    if commit.type_name != b"commit":
        raise ValueError("Argument is not a commit.")

    git_metadata = []
    if commit.encoding is not None:
        git_metadata.append(["encoding", commit.encoding])
    if commit.mergetag:
        for mergetag in commit.mergetag:
            raw_string = mergetag.as_raw_string()
            assert raw_string.endswith(b"\n")
            git_metadata.append(["mergetag", raw_string[:-1]])

    if commit.extra:
        git_metadata.extend([k.decode("utf-8"), v] for k, v in commit.extra)

    if commit.gpgsig:
        git_metadata.append(["gpgsig", commit.gpgsig])

    if git_metadata:
        metadata: Optional[Dict[str, Any]] = {
            "extra_headers": git_metadata,
        }
    else:
        metadata = None

    return Revision(
        id=commit.sha().digest(),
        author=parse_author(commit.author),
        date=dulwich_tsinfo_to_timestamp(
            commit.author_time, commit.author_timezone, commit._author_timezone_neg_utc,
        ),
        committer=parse_author(commit.committer),
        committer_date=dulwich_tsinfo_to_timestamp(
            commit.commit_time, commit.commit_timezone, commit._commit_timezone_neg_utc,
        ),
        type=RevisionType.GIT,
        directory=bytes.fromhex(commit.tree.decode()),
        message=commit.message,
        metadata=metadata,
        synthetic=False,
-        parents=[bytes.fromhex(p.decode()) for p in commit.parents],
+        parents=tuple(bytes.fromhex(p.decode()) for p in commit.parents),
    )
DULWICH_TARGET_TYPES = {
    b"blob": TargetType.CONTENT,
    b"tree": TargetType.DIRECTORY,
    b"commit": TargetType.REVISION,
    b"tag": TargetType.RELEASE,
}

DULWICH_OBJECT_TYPES = {
    b"blob": ObjectType.CONTENT,
    b"tree": ObjectType.DIRECTORY,
    b"commit": ObjectType.REVISION,
    b"tag": ObjectType.RELEASE,
}
def dulwich_tag_to_release(tag, log=None) -> Release:
    if tag.type_name != b"tag":
        raise ValueError("Argument is not a tag.")

    target_type, target = tag.object
    if tag.tagger:
        author: Optional[Person] = parse_author(tag.tagger)
        if not tag.tag_time:
            date = None
        else:
            date = dulwich_tsinfo_to_timestamp(
                tag.tag_time, tag.tag_timezone, tag._tag_timezone_neg_utc,
            )
    else:
        author = date = None

    return Release(
        id=tag.sha().digest(),
        author=author,
        date=date,
        name=tag.name,
        target=bytes.fromhex(target.decode()),
        target_type=DULWICH_OBJECT_TYPES[target_type.type_name],
        message=tag._message,
        metadata=None,
        synthetic=False,
    )
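Note on the `converters.py` change above: the only functional edits are that `Directory.entries` and `Revision.parents` are now built as tuples instead of lists, presumably to match the `swh.model >= 0.3.0` requirement bumped in this same diff. The sketch below illustrates why the container type matters for equality and hashing. It uses a hypothetical `FakeRevision` dataclass as a stand-in and does not import or reproduce the real `swh.model` classes; the hex id is the parent id from the test in this diff, written in hexadecimal.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FakeRevision:
    """Hypothetical stand-in for a model object; NOT the swh.model class."""

    id: bytes
    parents: Tuple[bytes, ...]


def to_parents(hex_ids):
    """Mirror the converter change: build a tuple rather than a list."""
    return tuple(bytes.fromhex(h.decode()) for h in hex_ids)


parent_hex = [b"c3c588713233609f5bbbb2d9e7f3fb4a660f3f72"]

as_tuple = FakeRevision(id=b"\x01", parents=to_parents(parent_hex))
# Dataclasses do not enforce annotations, so a list slips through here.
as_list = FakeRevision(
    id=b"\x01", parents=[bytes.fromhex(h.decode()) for h in parent_hex]
)

# A list never compares equal to a tuple, so the two objects differ.
print(as_tuple == as_list)  # False

# Tuple-valued fields keep the frozen object hashable; a list-valued
# field would make hash() raise TypeError.
print(as_tuple in {as_tuple})  # True
```

This is also consistent with the test update in the next file, where the expected `parents` value changes from a one-element list to a one-element tuple: an expectation written with a list would no longer compare equal to what the converter returns.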
diff --git a/swh/loader/git/tests/test_converters.py b/swh/loader/git/tests/test_converters.py
index 8b71a80..849de2a 100644
--- a/swh/loader/git/tests/test_converters.py
+++ b/swh/loader/git/tests/test_converters.py
@@ -1,319 +1,319 @@
# Copyright (C) 2015-2018 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information

import os
import pytest
import shutil
import subprocess
import tempfile
import unittest

import dulwich.repo

from swh.model.hashutil import bytehex_to_hash, hash_to_bytes
from swh.model.model import (
    Content,
    Person,
    Release,
    Revision,
    RevisionType,
    ObjectType,
    Timestamp,
    TimestampWithTimezone,
)

import swh.loader.git.converters as converters

TEST_DATA = os.path.join(os.path.dirname(__file__), "data")
class SWHObjectType:
    """Dulwich lookalike ObjectType class

    """

    def __init__(self, type_name):
        self.type_name = type_name


class SWHTag:
    """Dulwich lookalike tag class

    """

    def __init__(
        self,
        name,
        type_name,
        target,
        target_type,
        tagger,
        tag_time,
        tag_timezone,
        message,
    ):
        self.name = name
        self.type_name = type_name
        self.object = SWHObjectType(target_type), target
        self.tagger = tagger
        self._message = message
        self.tag_time = tag_time
        self.tag_timezone = tag_timezone
        self._tag_timezone_neg_utc = False

    def sha(self):
        from hashlib import sha1

        return sha1()
@pytest.mark.fs
class TestConverters(unittest.TestCase):
@classmethod
def setUpClass(cls):
super().setUpClass()
cls.repo_path = tempfile.mkdtemp()
cls.repo = dulwich.repo.Repo.init_bare(cls.repo_path)
fast_export = os.path.join(
TEST_DATA, "git-repos", "example-submodule.fast-export.xz"
)
xz = subprocess.Popen(
["xzcat"], stdin=open(fast_export, "rb"), stdout=subprocess.PIPE,
)
git = subprocess.Popen(
["git", "fast-import", "--quiet"], stdin=xz.stdout, cwd=cls.repo_path,
)
# flush stdout of xz
xz.stdout.close()
git.communicate()
@classmethod
def tearDownClass(cls):
super().tearDownClass()
shutil.rmtree(cls.repo_path)

    def test_blob_to_content(self):
        content_id = b"28c6f4023d65f74e3b59a2dea3c4277ed9ee07b0"
        content = converters.dulwich_blob_to_content(self.repo[content_id])
        expected_content = Content(
            sha1_git=bytehex_to_hash(content_id),
            sha1=hash_to_bytes("4850a3420a2262ff061cb296fb915430fa92301c"),
            sha256=hash_to_bytes(
                "fee7c8a485a10321ad94b64135073cb5" "5f22cb9f57fa2417d2adfb09d310adef"
            ),
            blake2s256=hash_to_bytes(
                "5d71873f42a137f6d89286e43677721e574" "1fa05ce4cd5e3c7ea7c44d4c2d10b"
            ),
            data=(
                b'[submodule "example-dependency"]\n'
                b"\tpath = example-dependency\n"
                b"\turl = https://github.com/githubtraining/"
                b"example-dependency.git\n"
            ),
            length=124,
            status="visible",
        )
        self.assertEqual(content, expected_content)

    def test_convertion_wrong_input(self):
        class Something:
            type_name = b"something-not-the-right-type"

        m = {
            "blob": converters.dulwich_blob_to_content,
            "blob2": converters.dulwich_blob_to_content_id,
            "tree": converters.dulwich_tree_to_directory,
            "commit": converters.dulwich_tree_to_directory,
            "tag": converters.dulwich_tag_to_release,
        }
        for _callable in m.values():
            with self.assertRaises(ValueError):
                _callable(Something())

    def test_commit_to_revision(self):
        sha1 = b"9768d0b576dbaaecd80abedad6dfd0d72f1476da"
        revision = converters.dulwich_commit_to_revision(self.repo[sha1])
        expected_revision = Revision(
            id=hash_to_bytes("9768d0b576dbaaecd80abedad6dfd0d72f1476da"),
            directory=b"\xf0i\\./\xa7\xce\x9dW@#\xc3A7a\xa4s\xe5\x00\xca",
            type=RevisionType.GIT,
            committer=Person(
                name=b"Stefano Zacchiroli",
                fullname=b"Stefano Zacchiroli <zack@upsilon.cc>",
                email=b"zack@upsilon.cc",
            ),
            author=Person(
                name=b"Stefano Zacchiroli",
                fullname=b"Stefano Zacchiroli <zack@upsilon.cc>",
                email=b"zack@upsilon.cc",
            ),
            committer_date=TimestampWithTimezone(
                timestamp=Timestamp(seconds=1443083765, microseconds=0,),
                negative_utc=False,
                offset=120,
            ),
            message=b"add submodule dependency\n",
            metadata=None,
            date=TimestampWithTimezone(
                timestamp=Timestamp(seconds=1443083765, microseconds=0,),
                negative_utc=False,
                offset=120,
            ),
-            parents=[b"\xc3\xc5\x88q23`\x9f[\xbb\xb2\xd9\xe7\xf3\xfbJf\x0f?r"],
+            parents=(b"\xc3\xc5\x88q23`\x9f[\xbb\xb2\xd9\xe7\xf3\xfbJf\x0f?r",),
            synthetic=False,
        )
        self.assertEqual(revision, expected_revision)

    def test_author_line_to_author(self):
        # edge case out of the way
        with self.assertRaises(TypeError):
            converters.parse_author(None)
        tests = {
            b"a <b@c.com>": Person(
                name=b"a", email=b"b@c.com", fullname=b"a <b@c.com>",
            ),
            b"<foo@bar.com>": Person(
                name=None, email=b"foo@bar.com", fullname=b"<foo@bar.com>",
            ),
            b"malformed <email": Person(
                name=b"malformed", email=b"email", fullname=b"malformed <email"
            ),
            b"trailing <sp@c.e> ": Person(
                name=b"trailing", email=b"sp@c.e", fullname=b"trailing <sp@c.e> ",
            ),
            b"no<sp@c.e>": Person(name=b"no", email=b"sp@c.e", fullname=b"no<sp@c.e>",),
            b" <>": Person(name=None, email=None, fullname=b" <>",),
            b"something": Person(name=b"something", email=None, fullname=b"something"),
        }
        for author in sorted(tests):
            parsed_author = tests[author]
            self.assertEqual(parsed_author, converters.parse_author(author))

    def test_dulwich_tag_to_release_no_author_no_date(self):
        target = b"641fb6e08ddb2e4fd096dcf18e80b894bf"
        message = b"some release message"
        tag = SWHTag(
            name=b"blah",
            type_name=b"tag",
            target=target,
            target_type=b"commit",
            message=message,
            tagger=None,
            tag_time=None,
            tag_timezone=None,
        )
        # when
        actual_release = converters.dulwich_tag_to_release(tag)
        # then
        expected_release = Release(
            author=None,
            date=None,
            id=b"\xda9\xa3\xee^kK\r2U\xbf\xef\x95`\x18\x90\xaf\xd8\x07\t",
            message=message,
            metadata=None,
            name=b"blah",
            synthetic=False,
            target=hash_to_bytes(target.decode()),
            target_type=ObjectType.REVISION,
        )
        self.assertEqual(actual_release, expected_release)

    def test_dulwich_tag_to_release_author_and_date(self):
        tagger = b"hey dude <hello@mail.org>"
        target = b"641fb6e08ddb2e4fd096dcf18e80b894bf"
        message = b"some release message"
        import datetime
        date = datetime.datetime(2007, 12, 5, tzinfo=datetime.timezone.utc).timestamp()
        tag = SWHTag(
            name=b"blah",
            type_name=b"tag",
            target=target,
            target_type=b"commit",
            message=message,
            tagger=tagger,
            tag_time=date,
            tag_timezone=0,
        )
        # when
        actual_release = converters.dulwich_tag_to_release(tag)
        # then
        expected_release = Release(
            author=Person(
                email=b"hello@mail.org",
                fullname=b"hey dude <hello@mail.org>",
                name=b"hey dude",
            ),
            date=TimestampWithTimezone(
                negative_utc=False,
                offset=0,
                timestamp=Timestamp(seconds=1196812800, microseconds=0,),
            ),
            id=b"\xda9\xa3\xee^kK\r2U\xbf\xef\x95`\x18\x90\xaf\xd8\x07\t",
            message=message,
            metadata=None,
            name=b"blah",
            synthetic=False,
            target=hash_to_bytes(target.decode()),
            target_type=ObjectType.REVISION,
        )
        self.assertEqual(actual_release, expected_release)

    def test_dulwich_tag_to_release_author_no_date(self):
        # to reproduce bug T815 (fixed)
        tagger = b"hey dude <hello@mail.org>"
        target = b"641fb6e08ddb2e4fd096dcf18e80b894bf"
        message = b"some release message"
        tag = SWHTag(
            name=b"blah",
            type_name=b"tag",
            target=target,
            target_type=b"commit",
            message=message,
            tagger=tagger,
            tag_time=None,
            tag_timezone=None,
        )
        # when
        actual_release = converters.dulwich_tag_to_release(tag)
        # then
        expected_release = Release(
            author=Person(
                email=b"hello@mail.org",
                fullname=b"hey dude <hello@mail.org>",
                name=b"hey dude",
            ),
            date=None,
            id=b"\xda9\xa3\xee^kK\r2U\xbf\xef\x95`\x18\x90\xaf\xd8\x07\t",
            message=message,
            metadata=None,
            name=b"blah",
            synthetic=False,
            target=hash_to_bytes(target.decode()),
            target_type=ObjectType.REVISION,
        )
        self.assertEqual(actual_release, expected_release)
diff --git a/version.txt b/version.txt
index 17a6c63..670643b 100644
--- a/version.txt
+++ b/version.txt
@@ -1 +1 @@
-v0.1.1-0-g7323aec
\ No newline at end of file
+v0.1.2-0-g302c01e
\ No newline at end of file
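
The only changed line in the converter test hunk above swaps the expected `parents` value from a list to a one-element tuple. The sketch below illustrates why that matters for an `assertEqual` on the whole object, under the assumption that the model class compares its fields with plain `==`. `FakeRevision` is a hypothetical stand-in written for this illustration, not the real `swh.model.model.Revision`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FakeRevision:
    """Hypothetical stand-in for a revision model; not swh.model's class."""

    parents: Tuple[bytes, ...]


parent = b"\xc3\xc5\x88q23`\x9f[\xbb\xb2\xd9\xe7\xf3\xfbJf\x0f?r"

# Dataclasses do not enforce annotations at runtime, so a list is accepted.
with_list = FakeRevision(parents=[parent])
with_tuple = FakeRevision(parents=(parent,))

# A list never compares equal to a tuple, so the two objects differ
# even though they carry the same parent hash.
list_equals_tuple = with_list == with_tuple

# The tuple variant is hashable; a list field would make hash() raise.
tuple_is_hashable = isinstance(hash(with_tuple), int)
```

If the converter under test returns a tuple, an expected value built with a list would likely make the assertion fail even though the parent hashes are identical, which is consistent with the change to the expected value shown in the hunk.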
