diff --git a/PKG-INFO b/PKG-INFO
index 401b595..abf9369 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,101 +1,101 @@
Metadata-Version: 2.1
Name: swh.loader.git
-Version: 0.1.1
+Version: 0.1.2
Summary: Software Heritage git loader
Home-page: https://forge.softwareheritage.org/diffusion/DLDG/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-loader-git
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-loader-git/
Description: swh-loader-git
==============
The Software Heritage Git Loader is a tool and a library to walk a local
Git repository and inject into the SWH dataset all contained files that
weren't known before.
License
-------
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public
License along with this program.
Dependencies
------------
### Runtime
- python3
- python3-dulwich
- python3-retrying
- python3-swh.core
- python3-swh.model
- python3-swh.storage
- python3-swh.scheduler
### Test
- python3-nose
Requirements
------------
- implementation language: Python 3
- coding guidelines: conform to PEP8
- Git access: via dulwich
Configuration
-------------
You can run the loader from a remote origin (*loader*) or from an
origin on disk (*from_disk*) directly by calling one of:
```
python3 -m swh.loader.git.loader
python3 -m swh.loader.git.from_disk
```
### Location
Both tools expect a configuration file in one of the following locations:
- /etc/softwareheritage/
- ~/.config/swh/
- ~/.swh/
This location is referred to below as $SWH_CONFIG_PATH.
### Configuration sample
The remote loader reads `$SWH_CONFIG_PATH/loader/git.yml` and the disk
loader reads `$SWH_CONFIG_PATH/loader/git-disk.yml`. A sample configuration for either one:
```
storage:
  cls: remote
  args:
    url: http://localhost:5002/
```
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
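The Location section above lists three candidate directories, but this diff does not show how the loaders choose among them. The sketch below is a rough illustration only. It assumes that the directories are tried in the order listed and that the first existing `loader/<name>.yml` wins. `find_loader_config` is a hypothetical helper, not part of swh.loader.git or swh.core.

```python
import os
from pathlib import Path
from typing import Iterable, Optional

# The three directories listed in the README. Trying them in this order,
# and stopping at the first match, is an assumption made for this sketch.
CONFIG_DIRS = ("/etc/softwareheritage/", "~/.config/swh/", "~/.swh/")


def find_loader_config(name: str, dirs: Iterable[str] = CONFIG_DIRS) -> Optional[Path]:
    """Return the first existing <dir>/loader/<name>.yml, or None.

    Hypothetical helper: `name` is "git" for the remote loader and
    "git-disk" for the disk loader, following the README's file names.
    """
    for directory in dirs:
        # expanduser() turns "~/.swh/" into an absolute path under $HOME
        candidate = Path(os.path.expanduser(directory)) / "loader" / f"{name}.yml"
        if candidate.is_file():
            return candidate
    return None
```

Under these assumptions, `find_loader_config("git-disk")` would return the file the disk loader reads its settings from. It returns `None` when none of the directories contains one.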
diff --git a/debian/changelog b/debian/changelog
index 2149b57..86d9c0b 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,476 +1,478 @@
-swh-loader-git (0.1.1-1~swh1~bpo10+1) buster-swh; urgency=medium
+swh-loader-git (0.1.2-1~swh1) unstable-swh; urgency=medium
- * Rebuild for buster-swh
+ * New upstream release 0.1.2 - (tagged by Antoine Lambert
+ <antoine.lambert@inria.fr> on 2020-06-03 14:54:36 +0200)
+ * Upstream changes: - version 0.1.2
- -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 02 Jun 2020 15:51:35 +0000
+ -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 03 Jun 2020 12:57:56 +0000
swh-loader-git (0.1.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.1.1 - (tagged by Antoine Lambert
<antoine.lambert@inria.fr> on 2020-06-02 17:44:44 +0200)
* Upstream changes: - version 0.1.1
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 02 Jun 2020 15:50:08 +0000
swh-loader-git (0.1.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.1.0 - (tagged by Nicolas Dandrimont
<nicolas@dandrimont.eu> on 2020-05-29 10:33:12 +0200)
* Upstream changes: - Release swh.loader.git v0.1.0 - Use the
previous snapshot instead of any object from the archive to do -
incremental loads - Merge branch filtering behavior between the
local and remote loaders - Add default target branch for
symbolic references
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 29 May 2020 08:37:48 +0000
swh-loader-git (0.0.60-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.60 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2020-04-15 11:52:55
+0200)
* Upstream changes: - v0.0.60 - git.loader: fix failing origin
visit update step - Add a pyproject.toml file to target py37 for
black - Enable black
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 15 Apr 2020 10:05:51 +0000
swh-loader-git (0.0.59-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.59 - (tagged by Antoine Lambert
<antoine.lambert@inria.fr> on 2020-04-06 11:59:27 +0200)
* Upstream changes: - version 0.0.59
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 06 Apr 2020 10:04:59 +0000
swh-loader-git (0.0.58-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.58 - (tagged by Valentin Lorentz
<vlorentz@softwareheritage.org> on 2020-03-02 11:25:43 +0100)
* Upstream changes: - v0.0.58 - * Use origin_visit_get_latest
instead of snapshot_get_latest. - * Use swh-model objects
instead of dicts.
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 02 Mar 2020 10:28:37 +0000
swh-loader-git (0.0.57-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.57 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2020-02-07 03:32:49
+0100)
* Upstream changes: - v0.0.57 - loaders: Remove content size
computation during conversion
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 07 Feb 2020 02:45:56 +0000
swh-loader-git (0.0.56-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.56 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2020-01-28 13:24:24
+0100)
* Upstream changes: - v0.0.56 - git.loader: Migrate from
UnbufferedLoader to DVCSLoader
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 28 Jan 2020 12:27:02 +0000
swh-loader-git (0.0.55-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.55 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-12-12 14:41:10
+0100)
* Upstream changes: - v0.0.55 - loader: Bump dependency on
loader-core
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 12 Dec 2019 13:44:22 +0000
swh-loader-git (0.0.54-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.54 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-12-12 11:43:50
+0100)
* Upstream changes: - v0.0.54 - tasks: Enforce kwargs use in
task message
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 12 Dec 2019 10:46:28 +0000
swh-loader-git (0.0.53-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.53 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-12-10 11:24:30
+0100)
* Upstream changes: - v0.0.53 - tasks: Unify message format
with other loaders - tasks: Use celery's shared_task decorator
- tests: Migrate to pytest-mock's fixture - loader.git: Register
git worker - tasks: Rename task according to production -
git: Unify loaders constructor - Fix a typo reported by
codespell - Add a pre-commit config file - Migrate tox.ini
to extras = xxx instead of deps = .[testing] - De-specify
testenv:py3 - Drop version constraint on pytest < 4 -
Include all requirements in MANIFEST.in - Add support for
symbolic references
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 10 Dec 2019 10:27:32 +0000
swh-loader-git (0.0.52-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.52 - (tagged by Stefano Zacchiroli
<zack@upsilon.cc> on 2019-10-10 12:07:05 +0200)
* Upstream changes: - v0.0.52 - (brown paper bag release) -
* MANIFEST.in: ship py.typed
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 10 Oct 2019 10:12:13 +0000
swh-loader-git (0.0.51-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.51 - (tagged by Stefano Zacchiroli
<zack@upsilon.cc> on 2019-10-10 11:59:01 +0200)
* Upstream changes: - v0.0.51 - * tox.ini: Fix py3 environment
to use packaged tests - * typing: minimal changes to make a no-
op mypy run pass - * test_from_disk.py: avoid shadowing base
classes in tests
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 10 Oct 2019 10:02:08 +0000
swh-loader-git (0.0.50-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.50 - (tagged by Antoine Lambert
<antoine.lambert@inria.fr> on 2019-09-03 13:07:54 +0200)
* Upstream changes: - version 0.0.50
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 03 Sep 2019 11:13:19 +0000
swh-loader-git (0.0.49-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.49 - (tagged by Valentin Lorentz
<vlorentz@softwareheritage.org> on 2019-06-12 15:05:10 +0200)
* Upstream changes: - Use origin URLs instead of numeric ids.
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 19 Jun 2019 10:28:05 +0000
swh-loader-git (0.0.48-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.48 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-01-30 11:18:55
+0100)
* Upstream changes: - v0.0.48 - Bump dependency on swh-
scheduler 0.0.39 - Rewrite celery tasks as a decorated function
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 30 Jan 2019 10:22:22 +0000
swh-loader-git (0.0.43-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.43
* Support the new paginated snapshot branch fetching functions
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 18 Oct 2018 18:49:26 +0200
swh-loader-git (0.0.42-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.42
* Fix critical bug in incremental loading
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 11 Oct 2018 17:19:07 +0200
swh-loader-git (0.0.41-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.41
* Use explicit keyword argument for base_url in the load task
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 11 Oct 2018 16:26:27 +0200
swh-loader-git (0.0.40-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.40
* Improve python packaging
* Make the loader more robust against holes in the history caused by
* buggy imports
* Allow ignoring the history to make a full load
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 09 Oct 2018 16:28:14 +0200
swh-loader-git (0.0.39-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.39
* Avoid walking the history of large git repos, which takes a long
time
* Really save packfiles
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 20 Sep 2018 17:22:17 +0200
swh-loader-git (0.0.38-1~swh1) unstable-swh; urgency=medium
* v0.0.38
* Improve origin_visit initialization step
* Properly sandbox the prepare statement so that if it breaks, we can
* update appropriately the visit with the correct status
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 07 Mar 2018 11:39:30 +0100
swh-loader-git (0.0.37-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.37
* Remove spurious debug print
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 06 Feb 2018 16:00:40 +0100
swh-loader-git (0.0.36-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.36
* Update to use snapshots instead of occurrences
* Use dulwich get_transport_and_path rather than hardcode the tcp
transport
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 06 Feb 2018 14:42:36 +0100
swh-loader-git (0.0.35-1~swh1) unstable-swh; urgency=medium
* v0.0.35
* swh.loader.git.loader: Warn when object is corrupted and continue
* swh.loader.git.loader: Add structured data to the log message
regarding skipping objects
* swh.loader.git.loader: Force further checks on objects
* swh.loader.git.loader: Unify reading object from the repository
* swh.loader.git.loader: Warn when object malformed and continue
* swh.loader.git.loader: Trap missing object id and continue
* swh.loader.git.base: Reuse swh.loader.core base loader
* swh.loader.git.converters: Fix release time conversion issue when no
date provided
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 18 Dec 2017 12:08:01 +0100
swh-loader-git (0.0.34-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git version 0.0.34
* Update packaging runes
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 12 Oct 2017 20:12:11 +0200
swh-loader-git (0.0.33-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.33
* make the updater's parent commit cache more useful
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 15 Sep 2017 18:45:41 +0200
swh-loader-git (0.0.32-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git 0.0.32
* Update tasks to new swh.scheduler API
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Mon, 12 Jun 2017 18:04:50 +0200
swh-loader-git (0.0.31-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.31
* Migrate from swh.core.hashutil to swh.model.hashutil
* Only send objects that are actually missing
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 17 Mar 2017 17:40:17 +0100
swh-loader-git (0.0.30-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.30
* Fix handling of mergetag headers
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 09 Mar 2017 11:30:08 +0100
swh-loader-git (0.0.29-1~swh1) unstable-swh; urgency=medium
* v0.0.29
* GitLoaderFromArchive: Use the same configuration file as
* GitLoader (permit to deploy both as the same unit)
* git reader: Refactor to allow listing revisions as well as contents
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 20 Feb 2017 11:32:24 +0100
swh-loader-git (0.0.28-1~swh1) unstable-swh; urgency=medium
* v0.0.28
* loader: Fix fetch_date override
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 15 Feb 2017 18:43:32 +0100
swh-loader-git (0.0.27-1~swh1) unstable-swh; urgency=medium
* v0.0.27
* Add loader-git from archive
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 14 Feb 2017 18:56:52 +0100
swh-loader-git (0.0.26-1~swh1) unstable-swh; urgency=medium
* v0.0.26
* Add a git loader able to deal with git repository in archive
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 14 Feb 2017 16:24:50 +0100
swh-loader-git (0.0.25-1~swh1) unstable-swh; urgency=medium
* v0.0.25
* Fix to permit to actually pass the fetch date as parameter for
* the loading git disk loader
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 10 Feb 2017 17:34:35 +0100
swh-loader-git (0.0.24-1~swh1) unstable-swh; urgency=medium
* v0.0.24
* Update storage configuration reading
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 15 Dec 2016 18:40:29 +0100
swh-loader-git (0.0.23-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.23
* Make the save_data mechanism generic
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 02 Dec 2016 15:34:05 +0100
swh-loader-git (0.0.22-1~swh1) unstable-swh; urgency=medium
* v0.0.22
* Improve reader to permit to use it as analyzer tool
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 04 Nov 2016 10:37:24 +0100
swh-loader-git (0.0.21-1~swh1) unstable-swh; urgency=medium
* v0.0.21
* Improve the reader git to load all contents from a pack.
* Improve to avoid unnecessary readings from db
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 26 Oct 2016 17:06:12 +0200
swh-loader-git (0.0.20-1~swh1) unstable-swh; urgency=medium
* v0.0.20
* Add new reader git task
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 25 Oct 2016 18:40:17 +0200
swh-loader-git (0.0.19-1~swh1) unstable-swh; urgency=medium
* v0.0.19
* Update git loaders to register origin_visit's state
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 23 Aug 2016 16:34:15 +0200
swh-loader-git (0.0.18-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.18
* Properly handle skipped contents
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 19 Aug 2016 18:12:44 +0200
swh-loader-git (0.0.16-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.16
* Add exist_ok to packfile cache directory creation
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Mon, 01 Aug 2016 15:53:07 +0200
swh-loader-git (0.0.15-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.15
* Absence of remote refs doesn't throw an error in updater
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Wed, 15 Jun 2016 01:20:37 +0200
swh-loader-git (0.0.14-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.14
* Add a disk loader using dulwich
* Rework the loader logic to use a single pattern for both loaders
* Allow caching of packfiles for the remote loader
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 14 Jun 2016 18:10:21 +0200
swh-loader-git (0.0.13-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.13
* Update for latest schema revision
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 08 Apr 2016 16:46:41 +0200
swh-loader-git (0.0.12-1~swh1) unstable-swh; urgency=medium
* Release swh-loader-git v0.0.12
* Update to use new swh.storage api for object listing
* Add a size limit to packfiles
* Return a proper eventfulness for empty repositories
* Do not crawl the pack file if unnecessary
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 25 Feb 2016 18:21:34 +0100
swh-loader-git (0.0.11-1~swh1) unstable-swh; urgency=medium
* Release swh.loader.git v0.0.11
* Implement git updater
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 19 Feb 2016 19:13:22 +0100
swh-loader-git (0.0.10-1~swh1) unstable-swh; urgency=medium
* Prepare swh.loader.git release v0.0.10
* Update for swh.model
* Use new swh.storage
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Mon, 07 Dec 2015 18:59:46 +0100
swh-loader-git (0.0.9-1~swh1) unstable-swh; urgency=medium
* Prepare deployment of swh.loader.git v0.0.9
* Close fetch_history on failure too
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Wed, 04 Nov 2015 10:54:37 +0100
swh-loader-git (0.0.8-1~swh1) unstable-swh; urgency=medium
* Prepare deployment of swh.loader.git v0.0.8
* New database schema (v028)
* Populate fetch_history (T121)
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 27 Oct 2015 18:11:26 +0100
swh-loader-git (0.0.7-1~swh1) unstable-swh; urgency=medium
* Prepare swh.loader.git v0.0.7 deployment
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Mon, 19 Oct 2015 12:37:09 +0200
swh-loader-git (0.0.6-1~swh1) unstable-swh; urgency=medium
* Prepare deployment of swh.loader.git v0.0.6
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 09 Oct 2015 17:50:35 +0200
swh-loader-git (0.0.5-1~swh1) unstable-swh; urgency=medium
* Prepare deployment of swh.loader.git v0.0.5
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 06 Oct 2015 17:42:11 +0200
swh-loader-git (0.0.4-1~swh1) unstable-swh; urgency=medium
* Prepare deployment of swh.loader.git v0.0.4
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 02 Oct 2015 14:54:04 +0200
swh-loader-git (0.0.3-1~swh1) unstable-swh; urgency=medium
* Prepare deployment of swh.loader.git v0.0.3
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 01 Oct 2015 11:36:28 +0200
swh-loader-git (0.0.2-1~swh1) unstable-swh; urgency=medium
* Prepare deploying swh.loader.git v0.0.2
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Tue, 29 Sep 2015 17:22:09 +0200
swh-loader-git (0.0.1-1~swh1) unstable-swh; urgency=medium
* Initial release
* Tagging swh.loader.git v0.0.1
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 25 Sep 2015 16:04:00 +0200
diff --git a/requirements-swh.txt b/requirements-swh.txt
index 38ea505..44d1ffe 100644
--- a/requirements-swh.txt
+++ b/requirements-swh.txt
@@ -1,5 +1,5 @@
swh.core >= 0.0.7
swh.loader.core >= 0.0.78
-swh.model >= 0.0.60
+swh.model >= 0.3.0
swh.scheduler >= 0.0.39
swh.storage >= 0.0.108
diff --git a/swh.loader.git.egg-info/PKG-INFO b/swh.loader.git.egg-info/PKG-INFO
index 401b595..abf9369 100644
--- a/swh.loader.git.egg-info/PKG-INFO
+++ b/swh.loader.git.egg-info/PKG-INFO
@@ -1,101 +1,101 @@
Metadata-Version: 2.1
Name: swh.loader.git
-Version: 0.1.1
+Version: 0.1.2
Summary: Software Heritage git loader
Home-page: https://forge.softwareheritage.org/diffusion/DLDG/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-loader-git
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-loader-git/
Description: swh-loader-git
==============
The Software Heritage Git Loader is a tool and a library to walk a local
Git repository and inject into the SWH dataset all contained files that
weren't known before.
License
-------
This program is free software: you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation, either version 3 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General
Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public
License along with this program.
Dependencies
------------
### Runtime
- python3
- python3-dulwich
- python3-retrying
- python3-swh.core
- python3-swh.model
- python3-swh.storage
- python3-swh.scheduler
### Test
- python3-nose
Requirements
------------
- implementation language: Python 3
- coding guidelines: conform to PEP8
- Git access: via dulwich
Configuration
-------------
You can run the loader from a remote origin (*loader*) or from an
origin on disk (*from_disk*) directly by calling one of:
```
python3 -m swh.loader.git.loader
python3 -m swh.loader.git.from_disk
```
### Location
Both tools expect a configuration file in one of the following locations:
- /etc/softwareheritage/
- ~/.config/swh/
- ~/.swh/
This location is referred to below as $SWH_CONFIG_PATH.
### Configuration sample
The remote loader reads `$SWH_CONFIG_PATH/loader/git.yml` and the disk
loader reads `$SWH_CONFIG_PATH/loader/git-disk.yml`. A sample configuration for either one:
```
storage:
  cls: remote
  args:
    url: http://localhost:5002/
```
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
diff --git a/swh.loader.git.egg-info/requires.txt b/swh.loader.git.egg-info/requires.txt
index c8de554..1edea3b 100644
--- a/swh.loader.git.egg-info/requires.txt
+++ b/swh.loader.git.egg-info/requires.txt
@@ -1,14 +1,14 @@
dulwich>=0.18.7
retrying
vcversioner
click
swh.core>=0.0.7
swh.loader.core>=0.0.78
-swh.model>=0.0.60
+swh.model>=0.3.0
swh.scheduler>=0.0.39
swh.storage>=0.0.108
[testing]
pytest
pytest-mock
swh.scheduler[testing]
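In converters.py, `dulwich_blob_to_content_id` computes content ids in two ways. It takes `sha1_git` directly from dulwich (`blob.sha()`) and computes the remaining hashes with `MultiHash` over the raw blob data. The sketch below reproduces the same set of digests using only `hashlib`. It assumes that `DEFAULT_ALGORITHMS` covers sha1, sha1_git, sha256 and blake2s256, which are the four hashes that `test_blob_to_content` checks. The function names are hypothetical.

```python
import hashlib


def git_blob_sha1(data: bytes) -> bytes:
    """Raw 20-byte git object id of a blob: sha1 over "blob <len>\\0" + data."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).digest()


def content_hashes(data: bytes) -> dict:
    """Hypothetical hashlib-only stand-in for dulwich_blob_to_content_id."""
    return {
        "sha1": hashlib.sha1(data).digest(),  # plain sha1 of the bytes
        "sha1_git": git_blob_sha1(data),  # git's id, which includes the header
        "sha256": hashlib.sha256(data).digest(),
        "blake2s256": hashlib.blake2s(data, digest_size=32).digest(),
        "length": len(data),
    }


print(git_blob_sha1(b"").hex())  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The plain `sha1` and the git `sha1_git` of the same bytes differ because git prepends the object header before hashing.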
diff --git a/swh/loader/git/converters.py b/swh/loader/git/converters.py
index 0cfb46e..be3abac 100644
--- a/swh/loader/git/converters.py
+++ b/swh/loader/git/converters.py
@@ -1,189 +1,189 @@
 # Copyright (C) 2015-2020 The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU General Public License version 3, or any later version
 # See top-level LICENSE file for more information

 """Convert dulwich objects to dictionaries suitable for swh.storage"""

 from typing import Any, Dict, Optional

 from swh.model.hashutil import DEFAULT_ALGORITHMS, hash_to_bytes, MultiHash
 from swh.model.model import (
     BaseContent,
     Content,
     Directory,
     DirectoryEntry,
     ObjectType,
     Person,
     Release,
     Revision,
     RevisionType,
     SkippedContent,
     TargetType,
     Timestamp,
     TimestampWithTimezone,
 )


 HASH_ALGORITHMS = DEFAULT_ALGORITHMS - {"sha1_git"}


 def dulwich_blob_to_content_id(blob) -> Dict[str, Any]:
     """Convert a dulwich blob to a Software Heritage content id"""
     if blob.type_name != b"blob":
         raise ValueError("Argument is not a blob.")
     size = blob.raw_length()
     data = blob.as_raw_string()
     hashes = MultiHash.from_data(data, HASH_ALGORITHMS).digest()
     hashes["sha1_git"] = blob.sha().digest()
     hashes["length"] = size
     return hashes


 def dulwich_blob_to_content(blob, max_content_size=None) -> BaseContent:
     """Convert a dulwich blob to a Software Heritage content
     """
     if blob.type_name != b"blob":
         raise ValueError("Argument is not a blob.")
     hashes = dulwich_blob_to_content_id(blob)
     if max_content_size is not None and hashes["length"] >= max_content_size:
         return SkippedContent(status="absent", reason="Content too large", **hashes,)
     else:
         return Content(data=blob.as_raw_string(), status="visible", **hashes,)


 def dulwich_tree_to_directory(tree, log=None) -> Directory:
     """Format a tree as a directory"""
     if tree.type_name != b"tree":
         raise ValueError("Argument is not a tree.")
     entries = []
     entry_mode_map = {
         0o040000: "dir",
         0o160000: "rev",
         0o100644: "file",
         0o100755: "file",
         0o120000: "file",
     }
     for entry in tree.iteritems():
         entries.append(
             DirectoryEntry(
                 type=entry_mode_map.get(entry.mode, "file"),
                 perms=entry.mode,
                 name=entry.path,
                 target=hash_to_bytes(entry.sha.decode("ascii")),
             )
         )
-    return Directory(id=tree.sha().digest(), entries=entries,)
+    return Directory(id=tree.sha().digest(), entries=tuple(entries),)


 def parse_author(name_email: bytes) -> Person:
     """Parse an author line"""
     return Person.from_fullname(name_email)


 def dulwich_tsinfo_to_timestamp(
     timestamp, timezone, timezone_neg_utc
 ) -> TimestampWithTimezone:
     """Convert the dulwich timestamp information to a structure compatible with
     Software Heritage"""
     return TimestampWithTimezone(
         timestamp=Timestamp(seconds=int(timestamp), microseconds=0,),
         offset=timezone // 60,
         negative_utc=timezone_neg_utc if timezone == 0 else False,
     )


 def dulwich_commit_to_revision(commit, log=None) -> Revision:
     if commit.type_name != b"commit":
         raise ValueError("Argument is not a commit.")
     git_metadata = []
     if commit.encoding is not None:
         git_metadata.append(["encoding", commit.encoding])
     if commit.mergetag:
         for mergetag in commit.mergetag:
             raw_string = mergetag.as_raw_string()
             assert raw_string.endswith(b"\n")
             git_metadata.append(["mergetag", raw_string[:-1]])
     if commit.extra:
         git_metadata.extend([k.decode("utf-8"), v] for k, v in commit.extra)
     if commit.gpgsig:
         git_metadata.append(["gpgsig", commit.gpgsig])
     if git_metadata:
         metadata: Optional[Dict[str, Any]] = {
             "extra_headers": git_metadata,
         }
     else:
         metadata = None
     return Revision(
         id=commit.sha().digest(),
         author=parse_author(commit.author),
         date=dulwich_tsinfo_to_timestamp(
             commit.author_time, commit.author_timezone, commit._author_timezone_neg_utc,
         ),
         committer=parse_author(commit.committer),
         committer_date=dulwich_tsinfo_to_timestamp(
             commit.commit_time, commit.commit_timezone, commit._commit_timezone_neg_utc,
         ),
         type=RevisionType.GIT,
         directory=bytes.fromhex(commit.tree.decode()),
         message=commit.message,
         metadata=metadata,
         synthetic=False,
-        parents=[bytes.fromhex(p.decode()) for p in commit.parents],
+        parents=tuple(bytes.fromhex(p.decode()) for p in commit.parents),
     )


 DULWICH_TARGET_TYPES = {
     b"blob": TargetType.CONTENT,
     b"tree": TargetType.DIRECTORY,
     b"commit": TargetType.REVISION,
     b"tag": TargetType.RELEASE,
 }


 DULWICH_OBJECT_TYPES = {
     b"blob": ObjectType.CONTENT,
     b"tree": ObjectType.DIRECTORY,
     b"commit": ObjectType.REVISION,
     b"tag": ObjectType.RELEASE,
 }


 def dulwich_tag_to_release(tag, log=None) -> Release:
     if tag.type_name != b"tag":
         raise ValueError("Argument is not a tag.")
     target_type, target = tag.object
     if tag.tagger:
         author: Optional[Person] = parse_author(tag.tagger)
         if not tag.tag_time:
             date = None
         else:
             date = dulwich_tsinfo_to_timestamp(
                 tag.tag_time, tag.tag_timezone, tag._tag_timezone_neg_utc,
             )
     else:
         author = date = None
     return Release(
         id=tag.sha().digest(),
         author=author,
         date=date,
         name=tag.name,
         target=bytes.fromhex(target.decode()),
         target_type=DULWICH_OBJECT_TYPES[target_type.type_name],
         message=tag._message,
         metadata=None,
         synthetic=False,
     )
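This change passes a `tuple` instead of a `list` for `Directory.entries` and `Revision.parents`, alongside the `swh.model >= 0.3.0` requirement bump. One plausible motivation is an assumption, since the diff does not state it. Model objects are meant to be immutable values, and a frozen class is only hashable when its container fields are hashable too. The sketch below illustrates this with a standard-library frozen dataclass. `FakeRevision` is a hypothetical stand-in, not swh.model's `Revision`. The equality check also suggests why the expected `parents` value in test_converters.py likely had to become a tuple: a tuple never compares equal to a list.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FakeRevision:
    """Hypothetical stand-in for an immutable model object (not swh.model's class)."""

    id: bytes
    parents: Tuple[bytes, ...]


def is_hashable(obj) -> bool:
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


rev_id, parent = b"\x01" * 20, b"\x02" * 20
with_tuple = FakeRevision(id=rev_id, parents=(parent,))
# Type hints are not enforced at runtime, so a list is accepted here, but the
# generated __hash__ hashes all fields and fails on the unhashable list.
with_list = FakeRevision(id=rev_id, parents=[parent])

print(is_hashable(with_tuple))  # True
print(is_hashable(with_list))  # False
# Generated __eq__ compares field tuples, and (parent,) != [parent]
print(with_tuple == with_list)  # False
```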
diff --git a/swh/loader/git/tests/test_converters.py b/swh/loader/git/tests/test_converters.py
index 8b71a80..849de2a 100644
--- a/swh/loader/git/tests/test_converters.py
+++ b/swh/loader/git/tests/test_converters.py
@@ -1,319 +1,319 @@
# Copyright (C) 2015-2018 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
import os
import pytest
import shutil
import subprocess
import tempfile
import unittest
import dulwich.repo
from swh.model.hashutil import bytehex_to_hash, hash_to_bytes
from swh.model.model import (
Content,
Person,
Release,
Revision,
RevisionType,
ObjectType,
Timestamp,
TimestampWithTimezone,
)
import swh.loader.git.converters as converters
TEST_DATA = os.path.join(os.path.dirname(__file__), "data")
class SWHObjectType:
"""Dulwich lookalike ObjectType class
"""
def __init__(self, type_name):
self.type_name = type_name
class SWHTag:
"""Dulwich lookalike tag class
"""
def __init__(
self,
name,
type_name,
target,
target_type,
tagger,
tag_time,
tag_timezone,
message,
):
self.name = name
self.type_name = type_name
self.object = SWHObjectType(target_type), target
self.tagger = tagger
self._message = message
self.tag_time = tag_time
self.tag_timezone = tag_timezone
self._tag_timezone_neg_utc = False
def sha(self):
from hashlib import sha1
return sha1()
@pytest.mark.fs
class TestConverters(unittest.TestCase):
@classmethod
def setUpClass(cls):
super().setUpClass()
cls.repo_path = tempfile.mkdtemp()
cls.repo = dulwich.repo.Repo.init_bare(cls.repo_path)
fast_export = os.path.join(
TEST_DATA, "git-repos", "example-submodule.fast-export.xz"
)
xz = subprocess.Popen(
["xzcat"], stdin=open(fast_export, "rb"), stdout=subprocess.PIPE,
)
git = subprocess.Popen(
["git", "fast-import", "--quiet"], stdin=xz.stdout, cwd=cls.repo_path,
)
# flush stdout of xz
xz.stdout.close()
git.communicate()
@classmethod
def tearDownClass(cls):
super().tearDownClass()
shutil.rmtree(cls.repo_path)

    def test_blob_to_content(self):
        content_id = b"28c6f4023d65f74e3b59a2dea3c4277ed9ee07b0"

        content = converters.dulwich_blob_to_content(self.repo[content_id])

        expected_content = Content(
            sha1_git=bytehex_to_hash(content_id),
            sha1=hash_to_bytes("4850a3420a2262ff061cb296fb915430fa92301c"),
            sha256=hash_to_bytes(
                "fee7c8a485a10321ad94b64135073cb5" "5f22cb9f57fa2417d2adfb09d310adef"
            ),
            blake2s256=hash_to_bytes(
                "5d71873f42a137f6d89286e43677721e574" "1fa05ce4cd5e3c7ea7c44d4c2d10b"
            ),
            data=(
                b'[submodule "example-dependency"]\n'
                b"\tpath = example-dependency\n"
                b"\turl = https://github.com/githubtraining/"
                b"example-dependency.git\n"
            ),
            length=124,
            status="visible",
        )

        self.assertEqual(content, expected_content)

    def test_convertion_wrong_input(self):
        class Something:
            type_name = b"something-not-the-right-type"

        m = {
            "blob": converters.dulwich_blob_to_content,
            "blob2": converters.dulwich_blob_to_content_id,
            "tree": converters.dulwich_tree_to_directory,
            "commit": converters.dulwich_commit_to_revision,
            "tag": converters.dulwich_tag_to_release,
        }

        for _callable in m.values():
            with self.assertRaises(ValueError):
                _callable(Something())

    def test_commit_to_revision(self):
        sha1 = b"9768d0b576dbaaecd80abedad6dfd0d72f1476da"

        revision = converters.dulwich_commit_to_revision(self.repo[sha1])

        expected_revision = Revision(
            id=hash_to_bytes("9768d0b576dbaaecd80abedad6dfd0d72f1476da"),
            directory=b"\xf0i\\./\xa7\xce\x9dW@#\xc3A7a\xa4s\xe5\x00\xca",
            type=RevisionType.GIT,
            committer=Person(
                name=b"Stefano Zacchiroli",
                fullname=b"Stefano Zacchiroli <zack@upsilon.cc>",
                email=b"zack@upsilon.cc",
            ),
            author=Person(
                name=b"Stefano Zacchiroli",
                fullname=b"Stefano Zacchiroli <zack@upsilon.cc>",
                email=b"zack@upsilon.cc",
            ),
            committer_date=TimestampWithTimezone(
                timestamp=Timestamp(seconds=1443083765, microseconds=0,),
                negative_utc=False,
                offset=120,
            ),
            message=b"add submodule dependency\n",
            metadata=None,
            date=TimestampWithTimezone(
                timestamp=Timestamp(seconds=1443083765, microseconds=0,),
                negative_utc=False,
                offset=120,
            ),
-            parents=[b"\xc3\xc5\x88q23`\x9f[\xbb\xb2\xd9\xe7\xf3\xfbJf\x0f?r"],
+            parents=(b"\xc3\xc5\x88q23`\x9f[\xbb\xb2\xd9\xe7\xf3\xfbJf\x0f?r",),
            synthetic=False,
        )

        self.assertEqual(revision, expected_revision)

    def test_author_line_to_author(self):
        # edge case out of the way
        with self.assertRaises(TypeError):
            converters.parse_author(None)

        tests = {
            b"a <b@c.com>": Person(
                name=b"a", email=b"b@c.com", fullname=b"a <b@c.com>",
            ),
            b"<foo@bar.com>": Person(
                name=None, email=b"foo@bar.com", fullname=b"<foo@bar.com>",
            ),
            b"malformed <email": Person(
                name=b"malformed", email=b"email", fullname=b"malformed <email"
            ),
            b"trailing <sp@c.e> ": Person(
                name=b"trailing", email=b"sp@c.e", fullname=b"trailing <sp@c.e> ",
            ),
            b"no<sp@c.e>": Person(name=b"no", email=b"sp@c.e", fullname=b"no<sp@c.e>",),
            b" <>": Person(name=None, email=None, fullname=b" <>",),
            b"something": Person(name=b"something", email=None, fullname=b"something"),
        }

        for author in sorted(tests):
            parsed_author = tests[author]
            self.assertEqual(parsed_author, converters.parse_author(author))

    def test_dulwich_tag_to_release_no_author_no_date(self):
        target = b"641fb6e08ddb2e4fd096dcf18e80b894bf"
        message = b"some release message"
        tag = SWHTag(
            name=b"blah",
            type_name=b"tag",
            target=target,
            target_type=b"commit",
            message=message,
            tagger=None,
            tag_time=None,
            tag_timezone=None,
        )

        # when
        actual_release = converters.dulwich_tag_to_release(tag)

        # then
        expected_release = Release(
            author=None,
            date=None,
            id=b"\xda9\xa3\xee^kK\r2U\xbf\xef\x95`\x18\x90\xaf\xd8\x07\t",
            message=message,
            metadata=None,
            name=b"blah",
            synthetic=False,
            target=hash_to_bytes(target.decode()),
            target_type=ObjectType.REVISION,
        )

        self.assertEqual(actual_release, expected_release)

    def test_dulwich_tag_to_release_author_and_date(self):
        tagger = b"hey dude <hello@mail.org>"
        target = b"641fb6e08ddb2e4fd096dcf18e80b894bf"
        message = b"some release message"

        import datetime

        date = datetime.datetime(2007, 12, 5, tzinfo=datetime.timezone.utc).timestamp()

        tag = SWHTag(
            name=b"blah",
            type_name=b"tag",
            target=target,
            target_type=b"commit",
            message=message,
            tagger=tagger,
            tag_time=date,
            tag_timezone=0,
        )

        # when
        actual_release = converters.dulwich_tag_to_release(tag)

        # then
        expected_release = Release(
            author=Person(
                email=b"hello@mail.org",
                fullname=b"hey dude <hello@mail.org>",
                name=b"hey dude",
            ),
            date=TimestampWithTimezone(
                negative_utc=False,
                offset=0,
                timestamp=Timestamp(seconds=1196812800, microseconds=0,),
            ),
            id=b"\xda9\xa3\xee^kK\r2U\xbf\xef\x95`\x18\x90\xaf\xd8\x07\t",
            message=message,
            metadata=None,
            name=b"blah",
            synthetic=False,
            target=hash_to_bytes(target.decode()),
            target_type=ObjectType.REVISION,
        )

        self.assertEqual(actual_release, expected_release)

    def test_dulwich_tag_to_release_author_no_date(self):
        # to reproduce bug T815 (fixed)
        tagger = b"hey dude <hello@mail.org>"
        target = b"641fb6e08ddb2e4fd096dcf18e80b894bf"
        message = b"some release message"
        tag = SWHTag(
            name=b"blah",
            type_name=b"tag",
            target=target,
            target_type=b"commit",
            message=message,
            tagger=tagger,
            tag_time=None,
            tag_timezone=None,
        )

        # when
        actual_release = converters.dulwich_tag_to_release(tag)

        # then
        expected_release = Release(
            author=Person(
                email=b"hello@mail.org",
                fullname=b"hey dude <hello@mail.org>",
                name=b"hey dude",
            ),
            date=None,
            id=b"\xda9\xa3\xee^kK\r2U\xbf\xef\x95`\x18\x90\xaf\xd8\x07\t",
            message=message,
            metadata=None,
            name=b"blah",
            synthetic=False,
            target=hash_to_bytes(target.decode()),
            target_type=ObjectType.REVISION,
        )

        self.assertEqual(actual_release, expected_release)
diff --git a/version.txt b/version.txt
index 17a6c63..670643b 100644
--- a/version.txt
+++ b/version.txt
@@ -1 +1 @@
-v0.1.1-0-g7323aec
\ No newline at end of file
+v0.1.2-0-g302c01e
\ No newline at end of file
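
All three `dulwich_tag_to_release` tests expect the same release `id`. The reason is that `SWHTag.sha()` returns a fresh `hashlib.sha1()` that is never fed any data. Whatever the converter derives from it is therefore the SHA-1 of the empty byte string. This assumes the converter takes the id as `tag.sha().digest()`, which is a guess about its internals. A standard-library check that the expected bytes are exactly that digest:

```python
import hashlib

# Release id expected by the dulwich_tag_to_release tests.
EXPECTED_RELEASE_ID = b"\xda9\xa3\xee^kK\r2U\xbf\xef\x95`\x18\x90\xaf\xd8\x07\t"

# SWHTag.sha() returns sha1() without ever calling update(),
# so its digest is the SHA-1 of b"".
empty_digest = hashlib.sha1().digest()

print(empty_digest.hex())  # da39a3ee5e6b4b0d3255bfef95601890afd80709
print(empty_digest == EXPECTED_RELEASE_ID)  # True
```

As a consequence, these tests only check that the digest is carried through to the release. They do not check how a real dulwich tag's id is computed.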
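
The table in `test_author_line_to_author` effectively specifies how a git author line is split into name and email. Below is a hypothetical stand-alone sketch that satisfies every case in that table. It is not the `converters.parse_author` implementation, and the `Person` NamedTuple is only a stand-in for `swh.model.model.Person`.

```python
from typing import NamedTuple, Optional


class Person(NamedTuple):
    """Stand-in for swh.model.model.Person (illustration only)."""

    name: Optional[bytes]
    email: Optional[bytes]
    fullname: bytes


def parse_author_sketch(author: bytes) -> Person:
    """Hypothetical parser for git 'Name <email>' lines; not the swh code."""
    if not isinstance(author, bytes):
        # The test expects parse_author(None) to raise TypeError.
        raise TypeError("author line must be bytes")
    before, sep, after = author.partition(b"<")
    if not sep:
        # No '<': the whole line is treated as the name.
        return Person(name=author.strip() or None, email=None, fullname=author)
    # The name is whatever precedes '<'; the email runs to the first '>' or the end.
    name = before.strip() or None
    email = after.partition(b">")[0].strip() or None
    return Person(name=name, email=email, fullname=author)


print(parse_author_sketch(b"malformed <email"))
# Person(name=b'malformed', email=b'email', fullname=b'malformed <email')
```

Empty names and emails become `None` rather than `b""`. This matches the `b"<foo@bar.com>"` and `b" <>"` rows.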
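
The revision and release tests expect `TimestampWithTimezone` offsets in minutes (`offset=120`, `offset=0`), whereas dulwich records timezones as seconds east of UTC. Below is a minimal sketch of that conversion. It uses a hypothetical helper, `dulwich_time_to_swh_dict`, that returns a plain dict shaped like `TimestampWithTimezone`. The helper returns `None` when no time is given, which matches the no-date tag tests.

```python
import datetime
import math
from typing import Optional


def dulwich_time_to_swh_dict(
    time: Optional[float],
    tz_seconds: Optional[int],
    negative_utc: bool = False,
) -> Optional[dict]:
    """Hypothetical helper: dulwich (time, timezone) -> TimestampWithTimezone-like dict."""
    if time is None or tz_seconds is None:
        # A tag without a date converts to date=None.
        return None
    seconds = math.floor(time)
    # The fractional part is in [0, 1), so microseconds stay in [0, 1_000_000).
    microseconds = int((time - seconds) * 1_000_000)
    return {
        "timestamp": {"seconds": seconds, "microseconds": microseconds},
        # dulwich stores seconds east of UTC; swh stores minutes.
        "offset": int(tz_seconds / 60),
        "negative_utc": negative_utc,
    }


# The tag date used in test_dulwich_tag_to_release_author_and_date.
tag_time = datetime.datetime(2007, 12, 5, tzinfo=datetime.timezone.utc).timestamp()
print(dulwich_time_to_swh_dict(tag_time, 0))
print(dulwich_time_to_swh_dict(1443083765, 7200)["offset"])  # 120
```

`int(tz_seconds / 60)` truncates toward zero, so negative offsets such as `-19800` (UTC-05:30) map to `-330`. Floor division would give the same result only for whole-minute offsets.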