diff --git a/PKG-INFO b/PKG-INFO
index abb4b2e..728206b 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,122 +1,122 @@
Metadata-Version: 2.1
Name: swh.lister
-Version: 0.10.0
+Version: 1.0.0
Summary: Software Heritage lister
Home-page: https://forge.softwareheritage.org/diffusion/DLSGH/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-lister
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-lister/
Description: swh-lister
==========
This component from the Software Heritage stack aims to produce listings
of software origins and their urls hosted on various public developer platforms
or package managers. As these operations are quite similar, it provides a set of
Python modules abstracting common software origins listing behaviors.
It also provides several lister implementations, contained in the
following Python modules:
- `swh.lister.bitbucket`
- `swh.lister.cgit`
- `swh.lister.cran`
- `swh.lister.debian`
- `swh.lister.gitea`
- `swh.lister.github`
- `swh.lister.gitlab`
- `swh.lister.gnu`
- `swh.lister.launchpad`
- `swh.lister.npm`
- `swh.lister.packagist`
- `swh.lister.phabricator`
- `swh.lister.pypi`
Dependencies
------------
All required dependencies can be found in the `requirements*.txt` files located
at the root of the repository.
Local deployment
----------------
## lister configuration
Each lister implemented so far by Software Heritage (`bitbucket`, `cgit`, `cran`, `debian`,
`gitea`, `github`, `gitlab`, `gnu`, `launchpad`, `npm`, `packagist`, `phabricator`, `pypi`)
must be configured by following the instructions below (note that you have to replace
`<lister_name>` with one of the lister names listed above).
### Preparation steps
1. `mkdir ~/.config/swh/`
2. create configuration file `~/.config/swh/listers.yml`
### Configuration file sample
Minimalistic configuration shared by all listers to add in file `~/.config/swh/listers.yml`:
```lang=yml
scheduler:
  cls: 'remote'
  args:
    url: 'http://localhost:5008/'
credentials: {}
```
Note: this expects the scheduler service to be running locally on port 5008.
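The preparation steps and the sample above can also be scripted. The following is a minimal sketch using only the Python standard library; the helper name `write_lister_config` is hypothetical and not part of `swh.lister`, and the YAML text assumes `cls` and `args` are nested under `scheduler`, with `url` nested under `args`.

```python
from pathlib import Path

# Minimal configuration shared by all listers. The nesting below is an
# assumption about the intended layout of the sample configuration.
MINIMAL_CONFIG = """\
scheduler:
  cls: 'remote'
  args:
    url: 'http://localhost:5008/'
credentials: {}
"""


def write_lister_config(config_dir: Path) -> Path:
    """Create config_dir if needed and write listers.yml into it.

    Hypothetical helper: it only mirrors the two manual preparation steps
    (mkdir, then create the configuration file).
    """
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "listers.yml"
    config_path.write_text(MINIMAL_CONFIG)
    return config_path


if __name__ == "__main__":
    # Equivalent to `mkdir ~/.config/swh/` followed by creating the file.
    print(write_lister_config(Path.home() / ".config" / "swh"))
```

Note that running the script overwrites an existing `listers.yml`, so any credentials already configured there would be lost.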
## Executing a lister
Once configured, a lister can be executed by using the `swh` CLI tool with the
following options and commands:
```
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister <lister_name> [lister_parameters]
```
Examples:
```
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister bitbucket
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister cran
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitea url=https://codeberg.org/api/v1/
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitlab url=https://salsa.debian.org/api/v4/
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister npm
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister pypi
```
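The invocations above all follow the same pattern, so they can be generated programmatically. The sketch below only assembles the argument list shown in the examples; `build_lister_command` is a hypothetical helper, not an `swh.lister` API, and it assumes lister parameters are always passed as trailing `key=value` arguments as in the `gitea` and `gitlab` examples.

```python
import os
import shlex
from typing import Dict, List, Optional


def build_lister_command(
    lister_name: str,
    config_path: str = "~/.config/swh/listers.yml",
    log_level: str = "DEBUG",
    parameters: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return the argv list for one `swh lister run` invocation.

    Hypothetical helper: it reproduces the command shape of the examples
    above and does not validate the lister name or its parameters.
    """
    cmd = [
        "swh", "--log-level", log_level,
        "lister", "-C", os.path.expanduser(config_path),
        "run", "--lister", lister_name,
    ]
    # Lister parameters are appended as key=value pairs.
    for key, value in (parameters or {}).items():
        cmd.append(f"{key}={value}")
    return cmd


if __name__ == "__main__":
    argv = build_lister_command(
        "gitlab", parameters={"url": "https://salsa.debian.org/api/v4/"}
    )
    # Print a shell-quoted form; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.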
Licensing
---------
This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public License
along with this program.
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
diff --git a/debian/changelog b/debian/changelog
index 09759ef..fc12832 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,772 +1,776 @@
-swh-lister (0.10.0-1~swh1~bpo10+1) buster-swh; urgency=medium
+swh-lister (1.0.0-1~swh1) unstable-swh; urgency=medium
- * Rebuild for buster-swh
+ * New upstream release 1.0.0 - (tagged by Nicolas Dandrimont
+ <nicolas@dandrimont.eu> on 2021-03-22 10:56:04 +0100)
+ * Upstream changes: - Release swh.lister v1.0.0 - All listers
+ have been rewritten and are ready to be used in production -
+ with the most recent version of the swh.scheduler APIs.
- -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 01 Mar 2021 09:03:55 +0000
+ -- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 22 Mar 2021 10:13:35 +0000
swh-lister (0.10.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.10.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-03-01 09:59:16
+0100)
* Upstream changes: - v0.10.0 - docs: Add new "howto write a
lister tutorial" with unified lister api
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 01 Mar 2021 09:01:54 +0000
swh-lister (0.9.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.9.1 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-02-08 14:09:27
+0100)
* Upstream changes: - v0.9.1 - debian: Update archive mirror
URL templates to process
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 08 Feb 2021 13:12:05 +0000
swh-lister (0.9.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.9.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-02-08 08:50:07
+0100)
* Upstream changes: - v0.9.0 - docs: Update listers execution
instructions - cran: Prevent multiple listing of an origin -
cran: Add support for parsing date with milliseconds - pypi: Use
BeautifulSoup for parsing HTML instead of xmltodict
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 08 Feb 2021 07:52:57 +0000
swh-lister (0.8.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.8.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-02-03 11:12:52
+0100)
* Upstream changes: - v0.8.0 - packagist: Reimplement lister
using new Lister API - gnu: Remove dependency on pytz -
Remove no longer used models field in dict returned by register -
Remove no longer used legacy Lister API and update CLI options
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 03 Feb 2021 10:15:54 +0000
swh-lister (0.7.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.7.1 - (tagged by Vincent SELLIER
<vincent.sellier@softwareheritage.org> on 2021-02-01 17:52:33 +0100)
* Upstream changes: - v0.7.1 - * cgit: remove the repository
urls's trailing /
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 01 Feb 2021 16:56:35 +0000
swh-lister (0.7.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.7.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-02-01 09:31:30
+0100)
* Upstream changes: - v0.7.0 - pattern: Bump packet split to
chunk of 1000 records - cgit: Compute origin urls out of a base
git url when provided. - gnu: Reimplement lister using new
Lister API
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 01 Feb 2021 08:35:14 +0000
swh-lister (0.6.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.6.1 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-01-29 09:07:21
+0100)
* Upstream changes: - v0.6.1 - launchpad: Remove call to
dataclasses.asdict on lister state - launchpad: Prevent error
due to origin listed twice - Make debian lister constructors
compatible with credentials - launchpad/tasks: Fix ping task
function name - pattern: Make lister flush regularly origins to
scheduler
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 29 Jan 2021 08:11:13 +0000
swh-lister (0.6.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.6.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-01-28 15:48:32
+0100)
* Upstream changes: - v0.6.0 - launchpad: Reimplement lister
using new Lister API - Make stateless lister constructors
compatible with credentials
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 28 Jan 2021 14:52:49 +0000
swh-lister (0.5.4-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.5.4 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-01-28 11:23:29
+0100)
* Upstream changes: - v0.5.4 - gitlab: Deal with missing or
trailing / in url input - tox.ini: Work around build failure due
to upstream release
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 28 Jan 2021 10:27:59 +0000
swh-lister (0.5.2-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.5.2 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-01-27 17:19:10
+0100)
* Upstream changes: - v0.5.2 - test_cli: Drop launchpad lister
from the test_get_lister
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 27 Jan 2021 16:25:31 +0000
swh-lister (0.5.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.5.1 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-01-27 16:39:20
+0100)
* Upstream changes: - v0.5.1 - launchpad: Actually mock the
anonymous login to launchpad - Drop no longer
swh.lister.core.{indexing,page_by_page}_lister - tests: Drop
unneeded reset instruction - cgit: Don't stop the listing when a
repository page is not available
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 27 Jan 2021 15:47:39 +0000
swh-lister (0.5.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.5.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2021-01-27 14:33:24
+0100)
* Upstream changes: - v0.5.0 - cgit: Add support for
last_update information during listing - Port Debian lister to
new lister api - gitlab: Implement keyset-based pagination
listing - cran: Retrieve last update date for each listed
package - Port CRAN lister to new lister api - gitlab: Add
support for last_update information during listing - Port Gitea
lister to new lister api - Port cgit lister to the new lister
api - bitbucket: Pick random credentials in configuration and
improve logging - Port Gitlab lister to the new lister api -
Port Npm lister to new lister api - Port PyPI lister to new
lister api - Port Bitbucket lister to new lister api - Port
Phabricator lister to new lister api - Port GitHub lister to new
lister api - Introduce a simpler base pattern for lister
implementations
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 27 Jan 2021 13:40:34 +0000
swh-lister (0.4.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.4.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-11-23 15:47:05
+0100)
* Upstream changes: - v0.4.0 - requirements: Rework
dependencies - tests: Reduce db initialization fixtures to a
minimum - Create listing task with a default of 3 if unspecified
- lister.pytest_plugin: Simplify fixture setup - tests: Clarify
listers test configuration
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 23 Nov 2020 14:52:03 +0000
swh-lister (0.3.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.3.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-10-19 09:50:43
+0200)
* Upstream changes: - v0.3.0 - lister.config: Adapt scheduler
configuration structure - drop mock_get_scheduler which creates
indirection for no good reason
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 19 Oct 2020 07:56:17 +0000
swh-lister (0.2.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.2.1 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-10-07 14:02:42
+0200)
* Upstream changes: - v0.2.1 - lister_base: Drop leftover
mixin SWHConfig which is no longer used
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 07 Oct 2020 12:07:43 +0000
swh-lister (0.2.0-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.2.0 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-10-06 09:33:33
+0200)
* Upstream changes: - v0.2.0 - lister*: Migrate away from
SWHConfig mixin - tox.ini: pin black to the pre-commit version
(19.10b0) to avoid flip-flops - Run isort after the CLI import
changes
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 06 Oct 2020 07:36:07 +0000
swh-lister (0.1.5-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.1.5 - (tagged by David Douard
<david.douard@sdfa3.org> on 2020-09-25 11:51:57 +0200)
* Upstream changes: - v0.1.5
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 25 Sep 2020 09:55:44 +0000
swh-lister (0.1.4-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.1.4 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-09-10 11:32:46
+0200)
* Upstream changes: - v0.1.4 - gitea.lister: Fix uid to be
unique across instance - utils.split_range: Split into not
overlapping ranges - gitea.tasks: Fix parameter name from 'sort'
to 'order'
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 10 Sep 2020 09:35:53 +0000
swh-lister (0.1.3-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.1.3 - (tagged by Vincent SELLIER
<vincent.sellier@softwareheritage.org> on 2020-09-08 14:48:08 +0200)
* Upstream changes: - v0.1.3 - Launchpad: rename task name to
match conventions - tests: Separate lister instantiations
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 08 Sep 2020 12:53:22 +0000
swh-lister (0.1.2-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.1.2 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-09-02 13:07:30
+0200)
* Upstream changes: - v0.1.2 - pytest_plugin: Instantiate only
lister with no particular setup - pytest: Define plugin and
declare it in the root conftest
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 02 Sep 2020 11:10:14 +0000
swh-lister (0.1.1-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.1.1 - (tagged by Antoine R. Dumont
(@ardumont) <ardumont@softwareheritage.org> on 2020-09-01 16:08:48
+0200)
* Upstream changes: - v0.1.1 - test_cli: Exclude launchpad
lister from the check
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 01 Sep 2020 14:11:46 +0000
swh-lister (0.1.0-1~swh2) unstable-swh; urgency=medium
* Update dependencies
-- Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Wed, 26 Aug 2020 16:05:03 +0000
swh-lister (0.1.0-1~swh1) unstable-swh; urgency=medium
[ Nicolas Dandrimont ]
* Use setuptools-scm instead of vcversioner
[ Software Heritage autobuilder (on jenkins-debian1) ]
* New upstream release 0.1.0 - (tagged by David Douard
<david.douard@sdfa3.org> on 2020-08-25 18:33:55 +0200)
* Upstream changes: - v0.1.0
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 25 Aug 2020 16:39:28 +0000
swh-lister (0.0.50-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.50 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2020-01-20 10:44:57
+0100)
* Upstream changes: - v0.0.50 - github.lister: Filter out
partial repositories which break listing - docs: Fix sphinx
warnings - core.lister_base: Improve slightly docs and types
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 20 Jan 2020 09:51:23 +0000
swh-lister (0.0.49-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.49 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2020-01-17 14:20:35
+0100)
* Upstream changes: - v0.0.49 - github.lister: Use Retry-After
header when rate limit reached
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 17 Jan 2020 13:27:56 +0000
swh-lister (0.0.48-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.48 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2020-01-16 13:56:12
+0100)
* Upstream changes: - v0.0.48 - cran.lister: Use cran's
canonical url for origin url - cran.lister: Version uid so we
can list new package versions - cran.lister: Adapt docstring
sample accordingly
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 16 Jan 2020 13:03:54 +0000
swh-lister (0.0.47-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.47 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2020-01-09 10:26:18
+0100)
* Upstream changes: - v0.0.47 - cran.lister: Align loading
tasks' with loader's expectation
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 09 Jan 2020 09:34:26 +0000
swh-lister (0.0.46-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.46 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-12-19 14:09:45
+0100)
* Upstream changes: - v0.0.46 - lister.debian: Make debian
init step idempotent and up-to-date - lister_base: Split into
chunks the tasks prior to creation
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 19 Dec 2019 13:16:45 +0000
swh-lister (0.0.45-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.45 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-12-10 11:27:17
+0100)
* Upstream changes: - v0.0.45 - core: Align listers' task
output (hg/git tasks) with expected format - npm: Align lister's
loader output tasks with expected format - lister/tasks:
Standardize return statements
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 10 Dec 2019 10:32:45 +0000
swh-lister (0.0.44-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.44 - (tagged by Nicolas Dandrimont
<nicolas@dandrimont.eu> on 2019-11-22 16:15:54 +0100)
* Upstream changes: - Release swh.lister v0.0.44 - Define
proper User Agents everywhere
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 22 Nov 2019 15:31:33 +0000
swh-lister (0.0.43-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.43 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-11-21 18:46:35
+0100)
* Upstream changes: - v0.0.43 - lister.pypi: Align lister with
pypi package loader - lister.npm: Align lister with npm package
loader - lister.tests: Avoid duplication setup step - Fix
typos (and trailing ws) reported by codespell - Add a pre-commit
config file
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 21 Nov 2019 17:56:34 +0000
swh-lister (0.0.42-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.42 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-11-21 13:52:16
+0100)
* Upstream changes: - v0.0.42 - cran/gnu: Rename task_type to
load-archive-files - lister.tests: Add missing task_type for
package listers - Migrate tox.ini to extras = xxx instead of
deps = .[testing] - Merge tox environments - Include all
requirements in MANIFEST.in - lister.cli: Remove task type
register cli
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 21 Nov 2019 13:00:29 +0000
swh-lister (0.0.41-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.41 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-11-15 12:02:13
+0100)
* Upstream changes: - v0.0.41 - simple_lister: Flush to db
more frequently - gnu.lister: Use url as primary key -
gnu.lister.tests: Add missing assertion - gnu.lister: Add
missing retries_left parameter - debian.models: Migrate tests
from storage to debian lister model
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 15 Nov 2019 11:06:35 +0000
swh-lister (0.0.40-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.40 - (tagged by Nicolas Dandrimont
<nicolas@dandrimont.eu> on 2019-11-13 13:54:38 +0100)
* Upstream changes: - Release swh.lister 0.0.40 - Fix bogus
NotImplementedError on Area.index_uris
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 13 Nov 2019 13:02:08 +0000
swh-lister (0.0.39-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.39 - (tagged by Nicolas Dandrimont
<nicolas@dandrimont.eu> on 2019-11-13 13:23:31 +0100)
* Upstream changes: - Release swh.lister 0.0.39 - Properly
register all tasks - Fix up db_partition_indices to avoid
expensive scans
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 13 Nov 2019 12:28:33 +0000
swh-lister (0.0.38-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.38 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-11-06 15:55:46
+0100)
* Upstream changes: - v0.0.38 - Remove swh.storage.schemata
remnants
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 06 Nov 2019 15:00:16 +0000
swh-lister (0.0.37-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.37 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-11-06 15:06:51
+0100)
* Upstream changes: - v0.0.37 - Update swh-core dependency
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 06 Nov 2019 14:18:31 +0000
swh-lister (0.0.36-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.36 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-11-06 11:33:33
+0100)
* Upstream changes: - v0.0.36 - lister.*.tests: Add at least
one integration test - gnu.lister: Move gnu listers specifity
within the lister's scope - debian/lister: Use url parameter
name instead of origin - debian/model: Install lister model
within the lister repository - lister.*.tasks: Stop binding
tasks to a specific instance of the - celery app -
cran.lister: Refactor and fix cran lister - github/lister:
Prevent erroneous scheduler tasks disabling -
phabricator/lister: Fix lister - setup.py: Kill deprecated swh-
lister command - Bootstrap typing annotations
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 06 Nov 2019 10:55:41 +0000
swh-lister (0.0.35-1~swh4) unstable-swh; urgency=medium
* Fix runtime dependencies
-- Antoine R. Dumont (@ardumont) <ardumont@softwareheritage.org> Wed, 11 Sep 2019 10:58:01 +0200
swh-lister (0.0.35-1~swh3) unstable-swh; urgency=medium
* Bump dh-python to >= 3 for pybuild.testfiles.
-- Nicolas Dandrimont <olasd@debian.org> Tue, 10 Sep 2019 14:58:11 +0200
swh-lister (0.0.35-1~swh2) unstable-swh; urgency=medium
* Add egg-info to pybuild.testfiles. Close T1995.
-- Nicolas Dandrimont <olasd@debian.org> Tue, 10 Sep 2019 14:36:22 +0200
swh-lister (0.0.35-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.35 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-09-09 12:14:42
+0200)
* Upstream changes: - v0.0.35 - Fix debian package
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 09 Sep 2019 10:19:02 +0000
swh-lister (0.0.34-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.34 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-09-06 14:03:39
+0200)
* Upstream changes: - v0.0.34 - listers: Implement listers as
plugins - cgit: rewrite the CGit lister (and add more tests)
- listers: simplify and unify constructor use - phabricator:
randomly select the API token in the provided list - docs: Fix
toc
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 06 Sep 2019 12:09:13 +0000
swh-lister (0.0.33-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.33 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-08-29 10:23:20
+0200)
* Upstream changes: - v0.0.33 - lister.cli: Allow to list
forges with policy and priority - listers: Add New packagist
lister - listers: Allow to override policy and priority for
scheduled tasks - tests: Add tests to cli, pypi and improve
lister core's - docs: Add code of conduct document
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 29 Aug 2019 08:28:23 +0000
swh-lister (0.0.32-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.32 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-06-28 18:21:50
+0200)
* Upstream changes: - v0.0.32 - Clean up dead code - Add
missing *.html sample for tests to run in packaging
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 28 Jun 2019 16:42:05 +0000
swh-lister (0.0.31-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.31 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-06-28 17:57:48
+0200)
* Upstream changes: - v0.0.31 - Add cgit instance lister -
Add back description in cran lister - Update contributors
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Fri, 28 Jun 2019 16:06:25 +0000
swh-lister (0.0.30-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.30 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-06-26 14:52:13
+0200)
* Upstream changes: - v0.0.30 - Drop last description mentions
for gitlab and cran listers.
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 26 Jun 2019 13:02:11 +0000
swh-lister (0.0.29-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.29 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-06-26 12:37:14
+0200)
* Upstream changes: - v0.0.29 - lister: Fix bitbucket lister
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 26 Jun 2019 10:47:20 +0000
swh-lister (0.0.28-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.28 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-06-20 12:00:09
+0200)
* Upstream changes: - v0.0.28 - listers: Remove unused columns
`origin_id` / `description` - gnu-lister: Use origin-type as
'tar' (and not 'gnu') - phabricator: Remove unused code
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 20 Jun 2019 10:07:48 +0000
swh-lister (0.0.27-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.27 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-06-18 10:27:09
+0200)
* Upstream changes: - v0.0.27 - Unify lister tablenames to use
consistently singular - Add missing instance field to
phabricator repository model
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Tue, 18 Jun 2019 08:44:38 +0000
swh-lister (0.0.26-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.26 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-06-17 17:53:33
+0200)
* Upstream changes: - v0.0.26 - phabricator.lister: Use
credentials setup from configuration file - gitlab.lister:
Remove request_params method override
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Mon, 17 Jun 2019 16:05:05 +0000
swh-lister (0.0.25-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.25 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-06-13 15:54:42
+0200)
* Upstream changes: - v0.0.25 - Add new cran lister -
listers: Stop creating origins when scheduling new tasks
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 13 Jun 2019 13:59:30 +0000
swh-lister (0.0.24-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.24 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-06-12 12:02:54
+0200)
* Upstream changes: - v0.0.24 - swh.lister.gnu: Add new gnu
lister
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 12 Jun 2019 10:10:56 +0000
swh-lister (0.0.23-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.23 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-05-29 14:04:22
+0200)
* Upstream changes: - v0.0.23 - lister: Unify credentials
structure between listers
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Wed, 29 May 2019 12:10:51 +0000
swh-lister (0.0.22-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.22 - (tagged by Antoine Lambert
<antoine.lambert@inria.fr> on 2019-05-23 10:59:39 +0200)
* Upstream changes: - version 0.0.22
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 23 May 2019 09:05:34 +0000
swh-lister (0.0.21-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.21 - (tagged by Antoine Lambert
<antoine.lambert@inria.fr> on 2019-04-11 11:00:55 +0200)
* Upstream changes: - version 0.0.21
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 11 Apr 2019 09:05:30 +0000
swh-lister (0.0.20-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.20 - (tagged by Antoine R. Dumont
(@ardumont) <antoine.romain.dumont@gmail.com> on 2019-02-14 10:50:06
+0100)
* Upstream changes: - v0.0.20 - d/*: debian packaging files
migrated to separated branches - lister.cli: Fix spelling typo
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 14 Feb 2019 09:59:29 +0000
swh-lister (0.0.19-1~swh1) unstable-swh; urgency=medium
* New upstream release 0.0.19 - (tagged by David Douard
<david.douard@sdfa3.org> on 2019-02-07 17:36:33 +0100)
* Upstream changes: - v0.0.19
-- Software Heritage autobuilder (on jenkins-debian1) <jenkins@jenkins-debian1.internal.softwareheritage.org> Thu, 07 Feb 2019 16:42:39 +0000
swh-lister (0.0.18-1~swh1) unstable-swh; urgency=medium
* v0.0.18
* docs: add title and brief module description
* gitlab.lister: Break asap when problem exists during fetch info
* gitlab.lister: Do not expect gitlab instances to have credentials
* setup: prepare for pypi upload
* gitlab/models.py: drop unused import
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Mon, 08 Oct 2018 15:54:12 +0200
swh-lister (0.0.17-1~swh1) unstable-swh; urgency=medium
* v0.0.17
* Change pypi project url to use the /project api
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 18 Sep 2018 11:35:25 +0200
swh-lister (0.0.16-1~swh1) unstable-swh; urgency=medium
* v0.0.16
* Normalize PyPI name
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 14 Sep 2018 13:25:56 +0200
swh-lister (0.0.15-1~swh1) unstable-swh; urgency=medium
* v0.0.15
* Add pypi lister
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 06 Sep 2018 17:09:25 +0200
swh-lister (0.0.14-1~swh1) unstable-swh; urgency=medium
* v0.0.14
* core.lister_base: Batch create origins (storage) & tasks (scheduler)
* swh.lister.cli: Add debian lister to the list of supported listers
* README.md: Update to demo the lister debian run
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Tue, 31 Jul 2018 15:46:12 +0200
swh-lister (0.0.13-1~swh1) unstable-swh; urgency=medium
* v0.0.13
* Fix missing use cases when unable to retrieve information from the
api server
* gitlab/lister: Allow specifying the number of elements to read
(default is 20, same as the current gitlab api)
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Fri, 20 Jul 2018 13:46:04 +0200
swh-lister (0.0.12-1~swh1) unstable-swh; urgency=medium
* v0.0.12
* swh.lister.gitlab.tasks: Use gitlab as instance name for gitlab.com
* README.md: Add gitlab to the lister implementations referenced
* core/lister_base: Remove unused import
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 19 Jul 2018 11:29:14 +0200
swh-lister (0.0.11-1~swh1) unstable-swh; urgency=medium
* v0.0.11
* lister/gitlab: Add gitlab lister
* docs: Update documentation to demonstrate how to run a lister
locally
* core/lister: Make the listers' scheduler configuration adaptable
* debian/*: Fix debian packaging tests
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Wed, 18 Jul 2018 14:16:56 +0200
swh-lister (0.0.10-1~swh1) unstable-swh; urgency=medium
* Release swh.lister v0.0.10
* Add missing task_queue attribute for debian listing tasks
* Make sure tests run during build
* Clean up runtime dependencies
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Mon, 30 Oct 2017 17:37:25 +0100
swh-lister (0.0.9-1~swh1) unstable-swh; urgency=medium
* Release swh.lister v0.0.9
* Add tasks for the Debian lister
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Mon, 30 Oct 2017 14:20:58 +0100
swh-lister (0.0.8-1~swh1) unstable-swh; urgency=medium
* Release swh.lister v0.0.8
* Add versioned dependency on sqlalchemy
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Fri, 13 Oct 2017 12:15:38 +0200
swh-lister (0.0.7-1~swh1) unstable-swh; urgency=medium
* Release swh.lister version 0.0.7
* Update packaging runes
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 12 Oct 2017 18:07:52 +0200
swh-lister (0.0.6-1~swh1) unstable-swh; urgency=medium
* Release swh.lister v0.0.6
* Add new debian lister
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Wed, 11 Oct 2017 17:59:47 +0200
swh-lister (0.0.5-1~swh1) unstable-swh; urgency=medium
* Release swh.lister 0.0.5
* Make the lister more generic
* Add bitbucket lister
* Update tasks to new swh.scheduler API
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Mon, 12 Jun 2017 18:22:13 +0200
swh-lister (0.0.4-1~swh1) unstable-swh; urgency=medium
* v0.0.4
* Update storage configuration reading
-- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> Thu, 15 Dec 2016 19:07:24 +0100
swh-lister (0.0.3-1~swh1) unstable-swh; urgency=medium
* Release swh.lister.github v0.0.3
* Generate swh.scheduler tasks and swh.storage origins on the fly
* Use celery tasks to schedule own work
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 20 Oct 2016 17:30:39 +0200
swh-lister (0.0.2-1~swh1) unstable-swh; urgency=medium
* Release swh.lister.github 0.0.2
* Move constants to a constants module to avoid circular imports
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 17 Mar 2016 20:35:11 +0100
swh-lister (0.0.1-1~swh1) unstable-swh; urgency=medium
* Initial release
* Release swh.lister.github v0.0.1
-- Nicolas Dandrimont <nicolas@dandrimont.eu> Thu, 17 Mar 2016 19:01:20 +0100
diff --git a/docs/run_a_new_lister.rst b/docs/run_a_new_lister.rst
index 4054810..60ac214 100644
--- a/docs/run_a_new_lister.rst
+++ b/docs/run_a_new_lister.rst
@@ -1,87 +1,87 @@
.. _run-lister-tutorial:
Tutorial: run a lister within docker-dev in just a few steps
-=====================================================================
+============================================================
It is good practice to run your new lister in docker-dev. This provides an almost
production-like environment. Testing the lister in docker-dev prior to deployment
reduces the chances of encountering errors when running it in production.
Here are the steps you need to follow to run a lister within your local environment.
1. You must edit the docker-compose override file (`docker-compose.override.yml`)
following the sample provided ::
version: '2'
services:
swh-lister:
volumes:
- "$SWH_ENVIRONMENT_HOME/swh-lister:/src/swh-lister"
The file named `docker-compose.override.yml` will automatically be loaded by
``docker-compose``. Having an override makes it possible to run a docker container
with some swh packages installed from sources instead of using the latest
published packages from pypi. For more details, you may refer to README.md
present in ``swh-docker-dev``.
2. Follow the instructions under the headings **Preparation steps** and
**Configuration file sample** in the README.md of swh-lister.
3. Add the new ``task_modules`` and ``task_queues`` entries for your new lister
to the lister configuration. You need to amend the conf/lister.yml file to
add the entries. Here is an example for the GNU lister::
celery:
task_broker: amqp://guest:guest@amqp//
task_modules:
...
- swh.lister.gnu.tasks
task_queues:
...
- swh.lister.gnu.tasks.GNUListerTask
4. Make sure to run ``storage (5002)`` and ``scheduler (5008)`` services locally.
You may use the following command to run docker::
~/swh-environment/swh-docker-dev$ docker-compose up -d
5. Add the lister task-type in the scheduler. For example, if you want to
add the gnu lister task-type ::
~/swh-environment$ swh scheduler task-type add list-gnu-full \
"swh.lister.gnu.tasks.GNUListerTask" "Full GNU lister" \
--default-interval '1 day' --backoff-factor 1
You can list all the registered task types with::
~/swh-environment$ swh scheduler task-type list
Known task types:
list-bitbucket-incremental:
Incrementally list BitBucket
list-cran:
Full CRAN Lister
list-debian-distribution:
List a Debian distribution
list-github-full:
Full update of GitHub repos list
list-github-incremental:
...
If your lister creates loading tasks whose type is not yet registered, you need
to register that task type as well.
6. Run your lister through the scheduler cli, by adding a task for it in the
scheduler. For example, execute this command to run the gnu lister ::
~/swh-environment$ swh scheduler --url http://localhost:5008/ task add \
list-gnu-full --policy oneshot
Once the lister has finished running, you can see the loading tasks it created::
~/swh-environment/swh-lister$ swh scheduler task list
You can also check the repositories listed by the lister from the scheduler database
in which the lister output is stored. To connect to the database::
~/swh-environment/docker$ docker-compose exec swh-scheduler bash -c \
'psql swh-scheduler -c "select url from listed_origins"'
diff --git a/swh.lister.egg-info/PKG-INFO b/swh.lister.egg-info/PKG-INFO
index abb4b2e..728206b 100644
--- a/swh.lister.egg-info/PKG-INFO
+++ b/swh.lister.egg-info/PKG-INFO
@@ -1,122 +1,122 @@
Metadata-Version: 2.1
Name: swh.lister
-Version: 0.10.0
+Version: 1.0.0
Summary: Software Heritage lister
Home-page: https://forge.softwareheritage.org/diffusion/DLSGH/
Author: Software Heritage developers
Author-email: swh-devel@inria.fr
License: UNKNOWN
Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Source, https://forge.softwareheritage.org/source/swh-lister
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-lister/
Description: swh-lister
==========
This component from the Software Heritage stack aims to produce listings
of software origins and their urls hosted on various public developer platforms
or package managers. As these operations are quite similar, it provides a set of
Python modules abstracting common software origins listing behaviors.
It also provides several lister implementations, contained in the
following Python modules:
- `swh.lister.bitbucket`
- `swh.lister.cgit`
- `swh.lister.cran`
- `swh.lister.debian`
- `swh.lister.gitea`
- `swh.lister.github`
- `swh.lister.gitlab`
- `swh.lister.gnu`
- `swh.lister.launchpad`
- `swh.lister.npm`
- `swh.lister.packagist`
- `swh.lister.phabricator`
- `swh.lister.pypi`
Dependencies
------------
All required dependencies can be found in the `requirements*.txt` files located
at the root of the repository.
Local deployment
----------------
## lister configuration
Each lister implemented so far by Software Heritage (`bitbucket`, `cgit`, `cran`, `debian`,
`gitea`, `github`, `gitlab`, `gnu`, `launchpad`, `npm`, `packagist`, `phabricator`, `pypi`)
must be configured by following the instructions below (please note that you have to replace
`<lister_name>` with one of the lister names introduced above).
### Preparation steps
1. `mkdir ~/.config/swh/`
2. create configuration file `~/.config/swh/listers.yml`
### Configuration file sample
Minimalistic configuration shared by all listers to add in file `~/.config/swh/listers.yml`:
```lang=yml
scheduler:
cls: 'remote'
args:
url: 'http://localhost:5008/'
credentials: {}
```
Note: This expects scheduler (5008) service to run locally
## Executing a lister
Once configured, a lister can be executed by using the `swh` CLI tool with the
following options and commands:
```
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister <lister_name> [lister_parameters]
```
Examples:
```
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister bitbucket
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister cran
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitea url=https://codeberg.org/api/v1/
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitlab url=https://salsa.debian.org/api/v4/
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister npm
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister pypi
```
Licensing
---------
This program is free software: you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details.
See top-level LICENSE file for the full text of the GNU General Public License
along with this program.
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 5 - Production/Stable
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: testing
diff --git a/swh/lister/github/lister.py b/swh/lister/github/lister.py
index 5bf5324..bbb1f63 100644
--- a/swh/lister/github/lister.py
+++ b/swh/lister/github/lister.py
@@ -1,280 +1,351 @@
# Copyright (C) 2020 The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
from dataclasses import asdict, dataclass
import datetime
import logging
import random
import time
from typing import Any, Dict, Iterator, List, Optional
from urllib.parse import parse_qs, urlparse
import iso8601
import requests
+from tenacity import (
+ retry,
+ retry_any,
+ retry_if_exception_type,
+ retry_if_result,
+ wait_exponential,
+)
from swh.scheduler.interface import SchedulerInterface
from swh.scheduler.model import ListedOrigin
from .. import USER_AGENT
from ..pattern import CredentialsType, Lister
logger = logging.getLogger(__name__)
+def init_session(session: Optional[requests.Session] = None) -> requests.Session:
+ """Initialize a requests session with the proper headers for requests to
+ GitHub."""
+ if not session:
+ session = requests.Session()
+
+ session.headers.update(
+ {"Accept": "application/vnd.github.v3+json", "User-Agent": USER_AGENT}
+ )
+
+ return session
+
+
+class RateLimited(Exception):
+ def __init__(self, response):
+ self.reset_time: Optional[int]
+
+ # Figure out how long we need to sleep because of that rate limit
+ ratelimit_reset = response.headers.get("X-Ratelimit-Reset")
+ retry_after = response.headers.get("Retry-After")
+ if ratelimit_reset is not None:
+ self.reset_time = int(ratelimit_reset)
+ elif retry_after is not None:
+ self.reset_time = int(time.time()) + int(retry_after) + 1
+ else:
+ logger.warning(
+ "Received a rate-limit-like status code %s, but no rate-limit "
+ "headers set. Response content: %s",
+ response.status_code,
+ response.content,
+ )
+ self.reset_time = None
+ self.response = response
+
+
+@retry(
+ wait=wait_exponential(multiplier=1, min=4, max=10),
+ retry=retry_any(
+ # ChunkedEncodingErrors happen when the TLS connection gets reset, e.g.
+ # when running the lister on a connection with high latency
+ retry_if_exception_type(requests.exceptions.ChunkedEncodingError),
+ # 502 status codes happen for a Server Error, sometimes
+ retry_if_result(lambda r: r.status_code == 502),
+ ),
+)
+def github_request(
+ url: str, token: Optional[str] = None, session: Optional[requests.Session] = None
+) -> requests.Response:
+ session = init_session(session)
+
+ headers = {}
+ if token:
+ headers["Authorization"] = f"token {token}"
+
+ response = session.get(url, headers=headers)
+
+ anonymous = token is None and "Authorization" not in session.headers
+
+ if (
+ # GitHub returns inconsistent status codes between unauthenticated
+ # rate limit and authenticated rate limits. Handle both.
+ response.status_code == 429
+ or (anonymous and response.status_code == 403)
+ ):
+ raise RateLimited(response)
+
+ return response
+
+
@dataclass
class GitHubListerState:
"""State of the GitHub lister"""
last_seen_id: int = 0
"""Numeric id of the last repository listed on an incremental pass"""
class GitHubLister(Lister[GitHubListerState, List[Dict[str, Any]]]):
"""List origins from GitHub.
By default, the lister runs in incremental mode: it lists all repositories,
starting with the `last_seen_id` stored in the scheduler backend.
Providing the `first_id` and `last_id` arguments enables the "relisting" mode: in
that mode, the lister finds the origins present in the range **excluding**
`first_id` and **including** `last_id`. In this mode, the lister can overrun the
`last_id`: it will always record all the origins seen in a given page. As the lister
is fully idempotent, this is not a practical problem. Once relisting completes, the
lister state in the scheduler backend is not updated.
When the config contains a set of credentials, we shuffle this list at the beginning
of the listing. To follow GitHub's `abuse rate limit policy`_, we keep using the
same token over and over again, until its rate limit runs out. Once that happens, we
switch to the next token over in our shuffled list.
When a request fails with a rate limit exception for all tokens, we pause the
listing until the largest value for X-Ratelimit-Reset over all tokens.
When the credentials aren't set in the lister config, the lister can run in
anonymous mode too (e.g. for testing purposes).
.. _abuse rate limit policy: https://developer.github.com/v3/guides/best-practices-for-integrators/#dealing-with-abuse-rate-limits
Args:
first_id: the id of the first repo to list
last_id: stop listing after seeing a repo with an id higher than this value.
""" # noqa: E501
LISTER_NAME = "github"
API_URL = "https://api.github.com/repositories"
PAGE_SIZE = 1000
def __init__(
self,
scheduler: SchedulerInterface,
credentials: CredentialsType = None,
first_id: Optional[int] = None,
last_id: Optional[int] = None,
):
super().__init__(
scheduler=scheduler,
credentials=credentials,
url=self.API_URL,
instance="github",
)
self.first_id = first_id
self.last_id = last_id
self.relisting = self.first_id is not None or self.last_id is not None
- self.session = requests.Session()
- self.session.headers.update(
- {"Accept": "application/vnd.github.v3+json", "User-Agent": USER_AGENT}
- )
+ self.session = init_session()
random.shuffle(self.credentials)
self.anonymous = not self.credentials
if self.anonymous:
logger.warning("No tokens set in configuration, using anonymous mode")
self.token_index = -1
self.current_user: Optional[str] = None
if not self.anonymous:
# Initialize the first token value in the session headers
self.set_next_session_token()
def set_next_session_token(self) -> None:
"""Update the current authentication token with the next one in line."""
self.token_index = (self.token_index + 1) % len(self.credentials)
auth = self.credentials[self.token_index]
if "password" in auth:
token = auth["password"]
else:
token = auth["token"]
self.current_user = auth["username"]
logger.debug("Using authentication token for user %s", self.current_user)
self.session.headers.update({"Authorization": f"token {token}"})
def state_from_dict(self, d: Dict[str, Any]) -> GitHubListerState:
return GitHubListerState(**d)
def state_to_dict(self, state: GitHubListerState) -> Dict[str, Any]:
return asdict(state)
def get_pages(self) -> Iterator[List[Dict[str, Any]]]:
current_id = 0
if self.first_id is not None:
current_id = self.first_id
elif self.state is not None:
current_id = self.state.last_seen_id
current_url = f"{self.API_URL}?since={current_id}&per_page={self.PAGE_SIZE}"
while self.last_id is None or current_id < self.last_id:
logger.debug("Getting page %s", current_url)
# The following for/else loop handles rate limiting; if successful,
# it provides the rest of the function with a `response` object.
#
# If all tokens are rate-limited, we sleep until the reset time,
# then `continue` into another iteration of the outer while loop,
# attempting to get data from the same URL again.
max_attempts = 1 if self.anonymous else len(self.credentials)
reset_times: Dict[int, int] = {} # token index -> time
for attempt in range(max_attempts):
- response = self.session.get(current_url)
- if not (
- # GitHub returns inconsistent status codes between unauthenticated
- # rate limit and authenticated rate limits. Handle both.
- response.status_code == 429
- or (self.anonymous and response.status_code == 403)
- ):
- # Not rate limited, exit this loop.
+ try:
+ response = github_request(current_url, session=self.session)
break
-
- ratelimit_reset = response.headers.get("X-Ratelimit-Reset")
- if ratelimit_reset is None:
- logger.warning(
- "Rate-limit reached and X-Ratelimit-Reset value not found. "
- "Response content: %s",
- response.content,
- )
- else:
- reset_times[self.token_index] = int(ratelimit_reset)
-
- if not self.anonymous:
- logger.info(
- "Rate limit exhausted for current user %s (resetting at %s)",
- self.current_user,
- ratelimit_reset,
- )
- # Use next token in line
- self.set_next_session_token()
- # Wait one second to avoid triggering GitHub's abuse rate limits.
- time.sleep(1)
+ except RateLimited as e:
+ reset_info = "(unknown reset)"
+ if e.reset_time is not None:
+ reset_times[self.token_index] = e.reset_time
+ reset_info = "(resetting in %ss)" % (e.reset_time - time.time())
+
+ if not self.anonymous:
+ logger.info(
+ "Rate limit exhausted for current user %s %s",
+ self.current_user,
+ reset_info,
+ )
+ # Use next token in line
+ self.set_next_session_token()
+ # Wait one second to avoid triggering GitHub's abuse rate limits
+ time.sleep(1)
else:
# All tokens have been rate-limited. What do we do?
if not reset_times:
logger.warning(
"No X-Ratelimit-Reset value found in responses for any token; "
"Giving up."
)
break
sleep_time = max(reset_times.values()) - time.time() + 1
logger.info(
"Rate limits exhausted for all tokens. Sleeping for %f seconds.",
sleep_time,
)
time.sleep(sleep_time)
# This goes back to the outer page-by-page loop, doing one more
# iteration on the same page
continue
# We've successfully retrieved a (non-ratelimited) `response`. We
# still need to check it for validity.
if response.status_code != 200:
logger.warning(
"Got unexpected status_code %s: %s",
response.status_code,
response.content,
)
break
yield response.json()
if "next" not in response.links:
# No `next` link, we've reached the end of the world
logger.debug(
"No next link found in the response headers, all caught up"
)
break
# GitHub strongly advises to use the next link directly. We still
# parse it to get the id of the last repository we've reached so
# far.
next_url = response.links["next"]["url"]
parsed_url = urlparse(next_url)
if not parsed_url.query:
logger.warning("Failed to parse url %s", next_url)
break
parsed_query = parse_qs(parsed_url.query)
current_id = int(parsed_query["since"][0])
current_url = next_url
def get_origins_from_page(
self, page: List[Dict[str, Any]]
) -> Iterator[ListedOrigin]:
"""Convert a page of GitHub repositories into a list of ListedOrigins.
This records the html_url, as well as the pushed_at value if it exists.
"""
assert self.lister_obj.id is not None
for repo in page:
+ if not repo:
+ # null repositories in listings happen sometimes...
+ continue
+
pushed_at_str = repo.get("pushed_at")
pushed_at: Optional[datetime.datetime] = None
if pushed_at_str:
pushed_at = iso8601.parse_date(pushed_at_str)
yield ListedOrigin(
lister_id=self.lister_obj.id,
url=repo["html_url"],
visit_type="git",
last_update=pushed_at,
)
def commit_page(self, page: List[Dict[str, Any]]):
"""Update the currently stored state using the latest listed page"""
if self.relisting:
# Don't update internal state when relisting
return
+ if not page:
+ # Sometimes, when you reach the end of the world, GitHub returns an empty
+ # page of repositories
+ return
+
last_id = page[-1]["id"]
if last_id > self.state.last_seen_id:
self.state.last_seen_id = last_id
def finalize(self):
if self.relisting:
return
# Pull fresh lister state from the scheduler backend
scheduler_state = self.get_state_from_scheduler()
# Update the lister state in the backend only if the last seen id of
# the current run is higher than that stored in the database.
if self.state.last_seen_id > scheduler_state.last_seen_id:
self.updated = True
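
The header precedence used by `RateLimited` in this diff can be exercised on its own. The following is a minimal sketch, not code from swh.lister: `compute_reset_time` is a hypothetical helper, and it takes a plain mapping of headers rather than a `requests.Response` (real `requests` headers are case-insensitive, a plain dict is not). It assumes, as the diff does, that `X-Ratelimit-Reset` is an absolute epoch timestamp and `Retry-After` a number of seconds.

```python
import time
from typing import Mapping, Optional


def compute_reset_time(
    headers: Mapping[str, str], now: Optional[float] = None
) -> Optional[int]:
    """Return the epoch second at which the rate limit resets, or None.

    Hypothetical helper mirroring the precedence in RateLimited:
    X-Ratelimit-Reset (absolute) wins over Retry-After (relative).
    """
    if now is None:
        now = time.time()

    ratelimit_reset = headers.get("X-Ratelimit-Reset")
    retry_after = headers.get("Retry-After")

    if ratelimit_reset is not None:
        return int(ratelimit_reset)
    if retry_after is not None:
        # One extra second of slack, as in the diff
        return int(now) + int(retry_after) + 1
    # No usable header: the caller has to decide what to do
    return None
```

Passing `now` explicitly keeps the helper deterministic in tests; the lister itself would rely on the current time.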

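`get_pages` follows the `next` link returned by GitHub and parses its `since` parameter to track the last repository id reached. A minimal sketch of that parsing step, using only the standard library; `since_from_next_url` is a hypothetical name, and unlike the code in the diff (which indexes `parsed_query["since"]` directly) this version returns `None` when the parameter is missing.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def since_from_next_url(next_url: str) -> Optional[int]:
    """Extract the `since` repository id from a pagination URL.

    Hypothetical helper; returns None if the URL has no query string
    or no `since` parameter.
    """
    query = urlparse(next_url).query
    if not query:
        return None

    # parse_qs maps each parameter name to a list of values
    values = parse_qs(query).get("since")
    if not values:
        return None

    return int(values[0])
```

A caller would keep requesting the `next` URL unchanged, as GitHub advises, and use the returned id only to decide when `last_id` has been reached.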