diff --git a/PKG-INFO b/PKG-INFO
index 6bbaf3d..8d49371 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,125 +1,125 @@
 Metadata-Version: 2.1
 Name: swh.lister
-Version: 3.0.1
+Version: 3.0.2
 Summary: Software Heritage lister
 Home-page: https://forge.softwareheritage.org/diffusion/DLSGH/
 Author: Software Heritage developers
 Author-email: swh-devel@inria.fr
 Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
 Project-URL: Funding, https://www.softwareheritage.org/donate
 Project-URL: Source, https://forge.softwareheritage.org/source/swh-lister
 Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-lister/
 Classifier: Programming Language :: Python :: 3
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
 Classifier: Operating System :: OS Independent
 Classifier: Development Status :: 5 - Production/Stable
 Requires-Python: >=3.7
 Description-Content-Type: text/markdown
 Provides-Extra: testing
 License-File: LICENSE

 swh-lister
 ==========

 This component from the Software Heritage stack aims to produce listings of software origins and their urls hosted on various public developer platforms or package managers. As these operations are quite similar, it provides a set of Python modules abstracting common software origins listing behaviors.

 It also provides several lister implementations, contained in the following Python modules:

 - `swh.lister.bitbucket`
 - `swh.lister.cgit`
 - `swh.lister.cran`
 - `swh.lister.debian`
 - `swh.lister.gitea`
 - `swh.lister.github`
 - `swh.lister.gitlab`
 - `swh.lister.gnu`
 - `swh.lister.golang`
 - `swh.lister.launchpad`
 - `swh.lister.maven`
 - `swh.lister.npm`
 - `swh.lister.packagist`
 - `swh.lister.phabricator`
 - `swh.lister.pypi`
 - `swh.lister.tuleap`
 - `swh.lister.gogs`

 Dependencies
 ------------

 All required dependencies can be found in the `requirements*.txt` files located at the root of the repository.
 Local deployment
 ----------------

 ## lister configuration

 Each lister implemented so far by Software Heritage (`bitbucket`, `cgit`, `cran`, `debian`, `gitea`, `github`, `gitlab`, `gnu`, `golang`, `launchpad`, `npm`, `packagist`, `phabricator`, `pypi`, `tuleap`, `maven`) must be configured by following the instructions below (please note that you have to replace `` by one of the lister name introduced above).

 ### Preparation steps

 1. `mkdir ~/.config/swh/`
 2. create configuration file `~/.config/swh/listers.yml`

 ### Configuration file sample

 Minimalistic configuration shared by all listers to add in file `~/.config/swh/listers.yml`:

 ```lang=yml
 scheduler:
   cls: 'remote'
   args:
     url: 'http://localhost:5008/'

 credentials: {}
 ```

 Note: This expects scheduler (5008) service to run locally

 ## Executing a lister

 Once configured, a lister can be executed by using the `swh` CLI tool with the following options and commands:

 ```
 $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister [lister_parameters]
 ```

 Examples:

 ```
 $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister bitbucket

 $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister cran

 $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitea url=https://codeberg.org/api/v1/

 $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitlab url=https://salsa.debian.org/api/v4/

 $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister npm

 $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister pypi
 ```

 Licensing
 ---------

 This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. See top-level LICENSE file for the full text of the GNU General Public License along with this program. diff --git a/debian/changelog b/debian/changelog index e46863f..eb58576 100644 --- a/debian/changelog +++ b/debian/changelog @@ -1,1229 +1,1233 @@ -swh-lister (3.0.1-1~swh1~bpo10+1) buster-swh; urgency=medium +swh-lister (3.0.2-1~swh1) unstable-swh; urgency=medium - * Rebuild for buster-swh + * New upstream release 3.0.2 - (tagged by Vincent SELLIER + on 2022-09-20 17:10:16 +0200) + * Upstream changes: - v3.0.2 - Changelog: - * 2022-09-20 + cgit: Ensure the clone url is searched on the right tab - * 2022- + 09-20 gogs: Skip pages with error 500 - -- Software Heritage autobuilder (on jenkins-debian1) Tue, 20 Sep 2022 09:42:00 +0000 + -- Software Heritage autobuilder (on jenkins-debian1) Tue, 20 Sep 2022 15:17:33 +0000 swh-lister (3.0.1-1~swh1) unstable-swh; urgency=medium * New upstream release 3.0.1 - (tagged by Antoine R. Dumont (@ardumont) on 2022-09-20 11:32:36 +0200) * Upstream changes: - v3.0.1 - golang: Update lister name - arch: Set log level to debug for URL requests - arch: Use tempfile module to create temporary directory - pubdev.lister: Decrease verbosity -- Software Heritage autobuilder (on jenkins-debian1) Tue, 20 Sep 2022 09:39:07 +0000 swh-lister (3.0.0-1~swh2) unstable-swh; urgency=medium * Fix build dependencies and bump new release -- Antoine R. Dumont (@ardumont) Thu, 08 Sep 2022 11:57:38 +0200 swh-lister (3.0.0-1~swh1) unstable-swh; urgency=medium * New upstream release 3.0.0 - (tagged by Antoine R. 
Dumont (@ardumont) on 2022-09-08 11:19:49 +0200) * Upstream changes: - v3.0.0 - Add new lister pubdev (Dart, Flutter) - Add new lister Arch User Repository (AUR) - Add new lister Golang - Add new lister Bower - Add new lister Gogs - maven: Use BeautifulSoup instead of xmltodict for parsing pom files - crates.lister: Implement incremental mode -- Software Heritage autobuilder (on jenkins-debian1) Thu, 08 Sep 2022 09:27:56 +0000 swh-lister (2.9.3-1~swh1) unstable-swh; urgency=medium * New upstream release 2.9.3 - (tagged by Antoine R. Dumont (@ardumont) on 2022-05-23 15:39:15 +0200) * Upstream changes: - v2.9.3 - Adapt maven lister to list canonical gh urls if any - Use swh.core.github.pytest_plugin in github tests -- Software Heritage autobuilder (on jenkins-debian1) Mon, 23 May 2022 13:47:34 +0000 swh-lister (2.9.2-1~swh1) unstable-swh; urgency=medium * New upstream release 2.9.2 - (tagged by Antoine R. Dumont (@ardumont) on 2022-05-10 10:22:12 +0200) * Upstream changes: - v2.9.2 - maven: Prevent UnicodeDecodeError when processing pom file -- Software Heritage autobuilder (on jenkins-debian1) Tue, 10 May 2022 08:27:22 +0000 swh-lister (2.9.1-1~swh1) unstable-swh; urgency=medium * New upstream release 2.9.1 - (tagged by Antoine R. 
Dumont (@ardumont) on 2022-04-29 14:45:18 +0200) * Upstream changes: - v2.9.1 - crates: Create one origin per package instead of per version - maven: Handle null mtime value in index for jar archive - maven: Remove extraction of groupId and artifactId from pom files - maven: Create one origin per package instead of one per package version - Bump mypy to v0.942 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 29 Apr 2022 12:50:29 +0000 swh-lister (2.9.0-1~swh1) unstable-swh; urgency=medium * New upstream release 2.9.0 - (tagged by Valentin Lorentz on 2022-04-26 11:28:55 +0200) * Upstream changes: - v2.9.0 - * github: Remove dead code - * github: Refactor rate-limiting out of the GitHubLister class - * maven: Remove duplicated code related to setting instance from netloc -- Software Heritage autobuilder (on jenkins-debian1) Tue, 26 Apr 2022 09:34:54 +0000 swh-lister (2.8.2-1~swh1) unstable-swh; urgency=medium * New upstream release 2.8.2 - (tagged by Antoine R. Dumont (@ardumont) on 2022-04-25 12:34:14 +0200) * Upstream changes: - v2.8.2 - sourceforge: Fix listing of bzr projects - sourceforge: Do not consider Attic as a valid CVS module -- Software Heritage autobuilder (on jenkins-debian1) Mon, 25 Apr 2022 10:39:18 +0000 swh-lister (2.8.1-1~swh1) unstable-swh; urgency=medium * New upstream release 2.8.1 - (tagged by Antoine R. Dumont (@ardumont) on 2022-04-14 15:56:17 +0200) * Upstream changes: - v2.8.1 - maven: Fix argument of type 'NoneType' is not iterable -- Software Heritage autobuilder (on jenkins-debian1) Thu, 14 Apr 2022 14:01:42 +0000 swh-lister (2.8.0-1~swh2) unstable-swh; urgency=medium * Bump new release (fix build dep) -- Antoine R. Dumont (@ardumont) Thu, 14 Apr 2022 14:51:05 +0200 swh-lister (2.8.0-1~swh1) unstable-swh; urgency=medium * New upstream release 2.8.0 - (tagged by Antoine R. 
Dumont (@ardumont) on 2022-04-14 11:42:16 +0200) * Upstream changes: - v2.8.0 - lister: Add new rust crates lister - maven: Continue listing if unable to retrieve pom information - maven: log error message when not able to retrieve the index to read -- Software Heritage autobuilder (on jenkins-debian1) Thu, 14 Apr 2022 09:50:25 +0000 swh-lister (2.7.2-1~swh1) unstable-swh; urgency=medium * New upstream release 2.7.2 - (tagged by Antoine Lambert on 2022-03-11 13:34:15 +0100) * Upstream changes: - version 2.7.2 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 11 Mar 2022 12:38:38 +0000 swh-lister (2.7.1-1~swh1) unstable-swh; urgency=medium * New upstream release 2.7.1 - (tagged by Antoine R. Dumont (@ardumont) on 2022-02-18 10:42:52 +0100) * Upstream changes: - v2.7.1 - launchpad: Ignore erratic page and continue listing next page -- Software Heritage autobuilder (on jenkins-debian1) Fri, 18 Feb 2022 09:46:37 +0000 swh-lister (2.7.0-1~swh1) unstable-swh; urgency=medium * New upstream release 2.7.0 - (tagged by Antoine R. Dumont (@ardumont) on 2022-02-17 13:56:23 +0100) * Upstream changes: - v2.7.0 - launchpad: Allow bzr origins listing - launchpad: Manage unhandled exceptions when listing - sourceforge: Fix origin URLs for CVS projects -- Software Heritage autobuilder (on jenkins-debian1) Thu, 17 Feb 2022 13:02:22 +0000 swh-lister (2.6.4-1~swh1) unstable-swh; urgency=medium * New upstream release 2.6.4 - (tagged by Antoine R. Dumont (@ardumont) on 2022-02-14 16:57:38 +0100) * Upstream changes: - v2.6.4 - sourceforge: fix support for listing bzr origins -- Software Heritage autobuilder (on jenkins-debian1) Mon, 14 Feb 2022 16:01:23 +0000 swh-lister (2.6.3-1~swh1) unstable-swh; urgency=medium * New upstream release 2.6.3 - (tagged by Antoine R. 
Dumont (@ardumont) on 2022-02-09 17:20:28 +0100) * Upstream changes: - v2.6.3 - maven: Fix last update datetime -- Software Heritage autobuilder (on jenkins-debian1) Wed, 09 Feb 2022 16:24:11 +0000 swh-lister (2.6.2-1~swh1) unstable-swh; urgency=medium * New upstream release 2.6.2 - (tagged by Antoine R. Dumont (@ardumont) on 2022-02-08 10:39:05 +0100) * Upstream changes: - v2.6.2 - Remove no longer needed tenacity workarounds - maven: Fix undef last_update in ListedOrigins. - maven: dismiss origins if they are malformed - e.g. wrong pom scm format, add test. - maven: Let logging instruction do the formatting - maven: Add more debug logging instruction - maven: Pass the base URL of the Maven instance to the loader - docs: Fix ReST syntax and sphinx warnings - Pin mypy and drop type annotations which makes mypy unhappy - requirements-test: Pin pytest to < 7.0.0 -- Software Heritage autobuilder (on jenkins-debian1) Tue, 08 Feb 2022 09:43:37 +0000 swh-lister (2.6.1-1~swh1) unstable-swh; urgency=medium * New upstream release 2.6.1 - (tagged by Antoine Lambert on 2021-12-06 10:47:19 +0100) * Upstream changes: - version 2.6.1 -- Software Heritage autobuilder (on jenkins-debian1) Mon, 06 Dec 2021 09:51:07 +0000 swh-lister (2.6.0-1~swh1) unstable-swh; urgency=medium * New upstream release 2.6.0 - (tagged by Antoine Lambert on 2021-12-03 16:17:52 +0100) * Upstream changes: - version 2.6.0 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 03 Dec 2021 15:22:00 +0000 swh-lister (2.5.0-1~swh1) unstable-swh; urgency=medium * New upstream release 2.5.0 - (tagged by Antoine Lambert on 2021-12-03 14:44:36 +0100) * Upstream changes: - version 2.5.0 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 03 Dec 2021 13:48:49 +0000 swh-lister (2.4.0-1~swh3) unstable-swh; urgency=medium * Fix changelog error and actual correct release -- Antoine R. 
Dumont (@ardumont) Fri, 03 Dec 2021 12:45:00 +0100 swh.lister (2.4.0-1~swh2) unstable-swh; urgency=medium * Update missing deps and release -- Antoine R. Dumont (@ardumont) Fri, 03 Dec 2021 12:37:13 +0100 swh-lister (2.4.0-1~swh1) unstable-swh; urgency=medium * New upstream release 2.4.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-12-03 12:17:36 +0100) * Upstream changes: - v2.4.0 - debian: Update extra_loader_arguments dict produced ListedOrigin models - debian: Add missing file URIs in lister output - Deduplicate origins in the GitHub lister - lister: Add new maven lister -- Software Heritage autobuilder (on jenkins-debian1) Fri, 03 Dec 2021 11:21:58 +0000 swh-lister (2.3.0-1~swh1) unstable-swh; urgency=medium * New upstream release 2.3.0 - (tagged by Valentin Lorentz on 2021-11-10 13:44:49 +0100) * Upstream changes: - v2.3.0 - * cran: Pass the package name to the loader -- Software Heritage autobuilder (on jenkins-debian1) Wed, 10 Nov 2021 13:03:02 +0000 swh-lister (2.2.0-1~swh1) unstable-swh; urgency=medium * New upstream release 2.2.0 - (tagged by Antoine Lambert on 2021-10-22 15:16:48 +0200) * Upstream changes: - version 2.2.0 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 22 Oct 2021 13:23:02 +0000 swh-lister (2.1.0-1~swh1) unstable-swh; urgency=medium * New upstream release 2.1.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-10-13 10:16:37 +0200) * Upstream changes: - v2.1.0 - Let sourceforge origins be listed "enabled" by default - docs: Add a save forge documentation - docs: Explain task type registering to complete the save forge doc -- Software Heritage autobuilder (on jenkins-debian1) Wed, 13 Oct 2021 08:21:42 +0000 swh-lister (2.0.0-1~swh1) unstable-swh; urgency=medium * New upstream release 2.0.0 - (tagged by Antoine R. 
Dumont (@ardumont) on 2021-09-29 09:21:37 +0200) * Upstream changes: - v2.0.0 - opam: Share opam root directory even on multiple instances -- Software Heritage autobuilder (on jenkins-debian1) Wed, 29 Sep 2021 07:31:03 +0000 swh-lister (1.9.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.9.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-09-21 11:23:23 +0200) * Upstream changes: - v1.9.0 - gnu: Respect the pattern docstring about state initialization - opam: Allow defining where to actually install the opam_root folder - opam: Make the instance optional and derived from the url - opam: Move the state initialization into the get_pages method -- Software Heritage autobuilder (on jenkins-debian1) Tue, 21 Sep 2021 09:29:04 +0000 swh-lister (1.8.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.8.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-09-17 15:44:00 +0200) * Upstream changes: - v1.8.0 - Allow gitlab lister's name to be overridden by task arguments -- Software Heritage autobuilder (on jenkins-debian1) Fri, 17 Sep 2021 13:47:58 +0000 swh-lister (1.7.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.7.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-09-17 13:37:22 +0200) * Upstream changes: - v1.7.0 - gitlab: Allow ingestion of hg_git origins as hg ones (some instance can list tose e.g - foss.heptapod.net) -- Software Heritage autobuilder (on jenkins-debian1) Fri, 17 Sep 2021 11:41:52 +0000 swh-lister (1.6.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.6.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-09-17 10:50:28 +0200) * Upstream changes: - v1.6.0 - gitlab: Allow listing of instances providing multiple vcs_type -- Software Heritage autobuilder (on jenkins-debian1) Fri, 17 Sep 2021 08:55:14 +0000 swh-lister (1.5.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.5.0 - (tagged by Antoine R. 
Dumont (@ardumont) on 2021-07-23 16:28:50 +0200) * Upstream changes: - v1.5.0 - gitlab: Handle HTTP status code 500 when listing projects - gitlab: Update requests query parameters - gitlab: Adapt requests retry policy to consider HTTP 50x status codes - opam: Directly use the --root flag instead of using an env variable - pattern: Use URL network location as instance name when not provided -- Software Heritage autobuilder (on jenkins-debian1) Fri, 23 Jul 2021 14:32:51 +0000 swh-lister (1.4.0-1~swh2) unstable-swh; urgency=medium * Bump new release -- Antoine R. Dumont (@ardumont) Fri, 09 Jul 2021 13:17:00 +0200 swh-lister (1.4.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.4.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-07-09 13:01:04 +0200) * Upstream changes: - v1.4.0 - New Tuleap lister - New Opam lister - Make PyPI lister incremental - Make PyPI lister complete the information on origins -- Software Heritage autobuilder (on jenkins-debian1) Fri, 09 Jul 2021 11:06:37 +0000 swh-lister (1.3.6-1~swh1) unstable-swh; urgency=medium * New upstream release 1.3.6 - (tagged by Antoine R. Dumont (@ardumont) on 2021-06-04 11:59:24 +0200) * Upstream changes: - v1.3.6 - sourceforge: use http:// for Mercurial (as workaround) -- Software Heritage autobuilder (on jenkins-debian1) Fri, 04 Jun 2021 10:03:14 +0000 swh-lister (1.3.5-1~swh1) unstable-swh; urgency=medium * New upstream release 1.3.5 - (tagged by Antoine R. Dumont (@ardumont) on 2021-06-03 10:22:17 +0200) * Upstream changes: - v1.3.5 - sourceforge: set the protocol for origin urls -- Software Heritage autobuilder (on jenkins-debian1) Thu, 03 Jun 2021 08:26:13 +0000 swh-lister (1.3.4-1~swh1) unstable-swh; urgency=medium * New upstream release 1.3.4 - (tagged by Antoine R. 
Dumont (@ardumont) on 2021-05-31 16:54:37 +0200) * Upstream changes: - v1.3.4 - Disable the sourceforge lister origins (so they can be listed) -- Software Heritage autobuilder (on jenkins-debian1) Mon, 31 May 2021 15:08:17 +0000 swh-lister (1.3.3-1~swh1) unstable-swh; urgency=medium * New upstream release 1.3.3 - (tagged by Antoine R. Dumont (@ardumont) on 2021-05-28 14:18:53 +0200) * Upstream changes: - v1.3.3 - cgit/lister: Fix error when a missing version is not provided -- Software Heritage autobuilder (on jenkins-debian1) Fri, 28 May 2021 12:39:52 +0000 swh-lister (1.3.2-1~swh1) unstable-swh; urgency=medium * New upstream release 1.3.2 - (tagged by Antoine R. Dumont (@ardumont) on 2021-05-26 12:43:45 +0200) * Upstream changes: - v1.3.2 - sourceforge: retry for all retryable exceptions -- Software Heritage autobuilder (on jenkins-debian1) Wed, 26 May 2021 10:48:22 +0000 swh-lister (1.3.1-1~swh1) unstable-swh; urgency=medium * New upstream release 1.3.1 - (tagged by Antoine R. Dumont (@ardumont) on 2021-05-19 11:25:59 +0200) * Upstream changes: - v1.3.1 - sourceforge: don't abort on error for project -- Software Heritage autobuilder (on jenkins-debian1) Wed, 19 May 2021 09:30:14 +0000 swh-lister (1.3.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.3.0 - (tagged by Antoine R. 
Dumont (@ardumont) on 2021-05-07 17:17:50 +0200) * Upstream changes: - v1.3.0 - sourceforge/tasks: Allow incremental listing - sourceforge/lister: Add credentials parameter -- Software Heritage autobuilder (on jenkins-debian1) Fri, 07 May 2021 15:24:27 +0000 swh-lister (1.2.2-1~swh1) unstable-swh; urgency=medium * New upstream release 1.2.2 - (tagged by Antoine Lambert on 2021-05-07 14:43:24 +0200) * Upstream changes: - version 1.2.2 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 07 May 2021 12:50:12 +0000 swh-lister (1.2.1-1~swh1) unstable-swh; urgency=medium * New upstream release 1.2.1 - (tagged by Antoine Lambert on 2021-05-07 14:10:36 +0200) * Upstream changes: - version 1.2.1 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 07 May 2021 12:17:16 +0000 swh-lister (1.2.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.2.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-05-06 15:17:51 +0200) * Upstream changes: - v1.2.0 - Make the SourceForge lister incremental -- Software Heritage autobuilder (on jenkins-debian1) Fri, 07 May 2021 10:43:11 +0000 swh-lister (1.1.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.1.0 - (tagged by Antoine Lambert on 2021-04-29 14:29:27 +0200) * Upstream changes: - version 1.1.0 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 29 Apr 2021 12:33:59 +0000 swh-lister (1.0.0-1~swh1) unstable-swh; urgency=medium * New upstream release 1.0.0 - (tagged by Nicolas Dandrimont on 2021-03-22 10:56:04 +0100) * Upstream changes: - Release swh.lister v1.0.0 - All listers have been rewritten and are ready to be used in production - with the most recent version of the swh.scheduler APIs. -- Software Heritage autobuilder (on jenkins-debian1) Mon, 22 Mar 2021 10:13:35 +0000 swh-lister (0.10.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.10.0 - (tagged by Antoine R. 
Dumont (@ardumont) on 2021-03-01 09:59:16 +0100) * Upstream changes: - v0.10.0 - docs: Add new "howto write a lister tutorial" with unified lister api -- Software Heritage autobuilder (on jenkins-debian1) Mon, 01 Mar 2021 09:01:54 +0000 swh-lister (0.9.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.9.1 - (tagged by Antoine R. Dumont (@ardumont) on 2021-02-08 14:09:27 +0100) * Upstream changes: - v0.9.1 - debian: Update archive mirror URL templates to process -- Software Heritage autobuilder (on jenkins-debian1) Mon, 08 Feb 2021 13:12:05 +0000 swh-lister (0.9.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.9.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-02-08 08:50:07 +0100) * Upstream changes: - v0.9.0 - docs: Update listers execution instructions - cran: Prevent multiple listing of an origin - cran: Add support for parsing date with milliseconds - pypi: Use BeautifulSoup for parsing HTML instead of xmltodict -- Software Heritage autobuilder (on jenkins-debian1) Mon, 08 Feb 2021 07:52:57 +0000 swh-lister (0.8.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.8.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-02-03 11:12:52 +0100) * Upstream changes: - v0.8.0 - packagist: Reimplement lister using new Lister API - gnu: Remove dependency on pytz - Remove no longer used models field in dict returned by register - Remove no longer used legacy Lister API and update CLI options -- Software Heritage autobuilder (on jenkins-debian1) Wed, 03 Feb 2021 10:15:54 +0000 swh-lister (0.7.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.7.1 - (tagged by Vincent SELLIER on 2021-02-01 17:52:33 +0100) * Upstream changes: - v0.7.1 - * cgit: remove the repository urls's trailing / -- Software Heritage autobuilder (on jenkins-debian1) Mon, 01 Feb 2021 16:56:35 +0000 swh-lister (0.7.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.7.0 - (tagged by Antoine R. 
Dumont (@ardumont) on 2021-02-01 09:31:30 +0100) * Upstream changes: - v0.7.0 - pattern: Bump packet split to chunk of 1000 records - cgit: Compute origin urls out of a base git url when provided. - gnu: Reimplement lister using new Lister API -- Software Heritage autobuilder (on jenkins-debian1) Mon, 01 Feb 2021 08:35:14 +0000 swh-lister (0.6.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.6.1 - (tagged by Antoine R. Dumont (@ardumont) on 2021-01-29 09:07:21 +0100) * Upstream changes: - v0.6.1 - launchpad: Remove call to dataclasses.asdict on lister state - launchpad: Prevent error due to origin listed twice - Make debian lister constructors compatible with credentials - launchpad/tasks: Fix ping task function name - pattern: Make lister flush regularly origins to scheduler -- Software Heritage autobuilder (on jenkins-debian1) Fri, 29 Jan 2021 08:11:13 +0000 swh-lister (0.6.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.6.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-01-28 15:48:32 +0100) * Upstream changes: - v0.6.0 - launchpad: Reimplement lister using new Lister API - Make stateless lister constructors compatible with credentials -- Software Heritage autobuilder (on jenkins-debian1) Thu, 28 Jan 2021 14:52:49 +0000 swh-lister (0.5.4-1~swh1) unstable-swh; urgency=medium * New upstream release 0.5.4 - (tagged by Antoine R. Dumont (@ardumont) on 2021-01-28 11:23:29 +0100) * Upstream changes: - v0.5.4 - gitlab: Deal with missing or trailing / in url input - tox.ini: Work around build failure due to upstream release -- Software Heritage autobuilder (on jenkins-debian1) Thu, 28 Jan 2021 10:27:59 +0000 swh-lister (0.5.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.5.2 - (tagged by Antoine R. 
Dumont (@ardumont) on 2021-01-27 17:19:10 +0100) * Upstream changes: - v0.5.2 - test_cli: Drop launchpad lister from the test_get_lister -- Software Heritage autobuilder (on jenkins-debian1) Wed, 27 Jan 2021 16:25:31 +0000 swh-lister (0.5.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.5.1 - (tagged by Antoine R. Dumont (@ardumont) on 2021-01-27 16:39:20 +0100) * Upstream changes: - v0.5.1 - launchpad: Actually mock the anonymous login to launchpad - Drop no longer swh.lister.core.{indexing,page_by_page}_lister - tests: Drop unneeded reset instruction - cgit: Don't stop the listing when a repository page is not available -- Software Heritage autobuilder (on jenkins-debian1) Wed, 27 Jan 2021 15:47:39 +0000 swh-lister (0.5.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.5.0 - (tagged by Antoine R. Dumont (@ardumont) on 2021-01-27 14:33:24 +0100) * Upstream changes: - v0.5.0 - cgit: Add support for last_update information during listing - Port Debian lister to new lister api - gitlab: Implement keyset-based pagination listing - cran: Retrieve last update date for each listed package - Port CRAN lister to new lister api - gitlab: Add support for last_update information during listing - Port Gitea lister to new lister api - Port cgit lister to the new lister api - bitbucket: Pick random credentials in configuration and improve logging - Port Gitlab lister to the new lister api - Port Npm lister to new lister api - Port PyPI lister to new lister api - Port Bitbucket lister to new lister api - Port Phabricator lister to new lister api - Port GitHub lister to new lister api - Introduce a simpler base pattern for lister implementations -- Software Heritage autobuilder (on jenkins-debian1) Wed, 27 Jan 2021 13:40:34 +0000 swh-lister (0.4.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.4.0 - (tagged by Antoine R. 
Dumont (@ardumont) on 2020-11-23 15:47:05 +0100) * Upstream changes: - v0.4.0 - requirements: Rework dependencies - tests: Reduce db initialization fixtures to a minimum - Create listing task with a default of 3 if unspecified - lister.pytest_plugin: Simplify fixture setup - tests: Clarify listers test configuration -- Software Heritage autobuilder (on jenkins-debian1) Mon, 23 Nov 2020 14:52:03 +0000 swh-lister (0.3.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.3.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-10-19 09:50:43 +0200) * Upstream changes: - v0.3.0 - lister.config: Adapt scheduler configuration structure - drop mock_get_scheduler which creates indirection for no good reason -- Software Heritage autobuilder (on jenkins-debian1) Mon, 19 Oct 2020 07:56:17 +0000 swh-lister (0.2.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.2.1 - (tagged by Antoine R. Dumont (@ardumont) on 2020-10-07 14:02:42 +0200) * Upstream changes: - v0.2.1 - lister_base: Drop leftover mixin SWHConfig which is no longer used -- Software Heritage autobuilder (on jenkins-debian1) Wed, 07 Oct 2020 12:07:43 +0000 swh-lister (0.2.0-1~swh1) unstable-swh; urgency=medium * New upstream release 0.2.0 - (tagged by Antoine R. Dumont (@ardumont) on 2020-10-06 09:33:33 +0200) * Upstream changes: - v0.2.0 - lister*: Migrate away from SWHConfig mixin - tox.ini: pin black to the pre-commit version (19.10b0) to avoid flip-flops - Run isort after the CLI import changes -- Software Heritage autobuilder (on jenkins-debian1) Tue, 06 Oct 2020 07:36:07 +0000 swh-lister (0.1.5-1~swh1) unstable-swh; urgency=medium * New upstream release 0.1.5 - (tagged by David Douard on 2020-09-25 11:51:57 +0200) * Upstream changes: - v0.1.5 -- Software Heritage autobuilder (on jenkins-debian1) Fri, 25 Sep 2020 09:55:44 +0000 swh-lister (0.1.4-1~swh1) unstable-swh; urgency=medium * New upstream release 0.1.4 - (tagged by Antoine R. 
Dumont (@ardumont) on 2020-09-10 11:32:46 +0200) * Upstream changes: - v0.1.4 - gitea.lister: Fix uid to be unique across instance - utils.split_range: Split into not overlapping ranges - gitea.tasks: Fix parameter name from 'sort' to 'order' -- Software Heritage autobuilder (on jenkins-debian1) Thu, 10 Sep 2020 09:35:53 +0000 swh-lister (0.1.3-1~swh1) unstable-swh; urgency=medium * New upstream release 0.1.3 - (tagged by Vincent SELLIER on 2020-09-08 14:48:08 +0200) * Upstream changes: - v0.1.3 - Launchpad: rename task name to match conventions - tests: Separate lister instantiations -- Software Heritage autobuilder (on jenkins-debian1) Tue, 08 Sep 2020 12:53:22 +0000 swh-lister (0.1.2-1~swh1) unstable-swh; urgency=medium * New upstream release 0.1.2 - (tagged by Antoine R. Dumont (@ardumont) on 2020-09-02 13:07:30 +0200) * Upstream changes: - v0.1.2 - pytest_plugin: Instantiate only lister with no particular setup - pytest: Define plugin and declare it in the root conftest -- Software Heritage autobuilder (on jenkins-debian1) Wed, 02 Sep 2020 11:10:14 +0000 swh-lister (0.1.1-1~swh1) unstable-swh; urgency=medium * New upstream release 0.1.1 - (tagged by Antoine R. Dumont (@ardumont) on 2020-09-01 16:08:48 +0200) * Upstream changes: - v0.1.1 - test_cli: Exclude launchpad lister from the check -- Software Heritage autobuilder (on jenkins-debian1) Tue, 01 Sep 2020 14:11:46 +0000 swh-lister (0.1.0-1~swh2) unstable-swh; urgency=medium * Update dependencies -- Antoine R. 
Dumont (@ardumont) Wed, 26 Aug 2020 16:05:03 +0000 swh-lister (0.1.0-1~swh1) unstable-swh; urgency=medium [ Nicolas Dandrimont ] * Use setuptools-scm instead of vcversioner [ Software Heritage autobuilder (on jenkins-debian1) ] * New upstream release 0.1.0 - (tagged by David Douard on 2020-08-25 18:33:55 +0200) * Upstream changes: - v0.1.0 -- Software Heritage autobuilder (on jenkins-debian1) Tue, 25 Aug 2020 16:39:28 +0000 swh-lister (0.0.50-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.50 - (tagged by Antoine R. Dumont (@ardumont) on 2020-01-20 10:44:57 +0100) * Upstream changes: - v0.0.50 - github.lister: Filter out partial repositories which break listing - docs: Fix sphinx warnings - core.lister_base: Improve slightly docs and types -- Software Heritage autobuilder (on jenkins-debian1) Mon, 20 Jan 2020 09:51:23 +0000 swh-lister (0.0.49-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.49 - (tagged by Antoine R. Dumont (@ardumont) on 2020-01-17 14:20:35 +0100) * Upstream changes: - v0.0.49 - github.lister: Use Retry-After header when rate limit reached -- Software Heritage autobuilder (on jenkins-debian1) Fri, 17 Jan 2020 13:27:56 +0000 swh-lister (0.0.48-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.48 - (tagged by Antoine R. Dumont (@ardumont) on 2020-01-16 13:56:12 +0100) * Upstream changes: - v0.0.48 - cran.lister: Use cran's canonical url for origin url - cran.lister: Version uid so we can list new package versions - cran.lister: Adapt docstring sample accordingly -- Software Heritage autobuilder (on jenkins-debian1) Thu, 16 Jan 2020 13:03:54 +0000 swh-lister (0.0.47-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.47 - (tagged by Antoine R. 
Dumont (@ardumont) on 2020-01-09 10:26:18 +0100) * Upstream changes: - v0.0.47 - cran.lister: Align loading tasks' with loader's expectation -- Software Heritage autobuilder (on jenkins-debian1) Thu, 09 Jan 2020 09:34:26 +0000 swh-lister (0.0.46-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.46 - (tagged by Antoine R. Dumont (@ardumont) on 2019-12-19 14:09:45 +0100) * Upstream changes: - v0.0.46 - lister.debian: Make debian init step idempotent and up-to-date - lister_base: Split into chunks the tasks prior to creation -- Software Heritage autobuilder (on jenkins-debian1) Thu, 19 Dec 2019 13:16:45 +0000 swh-lister (0.0.45-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.45 - (tagged by Antoine R. Dumont (@ardumont) on 2019-12-10 11:27:17 +0100) * Upstream changes: - v0.0.45 - core: Align listers' task output (hg/git tasks) with expected format - npm: Align lister's loader output tasks with expected format - lister/tasks: Standardize return statements -- Software Heritage autobuilder (on jenkins-debian1) Tue, 10 Dec 2019 10:32:45 +0000 swh-lister (0.0.44-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.44 - (tagged by Nicolas Dandrimont on 2019-11-22 16:15:54 +0100) * Upstream changes: - Release swh.lister v0.0.44 - Define proper User Agents everywhere -- Software Heritage autobuilder (on jenkins-debian1) Fri, 22 Nov 2019 15:31:33 +0000 swh-lister (0.0.43-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.43 - (tagged by Antoine R. 
Dumont (@ardumont) on 2019-11-21 18:46:35 +0100) * Upstream changes: - v0.0.43 - lister.pypi: Align lister with pypi package loader - lister.npm: Align lister with npm package loader - lister.tests: Avoid duplication setup step - Fix typos (and trailing ws) reported by codespell - Add a pre-commit config file -- Software Heritage autobuilder (on jenkins-debian1) Thu, 21 Nov 2019 17:56:34 +0000 swh-lister (0.0.42-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.42 - (tagged by Antoine R. Dumont (@ardumont) on 2019-11-21 13:52:16 +0100) * Upstream changes: - v0.0.42 - cran/gnu: Rename task_type to load-archive-files - lister.tests: Add missing task_type for package listers - Migrate tox.ini to extras = xxx instead of deps = .[testing] - Merge tox environments - Include all requirements in MANIFEST.in - lister.cli: Remove task type register cli -- Software Heritage autobuilder (on jenkins-debian1) Thu, 21 Nov 2019 13:00:29 +0000 swh-lister (0.0.41-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.41 - (tagged by Antoine R. 
Dumont (@ardumont) on 2019-11-15 12:02:13 +0100) * Upstream changes: - v0.0.41 - simple_lister: Flush to db more frequently - gnu.lister: Use url as primary key - gnu.lister.tests: Add missing assertion - gnu.lister: Add missing retries_left parameter - debian.models: Migrate tests from storage to debian lister model -- Software Heritage autobuilder (on jenkins-debian1) Fri, 15 Nov 2019 11:06:35 +0000 swh-lister (0.0.40-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.40 - (tagged by Nicolas Dandrimont on 2019-11-13 13:54:38 +0100) * Upstream changes: - Release swh.lister 0.0.40 - Fix bogus NotImplementedError on Area.index_uris -- Software Heritage autobuilder (on jenkins-debian1) Wed, 13 Nov 2019 13:02:08 +0000 swh-lister (0.0.39-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.39 - (tagged by Nicolas Dandrimont on 2019-11-13 13:23:31 +0100) * Upstream changes: - Release swh.lister 0.0.39 - Properly register all tasks - Fix up db_partition_indices to avoid expensive scans -- Software Heritage autobuilder (on jenkins-debian1) Wed, 13 Nov 2019 12:28:33 +0000 swh-lister (0.0.38-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.38 - (tagged by Antoine R. Dumont (@ardumont) on 2019-11-06 15:55:46 +0100) * Upstream changes: - v0.0.38 - Remove swh.storage.schemata remnants -- Software Heritage autobuilder (on jenkins-debian1) Wed, 06 Nov 2019 15:00:16 +0000 swh-lister (0.0.37-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.37 - (tagged by Antoine R. Dumont (@ardumont) on 2019-11-06 15:06:51 +0100) * Upstream changes: - v0.0.37 - Update swh-core dependency -- Software Heritage autobuilder (on jenkins-debian1) Wed, 06 Nov 2019 14:18:31 +0000 swh-lister (0.0.36-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.36 - (tagged by Antoine R. 
Dumont (@ardumont) on 2019-11-06 11:33:33 +0100) * Upstream changes: - v0.0.36 - lister.*.tests: Add at least one integration test - gnu.lister: Move gnu listers specifity within the lister's scope - debian/lister: Use url parameter name instead of origin - debian/model: Install lister model within the lister repository - lister.*.tasks: Stop binding tasks to a specific instance of the - celery app - cran.lister: Refactor and fix cran lister - github/lister: Prevent erroneous scheduler tasks disabling - phabricator/lister: Fix lister - setup.py: Kill deprecated swh- lister command - Bootstrap typing annotations -- Software Heritage autobuilder (on jenkins-debian1) Wed, 06 Nov 2019 10:55:41 +0000 swh-lister (0.0.35-1~swh4) unstable-swh; urgency=medium * Fix runtime dependencies -- Antoine R. Dumont (@ardumont) Wed, 11 Sep 2019 10:58:01 +0200 swh-lister (0.0.35-1~swh3) unstable-swh; urgency=medium * Bump dh-python to >= 3 for pybuild.testfiles. -- Nicolas Dandrimont Tue, 10 Sep 2019 14:58:11 +0200 swh-lister (0.0.35-1~swh2) unstable-swh; urgency=medium * Add egg-info to pybuild.testfiles. Close T1995. -- Nicolas Dandrimont Tue, 10 Sep 2019 14:36:22 +0200 swh-lister (0.0.35-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.35 - (tagged by Antoine R. Dumont (@ardumont) on 2019-09-09 12:14:42 +0200) * Upstream changes: - v0.0.35 - Fix debian package -- Software Heritage autobuilder (on jenkins-debian1) Mon, 09 Sep 2019 10:19:02 +0000 swh-lister (0.0.34-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.34 - (tagged by Antoine R. 
Dumont (@ardumont) on 2019-09-06 14:03:39 +0200) * Upstream changes: - v0.0.34 - listers: Implement listers as plugins - cgit: rewrite the CGit lister (and add more tests) - listers: simplify and unify constructor use - phabricator: randomly select the API token in the provided list - docs: Fix toc -- Software Heritage autobuilder (on jenkins-debian1) Fri, 06 Sep 2019 12:09:13 +0000 swh-lister (0.0.33-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.33 - (tagged by Antoine R. Dumont (@ardumont) on 2019-08-29 10:23:20 +0200) * Upstream changes: - v0.0.33 - lister.cli: Allow to list forges with policy and priority - listers: Add New packagist lister - listers: Allow to override policy and priority for scheduled tasks - tests: Add tests to cli, pypi and improve lister core's - docs: Add code of conduct document -- Software Heritage autobuilder (on jenkins-debian1) Thu, 29 Aug 2019 08:28:23 +0000 swh-lister (0.0.32-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.32 - (tagged by Antoine R. Dumont (@ardumont) on 2019-06-28 18:21:50 +0200) * Upstream changes: - v0.0.32 - Clean up dead code - Add missing *.html sample for tests to run in packaging -- Software Heritage autobuilder (on jenkins-debian1) Fri, 28 Jun 2019 16:42:05 +0000 swh-lister (0.0.31-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.31 - (tagged by Antoine R. Dumont (@ardumont) on 2019-06-28 17:57:48 +0200) * Upstream changes: - v0.0.31 - Add cgit instance lister - Add back description in cran lister - Update contributors -- Software Heritage autobuilder (on jenkins-debian1) Fri, 28 Jun 2019 16:06:25 +0000 swh-lister (0.0.30-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.30 - (tagged by Antoine R. Dumont (@ardumont) on 2019-06-26 14:52:13 +0200) * Upstream changes: - v0.0.30 - Drop last description mentions for gitlab and cran listers. 
-- Software Heritage autobuilder (on jenkins-debian1) Wed, 26 Jun 2019 13:02:11 +0000 swh-lister (0.0.29-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.29 - (tagged by Antoine R. Dumont (@ardumont) on 2019-06-26 12:37:14 +0200) * Upstream changes: - v0.0.29 - lister: Fix bitbucket lister -- Software Heritage autobuilder (on jenkins-debian1) Wed, 26 Jun 2019 10:47:20 +0000 swh-lister (0.0.28-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.28 - (tagged by Antoine R. Dumont (@ardumont) on 2019-06-20 12:00:09 +0200) * Upstream changes: - v0.0.28 - listers: Remove unused columns `origin_id` / `description` - gnu-lister: Use origin-type as 'tar' (and not 'gnu') - phabricator: Remove unused code -- Software Heritage autobuilder (on jenkins-debian1) Thu, 20 Jun 2019 10:07:48 +0000 swh-lister (0.0.27-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.27 - (tagged by Antoine R. Dumont (@ardumont) on 2019-06-18 10:27:09 +0200) * Upstream changes: - v0.0.27 - Unify lister tablenames to use consistently singular - Add missing instance field to phabricator repository model -- Software Heritage autobuilder (on jenkins-debian1) Tue, 18 Jun 2019 08:44:38 +0000 swh-lister (0.0.26-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.26 - (tagged by Antoine R. Dumont (@ardumont) on 2019-06-17 17:53:33 +0200) * Upstream changes: - v0.0.26 - phabricator.lister: Use credentials setup from configuration file - gitlab.lister: Remove request_params method override -- Software Heritage autobuilder (on jenkins-debian1) Mon, 17 Jun 2019 16:05:05 +0000 swh-lister (0.0.25-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.25 - (tagged by Antoine R. 
Dumont (@ardumont) on 2019-06-13 15:54:42 +0200) * Upstream changes: - v0.0.25 - Add new cran lister - listers: Stop creating origins when scheduling new tasks -- Software Heritage autobuilder (on jenkins-debian1) Thu, 13 Jun 2019 13:59:30 +0000 swh-lister (0.0.24-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.24 - (tagged by Antoine R. Dumont (@ardumont) on 2019-06-12 12:02:54 +0200) * Upstream changes: - v0.0.24 - swh.lister.gnu: Add new gnu lister -- Software Heritage autobuilder (on jenkins-debian1) Wed, 12 Jun 2019 10:10:56 +0000 swh-lister (0.0.23-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.23 - (tagged by Antoine R. Dumont (@ardumont) on 2019-05-29 14:04:22 +0200) * Upstream changes: - v0.0.23 - lister: Unify credentials structure between listers -- Software Heritage autobuilder (on jenkins-debian1) Wed, 29 May 2019 12:10:51 +0000 swh-lister (0.0.22-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.22 - (tagged by Antoine Lambert on 2019-05-23 10:59:39 +0200) * Upstream changes: - version 0.0.22 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 23 May 2019 09:05:34 +0000 swh-lister (0.0.21-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.21 - (tagged by Antoine Lambert on 2019-04-11 11:00:55 +0200) * Upstream changes: - version 0.0.21 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 11 Apr 2019 09:05:30 +0000 swh-lister (0.0.20-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.20 - (tagged by Antoine R. 
Dumont (@ardumont) on 2019-02-14 10:50:06 +0100) * Upstream changes: - v0.0.20 - d/*: debian packaging files migrated to separated branches - lister.cli: Fix spelling typo -- Software Heritage autobuilder (on jenkins-debian1) Thu, 14 Feb 2019 09:59:29 +0000 swh-lister (0.0.19-1~swh1) unstable-swh; urgency=medium * New upstream release 0.0.19 - (tagged by David Douard on 2019-02-07 17:36:33 +0100) * Upstream changes: - v0.0.19 -- Software Heritage autobuilder (on jenkins-debian1) Thu, 07 Feb 2019 16:42:39 +0000 swh-lister (0.0.18-1~swh1) unstable-swh; urgency=medium * v0.0.18 * docs: add title and brief module description * gitlab.lister: Break asap when problem exists during fetch info * gitlab.lister: Do not expect gitlab instances to have credentials * setup: prepare for pypi upload * gitlab/models.py: drop unused import -- Antoine R. Dumont (@ardumont) Mon, 08 Oct 2018 15:54:12 +0200 swh-lister (0.0.17-1~swh1) unstable-swh; urgency=medium * v0.0.17 * Change pypi project url to use the /project api -- Antoine R. Dumont (@ardumont) Tue, 18 Sep 2018 11:35:25 +0200 swh-lister (0.0.16-1~swh1) unstable-swh; urgency=medium * v0.0.16 * Normalize PyPI name -- Antoine R. Dumont (@ardumont) Fri, 14 Sep 2018 13:25:56 +0200 swh-lister (0.0.15-1~swh1) unstable-swh; urgency=medium * v0.0.15 * Add pypi lister -- Antoine R. Dumont (@ardumont) Thu, 06 Sep 2018 17:09:25 +0200 swh-lister (0.0.14-1~swh1) unstable-swh; urgency=medium * v0.0.14 * core.lister_base: Batch create origins (storage) & tasks (scheduler) * swh.lister.cli: Add debian lister to the list of supported listers * README.md: Update to demo the lister debian run -- Antoine R. Dumont (@ardumont) Tue, 31 Jul 2018 15:46:12 +0200 swh-lister (0.0.13-1~swh1) unstable-swh; urgency=medium * v0.0.13 * Fix missing use cases when unable to retrieve information from the api * server * gitlab/lister: Allow specifying the number of elements to * read (default is 20, same as the current gitlab api) -- Antoine R. 
Dumont (@ardumont) Fri, 20 Jul 2018 13:46:04 +0200 swh-lister (0.0.12-1~swh1) unstable-swh; urgency=medium * v0.0.12 * swh.lister.gitlab.tasks: Use gitlab as instance name for gitlab.com * README.md: Add gitlab to the lister implementations referenced * core/lister_base: Remove unused import -- Antoine R. Dumont (@ardumont) Thu, 19 Jul 2018 11:29:14 +0200 swh-lister (0.0.11-1~swh1) unstable-swh; urgency=medium * v0.0.11 * lister/gitlab: Add gitlab lister * docs: Update documentation to demonstrate how to run a lister locally * core/lister: Make the listers' scheduler configuration adaptable * debian/*: Fix debian packaging tests -- Antoine R. Dumont (@ardumont) Wed, 18 Jul 2018 14:16:56 +0200 swh-lister (0.0.10-1~swh1) unstable-swh; urgency=medium * Release swh.lister v0.0.10 * Add missing task_queue attribute for debian listing tasks * Make sure tests run during build * Clean up runtime dependencies -- Nicolas Dandrimont Mon, 30 Oct 2017 17:37:25 +0100 swh-lister (0.0.9-1~swh1) unstable-swh; urgency=medium * Release swh.lister v0.0.9 * Add tasks for the Debian lister -- Nicolas Dandrimont Mon, 30 Oct 2017 14:20:58 +0100 swh-lister (0.0.8-1~swh1) unstable-swh; urgency=medium * Release swh.lister v0.0.8 * Add versioned dependency on sqlalchemy -- Nicolas Dandrimont Fri, 13 Oct 2017 12:15:38 +0200 swh-lister (0.0.7-1~swh1) unstable-swh; urgency=medium * Release swh.lister version 0.0.7 * Update packaging runes -- Nicolas Dandrimont Thu, 12 Oct 2017 18:07:52 +0200 swh-lister (0.0.6-1~swh1) unstable-swh; urgency=medium * Release swh.lister v0.0.6 * Add new debian lister -- Nicolas Dandrimont Wed, 11 Oct 2017 17:59:47 +0200 swh-lister (0.0.5-1~swh1) unstable-swh; urgency=medium * Release swh.lister 0.0.5 * Make the lister more generic * Add bitbucket lister * Update tasks to new swh.scheduler API -- Nicolas Dandrimont Mon, 12 Jun 2017 18:22:13 +0200 swh-lister (0.0.4-1~swh1) unstable-swh; urgency=medium * v0.0.4 * Update storage configuration reading -- Antoine R. 
Dumont (@ardumont) Thu, 15 Dec 2016 19:07:24 +0100 swh-lister (0.0.3-1~swh1) unstable-swh; urgency=medium * Release swh.lister.github v0.0.3 * Generate swh.scheduler tasks and swh.storage origins on the fly * Use celery tasks to schedule own work -- Nicolas Dandrimont Thu, 20 Oct 2016 17:30:39 +0200 swh-lister (0.0.2-1~swh1) unstable-swh; urgency=medium * Release swh.lister.github 0.0.2 * Move constants to a constants module to avoid circular imports -- Nicolas Dandrimont Thu, 17 Mar 2016 20:35:11 +0100 swh-lister (0.0.1-1~swh1) unstable-swh; urgency=medium * Initial release * Release swh.lister.github v0.0.1 -- Nicolas Dandrimont Thu, 17 Mar 2016 19:01:20 +0100 diff --git a/swh.lister.egg-info/PKG-INFO b/swh.lister.egg-info/PKG-INFO index 6bbaf3d..8d49371 100644 --- a/swh.lister.egg-info/PKG-INFO +++ b/swh.lister.egg-info/PKG-INFO @@ -1,125 +1,125 @@ Metadata-Version: 2.1 Name: swh.lister -Version: 3.0.1 +Version: 3.0.2 Summary: Software Heritage lister Home-page: https://forge.softwareheritage.org/diffusion/DLSGH/ Author: Software Heritage developers Author-email: swh-devel@inria.fr Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest Project-URL: Funding, https://www.softwareheritage.org/donate Project-URL: Source, https://forge.softwareheritage.org/source/swh-lister Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-lister/ Classifier: Programming Language :: Python :: 3 Classifier: Intended Audience :: Developers Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3) Classifier: Operating System :: OS Independent Classifier: Development Status :: 5 - Production/Stable Requires-Python: >=3.7 Description-Content-Type: text/markdown Provides-Extra: testing License-File: LICENSE swh-lister ========== This component from the Software Heritage stack aims to produce listings of software origins and their urls hosted on various public developer platforms or package managers. 
As these operations are quite similar, it provides a set of Python modules abstracting common software origins listing behaviors. It also provides several lister implementations, contained in the following Python modules: - `swh.lister.bitbucket` - `swh.lister.cgit` - `swh.lister.cran` - `swh.lister.debian` - `swh.lister.gitea` - `swh.lister.github` - `swh.lister.gitlab` - `swh.lister.gnu` - `swh.lister.golang` - `swh.lister.launchpad` - `swh.lister.maven` - `swh.lister.npm` - `swh.lister.packagist` - `swh.lister.phabricator` - `swh.lister.pypi` - `swh.lister.tuleap` - `swh.lister.gogs` Dependencies ------------ All required dependencies can be found in the `requirements*.txt` files located at the root of the repository. Local deployment ---------------- ## lister configuration Each lister implemented so far by Software Heritage (`bitbucket`, `cgit`, `cran`, `debian`, `gitea`, `github`, `gitlab`, `gnu`, `golang`, `launchpad`, `npm`, `packagist`, `phabricator`, `pypi`, `tuleap`, `maven`) must be configured by following the instructions below (please note that you have to replace `` by one of the lister name introduced above). ### Preparation steps 1. `mkdir ~/.config/swh/` 2. 
create configuration file `~/.config/swh/listers.yml` ### Configuration file sample Minimalistic configuration shared by all listers to add in file `~/.config/swh/listers.yml`: ```lang=yml scheduler: cls: 'remote' args: url: 'http://localhost:5008/' credentials: {} ``` Note: This expects scheduler (5008) service to run locally ## Executing a lister Once configured, a lister can be executed by using the `swh` CLI tool with the following options and commands: ``` $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister [lister_parameters] ``` Examples: ``` $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister bitbucket $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister cran $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitea url=https://codeberg.org/api/v1/ $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitlab url=https://salsa.debian.org/api/v4/ $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister npm $ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister pypi ``` Licensing --------- This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. See top-level LICENSE file for the full text of the GNU General Public License along with this program. 
diff --git a/swh.lister.egg-info/SOURCES.txt b/swh.lister.egg-info/SOURCES.txt index dad21b0..550a6ba 100644 --- a/swh.lister.egg-info/SOURCES.txt +++ b/swh.lister.egg-info/SOURCES.txt @@ -1,331 +1,337 @@ .git-blame-ignore-revs .gitignore .pre-commit-config.yaml ACKNOWLEDGEMENTS CODE_OF_CONDUCT.md CONTRIBUTORS LICENSE MANIFEST.in Makefile README.md conftest.py mypy.ini pyproject.toml pytest.ini requirements-swh.txt requirements-test.txt requirements.txt setup.cfg setup.py tox.ini docs/.gitignore docs/Makefile docs/cli.rst docs/conf.py docs/index.rst docs/new_lister_template.py docs/run_a_new_lister.rst docs/save_forge.rst docs/tutorial.rst docs/_static/.placeholder docs/_templates/.placeholder docs/images/new_base.png docs/images/new_bitbucket_lister.png docs/images/new_github_lister.png docs/images/old_github_lister.png sql/crawler.sql sql/pimp_db.sql swh/__init__.py swh.lister.egg-info/PKG-INFO swh.lister.egg-info/SOURCES.txt swh.lister.egg-info/dependency_links.txt swh.lister.egg-info/entry_points.txt swh.lister.egg-info/requires.txt swh.lister.egg-info/top_level.txt swh/lister/__init__.py swh/lister/cli.py swh/lister/pattern.py swh/lister/py.typed swh/lister/utils.py swh/lister/arch/__init__.py swh/lister/arch/lister.py swh/lister/arch/tasks.py swh/lister/arch/tests/__init__.py swh/lister/arch/tests/test_lister.py swh/lister/arch/tests/test_tasks.py swh/lister/arch/tests/data/fake_archlinux_archives_init.sh swh/lister/arch/tests/data/https_archive.archlinux.org/packages_d_dialog swh/lister/arch/tests/data/https_archive.archlinux.org/packages_g_gnome-code-assistance swh/lister/arch/tests/data/https_archive.archlinux.org/packages_g_gzip swh/lister/arch/tests/data/https_archive.archlinux.org/packages_l_libasyncns swh/lister/arch/tests/data/https_archive.archlinux.org/packages_m_mercurial swh/lister/arch/tests/data/https_archive.archlinux.org/packages_p_python-hglib 
swh/lister/arch/tests/data/https_archive.archlinux.org/repos_last_community_os_x86_64_community.files.tar.gz swh/lister/arch/tests/data/https_archive.archlinux.org/repos_last_core_os_x86_64_core.files.tar.gz swh/lister/arch/tests/data/https_archive.archlinux.org/repos_last_extra_os_x86_64_extra.files.tar.gz swh/lister/arch/tests/data/https_uk.mirror.archlinuxarm.org/aarch64_community_community.files.tar.gz swh/lister/arch/tests/data/https_uk.mirror.archlinuxarm.org/aarch64_core_core.files.tar.gz swh/lister/arch/tests/data/https_uk.mirror.archlinuxarm.org/aarch64_extra_extra.files.tar.gz swh/lister/arch/tests/data/https_uk.mirror.archlinuxarm.org/armv7h_community_community.files.tar.gz swh/lister/arch/tests/data/https_uk.mirror.archlinuxarm.org/armv7h_core_core.files.tar.gz swh/lister/arch/tests/data/https_uk.mirror.archlinuxarm.org/armv7h_extra_extra.files.tar.gz swh/lister/aur/__init__.py swh/lister/aur/lister.py swh/lister/aur/tasks.py swh/lister/aur/tests/__init__.py swh/lister/aur/tests/test_lister.py swh/lister/aur/tests/test_tasks.py swh/lister/aur/tests/data/fake_aur_packages.sh swh/lister/aur/tests/data/packages-meta-v1.json.gz swh/lister/bitbucket/__init__.py swh/lister/bitbucket/lister.py swh/lister/bitbucket/tasks.py swh/lister/bitbucket/tests/__init__.py swh/lister/bitbucket/tests/test_lister.py swh/lister/bitbucket/tests/test_tasks.py swh/lister/bitbucket/tests/data/bb_api_repositories_page1.json swh/lister/bitbucket/tests/data/bb_api_repositories_page2.json swh/lister/bower/__init__.py swh/lister/bower/lister.py swh/lister/bower/tasks.py swh/lister/bower/tests/__init__.py swh/lister/bower/tests/test_lister.py swh/lister/bower/tests/test_tasks.py swh/lister/bower/tests/data/https_registry.bower.io/packages swh/lister/cgit/__init__.py swh/lister/cgit/lister.py swh/lister/cgit/tasks.py swh/lister/cgit/tests/__init__.py swh/lister/cgit/tests/repo_list.txt swh/lister/cgit/tests/test_lister.py swh/lister/cgit/tests/test_tasks.py 
+swh/lister/cgit/tests/data/https_git.acdw.net/Readme.md +swh/lister/cgit/tests/data/https_git.acdw.net/cgit +swh/lister/cgit/tests/data/https_git.acdw.net/foo +swh/lister/cgit/tests/data/https_git.acdw.net/foo_summary +swh/lister/cgit/tests/data/https_git.acdw.net/sfeed +swh/lister/cgit/tests/data/https_git.acdw.net/sfeed_summary swh/lister/cgit/tests/data/https_git.baserock.org/cgit swh/lister/cgit/tests/data/https_git.eclipse.org/c swh/lister/cgit/tests/data/https_git.savannah.gnu.org/README swh/lister/cgit/tests/data/https_git.savannah.gnu.org/cgit swh/lister/cgit/tests/data/https_git.savannah.gnu.org/cgit_elisp-es.git swh/lister/cgit/tests/data/https_git.tizen/README swh/lister/cgit/tests/data/https_git.tizen/cgit swh/lister/cgit/tests/data/https_git.tizen/cgit,ofs=100 swh/lister/cgit/tests/data/https_git.tizen/cgit,ofs=50 swh/lister/cgit/tests/data/https_git.tizen/cgit_All-Projects swh/lister/cgit/tests/data/https_git.tizen/cgit_All-Users swh/lister/cgit/tests/data/https_git.tizen/cgit_Lock-Projects swh/lister/cgit/tests/data/https_git.tizen/cgit_adaptation_alsa-scenario-scn-data-0-base swh/lister/cgit/tests/data/https_git.tizen/cgit_adaptation_alsa-scenario-scn-data-0-mc1n2 swh/lister/cgit/tests/data/https_git.tizen/cgit_adaptation_ap_samsung_audio-hal-e3250 swh/lister/cgit/tests/data/https_git.tizen/cgit_adaptation_ap_samsung_audio-hal-e4x12 swh/lister/cgit/tests/data/https_git.tizen/cgit_adaptation_devices_nfc-plugin-nxp swh/lister/cgit/tests/data/https_git.tizen/cgit_adaptation_intel_mfld_bootstub-mfld-blackbay swh/lister/cgit/tests/data/https_git.tizen/cgit_adaptation_mtdev swh/lister/cgit/tests/data/https_git.tizen/cgit_adaptation_opengl-es-virtual-drv swh/lister/cgit/tests/data/https_git.tizen/cgit_adaptation_panda_libdrm swh/lister/cgit/tests/data/https_git.tizen/cgit_adaptation_panda_libnl swh/lister/cgit/tests/data/https_git.tizen/cgit_adaptation_xorg_driver_xserver-xorg-misc 
swh/lister/cgit/tests/data/https_git.tizen/cgit_apps_core_preloaded_ug-setting-gallery-efl swh/lister/cgit/tests/data/https_git.tizen/cgit_apps_core_preloaded_ug-setting-homescreen-efl swh/lister/cgit/tests/data/https_jff.email/cgit swh/lister/cran/__init__.py swh/lister/cran/list_all_packages.R swh/lister/cran/lister.py swh/lister/cran/tasks.py swh/lister/cran/tests/__init__.py swh/lister/cran/tests/test_lister.py swh/lister/cran/tests/test_tasks.py swh/lister/cran/tests/data/list-r-packages.json swh/lister/crates/__init__.py swh/lister/crates/lister.py swh/lister/crates/tasks.py swh/lister/crates/tests/__init__.py swh/lister/crates/tests/test_lister.py swh/lister/crates/tests/test_tasks.py swh/lister/crates/tests/data/fake-crates-repository.tar.gz swh/lister/crates/tests/data/fake_crates_repository_init.sh swh/lister/debian/__init__.py swh/lister/debian/lister.py swh/lister/debian/tasks.py swh/lister/debian/tests/__init__.py swh/lister/debian/tests/test_lister.py swh/lister/debian/tests/test_tasks.py swh/lister/debian/tests/data/Sources_bullseye swh/lister/debian/tests/data/Sources_buster swh/lister/debian/tests/data/Sources_stretch swh/lister/gitea/__init__.py swh/lister/gitea/lister.py swh/lister/gitea/tasks.py swh/lister/gitea/tests/__init__.py swh/lister/gitea/tests/test_lister.py swh/lister/gitea/tests/test_tasks.py swh/lister/gitea/tests/data/https_try.gitea.io/repos_page1 swh/lister/gitea/tests/data/https_try.gitea.io/repos_page2 swh/lister/github/__init__.py swh/lister/github/lister.py swh/lister/github/tasks.py swh/lister/github/utils.py swh/lister/github/tests/__init__.py swh/lister/github/tests/test_lister.py swh/lister/github/tests/test_tasks.py swh/lister/gitlab/__init__.py swh/lister/gitlab/lister.py swh/lister/gitlab/tasks.py swh/lister/gitlab/tests/__init__.py swh/lister/gitlab/tests/test_lister.py swh/lister/gitlab/tests/test_tasks.py swh/lister/gitlab/tests/data/https_foss.heptapod.net/api_response_page1.json 
swh/lister/gitlab/tests/data/https_gite.lirmm.fr/api_response_page1.json swh/lister/gitlab/tests/data/https_gite.lirmm.fr/api_response_page2.json swh/lister/gitlab/tests/data/https_gite.lirmm.fr/api_response_page3.json swh/lister/gitlab/tests/data/https_gitlab.com/api_response_page1.json swh/lister/gnu/__init__.py swh/lister/gnu/lister.py swh/lister/gnu/tasks.py swh/lister/gnu/tree.py swh/lister/gnu/tests/__init__.py swh/lister/gnu/tests/test_lister.py swh/lister/gnu/tests/test_tasks.py swh/lister/gnu/tests/test_tree.py swh/lister/gnu/tests/data/tree.json swh/lister/gnu/tests/data/tree.min.json swh/lister/gnu/tests/data/https_ftp.gnu.org/tree.json.gz swh/lister/gogs/__init__.py swh/lister/gogs/lister.py swh/lister/gogs/tasks.py swh/lister/gogs/tests/__init__.py swh/lister/gogs/tests/test_lister.py swh/lister/gogs/tests/test_tasks.py swh/lister/gogs/tests/data/https_try.gogs.io/repos_page1 swh/lister/gogs/tests/data/https_try.gogs.io/repos_page2 swh/lister/gogs/tests/data/https_try.gogs.io/repos_page3 swh/lister/gogs/tests/data/https_try.gogs.io/repos_page4 swh/lister/golang/__init__.py swh/lister/golang/lister.py swh/lister/golang/tasks.py swh/lister/golang/tests/__init__.py swh/lister/golang/tests/test_lister.py swh/lister/golang/tests/test_tasks.py swh/lister/golang/tests/data/page-1.txt swh/lister/golang/tests/data/page-2.txt swh/lister/golang/tests/data/page-3.txt swh/lister/launchpad/__init__.py swh/lister/launchpad/lister.py swh/lister/launchpad/tasks.py swh/lister/launchpad/tests/__init__.py swh/lister/launchpad/tests/conftest.py swh/lister/launchpad/tests/test_lister.py swh/lister/launchpad/tests/test_tasks.py swh/lister/launchpad/tests/data/launchpad_bzr_response.json swh/lister/launchpad/tests/data/launchpad_response1.json swh/lister/launchpad/tests/data/launchpad_response2.json swh/lister/maven/README.md swh/lister/maven/__init__.py swh/lister/maven/lister.py swh/lister/maven/tasks.py swh/lister/maven/tests/__init__.py 
swh/lister/maven/tests/test_lister.py
swh/lister/maven/tests/test_tasks.py
swh/lister/maven/tests/data/http_indexes/export_full.fld
swh/lister/maven/tests/data/http_indexes/export_incr_first.fld
swh/lister/maven/tests/data/http_indexes/export_null_mtime.fld
swh/lister/maven/tests/data/https_maven.org/arangodb-graphql-1.2.pom
swh/lister/maven/tests/data/https_maven.org/citrus-parent-3.0.7.pom
swh/lister/maven/tests/data/https_maven.org/sprova4j-0.1.0.malformed.pom
swh/lister/maven/tests/data/https_maven.org/sprova4j-0.1.0.pom
swh/lister/maven/tests/data/https_maven.org/sprova4j-0.1.1.pom
swh/lister/npm/__init__.py
swh/lister/npm/lister.py
swh/lister/npm/tasks.py
swh/lister/npm/tests/test_lister.py
swh/lister/npm/tests/test_tasks.py
swh/lister/npm/tests/data/npm_full_page1.json
swh/lister/npm/tests/data/npm_full_page2.json
swh/lister/npm/tests/data/npm_incremental_page1.json
swh/lister/npm/tests/data/npm_incremental_page2.json
swh/lister/opam/__init__.py
swh/lister/opam/lister.py
swh/lister/opam/tasks.py
swh/lister/opam/tests/__init__.py
swh/lister/opam/tests/test_lister.py
swh/lister/opam/tests/test_tasks.py
swh/lister/opam/tests/data/fake_opam_repo/repo
swh/lister/opam/tests/data/fake_opam_repo/version
swh/lister/opam/tests/data/fake_opam_repo/packages/agrid/agrid.0.1/opam
swh/lister/opam/tests/data/fake_opam_repo/packages/calculon/calculon.0.1/opam
swh/lister/opam/tests/data/fake_opam_repo/packages/calculon/calculon.0.2/opam
swh/lister/opam/tests/data/fake_opam_repo/packages/calculon/calculon.0.3/opam
swh/lister/opam/tests/data/fake_opam_repo/packages/calculon/calculon.0.4/opam
swh/lister/opam/tests/data/fake_opam_repo/packages/calculon/calculon.0.5/opam
swh/lister/opam/tests/data/fake_opam_repo/packages/calculon/calculon.0.6/opam
swh/lister/opam/tests/data/fake_opam_repo/packages/directories/directories.0.1/opam
swh/lister/opam/tests/data/fake_opam_repo/packages/directories/directories.0.2/opam
swh/lister/opam/tests/data/fake_opam_repo/packages/directories/directories.0.3/opam
swh/lister/opam/tests/data/fake_opam_repo/packages/ocb/ocb.0.1/opam
swh/lister/packagist/__init__.py
swh/lister/packagist/lister.py
swh/lister/packagist/tasks.py
swh/lister/packagist/tests/__init__.py
swh/lister/packagist/tests/test_lister.py
swh/lister/packagist/tests/test_tasks.py
swh/lister/packagist/tests/data/den1n_contextmenu.json
swh/lister/packagist/tests/data/ljjackson_linnworks.json
swh/lister/packagist/tests/data/lky_wx_article.json
swh/lister/packagist/tests/data/spryker-eco_computop-api.json
swh/lister/phabricator/__init__.py
swh/lister/phabricator/lister.py
swh/lister/phabricator/tasks.py
swh/lister/phabricator/tests/__init__.py
swh/lister/phabricator/tests/test_lister.py
swh/lister/phabricator/tests/test_tasks.py
swh/lister/phabricator/tests/data/__init__.py
swh/lister/phabricator/tests/data/phabricator_api_repositories_page1.json
swh/lister/phabricator/tests/data/phabricator_api_repositories_page2.json
swh/lister/pubdev/__init__.py
swh/lister/pubdev/lister.py
swh/lister/pubdev/tasks.py
swh/lister/pubdev/tests/__init__.py
swh/lister/pubdev/tests/test_lister.py
swh/lister/pubdev/tests/test_tasks.py
swh/lister/pubdev/tests/data/https_pub.dev/api_package-names
swh/lister/pubdev/tests/data/https_pub.dev/api_packages_Autolinker
swh/lister/pubdev/tests/data/https_pub.dev/api_packages_Babylon
swh/lister/pypi/__init__.py
swh/lister/pypi/lister.py
swh/lister/pypi/tasks.py
swh/lister/pypi/tests/__init__.py
swh/lister/pypi/tests/test_lister.py
swh/lister/pypi/tests/test_tasks.py
swh/lister/sourceforge/__init__.py
swh/lister/sourceforge/lister.py
swh/lister/sourceforge/tasks.py
swh/lister/sourceforge/tests/__init__.py
swh/lister/sourceforge/tests/test_lister.py
swh/lister/sourceforge/tests/test_tasks.py
swh/lister/sourceforge/tests/data/aaron.html
swh/lister/sourceforge/tests/data/aaron.json
swh/lister/sourceforge/tests/data/adobexmp.json
swh/lister/sourceforge/tests/data/backapps-website.json
swh/lister/sourceforge/tests/data/backapps.json
swh/lister/sourceforge/tests/data/main-sitemap.xml
swh/lister/sourceforge/tests/data/mojunk.json
swh/lister/sourceforge/tests/data/mramm.json
swh/lister/sourceforge/tests/data/ocaml-lpd.html
swh/lister/sourceforge/tests/data/ocaml-lpd.json
swh/lister/sourceforge/tests/data/os3dmodels.json
swh/lister/sourceforge/tests/data/random-mercurial.json
swh/lister/sourceforge/tests/data/subsitemap-0.xml
swh/lister/sourceforge/tests/data/subsitemap-1.xml
swh/lister/sourceforge/tests/data/t12eksandbox.html
swh/lister/sourceforge/tests/data/t12eksandbox.json
swh/lister/tests/__init__.py
swh/lister/tests/test_cli.py
swh/lister/tests/test_pattern.py
swh/lister/tests/test_utils.py
swh/lister/tuleap/__init__.py
swh/lister/tuleap/lister.py
swh/lister/tuleap/tasks.py
swh/lister/tuleap/tests/__init__.py
swh/lister/tuleap/tests/test_lister.py
swh/lister/tuleap/tests/test_tasks.py
swh/lister/tuleap/tests/data/https_tuleap.net/projects
swh/lister/tuleap/tests/data/https_tuleap.net/repo_1
swh/lister/tuleap/tests/data/https_tuleap.net/repo_2
swh/lister/tuleap/tests/data/https_tuleap.net/repo_3
\ No newline at end of file
diff --git a/swh/lister/cgit/lister.py b/swh/lister/cgit/lister.py index c0d9113..5ca9445 100644 --- a/swh/lister/cgit/lister.py +++ b/swh/lister/cgit/lister.py @@ -1,217 +1,234 @@ # Copyright (C) 2019-2021 The Software Heritage developers # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information from datetime import datetime, timezone import logging import re from typing import Any, Dict, Iterator, List, Optional from urllib.parse import urljoin, urlparse from bs4 import BeautifulSoup import requests from requests.exceptions import HTTPError from tenacity.before_sleep import before_sleep_log from swh.lister import USER_AGENT from swh.lister.pattern import CredentialsType, StatelessLister from swh.lister.utils
import throttling_retry from swh.scheduler.interface import SchedulerInterface from swh.scheduler.model import ListedOrigin logger = logging.getLogger(__name__) Repositories = List[Dict[str, Any]] class CGitLister(StatelessLister[Repositories]): """Lister class for CGit repositories. This lister will retrieve the list of published git repositories by parsing the HTML page(s) of the index retrieved at `url`. The lister currently defines 2 listing behaviors: - If the `base_git_url` is provided, the listed origin urls are computed out of the base git url link and the one listed in the main listed page (resulting in less HTTP queries than the 2nd behavior below). This is expected to be the main deployed behavior. - Otherwise (with no `base_git_url`), for each found git repository listed, one extra HTTP query is made at the given url found in the main listing page to gather published "Clone" URLs to be used as origin URL for that git repo. If several "Clone" urls are provided, prefer the http/https one, if any, otherwise fallback to the first one. """ LISTER_NAME = "cgit" def __init__( self, scheduler: SchedulerInterface, url: str, instance: Optional[str] = None, credentials: Optional[CredentialsType] = None, base_git_url: Optional[str] = None, ): """Lister class for CGit repositories. Args: url: main URL of the CGit instance, i.e. url of the index of published git repositories on this instance. instance: Name of cgit instance. Defaults to url's network location if unset. base_git_url: Optional base git url which allows the origin url computations. 
""" super().__init__( scheduler=scheduler, url=url, instance=instance, credentials=credentials, ) self.session = requests.Session() self.session.headers.update( {"Accept": "application/html", "User-Agent": USER_AGENT} ) self.base_git_url = base_git_url @throttling_retry(before_sleep=before_sleep_log(logger, logging.DEBUG)) def _get_and_parse(self, url: str) -> BeautifulSoup: """Get the given url and parse the retrieved HTML using BeautifulSoup""" response = self.session.get(url) response.raise_for_status() return BeautifulSoup(response.text, features="html.parser") def get_pages(self) -> Iterator[Repositories]: """Generate git 'project' URLs found on the current CGit server The last_update date is retrieved on the list of repo page to avoid to compute it on the repository details which only give a date per branch """ next_page: Optional[str] = self.url while next_page: bs_idx = self._get_and_parse(next_page) page_results = [] for tr in bs_idx.find("div", {"class": "content"}).find_all( "tr", {"class": ""} ): repository_link = tr.find("a")["href"] repo_url = None git_url = None base_url = urljoin(self.url, repository_link).strip("/") if self.base_git_url: # mapping provided # computing git url git_url = base_url.replace(self.url, self.base_git_url) else: # we compute the git detailed page url from which we will retrieve # the git url (cf. 
self.get_origins_from_page) repo_url = base_url span = tr.find("span", {"class": re.compile("age-")}) last_updated_date = span.get("title") if span else None page_results.append( { "url": repo_url, "git_url": git_url, "last_updated_date": last_updated_date, } ) yield page_results try: pager = bs_idx.find("ul", {"class": "pager"}) current_page = pager.find("a", {"class": "current"}) if current_page: next_page = current_page.parent.next_sibling.a["href"] next_page = urljoin(self.url, next_page) except (AttributeError, KeyError): # no pager, or no next page next_page = None def get_origins_from_page( self, repositories: Repositories ) -> Iterator[ListedOrigin]: """Convert a page of cgit repositories into a list of ListedOrigins.""" assert self.lister_obj.id is not None for repo in repositories: origin_url = repo["git_url"] or self._get_origin_from_repository_url( repo["url"] ) if origin_url is None: continue yield ListedOrigin( lister_id=self.lister_obj.id, url=origin_url, visit_type="git", last_update=_parse_last_updated_date(repo), ) def _get_origin_from_repository_url(self, repository_url: str) -> Optional[str]: """Extract the git url from the repository page""" try: bs = self._get_and_parse(repository_url) except HTTPError as e: logger.warning( "Unexpected HTTP status code %s on %s", e.response.status_code, e.response.url, ) return None + # check if we are on the summary tab, if not, go to this tab + tab = bs.find("table", {"class": "tabs"}) + if tab: + summary_a = tab.find("a", string="summary") + if summary_a: + summary_url = urljoin(repository_url, summary_a["href"]).strip("/") + + if summary_url != repository_url: + logger.debug( + "%s : Active tab is not the summary, trying to load the summary page", + repository_url, + ) + return self._get_origin_from_repository_url(summary_url) + else: + logger.debug("No summary tab found on %s", repository_url) + # origin urls are listed on the repository page # TODO check if forcing https is better or not ? 
# # # urls = [x["href"] for x in bs.find_all("a", {"rel": "vcs-git"})] if not urls: + logger.debug("No git urls found on %s", repository_url) return None # look for the http/https url, if any, and use it as origin_url for url in urls: if urlparse(url).scheme in ("http", "https"): origin_url = url break else: # otherwise, choose the first one origin_url = urls[0] return origin_url def _parse_last_updated_date(repository: Dict[str, Any]) -> Optional[datetime]: """Parse the last updated date""" date = repository.get("last_updated_date") if not date: return None parsed_date = None for date_format in ("%Y-%m-%d %H:%M:%S %z", "%Y-%m-%d %H:%M:%S (%Z)"): try: parsed_date = datetime.strptime(date, date_format) # force UTC to avoid naive datetime if not parsed_date.tzinfo: parsed_date = parsed_date.replace(tzinfo=timezone.utc) break except Exception: pass if not parsed_date: logger.warning( "Could not parse %s last_updated date: %s", repository["url"], date, ) return parsed_date diff --git a/swh/lister/cgit/tests/data/https_git.acdw.net/Readme.md b/swh/lister/cgit/tests/data/https_git.acdw.net/Readme.md new file mode 100644 index 0000000..0b02a73 --- /dev/null +++ b/swh/lister/cgit/tests/data/https_git.acdw.net/Readme.md @@ -0,0 +1 @@ +These files are a partial dump of http://git.savannah.gnu.org/cgit diff --git a/swh/lister/cgit/tests/data/https_git.acdw.net/cgit b/swh/lister/cgit/tests/data/https_git.acdw.net/cgit new file mode 100644 index 0000000..a3da859 --- /dev/null +++ b/swh/lister/cgit/tests/data/https_git.acdw.net/cgit @@ -0,0 +1,40 @@ + + + +friendware by acdw + + + + + +
+ + + + +
+index
+ + +
+ + +
+ + diff --git a/swh/lister/cgit/tests/data/https_git.acdw.net/foo b/swh/lister/cgit/tests/data/https_git.acdw.net/foo new file mode 100644 index 0000000..c6560a4 --- /dev/null +++ b/swh/lister/cgit/tests/data/https_git.acdw.net/foo @@ -0,0 +1,33 @@ + + + + + + + + + +
+ + + + +
+index
+ + +
+
No repositories found
+
+ +
+ + diff --git a/swh/lister/cgit/tests/data/https_git.acdw.net/foo_summary b/swh/lister/cgit/tests/data/https_git.acdw.net/foo_summary new file mode 100644 index 0000000..c6560a4 --- /dev/null +++ b/swh/lister/cgit/tests/data/https_git.acdw.net/foo_summary @@ -0,0 +1,33 @@ + + + + + + + + + +
+ + + + +
+index
+ + +
+
No repositories found
+
+ +
+ + diff --git a/swh/lister/cgit/tests/data/https_git.acdw.net/sfeed b/swh/lister/cgit/tests/data/https_git.acdw.net/sfeed new file mode 100644 index 0000000..d0d01ad --- /dev/null +++ b/swh/lister/cgit/tests/data/https_git.acdw.net/sfeed @@ -0,0 +1,49 @@ + + + +sfeed - My sfeed scripts + + + + + + + + +
+ + + + +
+about summary refs log tree commit diff stats
+ + + +
+
+

sfeed

+

Turns out, sfeed is cool! You can see what this repo generates at https://acdw.casa/planet/.

+
+ +
+ + diff --git a/swh/lister/cgit/tests/data/https_git.acdw.net/sfeed_summary b/swh/lister/cgit/tests/data/https_git.acdw.net/sfeed_summary new file mode 100644 index 0000000..b71e1cf --- /dev/null +++ b/swh/lister/cgit/tests/data/https_git.acdw.net/sfeed_summary @@ -0,0 +1,63 @@ + + + +sfeed - My sfeed scripts + + + + + + + + +
+ + + + +
+about summary refs log tree commit diff stats
+ + + +
+
+
+ + + + + + + + + + + + + + + +
BranchCommit messageAuthorAge
mainAdd APODCase Duckworth38 min.
 
 
AgeCommit messageAuthor
38 min.Add APOD HEAD mainCase Duckworth
4 daysChange fresh item colorsCase Duckworth
4 daysIndentationCase Duckworth
5 daysAdd Tab CompletionCase Duckworth
5 daysAdd Lonnie JohnsonCase Duckworth
7 daysAdd miniature calendar; metafilterCase Duckworth
9 daysAdd active listeningCase Duckworth
10 daysAdd tilde.town blogCase Duckworth
12 daysAdd zsergeCase Duckworth
12 daysRemove duplicateCase Duckworth
[...]
 
Clone
https://git.acdw.net/sfeed
+ +
+ + diff --git a/swh/lister/cgit/tests/test_lister.py b/swh/lister/cgit/tests/test_lister.py index f996333..9b5c0c3 100644 --- a/swh/lister/cgit/tests/test_lister.py +++ b/swh/lister/cgit/tests/test_lister.py @@ -1,267 +1,280 @@ # Copyright (C) 2019-2021 The Software Heritage developers # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information from datetime import datetime, timedelta, timezone import os from typing import List import pytest from swh.core.pytest_plugin import requests_mock_datadir_factory from swh.lister import __version__ from swh.lister.cgit.lister import CGitLister, _parse_last_updated_date from swh.lister.pattern import ListerStats def test_lister_cgit_get_pages_one_page(requests_mock_datadir, swh_scheduler): url = "https://git.savannah.gnu.org/cgit/" lister_cgit = CGitLister(swh_scheduler, url=url) repos: List[List[str]] = list(lister_cgit.get_pages()) flattened_repos = sum(repos, []) assert len(flattened_repos) == 977 assert flattened_repos[0]["url"] == "https://git.savannah.gnu.org/cgit/elisp-es.git" # note the url below is NOT a subpath of /cgit/ assert ( flattened_repos[-1]["url"] == "https://git.savannah.gnu.org/path/to/yetris.git" ) # noqa # note the url below is NOT on the same server assert flattened_repos[-2]["url"] == "http://example.org/cgit/xstarcastle.git" def test_lister_cgit_get_pages_with_pages(requests_mock_datadir, swh_scheduler): url = "https://git.tizen/cgit/" lister_cgit = CGitLister(swh_scheduler, url=url) repos: List[List[str]] = list(lister_cgit.get_pages()) flattened_repos = sum(repos, []) # we should have 16 repos (listed on 3 pages) assert len(repos) == 3 assert len(flattened_repos) == 16 def test_lister_cgit_run_with_page(requests_mock_datadir, swh_scheduler): """cgit lister supports pagination""" url = "https://git.tizen/cgit/" lister_cgit = CGitLister(swh_scheduler, url=url) stats = lister_cgit.run() expected_nb_origins = 16 assert stats == 
ListerStats(pages=3, origins=expected_nb_origins) # test page parsing scheduler_origins = swh_scheduler.get_listed_origins( lister_cgit.lister_obj.id ).results assert len(scheduler_origins) == expected_nb_origins # test listed repositories for listed_origin in scheduler_origins: assert listed_origin.visit_type == "git" assert listed_origin.url.startswith("https://git.tizen") # test user agent content assert len(requests_mock_datadir.request_history) != 0 for request in requests_mock_datadir.request_history: assert "User-Agent" in request.headers user_agent = request.headers["User-Agent"] assert "Software Heritage Lister" in user_agent assert __version__ in user_agent def test_lister_cgit_run_populates_last_update(requests_mock_datadir, swh_scheduler): """cgit lister returns last updated date""" url = "https://git.tizen/cgit" urls_without_date = [ f"https://git.tizen.org/cgit/{suffix_url}" for suffix_url in [ "All-Projects", "All-Users", "Lock-Projects", ] ] lister_cgit = CGitLister(swh_scheduler, url=url) stats = lister_cgit.run() expected_nb_origins = 16 assert stats == ListerStats(pages=3, origins=expected_nb_origins) # test page parsing scheduler_origins = swh_scheduler.get_listed_origins( lister_cgit.lister_obj.id ).results assert len(scheduler_origins) == expected_nb_origins # test listed repositories for listed_origin in scheduler_origins: if listed_origin.url in urls_without_date: assert listed_origin.last_update is None else: assert listed_origin.last_update is not None @pytest.mark.parametrize( "date_str,expected_date", [ ({}, None), ("unexpected date", None), ("2020-0140-10 10:10:10 (GMT)", None), ( "2020-01-10 10:10:10 (GMT)", datetime( year=2020, month=1, day=10, hour=10, minute=10, second=10, tzinfo=timezone.utc, ), ), ( "2019-08-04 05:10:41 +0100", datetime( year=2019, month=8, day=4, hour=5, minute=10, second=41, tzinfo=timezone(timedelta(hours=1)), ), ), ], ) def test_lister_cgit_date_parsing(date_str, expected_date): """test cgit lister date 
parsing""" repository = {"url": "url", "last_updated_date": date_str} assert _parse_last_updated_date(repository) == expected_date requests_mock_datadir_missing_url = requests_mock_datadir_factory( ignore_urls=[ "https://git.tizen/cgit/adaptation/ap_samsung/audio-hal-e4x12", ] ) def test_lister_cgit_get_origin_from_repo_failing( requests_mock_datadir_missing_url, swh_scheduler ): url = "https://git.tizen/cgit/" lister_cgit = CGitLister(swh_scheduler, url=url) stats = lister_cgit.run() expected_nb_origins = 15 assert stats == ListerStats(pages=3, origins=expected_nb_origins) @pytest.mark.parametrize( "credentials, expected_credentials", [ (None, []), ({"key": "value"}, []), ( {"cgit": {"tizen": [{"username": "user", "password": "pass"}]}}, [{"username": "user", "password": "pass"}], ), ], ) def test_lister_cgit_instantiation_with_credentials( credentials, expected_credentials, swh_scheduler ): url = "https://git.tizen/cgit/" lister = CGitLister( swh_scheduler, url=url, instance="tizen", credentials=credentials ) # Credentials are allowed in constructor assert lister.credentials == expected_credentials def test_lister_cgit_from_configfile(swh_scheduler_config, mocker): load_from_envvar = mocker.patch("swh.lister.pattern.load_from_envvar") load_from_envvar.return_value = { "scheduler": {"cls": "local", **swh_scheduler_config}, "url": "https://git.tizen/cgit/", "instance": "tizen", "credentials": {}, } lister = CGitLister.from_configfile() assert lister.scheduler is not None assert lister.credentials is not None @pytest.mark.parametrize( "url,base_git_url,expected_nb_origins", [ ("https://git.eclipse.org/c", "https://eclipse.org/r", 5), ("https://git.baserock.org/cgit/", "https://git.baserock.org/git/", 3), ("https://jff.email/cgit/", "git://jff.email/opt/git/", 6), ], ) def test_lister_cgit_with_base_git_url( url, base_git_url, expected_nb_origins, requests_mock_datadir, swh_scheduler ): """With base git url provided, listed urls should be the computed origin urls""" 
lister_cgit = CGitLister( swh_scheduler, url=url, base_git_url=base_git_url, ) stats = lister_cgit.run() assert stats == ListerStats(pages=1, origins=expected_nb_origins) # test page parsing scheduler_origins = swh_scheduler.get_listed_origins( lister_cgit.lister_obj.id ).results assert len(scheduler_origins) == expected_nb_origins # test listed repositories for listed_origin in scheduler_origins: assert listed_origin.visit_type == "git" assert listed_origin.url.startswith(base_git_url) assert ( listed_origin.url.startswith(url) is False ), f"url should be mapped to {base_git_url}" def test_lister_cgit_get_pages_with_pages_and_retry( requests_mock_datadir, requests_mock, datadir, mocker, swh_scheduler ): url = "https://git.tizen/cgit/" with open(os.path.join(datadir, "https_git.tizen/cgit,ofs=50"), "rb") as page: requests_mock.get( f"{url}?ofs=50", [ {"content": None, "status_code": 429}, {"content": None, "status_code": 429}, {"content": page.read(), "status_code": 200}, ], ) lister_cgit = CGitLister(swh_scheduler, url=url) mocker.patch.object(lister_cgit._get_and_parse.retry, "sleep") repos: List[List[str]] = list(lister_cgit.get_pages()) flattened_repos = sum(repos, []) # we should have 16 repos (listed on 3 pages) assert len(repos) == 3 assert len(flattened_repos) == 16 + + +def test_lister_cgit_summary_not_default(requests_mock_datadir, swh_scheduler): + """cgit lister returns git url when the default repository tab is not the summary""" + + url = "https://git.acdw.net/cgit" + + lister_cgit = CGitLister(swh_scheduler, url=url) + + stats = lister_cgit.run() + + expected_nb_origins = 1 + assert stats == ListerStats(pages=1, origins=expected_nb_origins) diff --git a/swh/lister/gitea/tests/test_lister.py b/swh/lister/gitea/tests/test_lister.py index 90ec624..8e3242b 100644 --- a/swh/lister/gitea/tests/test_lister.py +++ b/swh/lister/gitea/tests/test_lister.py @@ -1,153 +1,176 @@ # Copyright (C) 2017-2020 The Software Heritage developers # See the AUTHORS file at 
the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import json from pathlib import Path from typing import Dict, List, Tuple import pytest import requests +from requests import HTTPError from swh.lister.gitea.lister import GiteaLister from swh.lister.gogs.lister import GogsListerPage from swh.scheduler.model import ListedOrigin TRYGITEA_URL = "https://try.gitea.io/api/v1/" TRYGITEA_P1_URL = TRYGITEA_URL + "repos/search?limit=3&page=1" TRYGITEA_P2_URL = TRYGITEA_URL + "repos/search?limit=3&page=2" @pytest.fixture def trygitea_p1(datadir) -> Tuple[str, Dict[str, str], GogsListerPage, List[str]]: text = Path(datadir, "https_try.gitea.io", "repos_page1").read_text() headers = { "Link": '<{p2}>; rel="next",<{p2}>; rel="last"'.format(p2=TRYGITEA_P2_URL) } page_data = json.loads(text) page_result = GogsListerPage( repos=GiteaLister.extract_repos(page_data), next_link=TRYGITEA_P2_URL ) origin_urls = [r["clone_url"] for r in page_data["data"]] return text, headers, page_result, origin_urls @pytest.fixture def trygitea_p2(datadir) -> Tuple[str, Dict[str, str], GogsListerPage, List[str]]: text = Path(datadir, "https_try.gitea.io", "repos_page2").read_text() headers = { "Link": '<{p1}>; rel="prev",<{p1}>; rel="first"'.format(p1=TRYGITEA_P1_URL) } page_data = json.loads(text) page_result = GogsListerPage( repos=GiteaLister.extract_repos(page_data), next_link=None ) origin_urls = [r["clone_url"] for r in page_data["data"]] return text, headers, page_result, origin_urls def check_listed_origins(lister_urls: List[str], scheduler_origins: List[ListedOrigin]): """Asserts that the two collections have the same origin URLs. 
Does not test last_update.""" assert set(lister_urls) == {origin.url for origin in scheduler_origins} def test_gitea_full_listing( swh_scheduler, requests_mock, mocker, trygitea_p1, trygitea_p2 ): """Covers full listing of multiple pages, rate-limit, page size (required for test), checking page results and listed origins, statelessness.""" kwargs = dict(url=TRYGITEA_URL, instance="try_gitea", page_size=3) lister = GiteaLister(scheduler=swh_scheduler, **kwargs) lister.get_origins_from_page = mocker.spy(lister, "get_origins_from_page") p1_text, p1_headers, p1_result, p1_origin_urls = trygitea_p1 p2_text, p2_headers, p2_result, p2_origin_urls = trygitea_p2 requests_mock.get(TRYGITEA_P1_URL, text=p1_text, headers=p1_headers) requests_mock.get( TRYGITEA_P2_URL, [ {"status_code": requests.codes.too_many_requests}, {"text": p2_text, "headers": p2_headers}, ], ) # end test setup stats = lister.run() # start test checks assert stats.pages == 2 assert stats.origins == 6 calls = [mocker.call(p1_result), mocker.call(p2_result)] lister.get_origins_from_page.assert_has_calls(calls) scheduler_origins = swh_scheduler.get_listed_origins(lister.lister_obj.id).results check_listed_origins(p1_origin_urls + p2_origin_urls, scheduler_origins) lister_state = lister.get_state_from_scheduler() assert lister_state.last_seen_next_link == TRYGITEA_P2_URL assert lister_state.last_seen_repo_id == p2_result.repos[-1]["id"] def test_gitea_auth_instance(swh_scheduler, requests_mock, trygitea_p1): """Covers token authentication, token from credentials, instance inference from URL.""" api_token = "teapot" instance = "try.gitea.io" creds = {"gitea": {instance: [{"username": "u", "password": api_token}]}} kwargs1 = dict(url=TRYGITEA_URL, api_token=api_token) lister = GiteaLister(scheduler=swh_scheduler, **kwargs1) # test API token assert "Authorization" in lister.session.headers assert lister.session.headers["Authorization"].lower() == "token %s" % api_token kwargs2 = dict(url=TRYGITEA_URL, 
credentials=creds) lister = GiteaLister(scheduler=swh_scheduler, **kwargs2) # test API token from credentials assert "Authorization" in lister.session.headers assert lister.session.headers["Authorization"].lower() == "token %s" % api_token # test instance inference from URL assert lister.instance assert "gitea" in lister.instance # infer something related to that # setup requests mocking p1_text, p1_headers, _, _ = trygitea_p1 p1_headers["Link"] = p1_headers["Link"].replace("next", "") # only 1 page base_url = TRYGITEA_URL + lister.REPO_LIST_PATH requests_mock.get(base_url, text=p1_text, headers=p1_headers) # now check the lister runs without error stats = lister.run() assert stats.pages == 1 @pytest.mark.parametrize("http_code", [400, 500, 502]) -def test_gitea_list_http_error(swh_scheduler, requests_mock, http_code): +def test_gitea_list_http_error( + swh_scheduler, requests_mock, http_code, trygitea_p1, trygitea_p2 +): """Test handling of some HTTP errors commonly encountered""" lister = GiteaLister(scheduler=swh_scheduler, url=TRYGITEA_URL, page_size=3) + p1_text, p1_headers, _, p1_origin_urls = trygitea_p1 + p3_text, p3_headers, _, p3_origin_urls = trygitea_p2 + base_url = TRYGITEA_URL + lister.REPO_LIST_PATH - requests_mock.get(base_url, status_code=http_code) + requests_mock.get( + base_url, + [ + {"text": p1_text, "headers": p1_headers, "status_code": 200}, + {"status_code": http_code}, + {"text": p3_text, "headers": p3_headers, "status_code": 200}, + ], + ) - with pytest.raises(requests.HTTPError): + # pages with fatal repositories should be skipped (no error raised) + # See T4423 for more details + if http_code == 500: lister.run() + else: + with pytest.raises(HTTPError): + lister.run() + # Both P1 and P3 origins should be listed in case of 500 error + # While in other cases, only P1 origins should be listed scheduler_origins = swh_scheduler.get_listed_origins(lister.lister_obj.id).results - assert len(scheduler_origins) == 0 + check_listed_origins( + 
(p1_origin_urls + p3_origin_urls) if http_code == 500 else p1_origin_urls, + scheduler_origins, + ) diff --git a/swh/lister/gogs/lister.py b/swh/lister/gogs/lister.py index 8c5a72d..16d9626 100644 --- a/swh/lister/gogs/lister.py +++ b/swh/lister/gogs/lister.py @@ -1,207 +1,220 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information - from dataclasses import asdict, dataclass import logging import random -from typing import Any, Dict, Iterator, List, Optional -from urllib.parse import parse_qs, urljoin, urlparse +from typing import Any, Dict, Iterator, List, Optional, Tuple +from urllib.parse import parse_qs, parse_qsl, urlencode, urljoin, urlparse import iso8601 import requests from tenacity.before_sleep import before_sleep_log from swh.lister.utils import throttling_retry from swh.scheduler.interface import SchedulerInterface from swh.scheduler.model import ListedOrigin from .. import USER_AGENT from ..pattern import CredentialsType, Lister logger = logging.getLogger(__name__) Repo = Dict[str, Any] @dataclass class GogsListerPage: repos: Optional[List[Repo]] = None next_link: Optional[str] = None @dataclass class GogsListerState: last_seen_next_link: Optional[str] = None """Last link header (could be already visited) during an incremental pass.""" last_seen_repo_id: Optional[int] = None """Last repo id seen during an incremental pass.""" def _parse_page_id(url: Optional[str]) -> int: """Parse the page id from a Gogs page url.""" if url is None: return 0 return int(parse_qs(urlparse(url).query)["page"][0]) class GogsLister(Lister[GogsListerState, GogsListerPage]): """List origins from the Gogs Gogs API documentation: https://github.com/gogs/docs-api The API is protected behind authentication so credentials/API tokens are mandatory. 
It supports pagination and provides next page URL through the 'next' value of the 'Link' header. The default value for page size ('limit') is 10 but the maximum allowed value is 50. """ LISTER_NAME = "gogs" VISIT_TYPE = "git" REPO_LIST_PATH = "repos/search" def __init__( self, scheduler: SchedulerInterface, url: str, instance: Optional[str] = None, api_token: Optional[str] = None, page_size: int = 50, credentials: CredentialsType = None, ): super().__init__( scheduler=scheduler, credentials=credentials, url=url, instance=instance, ) self.query_params = { "limit": page_size, } self.api_token = api_token if self.api_token is None: if len(self.credentials) > 0: cred = random.choice(self.credentials) username = cred.get("username") self.api_token = cred["password"] logger.info("Using authentication credentials from user %s", username) else: # Raises an error on Gogs, or a warning on Gitea self.on_anonymous_mode() - self.max_page_limit = 2 - self.session = requests.Session() self.session.headers.update( { "Accept": "application/json", "User-Agent": USER_AGENT, } ) if self.api_token: self.session.headers["Authorization"] = f"token {self.api_token}" def on_anonymous_mode(self): raise ValueError("No credentials or API token provided") def state_from_dict(self, d: Dict[str, Any]) -> GogsListerState: return GogsListerState(**d) def state_to_dict(self, state: GogsListerState) -> Dict[str, Any]: return asdict(state) @throttling_retry(before_sleep=before_sleep_log(logger, logging.WARNING)) - def page_request(self, url, params) -> requests.Response: + def page_request( + self, url: str, params: Dict[str, Any] + ) -> Tuple[Dict[str, Any], Dict[str, Any]]: logger.debug("Fetching URL %s with params %s", url, params) response = self.session.get(url, params=params) if response.status_code != 200: logger.warning( "Unexpected HTTP status code %s on %s: %s", response.status_code, response.url, response.content, ) - response.raise_for_status() - - return response + if ( + 
response.status_code == 500 + ): # Temporary hack for skipping fatal repos (T4423) + url_parts = urlparse(url) + query: Dict[str, Any] = dict(parse_qsl(url_parts.query)) + query.update({"page": _parse_page_id(url) + 1}) + next_page_link = url_parts._replace(query=urlencode(query)).geturl() + body: Dict[str, Any] = {"data": []} + links = {"next": {"url": next_page_link}} + return body, links + else: + response.raise_for_status() + + return response.json(), response.links @classmethod def extract_repos(cls, body: Dict[str, Any]) -> List[Repo]: fields_filter = ["id", "clone_url", "updated_at"] return [{k: r[k] for k in fields_filter} for r in body["data"]] def get_pages(self) -> Iterator[GogsListerPage]: page_id = 1 if self.state.last_seen_next_link is not None: page_id = _parse_page_id(self.state.last_seen_next_link) # base with trailing slash, path without leading slash for urljoin next_link: Optional[str] = urljoin(self.url, self.REPO_LIST_PATH) - response = self.page_request(next_link, {**self.query_params, "page": page_id}) + + body, links = self.page_request( + next_link, {**self.query_params, "page": page_id} + ) while next_link is not None: - repos = self.extract_repos(response.json()) + repos = self.extract_repos(body) - assert len(response.links) > 0, "API changed: no Link header found" - if "next" in response.links: - next_link = response.links["next"]["url"] + assert len(links) > 0, "API changed: no Link header found" + if "next" in links: + next_link = links["next"]["url"] else: next_link = None # Happens for the last page yield GogsListerPage(repos=repos, next_link=next_link) if next_link is not None: - response = self.page_request(next_link, {}) + body, links = self.page_request(next_link, {}) def get_origins_from_page(self, page: GogsListerPage) -> Iterator[ListedOrigin]: """Convert a page of Gogs repositories into a list of ListedOrigins""" assert self.lister_obj.id is not None assert page.repos is not None for r in page.repos: last_update = 
iso8601.parse_date(r["updated_at"]) yield ListedOrigin( lister_id=self.lister_obj.id, visit_type=self.VISIT_TYPE, url=r["clone_url"], last_update=last_update, ) def commit_page(self, page: GogsListerPage) -> None: last_seen_next_link = page.next_link page_id = _parse_page_id(last_seen_next_link) state_page_id = _parse_page_id(self.state.last_seen_next_link) if page_id > state_page_id: self.state.last_seen_next_link = last_seen_next_link if (page.repos is not None) and len(page.repos) > 0: self.state.last_seen_repo_id = page.repos[-1]["id"] def finalize(self) -> None: scheduler_state = self.get_state_from_scheduler() state_page_id = _parse_page_id(self.state.last_seen_next_link) scheduler_page_id = _parse_page_id(scheduler_state.last_seen_next_link) state_last_repo_id = self.state.last_seen_repo_id or 0 scheduler_last_repo_id = scheduler_state.last_seen_repo_id or 0 if (state_page_id >= scheduler_page_id) and ( state_last_repo_id > scheduler_last_repo_id ): self.updated = True # Marked updated only if it finds new repos diff --git a/swh/lister/gogs/tests/test_lister.py b/swh/lister/gogs/tests/test_lister.py index 5c9b651..bcac533 100644 --- a/swh/lister/gogs/tests/test_lister.py +++ b/swh/lister/gogs/tests/test_lister.py @@ -1,322 +1,330 @@ # Copyright (C) 2022 The Software Heritage developers # See the AUTHORS file at the top-level directory of this distribution # License: GNU General Public License version 3, or any later version # See top-level LICENSE file for more information import json from pathlib import Path from typing import List from unittest.mock import Mock import pytest from requests import HTTPError from swh.lister.gogs.lister import GogsLister, GogsListerPage, _parse_page_id from swh.scheduler.model import ListedOrigin TRY_GOGS_URL = "https://try.gogs.io/api/v1/" def try_gogs_page(n: int): return TRY_GOGS_URL + GogsLister.REPO_LIST_PATH + f"?page={n}&limit=3" P1 = try_gogs_page(1) P2 = try_gogs_page(2) P3 = try_gogs_page(3) P4 = try_gogs_page(4) 
@pytest.fixture def trygogs_p1(datadir): text = Path(datadir, "https_try.gogs.io", "repos_page1").read_text() headers = {"Link": f'<{P2}>; rel="next"'} page_result = GogsListerPage( repos=GogsLister.extract_repos(json.loads(text)), next_link=P2 ) origin_urls = [r["clone_url"] for r in page_result.repos] return text, headers, page_result, origin_urls @pytest.fixture def trygogs_p2(datadir): text = Path(datadir, "https_try.gogs.io", "repos_page2").read_text() headers = {"Link": f'<{P3}>; rel="next",<{P1}>; rel="prev"'} page_result = GogsListerPage( repos=GogsLister.extract_repos(json.loads(text)), next_link=P3 ) origin_urls = [r["clone_url"] for r in page_result.repos] return text, headers, page_result, origin_urls @pytest.fixture def trygogs_p3(datadir): text = Path(datadir, "https_try.gogs.io", "repos_page3").read_text() headers = {"Link": f'<{P4}>; rel="next",<{P2}>; rel="prev"'} page_result = GogsListerPage( repos=GogsLister.extract_repos(json.loads(text)), next_link=P3 ) origin_urls = [r["clone_url"] for r in page_result.repos] return text, headers, page_result, origin_urls @pytest.fixture def trygogs_p4(datadir): text = Path(datadir, "https_try.gogs.io", "repos_page4").read_text() headers = {"Link": f'<{P3}>; rel="prev"'} page_result = GogsListerPage( repos=GogsLister.extract_repos(json.loads(text)), next_link=P3 ) origin_urls = [r["clone_url"] for r in page_result.repos] return text, headers, page_result, origin_urls @pytest.fixture def trygogs_p3_last(datadir): text = Path(datadir, "https_try.gogs.io", "repos_page3").read_text() headers = {"Link": f'<{P2}>; rel="prev",<{P1}>; rel="first"'} page_result = GogsListerPage( repos=GogsLister.extract_repos(json.loads(text)), next_link=None ) origin_urls = [r["clone_url"] for r in page_result.repos] return text, headers, page_result, origin_urls @pytest.fixture def trygogs_p3_empty(): origins_urls = [] body = {"data": [], "ok": True} headers = {"Link": f'<{P2}>; rel="prev",<{P1}>; rel="first"'} page_result = 
GogsListerPage(repos=GogsLister.extract_repos(body), next_link=None) text = json.dumps(body) return text, headers, page_result, origins_urls def check_listed_origins(lister_urls: List[str], scheduler_origins: List[ListedOrigin]): """Asserts that the two collections have the same origin URLs. Does not test last_update.""" assert set(lister_urls) == {origin.url for origin in scheduler_origins} def test_gogs_full_listing( swh_scheduler, requests_mock, mocker, trygogs_p1, trygogs_p2, trygogs_p3_last ): kwargs = dict( url=TRY_GOGS_URL, instance="try_gogs", page_size=3, api_token="secret" ) lister = GogsLister(scheduler=swh_scheduler, **kwargs) lister.get_origins_from_page: Mock = mocker.spy(lister, "get_origins_from_page") p1_text, p1_headers, p1_result, p1_origin_urls = trygogs_p1 p2_text, p2_headers, p2_result, p2_origin_urls = trygogs_p2 p3_text, p3_headers, p3_result, p3_origin_urls = trygogs_p3_last requests_mock.get(P1, text=p1_text, headers=p1_headers) requests_mock.get(P2, text=p2_text, headers=p2_headers) requests_mock.get(P3, text=p3_text, headers=p3_headers) stats = lister.run() assert stats.pages == 3 assert stats.origins == 9 calls = map(mocker.call, [p1_result, p2_result, p3_result]) lister.get_origins_from_page.assert_has_calls(list(calls)) scheduler_origins = swh_scheduler.get_listed_origins(lister.lister_obj.id).results check_listed_origins( p1_origin_urls + p2_origin_urls + p3_origin_urls, scheduler_origins ) assert ( lister.get_state_from_scheduler().last_seen_next_link == P3 ) # P3 didn't provide any next link so it remains the last_seen_next_link def test_gogs_auth_instance( swh_scheduler, requests_mock, trygogs_p1, trygogs_p2, trygogs_p3_empty ): """Covers token authentication, token from credentials, instance inference from URL.""" api_token = "secret" instance = "try_gogs" # Test lister initialization without api_token or credentials: with pytest.raises(ValueError, match="No credentials or API token provided"): kwargs1 = dict(url=TRY_GOGS_URL, 
instance=instance) GogsLister(scheduler=swh_scheduler, **kwargs1) # Test lister initialization using api_token: kwargs2 = dict(url=TRY_GOGS_URL, api_token=api_token, instance=instance) lister = GogsLister(scheduler=swh_scheduler, **kwargs2) assert lister.session.headers["Authorization"].lower() == "token %s" % api_token # Test lister initialization with credentials and run it: creds = {"gogs": {instance: [{"username": "u", "password": api_token}]}} kwargs3 = dict(url=TRY_GOGS_URL, credentials=creds, instance=instance, page_size=3) lister = GogsLister(scheduler=swh_scheduler, **kwargs3) assert lister.session.headers["Authorization"].lower() == "token %s" % api_token assert lister.instance == "try_gogs" # setup requests mocking p1_text, p1_headers, _, _ = trygogs_p1 p2_text, p2_headers, _, _ = trygogs_p2 p3_text, p3_headers, _, _ = trygogs_p3_empty requests_mock.get(P1, text=p1_text, headers=p1_headers) requests_mock.get(P2, text=p2_text, headers=p2_headers) requests_mock.get(P3, text=p3_text, headers=p3_headers) # lister should run without any error and extract the origins stats = lister.run() assert stats.pages == 3 assert stats.origins == 6 @pytest.mark.parametrize("http_code", [400, 500, 502]) def test_gogs_list_http_error( swh_scheduler, requests_mock, http_code, trygogs_p1, trygogs_p3_last ): """Test handling of some HTTP errors commonly encountered""" lister = GogsLister(scheduler=swh_scheduler, url=TRY_GOGS_URL, api_token="secret") p1_text, p1_headers, _, p1_origin_urls = trygogs_p1 - p3_text, p3_headers, _, _ = trygogs_p3_last + p3_text, p3_headers, _, p3_origin_urls = trygogs_p3_last base_url = TRY_GOGS_URL + lister.REPO_LIST_PATH requests_mock.get( base_url, [ {"text": p1_text, "headers": p1_headers, "status_code": 200}, {"status_code": http_code}, {"text": p3_text, "headers": p3_headers, "status_code": 200}, ], ) - with pytest.raises(HTTPError): + # pages with fatal repositories should be skipped (no error raised) + # See T4423 for more details + if 
http_code == 500: lister.run() + else: + with pytest.raises(HTTPError): + lister.run() + # Both P1 and P3 origins should be listed in case of 500 error + # While in other cases, only P1 origins should be listed scheduler_origins = swh_scheduler.get_listed_origins(lister.lister_obj.id).results check_listed_origins( - p1_origin_urls, scheduler_origins - ) # Only the first page is listed + (p1_origin_urls + p3_origin_urls) if http_code == 500 else p1_origin_urls, + scheduler_origins, + ) def test_gogs_incremental_lister( swh_scheduler, requests_mock, mocker, trygogs_p1, trygogs_p2, trygogs_p3, trygogs_p3_last, trygogs_p3_empty, trygogs_p4, ): kwargs = dict( url=TRY_GOGS_URL, instance="try_gogs", page_size=3, api_token="secret" ) lister = GogsLister(scheduler=swh_scheduler, **kwargs) lister.get_origins_from_page: Mock = mocker.spy(lister, "get_origins_from_page") # First listing attempt: P1 and P2 return 3 origins each # while P3 (current last page) is empty. p1_text, p1_headers, p1_result, p1_origin_urls = trygogs_p1 p2_text, p2_headers, p2_result, p2_origin_urls = trygogs_p2 p3_text, p3_headers, p3_result, p3_origin_urls = trygogs_p3_empty requests_mock.get(P1, text=p1_text, headers=p1_headers) requests_mock.get(P2, text=p2_text, headers=p2_headers) requests_mock.get(P3, text=p3_text, headers=p3_headers) attempt1_stats = lister.run() assert attempt1_stats.pages == 3 assert attempt1_stats.origins == 6 scheduler_origins = swh_scheduler.get_listed_origins(lister.lister_obj.id).results lister_state = lister.get_state_from_scheduler() assert lister_state.last_seen_next_link == P3 assert lister_state.last_seen_repo_id == p2_result.repos[-1]["id"] assert lister.updated check_listed_origins(p1_origin_urls + p2_origin_urls, scheduler_origins) lister.updated = False # Reset the flag # Second listing attempt: P3 isn't empty anymore. # The lister should restart from last state and hence revisit P3. 
p3_text, p3_headers, p3_result, p3_origin_urls = trygogs_p3_last requests_mock.get(P3, text=p3_text, headers=p3_headers) lister.session.get = mocker.spy(lister.session, "get") attempt2_stats = lister.run() assert attempt2_stats.pages == 1 assert attempt2_stats.origins == 3 scheduler_origins = swh_scheduler.get_listed_origins(lister.lister_obj.id).results page_id = _parse_page_id(lister_state.last_seen_next_link) query_params = lister.query_params query_params["page"] = page_id lister.session.get.assert_called_once_with( TRY_GOGS_URL + lister.REPO_LIST_PATH, params=query_params ) # All the 9 origins (3 pages) should be passed on to the scheduler: check_listed_origins( p1_origin_urls + p2_origin_urls + p3_origin_urls, scheduler_origins ) lister_state = lister.get_state_from_scheduler() assert lister_state.last_seen_next_link == P3 assert lister_state.last_seen_repo_id == p3_result.repos[-1]["id"] assert lister.updated lister.updated = False # Reset the flag # Third listing attempt: No new origins # The lister should revisit last seen page (P3) attempt3_stats = lister.run() assert attempt3_stats.pages == 1 assert attempt3_stats.origins == 3 lister_state = lister.get_state_from_scheduler() assert lister_state.last_seen_next_link == P3 assert lister_state.last_seen_repo_id == p3_result.repos[-1]["id"] assert lister.updated is False # No new origins so state isn't updated. # Fourth listing attempt: Page 4 is introduced and returns 3 new origins # The lister should revisit last seen page (P3) as well as P4. 
p3_text, p3_headers, p3_result, p3_origin_urls = trygogs_p3 # new P3 points to P4 p4_text, p4_headers, p4_result, p4_origin_urls = trygogs_p4 requests_mock.get(P3, text=p3_text, headers=p3_headers) requests_mock.get(P4, text=p4_text, headers=p4_headers) attempt4_stats = lister.run() assert attempt4_stats.pages == 2 assert attempt4_stats.origins == 6 lister_state = lister.get_state_from_scheduler() assert lister_state.last_seen_next_link == P4 assert lister_state.last_seen_repo_id == p4_result.repos[-1]["id"] assert lister.updated # All the 12 origins (4 pages) should be passed on to the scheduler: scheduler_origins = swh_scheduler.get_listed_origins(lister.lister_obj.id).results check_listed_origins( p1_origin_urls + p2_origin_urls + p3_origin_urls + p4_origin_urls, scheduler_origins, )
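The T4423 workaround in `GogsLister.page_request` treats an HTTP 500 as a skippable page. Instead of raising, it returns an empty `data` list together with a synthetic `next` link whose `page` query parameter is incremented by one, so `get_pages` simply moves on. The standalone sketch below reproduces that URL rewriting using only the standard library. It is an illustration, not the lister's code: `parse_page_id` is a hypothetical stand-in for the lister's `_parse_page_id`, whose body is not part of this excerpt, and the `repos/search` path in the example URL is an assumed value for `REPO_LIST_PATH`.

```python
from typing import Any, Dict, Optional, Tuple
from urllib.parse import parse_qs, parse_qsl, urlencode, urlparse


def parse_page_id(url: Optional[str]) -> int:
    """Hypothetical stand-in for ``_parse_page_id``: return the ``page``
    query parameter of ``url`` as an int, or 0 when there is no URL."""
    if url is None:
        return 0
    return int(parse_qs(urlparse(url).query)["page"][0])


def next_page_url(url: str) -> str:
    """Return ``url`` with its ``page`` parameter incremented by one,
    keeping the other query parameters (e.g. ``limit``) unchanged."""
    parts = urlparse(url)
    query: Dict[str, Any] = dict(parse_qsl(parts.query))
    query["page"] = parse_page_id(url) + 1
    return parts._replace(query=urlencode(query)).geturl()


def skipped_page(url: str) -> Tuple[Dict[str, Any], Dict[str, Dict[str, str]]]:
    """Mimic the (body, links) pair returned for a page answering HTTP 500:
    no repositories, plus a synthetic ``next`` link to the following page."""
    return {"data": []}, {"next": {"url": next_page_url(url)}}


if __name__ == "__main__":
    # Example URL; the ``repos/search`` path is an assumed REPO_LIST_PATH.
    url = "https://try.gogs.io/api/v1/repos/search?page=2&limit=3"
    body, links = skipped_page(url)
    print(body)                  # {'data': []}
    print(links["next"]["url"])  # https://try.gogs.io/api/v1/repos/search?page=3&limit=3
```

The skipped page yields an empty repository list but still carries a `next` link, so the listing loop in `get_pages` continues past the failing page. This matches the 500 case of `test_gogs_list_http_error`, where both the P1 and P3 origins end up listed.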