diff --git a/PKG-INFO b/PKG-INFO
index 838ac17..968ea38 100644
--- a/PKG-INFO
+++ b/PKG-INFO
@@ -1,69 +1,69 @@
 Metadata-Version: 2.1
 Name: swh.indexer
-Version: 0.0.59
+Version: 0.0.60
 Summary: Software Heritage Content Indexer
 Home-page: https://forge.softwareheritage.org/diffusion/78/
 Author: Software Heritage developers
 Author-email: swh-devel@inria.fr
 License: UNKNOWN
 Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
 Project-URL: Funding, https://www.softwareheritage.org/donate
 Project-URL: Source, https://forge.softwareheritage.org/source/swh-indexer
 Description: swh-indexer
         ============
         
         Tools to compute multiple indexes on SWH's raw contents:
         - content:
           - mimetype
           - ctags
           - language
           - fossology-license
           - metadata
         - revision:
           - metadata
         
         An indexer is in charge of:
         - looking up objects
         - extracting information from those objects
         - store those information in the swh-indexer db
         
         There are multiple indexers working on different object types:
           - content indexer: works with content sha1 hashes
           - revision indexer: works with revision sha1 hashes
           - origin indexer: works with origin identifiers
         
         Indexation procedure:
         - receive batch of ids
         - retrieve the associated data depending on object type
         - compute for that object some index
         - store the result to swh's storage
         
         Current content indexers:
         
         - mimetype (queue swh_indexer_content_mimetype): detect the encoding
           and mimetype
         
         - language (queue swh_indexer_content_language): detect the
           programming language
         
         - ctags (queue swh_indexer_content_ctags): compute tags information
         
         - fossology-license (queue swh_indexer_fossology_license): compute the
           license
         
         - metadata: translate file into translated_metadata dict
         
         Current revision indexers:
         
         - metadata: detects files containing metadata and retrieves translated_metadata
           in content_metadata table in storage or run content indexer to translate
           files.
         
 Platform: UNKNOWN
 Classifier: Programming Language :: Python :: 3
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
 Classifier: Operating System :: OS Independent
 Classifier: Development Status :: 5 - Production/Stable
 Description-Content-Type: text/markdown
 Provides-Extra: testing
diff --git a/debian/changelog b/debian/changelog
index 4b07cdf..5c2c28a 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,479 +1,481 @@
-swh-indexer (0.0.59-1~swh1~bpo9+1) stretch-swh; urgency=medium
+swh-indexer (0.0.60-1~swh1) unstable-swh; urgency=medium
 
-  * Rebuild for stretch-backports.
+  * v0.0.60
+  * origin_head: Make next step optional
+  * tests: Increase coverage
 
- -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Tue, 20 Nov 2018 14:27:20 +0100
+ -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 21 Nov 2018 12:33:13 +0100
 
 swh-indexer (0.0.59-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.59
   * fossology license: Fix issue on license computation
   * Improve docstrings
   * Fix pep8 violations
   * Increase coverage on content indexers
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Tue, 20 Nov 2018 14:27:20 +0100
 
 swh-indexer (0.0.58-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.58
   * Add missing default configuration for fossology license indexer
   * tests: Remove dead code
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Tue, 20 Nov 2018 12:06:56 +0100
 
 swh-indexer (0.0.57-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.57
   * storage: Open new endpoint on fossology license range retrieval
   * indexer: Open new fossology license range indexer
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Tue, 20 Nov 2018 11:44:57 +0100
 
 swh-indexer (0.0.56-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.56
   * storage.api: Open new endpoints (mimetype range, fossology range)
   * content indexers: Open mimetype and fossology range indexers
   * Remove orchestrator modules
   * tests: Improve coverage
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Mon, 19 Nov 2018 11:56:06 +0100
 
 swh-indexer (0.0.55-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.55
   * swh.indexer: Let task reschedule itself through the scheduler
   * Use swh.scheduler instead of celery leaking all around
   * swh.indexer.orchestrator: Fix orchestrator initialization step
   * swh.indexer.tasks: Fix type error when no result or list result
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Mon, 29 Oct 2018 10:41:54 +0100
 
 swh-indexer (0.0.54-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.54
   * swh.indexer.tasks: Fix task to use the scheduler's
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 25 Oct 2018 20:13:51 +0200
 
 swh-indexer (0.0.53-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.53
   * swh.indexer.rehash: Migrate to latest swh.model.hashutil.MultiHash
   * indexer: Add the origin intrinsic metadata indexer
   * indexer: Add OriginIndexer and OriginHeadIndexer.
   * indexer.storage: Add the origin intrinsic metadata storage database
   * indexer.storage: Autogenerate the Indexer Storage HTTP API.
   * setup: prepare for pypi upload
   * tests: Add a tox file
   * tests: migrate to pytest
   * tests: Add tests around celery stack
   * docs: Improve documentation and reuse README in generated
     documentation
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 25 Oct 2018 19:03:56 +0200
 
 swh-indexer (0.0.52-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.52
   * swh.indexer.storage: Refactor fossology license get (first external
   * contribution, cf. /CONTRIBUTORS)
   * swh.indexer.storage: Fix typo in invariable name metadata
   * swh.indexer.storage: No longer use temp table when reading data
   * swh.indexer.storage: Clean up unused import
   * swh.indexer.storage: Remove dead entry points origin_metadata*
   * swh.indexer.storage: Update docstrings information and format
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 13 Jun 2018 11:20:40 +0200
 
 swh-indexer (0.0.51-1~swh1) unstable-swh; urgency=medium
 
   * Release swh.indexer v0.0.51
   * Update for new db_transaction{,_generator}
 
  -- Nicolas Dandrimont <nicolas@dandrimont.eu>  Tue, 05 Jun 2018 14:10:39 +0200
 
 swh-indexer (0.0.50-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.50
   * swh.indexer.api.client: Permit to specify the query timeout option
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 24 May 2018 12:19:06 +0200
 
 swh-indexer (0.0.49-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.49
   * test_storage: Instantiate the tools during tests' setUp phase
   * test_storage: Deallocate storage during teardown step
   * test_storage: Make storage test fixture connect to postgres itself
   * storage.api.server: Only instantiate storage backend once per import
   * Use thread-aware psycopg2 connection pooling for database access
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Mon, 14 May 2018 11:09:30 +0200
 
 swh-indexer (0.0.48-1~swh1) unstable-swh; urgency=medium
 
   * Release swh.indexer v0.0.48
   * Update for new swh.storage
 
  -- Nicolas Dandrimont <nicolas@dandrimont.eu>  Sat, 12 May 2018 18:30:10 +0200
 
 swh-indexer (0.0.47-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.47
   * d/control: Fix runtime typo in packaging dependency
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 07 Dec 2017 16:54:49 +0100
 
 swh-indexer (0.0.46-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.46
   * Split swh-indexer packages in 2 python3-swh.indexer.storage and
   * python3-swh.indexer
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 07 Dec 2017 16:18:04 +0100
 
 swh-indexer (0.0.45-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.45
   * Fix usual error raised when deploying
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 07 Dec 2017 15:01:01 +0100
 
 swh-indexer (0.0.44-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.44
   * swh.indexer: Make indexer use their own storage
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 07 Dec 2017 13:20:44 +0100
 
 swh-indexer (0.0.43-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.43
   * swh.indexer.mimetype: Work around problem in detection
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 29 Nov 2017 10:26:11 +0100
 
 swh-indexer (0.0.42-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.42
   * swh.indexer: Make indexers register tools in prepare method
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Fri, 24 Nov 2017 11:26:03 +0100
 
 swh-indexer (0.0.41-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.41
   * mimetype: Use magic library api instead of parsing `file` cli output
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Mon, 20 Nov 2017 13:05:29 +0100
 
 swh-indexer (0.0.39-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.39
   * swh.indexer.producer: Fix argument to match the abstract definition
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 19 Oct 2017 10:03:44 +0200
 
 swh-indexer (0.0.38-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.38
   * swh.indexer.indexer: Fix argument to match the abstract definition
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 18 Oct 2017 19:57:47 +0200
 
 swh-indexer (0.0.37-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.37
   * swh.indexer.indexer: Fix argument to match the abstract definition
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 18 Oct 2017 18:59:42 +0200
 
 swh-indexer (0.0.36-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.36
   * packaging: Cleanup
   * codemeta: Adding codemeta.json file to document metadata
   * swh.indexer.mimetype: Fix edge case regarding empty raw content
   * docs: sanitize docstrings for sphinx documentation generation
   * swh.indexer.metadata: Add RevisionMetadataIndexer
   * swh.indexer.metadata: Add ContentMetadataIndexer
   * swh.indexer: Refactor base class to improve inheritance
   * swh.indexer.metadata: First draft of the metadata content indexer
   * for npm (package.json)
   * swh.indexer.tests: Added tests for language indexer
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 18 Oct 2017 16:24:24 +0200
 
 swh-indexer (0.0.35-1~swh1) unstable-swh; urgency=medium
 
   * Release swh.indexer 0.0.35
   * Update tasks to new swh.scheduler API
 
  -- Nicolas Dandrimont <nicolas@dandrimont.eu>  Mon, 12 Jun 2017 18:02:04 +0200
 
 swh-indexer (0.0.34-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.34
   * Fix unbound local error on edge case
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 07 Jun 2017 11:23:29 +0200
 
 swh-indexer (0.0.33-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.33
   * language indexer: Improve edge case policy
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 07 Jun 2017 11:02:47 +0200
 
 swh-indexer (0.0.32-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.32
   * Update fossology license to use the latest swh-storage
   * Improve language indexer to deal with potential error on bad
   * chunking
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Tue, 06 Jun 2017 18:13:40 +0200
 
 swh-indexer (0.0.31-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.31
   * Reduce log verbosity on language indexer
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Fri, 02 Jun 2017 19:08:52 +0200
 
 swh-indexer (0.0.30-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.30
   * Fix wrong default configuration
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Fri, 02 Jun 2017 18:01:27 +0200
 
 swh-indexer (0.0.29-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.29
   * Update indexer to resolve indexer configuration identifier
   * Adapt language indexer to use partial raw content
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Fri, 02 Jun 2017 16:21:27 +0200
 
 swh-indexer (0.0.28-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.28
   * Add error resilience to fossology indexer
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Mon, 22 May 2017 12:57:55 +0200
 
 swh-indexer (0.0.27-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.27
   * swh.indexer.language: Incremental encoding detection
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 17 May 2017 18:04:27 +0200
 
 swh-indexer (0.0.26-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.26
   * swh.indexer.orchestrator: Add batch size option per indexer
   * Log caught exception in a unified manner
   * Add rescheduling option (not by default) on rehash + indexers
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 17 May 2017 14:08:07 +0200
 
 swh-indexer (0.0.25-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.25
   * Add reschedule on error parameter for indexers
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Fri, 12 May 2017 12:13:15 +0200
 
 swh-indexer (0.0.24-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.24
   * Make rehash indexer more resilient to errors by rescheduling
     contents
   * in error (be it reading or updating problems)
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 04 May 2017 14:22:43 +0200
 
 swh-indexer (0.0.23-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.23
   * Improve producer to optionally make it synchroneous
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 03 May 2017 15:29:44 +0200
 
 swh-indexer (0.0.22-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.22
   * Improve mimetype indexer implementation
   * Make the chaining option in the mimetype indexer
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Tue, 02 May 2017 16:31:14 +0200
 
 swh-indexer (0.0.21-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.21
   * swh.indexer.rehash: Actually make the worker log
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Tue, 02 May 2017 14:28:55 +0200
 
 swh-indexer (0.0.20-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.20
   * swh.indexer.rehash:
   * Improve reading from objstorage only when needed
   * Fix empty file use case (which was skipped)
   * Add logging
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Fri, 28 Apr 2017 09:39:09 +0200
 
 swh-indexer (0.0.19-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.19
   * Fix rehash indexer's default configuration file
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 27 Apr 2017 19:17:20 +0200
 
 swh-indexer (0.0.18-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.18
   * Add new rehash indexer
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 26 Apr 2017 15:23:02 +0200
 
 swh-indexer (0.0.17-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.17
   * Add information on indexer tools (T610)
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Fri, 02 Dec 2016 18:32:54 +0100
 
 swh-indexer (0.0.16-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.16
   * bug fixes
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Tue, 15 Nov 2016 19:31:52 +0100
 
 swh-indexer (0.0.15-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.15
   * Improve message producer
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Tue, 15 Nov 2016 18:16:42 +0100
 
 swh-indexer (0.0.14-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.14
   * Update package dependency on fossology-nomossa
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Tue, 15 Nov 2016 14:13:41 +0100
 
 swh-indexer (0.0.13-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.13
   * Add new license indexer
   * ctags indexer: align behavior with other indexers regarding the
   * conflict update policy
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Mon, 14 Nov 2016 14:13:34 +0100
 
 swh-indexer (0.0.12-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.12
   * Add runtime dependency on universal-ctags
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Fri, 04 Nov 2016 13:59:59 +0100
 
 swh-indexer (0.0.11-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.11
   * Remove dependency on exuberant-ctags
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 03 Nov 2016 16:13:26 +0100
 
 swh-indexer (0.0.10-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.10
   * Add ctags indexer
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 20 Oct 2016 16:12:42 +0200
 
 swh-indexer (0.0.9-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.9
   * d/control: Bump dependency to latest python3-swh.storage api
   * mimetype: Use the charset to filter out data
   * orchestrator: Separate 2 distincts orchestrators (one for all
   * contents, one for text contents)
   * mimetype: once index computed, send text contents to text
     orchestrator
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 13 Oct 2016 15:28:17 +0200
 
 swh-indexer (0.0.8-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.8
   * Separate configuration file per indexer (no need for language)
   * Rename module file_properties to mimetype consistently with other
   * layers
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Sat, 08 Oct 2016 11:46:29 +0200
 
 swh-indexer (0.0.7-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.7
   * Adapt indexer language and mimetype to store result in storage.
   * Clean up obsolete code
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Sat, 08 Oct 2016 10:26:08 +0200
 
 swh-indexer (0.0.6-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.6
   * Fix multiple issues on production
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Fri, 30 Sep 2016 17:00:11 +0200
 
 swh-indexer (0.0.5-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.5
   * Fix debian/control dependency issue
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Fri, 30 Sep 2016 16:06:20 +0200
 
 swh-indexer (0.0.4-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.4
   * Upgrade dependencies issues
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Fri, 30 Sep 2016 16:01:52 +0200
 
 swh-indexer (0.0.3-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.3
   * Add encoding detection
   * Use encoding to improve language detection
   * bypass language detection for binary files
   * bypass ctags for binary files or decoding failure file
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Fri, 30 Sep 2016 12:30:11 +0200
 
 swh-indexer (0.0.2-1~swh1) unstable-swh; urgency=medium
 
   * v0.0.2
   * Provide one possible sha1's name for the multiple tools to ease
   * information extrapolation
   * Fix debian package dependency issue
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Thu, 29 Sep 2016 21:45:44 +0200
 
 swh-indexer (0.0.1-1~swh1) unstable-swh; urgency=medium
 
   * Initial release
   * v0.0.1
   * First implementation on poc
 
  -- Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com>  Wed, 28 Sep 2016 23:40:13 +0200
diff --git a/swh.indexer.egg-info/PKG-INFO b/swh.indexer.egg-info/PKG-INFO
index 838ac17..968ea38 100644
--- a/swh.indexer.egg-info/PKG-INFO
+++ b/swh.indexer.egg-info/PKG-INFO
@@ -1,69 +1,69 @@
 Metadata-Version: 2.1
 Name: swh.indexer
-Version: 0.0.59
+Version: 0.0.60
 Summary: Software Heritage Content Indexer
 Home-page: https://forge.softwareheritage.org/diffusion/78/
 Author: Software Heritage developers
 Author-email: swh-devel@inria.fr
 License: UNKNOWN
 Project-URL: Bug Reports, https://forge.softwareheritage.org/maniphest
 Project-URL: Funding, https://www.softwareheritage.org/donate
 Project-URL: Source, https://forge.softwareheritage.org/source/swh-indexer
 Description: swh-indexer
         ============
         
         Tools to compute multiple indexes on SWH's raw contents:
         - content:
           - mimetype
           - ctags
           - language
           - fossology-license
           - metadata
         - revision:
           - metadata
         
         An indexer is in charge of:
         - looking up objects
         - extracting information from those objects
         - store those information in the swh-indexer db
         
         There are multiple indexers working on different object types:
           - content indexer: works with content sha1 hashes
           - revision indexer: works with revision sha1 hashes
           - origin indexer: works with origin identifiers
         
         Indexation procedure:
         - receive batch of ids
         - retrieve the associated data depending on object type
         - compute for that object some index
         - store the result to swh's storage
         
         Current content indexers:
         
         - mimetype (queue swh_indexer_content_mimetype): detect the encoding
           and mimetype
         
         - language (queue swh_indexer_content_language): detect the
           programming language
         
         - ctags (queue swh_indexer_content_ctags): compute tags information
         
         - fossology-license (queue swh_indexer_fossology_license): compute the
           license
         
         - metadata: translate file into translated_metadata dict
         
         Current revision indexers:
         
         - metadata: detects files containing metadata and retrieves translated_metadata
           in content_metadata table in storage or run content indexer to translate
           files.
         
 Platform: UNKNOWN
 Classifier: Programming Language :: Python :: 3
 Classifier: Intended Audience :: Developers
 Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
 Classifier: Operating System :: OS Independent
 Classifier: Development Status :: 5 - Production/Stable
 Description-Content-Type: text/markdown
 Provides-Extra: testing
diff --git a/swh/indexer/origin_head.py b/swh/indexer/origin_head.py
index 54123ac..6a1ca96 100644
--- a/swh/indexer/origin_head.py
+++ b/swh/indexer/origin_head.py
@@ -1,217 +1,221 @@
 # Copyright (C) 2018  The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU General Public License version 3, or any later version
 # See top-level LICENSE file for more information
 
 import re
 import click
 import logging
 
 from swh.scheduler import get_scheduler
 from swh.scheduler.utils import create_task_dict
 from swh.indexer.indexer import OriginIndexer
 
 
 class OriginHeadIndexer(OriginIndexer):
     """Origin-level indexer.
 
     This indexer is in charge of looking up the revision that acts as the
     "head" of an origin.
 
     In git, this is usually the commit pointed to by the 'master' branch."""
 
     ADDITIONAL_CONFIG = {
         'tools': ('dict', {
             'name': 'origin-metadata',
             'version': '0.0.1',
             'configuration': {},
         }),
+        'tasks': ('dict', {
+            'revision_metadata': 'revision_metadata',
+            'origin_intrinsic_metadata': 'origin_metadata',
+        })
     }
 
     CONFIG_BASE_FILENAME = 'indexer/origin_head'
 
-    revision_metadata_task = 'revision_metadata'
-    origin_intrinsic_metadata_task = 'origin_metadata'
-
     def filter(self, ids):
         yield from ids
 
     def persist_index_computations(self, results, policy_update):
         """Do nothing. The indexer's results are not persistent, they
         should only be piped to another indexer."""
         pass
 
     def next_step(self, results, task):
         """Once the head is found, call the RevisionMetadataIndexer
         on these revisions, then call the OriginMetadataIndexer with
         both the origin_id and the revision metadata, so it can copy the
         revision metadata to the origin's metadata.
 
         Args:
             results (Iterable[dict]): Iterable of return values from `index`.
 
         """
         super().next_step(results, task)
-        if self.revision_metadata_task is None and \
-                self.origin_intrinsic_metadata_task is None:
+        revision_metadata_task = self.config['tasks']['revision_metadata']
+        origin_intrinsic_metadata_task = self.config['tasks'][
+            'origin_intrinsic_metadata']
+        if revision_metadata_task is None and \
+                origin_intrinsic_metadata_task is None:
             return
-        assert self.revision_metadata_task is not None
-        assert self.origin_intrinsic_metadata_task is not None
+        assert revision_metadata_task is not None
+        assert origin_intrinsic_metadata_task is not None
 
         # Second task to run after this one: copy the revision's metadata
         # to the origin
         sub_task = create_task_dict(
-            self.origin_intrinsic_metadata_task,
+            origin_intrinsic_metadata_task,
             'oneshot',
             origin_head={
                 str(result['origin_id']):
                     result['revision_id'].decode()
                 for result in results},
             policy_update='update-dups',
             )
         del sub_task['next_run']  # Not json-serializable
 
         # First task to run after this one: index the metadata of the
         # revision
         task = create_task_dict(
-            self.revision_metadata_task,
+            revision_metadata_task,
             'oneshot',
             ids=[res['revision_id'].decode() for res in results],
             policy_update='update-dups',
             next_step={
                 **sub_task,
                 'result_name': 'revisions_metadata'},
             )
         if getattr(self, 'scheduler', None):
             scheduler = self.scheduler
         else:
             scheduler = get_scheduler(**self.config['scheduler'])
         scheduler.create_tasks([task])
 
     # Dispatch
 
     def index(self, origin):
         origin_id = origin['id']
         latest_snapshot = self.storage.snapshot_get_latest(origin_id)
         method = getattr(self, '_try_get_%s_head' % origin['type'], None)
         if method is None:
             method = self._try_get_head_generic
         rev_id = method(latest_snapshot)
         if rev_id is None:
             return None
         result = {
                 'origin_id': origin_id,
                 'revision_id': rev_id,
                 }
         return result
 
     # VCSs
 
     def _try_get_vcs_head(self, snapshot):
         try:
             if isinstance(snapshot, dict):
                 branches = snapshot['branches']
                 if branches[b'HEAD']['target_type'] == 'revision':
                     return branches[b'HEAD']['target']
         except KeyError:
             return None
 
     _try_get_hg_head = _try_get_git_head = _try_get_vcs_head
 
     # Tarballs
 
     _archive_filename_re = re.compile(
             rb'^'
             rb'(?P<pkgname>.*)[-_]'
             rb'(?P<version>[0-9]+(\.[0-9])*)'
             rb'(?P<preversion>[-+][a-zA-Z0-9.~]+?)?'
             rb'(?P<extension>(\.[a-zA-Z0-9]+)+)'
             rb'$')
 
     @classmethod
     def _parse_version(cls, filename):
         """Extracts the release version from an archive filename,
         to get an ordering whose maximum is likely to be the last
         version of the software
 
         >>> OriginHeadIndexer._parse_version(b'foo')
         (-inf,)
         >>> OriginHeadIndexer._parse_version(b'foo.tar.gz')
         (-inf,)
         >>> OriginHeadIndexer._parse_version(b'gnu-hello-0.0.1.tar.gz')
         (0, 0, 1, 0)
         >>> OriginHeadIndexer._parse_version(b'gnu-hello-0.0.1-beta2.tar.gz')
         (0, 0, 1, -1, 'beta2')
         >>> OriginHeadIndexer._parse_version(b'gnu-hello-0.0.1+foobar.tar.gz')
         (0, 0, 1, 1, 'foobar')
         """
         res = cls._archive_filename_re.match(filename)
         if res is None:
             return (float('-infinity'),)
         version = [int(n) for n in res.group('version').decode().split('.')]
         if res.group('preversion') is None:
             version.append(0)
         else:
             preversion = res.group('preversion').decode()
             if preversion.startswith('-'):
                 version.append(-1)
                 version.append(preversion[1:])
             elif preversion.startswith('+'):
                 version.append(1)
                 version.append(preversion[1:])
             else:
                 assert False, res.group('preversion')
         return tuple(version)
 
     def _try_get_ftp_head(self, snapshot):
         archive_names = list(snapshot['branches'])
         max_archive_name = max(archive_names, key=self._parse_version)
         r = self._try_resolve_target(snapshot['branches'], max_archive_name)
         return r
 
     # Generic
 
     def _try_get_head_generic(self, snapshot):
         # Works on 'deposit', 'svn', and 'pypi'.
         try:
             if isinstance(snapshot, dict):
                 branches = snapshot['branches']
         except KeyError:
             return None
         else:
             return (
                     self._try_resolve_target(branches, b'HEAD') or
                     self._try_resolve_target(branches, b'master')
                     )
 
     def _try_resolve_target(self, branches, target_name):
         try:
             target = branches[target_name]
             while target['target_type'] == 'alias':
                 target = branches[target['target']]
             if target['target_type'] == 'revision':
                 return target['target']
             elif target['target_type'] == 'content':
                 return None  # TODO
             elif target['target_type'] == 'directory':
                 return None  # TODO
             elif target['target_type'] == 'release':
                 return None  # TODO
             else:
                 assert False
         except KeyError:
             return None
 
 
 @click.command()
 @click.option('--origins', '-i',
               help='Origins to lookup, in the "type+url" format',
               multiple=True)
 def main(origins):
     rev_metadata_indexer = OriginHeadIndexer()
     rev_metadata_indexer.run(origins, 'update-dups', parse_ids=True)
 
 
 if __name__ == '__main__':
     logging.basicConfig(level=logging.INFO)
     main()
diff --git a/swh/indexer/tests/test_ctags.py b/swh/indexer/tests/test_ctags.py
index ae45338..21939d7 100644
--- a/swh/indexer/tests/test_ctags.py
+++ b/swh/indexer/tests/test_ctags.py
@@ -1,104 +1,153 @@
 # Copyright (C) 2017-2018  The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU General Public License version 3, or any later version
 # See top-level LICENSE file for more information
 
 import unittest
 import logging
-from swh.indexer.ctags import CtagsIndexer
+
+from unittest.mock import patch
+from swh.indexer.ctags import (
+    CtagsIndexer, run_ctags
+)
+
 from swh.indexer.tests.test_utils import (
     BasicMockIndexerStorage, MockObjStorage, CommonContentIndexerTest,
     CommonIndexerWithErrorsTest, CommonIndexerNoTool,
     SHA1_TO_CTAGS, NoDiskIndexer
 )
 
 
+class BasicTest(unittest.TestCase):
+    @patch('swh.indexer.ctags.subprocess')
+    def test_run_ctags(self, mock_subprocess):
+        """Computing ctags from a raw content should return results
+
+        """
+        output0 = """
+{"name":"defun","kind":"function","line":1,"language":"scheme"}
+{"name":"name","kind":"symbol","line":5,"language":"else"}"""
+        output1 = """
+{"name":"let","kind":"var","line":10,"language":"something"}"""
+
+        expected_result0 = [
+            {
+                'name': 'defun',
+                'kind': 'function',
+                'line': 1,
+                'lang': 'scheme'
+            },
+            {
+                'name': 'name',
+                'kind': 'symbol',
+                'line': 5,
+                'lang': 'else'
+            }
+        ]
+
+        expected_result1 = [
+            {
+                'name': 'let',
+                'kind': 'var',
+                'line': 10,
+                'lang': 'something'
+            }
+        ]
+        for path, lang, intermediary_result, expected_result in [
+                (b'some/path', 'lisp', output0, expected_result0),
+                (b'some/path/2', 'markdown', output1, expected_result1)
+        ]:
+            mock_subprocess.check_output.return_value = intermediary_result
+            actual_result = list(run_ctags(path, lang=lang))
+            self.assertEqual(actual_result, expected_result)
+
+
 class InjectCtagsIndexer:
     """Override ctags computations.
 
     """
     def compute_ctags(self, path, lang):
         """Inject fake ctags given path (sha1 identifier).
 
         """
         return {
             'lang': lang,
             **SHA1_TO_CTAGS.get(path)
         }
 
 
 class CtagsIndexerTest(NoDiskIndexer, InjectCtagsIndexer, CtagsIndexer):
     """Specific language whose configuration is enough to satisfy the
        indexing tests.
     """
     def prepare(self):
         self.config = {
             'tools': {
                 'name': 'universal-ctags',
                 'version': '~git7859817b',
                 'configuration': {
                     'command_line': '''ctags --fields=+lnz --sort=no '''
                                     ''' --links=no <filepath>''',
                     'max_content_size': 1000,
                 },
             },
             'languages': {
                 'python': 'python',
                 'haskell': 'haskell',
                 'bar': 'bar',
             }
         }
         self.idx_storage = BasicMockIndexerStorage()
         self.log = logging.getLogger('swh.indexer')
         self.objstorage = MockObjStorage()
         self.tool_config = self.config['tools']['configuration']
         self.max_content_size = self.tool_config['max_content_size']
         self.tools = self.register_tools(self.config['tools'])
         self.tool = self.tools[0]
         self.language_map = self.config['languages']
 
 
 class TestCtagsIndexer(CommonContentIndexerTest, unittest.TestCase):
     """Ctags indexer test scenarios:
 
     - Known sha1s in the input list have their data indexed
     - Unknown sha1 in the input list are not indexed
 
     """
     def setUp(self):
         self.indexer = CtagsIndexerTest()
 
         # Prepare test input
         self.id0 = '01c9379dfc33803963d07c1ccc748d3fe4c96bb5'
         self.id1 = 'd4c647f0fc257591cc9ba1722484229780d1c607'
         self.id2 = '688a5ef812c53907562fe379d4b3851e69c7cb15'
 
         tool_id = self.indexer.tool['id']
         self.expected_results = {
             self.id0: {
                 'id': self.id0,
                 'indexer_configuration_id': tool_id,
                 'ctags': SHA1_TO_CTAGS[self.id0],
             },
             self.id1: {
                 'id': self.id1,
                 'indexer_configuration_id': tool_id,
                 'ctags': SHA1_TO_CTAGS[self.id1],
             },
             self.id2: {
                 'id': self.id2,
                 'indexer_configuration_id': tool_id,
                 'ctags': SHA1_TO_CTAGS[self.id2],
             }
         }
 
 
 class CtagsIndexerUnknownToolTestStorage(
         CommonIndexerNoTool, CtagsIndexerTest):
     """Fossology license indexer with wrong configuration"""
 
 
 class TestCtagsIndexersErrors(
         CommonIndexerWithErrorsTest, unittest.TestCase):
     """Test the indexer raise the right errors when wrongly initialized"""
     Indexer = CtagsIndexerUnknownToolTestStorage
diff --git a/swh/indexer/tests/test_origin_head.py b/swh/indexer/tests/test_origin_head.py
index 335ced7..f7e07a1 100644
--- a/swh/indexer/tests/test_origin_head.py
+++ b/swh/indexer/tests/test_origin_head.py
@@ -1,91 +1,91 @@
-# Copyright (C) 2017  The Software Heritage developers
+# Copyright (C) 2017-2018  The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU General Public License version 3, or any later version
 # See top-level LICENSE file for more information
 
 import unittest
 import logging
 
 from swh.indexer.origin_head import OriginHeadIndexer
 from swh.indexer.tests.test_utils import MockIndexerStorage, MockStorage
 
 
 class OriginHeadTestIndexer(OriginHeadIndexer):
     """Specific indexer whose configuration is enough to satisfy the
        indexing tests.
     """
-
-    revision_metadata_task = None
-    origin_intrinsic_metadata_task = None
-
     def prepare(self):
         self.config = {
             'tools': {
                 'name': 'origin-metadata',
                 'version': '0.0.1',
                 'configuration': {},
             },
+            'tasks': {
+                'revision_metadata': None,
+                'origin_intrinsic_metadata': None,
+            }
         }
         self.storage = MockStorage()
         self.idx_storage = MockIndexerStorage()
         self.log = logging.getLogger('swh.indexer')
         self.objstorage = None
         self.tools = self.register_tools(self.config['tools'])
         self.tool = self.tools[0]
         self.results = None
 
     def persist_index_computations(self, results, policy_update):
         self.results = results
 
 
 class OriginHead(unittest.TestCase):
     def test_git(self):
         indexer = OriginHeadTestIndexer()
         indexer.run(
                 ['git+https://github.com/SoftwareHeritage/swh-storage'],
                 'update-dups', parse_ids=True)
         self.assertEqual(indexer.results, [{
             'revision_id': b'8K\x12\x00d\x03\xcc\xe4]bS\xe3\x8f{'
                            b'\xd7}\xac\xefrm',
             'origin_id': 52189575}])
 
     def test_ftp(self):
         indexer = OriginHeadTestIndexer()
         indexer.run(
                 ['ftp+rsync://ftp.gnu.org/gnu/3dldf'],
                 'update-dups', parse_ids=True)
         self.assertEqual(indexer.results, [{
             'revision_id': b'\x8e\xa9\x8e/\xea}\x9feF\xf4\x9f\xfd\xee'
                            b'\xcc\x1a\xb4`\x8c\x8by',
             'origin_id': 4423668}])
 
     def test_deposit(self):
         indexer = OriginHeadTestIndexer()
         indexer.run(
                 ['deposit+https://forge.softwareheritage.org/source/'
                  'jesuisgpl/'],
                 'update-dups', parse_ids=True)
         self.assertEqual(indexer.results, [{
             'revision_id': b'\xe7n\xa4\x9c\x9f\xfb\xb7\xf76\x11\x08{'
                            b'\xa6\xe9\x99\xb1\x9e]q\xeb',
             'origin_id': 77775770}])
 
     def test_pypi(self):
         indexer = OriginHeadTestIndexer()
         indexer.run(
                 ['pypi+https://pypi.org/project/limnoria/'],
                 'update-dups', parse_ids=True)
         self.assertEqual(indexer.results, [{
             'revision_id': b'\x83\xb9\xb6\xc7\x05\xb1%\xd0\xfem\xd8k'
                            b'A\x10\x9d\xc5\xfa2\xf8t',
             'origin_id': 85072327}])
 
     def test_svn(self):
         indexer = OriginHeadTestIndexer()
         indexer.run(
                 ['svn+http://0-512-md.googlecode.com/svn/'],
                 'update-dups', parse_ids=True)
         self.assertEqual(indexer.results, [{
             'revision_id': b'\xe4?r\xe1,\x88\xab\xec\xe7\x9a\x87\xb8'
                            b'\xc9\xad#.\x1bw=\x18',
             'origin_id': 49908349}])
diff --git a/swh/indexer/tests/test_origin_metadata.py b/swh/indexer/tests/test_origin_metadata.py
index 1ed3024..7166434 100644
--- a/swh/indexer/tests/test_origin_metadata.py
+++ b/swh/indexer/tests/test_origin_metadata.py
@@ -1,126 +1,130 @@
 # Copyright (C) 2018  The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU General Public License version 3, or any later version
 # See top-level LICENSE file for more information
 
 import time
 import logging
 import unittest
 from celery import task
 
 from swh.indexer.metadata import OriginMetadataIndexer
 from swh.indexer.tests.test_utils import MockObjStorage, MockStorage
 from swh.indexer.tests.test_utils import MockIndexerStorage
 from swh.indexer.tests.test_origin_head import OriginHeadTestIndexer
 from swh.indexer.tests.test_metadata import RevisionMetadataTestIndexer
 
 from swh.scheduler.tests.scheduler_testing import SchedulerTestFixture
 
 
 class OriginMetadataTestIndexer(OriginMetadataIndexer):
     def prepare(self):
         self.config = {
             'storage': {
                 'cls': 'remote',
                 'args': {
                     'url': 'http://localhost:9999',
                 }
             },
             'tools': {
                 'name': 'origin-metadata',
                 'version': '0.0.1',
                 'configuration': {}
             }
         }
         self.storage = MockStorage()
         self.idx_storage = MockIndexerStorage()
         self.log = logging.getLogger('swh.indexer')
         self.objstorage = MockObjStorage()
         self.tools = self.register_tools(self.config['tools'])
         self.tool = self.tools[0]
         self.results = []
 
 
 @task
 def revision_metadata_test_task(*args, **kwargs):
     indexer = RevisionMetadataTestIndexer()
     indexer.run(*args, **kwargs)
     return indexer.results
 
 
 @task
 def origin_intrinsic_metadata_test_task(*args, **kwargs):
     indexer = OriginMetadataTestIndexer()
     indexer.run(*args, **kwargs)
     return indexer.results
 
 
 class OriginHeadTestIndexer(OriginHeadTestIndexer):
-    revision_metadata_task = 'revision_metadata_test_task'
-    origin_intrinsic_metadata_task = 'origin_intrinsic_metadata_test_task'
+    def prepare(self):
+        super().prepare()
+        self.config['tasks'] = {
+            'revision_metadata': 'revision_metadata_test_task',
+            'origin_intrinsic_metadata': 'origin_intrinsic_metadata_test_task',
+        }
 
 
 class TestOriginMetadata(SchedulerTestFixture, unittest.TestCase):
     def setUp(self):
         super().setUp()
         self.maxDiff = None
         MockIndexerStorage.added_data = []
         self.add_scheduler_task_type(
             'revision_metadata_test_task',
             'swh.indexer.tests.test_origin_metadata.'
             'revision_metadata_test_task')
         self.add_scheduler_task_type(
             'origin_intrinsic_metadata_test_task',
             'swh.indexer.tests.test_origin_metadata.'
             'origin_intrinsic_metadata_test_task')
         RevisionMetadataTestIndexer.scheduler = self.scheduler
 
     def tearDown(self):
         del RevisionMetadataTestIndexer.scheduler
         super().tearDown()
 
     def test_pipeline(self):
         indexer = OriginHeadTestIndexer()
         indexer.scheduler = self.scheduler
         indexer.run(
                 ["git+https://github.com/librariesio/yarn-parser"],
                 policy_update='update-dups',
                 parse_ids=True)
 
         self.run_ready_tasks()  # Run the first task
         time.sleep(0.1)  # Give it time to complete and schedule the 2nd one
         self.run_ready_tasks()  # Run the second task
 
         metadata = {
             '@context': 'https://doi.org/10.5063/schema/codemeta-2.0',
             'url':
                 'https://github.com/librariesio/yarn-parser#readme',
             'schema:codeRepository':
                 'git+https://github.com/librariesio/yarn-parser.git',
             'schema:author': 'Andrew Nesbitt',
             'license': 'AGPL-3.0',
             'version': '1.0.0',
             'description':
                 'Tiny web service for parsing yarn.lock files',
             'codemeta:issueTracker':
                 'https://github.com/librariesio/yarn-parser/issues',
             'name': 'yarn-parser',
             'keywords': ['yarn', 'parse', 'lock', 'dependencies'],
         }
         rev_metadata = {
             'id': '8dbb6aeb036e7fd80664eb8bfd1507881af1ba9f',
             'translated_metadata': metadata,
             'indexer_configuration_id': 7,
         }
         origin_metadata = {
             'origin_id': 54974445,
             'from_revision': '8dbb6aeb036e7fd80664eb8bfd1507881af1ba9f',
             'metadata': metadata,
             'indexer_configuration_id': 7,
         }
         expected_results = [
                 ('origin_intrinsic_metadata', True, [origin_metadata]),
                 ('revision_metadata', True, [rev_metadata])]
 
         results = list(indexer.idx_storage.added_data)
         self.assertCountEqual(expected_results, results)
diff --git a/version.txt b/version.txt
index d1c0402..52f6aed 100644
--- a/version.txt
+++ b/version.txt
@@ -1 +1 @@
-v0.0.59-0-g45c8f94
\ No newline at end of file
+v0.0.60-0-ga1332dd
\ No newline at end of file