Since the npm registry is a CouchDB database, we should be able to use its changes feed [1]
to implement an incremental lister that returns only the packages updated since the last listing operation.
The scenario to keep the archive in sync with the npm registry content would then be the following:
- As a first step, list the whole registry content using the full lister already implemented (T1380) and create one-shot loading tasks to ingest the packages into the archive.
- Once the first batch of loading tasks has been executed, run the incremental lister on a regular basis (daily, for instance) to get all packages added or updated since the last listing operation, and create one-shot loading tasks to add or update their content in the archive.
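The two steps above could be driven by a single function along the following lines. This is only a sketch: run_listing and all of its callable parameters (current_seq, list_all, list_since, schedule) are hypothetical names standing in for the real listers and the task scheduler, which are not shown here.

```python
from typing import Callable, Iterable, Optional


def run_listing(
    last_seq: Optional[int],
    current_seq: Callable[[], int],
    list_all: Callable[[], Iterable[str]],
    list_since: Callable[[int], Iterable[str]],
    schedule: Callable[[str], None],
) -> int:
    """Run one listing pass and return the value to store for the next pass.

    All callables are hypothetical placeholders:
    - current_seq: returns the registry's current update sequence
    - list_all: full lister, yields every package name
    - list_since: incremental lister, yields names changed after a sequence
    - schedule: creates a one-shot loading task for a package name
    """
    # Record the registry position *before* listing, so that packages
    # updated while the listing runs are picked up again by the next pass.
    checkpoint = current_seq()

    if last_seq is None:
        # No previous listing: fall back to the full lister.
        packages = list_all()
    else:
        packages = list_since(last_seq)

    for name in packages:
        schedule(name)

    return checkpoint
```

Passing None as last_seq triggers the full listing; every later call passes the value returned by the previous one.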
The key to making this work is to save, before each listing operation, the update_seq value returned by the root registry endpoint:
{
  "db_name": "registry",
  "doc_count": 843873,
  "doc_del_count": 82100,
  "update_seq": 6920520,
  "purge_seq": 0,
  "compact_running": false,
  "disk_size": 8767004930,
  "other": {
    "data_size": 21195329396
  },
  "data_size": 7294970847,
  "sizes": {
    "file": 8767004930,
    "active": 7294970847,
    "external": 21195329396
  },
  "instance_start_time": "1543443801840837",
  "disk_format_version": 6,
  "committed_update_seq": 6920520,
  "compacted_seq": 6911034,
  "uuid": "370e266567ec9d1242acc2612839d6a7"
}
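As a rough illustration, reading and persisting that value could look like the sketch below. It assumes the root endpoint lives on the same host as the changes feed (replicate.npmjs.com), that update_seq is an integer as in the response above, and that a small JSON file is an acceptable place to keep the state. The function names and the state file layout are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional
from urllib.request import urlopen

# Assumption: the root endpoint shares its host with the changes feed.
REGISTRY_URL = "https://replicate.npmjs.com/"


def parse_update_seq(info: dict) -> int:
    """Extract update_seq from the decoded root endpoint response."""
    # Assumes an integer sequence, as in the response shown above.
    return int(info["update_seq"])


def fetch_update_seq(url: str = REGISTRY_URL) -> int:
    """Query the root endpoint and return its current update_seq."""
    with urlopen(url, timeout=30) as response:
        return parse_update_seq(json.load(response))


def save_update_seq(path: Path, seq: int) -> None:
    """Persist the sequence number (hypothetical JSON state file)."""
    path.write_text(json.dumps({"update_seq": seq}))


def load_update_seq(path: Path) -> Optional[int]:
    """Return the stored sequence number, or None if nothing was saved yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text())["update_seq"]
```

Keeping the parsing separate from the HTTP call makes it possible to check the logic against a recorded response without touching the network.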
This value corresponds to the id of the last changeset in the CouchDB database. It can be passed as the since parameter
of the changes feed endpoint to list the changesets that follow it:
antoine@guggenheim:~$ curl https://replicate.npmjs.com/_changes?since=6920520 | jq '.'
{
  "results": [
    {
      "seq": 6920521,
      "id": "@comparaonline/ui-offer-card",
      "changes": [
        {
          "rev": "71-d021f7d41e2af4b4337f3cc9b8a8e166"
        }
      ]
    },
    {
      "seq": 6920522,
      "id": "@comparaonline/ui-offer-card-list",
      "changes": [
        {
          "rev": "16-0fe503d1b4660b74ac465d6ecf8df5b4"
        }
      ]
    },
    {
      "seq": 6920523,
      "id": "@comparaonline/ui-offer-comparison-drawer",
      "changes": [
        {
          "rev": "39-6e5f99ddf9e153d6606f8b8e8f9e78f6"
        }
      ]
    }
  ],
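For illustration, an incremental lister could turn such a response into a list of package names roughly as follows. This is a minimal sketch that relies only on the results, seq and id fields visible in the output above. The function names are hypothetical, and sequence numbers are assumed to be integers as in this example.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode

# Endpoint taken from the curl command above.
CHANGES_URL = "https://replicate.npmjs.com/_changes"


def changes_url(since: int, base: str = CHANGES_URL) -> str:
    """Build the changes feed URL for everything after the given sequence."""
    return f"{base}?{urlencode({'since': since})}"


def updated_packages(feed: dict) -> Tuple[List[str], Optional[int]]:
    """Return the changed package names and the highest sequence seen.

    Names are deduplicated and kept in order of first appearance. The
    highest sequence is None when the feed contains no results.
    """
    names: List[str] = []
    seen = set()
    highest: Optional[int] = None
    for change in feed.get("results", []):
        seq = change["seq"]
        if highest is None or seq > highest:
            highest = seq
        name = change["id"]
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names, highest
```

Each returned name would then be handed to whatever creates the one-shot loading task for that package.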
[1] http://docs.couchdb.org/en/2.2.0/api/database/changes.html