
Open /api/1/revision/origin/<ORIG_ID>[/branch/<BRANCH>][/ts/<TIMESTAMP>]/
Closed, Migrated

Description

Instead of having to specify a (root) revision by SHA1_GIT, users might want to
specify a place and a time. In SWH a "place" is an origin, with an optional
branch name; a "time" is a timestamp at which some place has been observed by
SWH crawlers.

Wherever a revision context is expected in a path (i.e., a
"/revision/<SHA1_GIT>/" path fragment), we can put in its stead a path fragment
of the form /origin/<ORIG_ID>[/branch/<BRANCH>][/ts/<TIMESTAMP>]/. Such a
fragment is resolved internally by the SWH archive to a SHA1_GIT as follows:

  • [if <TIMESTAMP> is absent] look for the most recent crawl of origin <ORIG_ID>
  • [if <TIMESTAMP> is given] look for the most recent crawl of origin <ORIG_ID> whose timestamp is <= <TIMESTAMP>
  • [if <BRANCH> is given] look for the branch <BRANCH>
  • [if <BRANCH> is absent] look for branch "master"
  • return the <SHA1_GIT> pointed to by the chosen branch
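
The resolution steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name resolve_revision, the in-memory crawls mapping, and the sample hashes are all hypothetical, and the real lookup would go through the SWH archive storage rather than a dictionary.

```python
from datetime import datetime, timezone


def resolve_revision(crawls, origin_id, branch=None, timestamp=None):
    """Return the SHA1_GIT for an origin/branch/timestamp triple, or None.

    `crawls` is a hypothetical mapping: origin id -> list of
    (crawl datetime, {branch name: SHA1_GIT}) tuples.
    """
    visits = crawls.get(origin_id, [])
    if timestamp is not None:
        # Keep only crawls observed at or before the requested time.
        visits = [visit for visit in visits if visit[0] <= timestamp]
    if not visits:
        return None
    # Pick the most recent remaining crawl.
    _, branches = max(visits, key=lambda visit: visit[0])
    # Default to "master" when no branch is given.
    return branches.get("master" if branch is None else branch)


# Made-up sample data; the hashes are placeholders, not real revisions.
utc = timezone.utc
crawls = {
    1: [
        (datetime(2016, 1, 1, tzinfo=utc), {"master": "a" * 40}),
        (datetime(2016, 5, 1, tzinfo=utc), {"master": "b" * 40, "dev": "c" * 40}),
    ],
}
```

With this sample data, resolve_revision(crawls, 1) picks the May crawl, while passing a March timestamp falls back to the January crawl. Returning None when no crawl or branch matches is a choice made for the sketch; the task does not say how such cases should be reported.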

The URL for revision contexts mentioned above can therefore alternatively be specified by users as /revision/origin/<ORIG_ID>[/branch/<BRANCH>][/ts/<TIMESTAMP>]/.
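
As a rough client-side illustration, a path of this form could be assembled as below. The helper name revision_by_origin_path is hypothetical, the /api/1 prefix from the task title is left out, and percent-encoding every reserved character with urllib.parse.quote(..., safe="") is an assumption about how the escaping would be done.

```python
from urllib.parse import quote


def revision_by_origin_path(origin_id, branch=None, timestamp=None):
    """Build /revision/origin/<ORIG_ID>[/branch/<BRANCH>][/ts/<TIMESTAMP>]/."""
    path = f"/revision/origin/{int(origin_id)}"
    if branch is not None:
        # safe="" also escapes '/', so "feature/x" stays a single path segment.
        path += "/branch/" + quote(branch, safe="")
    if timestamp is not None:
        # Timestamps may contain ':' or '+', which are escaped the same way.
        path += "/ts/" + quote(timestamp, safe="")
    return path + "/"


print(revision_by_origin_path(1))
print(revision_by_origin_path(1, branch="feature/x"))
print(revision_by_origin_path(1, branch="master", timestamp="2016-05-13T17:08:00+02:00"))
```

The optional segments are appended in the order given by the URL scheme, so a caller can omit either one independently.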

Typing:

  • <ORIG_ID>s are given as integer identifiers, pointing into the origin table. There will be separate mechanisms for finding origins by other means (e.g., URLs, metadata, etc.). Once an origin is found, its ID can be used in the above URL schemes.
  • <BRANCH> names are given as per the corresponding VCS (e.g., Git) and might therefore contain characters that are either invalid in URLs or that would make the above URL schemes ambiguous (e.g., '/'). All such characters will need to be URL-escaped (e.g., '/' will become '%2F').
  • <TIMESTAMP>s are given in a format as liberal as possible, to uphold the principle of least surprise. At the very minimum it should be possible to enter timestamps as:
    • ISO 8601 timestamps (see for instance the output of date -I, date -Is)
    • YYYY[MM[DD[HH[MM[SS]]]]] ad-hoc format

      Implementation proposal: use Python dateutil's parser and be done with it: https://dateutil.readthedocs.org/en/latest/parser.html . Note that dateutil does *not* accept classical UNIX timestamps expressed as seconds since the epoch (i.e., date +%s output). We will need to special-case them.

      The same escaping considerations given for <BRANCH> apply.
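
A minimal sketch of the special-casing discussed above, using only the Python standard library instead of dateutil so that it runs without extra dependencies. The function names are hypothetical, and two rules are assumptions made for the sketch, not part of the proposal: a digits-only string is first tried as the YYYY[MM[DD[HH[MM[SS]]]]] ad-hoc format and treated as seconds since the epoch only if that fails, and timestamps without a timezone are taken to be UTC. The first rule is inherently ambiguous for some 10-digit inputs, so a real implementation would need an explicit policy.

```python
from datetime import datetime, timezone


def _parse_adhoc(text):
    """Parse YYYY[MM[DD[HH[MM[SS]]]]]; raises ValueError on invalid fields."""
    if len(text) not in (4, 6, 8, 10, 12, 14):
        raise ValueError("not an ad-hoc timestamp")
    parts = [int(text[:4])] + [int(text[i:i + 2]) for i in range(4, len(text), 2)]
    # Missing fields default to January 1st, 00:00:00.
    defaults = [1, 1, 1, 0, 0, 0]
    fields = parts + defaults[len(parts):]
    return datetime(*fields, tzinfo=timezone.utc)


def parse_timestamp(text):
    """Parse an ad-hoc, epoch, or ISO 8601 timestamp into an aware datetime."""
    text = text.strip()
    if text.isdigit():
        try:
            return _parse_adhoc(text)
        except ValueError:
            # Assumption: anything else made only of digits is epoch seconds.
            return datetime.fromtimestamp(int(text), tz=timezone.utc)
    # Stand-in for dateutil's parser; handles date -I and date -Is style input.
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


for sample in ("2016", "20160513170800", "1463152080", "2016-05-13T17:08:00+02:00"):
    print(sample, "->", parse_timestamp(sample).isoformat())
```

If dateutil is used as proposed, its parser would replace the datetime.fromisoformat call, with the digits-only branch kept in front of it.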

Event Timeline

ardumont raised the priority of this task from to Normal.
ardumont updated the task description.
ardumont added a project: Web app.
ardumont added a subscriber: ardumont.
olasd changed the visibility from "All Users" to "Public (No Login Required)". May 13 2016, 5:08 PM