Page MenuHomeSoftware Heritage

Update docs with release artifact and test scenarios
ClosedPublic

Authored by moranegg on Jun 19 2019, 4:35 PM.

Details

Summary

Update loading specification with current state and release artifact

  • review text and update to current state
  • place technical details in a new spec document
  • edit tables
  • add artifact specs from api point of view
  • add release artifact (prepare for implementation)

Closes T1717

Diff Detail

Repository
rDDEP Push deposit
Branch
docs
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 6359
Build 8819: tox-on-jenkinsJenkins
Build 8818: arc lint + arc unit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
docs/tests/tests_HAL.rst
65

There should be some way on hal's side to deal with us down.
Even though, that should not happen, it still could due to 'reasons' (migration comes to mind...).

moranegg marked 10 inline comments as done.
  • Update specs and test scenarios in docs with D1617 comments
moranegg marked an inline comment as not done.Jun 20 2019, 3:50 PM
moranegg added subscribers: rdicosmo, grouss.
moranegg added inline comments.
docs/specs/spec-loading.rst
40

occurrence isn't a branch in a snapshot?
and occurrence history is obsolete then?

44

this is the way it was, don't mind deleting the _

111

This is the api output, I imagine you know.
Which is the json representation of the artifact.
I can do a table with columns, but I find this output clear.

or maybe when adding the artifact name it is clearer:

snapshot:  {
      "branches": {
          "master": {
              "target": "396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52",
              "target_type": "revision",
              "target_url": "/api/1/revision/396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52/"
          }
      },
      "id": "a3773941561cc557853898773a19c07cfe2efc5a",
      "next_branch": null
  }

I've use this schema on all api artifacts, can you say if this works?

115

The archive is deposited with a set of descriptive metadata, in the CodeMeta vocabulary.
The following CodeMeta terms implies that the artifact is a release:

136

how about, specifying above from where we get the metadata and here change to CodeMeta term

163

good thinking! and you are right... it's not very readable.
Going for:

the deposit revision is not a commit, therefore the revision created is a `synthetic` revision.

187

The distinction between the comitter_date and the date for a deposit should be specified and documented better.
After high level decisions on the legacy software archival.
@rdicosmo

Here the date is committer_date populated from deposit with CodeMeta term publicationDate

189

Here the date is the date (aka. author_date) populated from a deposit with CodeMeta term creationDate.

This is the date used by @grouss.

docs/specs/spec-technical.rst
8

ack.
Where are the archives stored?

9

Yes sure, it is still specs :-D
I will change that to

A client to test the communication with SWORD protocol

because we still need and have a client (swh client) to test the communication.
maybe this should be linked to the SWORD client part.

19

sure.
Can you verify that the content is still accurate?

24

:-)

docs/tests/tests_HAL.rst
63

the bug was that v2 had the swh-id identifier that was generated for v1. I will reword.

65

yes you are right. let's hope it will happen someday:-)

  • Update specs with date mapping
docs/specs/spec-loading.rst
40

Yes, obsolete.
They were replaced by snapshot.

origin -> origin-visit -> snapshot -> {revision, release}

  • Update specs: ammend occurrences into branches
docs/specs/spec-loading.rst
111

This is the api output, I imagine you know.

Yes, i just realized that's the main api indeed.

I've use this schema on all api artifacts, can you say if this works?

The description works.

136

works for me ;)

Note that all tables are a bit misaligned.
To align it properly, i'm letting emacs' org-mode feature do it
(using <TAB> key within the table and the alignment automagically sets itself correctly)

151

Maybe take a real id (that's a hash as well).
Because otherwise, it's not consistent with other identifiers mentions elsewhere (just below for example).

164

I tend to try and shorten sentences where i can.
I'd go for a simpler:

The deposit revision is synthetically created.

And then letting the readers understand the synthetic: True on their own.

175

That's a judgment call, i'm not sure we want to add that here.

Also i'd remove this and amend the chapter when legacy software will actually happen.

docs/specs/spec-technical.rst
8

on disk (but not an objstorage) with references in the db.

19

Can you verify that the content is still accurate?

I don't understand.
What must i check?

43

Use - as the previous chapter.

docs/tests/tests_HAL.rst
58

why do you use + on that line here?

zack requested changes to this revision.Jul 8 2019, 2:51 PM
zack added inline comments.
docs/specs/spec-loading.rst
4–6

"this part" ... of what?

Alternative beginning suggestion: "This specification describes the …"

Also, on first use I like to have Software Heritage in full, e.g., "Software Heritage (SWH) archive", so that the acronym is defined for later.

20–261

"Artifacts creation" should be enough here, SWH is kinda implicit anyway.

32

I don't get what this means. "origin_visit" is an internal table name. You probably mean a specific field in that table? Either way, I suggest to use a descriptive name, with the SQL path as additional detail, e.g., "timestamp of the visit (origin_visit.field_name)".

36

minor grammar issue: "at visit" ← not sure what this means

38–39

Are the branches actually optional? Don't we always have at least one release here?

Relatedly: this use case seems similar with package manager listers/loaders, we should compare with what they do and make sure we are consistent. In particular, I'm not sure we have a "master" branch there, most likely we have a "HEAD" branch, pointing to the most recent version at visit time + one branch for each release (the current one + all previous ones).

43–46

I see what you mean here, and I agree with the arrangement. But the description of these two can probably be simpler, how about:

  • revision: synthetic revision pointing to the expanded submitted tarball
  • directory: root directory of the expanded submitted tarball
51–53

active voice is generally preferable to passive voice (this is a general remark that applies to most of the paragraphs in the spec)

151

Seconded.

164

Just remove this sentence. We have already said it's going to be a synthetic revision above, no need to further expand on the matter, IMO.

175

Agreed. Just remove the above paragraph; the subsequent (factual) paragraph is all we need in a spec.

181

I don't understand what "dates in a deposit" mean. Are these fields that are available via SWORD? if so, we should add a sentence specifying that this column should be interpreted in that context.

183

typo: "recieved"

192

typo "targetting"

340–341

don't call this "hash", it's a "Software Heritage Peristent Identifier" (or SWH PID for short), and you can link that text to the PID doc instead of adding a cf. parenthesis

docs/specs/spec-technical.rst
83

better "status changes"

More generally, this should really be a state diagram, rather than a state transition function serialized as a set of lines. No need to add a separate figure for it, but at least a ASCII art state diagram would be nice here.

docs/tests/tests_HAL.rst
1–5

Why does this belong to our repo? swh-deposit is not HAL-specific, if anything it's SWORD specific. So if there're HAL-specific specs, maybe they belong to some HAL-specific website?

This revision now requires changes to proceed.Jul 8 2019, 2:51 PM
moranegg added inline comments.
docs/specs/spec-loading.rst
4–6

I agree. This page is a part of a collection of specifications, this is why it starts that way, but I prefer starting fresh as you suggest also with the full definition of the acronym.

20–261

sure.

32

I kept it from the original specs when there was no snapshots.
Maybe the best solution it to erase the line above.
and change in snaphot the deposit representation to:

snapshotreception of all occurrences (branches)

How does it sound?

36

I agree. see comment above.

38–39

We do not at the moment. Only master.

With this specs we are introducing the concept of release to a deposit.
Only deposits with certain metadata will have another branch for the release.

@ardumont can you comment about package manager?

43–46

ack.

51–53

Can you show me an example, I tried to avoid using We like "We create an origin from the url used in the metadata".
But I can change everything to active with we.

151

ack.

164

sure.

175

Ok. You should know it is really complicating things and I'm pulling my hair on the Legacy Software.
Maybe that's why I've written it ;-)

181

dates in a deposit can be found via SWORD in the header or in the xml metadata.
for example the first two dates are from the SWORD protocol and result in fields in our DB.
The two other dates are sent in the metadata, but are not mandatory.

183

ack.

192

ack.

340–341

good call.

docs/specs/spec-technical.rst
43

ack

83

we already have one but in latex :-/
I'll see what I can do.

docs/tests/tests_HAL.rst
1–5

Today this information is only in my head.
There are no testers on HAL and if it were for them, the users are the testers.
In my opinion, it should be kept somewhere on our side, if we don't want this process to disappear..
Also, at some point we want to automate this tests on with integration tests, so the automated tests should be detailed before implementation.
SWH as an archival service recommends HAL as a way to inject code into the archive, and we need to be sure it works.
If you have a better SWH location for it, I don't mind changing it.

58

oh no. thanks!

moranegg marked 23 inline comments as done.
  • Update specs: status image and other modifications
moranegg added inline comments.
docs/specs/spec-loading.rst
136

it's very strange, I'm using Atom and not Emacs and I've openned the file in Emacs and it is misaligned.
If I correct it and go back to Atom it is broken again in Atom..

docs/specs/spec-loading.rst
38–39

Are the branches actually optional?

Good catch.
No, the snapshot branch is not optional.
There is at least one targetting the revision (the one targetting the tarball's uncompressed root directory).

Don't we always have at least one release here?

In the current state, we don't have release yet, only 1 revision.
That's part of the specification amending morane is trying to push here.
My understanding is that new release is optional, depending on the metadata received from the clients (hal, etc...).

@ardumont can you comment about package manager?

We plan to refactor the deposit loader according to the package manager loader indeed.
For the branch, we chose "master" at the time. But indeed, most of the other use HEAD (which might have been decided after the deposit, don't remember the details exactly).

We can change it in the spec and in the implementation.
But i'm (always) wondering about migration when we discuss such things...
Do we want to update the old swh-ids? (that might be painful communication wise)

210

committer

moranegg marked 4 inline comments as done.
  • Fix typo
zack added inline comments.
docs/specs/spec-loading.rst
51–53

I'm letting this one go for now. The style change will require quite some effort, and it's more important to land the spec than block on this.

This revision is now accepted and ready to land.Jul 11 2019, 2:56 PM
This revision was automatically updated to reflect the committed changes.