Page MenuHomeSoftware Heritage

Update docs with release artifact and test scenarios
ClosedPublic

Authored by moranegg on Jun 19 2019, 4:35 PM.

Details

Summary

Update loading specification with current state and release artifact

  • review text and update to current state
  • place technical details in a new spec document
  • edit tables
  • add artifact specs from api point of view
  • add release artifact (prepare for implementation)

Closes T1717

Diff Detail

Repository
rDDEP Push deposit
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ardumont added inline comments.Jun 20 2019, 11:26 AM
docs/tests/tests_HAL.rst
66

There should be some way on hal's side to deal with us down.
Even though, that should not happen, it still could due to 'reasons' (migration comes to mind...).

moranegg updated this revision to Diff 5400.Jun 20 2019, 3:47 PM
moranegg marked 10 inline comments as done.
  • Update specs and test scenarios in docs with D1617 comments
moranegg marked an inline comment as not done.Jun 20 2019, 3:50 PM
moranegg added subscribers: rdicosmo, grouss.
moranegg added inline comments.
docs/specs/spec-loading.rst
188 ↗(On Diff #5400)

The distinction between the comitter_date and the date for a deposit should be specified and documented better.
After high level decisions on the legacy software archival.
@rdicosmo

Here the date is committer_date populated from deposit with CodeMeta term publicationDate

190 ↗(On Diff #5400)

Here the date is the date (aka. author_date) populated from a deposit with CodeMeta term creationDate.

This is the date used by @grouss.

40 ↗(On Diff #5376)

occurrence isn't a branch in a snapshot?
and occurrence history is obsolete then?

44 ↗(On Diff #5376)

this is the way it was, don't mind deleting the _

111 ↗(On Diff #5376)

This is the api output, I imagine you know.
Which is the json representation of the artifact.
I can do a table with columns, but I find this output clear.

or maybe when adding the artifact name it is clearer:

snapshot:  {
      "branches": {
          "master": {
              "target": "396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52",
              "target_type": "revision",
              "target_url": "/api/1/revision/396b1ff29f7c75a0a3cc36f30e24ff7bae70bb52/"
          }
      },
      "id": "a3773941561cc557853898773a19c07cfe2efc5a",
      "next_branch": null
  }

I've use this schema on all api artifacts, can you say if this works?

115 ↗(On Diff #5376)

The archive is deposited with a set of descriptive metadata, in the CodeMeta vocabulary.
The following CodeMeta terms implies that the artifact is a release:

136 ↗(On Diff #5376)

how about, specifying above from where we get the metadata and here change to CodeMeta term

163 ↗(On Diff #5376)

good thinking! and you are right... it's not very readable.
Going for:

the deposit revision is not a commit, therefore the revision created is a `synthetic` revision.

docs/specs/spec-technical.rst
8 ↗(On Diff #5376)

ack.
Where are the archives stored?

9 ↗(On Diff #5376)

Yes sure, it is still specs :-D
I will change that to

A client to test the communication with SWORD protocol

because we still need and have a client (swh client) to test the communication.
maybe this should be linked to the SWORD client part.

19 ↗(On Diff #5376)

sure.
Can you verify that the content is still accurate?

24 ↗(On Diff #5376)

:-)

docs/tests/tests_HAL.rst
64

the bug was that v2 had the swh-id identifier that was generated for v1. I will reword.

66

yes you are right. let's hope it will happen someday:-)

moranegg updated this revision to Diff 5426.Jun 21 2019, 4:32 PM
  • Update specs with date mapping
ardumont added inline comments.Jun 21 2019, 5:34 PM
docs/specs/spec-loading.rst
40 ↗(On Diff #5376)

Yes, obsolete.
They were replaced by snapshot.

origin -> origin-visit -> snapshot -> {revision, release}

moranegg updated this revision to Diff 5515.Jun 26 2019, 3:45 PM
  • Update specs: ammend occurrences into branches
ardumont added inline comments.Jul 5 2019, 12:39 PM
docs/specs/spec-loading.rst
152 ↗(On Diff #5515)

Maybe take a real id (that's a hash as well).
Because otherwise, it's not consistent with other identifiers mentions elsewhere (just below for example).

165 ↗(On Diff #5515)

I tend to try and shorten sentences where i can.
I'd go for a simpler:

The deposit revision is synthetically created.

And then letting the readers understand the synthetic: True on their own.

176 ↗(On Diff #5515)

That's a judgment call, i'm not sure we want to add that here.

Also i'd remove this and amend the chapter when legacy software will actually happen.

111 ↗(On Diff #5376)

This is the api output, I imagine you know.

Yes, i just realized that's the main api indeed.

I've use this schema on all api artifacts, can you say if this works?

The description works.

136 ↗(On Diff #5376)

works for me ;)

Note that all tables are a bit misaligned.
To align it properly, i'm letting emacs' org-mode feature do it
(using <TAB> key within the table and the alignment automagically sets itself correctly)

docs/specs/spec-technical.rst
43 ↗(On Diff #5515)

Use - as the previous chapter.

8 ↗(On Diff #5376)

on disk (but not an objstorage) with references in the db.

19 ↗(On Diff #5376)

Can you verify that the content is still accurate?

I don't understand.
What must i check?

docs/tests/tests_HAL.rst
59

why do you use + on that line here?

zack requested changes to this revision.Jul 8 2019, 2:51 PM
zack added inline comments.
docs/specs/spec-loading.rst
4 ↗(On Diff #5515)

"this part" ... of what?

Alternative beginning suggestion: "This specification describes the …"

Also, on first use I like to have Software Heritage in full, e.g., "Software Heritage (SWH) archive", so that the acronym is defined for later.

20–21 ↗(On Diff #5515)

"Artifacts creation" should be enough here, SWH is kinda implicit anyway.

33 ↗(On Diff #5515)

I don't get what this means. "origin_visit" is an internal table name. You probably mean a specific field in that table? Either way, I suggest to use a descriptive name, with the SQL path as additional detail, e.g., "timestamp of the visit (origin_visit.field_name)".

37 ↗(On Diff #5515)

minor grammar issue: "at visit" ← not sure what this means

39–40 ↗(On Diff #5515)

Are the branches actually optional? Don't we always have at least one release here?

Relatedly: this use case seems similar with package manager listers/loaders, we should compare with what they do and make sure we are consistent. In particular, I'm not sure we have a "master" branch there, most likely we have a "HEAD" branch, pointing to the most recent version at visit time + one branch for each release (the current one + all previous ones).

44–47 ↗(On Diff #5515)

I see what you mean here, and I agree with the arrangement. But the description of these two can probably be simpler, how about:

  • revision: synthetic revision pointing to the expanded submitted tarball
  • directory: root directory of the expanded submitted tarball
52–54 ↗(On Diff #5515)

active voice is generally preferable to passive voice (this is a general remark that applies to most of the paragraphs in the spec)

152 ↗(On Diff #5515)

Seconded.

165 ↗(On Diff #5515)

Just remove this sentence. We have already said it's going to be a synthetic revision above, no need to further expand on the matter, IMO.

176 ↗(On Diff #5515)

Agreed. Just remove the above paragraph; the subsequent (factual) paragraph is all we need in a spec.

182 ↗(On Diff #5515)

I don't understand what "dates in a deposit" mean. Are these fields that are available via SWORD? if so, we should add a sentence specifying that this column should be interpreted in that context.

184 ↗(On Diff #5515)

typo: "recieved"

193 ↗(On Diff #5515)

typo "targetting"

391–392 ↗(On Diff #5515)

don't call this "hash", it's a "Software Heritage Peristent Identifier" (or SWH PID for short), and you can link that text to the PID doc instead of adding a cf. parenthesis

docs/specs/spec-technical.rst
83 ↗(On Diff #5515)

better "status changes"

More generally, this should really be a state diagram, rather than a state transition function serialized as a set of lines. No need to add a separate figure for it, but at least a ASCII art state diagram would be nice here.

docs/tests/tests_HAL.rst
2–6

Why does this belong to our repo? swh-deposit is not HAL-specific, if anything it's SWORD specific. So if there're HAL-specific specs, maybe they belong to some HAL-specific website?

This revision now requires changes to proceed.Jul 8 2019, 2:51 PM
moranegg planned changes to this revision.Jul 8 2019, 4:14 PM
moranegg added inline comments.
docs/specs/spec-loading.rst
4 ↗(On Diff #5515)

I agree. This page is a part of a collection of specifications, this is why it starts that way, but I prefer starting fresh as you suggest also with the full definition of the acronym.

20–21 ↗(On Diff #5515)

sure.

33 ↗(On Diff #5515)

I kept it from the original specs when there was no snapshots.
Maybe the best solution it to erase the line above.
and change in snaphot the deposit representation to:

snapshotreception of all occurrences (branches)

How does it sound?

37 ↗(On Diff #5515)

I agree. see comment above.

39–40 ↗(On Diff #5515)

We do not at the moment. Only master.

With this specs we are introducing the concept of release to a deposit.
Only deposits with certain metadata will have another branch for the release.

@ardumont can you comment about package manager?

44–47 ↗(On Diff #5515)

ack.

52–54 ↗(On Diff #5515)

Can you show me an example, I tried to avoid using We like "We create an origin from the url used in the metadata".
But I can change everything to active with we.

152 ↗(On Diff #5515)

ack.

165 ↗(On Diff #5515)

sure.

176 ↗(On Diff #5515)

Ok. You should know it is really complicating things and I'm pulling my hair on the Legacy Software.
Maybe that's why I've written it ;-)

182 ↗(On Diff #5515)

dates in a deposit can be found via SWORD in the header or in the xml metadata.
for example the first two dates are from the SWORD protocol and result in fields in our DB.
The two other dates are sent in the metadata, but are not mandatory.

184 ↗(On Diff #5515)

ack.

193 ↗(On Diff #5515)

ack.

391–392 ↗(On Diff #5515)

good call.

docs/specs/spec-technical.rst
43 ↗(On Diff #5515)

ack

83 ↗(On Diff #5515)

we already have one but in latex :-/
I'll see what I can do.

docs/tests/tests_HAL.rst
2–6

Today this information is only in my head.
There are no testers on HAL and if it were for them, the users are the testers.
In my opinion, it should be kept somewhere on our side, if we don't want this process to disappear..
Also, at some point we want to automate this tests on with integration tests, so the automated tests should be detailed before implementation.
SWH as an archival service recommends HAL as a way to inject code into the archive, and we need to be sure it works.
If you have a better SWH location for it, I don't mind changing it.

59

oh no. thanks!

moranegg updated this revision to Diff 5713.Jul 9 2019, 4:16 PM
moranegg marked 23 inline comments as done.
  • Update specs: status image and other modifications
moranegg marked 2 inline comments as done.Jul 9 2019, 4:17 PM
moranegg added inline comments.
docs/specs/spec-loading.rst
136 ↗(On Diff #5376)

it's very strange, I'm using Atom and not Emacs and I've openned the file in Emacs and it is misaligned.
If I correct it and go back to Atom it is broken again in Atom..

ardumont added inline comments.Jul 10 2019, 11:49 AM
docs/specs/spec-loading.rst
212 ↗(On Diff #5713)

committer

39–40 ↗(On Diff #5515)

Are the branches actually optional?

Good catch.
No, the snapshot branch is not optional.
There is at least one targetting the revision (the one targetting the tarball's uncompressed root directory).

Don't we always have at least one release here?

In the current state, we don't have release yet, only 1 revision.
That's part of the specification amending morane is trying to push here.
My understanding is that new release is optional, depending on the metadata received from the clients (hal, etc...).

@ardumont can you comment about package manager?

We plan to refactor the deposit loader according to the package manager loader indeed.
For the branch, we chose "master" at the time. But indeed, most of the other use HEAD (which might have been decided after the deposit, don't remember the details exactly).

We can change it in the spec and in the implementation.
But i'm (always) wondering about migration when we discuss such things...
Do we want to update the old swh-ids? (that might be painful communication wise)

moranegg updated this revision to Diff 5749.Jul 10 2019, 2:21 PM
moranegg marked 4 inline comments as done.
  • Fix typo
zack accepted this revision.Jul 11 2019, 2:56 PM
zack added inline comments.
docs/specs/spec-loading.rst
52–54 ↗(On Diff #5515)

I'm letting this one go for now. The style change will require quite some effort, and it's more important to land the spec than block on this.

This revision is now accepted and ready to land.Jul 11 2019, 2:56 PM
This revision was automatically updated to reflect the committed changes.