Making explicit a direct answer to one of @lewo's questions (hinted at by both @olasd and @rdicosmo): no, we do not want a new type of SWHID (swh:1:tar:...) for source code containers, which from our point of view are ephemeral.
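To illustrate what this means for anyone parsing core SWHIDs, here is a minimal sketch that accepts only the five existing object types and rejects a hypothetical `tar` type. The function name and regular expression are assumptions made for this example, not the swh.model API.

```python
import re

# Object types of core SWHIDs (scheme version 1); there is no container
# type such as "tar".
SWHID_TYPES = ("snp", "rel", "rev", "dir", "cnt")

# Hypothetical pattern for this sketch: swh:1:<type>:<40 lowercase hex digits>
SWHID_RE = re.compile(
    r"swh:1:(?P<type>%s):(?P<hash>[0-9a-f]{40})" % "|".join(SWHID_TYPES)
)


def parse_core_swhid(swhid: str) -> tuple:
    """Return (object_type, hex_hash), or raise ValueError if invalid."""
    match = SWHID_RE.fullmatch(swhid)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {swhid!r}")
    return match.group("type"), match.group("hash")
```

With this sketch, `parse_core_swhid("swh:1:tar:" + "0" * 40)` raises `ValueError`, while the same hash with `dir` is accepted.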
Jun 14 2020
Yeah, having played with it quite a bit recently, I can say the current state of timestamp offsets isn't great. I'm fine with the idea of switching them to bytestrings as proposed.
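One way to see why a bytestring offset is attractive: an integer number of minutes cannot distinguish `-0000` from `+0000`, nor keep malformed values, whereas the raw bytes can be stored as-is. A minimal sketch follows; `offset_minutes` is a hypothetical helper written for this example, not part of swh.model.

```python
import re
from typing import Optional

# Git-style offset: a sign followed by two hour digits and two minute digits.
_OFFSET_RE = re.compile(rb"([+-])(\d{2})(\d{2})")


def offset_minutes(offset: bytes) -> Optional[int]:
    """Best-effort conversion of a raw offset (e.g. b"+0200") to minutes.

    Returns None when the bytes are not in the sign + HHMM form, so the
    caller can still keep the original bytes untouched.
    """
    match = _OFFSET_RE.fullmatch(offset)
    if match is None:
        return None
    sign = -1 if match.group(1) == b"-" else 1
    return sign * (int(match.group(2)) * 60 + int(match.group(3)))
```

Both `b"-0000"` and `b"+0000"` convert to `0` here, so the distinction is lost as soon as only the integer is kept.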
Jun 13 2020
Jun 12 2020
(ping @zack who has done some actual analysis on the timezone-related data in the archive)
Jun 11 2020
Jun 10 2020
Jun 9 2020
Jun 8 2020
Jun 6 2020
An important issue indeed :-)
Jun 5 2020
Thanks for submitting this!
Jun 3 2020
Jun 1 2020
May 29 2020
May 28 2020
May 27 2020
Re-reading (and realizing again) the argument on metadata being phased out. So,
we can drop those unused [1] metadata fields from origin-visit/origin-visit-status then.
I think we should do away with the metadata attributes completely.
I'm not sure why we added them to OriginVisitStatus in the end, instead of
removing them, after we checked that they were always empty in production
data.
That's been running since 15:30 or so.
In T2421#44878, @douardda wrote: @olasd any reason not to do step 1? Whether or not we "fix" the remaining hashability issue by overwriting the __hash__ method, I see no harm and it makes the model (IMHO) cleaner.
In T2421#44857, @douardda wrote: Conceptually, both Snapshot.branches and Directory.entries are mappings, not arrays of objects: snapshots and directories must not contain entries with repeated names (which is something that's currently enforced for snapshots, but not for directories).
What about order preservation?
@olasd any reason not to do step 1? Whether or not we "fix" the remaining hashability issue by overwriting the __hash__ method, I see no harm and it makes the model (IMHO) cleaner.
Conceptually, both Snapshot.branches and Directory.entries are mappings, not arrays of objects: snapshots and directories must not contain entries with repeated names (which is something that's currently enforced for snapshots, but not for directories).
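A minimal sketch of the uniqueness constraint implied by treating entries as a mapping keyed by name. Entries are plain `(name, target)` tuples here and `entries_to_mapping` is a hypothetical helper, not the swh.model code. Since Python 3.7 a `dict` preserves insertion order, which is relevant to the order-preservation question.

```python
from typing import Dict, Iterable, Tuple


def entries_to_mapping(entries: Iterable[Tuple[bytes, str]]) -> Dict[bytes, str]:
    """Build a name -> target mapping, refusing repeated names."""
    mapping: Dict[bytes, str] = {}
    for name, target in entries:
        if name in mapping:
            # A list of entries would silently accept this; a mapping cannot.
            raise ValueError(f"duplicate entry name: {name!r}")
        mapping[name] = target
    return mapping
```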
May 26 2020
In the git "specification" (in the git code really), extra headers are a sequence of arbitrary (key: bytes, value: bytes) tuples, that are serialized in the commit object between the common headers and the commit message.
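As a rough sketch of that layout, the function below serializes headers and message in the order described. The name `format_commit_body` and the header values used with it are made up for illustration; the sketch also leaves out the object header and hashing. Multi-line values (as in `gpgsig`) are continued on lines that start with a single space.

```python
from typing import Iterable, List, Tuple

Header = Tuple[bytes, bytes]


def format_commit_body(
    common_headers: Iterable[Header],
    extra_headers: Iterable[Header],
    message: bytes,
) -> bytes:
    """Serialize common headers, then extra headers, then the message."""
    lines: List[bytes] = []
    for key, value in [*common_headers, *extra_headers]:
        # Continuation lines of a multi-line value are prefixed with a space.
        lines.append(key + b" " + value.replace(b"\n", b"\n "))
    # An empty line separates the headers from the commit message.
    return b"\n".join(lines) + b"\n\n" + message
```

Keeping extra headers as an ordered sequence of tuples, rather than a dict, allows repeated keys and preserves their order.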
I think there's a conflation of two issues:
May 25 2020
- migrate remaining data (data that did not get migrated during the first migration, while loaders continued their work)
May 15 2020
May 14 2020
Build is green
Rebase on latest master
Build is green
Rebase on latest master
May 13 2020
Build is green
Adapt according to review
Looks good to me.
May 6 2020
- Deploy storage (and migrate data, this now can occur while loaders are running)
May 1 2020
After further discussion, for testing purposes, we agreed on dumping and
restoring origins and origin-visits from production to staging, and then
checking what the behavior is (2).
Apr 30 2020
In the meantime, I've started to work on 1., as I'm a bit uneasy about leaving
master in a state we cannot really deploy yet. There is some work nearly
ready to be deployed (new metadata API endpoints) and I don't want to prevent
its deployment.
Alright, closing this then.
I hear your concerns, but the discussion already took place, and there are now already 2 published articles out there using visit, plus shared documentation that uses this new terminology.
So, no, we cannot change this now.
Plus, numerous visits of the same origin can point to the same snapshot, hence the confusion here.
I agree that turning visit key into a snapshot one is less confusing.
We considered snapshot vs visit when choosing the qualifier name, and we settled on visit, as it conveys the idea that a repository snapshot is taken when a visit is performed (not necessarily through the same origin).
Status on this.
Apr 29 2020
Apr 28 2020
I've prepared a draft IANA registration template. We have some work in flux on the SWHID spec, so I'll submit the registration request once it's done.
We are going with 1.
Apr 27 2020
Apr 26 2020
Upon (admittedly quick) review, I don't think that anything more than D3068 is needed to address this.
Double-checking/feedback welcome!
Apr 24 2020
I wrote that little script to check the number of origin IRIs and URIs in the archive