Thanks a lot for this!
Jul 1 2021
(3) should ideally be implemented in a way that guarantees that extids that were resolvable in previous versions of the mapping will always be resolvable in future versions
I don't understand. Option 3 is to remove relations between extids and SWHIDs, so they won't be resolvable anymore.
Jun 30 2021
I have the feeling that option (1) will lead, in the long run, to an explosion in the size of the mapping, which will eventually make us converge (slowly) toward option (3).
Jun 21 2021
Jun 19 2021
Jun 18 2021
In D5899#150882, @DanSeraf wrote:
LGTM in general, but needs unit tests (in addition to the two nitpicks above about docstrings)
Jun 17 2021
Jun 16 2021
I think this also needs bumping the versioned dependency on swh-model (and a release of that).
Jun 15 2021
Thanks @ardumont for following up to this task.
landed in d58bcb59a0999ae124de23db88fc9f73603d452a
Jun 11 2021
Note: when this is (reasonably) done, we should document the addition of SourceForge to the archive coverage page at archive.s.o and also to the archive changelog.
Jun 10 2021
Superseded by D5825. Abandoning this one.
Jun 9 2021
In D5825#148571, @vlorentz wrote: wouldn't it make sense to have a separate command (eg. recursive-identify instead of identify --recursive)?
Jun 8 2021
In D5825#148548, @DanSeraf wrote: But what was the process before? Did it ignore directory entries?
It checks only the given directories, generating a from_disk.Directory object for each one. Should it use the same logic as the recursive option?
In D5825#148516, @DanSeraf wrote:
- If relevant, could you implement --verify too?
Sure, if it is useful I could open another diff for it
I'm adding here a note about something to consider in terms of pros/cons: accessibility. As we are, for the most part, archiving textual information, we really want it to be accessible to all users. Right now we go further than that, ensuring that the Web UI can be browsed with a text-based browser: so, for instance, w3m https://archive.softwareheritage.org/swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2 just works out of the box. I'm not up to date on the accessibility impact of current JS frameworks, nor am I sure that we should require the archive to be browsable without JavaScript enabled (by today's standards "browsable with free javascript" is probably good enough for us, and we have a curl-able API anyway), but accessibility per se is definitely going to be a requirement.
Jun 7 2021
how about just collecting all raw timings in an output CSV file (or several files if needed) and computing the stats downstream (e.g., with pandas)?
that would allow changing the percentiles later on, as well as computing different stats, without having to rerun the benchmarks
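A minimal sketch of that split, assuming a two-column CSV layout (`benchmark`, `seconds`) that is purely illustrative; the function names are hypothetical, and the standard library's `statistics` module stands in for pandas here so the example has no dependencies:

```python
import csv
import io
import statistics


def write_raw_timings(rows, out):
    """Benchmark side: write one (benchmark, seconds) row per measurement."""
    writer = csv.writer(out)
    writer.writerow(["benchmark", "seconds"])  # assumed column names
    writer.writerows(rows)


def percentiles_from_csv(src, wanted=(50, 95, 99)):
    """Analysis side: compute the requested percentiles per benchmark."""
    samples = {}
    for row in csv.DictReader(src):
        samples.setdefault(row["benchmark"], []).append(float(row["seconds"]))

    result = {}
    for name, values in samples.items():
        # quantiles(n=100) returns the 99 cut points P1..P99;
        # it needs at least two samples per benchmark.
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        result[name] = {p: cuts[p - 1] for p in wanted}
    return result


# Raw timings are written once; percentiles can be chosen later.
buf = io.StringIO()
write_raw_timings([("lookup", float(i)) for i in range(101)], buf)
buf.seek(0)
stats = percentiles_from_csv(buf, wanted=(50, 95))
```

Since the raw rows are kept, asking for a different set of percentiles only means calling the analysis step again with another `wanted` tuple.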
Jun 4 2021
Minor request, which can also be implemented in a subsequent commit: can we have the mapping documented somewhere? As a first approximation even a docstring would do, so that it will show up at docs.s.o. (Not sure if the files being modified here are the most relevant place for it though; it can also go at the root of the mercurial loader Python hierarchy, up to you!)
Jun 3 2021
That explains it, and it's good enough for me, thanks :)
My remaining question then is: how about, instead of branch-{tip,heads,closed-heads}/name we use branches/{heads,closed,tip}/name ?
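To make the two candidate layouts concrete, here is a small sketch that rewrites names from the `branch-{tip,heads,closed-heads}/name` scheme into the proposed `branches/{heads,closed,tip}/name` one. The helper name and the sample branch names are hypothetical, and plain strings are used for readability; this only illustrates the naming question, not how the loader builds snapshots.

```python
# Prefix translation between the two schemes discussed above (illustrative only).
_PREFIX_MAP = {
    "branch-tip": "branches/tip",
    "branch-heads": "branches/heads",
    "branch-closed-heads": "branches/closed",
}


def to_proposed_name(name: str) -> str:
    """Rewrite a 'branch-*/name' entry to 'branches/*/name'; leave others as-is."""
    prefix, sep, rest = name.partition("/")
    if sep and prefix in _PREFIX_MAP:
        return f"{_PREFIX_MAP[prefix]}/{rest}"
    return name


for old in ("branch-tip/default", "branch-heads/default", "branch-closed-heads/stable"):
    print(old, "->", to_proposed_name(old))
```

With the second scheme, everything branch-related sits under a single `branches/` root, which is the grouping the question is about.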
In T3352#65753, @marmoute wrote: They are all branch heads (git "branch" are about heads too, bookmarks too), so a heads/ prefix does not bring much.
My point here is for users looking at the structure to easily distinguish between the different mapping formats. Something based on the "visit data" and associated documentation seems quite fragile.
Jun 2 2021
(I'm marking this as on hold until we have reached a decision on T3352, just to prevent this from being deployed by mistake. But feel free to go ahead with the rest of the review or even override, if you think there is a better safeguard against accidental deployment.)
May 31 2021
Understood. To explain my thinking here, the refs/... structure is something we picked to represent git branch names as faithfully as possible, adding as little as possible on top of it. In trying to represent branch names from another VCS, as a first approximation I'd rather reuse the same *approach* than a *result* that is similar, if that makes sense. So, to pivot the question around, what is the minimal (also in the sense that it is shorter / has less cruft) naming scheme that would allow us to represent without ambiguity all the Mercurial naming aspects that you want to capture?
Is the ability to recognize that a snapshot comes from Mercurial an actual goal here? I don't think we care about "clashes" between snapshots created from different VCSs, but maybe I'm missing something.
May 28 2021
May 24 2021
Thanks for raising this. I wanted to do so too, but couldn't find the time :)
May 21 2021
@anlambert @vsellier: question about this, in order to document the status quo.
Currently, where are the django web app logs stored and for how long are they kept?
May 19 2021
In T3202#65224, @anlambert wrote: What I can do is enabling the guided tour by configuration. This way we can deactivate it in production until we got something stable and usable while we can test the feature on staging.
May 18 2021
May 10 2021
In T3317#64886, @vlorentz wrote: @zack WebAPIClient.known takes a list of strings, not a string
May 8 2021
Closing this as it was a vague meta-task from the 2020 roadmap (but we'll keep the actual sub-tasks, which were more clearly identified and are still relevant).
May 7 2021
May 6 2021
In T3311#64737, @vlorentz wrote: I think the only issue with (3) is not being retroactive
This is a good idea, thanks for raising it.
Why 6 hours and not, say, 1 week or even 1 month?
It is very common these days to remain connected for that long, and the UX of having to log in again often is a lot worse.
May 3 2021
In D5420#143690, @vlorentz wrote: @KShivendu zack's comment explains how the code should work, and gives a pointer to an existing implementation of the technique (swh-scanner); this should be enough to start.
But don't feel obligated to continue working on this; as mentioned the task is harder than we expected.
Apr 30 2021
nice hack/trade-off !
great wording, thanks !
Apr 29 2021
the fact that algo_min is treated differently than other cases is horrible :-P, but it's not new in this diff, so ok :)
also, in the requirements you should probably state the minimum version of swh.core that you need, but that too is not a big deal
Ah, this is an interesting practical problem.
I'm not a fan of changing the spec of SWHID version 1 to make them case insensitive, as it seems to be a significant change (in particular for the code that checks for the syntactic correctness of IDs).
But we can totally add a "SHOULD" section to the resolvers part of the spec recommending (but not mandating) that resolvers treat core SWHIDs as case insensitive. (Of course all the contextual parts cannot be considered case insensitive.)
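A sketch of what such a lenient resolver could do, assuming it simply lowercases the core part before lookup and passes everything after the first `;` through untouched. The function name is hypothetical and the regular expression is a simplified check of the core SWHID v1 syntax, not the reference validator:

```python
import re

# Core SWHID, version 1: "swh:1:<type>:<40 lowercase hex digits>".
_CORE_RE = re.compile(r"^swh:1:(snp|rel|rev|dir|cnt):[0-9a-f]{40}$")


def normalize_for_resolution(swhid: str) -> str:
    """Lowercase the core SWHID only; qualifiers stay case sensitive."""
    core, sep, qualifiers = swhid.partition(";")
    core = core.lower()
    if not _CORE_RE.match(core):
        raise ValueError(f"not a valid core SWHID: {core!r}")
    # Qualifiers (origin, path, ...) are returned exactly as received.
    return core + sep + qualifiers


print(
    normalize_for_resolution(
        "SWH:1:CNT:C839DEA9E8E6F0528B468214348FEE8669B305B2;path=/README.md"
    )
)
```

The spec itself stays strict in this approach: the syntactic check still runs against the lowercase form, and only the resolver's input handling is relaxed.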
Apr 28 2021
Apr 27 2021
docs !
Apr 26 2021
In T3087#63887, @rdicosmo wrote: In T3087#63791, @douardda wrote: So what about exports of the archive available on git-annex?