In T421#21642, @ardumont wrote:
Aug 2 2018
The basic loader will be the tarball loader, yes. In addition to that there are two aspects to be defined:
- the stack of objects to be added to the DAG
- the metadata to extract
Jul 29 2018
First of all, thanks for this design document. I have read through it (though I have not verified the details of offsets, sizes, etc., mind you :-)) and it looks reasonable to me. A few questions/comments below:
zack added a comment to T1161: SVN loader: Create local dump of remote repository to speed up loading task.
good catch!
Jul 26 2018
zack added a comment to T1158: hg loader: Clean up wrong snapshots/releases during hg loading of googlecode.
In T1158#21531, @ardumont wrote:
Jul 25 2018
In T960#21455, @moranegg wrote:
> Regarding implementation, there are no plans to implement it on the horizon; it is something to consider for the priority/yearly planning.
I can also open a review documentation subtask.
In T422#21473, @ardumont wrote:
> If your consumer is actually an organization or service that will be downloading a lot of packages from PyPI, consider using your own index mirror or cache.

That's not a sustainable approach: if we chose that path for all the forges we need to archive, it would be hard to sustain in terms of infrastructure and maintenance.
better LWN link to the actual article covering this: https://lwn.net/Articles/751458/
In T420#21471, @ardumont wrote:
> Looking at the FAQ [4], they also (now?) recommend bandersnatch. Quoting it:
Jul 23 2018
I'll be AFK for a while, so I can't check the diff, but if you (@moranegg) can point me to the current version (on docs.s.o?, if it's deployed), I'll be happy to have a look before it's implemented.
Jul 20 2018
In T336#21437, @ardumont wrote:
> E.g., you don't "schedule" the addition of an entire forge as a single task,

Yes, there are two tasks for now (incremental, full), but if we also hide that detail within T1157, then that could be a win, I think ;)
In T336#21431, @ardumont wrote:
> Is adding a supported forge (e.g., a GitLab instance) considered a possible save-now request?
zack added inline comments to D395: swh-loader-mercurial: Fix invalid release target and add missing data.
zack added a project to T1157: Generic scheduler task creation according to task type: Scheduling utilities.
Jul 19 2018
Thanks for spotting. We also need a separate task to correct the revisions that were already loaded in the archive. Can you please file it? (tag "archive content")
Good idea!
zack added a comment to T1152: deposit of tarball/zip: return as main swh-id the directory id, add the synthetic revision id as ancillary information.
That said, I do think it is important to have the metadata accessible; and keep in mind that, with the contextual URL used by HAL, the metadata is easily found!
zack added a comment to T1152: deposit of tarball/zip: return as main swh-id the directory id, add the synthetic revision id as ancillary information.
- nobody (not even HAL, or any other depositor, including the ones concerned by the compliance use cases) can independently recompute this swh-id SR, because it depends not only on the metadata added, but also on the particular mangling of that metadata done during ingestion, which may well change over time. Providing only SR as the swh-id for such a deposit makes it impossible for somebody who has a copy of the same code, and an article mentioning the swh-id SR, to check that the code is the same without accessing SWH: that would make us a middleman, and for our long-term strategy we do not want middlemen, not even us.
zack added a comment to T1152: deposit of tarball/zip: return as main swh-id the directory id, add the synthetic revision id as ancillary information.
TL;DR: by ingesting a revision and not returning its ID, we will have a protocol that — at the protocol level — loses information, and that is a bad idea.
zack added a comment to T1152: deposit of tarball/zip: return as main swh-id the directory id, add the synthetic revision id as ancillary information.
In T1152#21324, @rdicosmo wrote:
> It is essential for reproducibility that the swh-id offered to researchers to reference a deposited piece of software depend only on the software deposited itself: if three papers use the same software tree, they must show the same swh-id, no matter whether this software tree has been deposited once, twice, or three times. In the case of .zip/.tar files this is the swh-id of the root directory, not the swh-id of the synthetic commit.
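The reason the directory swh-id depends only on the software tree itself is that swh-ids are intrinsic, Merkle-style identifiers. For file contents, the identifier is the git-compatible SHA-1 over a `blob <length>\0` header followed by the bytes, so any party holding the same bytes can recompute it independently. A minimal sketch (stdlib only; directory ids work analogously over git-style tree objects):

```python
import hashlib

def swhid_of_content(data: bytes) -> str:
    """Git-compatible intrinsic identifier of a file's content:
    sha1 over the header b"blob <length>\\0" followed by the bytes."""
    header = b"blob %d\x00" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()

# The id depends only on the bytes, so two independent depositors
# of the same file always obtain the same identifier.
assert swhid_of_content(b"hello\n") == swhid_of_content(b"hello\n")
```

This is exactly why the synthetic revision id SR cannot play the same role: its input includes ingestion-time metadata that outsiders do not control.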
Jul 18 2018
zack added a comment to T1152: deposit of tarball/zip: return as main swh-id the directory id, add the synthetic revision id as ancillary information.
Can you (and/or @rdicosmo) elaborate on the rationale for this?
Jul 17 2018
great, thanks for working on this!
check in OSCON 2018 slides
Jul 12 2018
zack committed rDMODad2c349864aa: refactor CLI tests to avoid duplicate assertion pairs (authored by zack).
zack committed rDMODabffb2255753: cli.py: prefer os.fsdecode() over manual fiddling with locale.getpref... (authored by zack).
zack committed rDMOD07208f047d18: swh-identify: follow symlinks for CLI arguments (by default) (authored by zack).
zack committed rDMOD89f8d114b4f9: swh-identify: add support for passing multiple CLI arguments (authored by zack).
zack closed T1134: swh-identify: support multiple path arguments as Resolved by committing rDMOD89f8d114b4f9: swh-identify: add support for passing multiple CLI arguments.
zack closed T1133: swh-identify: show filename in output as Resolved by committing rDMODf53989093669: swh-identify: show filename in output (by default).
zack committed rDMODf53989093669: swh-identify: show filename in output (by default) (authored by zack).
zack closed T1133: swh-identify: show filename in output, a subtask of T1136: swh-identify: support recursive checksumming of directories, as Resolved.
zack triaged T1135: swh-identify: follow symlink by default for paths given as args as Normal priority.
Jun 28 2018
zack committed rDSNIP3d22648a68a9: sql/swh-graph: add driver script to re-launch (authored by zack).
zack committed rDSNIP08a41c8df48d: sql/swh-graph: update script to take snapshots in accounts (authored by zack).
zack triaged T1123: refuse deposit submissions that contain a single archive file (within the deposit archive) as Normal priority.
zack renamed T1122: properly handle ingestion of archives within archives (recursive extraction) from Decide how to handle software deposits containing double archive wrapping to properly handle ingestion of archives within archives (recursive extraction).
zack triaged T1122: properly handle ingestion of archives within archives (recursive extraction) as Normal priority.
The general problem (see below for the deposit-specific case) is indeed complex to deal with, both conceptually in a pure Merkle setting and practically due to the existence of zip bombs. I think a workable solution might be to ingest the archive as-is and also ingest a separate directory corresponding to the archive content, with some metadata linking the two. That way, by default we will only return what we have ingested (without recursion), but we will offer ways to dig in recursively, e.g., in the web app. There will be plenty of devils in plenty of details for this, though.
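To make the zip-bomb concern concrete, here is a hedged sketch of one way an extractor could bound recursion depth and declared uncompressed size before unpacking nested archives. The limits and the helper name are hypothetical, not the actual loader implementation; stdlib only, zip-only for brevity:

```python
import zipfile
from pathlib import Path

MAX_DEPTH = 2              # do not unpack archives nested deeper than this
MAX_TOTAL_BYTES = 1 << 30  # refuse archives declaring > 1 GiB uncompressed

def extract_with_limits(archive: Path, dest: Path, depth: int = 0) -> None:
    """Unpack `archive` into `dest`, recursing into inner zips only up to
    MAX_DEPTH, and rejecting suspiciously large declared sizes up front."""
    if depth >= MAX_DEPTH:
        return  # keep the inner archive as an opaque file, do not recurse
    with zipfile.ZipFile(archive) as zf:
        if sum(info.file_size for info in zf.infolist()) > MAX_TOTAL_BYTES:
            raise ValueError(f"{archive}: declared uncompressed size too large")
        zf.extractall(dest)
    for inner in dest.rglob("*.zip"):
        extract_with_limits(inner, inner.with_suffix(".d"), depth + 1)
```

Note that `file_size` is self-declared by the archive, so a hard cap on bytes actually written out would still be needed in a production loader.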
Jun 27 2018
Just as an idea for the whitelist/blacklist URL-pattern UI, here is what Adblock does, which is quite nice:
zack renamed T1119: save code now submission form from save origin now web form to save code now submission form.
zack renamed T1120: save code now moderation UI from save origin now moderation UI to save code now moderation UI.
I've generalized the title of this task, will add sub-tasks for the specific features that are still missing to complete this.
zack triaged T1022: SWORD deposit requesting to save content existing on an external code hosting platform as Normal priority.
Jun 26 2018
I agree we need a more user-friendly way of resolving IDs. (And, in passing, I think we also need an API endpoint /resolve for programmatically resolving PIDs.)
But rather than adding a separate search form, I think we should generalize the current one into a Google-style, catch-all search box.
zack added a comment to T1115: Improve error messages when resolving PURLs containing a broken/incorrect origin.
[Aside on the actual bug here: @rdicosmo, can you change the edit policy of this task to "public"? It's the default and generally the right one, as it allows doing things like changing task tags.]
Jun 25 2018
Jun 21 2018
zack added inline comments to D346: identifiers: Make invalid persistent identifier parsing raise error.
zack requested changes to D346: identifiers: Make invalid persistent identifier parsing raise error.
Jun 20 2018
zack added a comment to T1104: parse_persistent_identifier() should raise a parsing exception on invalid identifiers.
In T1104#20616, @ardumont wrote:
> I recall some remarks about the persistent identifier representation being too simple or something.

I don't know what's wrong with that simple representation, as:
- everyone can manipulate a dict
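As a minimal illustration of both points under discussion here, a parser can raise on invalid identifiers while still returning a plain dict for valid ones. This is a hypothetical sketch, not the actual swh.model implementation, and it handles only core identifiers (no qualifiers):

```python
import re

# Core persistent identifier syntax: "swh:1:<type>:<40 hex digits>"
SWHID_RE = re.compile(
    r"^swh:1:(?P<object_type>cnt|dir|rev|rel|snp):(?P<object_id>[0-9a-f]{40})$"
)

def parse_pid(pid: str) -> dict:
    """Parse a core persistent identifier into a plain dict, raising
    ValueError (rather than returning a partial result) on bad input."""
    match = SWHID_RE.match(pid)
    if match is None:
        raise ValueError(f"invalid persistent identifier: {pid!r}")
    return {"namespace": "swh", "scheme_version": 1, **match.groupdict()}
```

Raising keeps the error visible at the call site, while the dict return value stays trivial for any caller to consume.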
Jun 19 2018
zack edited projects for T682: Ingest Google Code Mercurial repositories, added: Archive coverage; removed Archive content.
zack edited projects for T592: ingest bitbucket git repositories, added: Archive coverage; removed Archive content.
zack edited projects for T561: ingest bitbucket (meta task), added: Archive coverage; removed Archive content.
zack edited projects for T419: ingest PyPI into the Software Heritage archive (meta task), added: Archive coverage; removed Archive content.
zack edited projects for T376: ingest git.eclipse.org repositories, added: Archive coverage; removed Archive content.
zack edited projects for T593: ingest bitbucket hg/mercurial repositories, added: Archive coverage; removed Archive content.
zack edited projects for T367: ingest Google Code repositories, added: Archive coverage; removed Archive content.
zack edited projects for T617: ingest Google Code Subversion repositories, added: Archive coverage; removed Archive content.
zack edited projects for T1002: ingest Hackage, the Haskell package repository (meta task), added: Archive coverage; removed Archive content, General.
zack edited projects for T1086: ingest Debian's Alioth (archived) repositories (meta-task), added: Archive coverage; removed Archive content, General.
zack edited projects for T312: Gitorious import: ingest repositories, added: Archive coverage; removed Archive content.
zack edited projects for T673: ingest Google Code Git repositories, added: Archive coverage; removed Archive content.
zack edited projects for T1111: ingest GitLab.com (meta-task), added: Archive coverage; removed Archive content.
zack committed rDMOD0d5bc1774829: add swh-identify CLI tool to compute persistent identifiers (authored by zack).
Jun 18 2018
yes! +1 :-)
Jun 16 2018
zack changed the status of T1039: add swh-model CLI front-end to compute persistent identifiers from Open to Work in Progress.
Herald added a reviewer for D345: add swh-identify CLI tool to compute persistent identifiers: Reviewers.