In T3656#72364, @grouss wrote: according to the list of nodes provided by seirl there were ~21,000,000 revisions without ancestors according to swh-graph snapshot (2020-12-15)
Oct 15 2021
Oct 14 2021
zack committed rDDOC5efb7bfae0db: archive changelog: fix markup errors in recent links (authored by zack).
zack updated the task description for T3639: prepare quote for "granet2", next gen swh-graph compression server.
zack committed rMSLD26cb7cb8692f: check in slides for talk at Telecom Paris, DIG team (authored by zack).
zack retitled D6470: Make it explicit that the "main" docs page is actually devel doc from Explicit the main docs page is actually the devel instance to Make it explicit that the "main" docs page is actually devel doc.
Oct 13 2021
zack renamed T3650: documentation: rename docs.s.o/users/ (plural) to docs.s.o/user/ (singular) from documentation: rename docs.s.o/users/ (plural) to docs.s.o/user/ (singula) to documentation: rename docs.s.o/users/ (plural) to docs.s.o/user/ (singular).
zack triaged T3650: documentation: rename docs.s.o/users/ (plural) to docs.s.o/user/ (singular) as High priority.
+1 on dropping the / -> /devel/ redirect and having at / a landing page that lets users choose between the 3 bodies of documentation.
Oct 12 2021
Oct 11 2021
Aside from the specific needs of the mirroring stack, the question at hand is whether the read-only object storage should be by default open to the public or not.
Oct 9 2021
zack committed rDGRPHc29ee2787e27: findEarliestRevision: avoid failing in case of unknown SWHIDs (authored by zack).
Oct 8 2021
zack triaged T3639: prepare quote for "granet2", next gen swh-graph compression server as High priority.
Oct 7 2021
This should stay pending until we resolve the archiving policy discussion in T3627, so I'm marking it as such.
zack added a comment to T3627: Consider dropping pull request references from the git loader ingestion.
Thanks for your feedback @olasd. I see three main arguments raised there: (1) the raciness of archiving those data via other means (= related forks), (2) the completeness of our canvassing of synthetic refs, (3) annotating rather than not archiving "synthetic" refs.
Oct 6 2021
Oct 4 2021
zack added a comment to T3627: Consider dropping pull request references from the git loader ingestion.
In T3627#71640, @ardumont wrote: No, the snippet mentioned filters out refs whose name starts with refs/pulls and finishes with /merge
(I realize I made a typo in the main description; it's now fixed.)
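The filter described in the quoted comment can be sketched roughly as follows. This is an illustration only, not the loader's actual code: the function names are hypothetical, refs are assumed to be a mapping from byte-string names to targets, and the prefix is written as refs/pull/ (the spelling used elsewhere in this task) rather than the refs/pulls of the quote.

```python
from typing import Dict


def is_merge_pull_ref(ref_name: bytes) -> bool:
    """Hypothetical helper: True for names like refs/pull/<N>/merge."""
    # Assumption: both conditions must hold, as stated in the quoted comment.
    return ref_name.startswith(b"refs/pull/") and ref_name.endswith(b"/merge")


def filter_refs(refs: Dict[bytes, bytes]) -> Dict[bytes, bytes]:
    """Return a copy of refs without the refs matched by is_merge_pull_ref."""
    return {
        name: target
        for name, target in refs.items()
        if not is_merge_pull_ref(name)
    }


# Made-up ref names and targets, for illustration only.
example_refs = {
    b"refs/heads/master": b"target-a",
    b"refs/pull/1/head": b"target-b",
    b"refs/pull/1/merge": b"target-c",
}
kept = filter_refs(example_refs)
```

Under this reading, a name such as refs/pull/1/head passes the filter and only refs/pull/1/merge is dropped, so branch names starting with refs/pull/ can still appear in a snapshot.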
zack added a comment to T3627: Consider dropping pull request references from the git loader ingestion.
According to the snippet referenced by @ardumont, all branch names starting with refs/pull/ should be filtered out.
But in the recent snapshot of torvalds/linux there are a lot of branch names like that.
How come?
Oct 2 2021
zack raised the priority of T3623: Run swh-graph with gunicorn to support multiple/parallel requests from Normal to High.
zack raised the priority of T3624: Update swh-graph from 0.3.0 to 0.5.0 on granet from Normal to High.
zack renamed T3623: Run swh-graph with gunicorn to support multiple/parallel requests from Run swh-graph with gunicorn to Run swh-graph with gunicorn to support multiple/parallel requests.
Oct 1 2021
Sep 30 2021
Sep 27 2021
Sep 24 2021
Sep 23 2021
Sep 22 2021
Approved, but please fix the minutia I'm mentioning in the above comment before landing.
In T1805#70880, @douardda wrote:
- pagination comes from proper usage of links https://swagger.io/docs/specification/links/
- batches come from proper usage of parameter serialization https://swagger.io/docs/specification/serialization/
it's true these do not come "for free" but I still have the impression there is an "Open API way" of handling these and we should stick to them.
Sep 18 2021
Sep 1 2021
In T3544#69746, @olasd wrote: I can see a few alternatives to using git:// over tcp:
- Give our swh bot accounts SSH keys, and use that to clone from GitHub over ssh.
Aug 31 2021
Here's an opinionated and prioritized list.
quick comment on the "Miscellaneous" category:
- it's not a great name, and it really feels they are "less important" than the others even if we say explicitly they aren't (or maybe because we say so :-))
- and shouldn't the two items in there (nix, guix) go under "regular crawling" anyway? (that would trivially solve the previous point)
Aug 24 2021
LGTM, thanks! But please note the remaining "-a/--add" reference in the new docstring, which should be changed to "-e/..." for consistency. Please fix that before landing this change.
Aug 23 2021
zack requested changes to D6114: swh-scanner: retrieve additional information about software artifacts.
Nice! And I also like the refactoring out of client.py.
Aug 19 2021
Aug 17 2021
Thanks for this, and for the screenshots, they look gorgeous!
Aug 10 2021
zack added a reviewer for D6073: bytes_to_str: Format strings directly, instead of constructing ExtendedSWHID: seirl.
zack updated the task description for T3475: leverage Shodan scans to find and ingest the "penumbra" of FOSS.
zack updated the task description for T3475: leverage Shodan scans to find and ingest the "penumbra" of FOSS.
zack triaged T3475: leverage Shodan scans to find and ingest the "penumbra" of FOSS as Low priority.
zack raised the priority of T3457: Some git repositories are failing to be ingested because of MemoryError from Normal to High.
Aug 9 2021
I'm approving this, but please fix the remaining occurrence of 1000 before landing, as per comment above.
Jul 29 2021
Jul 28 2021
I'm requesting some minor changes (+ some other changes to be submitted in a separate diff which I've noticed only now, sorry!).
Jul 23 2021
zack committed rDDOCf9451bd1038a: changelog: fix sphinx markup error in sourceforge entry (authored by zack).
Jul 22 2021
zack added a comment to D5952: changelog: Reference first completion of sourceforge git/svn origins.
awesome \o/
(diff accepted)
I'm accepting this diff, but note that I've added a few suggestions for improved language above. Please integrate them before this is final.
Jul 21 2021
Wonderful, thanks for adding the order tests! LGTM.
Jul 19 2021
thanks @KShivendu, this is a great start!
Jul 17 2021
@zack
Is it okay with you if I add it in the next diff? This one has become extremely long because of lots of build failures.
Jul 16 2021
In D5990#154613, @zack wrote: Can we have some documentation of the query language, included in this diff?
E.g., a file under docs/ which will then be rendered on docs.s.o as user documentation for how to use the query language.
Can we have some documentation of the query language, included in this diff?
E.g., a file under docs/ which will then be rendered on docs.s.o as user documentation for how to use the query language.
Thanks for this update, great work!
Looks great! I've noted down only some nits.
Jul 15 2021
zack renamed T3431: Implement a MongoDB backend for SWH-provenance from Implement a MonoDB backend for SWH-provenance to Implement a MongoDB backend for SWH-provenance.
Jul 8 2021
zack changed the status of T2730: scanner: should output the root SWHID as well from Open to Work in Progress.
zack changed the status of T2692: Move the output related functions to another (sub)module from Open to Work in Progress.
zack moved T3318: scanner should use the known() method of web.client from In progress to Backlog on the Code scanner board.
zack added a parent task for T2635: web client: add async API: T3318: scanner should use the known() method of web.client.
zack added a subtask for T3318: scanner should use the known() method of web.client: T2635: web client: add async API.
zack accepted D5926: swh.scanner: use model.from_disk instead of scanner.model to store a source code project.
Please note down, on the side, two remaining TODOs about all this:
- adding a test case with a deduplicated source tree, to make sure nodes that are deduplicated at the Merkle DAG level are present multiple times in the output
- adding a test case for a path that is not decodable in UTF-8, to make sure it can be handled properly
Jul 7 2021
zack added inline comments to D5926: swh.scanner: use model.from_disk instead of scanner.model to store a source code project.
Jul 6 2021
zack requested changes to D5926: swh.scanner: use model.from_disk instead of scanner.model to store a source code project.
Thanks, both the general structure and implementation look OK.
I'm requesting changes to address two main issues:
Jul 5 2021
zack changed the status of T3420: scanner: make the various query algorithms user-selectable from Open to Work in Progress.
zack changed the status of T3318: scanner should use the known() method of web.client from Open to Work in Progress.
Jul 2 2021
Jul 1 2021
zack added inline comments to D5951: model: make deduplication optional when iterating over the merkle tree.
zack requested changes to D5951: model: make deduplication optional when iterating over the merkle tree.
only minor changes requested to the docstring on my part
zack added a comment to D5952: changelog: Reference first completion of sourceforge git/svn origins.
In D5952#152586, @ardumont wrote: I don't really have an ETA yet [1]. We are roughly 67% done for git and 84.6% for svn [2]. For mercurial, it's not started as other blocking points are being worked on.
Bazaar and cvs origins are listed but we don't have any loader on that front yet.
zack requested changes to D5952: changelog: Reference first completion of sourceforge git/svn origins.
zack added a comment to D5952: changelog: Reference first completion of sourceforge git/svn origins.
Thanks a lot for this!
zack added a comment to T3418: Decide a consistent policy on having multiple archived objects for the same extid.
(3) should ideally be implemented in a way that guarantees that extids that were resolvable in previous versions of the mapping will always be resolvable in future versions
I don't understand. Option 3 is to remove relations between extids and SWHID, so it won't be resolvable anymore.
Jun 30 2021
zack added a comment to T3418: Decide a consistent policy on having multiple archived objects for the same extid.
I have the feeling that option (1) will lead, in the long run, to an explosion in the size of the mapping, which will make us eventually converge (slowly) toward option (3).