Jan 8 2021

anlambert closed T2900: Public graph/ API does not handle streaming results from endpoints as Resolved by committing rDWAPPSe605c3fa701a: api/graph: Stream responses as in the proxied graph service.
Jan 8 2021, 3:07 PM · System administration, Compressed graph service, Web app

Jan 7 2021

zack added a revision to T2595: Add a default configuration based on graph size (eg: batch_size): D4820: config: sane default for batch_size using a heuristic on ram size.
Jan 7 2021, 10:34 PM · Compressed graph service
anlambert added a revision to T2900: Public graph/ API does not handle streaming results from endpoints: D4824: api/graph: Stream responses as in the proxied graph service.
Jan 7 2021, 6:58 PM · System administration, Compressed graph service, Web app

Dec 17 2020

zack added a project to T2900: Public graph/ API does not handle streaming results from endpoints: System administration.
Dec 17 2020, 4:15 PM · System administration, Compressed graph service, Web app
zack added a project to T2900: Public graph/ API does not handle streaming results from endpoints: Compressed graph service.
Dec 17 2020, 4:15 PM · System administration, Compressed graph service, Web app

Nov 23 2020

zack triaged T2807: document swh.graph.graph module as Low priority.
Nov 23 2020, 1:23 PM · Documentation, Compressed graph service

Nov 12 2020

anlambert closed T2768: unbreak swh-graph CI as Resolved by committing rDGRPH262432b1295d: server/app: Fix aiohttp >= 3.7 exception related errors.
Nov 12 2020, 3:29 PM · Continuous Integration, Compressed graph service
anlambert added a revision to T2768: unbreak swh-graph CI: D4466: server/app: Fix aiohttp >= 3.7 exception related errors.
Nov 12 2020, 3:18 PM · Continuous Integration, Compressed graph service

Nov 10 2020

zack triaged T2768: unbreak swh-graph CI as High priority.
Nov 10 2020, 2:23 PM · Continuous Integration, Compressed graph service

Oct 13 2020

anlambert closed T2642: swh-graph: fix CI as Resolved.

Closing this as the new release of aiohttp (3.6.3) mitigates the issue mentioned above and the CI build of swh-graph is now fixed.

Oct 13 2020, 11:18 AM · Compressed graph service

Oct 1 2020

zack added a subtask for T1926: FUSE filesystem to navigate the archive: T2654: modprobe fuse on the CI build machine.
Oct 1 2020, 2:12 PM · Software Heritage filesystem

Sep 30 2020

anlambert closed T2589: expose swh-graph API at archive.s.o/api/1/graph/ as Resolved by committing rDWAPPS2d69cbc46a16: api: Add /graph endpoint proxying Software Heritage Graph service.
Sep 30 2020, 11:05 AM · System administration, Web app, Compressed graph service

Sep 29 2020

anlambert added a comment to T2642: swh-graph: fix CI.

I managed to track down the dependency bump that broke the CI build: it is the upgrade of yarl (a dependency of aiohttp) from 1.5.1 to 1.6.0. The issue is surely due to that commit.

Sep 29 2020, 4:15 PM · Compressed graph service
zack triaged T2647: add LLP support to graph compression pipeline as Normal priority.
Sep 29 2020, 2:48 PM · Compressed graph service
anlambert added a revision to T2589: expose swh-graph API at archive.s.o/api/1/graph/: D4077: api: Add /graph endpoint proxying Software Heritage Graph service.
Sep 29 2020, 2:10 PM · System administration, Web app, Compressed graph service
zack added a revision to T1926: FUSE filesystem to navigate the archive: D4064: Early FUSE implementation, with support for blob and directory objects.
Sep 29 2020, 9:38 AM · Software Heritage filesystem
zack changed the status of T1926: FUSE filesystem to navigate the archive from Open to Work in Progress.
Sep 29 2020, 9:38 AM · Software Heritage filesystem

Sep 26 2020

zack updated the task description for T2642: swh-graph: fix CI.
Sep 26 2020, 12:10 PM · Compressed graph service
zack added a project to T2642: swh-graph: fix CI: Compressed graph service.
Sep 26 2020, 12:09 PM · Compressed graph service

Sep 25 2020

zack added a revision to T1926: FUSE filesystem to navigate the archive: D4042: docs: add design notes.
Sep 25 2020, 4:03 PM · Software Heritage filesystem

Sep 22 2020

anlambert changed the status of T2589: expose swh-graph API at archive.s.o/api/1/graph/ from Open to Work in Progress.
Sep 22 2020, 3:39 PM · System administration, Web app, Compressed graph service
anlambert added a revision to T2113: swh-graph: add support to optionally resolve ori PIDs to origin URLs: D4009: common/service: Add lookup_origins_by_sha1s function.
Sep 22 2020, 3:39 PM · Compressed graph service
anlambert added a revision to T2589: expose swh-graph API at archive.s.o/api/1/graph/: D4009: common/service: Add lookup_origins_by_sha1s function.
Sep 22 2020, 3:39 PM · System administration, Web app, Compressed graph service
moranegg moved T2431: Document how to export the graph edge dataset from Backlog to archive-users (docs/user-guides/) on the Documentation board.
Sep 22 2020, 2:37 PM · Documentation, Compressed graph service, Datasets

Sep 21 2020

vlorentz closed T2053: support graph export for the cassandra backend as Resolved.
Sep 21 2020, 3:39 PM · Compressed graph service, Storage manager

Sep 19 2020

zack closed T2114: swh-graph API: add ?limit=N method variants to return first N results as Resolved.

This was fixed a while ago by D2669.

Sep 19 2020, 8:08 PM · Easy hack, Compressed graph service

Sep 18 2020

zack added a comment to T2589: expose swh-graph API at archive.s.o/api/1/graph/.

You are right, they are not stored in database but there is a storage.origin_get_by_sha1 method.

Sep 18 2020, 12:54 PM · System administration, Web app, Compressed graph service
anlambert added a comment to T2589: expose swh-graph API at archive.s.o/api/1/graph/.

However, unless I'm missing something, I think right now origin sha1s are not stored at all in swh-storage, or are they?
If they indeed aren't, a required sub-task of this one is adding sha1s to the origin table, together with an index to do the reverse sha1 -> url, and a matching swh-storage API method.
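
The reverse sha1 -> url lookup discussed here can be sketched in a few lines. This is only an illustration, assuming an origin's sha1 is the SHA-1 of its UTF-8 encoded URL; the function names and the in-memory dict are hypothetical stand-ins for an indexed column in swh-storage, not its actual API.

```python
import hashlib
from typing import Dict, Iterable


def origin_sha1(url: str) -> str:
    """Return the hex SHA-1 of an origin URL (assumed identifier scheme)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()


def build_reverse_index(urls: Iterable[str]) -> Dict[str, str]:
    """Map each sha1 back to its URL, playing the role of a database index."""
    return {origin_sha1(url): url for url in urls}


# Hypothetical origin URLs, used only for the demonstration.
index = build_reverse_index([
    "https://example.org/project-a.git",
    "https://example.org/project-b.git",
])
```

Looking up `index[origin_sha1("https://example.org/project-a.git")]` returns the original URL, which is the direction the graph service needs when it only knows the hash.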

Sep 18 2020, 12:15 PM · System administration, Web app, Compressed graph service

Sep 17 2020

zack added a comment to T2589: expose swh-graph API at archive.s.o/api/1/graph/.
  • We can process swh-graph responses to enrich the data (notably get origin URLs from their sha1 or turn SWHIDs into dicts) and return them in JSON format
Sep 17 2020, 10:25 PM · System administration, Web app, Compressed graph service
ardumont renamed T2579: swh-graph: display server and dataset versions in the live server instance from swh-graph: display sever and dataset versions in the live sever instance to swh-graph: display server and dataset versions in the live server instance.
Sep 17 2020, 12:29 PM · Compressed graph service
zack changed the status of T1847: fully automate export of the graph dataset from Open to Work in Progress.
Sep 17 2020, 9:04 AM · Compressed graph service, Datasets

Sep 16 2020

seirl added a comment to T1847: fully automate export of the graph dataset.

No, only the edge part is done, we still need a parquet and a CSV exporter :/

Sep 16 2020, 10:59 PM · Compressed graph service, Datasets
seirl closed T1868: refresh compressed representation of the archive as Resolved.

It is already running on granet :-)

Sep 16 2020, 9:47 PM · Compressed graph service
zack removed a subtask for T1868: refresh compressed representation of the archive: T1848: refresh graph dataset export.
Sep 16 2020, 8:43 PM · Compressed graph service
zack added a comment to T1847: fully automate export of the graph dataset.

I think this is (reasonably) done now, please check and close it.

Sep 16 2020, 8:43 PM · Compressed graph service, Datasets
zack assigned T1868: refresh compressed representation of the archive to seirl.

We now have a newer version of the compressed graph (2020-05-20), but it's not yet running on granet (I *think*, and, lacking T2579, I haven't checked).
Please make granet run that version and close this task. (Or just close this task if it's already done.)

Sep 16 2020, 8:41 PM · Compressed graph service
douardda added a comment to T2589: expose swh-graph API at archive.s.o/api/1/graph/.

FTR, in a previous life, I've set up a json web token auth validation in varnish.

Sep 16 2020, 2:58 PM · System administration, Web app, Compressed graph service

Sep 15 2020

haltode added a comment to T2595: Add a default configuration based on graph size (eg: batch_size).

Must update the quickstart documentation guide once implemented.

Sep 15 2020, 11:32 AM · Compressed graph service
haltode triaged T2595: Add a default configuration based on graph size (eg: batch_size) as Low priority.
Sep 15 2020, 11:27 AM · Compressed graph service

Sep 14 2020

anlambert added a comment to T2589: expose swh-graph API at archive.s.o/api/1/graph/.

I agree with @olasd to do the reverse proxy at the webapp level. The main advantages are:

  • We can use the same Web API authentication backend to manage authentication and user permissions. API authentication is based on an OIDC offline refresh token, and access token renewal is handled in the Django DRF authentication backend. While it should be possible to implement that process at the reverse proxy level, filtering users would not be as easy as using fine-grained permissions from the Django User API.
  • We can process swh-graph responses to enrich the data (notably get origin URLs from their sha1 or turn SWHIDs into dicts) and return them in JSON format
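
As a rough illustration of the second point, turning identifiers into dicts could look like the sketch below. It assumes the plain `swh:1:<type>:<40 hex digits>` form without qualifiers; the dict keys and the function name are hypothetical and not the format swh-web actually returns.

```python
import json
import re

# Object types accepted by this sketch; "ori" is included because the
# graph service discussed here also deals with origin nodes.
SWHID_RE = re.compile(
    r"^swh:1:(?P<object_type>ori|snp|rel|rev|dir|cnt):(?P<object_id>[0-9a-f]{40})$"
)


def swhid_to_dict(swhid: str) -> dict:
    """Split an identifier into its parts, raising ValueError if malformed."""
    match = SWHID_RE.match(swhid)
    if match is None:
        raise ValueError(f"not a valid identifier: {swhid!r}")
    return {"swhid": swhid, **match.groupdict()}


# Enrich a line-oriented response (one identifier per line) into JSON.
raw_response = "swh:1:rev:" + "a" * 40 + "\n" + "swh:1:dir:" + "b" * 40 + "\n"
enriched = [swhid_to_dict(line) for line in raw_response.splitlines()]
payload = json.dumps(enriched)
```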
Sep 14 2020, 4:12 PM · System administration, Web app, Compressed graph service
olasd added a comment to T2589: expose swh-graph API at archive.s.o/api/1/graph/.

So, my first instinct for this was to implement the "mount" at the reverse proxy level (before even hitting swh-web), but:

Sep 14 2020, 2:43 PM · System administration, Web app, Compressed graph service
zack renamed T2589: expose swh-graph API at archive.s.o/api/1/graph/ from expose the compressed graph API at archive.s.o/api/1/graph/ to expose swh-graph API at archive.s.o/api/1/graph/.
Sep 14 2020, 2:37 PM · System administration, Web app, Compressed graph service
zack triaged T2589: expose swh-graph API at archive.s.o/api/1/graph/ as Normal priority.
Sep 14 2020, 2:37 PM · System administration, Web app, Compressed graph service
zack assigned T1926: FUSE filesystem to navigate the archive to haltode.
Sep 14 2020, 9:59 AM · Software Heritage filesystem

Sep 9 2020

zack triaged T2579: swh-graph: display server and dataset versions in the live server instance as Normal priority.
Sep 9 2020, 11:35 AM · Compressed graph service

Sep 7 2020

zack raised the priority of T1926: FUSE filesystem to navigate the archive from Wishlist to Normal.
Sep 7 2020, 10:59 AM · Software Heritage filesystem

Sep 6 2020

zack updated subscribers of T1926: FUSE filesystem to navigate the archive.

Noting down that I had a tentative very preliminary implementation in the feature/fuse branch of swh-graph; see in particular fuse.py there.
It's probably not worth picking up and we should restart from scratch at this point, but it might still contain useful material.
(The webclient in there has since become a proper thing, see T2279. So that part is definitely obsolete.)

Sep 6 2020, 4:48 PM · Software Heritage filesystem

Sep 4 2020

douardda closed T2530: Write a simple "quick start" for swh-graph as Resolved.

closed by 8c937da20785699ae2a0a604104a9e458eced201

Sep 4 2020, 4:58 PM · Documentation, Compressed graph service
douardda added a revision to T2530: Write a simple "quick start" for swh-graph: D3871: Add a short `quickstart` guide.
Sep 4 2020, 4:57 PM · Documentation, Compressed graph service

Aug 24 2020

zack added a project to T2530: Write a simple "quick start" for swh-graph: Documentation.
Aug 24 2020, 11:36 AM · Documentation, Compressed graph service
douardda triaged T2530: Write a simple "quick start" for swh-graph as High priority.
Aug 24 2020, 10:59 AM · Documentation, Compressed graph service

Jul 23 2020

seirl renamed T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory from swh-graph: loading maps fail when swhgraphshm is running: Cannot allocate memory to swh-graph: loading maps fail when available memory is too low: Cannot allocate memory.
Jul 23 2020, 4:06 PM · Compressed graph service
seirl added a comment to T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory.

Just to be clear, the problem here wasn't directly linked to swhgraphshm but simply to the amount of available memory, because the MAP_PRIVATE flag tried to reserve all that memory to be able to perform copy-on-write. Using MAP_SHARED + PROT_READ avoids this memory reservation and fixes the issue. swhgraphshm was just a random process taking a lot of the available RAM, not specifically the reason why it failed.
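
For illustration, a read-only shared mapping of this kind can be requested from Python as sketched below. This is not the code from the actual commit; the helper name and the temporary file are hypothetical, and the `flags` and `prot` arguments are only available on Unix.

```python
import mmap
import os
import tempfile


def map_readonly(path: str) -> mmap.mmap:
    """Map a whole file as MAP_SHARED + PROT_READ (no copy-on-write pages)."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the mapping outlives the file object.
        return mmap.mmap(f.fileno(), 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)


# Small demonstration on a throwaway file standing in for a map file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"node-map-contents")
mapped = map_readonly(tmp.name)
header = mapped[:8]  # slicing reads straight from the mapping
mapped.close()
os.unlink(tmp.name)
```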

Jul 23 2020, 4:06 PM · Compressed graph service
seirl closed T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory as Resolved by committing rDGRPH39430074227c: pid: use PROT_READ/MAP_SHARED for readonly maps.
Jul 23 2020, 12:32 PM · Compressed graph service
olasd added a comment to T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory.

The ZFS ARC (zfs's page cache) is set to grow without bounds.

Jul 23 2020, 12:23 PM · Compressed graph service
seirl added a revision to T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory: D3599: pid: use PROT_READ/MAP_SHARED for readonly maps.
Jul 23 2020, 12:22 PM · Compressed graph service

Jul 15 2020

zack renamed T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory from swh-graph: memory mapping swhid<->node maps fail: Cannot allocate memory to swh-graph: loading maps fail when swhgraphshm is running: Cannot allocate memory.
Jul 15 2020, 5:49 PM · Compressed graph service
zack added a comment to T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory.

I think it's related to the shm trick.

Jul 15 2020, 5:44 PM · Compressed graph service
zack added a project to T2492: swh-graph: loading maps fail when available memory is too low: Cannot allocate memory: Compressed graph service.
Jul 15 2020, 5:22 PM · Compressed graph service

Jun 3 2020

zack renamed T2431: Document how to export the graph edge dataset from Documentat how to export the graph edge dataset to Document how to export the graph edge dataset.
Jun 3 2020, 4:36 PM · Documentation, Compressed graph service, Datasets
seirl triaged T2431: Document how to export the graph edge dataset as Normal priority.
Jun 3 2020, 4:34 PM · Documentation, Compressed graph service, Datasets

Feb 14 2020

legau moved T2114: swh-graph API: add ?limit=N method variants to return first N results from Backlog to In progress on the Easy hack board.
Feb 14 2020, 2:39 PM · Easy hack, Compressed graph service

Feb 13 2020

legau added a revision to T2114: swh-graph API: add ?limit=N method variants to return first N results: D2669: Add ?limit=N method variants to return first N results.
Feb 13 2020, 12:43 PM · Easy hack, Compressed graph service

Feb 7 2020

legau added a comment to T2114: swh-graph API: add ?limit=N method variants to return first N results.

Would this parameter replace /last altogether, as it would be equivalent to ?limit=1, or are they mutually exclusive?
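
Going only by the task titles in this feed ("return first N results" for ?limit=N, "only return destination in walks" for /last), the two appear to select different ends of a result stream. A minimal sketch of that distinction follows; the helper names and the sample walk are hypothetical and do not reflect the service's implementation.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def first_n(results: Iterable[str], n: int) -> List[str]:
    """Keep only the first n results, without consuming the rest."""
    return list(islice(results, n))


def last(results: Iterable[str]) -> Optional[str]:
    """Return only the final result (the destination of a walk), if any."""
    destination = None
    for destination in results:
        pass
    return destination


def walk() -> Iterator[str]:
    """Hypothetical walk yielding the visited nodes in order."""
    yield from ["node-a", "node-b", "node-c"]
```

Under this reading, `first_n(walk(), 1)` yields the starting node while `last(walk())` yields the destination, so the two would not be interchangeable.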

Feb 7 2020, 3:00 PM · Easy hack, Compressed graph service

Jan 29 2020

zack added a project to T2114: swh-graph API: add ?limit=N method variants to return first N results: Easy hack.
Jan 29 2020, 2:09 PM · Easy hack, Compressed graph service

Jan 22 2020

vlorentz added a project to T2220: swh-graph in production: Compressed graph service.
Jan 22 2020, 4:36 PM · Roadmap 2022, meta-task, Roadmap 2021, Compressed graph service
vlorentz closed T1731: Intrinsic identifiers for origins, a subtask of T1867: compress Merkle DAG and origin nodes together, as Resolved.
Jan 22 2020, 2:18 PM · Compressed graph service

Jan 8 2020

vlorentz added a comment to T2053: support graph export for the cassandra backend.

I just ran it on Azure. It has a different schema (the "revision" table was split into "revision" and "revision_parent") so the benchmarks are not exactly comparable.
I still use 16 workers, all running on the same machine, and with no compression

Jan 8 2020, 4:54 PM · Compressed graph service, Storage manager

Dec 6 2019

zack closed T2112: make "swh graph map lookup" accept lists of identifiers as Resolved by committing rDGRPH71ce98054b4e: CLI: generalize 'map lookup' to lookup many identifiers at once.
Dec 6 2019, 4:19 PM · Compressed graph service

Nov 30 2019

zack added a revision to T2112: make "swh graph map lookup" accept lists of identifiers: D2379: CLI: generalize 'map lookup' to lookup many identifiers at once.
Nov 30 2019, 2:52 PM · Compressed graph service

Nov 27 2019

zack triaged T2114: swh-graph API: add ?limit=N method variants to return first N results as Normal priority.
Nov 27 2019, 4:24 PM · Easy hack, Compressed graph service
zack created T2114: swh-graph API: add ?limit=N method variants to return first N results.
Nov 27 2019, 4:24 PM · Easy hack, Compressed graph service
zack triaged T2113: swh-graph: add support to optionally resolve ori PIDs to origin URLs as Low priority.
Nov 27 2019, 4:22 PM · Compressed graph service
zack renamed T2112: make "swh graph map lookup" accept lists of identifiers from make "swh graph map lookup" takes list of identifiers to make "swh graph map lookup" accept lists of identifiers.
Nov 27 2019, 4:18 PM · Compressed graph service
zack triaged T2112: make "swh graph map lookup" accept lists of identifiers as Low priority.
Nov 27 2019, 4:18 PM · Compressed graph service
zack updated subscribers of T1926: FUSE filesystem to navigate the archive.

proposed CLI interface:

swh [ -C config.yml ] graph mount PID DIR

will mount the content of the given PID to the given local DIR.
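
The shape of the proposed command line can be sketched with the standard library's argparse. This only illustrates the argument structure; it performs no mounting, and the real swh CLI may well be built with a different library. The attribute names and sample values are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring: swh [ -C config.yml ] graph mount PID DIR."""
    parser = argparse.ArgumentParser(prog="swh")
    parser.add_argument("-C", dest="config", metavar="config.yml", default=None)
    commands = parser.add_subparsers(dest="command", required=True)
    graph = commands.add_parser("graph")
    graph_commands = graph.add_subparsers(dest="subcommand", required=True)
    mount = graph_commands.add_parser("mount")
    mount.add_argument("pid", metavar="PID")
    mount.add_argument("dir", metavar="DIR")
    return parser


# Hypothetical invocation; the identifier and directory are placeholders.
args = build_parser().parse_args(
    ["-C", "config.yml", "graph", "mount", "swh:1:dir:" + "0" * 40, "/mnt/swh"]
)
```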

Nov 27 2019, 3:59 PM · Software Heritage filesystem

Nov 25 2019

olasd closed T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs as Resolved by committing rDSTO2cac3392eb49: Implement origin lookup by sha1.
Nov 25 2019, 3:30 PM · Compressed graph service, Storage manager
seirl added a comment to T2083: provide systemd service file for swh-graph.

Sure, it's as simple as you'd expect:

Nov 25 2019, 10:41 AM · Compressed graph service
zack added a comment to T2083: provide systemd service file for swh-graph.

can you also post it here, please?

Nov 25 2019, 8:37 AM · Compressed graph service

Nov 24 2019

seirl closed T2083: provide systemd service file for swh-graph as Resolved.
Nov 24 2019, 11:24 PM · Compressed graph service
seirl added a comment to T2083: provide systemd service file for swh-graph.

It's up on granet, not committed yet because it should be included in a puppet integration diff.

Nov 24 2019, 11:24 PM · Compressed graph service

Nov 22 2019

olasd added a revision to T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs: D2346: Implement origin lookup by sha1.
Nov 22 2019, 6:29 PM · Compressed graph service, Storage manager
olasd added a comment to T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs.

Launched on somerset:

Nov 22 2019, 5:40 PM · Compressed graph service, Storage manager
olasd added a comment to T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs.

We should consider just adding a btree index on sha1(url) and see where that takes us.

Nov 22 2019, 5:37 PM · Compressed graph service, Storage manager
vlorentz added a comment to T2053: support graph export for the cassandra backend.

Forgot to mention it here, but it's done now.

Nov 22 2019, 10:55 AM · Compressed graph service, Storage manager

Nov 19 2019

olasd triaged T2103: (Debian) package py4j as Normal priority.
Nov 19 2019, 5:48 PM · Compressed graph service
olasd added a subtask for T2100: Bootstrap Debian packaging for swh.graph: T2102: Clean up Debian packaging branch bootstrapping scripts.
Nov 19 2019, 5:47 PM · Compressed graph service
olasd triaged T2100: Bootstrap Debian packaging for swh.graph as Normal priority.
Nov 19 2019, 5:45 PM · Compressed graph service

Nov 18 2019

zack raised the priority of T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs from Normal to High.
Nov 18 2019, 5:50 PM · Compressed graph service, Storage manager
zack added a project to T2045: add support for reverse lookup from swh:1:ori:... PIDs to origin URLs: Compressed graph service.
Nov 18 2019, 5:50 PM · Compressed graph service, Storage manager
olasd added a comment to T2096: CNAME for graph service: graph.internal.softwareheritage.org (?).

That's now also been deployed.

Nov 18 2019, 3:12 PM · Compressed graph service, System administration
zack raised the priority of T1868: refresh compressed representation of the archive from Normal to High.
Nov 18 2019, 3:09 PM · Compressed graph service
zack closed T2096: CNAME for graph service: graph.internal.softwareheritage.org (?) as Resolved by committing rSPSITEc54e88a14607: add CNAME graph -> granet.
Nov 18 2019, 3:05 PM · Compressed graph service, System administration
zack added a revision to T2096: CNAME for graph service: graph.internal.softwareheritage.org (?): D2297: add CNAME graph -> granet.
Nov 18 2019, 2:54 PM · Compressed graph service, System administration
zack lowered the priority of T1847: fully automate export of the graph dataset from High to Normal.
Nov 18 2019, 2:50 PM · Compressed graph service, Datasets
zack added a project to T1847: fully automate export of the graph dataset: Compressed graph service.
Nov 18 2019, 2:48 PM · Compressed graph service, Datasets
zack raised the priority of T1868: refresh compressed representation of the archive from Low to Normal.
Nov 18 2019, 2:48 PM · Compressed graph service
zack closed T2084: swh-graph: add /last endpoint variants to the REST API as Resolved by committing rDGRPH51d6b602c3e8: add /last sub-endpoint to only return destination in walks.
Nov 18 2019, 1:56 PM · Compressed graph service
zack placed T1969: graph: reduce RAM usage for /walk up for grabs.
Nov 18 2019, 1:51 PM · Compressed graph service
zack reopened T1969: graph: reduce RAM usage for /walk as "Open".
Nov 18 2019, 1:51 PM · Compressed graph service