Page MenuHomeSoftware Heritage

adds a filter by node type as a query argument
ClosedPublic

Authored by Hakimb on Apr 22 2021, 5:27 PM.

Details

Summary

We now have a "return types" query argument on endpoints:
*leaves
*neighbors
*visit/nodes
*random walk.

This allows us to filter the returned nodes by discriminating the nodes returned during the visit by a set of node types.
The argument is a string that takes the form :

"*" or "nodetype" or "nodetype,nodetype...."

Diff Detail

Event Timeline

Build is green

Patch application report for D5577 (id=19933)

Rebasing onto 04a1656c10...

First, rewinding head to replay your work on top of it...
Applying: adds a filter by node type as a query argument
Changes applied before test
commit 65755f09f9197fde2a85af18b5b993914bf6b12a
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Thu Apr 22 17:20:14 2021 +0200

    adds a filter by node type as a query argument

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/90/ for more details.

Looks good overall, but I'm not familiar with this piece of code.

Could you add some more comments in NodesFiltering.java? Two examples would be nice too.

If possible, could you add tests?

Some small style comments below:

java/src/main/java/org/softwareheritage/graph/Entry.java
117–119

What is the difference between t.leavesVisitor(srcNodeId, this::writeNode); and this?

(same question for the other two functions below)

java/src/main/java/org/softwareheritage/graph/NodesFiltering.java
6–10

What about overloading the constructor with a variant with no argument, instead of a magic value *?

25–27

Wouldn't this work?

for (String type : types) {
    restrictedNodesTypes.add(Node.Type.fromStr(type));
}

or even this?

Collections.addAll(restrictedNodesTypes, types)
36–41

Doesn't this return a value *and* modify the argument? this could be confusing to use.

What about creating a new ArrayList and adding matching elements to it?

swh/graph/server/app.py
100–105

no need to create a temporary list, instead we can use a generator expression to pass a generator to any.

109–110

use comments

This revision now requires changes to proceed.Apr 22 2021, 5:52 PM
java/src/main/java/org/softwareheritage/graph/NodesFiltering.java
36–41

And I'm not very familiar with Java, but isn't ArrayList backed by a single dynamic array? This means every remove operation needs to copy all elements on the right, so filterByNodeTypes runs in quadratic time.

java/src/main/java/org/softwareheritage/graph/Entry.java
117–119

In Traversal.java, we can see that the functions "leaves" "neighbors" etc... are not used, because the code you show calls directly leavesVisitor with, as "consumer", a function that displays on the output the nodes found during the visit, on the fly.

However, I need to filter the nodes found after the visit, not during, so I used the functions "leaves" "neighbors" etc... which store in an arraylist all the nodes found during the visit, to be able to do my filtering, and I still use the writeNode function to display them at the end.

adding comments, tests, and fix code style

Build is green

Patch application report for D5577 (id=19984)

Rebasing onto 04a1656c10...

First, rewinding head to replay your work on top of it...
Applying: adds a filter by node type as a query argument
Applying: adds a filter by node type as a query argument
Changes applied before test
commit 025054c31f8cf5c5d680db8e87b870e73361709e
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Fri Apr 23 18:09:25 2021 +0200

    adds a filter by node type as a query argument

commit 073a54a86120c71dd36600dcb7bf889640359014
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Thu Apr 22 17:20:14 2021 +0200

    adds a filter by node type as a query argument

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/94/ for more details.

add comment and examples of queries

Build is green

Patch application report for D5577 (id=19985)

Rebasing onto 04a1656c10...

First, rewinding head to replay your work on top of it...
Applying: adds a filter by node type as a query argument
Applying: adds a filter by node type as a query argument
Applying: add comment and examples of queries
Changes applied before test
commit 16a8f167b74b6939cf3fe6e6da51fe3f2c7fd2b0
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Fri Apr 23 18:24:30 2021 +0200

    add comment and examples of queries

commit 05927df6b86aceb50d88c424513a6432a001f8c4
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Fri Apr 23 18:09:25 2021 +0200

    adds a filter by node type as a query argument

commit a7732696e21b9afb296642c5a9612f1c00ac4bcd
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Thu Apr 22 17:20:14 2021 +0200

    adds a filter by node type as a query argument

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/95/ for more details.

Could you add some more comments in NodesFiltering.java? Two examples would be nice too.

I meant examples with (executable) code.

  • add comments and examples in NodeFiltering class

rebase and add comments and examples in NodeFiltering class

Build is green

Patch application report for D5577 (id=20093)

Rebasing onto 6f21897a7e...

Current branch diff-target is up to date.
Changes applied before test
commit 452b503cc26571695f2d9fbbcc8d514529c38168
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Wed Apr 28 13:51:24 2021 +0200

    add a filter by node type as a query parameter

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/101/ for more details.

Thanks! One last thing: as this code block documents the whole class, could you move it at the beginning of the file?

java/src/main/java/org/softwareheritage/graph/NodesFiltering.java
89

Build is green

Patch application report for D5577 (id=20259)

Rebasing onto 1a8b00f4cd...

First, rewinding head to replay your work on top of it...
Applying: adds a filter by node type as a query argument
Using index info to reconstruct a base tree...
M	java/src/main/java/org/softwareheritage/graph/Entry.java
M	java/src/main/java/org/softwareheritage/graph/Traversal.java
M	swh/graph/backend.py
M	swh/graph/client.py
M	swh/graph/graph.py
M	swh/graph/server/app.py
Falling back to patching base and 3-way merge...
Auto-merging swh/graph/server/app.py
CONFLICT (content): Merge conflict in swh/graph/server/app.py
Auto-merging swh/graph/graph.py
CONFLICT (content): Merge conflict in swh/graph/graph.py
Auto-merging swh/graph/client.py
CONFLICT (content): Merge conflict in swh/graph/client.py
Auto-merging swh/graph/backend.py
CONFLICT (content): Merge conflict in swh/graph/backend.py
Auto-merging java/src/main/java/org/softwareheritage/graph/Traversal.java
CONFLICT (content): Merge conflict in java/src/main/java/org/softwareheritage/graph/Traversal.java
Auto-merging java/src/main/java/org/softwareheritage/graph/Entry.java
CONFLICT (content): Merge conflict in java/src/main/java/org/softwareheritage/graph/Entry.java
Patch failed at 0001 adds a filter by node type as a query argument

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto 1a8b00f4cd...

Already up to date.
Changes applied before test
commit 073881b6f2e7898306f15f79f75e72ab7a58cb31
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Tue May 4 16:30:46 2021 +0200

    adds a filter by node type as a query argument

commit 2cbb0bb4e20c5538c5426f4642ad54a4d4511673
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Wed Apr 28 12:44:52 2021 +0200

    add comments and examples in NodeFiltering class

commit 2c206f35bce1100d4bc34dabb8e5b0aa54390008
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Fri Apr 23 18:24:30 2021 +0200

    add comment and examples of queries

commit beb9de1def38c04e5408e0a3465d769f8c6b795f
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Fri Apr 23 18:09:25 2021 +0200

    adds a filter by node type as a query argument

commit b788d817c33a42302e7257a2b8cb7afc76c7fa6c
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Thu Apr 22 17:20:14 2021 +0200

    adds a filter by node type as a query argument

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/112/ for more details.

This revision is now accepted and ready to land.May 4 2021, 4:43 PM
vlorentz requested changes to this revision.May 4 2021, 5:01 PM

Er, I didn't notice but you have too many commits in the diff now, please squash them

This revision now requires changes to proceed.May 4 2021, 5:01 PM

Build has FAILED

Patch application report for D5577 (id=20262)

Rebasing onto 1a8b00f4cd...

First, rewinding head to replay your work on top of it...
Applying: adds a filter by node type as a query argument
Using index info to reconstruct a base tree...
M	java/src/main/java/org/softwareheritage/graph/Entry.java
M	java/src/main/java/org/softwareheritage/graph/Traversal.java
M	swh/graph/backend.py
M	swh/graph/client.py
M	swh/graph/graph.py
M	swh/graph/server/app.py
Falling back to patching base and 3-way merge...
Auto-merging swh/graph/server/app.py
CONFLICT (content): Merge conflict in swh/graph/server/app.py
Auto-merging swh/graph/graph.py
CONFLICT (content): Merge conflict in swh/graph/graph.py
Auto-merging swh/graph/client.py
CONFLICT (content): Merge conflict in swh/graph/client.py
Auto-merging swh/graph/backend.py
CONFLICT (content): Merge conflict in swh/graph/backend.py
Auto-merging java/src/main/java/org/softwareheritage/graph/Traversal.java
CONFLICT (content): Merge conflict in java/src/main/java/org/softwareheritage/graph/Traversal.java
Auto-merging java/src/main/java/org/softwareheritage/graph/Entry.java
CONFLICT (content): Merge conflict in java/src/main/java/org/softwareheritage/graph/Entry.java
Patch failed at 0001 adds a filter by node type as a query argument

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

Rebase failed (ret=1)!

Could not rebase; Attempt merge onto 1a8b00f4cd...

Already up to date.
Changes applied before test
commit 9316536b9b923873bd3a29311699b5b5d77a5a23
Merge: 1a8b00f 073881b
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Tue May 4 16:56:22 2021 +0200

    adds a filter by node type as a query argument

commit 073881b6f2e7898306f15f79f75e72ab7a58cb31
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Tue May 4 16:30:46 2021 +0200

    adds a filter by node type as a query argument

commit 2cbb0bb4e20c5538c5426f4642ad54a4d4511673
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Wed Apr 28 12:44:52 2021 +0200

    add comments and examples in NodeFiltering class

commit 2c206f35bce1100d4bc34dabb8e5b0aa54390008
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Fri Apr 23 18:24:30 2021 +0200

    add comment and examples of queries

commit beb9de1def38c04e5408e0a3465d769f8c6b795f
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Fri Apr 23 18:09:25 2021 +0200

    adds a filter by node type as a query argument

commit b788d817c33a42302e7257a2b8cb7afc76c7fa6c
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Thu Apr 22 17:20:14 2021 +0200

    adds a filter by node type as a query argument

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/113/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/113/console

You added another commit. To clean up everything, I suggest that you drop the existing commits and create a new one. Here is how you can do it:

  1. Make sure you are on your master branch
  2. Removes your commit from the local master branch, so the state of the branch is the same as the main (origin) repo: git reset origin/master. This will keep the local content of the files (as unstaged changes), because you don't pass --hard.
  3. Create a new commit from scratch, based on the local content: git add [...] then git commit -m "adds a filter by node type as a query argument"
  4. finally, update the diff: arc diff --update D5577 -m "squash commits"
This revision is now accepted and ready to land.May 4 2021, 5:18 PM

Build has FAILED

Patch application report for D5577 (id=20264)

Rebasing onto 1a8b00f4cd...

Current branch diff-target is up to date.
Changes applied before test
commit 35391f89acfb33d542d3b6182142ce364418af47
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Tue May 4 17:17:09 2021 +0200

    adds a filter by node type as a query argument

Link to build: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/114/
See console output for more information: https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/114/console

fix argument in Entry and Traversal + an assert in test_api_client

Build is green

Patch application report for D5577 (id=20265)

Rebasing onto 1a8b00f4cd...

Current branch diff-target is up to date.
Changes applied before test
commit 85880ae749995213856cea9b9afad5a2b59f519c
Author: Hakim Baaloudj <hbaaloud@po461-pro.paris.inria.fr>
Date:   Tue May 4 17:30:10 2021 +0200

    adds a filter by node type as a query argument

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/115/ for more details.