Page MenuHomeSoftware Heritage

Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph
ClosedPublic

Authored by seirl on Jan 14 2022, 11:16 PM.

Details

Summary

This commit is a massive refactor of the project to apply a stronger
separation of concerns which will allow the project to be more
maintainable and extensible in the future.

The principal change is the switch from a single SwhGraph containing
everything SWH-specific to two new classes:

  • SwhUnidirectionalGraph, an ImmutableGraph augmented with SWH-specific methods such as getSWHID(), getNodeType(), etc.
  • SwhBidirectionalGraph, which leverages BidirectionalImmutableGraph to store two SwhUnidirectionnalGraph, one for each direction.

To share common behavior and storage between these two classes, such as
handling node types and node properties, another class has been created:

  • SwhGraphProperties, which is designed to contain the graph properties that can apply on the graph whatever its direction is (node types, node properties, dataset version, etc.)

Finally, to allow users of this API to use the two classes
interchangeably when the direction(s) do not mattern, both classes
implement a new interface:

  • SwhGraph, which is designed to hold the common SWH-specific behavior between SwhUnidirectionalGraph and SwhBidirectionalGraph.
                ┌──────────────┐
                │ImmutableGraph◄────────┐
                └────▲─────────┘        │extends
                     │                  │
                     │       ┌──────────┴────────────────┐
              extends│       │BidirectionalImmutableGraph│
                     │       └────────────▲──────────────┘
                     │                    │extends
      ┌──────────────┴───────┐     ┌──────┴──────────────┐
      │SwhUnidirectionalGraph│◄────┤SwhBidirectionalGraph│
      └──┬──────────────┬────┘     └────────┬───────────┬┘
         │              │    contains x2    │           │
         │              │                   │           │
         │    implements│                   │implements │
         │             ┌▼──────────┐        │           │
         │             │SwhGraph(I)◄────────┘           │
contains │             └───────────┘                    │contains
         │                                              │
         │            ┌──────────────────┐              │
         └────────────►SwhGraphProperties◄──────────────┘
                      └──────────────────┘

BidirectionalImmutableGraph is included in this pull-request, but is
intended to be merged upstream eventually:

This commit includes a (partial) documentation overhaul. Notably,
using cross-dependency compiling of java documentation allows us to
remove redundant method documentations when they are inherited from
webgraph classes. It also includes cross-dependency *linking*, which
allows users to click on external links to unimi.it libraries.

Finally, this commits introduces stub methods for optionally loading
*labelled* graphs, which will be useful to store edges properties such
as file names, etc. To avoid a combinatorial explosion of types
(SwhUnidirectionalLabelledGraph, SwhBidirectionalLabelledGraph, etc..),
this is only checked at runtime. Classes hold both the unlabelled and
labelled versions of the underlying graphs, with the labelled version
being equal to null when the labels have not been loaded.

Fixes T2983

Diff Detail

Repository
rDGRPH Compressed graph representation
Branch
graph_refactor
Lint
No Linters Available
Unit
No Unit Test Coverage
Build Status
Buildable 26063
Build 40732: Phabricator diff pipeline on jenkinsJenkins console · Jenkins
Build 40731: arc lint + arc unit

Event Timeline

Build is green

Patch application report for D6953 (id=25199)

Rebasing onto 64f35108dd...

Current branch diff-target is up to date.
Changes applied before test
commit 98cd99c8e017739e46aadff0cfbfca2418e63294
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Sat Dec 4 00:50:57 2021 +0100

    Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph
    
    This commit is a massive refactor of the project to apply a stronger
    separation of concerns which will allow the project to be more
    maintainable and extensible in the future.
    
    The principal change is the switch from a single SwhGraph containing
    everything SWH-specific to two new classes:
    
    - SwhUnidirectionalGraph, an ImmutableGraph augmented with SWH-specific
      methods such as getSWHID(), getNodeType(), etc.
    
    - SwhBidirectionalGraph, which leverages BidirectionalImmutableGraph to
      store two SwhUnidirectionnalGraph, one for each direction.
    
    To share common behavior and storage between these two classes, such as
    handling node types and node properties, another class has been created:
    
    - SwhGraphProperties, which is designed to contain the graph properties
      that can apply on the graph whatever its direction is (node types,
      node properties, dataset version, etc.)
    
    Finally, to allow users of this API to use the two classes
    interchangeably when the direction(s) do not mattern, both classes
    implement a new interface:
    
    - SwhGraph, which is designed to hold the common SWH-specific
      behavior between SwhUnidirectionalGraph and SwhBidirectionalGraph.
    
                        ┌──────────────┐
                        │ImmutableGraph◄────────┐
                        └────▲─────────┘        │extends
                             │                  │
                             │       ┌──────────┴────────────────┐
                      extends│       │BidirectionalImmutableGraph│
                             │       └────────────▲──────────────┘
                             │                    │extends
              ┌──────────────┴───────┐     ┌──────┴──────────────┐
              │SwhUnidirectionalGraph│◄────┤SwhBidirectionalGraph│
              └──┬──────────────┬────┘     └────────┬───────────┬┘
                 │              │    contains x2    │           │
                 │              │                   │           │
                 │    implements│                   │implements │
                 │             ┌▼──────────┐        │           │
                 │             │SwhGraph(I)◄────────┘           │
        contains │             └───────────┘                    │contains
                 │                                              │
                 │            ┌──────────────────┐              │
                 └────────────►SwhGraphProperties◄──────────────┘
                              └──────────────────┘
    
    BidirectionalImmutableGraph is included in this pull-request, but is
    intended to be merged upstream eventually:
    
    - https://github.com/vigna/webgraph/pull/5
    - https://github.com/vigna/webgraph-big/pull/2
    
    This commit includes a (partial) documentation overhaul. Notably,
    using cross-dependency compiling of java documentation allows us to
    remove redundant method documentations when they are inherited from
    webgraph classes. It also includes cross-dependency *linking*, which
    allows users to click on external links to unimi.it libraries.
    
    Finally, this commits introduces stub methods for optionally loading
    *labelled* graphs, which will be useful to store edges properties such
    as file names, etc. To avoid a combinatorial explosion of types
    (SwhUnidirectionalLabelledGraph, SwhBidirectionalLabelledGraph, etc..),
    this is only checked at runtime. Classes hold both the unlabelled and
    labelled versions of the underlying graphs, with the labelled version
    being equal to null when the labels have not been loaded.

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/152/ for more details.

Build is green

Patch application report for D6953 (id=25200)

Rebasing onto 64f35108dd...

Current branch diff-target is up to date.
Changes applied before test
commit fe5627675344cec017d0566988008e1cfc953cb2
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Sat Dec 4 00:50:57 2021 +0100

    Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph
    
    This commit is a massive refactor of the project to apply a stronger
    separation of concerns which will allow the project to be more
    maintainable and extensible in the future.
    
    The principal change is the switch from a single SwhGraph containing
    everything SWH-specific to two new classes:
    
    - SwhUnidirectionalGraph, an ImmutableGraph augmented with SWH-specific
      methods such as getSWHID(), getNodeType(), etc.
    
    - SwhBidirectionalGraph, which leverages BidirectionalImmutableGraph to
      store two SwhUnidirectionnalGraph, one for each direction.
    
    To share common behavior and storage between these two classes, such as
    handling node types and node properties, another class has been created:
    
    - SwhGraphProperties, which is designed to contain the graph properties
      that can apply on the graph whatever its direction is (node types,
      node properties, dataset version, etc.)
    
    Finally, to allow users of this API to use the two classes
    interchangeably when the direction(s) do not mattern, both classes
    implement a new interface:
    
    - SwhGraph, which is designed to hold the common SWH-specific
      behavior between SwhUnidirectionalGraph and SwhBidirectionalGraph.
    
                        ┌──────────────┐
                        │ImmutableGraph◄────────┐
                        └────▲─────────┘        │extends
                             │                  │
                             │       ┌──────────┴────────────────┐
                      extends│       │BidirectionalImmutableGraph│
                             │       └────────────▲──────────────┘
                             │                    │extends
              ┌──────────────┴───────┐     ┌──────┴──────────────┐
              │SwhUnidirectionalGraph│◄────┤SwhBidirectionalGraph│
              └──┬──────────────┬────┘     └────────┬───────────┬┘
                 │              │    contains x2    │           │
                 │              │                   │           │
                 │    implements│                   │implements │
                 │             ┌▼──────────┐        │           │
                 │             │SwhGraph(I)◄────────┘           │
        contains │             └───────────┘                    │contains
                 │                                              │
                 │            ┌──────────────────┐              │
                 └────────────►SwhGraphProperties◄──────────────┘
                              └──────────────────┘
    
    BidirectionalImmutableGraph is included in this pull-request, but is
    intended to be merged upstream eventually:
    
    - https://github.com/vigna/webgraph/pull/5
    - https://github.com/vigna/webgraph-big/pull/2
    
    This commit includes a (partial) documentation overhaul. Notably,
    using cross-dependency compiling of java documentation allows us to
    remove redundant method documentations when they are inherited from
    webgraph classes. It also includes cross-dependency *linking*, which
    allows users to click on external links to unimi.it libraries.
    
    Finally, this commits introduces stub methods for optionally loading
    *labelled* graphs, which will be useful to store edges properties such
    as file names, etc. To avoid a combinatorial explosion of types
    (SwhUnidirectionalLabelledGraph, SwhBidirectionalLabelledGraph, etc..),
    this is only checked at runtime. Classes hold both the unlabelled and
    labelled versions of the underlying graphs, with the labelled version
    being equal to null when the labels have not been loaded.

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/153/ for more details.

Remove Graph -> SwhBidirectionalGraph strings

Build is green

Patch application report for D6953 (id=25201)

Rebasing onto 64f35108dd...

Current branch diff-target is up to date.
Changes applied before test
commit 672a13332be500b0140d8a8053fbdb04c0f7615b
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Sat Dec 4 00:50:57 2021 +0100

    Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph
    
    This commit is a massive refactor of the project to apply a stronger
    separation of concerns which will allow the project to be more
    maintainable and extensible in the future.
    
    The principal change is the switch from a single SwhGraph containing
    everything SWH-specific to two new classes:
    
    - SwhUnidirectionalGraph, an ImmutableGraph augmented with SWH-specific
      methods such as getSWHID(), getNodeType(), etc.
    
    - SwhBidirectionalGraph, which leverages BidirectionalImmutableGraph to
      store two SwhUnidirectionnalGraph, one for each direction.
    
    To share common behavior and storage between these two classes, such as
    handling node types and node properties, another class has been created:
    
    - SwhGraphProperties, which is designed to contain the graph properties
      that can apply on the graph whatever its direction is (node types,
      node properties, dataset version, etc.)
    
    Finally, to allow users of this API to use the two classes
    interchangeably when the direction(s) do not mattern, both classes
    implement a new interface:
    
    - SwhGraph, which is designed to hold the common SWH-specific
      behavior between SwhUnidirectionalGraph and SwhBidirectionalGraph.
    
                        ┌──────────────┐
                        │ImmutableGraph◄────────┐
                        └────▲─────────┘        │extends
                             │                  │
                             │       ┌──────────┴────────────────┐
                      extends│       │BidirectionalImmutableGraph│
                             │       └────────────▲──────────────┘
                             │                    │extends
              ┌──────────────┴───────┐     ┌──────┴──────────────┐
              │SwhUnidirectionalGraph│◄────┤SwhBidirectionalGraph│
              └──┬──────────────┬────┘     └────────┬───────────┬┘
                 │              │    contains x2    │           │
                 │              │                   │           │
                 │    implements│                   │implements │
                 │             ┌▼──────────┐        │           │
                 │             │SwhGraph(I)◄────────┘           │
        contains │             └───────────┘                    │contains
                 │                                              │
                 │            ┌──────────────────┐              │
                 └────────────►SwhGraphProperties◄──────────────┘
                              └──────────────────┘
    
    BidirectionalImmutableGraph is included in this pull-request, but is
    intended to be merged upstream eventually:
    
    - https://github.com/vigna/webgraph/pull/5
    - https://github.com/vigna/webgraph-big/pull/2
    
    This commit includes a (partial) documentation overhaul. Notably,
    using cross-dependency compiling of java documentation allows us to
    remove redundant method documentations when they are inherited from
    webgraph classes. It also includes cross-dependency *linking*, which
    allows users to click on external links to unimi.it libraries.
    
    Finally, this commits introduces stub methods for optionally loading
    *labelled* graphs, which will be useful to store edges properties such
    as file names, etc. To avoid a combinatorial explosion of types
    (SwhUnidirectionalLabelledGraph, SwhBidirectionalLabelledGraph, etc..),
    this is only checked at runtime. Classes hold both the unlabelled and
    labelled versions of the underlying graphs, with the labelled version
    being equal to null when the labels have not been loaded.

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/154/ for more details.

Fix more buggy Graph -> SwhBidirectionalGraph

Build is green

Patch application report for D6953 (id=25202)

Rebasing onto 64f35108dd...

Current branch diff-target is up to date.
Changes applied before test
commit 83fcf6bb815618ef711383d4bd17a4436caca0aa
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Sat Dec 4 00:50:57 2021 +0100

    Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph
    
    This commit is a massive refactor of the project to apply a stronger
    separation of concerns which will allow the project to be more
    maintainable and extensible in the future.
    
    The principal change is the switch from a single SwhGraph containing
    everything SWH-specific to two new classes:
    
    - SwhUnidirectionalGraph, an ImmutableGraph augmented with SWH-specific
      methods such as getSWHID(), getNodeType(), etc.
    
    - SwhBidirectionalGraph, which leverages BidirectionalImmutableGraph to
      store two SwhUnidirectionnalGraph, one for each direction.
    
    To share common behavior and storage between these two classes, such as
    handling node types and node properties, another class has been created:
    
    - SwhGraphProperties, which is designed to contain the graph properties
      that can apply on the graph whatever its direction is (node types,
      node properties, dataset version, etc.)
    
    Finally, to allow users of this API to use the two classes
    interchangeably when the direction(s) do not mattern, both classes
    implement a new interface:
    
    - SwhGraph, which is designed to hold the common SWH-specific
      behavior between SwhUnidirectionalGraph and SwhBidirectionalGraph.
    
                        ┌──────────────┐
                        │ImmutableGraph◄────────┐
                        └────▲─────────┘        │extends
                             │                  │
                             │       ┌──────────┴────────────────┐
                      extends│       │BidirectionalImmutableGraph│
                             │       └────────────▲──────────────┘
                             │                    │extends
              ┌──────────────┴───────┐     ┌──────┴──────────────┐
              │SwhUnidirectionalGraph│◄────┤SwhBidirectionalGraph│
              └──┬──────────────┬────┘     └────────┬───────────┬┘
                 │              │    contains x2    │           │
                 │              │                   │           │
                 │    implements│                   │implements │
                 │             ┌▼──────────┐        │           │
                 │             │SwhGraph(I)◄────────┘           │
        contains │             └───────────┘                    │contains
                 │                                              │
                 │            ┌──────────────────┐              │
                 └────────────►SwhGraphProperties◄──────────────┘
                              └──────────────────┘
    
    BidirectionalImmutableGraph is included in this pull-request, but is
    intended to be merged upstream eventually:
    
    - https://github.com/vigna/webgraph/pull/5
    - https://github.com/vigna/webgraph-big/pull/2
    
    This commit includes a (partial) documentation overhaul. Notably,
    using cross-dependency compiling of java documentation allows us to
    remove redundant method documentations when they are inherited from
    webgraph classes. It also includes cross-dependency *linking*, which
    allows users to click on external links to unimi.it libraries.
    
    Finally, this commits introduces stub methods for optionally loading
    *labelled* graphs, which will be useful to store edges properties such
    as file names, etc. To avoid a combinatorial explosion of types
    (SwhUnidirectionalLabelledGraph, SwhBidirectionalLabelledGraph, etc..),
    this is only checked at runtime. Classes hold both the unlabelled and
    labelled versions of the underlying graphs, with the labelled version
    being equal to null when the labels have not been loaded.

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/155/ for more details.

The principal change is the switch from a single SwhGraph containing
everything SWH-specific to two new classes:

- SwhUnidirectionalGraph, an ImmutableGraph augmented with SWH-specific
  methods such as getSWHID(), getNodeType(), etc.

- SwhBidirectionalGraph, which leverages BidirectionalImmutableGraph to
  store two SwhUnidirectionnalGraph, one for each direction.

To share common behavior and storage between these two classes, such as
handling node types and node properties, another class has been created:

- SwhGraphProperties, which is designed to contain the graph properties
  that can apply on the graph whatever its direction is (node types,
  node properties, dataset version, etc.)

Finally, to allow users of this API to use the two classes
interchangeably when the direction(s) do not mattern, both classes
implement a new interface:

- SwhGraph, which is designed to hold the common SWH-specific
  behavior between SwhUnidirectionalGraph and SwhBidirectionalGraph.

                    ┌──────────────┐
                    │ImmutableGraph◄────────┐
                    └────▲─────────┘        │extends
                         │                  │
                         │       ┌──────────┴────────────────┐
                  extends│       │BidirectionalImmutableGraph│
                         │       └────────────▲──────────────┘
                         │                    │extends
          ┌──────────────┴───────┐     ┌──────┴──────────────┐
          │SwhUnidirectionalGraph│◄────┤SwhBidirectionalGraph│
          └──┬──────────────┬────┘     └────────┬───────────┬┘
             │              │    contains x2    │           │
             │              │                   │           │
             │    implements│                   │implements │
             │             ┌▼──────────┐        │           │
             │             │SwhGraph(I)◄────────┘           │
    contains │             └───────────┘                    │contains
             │                                              │
             │            ┌──────────────────┐              │
             └────────────►SwhGraphProperties◄──────────────┘
                          └──────────────────┘

Could you write this in docs/?

@vlorentz this is all written in the docstrings of the classes, which will be added to swh-docs once https://forge.softwareheritage.org/T1971 is implemented (which is going to be easier to do thanks to this diff :-))

The only thing missing is the UML diagram. I could add it to the docs of SwhBidirectionalGraph.

SwhBidirectionalGraph: add UML diagram of class hierarchy

Build is green

Patch application report for D6953 (id=25205)

Rebasing onto 64f35108dd...

Current branch diff-target is up to date.
Changes applied before test
commit bb9b5fc59d8d9a46231beba384a140a06cf9959e
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Mon Jan 17 13:46:12 2022 +0100

    SwhBidirectionalGraph: add UML diagram of class hierarchy

commit 83fcf6bb815618ef711383d4bd17a4436caca0aa
Author: Antoine Pietri <antoine.pietri1@gmail.com>
Date:   Sat Dec 4 00:50:57 2021 +0100

    Refactor Graph class in SwhUnidirectionalGraph and SwhBidirectionalGraph
    
    This commit is a massive refactor of the project to apply a stronger
    separation of concerns which will allow the project to be more
    maintainable and extensible in the future.
    
    The principal change is the switch from a single SwhGraph containing
    everything SWH-specific to two new classes:
    
    - SwhUnidirectionalGraph, an ImmutableGraph augmented with SWH-specific
      methods such as getSWHID(), getNodeType(), etc.
    
    - SwhBidirectionalGraph, which leverages BidirectionalImmutableGraph to
      store two SwhUnidirectionnalGraph, one for each direction.
    
    To share common behavior and storage between these two classes, such as
    handling node types and node properties, another class has been created:
    
    - SwhGraphProperties, which is designed to contain the graph properties
      that can apply on the graph whatever its direction is (node types,
      node properties, dataset version, etc.)
    
    Finally, to allow users of this API to use the two classes
    interchangeably when the direction(s) do not mattern, both classes
    implement a new interface:
    
    - SwhGraph, which is designed to hold the common SWH-specific
      behavior between SwhUnidirectionalGraph and SwhBidirectionalGraph.
    
                        ┌──────────────┐
                        │ImmutableGraph◄────────┐
                        └────▲─────────┘        │extends
                             │                  │
                             │       ┌──────────┴────────────────┐
                      extends│       │BidirectionalImmutableGraph│
                             │       └────────────▲──────────────┘
                             │                    │extends
              ┌──────────────┴───────┐     ┌──────┴──────────────┐
              │SwhUnidirectionalGraph│◄────┤SwhBidirectionalGraph│
              └──┬──────────────┬────┘     └────────┬───────────┬┘
                 │              │    contains x2    │           │
                 │              │                   │           │
                 │    implements│                   │implements │
                 │             ┌▼──────────┐        │           │
                 │             │SwhGraph(I)◄────────┘           │
        contains │             └───────────┘                    │contains
                 │                                              │
                 │            ┌──────────────────┐              │
                 └────────────►SwhGraphProperties◄──────────────┘
                              └──────────────────┘
    
    BidirectionalImmutableGraph is included in this pull-request, but is
    intended to be merged upstream eventually:
    
    - https://github.com/vigna/webgraph/pull/5
    - https://github.com/vigna/webgraph-big/pull/2
    
    This commit includes a (partial) documentation overhaul. Notably,
    using cross-dependency compiling of java documentation allows us to
    remove redundant method documentations when they are inherited from
    webgraph classes. It also includes cross-dependency *linking*, which
    allows users to click on external links to unimi.it libraries.
    
    Finally, this commits introduces stub methods for optionally loading
    *labelled* graphs, which will be useful to store edges properties such
    as file names, etc. To avoid a combinatorial explosion of types
    (SwhUnidirectionalLabelledGraph, SwhBidirectionalLabelledGraph, etc..),
    this is only checked at runtime. Classes hold both the unlabelled and
    labelled versions of the underlying graphs, with the labelled version
    being equal to null when the labels have not been loaded.

See https://jenkins.softwareheritage.org/job/DGRPH/job/tests-on-diff/157/ for more details.

In D6953#180800, @seirl wrote:

@vlorentz this is all written in the docstrings of the classes

Auto-generated docs (javadoc/Doxygen/Sphinx autodoc/...) is not the right place for introductory content or architectural/design documentation.

The point of this kind of documentation is to help people know where to start, and so having this hidden behind the long list of classes isn't helpful.

I agree in general, but I disagree for the specifics of this diff. This is a description of the design of a specific component, which could be copied in an architecture presentation if it makes sense, but is certainly not an architecture presentation in itself.

In terms of consistency, the generated documentation here is totally in line with the way WebGraph documents component-size concepts, in the auto-generated documentation of the classes, for instance: https://webgraph.di.unimi.it/docs/it/unimi/dsi/webgraph/labelling/ArcLabelledImmutableGraph.html

I filed T3855 to document the architecture of the java code of swh-graph so that it stays on our radar.

WebGraph also has an overview of how classes interact with each other at the package level. https://webgraph.di.unimi.it/docs/it/unimi/dsi/webgraph/labelling/package-summary.html

Right, this is what T3855 will cover, but there is also a dependency on T1971 to figure out where this will go exactly (package summary vs sphinx)

This revision is now accepted and ready to land.Jan 17 2022, 4:24 PM