Skip to content

Conversation

@jurepustos
Copy link
Contributor

Fixed issue #1224 by adding an optional argument to for a continuous node id mapping, i.e. mapping of current node ids to [0,n).
Replaced raw accesses into vectors by the index operator with calls to .at() which performs bounds checks. This fixes an unreported bug which causes undefined behaviour and specifically segmentation faults when node ids are not in the [0,n) interval. The new behaviour is to throw an exception with a descriptive message instead.
If a node id mapping is passed to the algorithm, it maps all node ids through the mapping (and performs bounds checks as in above paragraph). If node ids are not mapped to [0,n), the bounds checks will cause the algorithm to throw an exception with an appropriate message.
Also refactored tests for TopologicalSort and added new tests for bounds checks and node id mappings.

@jurepustos jurepustos force-pushed the feature/topological-sort branch from e75abb5 to 83ee59e Compare July 26, 2024 15:17
@fabratu
Copy link
Member

fabratu commented Aug 2, 2024

Hi,

thank you for your first contribution to NetworKit!

In general the PR looks good. However due to the scope of the project (supporting large scale analysis) avoid using .at() when it is called frequently.

@jurepustos
Copy link
Contributor Author

If there's a better solution that avoids out of bounds accesses, I'm all for it. But I think the tradeoff of a small performance impact due to bounds checks vs. accidental memory corruption should be cleary in the favour of bounds checks. With memory corruption, the best scenario is a segmentation fault, while the worst is subtle breakage and wrong calculations in completely unrelated code.

I did look at Graph.upperNodeIdBound() before. Is it equal to the highest node id? If it is, I could eliminate a significant portion of .at() calls by checking this number. Or, if that's not the case, I could make the continuous property a part of Graph itself. As for the node id mapping, I could make a new structure that encapsulates the mapping and makes sure it stays continuous. I'll greatly appreciate feedback on this.

@fabratu
Copy link
Member

fabratu commented Aug 6, 2024

But I think the tradeoff of a small performance impact due to bounds checks vs. accidental memory corruption should be cleary in the favour of bounds checks.

The problem in this case is, that the overhead is not small in general for .at(). I did some random access on a mid-sized vector on a M2 macOS and it shows a 50% overhead.

I did look at Graph.upperNodeIdBound() before. Is it equal to the highest node id? If it is, I could eliminate a significant portion of .at() calls by checking this number.

Yes, your impression is correct. It gives the highest node id. There also exists the function getContinuousNodeIds, which retrieves the continuous mapping of a graph.

Or, if that's not the case, I could make the continuous property a part of Graph itself. As for the node id mapping, I could make a new structure that encapsulates the mapping and makes sure it stays continuous.

The rationale concerning performance also applies here. In principal, adding a bool variable to the Graph class is cheap. However keeping track of it is not necessarily cheap. You would for example have to keep track of the number of id holes in order to avoid O(n). As a result, I would not encourage adding such a property.

@jurepustos
Copy link
Contributor Author

The problem in this case is, that the overhead is not small in general for .at(). I did some random access on a mid-sized vector on a M2 macOS and it shows a 50% overhead.

Noted, I was under the impression that the impact wouldn't be this big. I should have benchmarked both versions of the algorithm to see the difference.

Yes, your impression is correct. It gives the highest node id. There also exists the function getContinuousNodeIds, which retrieves the continuous mapping of a graph.

Nice, upperNodeIdBound makes things much easier. I'm also aware of getContinuousNodeIds - in hindsight, I should have asked if calling this function when a mapping is not passed would be a good decision. I'll implement it like that instead.

Considering that, I'm thinking of the following. Since the caller might have calculated a mapping already, it would still make sense to allow the caller to pass their own mapping, hence my original decision to make it a parameter. However, by having this parameter, I believe that we have the responsibility of checking the caller's mapping for correctness, or at least that mapped node ids fall in [0,n). GraphTools.getCompactedGraph(), which similarly takes a continuous node ids mapping, doesn't perform any checks on the mapping, so I'm guessing we're okay with not checking it? I personally dislike this and would love a better solution, perhaps a wrapper class around std::unordered_map that maintains continuity. Would that be wanted, or a complete waste of time?

@fabratu
Copy link
Member

fabratu commented Aug 7, 2024

Yeah, I understand the part about not liking it. We occasionally implement an additional parameter, which enables safety checks (like addEdge in Python) and document the costs of these check. We could also go this route here.

In general your impression is again correct, most functions per default don't check for sanity (if it is too costly). As a result, the maintenance and correct preparation of data is relayed to the user.

Copy link
Member

@fabratu fabratu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to remove the static class member and use const for the node map. Added some code suggestion to make this work. The only significant change is in mapNode, since const does not support operator[] for unordered_map.

@jurepustos jurepustos force-pushed the feature/topological-sort branch from 7c2557b to b918efe Compare September 24, 2024 19:22
@fabratu
Copy link
Member

fabratu commented Sep 27, 2024

I approved the PR. Thanks again for your contribution!

@fabratu fabratu merged commit 618e8d3 into networkit:master Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants