Conversation
|
Just some very quick notes without looking through it fully:
You seem to be using some raw |
|
I have no idea; this has always been this way before I joined the development of igraph. I think we could change this easily. You could try changing the typedef of |
src/cycle_basis.c
Outdated
| int igraph_cycle_basis_unweighted_undirected(const igraph_t* graph, | ||
| igraph_vector_t* cycles) { | ||
|
|
||
| // FIXME: deal with unconnected graphs |
There was a problem hiding this comment.
Check whether MSVC in C89 mode is okay with comments starting with double slashes. I have always tried to avoid this because I wasn't sure whether MSVC likes it or not and it's too complicated for me to get hold of a Windows machine :)
There was a problem hiding this comment.
MSVC 2015 and later do support // comments as "a Microsoft extension", unless standards mode is triggered. https://docs.microsoft.com/en-us/cpp/c-language/c-comments?view=vs-2015 This can also be tested e.g. at godbolt.org
I think earlier versions support it too, but I do not have proof.
I always went with /* ... */ just in case. You never know what compilers people use on what weird platforms that we don't have access to. Maybe someone uses an IBM compiler on a Power architecture machine. I'd be in favour of sticking with the standard (even though there are already several // comments in the igraph code base—my IDE keeps flagging them actually).
There was a problem hiding this comment.
// is C99 I believe. Older MSVC compilers do no support it. I don't know what Python 2.x uses now to compile its extensions, but R is C99, so if you want to use them I don't mind.
There was a problem hiding this comment.
There are two separate issues:
-
Is it okay to use arbitrary C99 features? No, because even MSVC 2019 doesn't support them fully. We should not use e.g. variable-length arrays for this reason.
-
Is it okay to use
//comments? Compilers typically support//as an extension, even if they do not fully support C99. So it might not be a problem in practice. It is claimed that MSVC 2010 already supported//comments.
This breaks the R interface because is there a logical is an |
|
Some more comments: I think the following would be reasonable names / prototypes: int igraph_fundamental_cycle_basis(const igraph_t *graph, igraph_vector_ptr_t *basis);
int igraph_minimum_cycle_basis(const igraph_t *graph, igraph_vector_ptr_t *basis, const igraph_vector_t *weights);Alternatively, I do not think we need to include "undirected" in the name, as this is understood. In many applications, people would run this on directed (i.e. oriented) graphs and also want to know if each edge points along the cycle direction or opposite to it. This information could even be optionally returned by the function in an additional vector (however, it is easy enough to compute, so maybe we don't need to include it). I do not think we need to include "unweighted" in the name, as weights are not relevant for bases unless we look for the minimum weight one (which is a special application). However, it is useful to indicate that it returns a fundamental basis. |
|
I believe the implementation can be made more efficient in the following way. While doing a BFS, you maintain a vector int ancestor_i = i;
int ancestor_j = j;
while (ancestor_i != ancestor_j)
{
if (dist[ancestor_i] > dist[ancestor_j])
ancestor_i = father[ancestor_i];
else if (dist[ancestor_i] < dist[ancestor_j])
ancestor_j = father[ancestor_j];
else
{
ancestor_i = father[ancestor_i];
ancestor_j = father[ancestor_j];
}
}By maintaining the traversed list of ancestors, you have the list of vertices in the cycle. If |
|
Thank you Vincent, you mean the BFS happens instead of my DFS in the code? Or outside of the for loop? Or we build the tree as we backtrack already?
…On Tue, Sep 10, 2019, at 09:37, Vincent Traag wrote:
I believe the implementation can be made more efficient in the following way.
While doing a BFS, you maintain a vector `father` that denotes for each
node the node through which it was first visited (i.e. this corresponds
to the spanning tree). Let us initially set `father` to `-1` for all
nodes, so that if `father[i] < 0` it means that node `i` was not yet
visited. In addition, we keep a vector `dist` which is the distance
from the root vertex. Whenever you encounter a node `i` that was
already visited (`father[i] > 0`) when traversing the edge `j -> i`,
you backtrack the fathers of `i` and `j` (i.e. ancestors) until
`dist[i] == dist[j]` and then keep iterating until the ancestors are
identical. In other words:
int ancestor_i = i;
int ancestor_j = j;
while (ancestor_i != ancestor_j)
{
if (dist[ancestor_i] > dist[ancestor_j])
ancestor_i = father[ancestor_i];
else if (dist[ancestor_i] < dist[ancestor_j])
ancestor_j = father[ancestor_j];
else
{
ancestor_i = father[ancestor_i];
ancestor_j = father[ancestor_j];
}
}
By maintaining the traversed list of ancestors, you have the list of
vertices in the cycle. If `ancestor_i == j` you know that the cycle was
directed and otherwise it is an undirected cycle (i.e. a cycle ignoring
the edge directions). If you want a vector of edges instead of vertices
(which is probably more useful in a multigraph) you can use edge ids
instead of vertex ids to identify the father (i.e. the edge that leads
to the father). Finally, for multiple components you should include a
`root` vector that identifies from which root the node was reached. If
`root[i] != root[j]` no cycle can be identified.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1257?email_source=notifications&email_token=AAJFEADFCWHZS4PBWP6UNITQI5FCVA5CNFSM4IUVNRMKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6KEYDA#issuecomment-529812492>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAJFEAASZ7HY3NI6VSYQPR3QI5FCVANCNFSM4IUVNRMA>.
|
|
I believe that the initial BFS that you perform is sufficient for immediately identifying the cycles, while creating the spanning tree. In other words, you identify the cycles as you are building the spanning tree in one go. You now repeatedly perform a DFS on the resulting spanning tree, but you will then repeatedly backtrack the same paths I believe (i.e. the |
|
And probably you should indeed call it |
|
Sounds good, I'll get rid of those DFS loops :-)
I knew it was optimizable, but wanted to have a rough version out there to start the discussion
…On Tue, Sep 10, 2019, at 11:03, Vincent Traag wrote:
And probably you should indeed call it `parent` instead of `father` :)
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#1257?email_source=notifications&email_token=AAJFEAHT3FUZAXEH5FW23HLQI5PH5A5CNFSM4IUVNRMKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD6KMPLY#issuecomment-529844143>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAJFEAHIHMKG3S7WCOWHFGTQI5PH5ANCNFSM4IUVNRMA>.
|
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
|
@iosonofabio If you are still working on this, feel free to remove the |
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions. |
|
I restarted working on this one, don't worry about the massive merge commits for now, I can squash-rebase at the end. |
|
Great! I noticed you merged in |
|
Good point. I'm working on the weighted graph function which is significantly more messy than the unweighted one, so I'm unclear on the time frame - not so familiar with Horton cycles I admit. There's also no urgency I believe. So maybe develop is fine? But I'm also fine rebasing against master |
|
I'd say keep it on |
|
Ok, I'm sharpening the tools for this one. Some advice would be appreciated, but in terms of general approach and specific code. Two points in particular:
Thanks! |
I actually ran into the same publication recently. It sounded like an interesting option, although I have not yet explored the details. |
|
I can read the paper, that's not the problem. How much difference do you think it will make in practice versus the increased complexity? My understanding is the more recent approach does not use all vertices but just a feedback vertex set (which we have to implement separately - can default to all vertices of degree != 2 for now if I'm not wrong). Moreover it also only analyzes cycles in distinct subtrees from the root. However this algorithm requires travelling across the trees from root to leaves, which is more difficult than the other way (just storing a parent vector). That's what I'm wondering right now, i.e. what an efficient data type would be for that purpose. Wait, I should just look a little longer. Seems like the minimum spanning tree prim function uses a vector of edges to store the tree, we can do the same here. Then we can travel from root to children relatively easily. |
If you do need this is practice, I think it would make a great contribution to the |
src/cycle_basis.c
Outdated
| igraph_integer_t from, to; | ||
| igraph_edge(graph, (igraph_integer_t) edge, &from, &to); | ||
|
|
||
| /* we only know that either one is actnode, swap if needed */ | ||
| if (to==actnode) { | ||
| to = from; | ||
| from = to; | ||
| } |
There was a problem hiding this comment.
Noting in case you were not aware:
We have IGRAPH_TO and IGRAPH_FROM, which are both simpler to use and more efficient than igraph_edge (they avoid any function call).
There is also IGRAPH_OTHER. If an undirected edge is u-v then IGRAPH_OTHER(graph, edge, v) gives u. Did you need IGRAPH_OTHER(graph, edge, actnode) here?
There was a problem hiding this comment.
This is also a copycat from spanning_trees.c. Shall I make a PR to fix that?
There was a problem hiding this comment.
I think it should be fixed in spanning_trees.c as well. That file is a very old part of the codebase, probably well before the time we've added IGRAPH_TO and IGRAPH_FROM. It probably does not make any difference performance-wise, but it helps people to pick up best practices if we have less places in the codebase where we don't use the shortcuts we've implemented to make our life easier in the long run.
There was a problem hiding this comment.
It probably does not make any difference performance-wise
It depends. If this is the only function call in a loop, eliminating that call may very well make a noticeable difference. Function calls are expensive.
src/cycle_basis.c
Outdated
| IGRAPH_CHECK(igraph_incident(graph, &inc_edges, (igraph_integer_t) actnode, | ||
| IGRAPH_ALL)); | ||
|
|
||
| for (j=0; j<igraph_vector_size(&inc_edges); j++) { |
There was a problem hiding this comment.
Please avoid using vector_size in for loops like this. Instead, get the size first, store it in a variable, then use that variable.
Here it would not matter that much because there are already several other function calls in the loop, but in general, it's good to get into the habit of storing the size in a variable. In a tight loop, that function call would bring a noticeable performance hit.
There was a problem hiding this comment.
This is copycat of the exact same pattern in spanning_trees.c, e.g. line 321. I think the compiler will put the igraph_vector_size(&inc_edges) into a variable anyway, so I'm not sure a human should deal with these kinds of optimizations.
So if this is a performance problem because the compiler does not do its job, I can open a separate PR to fix it everywhere else too. @ntamas @vtraag ?
There was a problem hiding this comment.
I have been surprised several times in the past by the complexity of optimizations compilers can do automatically, but I'm still unsure :)
On one hand, yes, it should be the compiler's job to optimize this and leave it to us mere mortals to deal with the more interesting parts of programming. On the other hand, in order to do the optimization here in this particular case, the compiler (let it be gcc, clang, MSVC or something else) must be able to infer that 1) igraph_vector_size() is side-effect-free (okay, that's actually quite easy), and that 2) the thing that igraph_vector_size() calculates is an invariant of the loop so it can be lifted to an external variable (which might even end up being in a register only). If the compiler can infer this, that's great.
As for me, I routinely lift the size of a vector into a variable on its own and then use that variable in the loop -- compilers were not that good in the past and old habits die hard. Also, it gives me peace of mind to do so because then I can say that it probably works even on a crappy compiler. However, if someone leaves the function call in the loop, that's also okay if I know it's a simple function call that does not do anything complex.
There was a problem hiding this comment.
The compiler cannot infer that igraph_vector_size is side-effect free because it does not have the definition of that function. It is in a different translation unit.
Link-time optimization might be able to mitigate this to some extent, but it's not guaranteed, and LTO does not even work on all platforms. We should not rely on it.
There was a problem hiding this comment.
@iosonofabio In C++ one can just do for (int i=0; i < vec.size(); ++i) because vec.size() will be an inline function that lives in a header. The compiler can analyse it and figure out that it does not need to call it for every loop iteration. In C, this is not possible because there is no general support for inline functions and the definition will simply not be visible to the compiler.
It is good practice to make a habit of avoiding this pattern. It's very easy to do so and then you don't have to keep thinking about how big the performance impact is anyway.
src/cycle_basis.c
Outdated
| while (! igraph_dqueue_empty(&q)) { | ||
| /* current node, incoming edge, and distance from tree root */ | ||
| long int actnode=(long int) igraph_dqueue_pop(&q); | ||
| long int actedge=(long int) igraph_dqueue_pop(&q); |
There was a problem hiding this comment.
Let's not use tab characters for indentation. It appears that there are lots of tabs in this file, mixed with spaces.
There was a problem hiding this comment.
Speaking about tabs vs spaces: I think we have an .editorconfig at the root, but not all editors pick up the settings from .editorconfig by default, so we need another safety mechanism to prevent misformatted code from slipping into the codebase. Initially I wanted to propose using pre-commit for validating the codebase with pre-commit hooks, but then I realized that third-party contributors would probably not have pre-commit installed and then we would be back to square one. Do you guys know about any Github plugin that could be used to validate code style in the PRs, or do we have to create a CI test for that?
src/cycle_basis.c
Outdated
| /* Clean */ | ||
| igraph_vector_destroy(&fvs); | ||
| igraph_vector_ptr_destroy_all(trees); | ||
| IGRAPH_FINALLY_CLEAN(1 + n + 1); // FIXME: Check this number |
There was a problem hiding this comment.
This may be a useful read:
Unfortunately, the handling of vector_ptrs in an error conditions is not completely standardized yet.
I am not aware of any existing case where all vector_ptr elements are put on the finally stack, and maybe it would be better not to start doing this. See the above thread for how this is usually handled.
Something to pay attention to:
Should the function exit due to an error condition or interruption, it should be safe for the caller to destroy/free any non-NULL elements of the ptr_vector. Thus, anything you destroy must be set to NULL. Alternatively, if everything was destroyed, the vector_ptr can be cleared.
As far as I can tell, existing functions do not leave any elements undestroyed or unfreed in an error condition.
Tip: igraph_Free(ptr) will set ptr to NULL, but igraph_free will not (cannot). The first is a macro, to be used in normal situations. The second is a function, to be used with IGRAPH_FINALLY, where one must take its address.
There was a problem hiding this comment.
I am not aware of any existing case where all vector_ptr elements are put on the finally stack
The size of the finally stack is limited so probably not all elements would fit there anyway. So far I was following the principle that the vector_ptr takes "ownership" of any pointer that you put into it, so you only need to put a cleanup function on the finally stack that iterates over the elements of the vector_ptr and destroys and frees the non-null ones. This function can then be popped off the finally stack when the vector_ptr is returned to the user. The "item destructor" pattern outlined in the Discourse group is supposed to help with the implementation.
|
Thanks @szhorvat . At this stage I'm not fixing memory leaks and similar stuff, but thanks. I haven't seen any comments on the general strategy, I'm just pushing ahead then. I'll come back to the nitty gritty details once the core of the algorithm is implemented 👍 |
|
@iosonofabio Is the fundamental cycles part considered to be ready now? For the minimum cycle basis, I am familiar with Horton's algorithm, which is simple:
I am not familiar with the paper you linked. I'd need to look at it in detail to comment, which would take a few hours. I need to force myself off igraph for now, so I can make progress on other projects ... |
|
Fantastic @szhorvat , thanks for the recap. The paper is an incremental improvement over that. Don't worry too much about this one, it'll take me a few more days ;-) |
|
FYI, I just came across |
|
This one is also not urgent, esp. since @szhorvat has implemented a similar algorithm. Closing for now. |
DO NOT MERGE YET!
Trying to implement functions for finding a cycle basis:
Final TODO before merging:
developbranchigraph_vector_ptrsince current convention is shaky