Skip to content

[fix] Ubuntu 17.10 bad memory address segfault#90

Merged
Frenzie merged 2 commits intokoreader:masterfrom
Frenzie:fix-ubuntu-1710-segfault
Oct 20, 2017
Merged

[fix] Ubuntu 17.10 bad memory address segfault#90
Frenzie merged 2 commits intokoreader:masterfrom
Frenzie:fix-ubuntu-1710-segfault

Conversation

@Frenzie
Copy link
Copy Markdown
Member

@Frenzie Frenzie commented Oct 19, 2017

@poire-z I don't quite know what to make of this.

@Frenzie Frenzie requested a review from poire-z October 19, 2017 21:21
Copy link
Copy Markdown
Contributor

@poire-z poire-z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm no C expert, so I don't know. But I guess I would have included what you added in a patch without thinking much about it if it avoided a sefgault.

/// returns true for invalid/deleted node ot NULL this pointer
inline bool isNull() const { return this == NULL || _handle._dataIndex==0; }

So, may be you're in a case where it's not a null pointer, and neither a ldomNode ...
Does that happens on any document or just some specific ones?
Different gcc version? with different default options and optimisations?
@frankyifei knows probably better than me what C stuff could be happening here.

@Frenzie
Copy link
Copy Markdown
Member Author

Frenzie commented Oct 19, 2017

I'm no C expert, so I don't know. But I guess I would have included what you added in a patch without thinking much about it if it avoided a sefgault.

Of course. (But I'm certainly no C expert either.) I'm just confused about this never having happened before. And worried that similar issues might pop up elsewhere.

Legibility better like this or "properly" with ||?

Does that happens on any document or just some specific ones?

All that I tried. (Quickstart and three or four other docs.)

Different gcc version? with different default options and optimisations?

7.2.0 in Ubuntu 17.10

6.3.0 in Stretch (I think I previously accidentally typed Jessie somewhere, about cmake) and Ubuntu 17.04

Could be an idea to look at it with -fsanitize=undefined I suppose.

@Frenzie
Copy link
Copy Markdown
Member Author

Frenzie commented Oct 19, 2017

So, may be you're in a case where it's not a null pointer, and neither a ldomNode ...

Yes, it's an undefined address or something (0x0).

@poire-z
Copy link
Copy Markdown
Contributor

poire-z commented Oct 19, 2017

Legibility better like this or "properly" with ||?

Like it is, with a comment indicating why what happened when :)

https://stackoverflow.com/questions/32736392/gcc-o-segmentation-fault
You could try without any -O optimisation.
https://gcc.gnu.org/ml/gcc-help/2015-03/msg00000.html
Or add -fno-delete-null-pointer-checks

You're using the updating cmake needed for sdcv? Did it happen to with the old cmake? (could then be cmake using gcc/g++ flags of its own...)

@Frenzie
Copy link
Copy Markdown
Member Author

Frenzie commented Oct 20, 2017

sdcv didn't use cmake before, but in any case its failure to build on Ubuntu 17.10 is unrelated: https://gist.github.com/HenningJW/c70c1b7c81a05ae16ffe000eae67b056

Perhaps Mac OS had a similar problem: Dushistov/sdcv@c7c8dab

Or add -fno-delete-null-pointer-checks

Heck no. :P Unless you mean locally out of curiosity to see if it still occurs. I could try that.

@Frenzie Frenzie merged commit 64893eb into koreader:master Oct 20, 2017
@Frenzie Frenzie deleted the fix-ubuntu-1710-segfault branch October 20, 2017 08:07
@poire-z
Copy link
Copy Markdown
Contributor

poire-z commented Oct 20, 2017

Yes, I meant locally, just to understand if it's the compiler that is causing this, so we may have a global solution when similar issues do pop up elsewhere :|

And I meant: is it now a different version of cmake that is used to build crengine? If yes: could it be that version of cmake that is giving g++ different options that may cause this problem?

@Frenzie
Copy link
Copy Markdown
Member Author

Frenzie commented Oct 20, 2017

Well yes, everything's newer and hopefully better in the *buntu 17.10 release. libc, cmake, gcc… :-P

Ubuntu is still mostly Debian, Buster in this case, isn't it?

@Frenzie
Copy link
Copy Markdown
Member Author

Frenzie commented Oct 20, 2017

-fno-delete-null-pointer-checks works around the crash.

-O0 makes no difference.

-fsanitize=undefined does have an interesting effect. Perhaps it's pointing out the root cause of the problem? Regardless, I think I should probably add this to the debug flags.

screenshot_2017-10-20_23-56-18

@poire-z
Copy link
Copy Markdown
Contributor

poire-z commented Oct 20, 2017

Interesting indeed! I had never seen clearer "runtime error" message from C stuff :)

@Frenzie
Copy link
Copy Markdown
Member Author

Frenzie commented Oct 21, 2017

Some more info here: https://developers.redhat.com/blog/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

Btw, there are more sanitize options, some of which are mutually exclusive.

Clang has 'em too. (Based on the same lib?) http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#ubsan-checks

@bahusoid
Copy link
Copy Markdown

Just my 2 cents on it. @poire-z You shouldn't compare this with NULL/nullptr. It's invalid C++ with undefined behavior. It means it might work with one compiler version but won't with another compiler/version (can be optimized out as noop).
So all null reference checks should be done on instance and not inside instance method.

You can read more about this in the following discussion:
https://www.reddit.com/r/cpp_questions/comments/3roy3e/this_nullptr_can_this_ever_be_true/

@poire-z
Copy link
Copy Markdown
Contributor

poire-z commented Oct 22, 2017

Thanks for ur 2 cents !
(I didn't compare it, but our current crengine code does :)
At least, it seems there is only ldomNode:isNull() that does it, but there are so many other isNull() methods on all kind of classes that we won't manage to track all objects where it is used on that would need this fix.
So, @Frenzie 's fix was the right one (is if (!node) equivalent to if (node == NULL) ?)
Would -fno-delete-null-pointer-checks be the right compiler switch to use so we can still live with these coding errors ?

@bahusoid
Copy link
Copy Markdown

bahusoid commented Oct 22, 2017

if (!node) Yeah. It's a C style which still works. But it's not very welcome now in cpp. It's better to distinguish null checks from bool checks and use: if(node == NULL)
Though it doesn't really matter at least for now.

I'm no expert in gcc compiler. So maybe it will work. But I don't think it's a good idea. For debugging purposes it's better to have null reference exception the sooner the better. And allowing instance calls on null references can really complicate debugging. So maybe as a temp solution - but you really should refactor those null checks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants