Skip to content

Add raw values enumeration to fix segfaults#1312

Merged
bcoconni merged 1 commit intoJSBSim-Team:masterfrom
heshpdx:enum_to_int
Aug 7, 2025
Merged

Add raw values enumeration to fix segfaults#1312
bcoconni merged 1 commit intoJSBSim-Team:masterfrom
heshpdx:enum_to_int

Conversation

@heshpdx
Copy link
Contributor

@heshpdx heshpdx commented Jul 30, 2025

After the fine work done in #1304 and #1311, my colleague is still getting a segfault when building with clang -flto=full -fvirtual-function-elimination -fvisibility=hidden.

We believe the issue is from how the code is converting from a generic type to an Int type for Enum use. In this case, a casting occurs, changing the underlying object type, and thus swapping from an object type for which the vtable has been optimized away to a type that expects to have a vtable.

This PR shows one way to work around the issue. With this patch, we confirm that the JSBSim HEAD successfully runs on each of the CPUv8 inputs without segfaulting. Please let us know what you think. Thanks!

#834

Works around a segfault in LLVM builds.
@codecov
Copy link

codecov bot commented Jul 30, 2025

Codecov Report

❌ Patch coverage is 70.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 24.87%. Comparing base (5200260) to head (bf00d70).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
src/input_output/FGPropertyManager.h 65.38% 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1312      +/-   ##
==========================================
+ Coverage   24.81%   24.87%   +0.05%     
==========================================
  Files         169      169              
  Lines       19631    19655      +24     
==========================================
+ Hits         4872     4889      +17     
- Misses      14759    14766       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@seanmcleod70
Copy link
Contributor

Just had a quick initial look. So it looks like the issue here is we're dealing with methods that return an enum, whereas the last 2 PRs were dealing with methods that took an enum as an argument?

@seanmcleod70
Copy link
Contributor

Actually not a member method returning an enum, rather it's a member field of type enum?

@seanmcleod70
Copy link
Contributor

When it segfaults, is there any debug info mentioning exactly which missing virtual function it was trying to call?

@bramoore
Copy link

The segfault comes during untie(), where the value gets resolved.

inline int
SGPropertyNode::get_int () const
{
  if (_tied)
      return (static_Cast<SGRawValue<int>*>(_value.val)->getValue();
  else
    return _local_val.int_val;
}

From what I can tell, LLVM is optimizing away the vtable from the actual SGRawValueMethods type. Casting to a SGRawValue<int> parent type is not the same, and expects a vtable. Making an SGRawValueMethodsEnum<> that inherits from SGRawValue<int> gets the compiler into the place that it understands that these types are connected... Deriving from SGRawValue<Enum> is not the same as SGRawValue<int>.

As for the other calls, I believe it is also coming back to the same root cause, but the cast of the pointer from Enum* to int* is sufficient to avoid the issue.

@bcoconni
Copy link
Member

bcoconni commented Aug 1, 2025

I have to say that I am not comfortable with this PR.

Unless proven mistaken, the code in SGPropertyNode::get_int is legal C++ code. So we are in a completely different ballpark than PRs #1304 and #1311. And I am struggling to see why we should change JSBSim code to accomodate extremely aggressive optimizations from LLVM which make legal C++ code fail ?

@bcoconni
Copy link
Member

bcoconni commented Aug 1, 2025

I mean before merging the PRs #1304 and #1311, we were using reinterpret_cast<> under the hood and this casting is notoriously known to generate undefined behavior, in particular preventing the compiler to check that the conversion is legal.

In SGPropertyNode::get_int, the code is using static_cast<> and compilers would complain if the conversion was illegal. So unless someone proves us that the code in SGPropertyNode::get_int is invalid or undefined behaviour, I am challenging the validity of the LLVM optimization.

@bramoore
Copy link

bramoore commented Aug 1, 2025

Static casting of a parent pointer to a derived pointer is fine, as long as you are casting to a type that matches the underlying data. _value.val is a SGRaw*, so it is legal to cast to SGRawValue<int>*, as long as the pointer is pointing to an object of that type.

SGRawValue<Enum> is not the same type as SGRawValue<int>, and so this is not valid C++.

Consider the following:

struct Base {
    Base() = default;
    virtual ~Base() = default;
};

struct Derived1 : public Base  {
    virtual ~Derived1() = default;
};

struct Derived2 : public Base  {
    virtual ~Derived2() = default;
};


int main(void) {
    Base * v = new Derived1();
    Derived2 *v2 = static_cast<Derived2*>(v);

    Derived1 d1;
    Derived2 * pd1 = static_cast<Derived2*>(&d1);

    return 0;
}
$ g++ ./test.cc
./test.cc: In function ‘int main()’:
./test.cc:20:22: error: invalid ‘static_cast’ from type ‘Derived1*’ to type ‘Derived2*’
   20 |     Derived2 * pd1 = static_cast<Derived2*>(&d1);
      |                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~

The first cast is not shown as an error, because we have laundered the type through the base pointer. When directly casting, without laundering the pointer, the compiler knows that it is an invalid cast.

@bcoconni
Copy link
Member

bcoconni commented Aug 1, 2025

Thanks @bramoore. I stand corrected.

Copy link
Member

@bcoconni bcoconni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I have misunderstood the purpose of the patch, the class SGRawValueMethodsIndexedEnum also needs to inherit from SGRawValue<int> ?

Also I am unsure why SGRawValueMethodsEnum::getValue returns an int while SGRawValueMethodsIndexedEnum::getValue is returning the type T?

virtual T getValue() const {
if (_getter) { return (_obj.*_getter)(_index); }
else { return SGRawValue<T>::DefaultValue(); }
}

@bcoconni
Copy link
Member

bcoconni commented Aug 1, 2025

And I'm wondering if using dynamic_cast<> in SGPropertyNode::get_int and its sibling would not be some good practice.

inline int
SGPropertyNode::get_int () const
{
if (_tied)
return (static_cast<SGRawValue<int>*>(_value.val))->getValue();
else
return _local_val.int_val;
}

I guess that dynamic_cast<SGRawValue<int>*>(_value.val) would return void for the case at stake ? So something along the lines below would help intercepting any future similar occurrence ?

   if (_tied) {
     auto raw_value = dynamic_cast<SGRawValue<int>*>(_value.val);
     assert(raw_value); // Intercepts errors when compiled with NDEBUG.
     return raw_value->getValue();
   }
   else 
     return _local_val.int_val; 

This change would be implemented on our copy of SimGear only but this would help making sure that future changes are not introducing similar issues.

@seanmcleod70
Copy link
Contributor

Yep, just confirmed that dynamic_cast<SGRawValue<int>*>(_value.val) does return nullptr in this case.

Any reason not to also ask the maintainers of SimGear to switch to using dynamic_cast for these cases?

@bcoconni
Copy link
Member

bcoconni commented Aug 1, 2025

Yep, just confirmed that dynamic_cast<SGRawValue<int>*>(_value.val) does return nullptr in this case.

Nice ! At least we now have a way to intercept these errors without recurring to some specific optimization from one compiler. However as far as I am aware dynamic_cast<> is using more CPU cycles than static_cast<> and I have some vague recollection that JSBSim is already spending much time in the SGPropertyNode code.

Any reason not to also ask the maintainers of SimGear to switch to using dynamic_cast for these cases?

Why not ? However I am not sure they have creative usages of SGRawValue<T> just as we have.

@seanmcleod70
Copy link
Contributor

seanmcleod70 commented Aug 1, 2025

Yep, static_cast<> is literally 3 mov operations:

       auto raw_value = static_cast<SGRawValue<int>*>(_value.val);
00000001402AB0CA  mov         rax,qword ptr [this]  
00000001402AB0CF  mov         rax,qword ptr [rax+98h]  
00000001402AB0D6  mov         qword ptr [rsp+20h],rax 
        return raw_value->getValue();

Whereas dynamic_cast calls into the C++ runtime's dynamic cast function.

        auto raw_value = dynamic_cast<SGRawValue<int>*>(_value.val);
00000001402AB0CA  mov         dword ptr [rsp+20h],0  
00000001402AB0D2  lea         r9,[SGRawValue<int> `RTTI Type Descriptor' (01404AE258h)]  
00000001402AB0D9  lea         r8,[SGRaw `RTTI Type Descriptor' (01404AD8F8h)]  
00000001402AB0E0  xor         edx,edx  
00000001402AB0E2  mov         rax,qword ptr [this]  
00000001402AB0E7  mov         rcx,qword ptr [rax+98h]  
00000001402AB0EE  call        __RTDynamicCast (0140365CB2h)  
00000001402AB0F3  mov         qword ptr [rsp+30h],rax  
        return raw_value->getValue();

Which is at least another 2 function calls deep.

image

So maybe we should only ifdef the dynamic_cast for debug builds, and make sure we include some debug builds that run the existing test cases etc.

@bramoore
Copy link

bramoore commented Aug 6, 2025

Checking in to see what the next steps are. Is there anything more you need from us?

@seanmcleod70
Copy link
Contributor

I think we should go-ahead with this PR, but leave the dynamic_cast suggestion until later, to decide whether to use it at all, or only in debug builds, do some performance testing on it etc. @bcoconni your thoughts?

@bcoconni bcoconni merged commit b005a91 into JSBSim-Team:master Aug 7, 2025
29 checks passed
@bcoconni
Copy link
Member

bcoconni commented Aug 7, 2025

I think we should go-ahead with this PR, but leave the dynamic_cast suggestion until later, to decide whether to use it at all, or only in debug builds, do some performance testing on it etc.

Yep ! I was about to say the same thing. So I 've pushed the "merge" button.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants