Optimize Object::cast_to for final classes by using typeid instead of dynamic_cast. #103693

Closed
Ivorforce wants to merge 1 commit into godotengine:master from Ivorforce:optimize-cast-to

Conversation

@Ivorforce
Member

@Ivorforce Ivorforce commented Mar 6, 2025

Object::cast_to is used to check whether an object is of a certain type. It does this through RTTI (runtime type information), specifically dynamic_cast. If the object's type matches, it returns the pointer converted to the target type; otherwise it returns nullptr.

Explanation

For final classes, this operation is as easy as checking the type ID itself, because the type cannot be inherited from. The inheritance tree does not need to be checked, which is the slowest part of dynamic_cast.
For some reason, compilers (at least clang) are not optimizing this case. I'm not sure why. Either I'm missing something fundamental, or they just haven't managed to do it yet. In the future, this explicit check may become unnecessary.

Benchmarks

I benchmarked a speedup of at least 740x on misses (742ms vs. less than 1ms in the results below). Hits show a smaller gain, roughly 10x (910ms vs. 86ms), because the loop then makes a non-inlined call on every iteration.

Code
#include <chrono>
#include <iostream>
#include <type_traits>
#include <typeinfo>

struct Object {
	virtual ~Object() = default;
	virtual void a() {}
};

struct B : public Object {
	void a() override {}
};

struct C : public B {
	void a() override {}
	__attribute__ ((noinline)) void test() {}
};

struct CF final : public B {
	void a() override {}
	__attribute__ ((noinline)) void test() {}
};

template <typename T>
static T *cast_to(Object *p_object) {
	if constexpr (std::is_final_v<T>) {
		return (p_object && typeid(*p_object) == typeid(T)) ? static_cast<T *>(p_object) : nullptr;
	} else {
		return dynamic_cast<T *>(p_object);
	}
}

template <typename T>
__attribute__ ((noinline)) void test(Object *a) {
	for (int i = 0; i < 100000000; ++i) {
		__attribute__ ((noinline)) T *c = cast_to<T>(a);
		if (c) {
			c->test();
		}
	}
}

template <typename T>
static T *cast_to_old(Object *p_object) {
	return dynamic_cast<T *>(p_object);
}

template <typename T>
__attribute__ ((noinline)) void test_old(Object *a) {
	for (int i = 0; i < 100000000; ++i) {
		T *c = cast_to_old<T>(a);
		if (c) {
			c->test();
		}
	}
}

int main()
{
	{
		auto t0 = std::chrono::high_resolution_clock::now();
		test<C>(new C());
		auto t1 = std::chrono::high_resolution_clock::now();
		std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
	}
	{
		auto t0 = std::chrono::high_resolution_clock::now();
		test<C>(new B());
		auto t1 = std::chrono::high_resolution_clock::now();
		std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
	}
	{
		auto t0 = std::chrono::high_resolution_clock::now();
		test<CF>(new CF());
		auto t1 = std::chrono::high_resolution_clock::now();
		std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
	}
	{
		auto t0 = std::chrono::high_resolution_clock::now();
		test<CF>(new B());
		auto t1 = std::chrono::high_resolution_clock::now();
		std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
	}

	{
		auto t0 = std::chrono::high_resolution_clock::now();
		test_old<C>(new C());
		auto t1 = std::chrono::high_resolution_clock::now();
		std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
	}
	{
		auto t0 = std::chrono::high_resolution_clock::now();
		test_old<C>(new B());
		auto t1 = std::chrono::high_resolution_clock::now();
		std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
	}
	{
		auto t0 = std::chrono::high_resolution_clock::now();
		test_old<CF>(new CF());
		auto t1 = std::chrono::high_resolution_clock::now();
		std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
	}
	{
		auto t0 = std::chrono::high_resolution_clock::now();
		test_old<CF>(new B());
		auto t1 = std::chrono::high_resolution_clock::now();
		std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
	}
}

This printed:

(new implementation)
942ms (hit: cast C to C)
754ms (miss: cast B to C)
86ms (hit: cast C final to C final)
0ms (miss: cast B to C final)

(old implementation)
917ms (hit: cast C to C)
748ms (miss: cast B to C)
910ms (hit: cast C final to C final)
742ms (miss: cast B to C final)

Caveats

It's not clear whether we need this PR yet:

Currently, (almost?) no Object-derived classes are even final.
But for a lot of existing Object classes, deriving from them doesn't make sense. Some could probably be made final without much risk.

Also, it is unclear to me whether dynamic_cast is used in any performance critical functions. There are around 2500 separate calls, but one would hope none are in per-tick functions.

Still, I wanted to open this PR, just to show that this optimization is possible.

@Ivorforce Ivorforce requested a review from a team as a code owner March 6, 2025 13:21
@lawnjelly
Member

lawnjelly commented Mar 6, 2025

> Also, it is unclear to me whether dynamic_cast is used in any performance critical functions.

Object::cast_to is used all over the shop, including traversing scene trees, so it should be fast.

In 3.x we have an alternate implementation on some platforms (Android and web, it looks like):

	template <class T>
	static const T *cast_to(const Object *p_object) {
#ifndef NO_SAFE_CAST
		return dynamic_cast<const T *>(p_object);
#else
		if (!p_object)
			return NULL;
		if (p_object->is_class_ptr(T::get_class_ptr_static()))
			return static_cast<const T *>(p_object);
		else
			return NULL;
#endif
	}

@Ivorforce
Member Author

Closing because this is probably not needed after #103708 has been merged.

@Ivorforce Ivorforce closed this Mar 28, 2025
@Ivorforce Ivorforce deleted the optimize-cast-to branch March 28, 2025 17:57
@AThousandShips AThousandShips removed this from the 4.x milestone Mar 28, 2025