Reflection is something a lot of people wish the Rust language had: It is not hard to stumble across somebody with an interesting use case for it.
People want to use it for serialization, GCs, better interop, and so, so much more.
If you can think of a task, there is somebody out there wishing they could implement it using reflection.
Sadly, it does not look like it is coming any time soon.
Still, silly things like “this feature does not exist yet” can’t stop us from talking about how it could or even must work.
Reflection interacts with the safety features of Rust in a somewhat counter-intuitive way. Those interactions force any reflection API to obey certain rules.
Those restrictions may seem obvious to people designing a reflection API, or to Rust experts, but they seem to be rarely mentioned.
This article will try to explain those rules in simple, digestible terms. I will not only show how things have to work, but I will also explain why those requirements exist.
Additionally, I will explore the knock-on effects of such restrictions - mainly revolving around accessing fields. You may be surprised just how much those affect the shape of potential reflection APIs.
In a lot of languages, reflection is able to access all the fields of an object, no matter if they are private or not. Reflection just seems to be a bit special, and able to bend the rules here and there.
There are, of course, some restrictions to this, but, in general, the rules around reflection tend to be less strict.
Coming from a high-level language with reflection, it seems reasonable to assume that this is a natural part of reflection. You can set the value of a private field in C#:
class Test {
private float _secret;
}
void SetSecret(Test test, float val) {
typeof (Test).GetField("_secret", BindingFlags.NonPublic | BindingFlags.Instance).SetValue(test, val);
}
In Java:
class Test {
private float _secret;
}
class Spy {
void SetSecret(Test test, float val) {
Field field = getField(test.getClass(), “_secret”);
// Oh, this field is inaccessible?
field.setAccessible(true);
// Now it is accessible.
field.set(test, val);
}
}
This is even easier in other languages, like python or lua; There, the notion of privacy is more of a courtesy than a rule.
class TestClass:
__secret = "Spooky"
# No privacy in snake-land...
setattr(a, '_TestClass__secret',val)
# This works too :)
test._TestClass__secret = val
test.secret = val;
-- In Lua, everything is just a table.
test["secret"] = val;
Doing things this way is often seen as an anti-pattern, since it breaks encapsulation. Nevertheless, it is useful in certain scenarios; for example, when serializing and deserializing data. After all, requiring all serializable / deserializable fields to be public would probably bring more trouble than it is worth.
You may ask: So what? Yes, in a lot of modern languages, reflection is able to access private fields, but what has it got to do with Rust?
Well, things can’t work this way in Rust, no matter what reflection API ends up being implemented. Let me show you why.
Since reflection is not something trivial to understand, you might expect the example showing problems with it to be complex too.
Thankfully, it can be demonstrated with some widely used Rust types, like Box<i32> - which means you don't need too much knowledge to understand how things work.
Let us suppose we have some way of accessing all fields using reflection, and we use that hypothetical API to write a function like this:
/// Gets a mutable reference to the first field of a struct using reflection
fn get_first_field_mut<T,F>(t:&mut T)->&mut F{
// Implemented using reflection
}
At a first glance, it does not seem all that scary. It doesn’t do much, besides just getting a mutable reference to a (potentially) private field.
In other languages, accessing private fields is considered a necessary evil. Sure, if you get too... creative with it, your coworkers might want to make you past tense. Still, as long as you have a very good reason to use reflection like this, and use it sparingly, everything is going to be fine.
In Rust, this is something that simply can’t be done - allowing reflection to access private fields is fundamentally unsafe.
How? Let me show you.
Using this seemingly innocent function, we write something like this:
// A `Box<i32>` points to a heap-allocated integer
let boxed = Box::new(1_i32);
// Walk through internal types of `Box`, to find its internal pointer
let unique = get_first_field_mut(&mut boxed);
let non_null = get_first_field_mut(unique);
let raw_ptr:&mut *const i32 = get_first_field_mut(non_null);
// After getting the internal pointer of `Box<i32>`, we replace it with an invalid one.
*raw_ptr = 0xDEAD_BEEF_usize as *const i32;
// The box now points to an invalid memory location, so trying to read its value is UB
// and may cause our program to crash
let val:i32 = *boxed;
This may be a bit hard to follow, so stick with me.
As you may know, the Box type allows us to store data on the heap. Inside a Box, we can find a pointer to this data.
A Rust Box is made up from several layers of types(*T, NonNull<T>, Unique<T>, Box<T>), each telling us more about the pointer.
Here, we use reflection to go through all of those types, and replace that pointer (eg. 0x7dff_3340) with a garbage value (0xDEAD_BEEF).
Now, the box "thinks" its value is at some unrelated address. When we try to get that value back, we will cause Undefined Behavior(UB).
Depending on what exactly is present at that memory location, our program might crash, or read some unrelated piece of data.
This effectively breaks memory safety, so it must be, by definition, impossible in safe Rust.
At this point, I think you can also see exactly why a safe reflection API can't allow this to happen. So, reflection in Rust must respect the field access rules.
In other languages, breaking those rules is going to make your program hard to read. In Rust, those rules are strictly necessary for enforcing safety. Breaking them kind of breaks the whole language, which is less than ideal.
So, reflection in Rust must either respect access rules (aka “field privacy”) or be unsafe.
Immediately, you may have a few questions.
All of them are important, so let's start with the first few.
At first, it may seem like we need mutable access to cause any trouble. After all, if we can’t change the underlying data, we can’t really cause any harm, can we? So, you might think that kind of restriction on reflection should be enough.
Well, we can still do quite a bit of harm with an “immutable” reference. While most Rust types can’t be changed without obtaining a mutable reference, some of them can. So, allowing any sort of unintended access to private fields is a recipe for disaster.
Consider the reference count of an Arc. The counter is an AtomicUSize - a type which can be changed, without the need for a mutable reference. If we could, for example, set its counter to 1, when it ought to be higher, we could cause an use-after-free bug.
let a = Arc::new(7);
let b = a.clone();
let count = get_arc_count_via_reflection(&a);
count.store(1, Ordering::Relaxed);
// Frees the underlying arc - b still points to the arc, and is now dangling!
drop(a);
// Reads freed memory - UB, and may lead to crashes
let val = *b;
This example also shows the second issue: it is not just pointers we need to worry about.
Changing pretty much any private data can cause issues. Consider the length of a Vec - it is nothing more than a plain integer(usize).
let mut two_strs = vec!["Hello","world!"];
let len:&mut usize = get_vec_len_via_reflection(&mut two_strs);
// Corrupt the vec len:
*len = 1024;
// Try printing the elements of the vec - this is UB and will most likely cause a crash
for s in two_strs{
println!("s:{s}");
}
By just changing a simple integer, we have caused a lot of damage.
This hopefully shows that reflection with access to private fields could lead to issues.
You might be tempted to ask: are those issues realistic? Yeah, allowing reflection to access private fields can lead to unsafety, but will it do so in practice?
The simplest, but a bit boring answer to this question is that this does not matter.
Even if something is “unlikely” to cause UB, it still must be unsafe in Rust.
So, if accessing private fields via reflection could cause UB, then it is by definition, unsafe.
A safe reflection API simply can't allow for that to happen.
This answer is, admittedly, a bit technical. I don't like just showing that something is unsafe in theory, and instead prefer giving more concrete examples.
Many C libraries give out integers, representing opaque handles to objects. So, a handle OpenGL texture might just be a simple wrapper around an integer on the Rust side.
/// An OpenGL texture
struct Texture(u32);
If somebody tries to serialize such a type(using reflection), bad things will happen: that handle is only valid within the current thread. So, somebody could think they are saving the underlying image data, but be in for a nasty surprise.
// Save the result of rendering a frame to the disk
reflect_serialize(out_file, render_result);
The snippet above does not look wrong on the first glance, but it will cause crashes.
This is something we definitely want to avoid.
Rust is a language focused on security, and requiring people to know what can and can’t be safely serialized just does not mesh well with that.
By this point, you probably get the general gist of things. Reflection in Rust simply can't safely access private fields. I think a lot of people also think that allowing reflection to access private fields is a bad idea in general. So, not much is lost here.
Still, saying that something is “not possible and also a bad idea” feels a bit anticlimactic. Surely, there must be some way to eat our cake and have it too. Enabling reflection to access private fields is arguably useful: a lot of languages use reflection to implement serialization.
Ideally, we would not want to force all serializable types to only have public fields. That is probably going to cause a lot more trouble than it is worth.
You might think: Ok, so we can’t allow reflection to access all private fields, since some of them may be unsafe to access. We still may want it to access some fields that we would like to stay private otherwise.
Maybe we could implement a visibility marker that allows for reflective access to such fields? Something like:
struct MostlyPrivate{
/// Normally private, but accessible via reflection
pub(reflect) id:u32,
}
This, at least to me, looks reasonable. We explicitly opt in to reflection, clearly marking which fields it can and can't change. We still get some of the benefits of making things private. Such fields are not accessible from "normal"(human-written) code, so it is unlikely somebody will access them by accident.
Additionally, we could extend this syntax in the future, to allow more control over which crate can access those fields via reflection.
struct MostlyPrivate{
/// Normally private, but accessible via reflection
pub(reflect(serde, some_other_crate)) id:u32,
}
It is hard to gauge if the added verbosity of such syntax would be worth it, but it is still something to consider.
Overall, this solution is not perfect. Adding those reflection markers gives us more control over which fields we expose or not. This forces us to acknowledge the potential issues, and work around them.
This added control is also the biggest drawback of this idea. It would require some change to the thinking of crate authors, who would have to explicitly add reflection support for their types.
Alternatively, we could consider poaching a few ideas from the unsafe fields feature.
struct FloatAndEven{
/// Safety: must be even!
// Invisible to reflection, since it is unsafe
unsafe even:i32,
// Accessible via reflection, since it is not unsafe
float:f32,
}
If we assume any field which is not unsafe can be safely reflected on, then maybe things will work out? This feels a bit flimsy(people can forget to mark a field unsafe), and will not work with code written before that feature gets stabilized.
I am also not convinced mixing reflection access and field safety is a good idea.
I will touch on that later, but changing anything accessible to reflection can be a breaking change.
struct MyType{
float:f32,
// This new field is safe, but private
// Right now, adding it is *not a breaking change*
// With reflection, external code may stop working after it is added.
debug_descr:&'static str,
}
How big of a deal is this? Hard to say.
Still, I feel like this solution is something that ought to be mentioned for the sake of completeness.
At this point, you probably see why reflection in Rust must respect field access rules. I think most people would agree that allowing reflection to break them is a bad idea in general. So, I don’t think anybody is going to be too mad at this “restriction”.
Most proposals I have seen so far do things this way anyway, so this is not a big discovery. Still, I like to explain why something is a certain way before I explain the consequences of that behaviour. Now that this is out of the way, we can start talking about something far more interesting: the knock-on effects.
Think for a second about what should happen when something goes wrong during reflection. You can’t safely deserialize a type like this:
struct NotOk{
// Private field reflection can’t access
private:u32,
}
Reflection can’t access some of its fields, so it can’t be used for deserialization. This leads me to ask a simple question: what should happen then? The obvious answer is that this should result in a compiler error, but… it is not as simple as that.
To give an example, let us say that we create a serialization function like this:
pub fn serialize<T: /* What should go there???*/>(t:&T, out: &mut impl Write)->SerializationResult{
/* Implemented using reflection*/
}
How should we express the bounds of reflection-based serialization? To serialize a type using reflection, we need it to fulfill a set of non-trivial criteria.
All of its fields need to be accessible via reflection Each and every single field type must also be serializable, that is either: A primitive type, e.g. a:i32 A type with an explicit serialization implementation, eg. b:Box<i32> A type which also fulfils our criteria c:SomeOtherSerializableType Thus, a simple question arises: how are we supposed to express those complex bounds?
One idea might be to… just not add any such bounds at all? If our reflection-based serialization framework errors out with a descriptive message, then isn’t this bound kind of redundant?
error[E1234]: reflection error
--> my_type.rs:32:64
|
32 | pub(crate) not_serializable:*mut i32,
| -^^^^- This field cannot be accessed via reflection
| |
| type MyType can’t be serialized because of this field
|
= note: Error during compile-time reflection
note: this error was caused by a reflection-based function called here
--> test_serialization.rs:128:256
|
128| serialize(&my_type, &mut out_file).unwrap();
| ^^^^
Sadly, no. Consider a much more complex example. Who is to say that the problematic function is called directly? The error could, for example, originate deep within a networking framework, which ensures a piece of data stays in sync across the network.
Should the compiler emit a backtrace in such a case?
reflection_serializer::serialize<MyType>
network_framework::SharedData<MyType>::send
network_framework::SharedData<MyType>::sync
network_framework::SharedData<MyType>::new
my_crate::my_code
This feels a bit... subpar. The idea of compiler errors containing backtraces like this seems very cumbersome.
Most of this info is not all that relevant to the underlying problem.
As reflection becomes more advanced, so do the ways in which it can fail. We definitely don't want anybody using a crate to sift through our code to figure out what went wrong.
If we had some support for expressing reflection bounds, the error would be much cleaner. We simply did not fulfill the bounds of the top-level function.
error[E1234]: reflection bound not satisfied
--> my_type.rs:32:64
|
32 | let score = SharedData::new(MyType::default());
| -^^^^- This call requires `MyType` to fulfill the reflection bound
| |
| type MyType does not fulfill the `reflection_serializer::is_serializable` bound
|
Additionally, without clear bounds, it is possible for somebody to write code without knowing about all the requirements of the functions they call.
fn log<T>(t:&T){
reflection_serialize(LOG_FILE.lock().unwrap(),t);
}
If you just look at the signature of this function, you would most likely assume it can log any type, and not just the ones which are serializable.
Learning about this requirement from some later compiler error would be infuriating.
Oh, so this stupid function tells me it works with any type, then works just fine 20 times in a row, but now, it's decided... it does not feel like working anymore?
I think you would agree that all of those issues would make your developer experience worse.
So, since reflection can “fail” (eg. not have access to all the data it needs), we need to handle that in a sane way.
Being explicit about our bound seems like the most reasonable and user-friendly solution. If we could do something like this:
fn serialize<T:SerializableViaReflection>(t:&T, out:&mut impl Write) ->SerializationResult{ /**/}
Then, all the crates using such a reflection-based serialization framework could also transparently express this requirement.
// In network_framework
fn send<T:reflection_serializer::SerializableViaReflection>(t:T){ /**/}
That feels much better... but how do we go about writing a bound like this?
Well, that is not going to be easy.
We need those "reflection bounds" to be very flexible, since people want to use reflection for a variety of purposes. We don't want to limit its use cases. Checking if a type is serializable, iterating through "children" of a type for GC, ensuring correct interop layout - all of those have vastly different requirements.
Calculating the type layout does not need any kind of access to the fields of a type - it just needs to check where those fields are, and what are their types.
#[repr(C)]
struct Interop{
// All of those fields are private - but we don't care.
// All that matters is their offset, and type.
natural:c_int,
f_point:float,
}
Iterating through children of a type for the purpose of implementing a GC does not care about private fields - as long as those fields are not GC handles.
struct MyGCType{
// This field needs to be accessible for the GC to work
pub object:Option<GC<MyGCType>>,
// but we don't care about this one:
number:f32,
}
Serializing a type, on the other hand, requires all of its fields to be accessible.
struct SomeType{
// OK - accessible to reflection
pub i:i32,
// Not OK - not accessible to reflection
pub u:u32,
}
All of those scenarios are widely different. With even more complex use cases(eg. blitting data), those requirements can get even more specific. Still, we need some way to support all of them.
Our bounds also need to be very, very precise. We don't want to have any types which fulfill the bounds, but then cause reflection to error-out. That kind of defeats the whole point of having them in the first place.
Ideally, they should also be at least relatively easy to write.
That is a lot of requirements... What are our options?
Your gut feeling may be adding some simple-ish mechanism to check if a generic condition holds true for all fields.
However, this may not be nearly as simple as one might imagine.
As I mentioned, those requirements can vary quite a bit, and there may not be a one-size-fits-all solution.
What are our other options?
One idea here might be to also use reflection to express those requirements.
We could try something like this:
const fn is_serializable<T>()->bool{
/* Do our complex checks using reflection*/
}
// Only valid if is_serializable<T>() returns true
fn serialize<T>(t:&T, out:impl Write) where True<{is_serializable::<T>()}>
In this scenario, we express the bound required for reflection… using reflection. This approach has the benefit of being very flexible. We are not limited by what kind of checks we could possibly perform, and we could ensure some very complex safety invariants at compile time, which is neat.
Additionally, since we express our requirements using reflection, we can draw inspiration from the original code we wrote. We can compare the bound to the implementation, and spot any potential mismatches this way.
However, once again, this approach is not without its own drawbacks. Writing those bounds is not trivial, and would introduce a fair bit of boilerplate. Still, reflection is not easy in general, so one might argue that this does not matter as much here. It is hard to say if there exists an easier way to express those bounds.
The bigger drawback is, in my opinion, the problem of self-referential types. Consider this one:
struct IsThisThingSerializable{
normal_field:u32,
recursive:Option<Box<Self>>,
}
Checking if this thing fulfils our complex reflection requirements is not trivial. This type can be serialized if all of its fields(including recursive) can be serialized. To determine if that field can be serialized, we need to check if... the original type in question can be serialized.
A naive approach, written using recursion, would quickly get stuck in a loop.
There are ways around that, but they also don't tend to be all that easy to understand. There could exist some kind of crate that makes this process easier, but I still feel like the complexity of reflection rears its ugly head here too.
Overall, it seems like this is, once again, something that makes implementing reflection in Rust all that more tricky.
Going back to the safety of reflection at this point may seem a bit strange, but I want to leave you with something to consider before the article ends.
Up till this point, I more or less assumed reflection must be safe. If accessing private fields can lead to issues, it may seem reasonable to simply forbid that. This is what most of the reflection proposals I am aware of do. It is the clean, neat, albeit more limited solution.
There is an alternate solution to this problem: making the problematic reflection APIs unsafe, but still providing them. We could grant deeper, more low-level access to people who need it, and offer safe, more restricted alternatives for everybody else.
On one hand, exposing unsafe APIs with more low-level access is nothing new. This keyword exists for a reason: it allows us to do things the compiler can't prove are safe. Using unsafe is not necessarily "wrong" or "bad" - it just means that we are 100% responsible for ensuring our code is correct. This feels like getting the best of both worlds: we get safety and powerful reflection. What is there not to love about this?
Well, things are not as simple as that. Allowing access to private fields leaks implementation detail, and could break versioning. Even when an API is unsafe, it probably should not break between versions. Still, one might argue that adding any kind of reflection affects versioning at least a tiny bit. For example, adding a new field to a type like this:
#[non_exhaustive]
pub struct Foo {
pub f1: i32,
}
is not currently a breaking change. However, since it is going to be visible to reflection, it could become a breaking change.
We also have to consider that, if the unsafe reflection API is much more powerful than the safe one, then it may become the preferred option. People might hear: "oh, doing that with safe reflection is a little tricky", and just go with the unsafe variant. The difficulty of unsafe code does not lay in what you can do, but rather in what you can't do. So, having unsafe reflection APIs could lead to people writing a lot of unsound code using them.
Another important question is: should we allow safe reflection to get information about the private fields? Sometimes, when we write some interop / unsafe code, we might want to double-check the layout of the data we are dealing with. Such a check could save us a headache, and allow us to write code with more confidence. Still, it would expose the implementation details, which might change between crate versions. This could turn minor changes into breaking ones. Changes to private fields would become externally observable, and somebody could depend on that.
I would argue that code breaking at compile time is better than a runtime error, but this is still something to consider.
I think the biggest question we have to answer is: how useful is reflection without access to private fields? How will it affect the code we write?
If we want a type to be serializable via reflection, we have to make all of its fields accessible outside our crate. The definition of that type becomes publicly visible, and is no longer just an implementation detail. We can't control how and where our data is changed.
struct MyType{
// We *intend* to allow the serialization framework to access those fields,
// but anybody can change them using reflection.
pub(reflect) a:i32,
pub(reflect) b:i32,
// Adding any new field - even one that *does not end up being serialized*
// is a breaking change
f: Option<SomeDebugInfo>,
}
With proc-macros, we can access private fields in a safe manner(in the derived macro implementation). We also get full control over exactly what piece of code has access to those fields. That additional level of control can prevent a lot of issues.
#[derive(Serialize,Deserialize)]
struct MyType{
// Accessible only to the serialization crate we use.
a:i32,
b:i32,
// Not a breaking change - not publicly accessible,
// and does not end up serialized.
#[do_not_serialize]
f: Option<SomeDebugInfo>,
}
If reflection ends up being more restricted and less reliable than proc-macros, then we might not see all that many benefits to adding it to the language.
It could be faster(depending on the implementation), but I think most people wish reflection could do more, not less.
This is, I believe, the biggest challenge for reflection in Rust: it needs to carefully dance around those potential pitfalls.
It needs to find some perfect spot, where it bends the rules just enough to fit most of the use cases, while not breaking anything.
How can we achieve this delicate balance?
To be frank... I don't know.
I feel a bit weird leaving this article without presenting a neat solution to the problems I discuss.
However, I think this is the most important thing to take away here: we kind of are in uncharted waters. Reflection has been implemented in other languages before, but... Rust is a bit unique.
It is a language focused on security, first and foremost. Rust has a pretty uncommon approach to memory management - all of that changes how a reflection API works.
If memory safety is enforced via the type system, reflection also needs to adapt to that. How, exactly? That is hard to say with certainty.
Truth be told, this article is kind of a result of my own work on reflection. I have tried quite a few different ideas to slay this beast of a feature, with... mixed results.
Time and time again, I feel like I kind of smashed into the same potential issues, despite using vastly different approaches.
Sometimes, I feel like attempting to add reflection to Rust is kind of like trying to cover a table with too small of a tablecloth: you always end up sacrificing something. Some of my ideas were very powerful, decently user-friendly, but ultimately unsafe. Others were safe, but required writing more boilerplate, and did not cover all the use cases I, and countless other rustecians had in mind.
As the year 2024 is coming to an end, I came to a realization: I don't need to solve all of those problems.
I am a bit of a perfectionist, and got stuck in a pretty simple trap: I did not want to write anything about reflection till I figured everything out. However, letting my 10 draft articles bit-rot in the cloud is not doing anybody any good.
Rust is a language that lives or dies by its community: The best thing I can do is get more eyes on the problem.
I'd like to thank all the people who helped me improve this article by providing some feedback before publication. Fixing the issues they pointed out helped me improve it significantly.