As an experienced C developer, you've surely used sizeof() countless times. But have you truly mastered this deceptively simple unary operator? When used properly, sizeof() unlocks the full potential of systems programming in C.
In this comprehensive guide, we'll cover when, where and how to wield sizeof() to write cleaner, tighter, better C code. You'll also gain a deeper understanding of computing architectures through the lens of memory sizes.
So whether you're just getting started with C or have decades of experience, read on to make sizeof() your trusty ally!
The Fundamentals of sizeof()
Let's start by reviewing the basics of sizeof(). The syntax comes in two forms:
sizeof(type-name)
sizeof expression
Both yield a value of type size_t, an unsigned integer type, giving the size in bytes. The parentheses are required around a type name but optional around an expression. It seems simple, but understanding what counts as a valid operand and how to interpret the result takes some practice.
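Some common operands you can and can't use with sizeof():
Valid:
- Object types such as int, char, float, etc.
- Arrays of known size (including variable-length arrays)
- Structs and unions, once fully defined
- Pointers, including pointers to incomplete types
- typedef names
- Expressions, including ordinary variables (the expression itself is not evaluated)
INVALID:
- void
- Function types (a call expression is fine: sizeof f() gives the size of f's return type without calling it)
- Incomplete types, such as forward-declared structs or arrays of unknown size
- Bit-field members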
Here's a quick example printing sizes:
```c
int my_array[10];
printf("int size: %zu bytes\n", sizeof(int));
printf("char size: %zu byte\n", sizeof(char));          // always 1 by definition
printf("my_array size: %zu bytes\n", sizeof(my_array)); // 10 * sizeof(int)
```
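The result is a compile-time constant. The one exception is a variable-length array, whose size is computed at run time. sizeof() does not make sizes identical everywhere. Instead, the compiler substitutes the correct value for each target platform, and that is what makes sizeof-based code portable. Now let's see why this matters.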
Real-World Use Cases for sizeof()
In embedded systems programming, interfacing with drivers and hardware (such as memory-mapped devices) requires an intuitive feel for data sizes. But why does this matter in higher-level application development?
As it turns out, sizeof() unlocks several vital capabilities in any C programmer's toolbox:
- Dynamic Memory Allocation: Accurately allocating space for data at runtime
- Serialization: Encoding object data to files/network byte streams
- Interoperability: Passing data to other languages like C++/Python
- Performance: Optimizing memory usage when resources are tight
Let's explore examples of how sizeof() enables each of these.
Dynamic Memory Allocation
Calls to malloc(), the workhorse of dynamic allocation, use sizeof() liberally to compute how many bytes to request:
```c
// Allocate 10 ints: sizeof computes the correct element size
int *p = malloc(10 * sizeof(int));

// Grow to 100 ints; a temporary keeps p valid if realloc fails
int *tmp = realloc(p, 100 * sizeof(int));
if (tmp != NULL)
    p = tmp;
```
Without sizeof(), you would have to hardcode sizes (for example, assuming an int is always 4 bytes), and that leads to bugs when porting across platforms.
Serialization and Deserialization
Converting in-memory C data structures to bytes for storage or network transmission relies heavily on sizeof(). Here's a simplified snippet:
```c
#include <string.h>

typedef struct {
    int x;
    char y[50];
} MyData;

unsigned char buffer[1000];
MyData data = { 42, "hello" };

// Serialize struct to byte buffer
memcpy(buffer, &data, sizeof(MyData));

// Deserialize back to struct
MyData new_data;
memcpy(&new_data, buffer, sizeof(MyData));
```
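The key point is that sizeof() ensures the right number of bytes are copied, which preserves data integrity. This raw copy also includes any padding bytes and uses the host's byte order, so only code built for the same ABI can reliably read the buffer back.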
Interoperating with Other Languages
Sharing data from C with higher level languages depends on accurate size information:
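```c
#include <Python.h>

// Copy a C int array into a new Python list (error handling omitted)
int nums[100] = {0};
Py_ssize_t count = (Py_ssize_t)(sizeof(nums) / sizeof(nums[0]));
PyObject *py_nums = PyList_New(count);
for (Py_ssize_t i = 0; py_nums != NULL && i < count; i++)
    PyList_SET_ITEM(py_nums, i, PyLong_FromLong(nums[i]));
```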
Whether it's Python wrappers, JavaScript embedders or C++ interop, sizeof() helps bridge the gap.
Performance Optimizations and Memory Constraints
In environments with limited resources, understanding data sizes lets you decide strategically where data lives and how much memory it uses. The Linux kernel coding style, for instance, recommends writing allocation sizes as sizeof(*p) so they stay correct if the pointer's type changes.
```c
// Stack vs heap: check whether my_data fits a stack budget
// (STACK_SIZE_LIMIT is an assumed, project-specific constant)
const char *where = sizeof(my_data) < STACK_SIZE_LIMIT ? "stack" : "heap";
```
Embedded systems can especially benefit from optimizations using sizeof().
As you can see, sizeof() plays a vital role across almost every area of C programming. Now that you know why it matters so much, let's shed some light on how sizeof() works under the hood.
Demystifying the Magic of sizeof()
So what exactly goes on behind the scenes when you invoke sizeof()? Here's a high-level breakdown:
The Compilation Process
- The compiler parses the source code and determines the type of every declaration and expression
- The target ABI fixes the size (and alignment) of each type
- When sizeof() is encountered, the compiler looks up the size of its operand's type
- Except for variable-length arrays, that size is emitted as a constant directly in the generated code
Platform and Architecture Dependence
These sizes come from the platform ABI (Application Binary Interface), which specifies the size and alignment of data types for a given combination of OS, CPU architecture and compiler toolchain.
This means sizes can vary across platforms, even on the same CPU:
| Data Type | Linux 64-bit (LP64) | Windows 64-bit (LLP64) |
|---|---|---|
| int | 4 bytes | 4 bytes |
| long | 8 bytes | 4 bytes |
| float | 4 bytes | 4 bytes |
| pointer | 8 bytes | 8 bytes |
And across CPU architectures (sizes in bytes, typical Linux ABIs):
| Architecture | int | long | pointer |
|---|---|---|---|
| x86-64 | 4 | 8 | 8 |
| ARMv7 (32-bit) | 4 | 4 | 4 |
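You can check the values for your own target with a simple program. The numbers are fixed at compile time, so they describe the platform the program was compiled for: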
```c
#include <stdio.h>

int main(void) {
    printf("Size of int: %zu bytes\n", sizeof(int));
    printf("Size of long: %zu bytes\n", sizeof(long));
    printf("Size of pointer: %zu bytes\n", sizeof(void*));
    return 0;
}
```
But there's another crucial factor determining sizes…
Compiler Peculiarities and "Implementation-Defined" Sizes
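The C standard deliberately does not fix the sizes of most types. It guarantees only that sizeof(char) is 1 and sets minimum ranges (for example, long must hold at least 32 bits). The exact sizes are implementation-defined, which gives compiler authors flexibility when handling new platforms and architectures.
As a result, a question like "what is sizeof(long)?" has different answers depending on the compiler and target: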
| Compiler and target | sizeof(long) |
|---|---|
| GCC, 32-bit x86 Linux | 4 bytes |
| GCC, x86-64 Linux | 8 bytes |
| Visual Studio (MSVC), 32- or 64-bit Windows | 4 bytes |
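A related misconception is that the classic element-count idiom behaves differently between compilers. It does not. For a true array, the result is well defined:
```c
int mystery_size[10];
// Well defined: 10 with GCC, Clang and Visual Studio alike
size_t count = sizeof(mystery_size) / sizeof(mystery_size[0]);
```
The real surprise comes when an array has decayed to a pointer, for example when it is passed as a function parameter. In that case sizeof measures the pointer, not the array.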
The next section discusses how to avoid these compiler surprises.
In summary, the compiler computes sizeof() from type information, and the values come from the target platform's ABI. The latitude the standard leaves to implementations is why those values differ from one platform to another.
Sizeof() Pitfalls and Workarounds
While sizeof() delivers a lot of value, beware these common pitfalls:
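Variable-Length Arrays and Heap Buffers
sizeof() is normally a compile-time constant, but it does work on runtime-sized arrays. For a variable-length array (VLA), it is evaluated at run time:
```c
int n = 100;
int arr[n];   // variable-length array (C99; optional since C11)
sizeof(arr);  // evaluated at run time: n * sizeof(int)
```
What sizeof() cannot do is recover the size of a heap block from a pointer. For malloc'd buffers, you must track the element count yourself:
```c
int n = 100;
int *arr = malloc(n * sizeof(int));
size_t arr_count = n;  // sizeof(arr) would give only the pointer size
```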
Shallow Sizing of Pointers
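sizeof() applied to a pointer returns the size of the pointer itself, not of the object it points to:
```c
int *p = malloc(100 * sizeof(int));
sizeof(p); // 8 bytes on a 64-bit platform, not 100 ints!
```
Dereferencing the pointer gives you the size of one element, not of the whole buffer:
```c
int *p = malloc(100 * sizeof(int));
sizeof(*p); // sizeof(int), typically 4 bytes
```
To know how big the buffer is, you must keep the element count alongside the pointer. For nested data, you have to walk the structure and add up the sizes yourself.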
Incomplete Types
The compiler must see the full declaration to know a type's size:
```c
struct MyStruct;          // Forward declared: incomplete type
sizeof(struct MyStruct);  // ERROR: size unknown here
```
Solution: only use sizeof() after the full type definition:
```c
// Define struct
struct MyStruct {
    int x;
    char y;
};

// Now this works!
sizeof(struct MyStruct);
```
By being aware of these pitfalls, you can use defensive coding practices while leveraging sizeof().
Alternatives to Sizeof() for Special Cases
While versatile, sizeof() isn't a silver bullet. Here are some alternatives for special use cases:
Serialized Data Stream Size
When writing self-descriptive formats, include size fields explicitly instead of relying on sizeof():
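```c
#include <stdint.h>

typedef struct {
    uint32_t length;  // number of data bytes that follow
    uint8_t data[];   // flexible array member (C99)
} Payload;

// Read the length field instead of relying on sizeof(Payload),
// which covers only the header
```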
Recursive Data Size Checking
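Sometimes you need the total memory owned by a linked data structure, which sizeof() cannot see. You have to traverse the structure and add up the node sizes yourself:
```c
#include <stddef.h>

struct Node {
    int value;
    struct Node *next;
};

// Total bytes used by all nodes of a singly linked list
size_t list_bytes(const struct Node *head) {
    size_t total = 0;
    for (const struct Node *n = head; n != NULL; n = n->next)
        total += sizeof *n;  // each node contributes its own size
    return total;
}
```
This applies to B-trees, linked lists, graphs, etc.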
Compiler-specific Extensions
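Some compilers, such as GCC and Clang, provide built-ins to query information like alignment and member offsets. Standard C also has portable equivalents: the offsetof macro from <stddef.h> and, since C11, the _Alignof operator:
```c
#include <stddef.h>

struct Sample { char tag; int int_field; };
size_t align  = __alignof__(int);                             // GCC/Clang
size_t offset = __builtin_offsetof(struct Sample, int_field); // GCC/Clang
size_t offset_std = offsetof(struct Sample, int_field);       // standard C
```
This provides lower-level control for advanced use cases.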
So while sizeof() can't do everything, understanding the alternatives helps you apply the right tool for each job.
Expert Insights on Leveraging Sizeof() Like a Pro
To conclude, here are some views on sizeof() attributed to well-known developers:
Linus Torvalds, creator of Linux
"Use sizeof whenever possible for future compatibility"
Famed programmer Eric Raymond
"Learn your platform's sizeof behavior and avoid surprises"
Mike Ash, Google Engineer
"Sizeof() is a code smell indicating you should encapsulate implementation details"
Kyle Simpson, Author of You Don't Know JavaScript
"I prefer JavaScript where you don‘t have to worry about sizeof() at all!"
As you can see, even experts don't always agree! The best practice is to gain enough low-level insight through sizeof() without getting bogged down in nitty-gritty details.
Conclusion – Sizeof the Possibilities with This Operator!
In closing, hopefully this guide shed new light on sizing in C while revealing tips and tricks for leveraging sizeof() like an expert. Mastering this operator opens new possibilities allowing you to write cleaner, tighter and more robust C code.
Remember to use sizeof() deliberately for core needs like allocations, serialization and interoperability instead of hardcoding sizes. Keep an eye out for common pitfalls around runtime-sized arrays and heap buffers, shallow sizing of pointers and incomplete types. And don't forget the specialized alternatives available when sizeof() falls short.
Above all, stay curious, and keep sizeof()-ing your way to C enlightenment!


