57 Comments
User's avatar
⭠ Return to thread
Célio Cidral's avatar

> duplication is actually reducing cognitive load, because the “chunk” is self-contained

I've been thinking about that and I'm still not convinced. In my experience most of the time duplicate code increases cognitive load because every time I suspect a block of code is duplicate I have to go back and check if the other duplicate places do exactly the same thing or something slightly different. Just as DRY can be abused, PRY will be as well, and I'm afraid the consequences will be worse than the consequences of DRY abuse.

Oliver Phillip Roer's avatar

I don't buy it either. At least not as a blanket statement.

For a simple expression as shown here, sure, I'll buy it, PRY away.

For a block of 20 lines of code, I think most would agree to move that to a function, if the alternative was to paste it in many different places in your codebase, especially if a change to one copy would require the same change to the other copies.

So the boring answer is there's no simple answer - it depends. The more interesting question to consider then is how you can tell when to DRY and when to PRY.

pascal martin's avatar

Did everybody miss the "sometime"? Granted, it was not in bold and this colored the statement in a way probably not intended.

Shawn Willden's avatar

For me the deciding factor is whether the chunk's meaning is clearer than the best helper function name I can come up with, which depends on both the complexity of the chunk and on how much I can simplify the reader's cognitive load with a good helper function name.

I actually think Linus' example is a fairly bad one. `(a<<16) + b` looks very simple and straightforward, but it results in undefined behavior for many values of `a` if `a` and `b` are `uint16_t` and should probably add some masks if they're `uint32_t`. Handling this correctly depends on the exact input types and how they're promoted to perform the operation. Having seen too many bugs caused by this sort of naive calculation, when *I* see `(a << 16) + b`, I'm immediately going to stop and scrutinize the input types and value ranges to see whether it's correct. And even if the text of the operation is identical in every case, I'm going to have to check it every time, because the input types might be different.

If `a` and `b` are `uint16_t`, the correct code is `((uint32_t) a << 16) + b`. Without the explicit cast to `uint32_t` what happens is that `a` gets promoted to `int` (whatever that happens to be on the platform) and then shifted. If `int` is 32 bits and the value in `a` is large enough that a `1` gets shifted into the sign bit then you get UB. The cast prevents the promotion to `int` and makes the code work correctly.

In case you think UB doesn't matter because the generated code will compute the right value anyway... not always. Compilers are free to assume code never produces UB and those assumptions can result in very weird behavior, especially at higher optimization levels.

If `a` and `b` are `uint32_t`, the correct code is `((a & 0xFFFF) << 16) + (b & 0xFFFF)`. The mask on `a` isn't actually necessary, but I think it makes the intent clearer.

Also, I think I'd use `|` instead of `+`. I think it's clearer, and if I saw `+` I'd probably pause for a fraction of a second to convince myself that adding isn't doing something different than just mashing the bits together.

Where Linus was right was that the helper function's name was bad and created ambiguity which would have required the reader to go check the implementation... and then work out the sequence of promotions and calculations to make sure the code wasn't naively wrong. So the bad helper name did indeed make it worse.

But... a *good* helper name would have made it a lot better. A good name would allow me read the calling code and understand what the helper was supposed to be doing without immediately having to go check, and then when I read the helper I'd have been able to rely on the declared argument types, and to check its correctness exactly once, rather than every time the code snippet appeared.

Except in the simplest and clearest cases (which do *not* include many apparently-simple expressions with implicit promotions and which behave differently based on the input types), DRY reduces cognitive load, and PRY increases it.

Mob's avatar

Why do you have to go and check that whether the other duplicates are doing same thing or slightly different?

Shawn Willden's avatar

In this case (and many others, especially with integer arithmetic) it depends heavily on what the input types are. For example, the code Linus was talking about has Undefined Behavior for uint16_ts and some input values.

Vaibhav's avatar

Exactly. If it works why go touch it.

Mob's avatar

If you have to edit many lines because of one single change due to duplication then yes you need to touch it.