"Fun bug of the month, mesa edition, episode may"
so if you do "uint64_t some_var = 1 << 31;" in C you get "0xffffffff80000000" as the value, because that's super obvious and not confusing at all.
It's pretty funny getting reminded how non-intuitive and broken C is from time to time.
@karolherbst I think that's UB? see C99 6.5.7 "Bitwise shift operators" - the LHS is signed and the result of the computation is not representable in the result type
-
@karolherbst but apparently gcc has decided to not treat it as UB, except when using UBSAN: https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
-
@jann yeah, technically it's UB, but there is so little to optimize in a 1-2 instruction pattern that it doesn't really matter in practice, because most implementations will do the same thing (more or less).
Like there is UB and then there is UB.
-
@karolherbst yeah, I guess my point is that, for the code you showed, a C compiler would be well within its rights to refuse to build that code or complain about it, so this is not entirely the language's fault
-
It was not UB in C90. That is why GCC doesn't treat it as UB unless UBSan is enabled ...
-
@jann ohh, it's totally the language's fault even if it weren't UB, because that's just the worst way to specify this.
Like it's just a design bug, really. And whether or not this is UB won't change that.
-
@puppethead @karolherbst Without the U (or L) suffix, 1<<31 triggers clang's -Wshift-sign-overflow warning. However, that warning is not enabled by default, and gcc doesn't have that flag (its closest equivalent is -Wshift-overflow=2, which also warns about left-shifting 1 into the sign bit).
-
It’s UB in the general case because, if the operand is not a constant, you want to lower it to a shift instruction, but C works with targets that have different number representations. Ones' complement, two's complement, and sign-magnitude (explicit sign bit) representations are all permitted, but all of these will give different behaviours if you flip the top bit.
For shift counts at or beyond the register width, different ISAs had different semantics, so C made those shifts fully undefined.
This combination lets you lower source-level shifts to a shift instruction.
C also doesn’t mandate that this be constant-evaluated unless the result is used as a constant, so there’s no way to force implementations to diagnose the UB at compile time for this case. But, as a quality-of-implementation (QoI) matter, diagnosing it is permitted and compilers should.
-
@karolherbst For my understanding: that's default int promotion plus sign extension on the conversion to 64 bits? Would 1L << 31L fix this, or are there other pitfalls with that?
@trilader @karolherbst "Um, actually..."
I believe it would work as expected with:
1U << 31;
The unexpected part is that sign extension from 32 to 64 bits takes place before reinterpretation to unsigned. The C99 standard is admittedly opaque on this point. If you make the rvalue unsigned as well, then you get the (presumably) expected result.
(Just tested it with GCC 13.3 -- it works.)
-
@ewhac @karolherbst Yes. In another branch of this thread I also posted that clang even warns you about the behavior of 1 << 31 without a U or L suffix, provided you enable the right warning, which is off by default and which GCC doesn't have under that name. Another branch notes that GCC catches this at runtime with UBSan enabled.
-
@karolherbst C dates from 1973 or so. If you believe it is confusing, try assembly :-).
-
@trilader @karolherbst Yes, int promotion. 1L<<31L would still be broken on 32-bit architectures (as long is 32 bits there). You'd need 1LL or something, AFAICT.
-
@trilader @karolherbst Actually, the right solution would be uint64_t some_var = ((uint64_t)1) << 31; AFAICT.
-
@david_chisnall @jann at least C23 fixes one part of this by requiring two's complement for integers.
But also, I just wish C would mandate that constants are assumed to be of the "expected" type, because in 99.999999% of all cases a programmer really meant the obvious thing with "uint64_t x = 1 << 31".
But I guess we'll just keep the horrible semantics C has in a couple of areas, because nobody wants to fix those things since "it could break things".
-
@karolherbst @david_chisnall @jann That particular change would break a lot!
-
@karolherbst I recently read https://www.os2museum.com/wp/bitfield-pitfalls/ which is a similar pitfall?
-
@karolherbst @trilader Yeah, travelling to 1972 and changing the language would be best :-).
-
@karolherbst taking off my "understands sign extension" badge
wore it with pride, but the pride was misplaced
-
@trilader @karolherbst 1L << 31, with the suffix only on the left operand, would do the trick on real-world 64-bit machines, but I think the compiler is still fully allowed to do the wrong thing. MSVC, for one, still has 32-bit longs on 64-bit Windows.
I think you need to make sure it's unsigned so that sign extension has no chance of occurring, so my money is on 1U << 31?
