Someone on Lobsters wondered "how a modern compiler would fare against hand-optimized asm," referring to Abrash's hand-written x87 TransformVector routine (a 3x3 matrix-vector multiply) in Quake.
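For reference, the computation in question is small: a 3x3 matrix times a 3-vector, which is just three dot products. A minimal Python sketch of the math follows. The function and argument names are illustrative assumptions, not Quake's source, and it only shows what gets computed, not how a hand-scheduled x87 version orders it.

```python
def transform_vector(m, v):
    """Multiply a 3x3 matrix (a list of three rows) by a 3-vector.

    Each output component is the dot product of one matrix row with v:
    9 multiplies and 6 adds in total, which is all the work an x87
    routine (hand-written or compiled) has to schedule.
    """
    return [sum(row[i] * v[i] for i in range(3)) for row in m]


# Identity matrix leaves the vector unchanged: [1, 2, 3]
print(transform_vector([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [1, 2, 3]))
```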
-
@rygorous
I guess STM/LDM makes way more sense when you have no instruction cache, so every extra cycle your instruction spends moving more data would've otherwise been an instruction fetch.
@rygorous
also rep movs
-
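To make the STM/LDM point concrete, here is a deliberately crude cost model. It assumes a single shared bus with no instruction cache, one bus cycle per 32-bit access, and no overlap of any kind: moving N registers with N separate loads costs N instruction fetches plus N data accesses, while one LDM costs a single fetch plus the same N data accesses. The function name and the model itself are hypothetical simplifications, not a description of any particular core.

```python
def bus_cycles(n_regs, use_ldm):
    """Bus cycles to load n_regs words under a toy no-I-cache model.

    Every instruction fetch and every data access occupies the single
    shared bus for one cycle; nothing overlaps.
    """
    data_accesses = n_regs
    instruction_fetches = 1 if use_ldm else n_regs
    return instruction_fetches + data_accesses


for n in (1, 4, 8):
    # prints "1 2 2", then "4 8 5", then "8 16 9"
    print(n, bus_cycles(n, use_ldm=False), bus_cycles(n, use_ldm=True))
```

Under this model the separate-load version costs 2N cycles versus N+1 for LDM, so the saving approaches half as N grows. Once an instruction cache takes most fetches off the bus, the gap shrinks accordingly.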
@zeux @TomF @wolf480pl @pervognsen In general, the thing to keep in mind is that the part that matters for density is usually boring int code, which is the majority of it almost everywhere.
You can have 90% of the instructions in your manual have awkwardly redundant encodings and not have it matter much for code size, as long as the encodings for the 10-15 insns that really matter are good.
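The arithmetic behind this is a frequency-weighted average. The numbers below are made up purely for illustration (they are not measurements): suppose the handful of core instructions make up 90% of instructions in real code at 2 bytes each, and the long tail of the manual makes up the other 10%. Making the tail bigger barely moves the average; making the core bigger moves it a lot.

```python
def avg_bytes(mix):
    """Frequency-weighted average encoding size in bytes per instruction.

    mix: list of (fraction_of_instructions, size_in_bytes) pairs whose
    fractions are assumed to sum to 1.
    """
    return sum(frac * size for frac, size in mix)


# Hypothetical numbers, not measurements: the common core is 90% of
# instructions at 2 bytes; the long tail of the manual is the other 10%.
baseline = avg_bytes([(0.9, 2), (0.1, 6)])      # about 2.4
bloated_tail = avg_bytes([(0.9, 2), (0.1, 8)])  # about 2.6 (~8% bigger)
bloated_core = avg_bytes([(0.9, 3), (0.1, 6)])  # about 3.3 (~38% bigger)
```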
@zeux @TomF @wolf480pl @pervognsen Just to be self-contained, the stuff that really matters:
- load/store to (reg+small_imm)
- add, sub reg/reg and reg/imm, mov if 2-address
- sign/zero extends, as needed
- nearby conditional branches (whether it be compare + branch form or a branch-if-cond form) and nearby unconditional branches ("nearby" meaning small-offset region, +-4k range is most important)
- prologue/epilogue insns like PUSH/POP or LDP/STP if applicable
- CALL/branch-and-link, return
-
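Whether a branch counts as "nearby" comes down to whether its byte offset fits the encoding's signed immediate field after scaling. A small helper, using a few encodings whose ranges are well documented: RISC-V B-type conditional branches (12-bit field in units of 2 bytes, so roughly ±4 KiB), x86 short Jcc with rel8 (±128 bytes), and AArch64 B.cond (19-bit field in units of 4 bytes, roughly ±1 MiB). The table and function names are illustrative, and offsets are assumed to be measured the way each ISA defines them.

```python
# (immediate field width in bits, scale in bytes) for a few branch encodings.
BRANCH_FORMATS = {
    "riscv_b_type": (12, 2),    # beq/bne/...: -4096..+4094 from the branch itself
    "x86_jcc_rel8": (8, 1),     # short Jcc: -128..+127 from the next instruction
    "aarch64_b_cond": (19, 4),  # B.cond: about +-1 MiB from the branch itself
}


def branch_fits(offset, fmt):
    """True if a byte offset is encodable in the given branch format."""
    bits, scale = BRANCH_FORMATS[fmt]
    if offset % scale != 0:
        return False  # target must be aligned to the encoding's unit
    field = offset // scale
    return -(1 << (bits - 1)) <= field < (1 << (bits - 1))
```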
@zeux @TomF @wolf480pl @pervognsen There are some more variants of this (like, yeah, maybe AND/bit tests too), but if you look at instruction traces (and also disassembly in general), it's funny just how much of it is just this.
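One rough way to check the "how much of it is just this" observation on your own binaries is to tally mnemonics from a disassembly listing. The sketch below assumes GNU `objdump -d` style lines (address, tab, hex bytes, tab, instruction text) and reports what fraction of instructions and of code bytes the top few mnemonics cover. The sample lines are hand-written for illustration, not taken from a real binary, and the function names are hypothetical.

```python
from collections import Counter


def tally(lines):
    """Count instructions and code bytes per mnemonic in objdump -d output.

    Assumes GNU objdump's default layout: "addr:\\tHEX BYTES\\tmnemonic operands".
    Continuation lines (long instructions whose bytes wrap) have no third field;
    their bytes are credited to the previous instruction.
    """
    counts, sizes = Counter(), Counter()
    last = None
    for line in lines:
        parts = line.split("\t")
        if len(parts) < 2 or not parts[0].strip().endswith(":"):
            continue  # section headers, symbol labels, blank lines
        nbytes = len(parts[1].split())
        if len(parts) >= 3 and parts[2].strip():
            last = parts[2].split()[0]
            counts[last] += 1
            sizes[last] += nbytes
        elif last is not None:
            sizes[last] += nbytes
    return counts, sizes


def top_share(counts, sizes, k):
    """Fraction of instructions and of bytes covered by the k most common mnemonics."""
    top = [m for m, _ in counts.most_common(k)]
    return (sum(counts[m] for m in top) / sum(counts.values()),
            sum(sizes[m] for m in top) / sum(sizes.values()))


SAMPLE = """\
0000000000401000 <f>:
  401000:\t55                   \tpush   %rbp
  401001:\t48 89 e5             \tmov    %rsp,%rbp
  401004:\t89 7d fc             \tmov    %edi,-0x4(%rbp)
  401007:\t8b 45 fc             \tmov    -0x4(%rbp),%eax
  40100a:\t83 c0 01             \tadd    $0x1,%eax
  40100d:\t5d                   \tpop    %rbp
  40100e:\tc3                   \tret
"""

counts, sizes = tally(SAMPLE.splitlines())
print(top_share(counts, sizes, 1))  # mov: 3 of 7 instructions, 9 of 15 bytes
```

On a real binary you would feed it actual `objdump -d` output; note that AT&T syntax adds size suffixes (`movl`, `movq`), so you may want to normalize mnemonics before counting.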
-
@wren6991 @TomF @zeux @wolf480pl @pervognsen No clue; I've looked at the C extension but not the newer stuff.
@rygorous @TomF @zeux @wolf480pl @pervognsen Zcb is compressed forms of byte load/store, sign- and zero-extend, mul and not. Zcmp is compressed push/pop/return and a limited form of double-mov. They put together a nice spreadsheet showing the impact of each instruction: https://docs.google.com/spreadsheets/d/1bFMyGkuuulBXuIaMsjBINoCWoLwObr1l9h5TAWN8s7k/edit?gid=1837831327#gid=1837831327
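Part of why the compressed extensions stay cheap to decode is that RISC-V puts the instruction length in the lowest bits of the first 16-bit parcel: if bits [1:0] are not 0b11, it is a 16-bit compressed instruction; if they are 0b11 and bits [4:2] are not 0b111, it is a standard 32-bit instruction. Longer formats are reserved, and this sketch just reports them as unsupported. The function name is illustrative.

```python
def riscv_insn_length(first_parcel):
    """Length in bytes of a RISC-V instruction, from its first 16-bit parcel.

    first_parcel is the little-endian 16-bit value at the instruction address.
    Returns None for the reserved longer-than-32-bit encodings.
    """
    if (first_parcel & 0b11) != 0b11:
        return 2  # compressed forms (C, and extensions such as Zcb/Zcmp)
    if ((first_parcel >> 2) & 0b111) != 0b111:
        return 4  # standard 32-bit encoding
    return None   # 48-bit and longer: reserved, not handled here


print(riscv_insn_length(0x8082))  # c.jr ra (c.ret): 2
print(riscv_insn_length(0x8067))  # low parcel of jalr x0, 0(ra): 4
```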
-
@zeux @TomF @wolf480pl @pervognsen anyway, re: the link I posted
x86 is sometimes unfairly valorized as being very dense (I think that might just go back to the early 90s when the primary competition was all the classic 32-bit RISCs, in which case, yeah) and sometimes unfairly slandered as being pathologically bad, and neither is true.
It is thoroughly, blandly middle-of-the-road: neither as dense as encodings that were optimized for density nor as bloated as the classic RISC encodings.
-
@zeux @TomF @wolf480pl @pervognsen There are many complaints to be made about x86.
The original sin of the x86 encoding is that you can't tell the total size of an instruction from just its first 1-2 bytes. That is an actual design mistake: it necessitates instruction-length decoder hardware that is relatively complex to build and verify, and that could either be gone entirely (with fixed-size encodings) or at least be way simpler than it is.
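To see why the length can't be read off the first byte or two, here is a length calculator for a deliberately tiny slice of x86-64: an optional REX prefix, then MOV with opcode 0x89 or 0x8B and a ModRM operand. Legacy prefixes, other opcodes, escape maps, and immediates are not handled. Even in this slice, a prefix shifts where the opcode sits, and in one case the total length depends on the byte after ModRM (the SIB base field). The function name is illustrative, not taken from any real decoder.

```python
def x86_mov_length(code):
    """Length in bytes of a 64-bit-mode MOV r/m,r (0x89) or MOV r,r/m (0x8B).

    Only handles: optional REX prefix, opcode 0x89/0x8B, ModRM, optional SIB,
    and displacement. Raises ValueError for opcodes outside that subset.
    """
    i = 0
    if 0x40 <= code[i] <= 0x4F:   # REX prefix (64-bit mode only)
        i += 1
    if code[i] not in (0x89, 0x8B):
        raise ValueError("opcode outside this sketch's subset")
    i += 1
    modrm = code[i]
    i += 1
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b11:               # register operand: no addressing bytes follow
        return i
    if rm == 0b100:               # SIB byte follows ModRM
        base = code[i] & 0b111
        i += 1
        if mod == 0b00 and base == 0b101:
            i += 4                # no base register: disp32
    elif mod == 0b00 and rm == 0b101:
        i += 4                    # RIP-relative disp32
    if mod == 0b01:
        i += 1                    # disp8
    elif mod == 0b10:
        i += 4                    # disp32
    return i
```

For example, `89 04 24` (mov [rsp], eax) and `89 04 25 ...` (mov [disp32], eax) share their first two bytes but are 3 and 7 bytes long. A real length decoder also has to cope with legacy prefixes, 0x0F escape maps, VEX/EVEX, and immediates.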
-
@zeux @TomF @wolf480pl @pervognsen That said, while annoying and an ongoing cost for those who build x86s (they sell them by the tens of millions, they'll be fine), ILD (instruction-length decoding) adds maybe 1 pipeline stage more than it needs to.
If the x86 encoding is kinda meh, then so are its mistakes. Yeah, if you were planning to build a lasting arch now, you certainly wouldn't do it that way. It adds extra overhead. But not in a way or to an extent that is particularly damning.
-