some good thoughts on RISC vs CISC from Bob Colwell.

regehr@mastodon.social

some good thoughts on RISC vs CISC from Bob Colwell. at this point it's kind of funny and hard to imagine people getting actually angry while debating this, but there you go.

alt: the text is too long for my instance's alt text, but it's from page 4 here: https://www.sigmicro.org/media/oralhistories/colwell.pdf

rygorous@mastodon.gamedev.place

@regehr the entire interview is so good

dougmerritt@mathstodon.xyz

@regehr
> There was already the sentiment that you could get carried away with your instruction set.

Very true, and the VAX 11/780 instruction set was one such. For instance it had a polynomial evaluation instruction -- which was slower than just coding the equivalent by hand.
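
For reference, "coding the equivalent by hand" for polynomial evaluation usually means Horner's rule: one multiply and one add per coefficient. A minimal sketch follows; the function name and the highest-degree-first coefficient order are choices made for this illustration, not the VAX instruction's actual operand layout.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x using Horner's rule.

    coeffs are ordered highest degree first (an assumption of this sketch).
    """
    acc = 0
    for c in coeffs:
        # One multiply and one add per coefficient.
        acc = acc * x + c
    return acc


# 2x^3 - 3x^2 + 0x + 5 evaluated at x = 3
value = horner([2, -3, 0, 5], 3)
```

The loop body is exactly the kind of short multiply-add sequence a compiler or assembly programmer can schedule directly, which is why a dedicated instruction has little room to win.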

But still, those of us designing CISC instruction set architectures were disappointed, it took away a lot of the fun.

OTOH it took a while for the most purist of the RISC folks to see that it really did make sense to have *some* "complex" instructions.

Multiply, for instance.

geofflangdale@mastodon.social

@dougmerritt @regehr Not to mention things like PDEP/PEXT, crypto and GF2P8AFFINEQB and the like. Fancy SIMD operations, not to mention all the other conveniences of the fast x86 (or "IA", I guess - sigh) cores muddy the whole CISC/RISC debate beyond recognition.
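
To give a sense of what PEXT and PDEP replace, here is a bit-at-a-time software reference in Python. It is only a sketch of the semantics (gather the bits selected by a mask into the low end, or scatter low bits out to the mask positions); the function names are illustrative, and a hardware implementation does this in a single instruction rather than a loop.

```python
def pext(src, mask):
    """Gather the bits of src selected by mask into contiguous low bits."""
    result = 0
    k = 0  # next output bit position
    while mask:
        lowest = mask & -mask  # isolate the lowest set bit of the mask
        if src & lowest:
            result |= 1 << k
        k += 1
        mask &= mask - 1  # clear that mask bit
    return result


def pdep(src, mask):
    """Scatter the low bits of src to the bit positions set in mask."""
    result = 0
    k = 0  # next input bit position
    while mask:
        lowest = mask & -mask
        if (src >> k) & 1:
            result |= lowest
        k += 1
        mask &= mask - 1
    return result


packed = pext(0xABCD, 0x0F0F)     # picks out nibbles 0 and 2
spread = pdep(packed, 0x0F0F)     # puts them back in place
```

The software loop runs once per set mask bit, which is the cost the single-instruction versions remove.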

brouhaha@mastodon.social

@regehr
Sacred cows make the best burgers!

synlogic4242@social.vivaldi.net

@regehr love that excerpt

regehr@mastodon.social

@dougmerritt also addressing modes. I love good addressing modes.

regehr@mastodon.social

@lispi314 maybe, but we're not exactly stuck, there's Morello

acsawdey@fosstodon.org

@regehr The real answer was neither CISC nor RISC, but to codesign the processor and the compiler. (Not like that, Intel & HP!)

david_chisnall@infosec.exchange

@regehr @lispi314

There’s more CHERI than just Morello. We got our first ICENI (CHERIoT) chips on boards back from the fab this week, there’s a RISC-V CHERI base architecture undergoing standardisation, and Codasip has licensable IP cores for microcontroller and application processors.

The big difference between CHERI and historical capability systems is that we co-designed the ISA and the compiler. We didn’t do things in hardware that are easy to do in software.

I think our 2023 MICRO paper about CHERIoT does the best job of identifying the problem we are trying to solve by framing memory safety as a set of properties that a compiler wishes to enforce against code compiled by other arbitrary compilers.

david_chisnall@infosec.exchange

@regehr @dougmerritt

And condition codes. Microarchitects hated condition codes because they meant that instructions would do partial updates to the same register, which made register rename annoying. As a result, most early RISC architectures omitted them. But it turns out that there are some techniques for implementing them that aren’t that bad in superscalar chips and, on pretty much all scales of design, the performance win from having them outweighs most other things you could do with the same complexity budget. Arm’s designers really wanted to remove them for AArch64 but found that the performance hit from doing so was too high.

I’m sad Patterson was willing to put his name on RISC-V because it’s an example of the kind of architecture that you design if you don’t measure anything.

EDIT: The same is true of conditional move. RISC-V doesn’t have a conditional move because Krste read a paper by the Alpha architects about how it required an additional read port on the register file and this was painful. There is a very narrow window of microarchitectures for which this is true. As an experiment, I had a student add a conditional move to RISC-V in the compiler and a simple in-order processor. The area overhead was negligible and the student found that a conditional move allowed you to get the same performance with about 25% less branch predictor state, so ended up being an area saving. I mentioned this to some Arm folks and was told that this wasn’t a new result: ‘everyone’ in the industry knew that there was roughly a 25% saving on branch predictor state from having a conditional move (and conditional moves are much easier to make useful if you have condition-code registers). AArch64 has some quite exciting variants of conditional moves, one of which gives about a 10% speedup in the hot loop of bzip2 (I think, might be some other compression algorithm) if it’s used.

david_chisnall@infosec.exchange

@koakuma @regehr @dougmerritt

I think so. As I recall, they don’t have two versions for the instruction variants that take an immediate (that would consume a huge amount of encoding space) but they do have some options for the three-register versions (which need 15 bits for the operands, so you can fit a lot of these in a 32-bit instruction).

fanf@mendeddrum.org

@david_chisnall @regehr @dougmerritt i still refer back to john mashey’s risc-vs-cisc analysis from the 1990s https://www.yarchive.net/comp/risc_definition.html

in which he counts instruction set features to demonstrate that risc and cisc correspond to real phenomena and aren’t just marketing

but the main reason i like it is because these days programmers get the impression that risc vs cisc is arm vs x86 but they are really bad exemplars, because arm is the least riscy risc and x86 is the least ciscy cisc

much better exemplars are mips and alpha vs vax and 68020

unfortunately mashey didn’t include arm in his analysis so the reader needs a fair amount of knowledge to fill in the gap

then there’s amd64 and arm64 which postdate mashey’s analysis and are even closer to the middle ground

there’s clearly a thesis/antithesis/synthesis but the synthesis hasn’t been given a catchy name so it is talked about in risc-vs-cisc terms even tho that no longer makes sense

eg wrt addressing modes the convergence looks like base + offset * stride, not just an address as in risc, and no indirection or other extraneous memory accesses as in cisc
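
A small Python sketch of the three styles of addressing mode contrasted above: a plain register address, the converged base + index * scale form, and a memory-indirect mode that needs an extra load just to form the address. The memory contents, function names, and access counter are all hypothetical, chosen only to make the extra memory access visible.

```python
# Hypothetical memory: a pointer at 0x1000 to an array of 8-byte words at 0x2000.
memory = {0x1000: 0x2000, 0x2000: 42, 0x2008: 43, 0x2010: 44}
accesses = []  # records every address read, to count memory accesses


def load(addr):
    accesses.append(addr)
    return memory[addr]


def ea_register(base):
    """Simplest RISC style: the address is just a register value."""
    return base


def ea_scaled(base, index, scale):
    """Converged style: base + index * scale, computed from registers only."""
    return base + index * scale


def ea_memory_indirect(pointer_addr, index, scale):
    """CISC style: the base itself is fetched from memory first."""
    return load(pointer_addr) + index * scale


accesses.clear()
a = load(ea_scaled(0x2000, 2, 8))            # one memory access in total
scaled_accesses = len(accesses)

accesses.clear()
b = load(ea_memory_indirect(0x1000, 2, 8))   # two memory accesses in total
indirect_accesses = len(accesses)
```

Both loads fetch the same element; the difference is that the indirect form touches memory once more before the data access can even start.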

and things like plenty of registers, 0 or maybe 1 memory accesses in most instructions, complexity is ok if it’s register-to-register, simd is great

david_chisnall@infosec.exchange

@fanf @regehr @dougmerritt

Arm stopped referring to their architecture as RISC about 15 years ago. They now call it a ‘load-store architecture’. This applies even to AArch32, which is a pretty large instruction set, but remains mostly orthogonal and does not have instructions that implicitly touch memory.

My favourite example of Too CISCy is one of the IBM mainframe architectures that used a tag in a branch operand to indicate that it was double indirect. If the bit was set in the address operand, rather than jumping to that address it would load the address at the target and jump there. But the loaded address could also have the indirect bit set and so you could chain the indirections. Unfortunately, this meant it was possible to introduce cycles and so they ended up with a counter to see how many indirection steps you’d followed and gave up after a fixed number.
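
The chained-indirection behaviour can be sketched in a few lines of Python. Everything concrete here is hypothetical: the position of the tag bit, the step limit of 8, and the names are invented for illustration and are not taken from any particular IBM architecture.

```python
INDIRECT = 1 << 31        # hypothetical tag bit marking "indirect"
ADDR_MASK = INDIRECT - 1  # remaining bits hold the address
MAX_STEPS = 8             # hypothetical fixed limit on chained indirections


class IndirectionLimit(Exception):
    """Raised when the chain is longer than the hardware is willing to follow."""


def resolve_branch_target(operand, memory, max_steps=MAX_STEPS):
    """Follow tagged operands through memory until an untagged address appears."""
    steps = 0
    while operand & INDIRECT:
        if steps == max_steps:
            # Without this counter a cycle would hang the machine.
            raise IndirectionLimit(f"gave up after {max_steps} steps")
        operand = memory[operand & ADDR_MASK]  # one extra memory access per hop
        steps += 1
    return operand


# A two-hop chain: 0x10 -> 0x20 -> final target 0x300.
chain = {0x10: INDIRECT | 0x20, 0x20: 0x300}
target = resolve_branch_target(INDIRECT | 0x10, chain)

# A cell that points at itself: only the step counter stops the walk.
cycle = {0x40: INDIRECT | 0x40}
```

Calling `resolve_branch_target(INDIRECT | 0x40, cycle)` raises `IndirectionLimit`, which is the software analogue of the hardware giving up.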

resistor@mastodon.online

@david_chisnall @koakuma @regehr @dougmerritt AArch64's approach is to have condition codes, but (a) only read/write them on as few opcodes as possible, and (b) always do a complete, not partial, condition code write.

pinskia@hachyderm.io

@koakuma @david_chisnall @regehr @dougmerritt powerpc's multiple condition registers are the best thing out there.
It also has ior, xor and and on each bit of the cr. Gcc never got around to using them (i don't know if llvm did).
The dot instructions would set cr0 or cr7(fp).
Even the altivec compare equal had a dot version which would allow for all, any and none.
Ppc did add isel (via booke first and then later on).

mansr@society.oftrolls.com

@regehr CISC is just RISC with hardware macros.

regehr@mastodon.social

@resistor @david_chisnall @koakuma @dougmerritt AArch64 has some imaginative condition code instructions as well, I really enjoy CCMP / CCMN
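
For readers who have not met CCMP: it performs a compare only if a condition on the current flags holds, and otherwise loads a literal NZCV value, so a chain like `a == 0 && b == 5` needs one branch at the end instead of two. Below is a rough Python model of that behaviour on 64-bit values. The helper names are made up for this sketch, only the EQ condition is modelled, and CCMN (the compare-negative counterpart) is left out.

```python
MASK64 = (1 << 64) - 1


def cmp_flags(a, b):
    """Return the (N, Z, C, V) flags from comparing a with b, i.e. computing a - b."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    n = result >> 63                      # sign bit of the result
    z = int(result == 0)                  # operands were equal
    c = int(a >= b)                       # carry set means no unsigned borrow
    v = ((a ^ b) & (a ^ result)) >> 63    # signed overflow of the subtraction
    return (n, z, c, v)


def eq(flags):
    """The EQ condition: Z flag set."""
    return flags[1] == 1


def ccmp(flags, cond, a, b, nzcv):
    """Conditional compare: real compare if cond holds, else the literal nzcv."""
    return cmp_flags(a, b) if cond(flags) else nzcv


def both_match(a, b):
    """Models: CMP a, #0 ; CCMP b, #5, #0, EQ ; then a single test of EQ."""
    flags = cmp_flags(a, 0)
    # If a != 0, the flags become 0000, so Z is clear and the final EQ fails.
    flags = ccmp(flags, eq, b, 5, (0, 0, 0, 0))
    return eq(flags)
```

The literal flags value is what lets the first comparison short-circuit the second without a branch: it is chosen so the final condition comes out false.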
