C interpreters underlie many of our most widely used language implementations -- but they're slow.

Uncategorized · 10 Posts · 3 Posters
  • ltratt@mastodon.social (#1)

    C interpreters underlie many of our most widely used language implementations -- but they're slow. Wouldn't it be great if we could turn them into JIT compiling VMs? This video shows what happens when we do just that to the normal Lua VM (first) and "yklua" (Lua w/JIT, second).

  • ltratt@mastodon.social (#2)

This isn't just a technique for Lua, though -- it works for any C interpreter compilable with LLVM! More about how and why in this new post, 'Retrofitting JIT Compilers into C Interpreters', which looks at our new 'yk' system. https://tratt.net/laurie/blog/2026/retrofitting_jit_compilers_into_c_interpreters.html

  • ltratt@mastodon.social (#3)

I want to say a big thanks to Shopify and the Royal Academy of Engineering, who graciously funded this research. I'd like to dedicate this work to the late Chris Seaton, who was an early champion of yk: he is much missed by me and many others.

  • david_chisnall@infosec.exchange (#4)

          @ltratt

This hopefully makes the trade-off yk offers clear: yklua does not reach the performance peaks of the wonderful, carefully hand-written LuaJIT.

          For what it's worth:

          In igk, I use sol3, which lets you select the Lua implementation as a build-time option. I don't use any fancy new Lua features in this (64-bit integers are really important for some other things where I looked at Lua, but not for igk), so I tried both Lua and LuaJIT. There wasn't much difference in terms of performance, but LuaJIT was a bit slower than the interpreter.

          My guess is that this is primarily because FFI is slower with LuaJIT and my code did a lot of FFI (basically everything it's doing is calling back into C++ to manipulate the text tree).

          I presume that yklua uses exactly the same memory layout as the C version, so I'd expect it to be better here.

          This is also a problem with a lot of Python JITs: If you make Python faster and make CPython-compatible FFI slower, you generally make Python programs slower.

  • llimllib@hachyderm.io (#5)

            @ltratt the videos in your post are not working on my phone

            (Cool work!)

  • ltratt@mastodon.social (#6)

              @llimllib Which browser? They work on my Android phone's browsers, but video compatibility beyond that is a bit of an unknown to me.

  • david_chisnall@infosec.exchange (#7)

                @ltratt

                There's another approach that's worth mentioning, popularised by Apple's old shader JIT, which looks like a more ad-hoc version of what you've built.

Each operation was written as a function that took a pointer to the interpreter state and updated it. The interpreter was then a big switch statement calling these functions. These typically all got inlined, so you ended up with one massive function running in a loop.

                To build the JIT, you compile those individual functions to LLVM IR, then JIT compile a function that is equivalent to the calls of a sequence of bytecode. The normal LLVM optimisers can then inline small or infrequently-used opcode bodies, and optimise across the whole program (or whole function, trace, or whatever else you want to JIT). The JIT'd code has the same interpreter state (though may update it only at the end of a trace - apparently marking it as not-aliasing-anything gets you around 10% extra performance), so you can JIT whatever size fragment makes sense.

  • llimllib@hachyderm.io (#8)

@ltratt iPhone; I'm not sure how to get error output, so this is a terrible bug report.

  • ltratt@mastodon.social (#9)

                    @david_chisnall I assumed that LuaJIT did quite a good job with FFI performance (the API it defined has spread more widely), but I haven't benchmarked it! That said, there are some heuristics in LuaJIT that do not always play well with real-world code.

                    yklua will just do whatever PUC Lua does, but it will probably inline right up until the FFI call, which might help. That said, right now, you can still hit missing bits that tank performance in any yk interpreter, so it's difficult to say!

  • david_chisnall@infosec.exchange (#10)

                      @david_chisnall A very early prototype of yk used LLVM for these purposes, but the compilation performance was awful (from memory something like 1000x worse than we needed). It's not really LLVM's fault though: we were feeding it an input it never expected to see. [We also encountered multiple threading bugs, but I imagine those have been fixed in the interim.]
