Peephole optimizations: adding `opt_respond_to` to the Ruby VM, part 4
December 31, 2024

In The Holy Grail of Ruby Syntax: adding `opt_respond_to` to the Ruby VM, part 3, I discovered what I call the “Holy Grail” of Ruby syntax. I’m exaggerating a bit, but it’s a readable, sequential way to see how most Ruby syntax compiles. Here’s a snippet from it as a reminder:

// prism_compile.c
static void
pm_compile_node(rb_iseq_t *iseq, const pm_node_t *node, LINK_ANCHOR *const ret, bool popped, pm_scope_node_t *scope_node)
{
    const pm_parser_t *parser = scope_node->parser;
    //...
    switch (PM_NODE_TYPE(node)) {
      //...
      case PM_ARRAY_NODE: {
        // [foo, bar, baz]
        // ^^^^^^^^^^^^^^^
        const pm_array_node_t *cast = (const pm_array_node_t *) node;
        pm_compile_array_node(iseq, (const pm_node_t *) cast, &cast->elements, &location, ret, popped, scope_node);
        return;
      }
      //...
      case PM_MODULE_NODE: {
        // module Foo; end
        //...
      }
      //...
    }
}

The file this code lives in, `prism_compile.c`, is huge. `pm_compile_node` itself is 1800+ lines, and the overall file is 11 thousand lines. That is daunting to say the least, but there are some obvious directions I can ignore: I’m trying to optimize a method call to `respond_to?`, so I can skip most of the Ruby syntax.

But where exactly am I going?

Sage wisdom

Helpfully, I got the same direction twice after Part 3. Once from Kevin Newton, creator of Prism:

https://x.com/kddnewton/status/1872280281409105925?s=46

And once from byroot, who inspired this entire series:

https://bsky.app/profile/byroot.bsky.social/post/3le6xypzykc2x

I don’t want to jump to conclusions, but I think I need to take a look at the Peephole Optimizer 😆.

What exactly is a “peephole optimizer”? Kevin described the process as “specialization after compilation.” From Wikipedia:

Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions (called peepholes or windows) that involves replacing the instructions with a logically equivalent set with better performance.
https://en.wikipedia.org/wiki/Peephole_optimization

This fits my goal well. I want to replace the current `opt_send_without_block` instruction with a specialized `opt_respond_to` instruction, optimized for `respond_to?` method calls.
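To make the Wikipedia definition concrete, here is a minimal sketch of a peephole pass in plain Ruby. It is a toy, not how CRuby does it: the instructions are hypothetical arrays of the form `[name, *operands]`, and `peephole_optimize` is a name made up for this illustration. It slides a two-instruction window over the list and swaps one known pattern for a single, equivalent instruction.

```ruby
# Toy peephole optimizer. Instructions are plain arrays ([name, *operands]),
# which is an assumption for illustration and not CRuby's real representation.
# A call is modelled as [:send, method_name, argc].
def peephole_optimize(insns)
  out = []
  i = 0
  while i < insns.length
    current = insns[i]
    following = insns[i + 1]

    # The "window": an empty array literal immediately followed by `.freeze`
    if current == [:newarray, 0] && following == [:send, :freeze, 0]
      out << [:opt_ary_freeze] # one specialized instruction replaces two
      i += 2
    else
      out << current           # anything else passes through unchanged
      i += 1
    end
  end
  out
end

program = [
  [:putself],
  [:newarray, 0],
  [:send, :freeze, 0],
  [:send, :puts, 1],
  [:leave],
]

optimized = peephole_optimize(program)
optimized.each { |insn| p insn }
```

The pass only ever looks at a couple of neighboring instructions, which is what keeps this kind of optimization cheap and local.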

Find an optimizer

So where does peephole optimization happen in CRuby today? In Étienne’s PR, he added the optimization code to a function called… `iseq_peephole_optimize`. It’s a little on the nose, don’t you think? Kevin’s comment also mentions `iseq_peephole_optimize`, so it seems to be the winner.

I want to connect `iseq_peephole_optimize` to where we left off in `pm_compile_node`. Let’s dig into some code!

Breaking down an existing optimization

I’ll use Étienne’s frozen array optimization as a way into the optimizer, to see how it connects to everything else. If you’d like to follow along, start with the setup instructions in Part 3.

His optimization applies only to frozen array and hash literals. So we’ll write a very small Ruby program that calls `freeze` on an empty array literal, `[].freeze`, and put it in `test.rb` at the root of our CRuby project.

The best way to run `test.rb` is with `make`. Not only does it run the file, it also makes sure things like C files are recompiled as needed when you change them. Let’s run our file, but dump the instructions it generates for the Ruby VM:

RUNOPT0=--dump=insns make runruby

`RUNOPT0` lets us add an option to the `ruby` call, so this is effectively `ruby --dump=insns test.rb`. Here are the instructions we see, and we can confirm that we are getting the optimized `opt_ary_freeze` instruction from Étienne’s PR:

== disasm: #<ISeq:<main>@./test.rb:3 (3,0)-(3,12)>
0000 putself                      (   3)[Li]
0001 opt_ary_freeze               [], <calldata!mid:freeze, argc:0, ARGS_SIMPLE>
0004 opt_send_without_block       
0006 leave
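As an aside, a similar dump can be produced from inside Ruby with `RubyVM::InstructionSequence`, without going through `make`. This is a small sketch assuming CRuby (other Ruby implementations may not provide `RubyVM`). Which instructions show up depends on the Ruby you run it with: `opt_ary_freeze` only appears on a build that includes Étienne’s optimization.

```ruby
# RubyVM::InstructionSequence is CRuby-specific.
# Compile a snippet and print its disassembly, similar to --dump=insns.
iseq = RubyVM::InstructionSequence.compile("[].freeze")
disasm = iseq.disasm
puts disasm

# Whether the specialized instruction is present depends on the Ruby build in use.
uses_opt_ary_freeze = disasm.include?("opt_ary_freeze")
puts "opt_ary_freeze present: #{uses_opt_ary_freeze}"
```

This runs against whatever `ruby` executes the script, so to inspect a freshly built CRuby it still needs to go through `make runruby`.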

You never know what your code is really doing until you run it. So far, I have only read and browsed the CRuby source code. `iseq_peephole_optimize` lives in `compile.c`, so let’s set a breakpoint and see 🕵🏼‍♂️.

Use debugging tools

We can debug C code in CRuby almost as easily as we can use `debugger` or `binding.pry` in Ruby.

On macOS you can use `lldb`; on Docker/Linux you can use `gdb`. I’ll do everything in `lldb` first, then show some equivalent `gdb` commands afterward.

Let’s first look at the peephole optimization code for `[].freeze` in `iseq_peephole_optimize`. I’ll add comments above each line to explain what I think it’s doing:

// compile.c
static int
iseq_peephole_optimize(rb_iseq_t *iseq, LINK_ELEMENT *list, const int do_tailcallopt)
{
         // ...
         // if the instruction is a `newarray` of zero length
3469:    if (IS_INSN_ID(iobj, newarray) && iobj->operands[0] == INT2FIX(0)) {
             // grab the next element after the current instruction
3470:        LINK_ELEMENT *next = iobj->link.next;
             // if `next` is an instruction, and the instruction is `send`
3471:        if (IS_INSN(next) && (IS_INSN_ID(next, send))) {
3472:            const struct rb_callinfo *ci = (struct rb_callinfo *)OPERAND_AT(next, 0);
3473:            const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
3474:
                 // if the callinfo is "simple", with zero arguments,
                 // and there isn't a block provided(?), and the method id (mid) is `freeze`
                 // which is represented by `idFreeze`
3475:            if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
                     // change the instruction to `opt_ary_freeze`
3476:                iobj->insn_id = BIN(opt_ary_freeze);
                     // remove the `send` instruction, we don't need it anymore
3481:                ELEM_REMOVE(next);

Now I’ll use `lldb` to see where this code runs relative to our Prism compilation. In CRuby, to debug you run `make lldb-ruby` instead of `make runruby`. You’ll see some setup code run, and then a prompt prefixed with `(lldb)`:

> make lldb-ruby
lldb  -o 'command script import -r ../misc/lldb_cruby.py' ruby --  ../test.rb
(lldb) target create "ruby"
Current executable set to '/Users/johncamara/Projects/ruby/build/ruby' (arm64).
(lldb) settings set -- target.run-args  "../test.rb"
(lldb) command script import -r ../misc/lldb_cruby.py
lldb scripts for ruby has been installed.
(lldb)

At this point, we haven’t actually run anything yet. We can now set breakpoints and then run the program. I’ll add a breakpoint right after all of the `if` statements have succeeded:

(lldb) break set --file compile.c --line 3476
Breakpoint 1: where = ruby`iseq_peephole_optimize + 2276 at compile.c:3476:17

With the breakpoint set, we call `run` to run the program. You’ll see something similar to the following: it runs until it hits the breakpoint, just after identifying the frozen array literal:

(lldb) run
Process 50923 launched: '/ruby/build/ruby' (arm64)
Process 50923 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:17
   3473             const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
   3474
   3475             if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
-> 3476                 iobj->insn_id = BIN(opt_ary_freeze);
   3477                 iobj->operand_size = 2;
   3478                 iobj->operands = compile_data_calloc2(iseq, iobj->operand_size, sizeof(VALUE));
   3479                 iobj->operands[0] = rb_cArray_empty_frozen;

I’d like to see where this sits relative to our Prism compile code. We can use `bt` to get a backtrace:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:29
    frame #1: ruby`iseq_optimize(...) at compile.c:4352:17
    frame #2: ruby`iseq_setup_insn(...) at compile.c:1619:5
    frame #3: ruby`pm_iseq_compile_node(...) at prism_compile.c:10139:5
    frame #4: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
    frame #5: ruby`rb_protect(...) at eval.c:1033:18
    frame #6: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
    frame #7: ruby`pm_new_child_iseq(...) at prism_compile.c:1271:27
    frame #8: ruby`pm_compile_node(...) at prism_compile.c:9458:40
    frame #9: ruby`pm_compile_node(...) at prism_compile.c:9911:17
    frame #10: ruby`pm_compile_scope_node(...) at prism_compile.c:6598:13
    frame #11: ruby`pm_compile_node(...) at prism_compile.c:9784:9
    frame #12: ruby`pm_iseq_compile_node(...) at prism_compile.c:10122:9
    frame #13: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
    frame #14: ruby`rb_protect(...) at eval.c:1033:18
    frame #15: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
    frame #16: ruby`pm_iseq_new_top(...) at iseq.c:906:12
    frame #17: ruby`load_iseq_eval(...) at load.c:756:24
    frame #18: ruby`require_internal(...) at load.c:1296:21
    frame #19: ruby`rb_require_string_internal(...) at load.c:1402:22
    frame #20: ruby`rb_require_string(...) at load.c:1388:12
    frame #21: ruby`rb_f_require(...) at load.c:1029:12
    frame #22: ruby`ractor_safe_call_cfunc_1(...) at vm_insnhelper.c:3624:12
    frame #23: ruby`vm_call_cfunc_with_frame_(...) at vm_insnhelper.c:3801:11
    frame #24: ruby`vm_call_cfunc_with_frame(...) at vm_insnhelper.c:3847:12
    frame #25: ruby`vm_call_cfunc_other(...) at vm_insnhelper.c:3873:16
    frame #26: ruby`vm_call_cfunc(...) at vm_insnhelper.c:3955:12
    frame #27: ruby`vm_call_method_each_type(...) at vm_insnhelper.c:4779:16
    frame #28: ruby`vm_call_method(...) at vm_insnhelper.c:4916:20
    frame #29: ruby`vm_call_general(...) at vm_insnhelper.c:4949:12
    frame #30: ruby`vm_sendish(...) at vm_insnhelper.c:5968:15
    frame #31: ruby`vm_exec_core(...) at insns.def:898:11
    frame #32: ruby`rb_vm_exec(...) at vm.c:2595:22
    frame #33: ruby`rb_iseq_eval(...) at vm.c:2850:11
    frame #34: ruby`rb_load_with_builtin_functions(...) at builtin.c:54:5
    frame #35: ruby`Init_builtin_features at builtin.c:74:5
    frame #36: ruby`ruby_init_prelude at ruby.c:1750:5
    frame #37: ruby`ruby_opt_init(...) at ruby.c:1811:5
    frame #38: ruby`prism_script(...) at ruby.c:2215:13
    frame #39: ruby`process_options(...) at ruby.c:2538:9
    frame #40: ruby`ruby_process_options(...) at ruby.c:3169:12
    frame #41: ruby`ruby_options(...) at eval.c:117:16
    frame #42: ruby`rb_main(...) at main.c:43:26
    frame #43: ruby`main(...) at main.c:68:12

Wow. That is huge! This is not the backtrace I was expecting. It seems I missed a code path in my earlier exploration. I had it right up until `prism_script`:

  • main
  • which calls rb_main
  • which calls ruby_options, then ruby_process_options, then process_options
  • which calls prism_script
  • The next call I expected was pm_iseq_new_main, but instead we enter ruby_opt_init
  • which calls Init_builtin_features

This path seems to run through some gem preloading logic, which is why we see the `rb_require` calls:

void
Init_builtin_features(void)
{
    rb_load_with_builtin_functions("gem_prelude", NULL);
}

By default, CRuby loads `gem_prelude`, which lives in `ruby/gem_prelude.rb`. Here is the file, shortened for brevity:

require 'rubygems'
require 'error_highlight'
require 'did_you_mean'
require 'syntax_suggest/core_ext'

Just-in-time compilation

I learned something here that seems obvious in hindsight, but I hadn’t considered it. Ruby only compiles what is actually loaded, and only when it is loaded. If I never load a particular piece of code, it never gets compiled. And if I defer loading it until later, it isn’t compiled until later.

We can actually see this by deferring a `require`:

sleep 10

require "net/http"

If we run this using `make lldb-ruby`, we can see the delayed compilation in action:

(lldb) break set --file ruby.c --line 2616
(lldb) run
// hits our prism compile code
(lldb) next
(lldb) break set --file compile.c --line 3476
(lldb) continue
// waits 10 seconds, then compiles the contents of "net/http"
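The same point can be seen without a debugger. The sketch below is an illustration I’m adding, not something from the CRuby source: it writes a file with a deliberate syntax error into a temporary directory (the file name is made up). Nothing goes wrong while the file merely exists on disk. The error only surfaces at the moment of `require`, because that is when Ruby actually parses and compiles the file.

```ruby
require "tmpdir"

events = []

Dir.mktmpdir do |dir|
  # Hypothetical file name; the content is intentionally invalid Ruby.
  path = File.join(dir, "never_compiled_until_required.rb")
  File.write(path, "def broken(\n")
  events << :file_written # no error yet: the file has not been loaded

  begin
    require path # parsing and compilation happen here, at load time
  rescue SyntaxError
    events << :syntax_error_on_require
  end
end

p events
```

A syntax error is only a visible stand-in for compilation here, but it shows the same ordering the debugger does: code that is never loaded is never compiled.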

Enter our test.rb file

I’d rather only see the code in `test.rb` being compiled, so I’ll set a breakpoint directly on the call to `pm_iseq_new_main`. For me, that is in `ruby.c` on line 2616:

(lldb) break set --file ruby.c --line 2616
(lldb) run
Process 32534 launched: '/ruby/build/ruby' (arm64)
Process 32534 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: ruby`process_options(...) at ruby.c:2616:38
   2613         if (!result.ast) {
   2614             pm_parse_result_t *pm = &result.prism;
   2615             int error_state;
-> 2616             iseq = pm_iseq_new_main(&pm->node, opt->script_name, path, parent, optimize, &error_state);
   2617
   2618             pm_parse_result_free(pm);
   2619

Now when we run a backtrace, I see what I expected, since we’ve skipped past the `gem_prelude` compilation. This is the exact path I walked through in Part 2:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: ruby`process_options(...) at ruby.c:2616:38
    frame #1: ruby`ruby_process_options(...) at ruby.c:3169:12
    frame #2: ruby`ruby_options(...) at eval.c:117:16
    frame #3: ruby`rb_main(...) at main.c:43:26
    frame #4: ruby`main(...) at main.c:68:12

From here, we can set our `iseq_peephole_optimize` breakpoint and see only our own code being compiled. Since the program is already running, we call `continue` to resume execution:

(lldb) break set --file compile.c --line 3476
Breakpoint 2: where = ruby`iseq_peephole_optimize + 2276 at compile.c:3476:17
(lldb) continue
Process 55336 resuming
Process 55336 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: ruby`iseq_peephole_optimize() at compile.c:3476:17
   3473             const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
   3474
   3475             if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
-> 3476                 iobj->insn_id = BIN(opt_ary_freeze);
   3477                 iobj->operand_size = 2;
   3478                 iobj->operands = compile_data_calloc2(iseq, iobj->operand_size, sizeof(VALUE));
   3479                 iobj->operands[0] = rb_cArray_empty_frozen;

If we call `bt` from here, we finally see the connection between `prism_compile.c` and `compile.c`. `pm_iseq_compile_node` calls `iseq_setup_insn`, which runs the optimization logic. I saw `iseq_setup_insn` in a previous post, but I had no idea what it meant or what it did. Now we know. This is what Kevin Newton mentioned earlier: specialization comes after compilation. Prism compiles the node in the standard way, and then the peephole optimization layer, the specialization, is applied afterward:

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
  * frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:17
    frame #1: ruby`iseq_optimize(...) at compile.c:4352:17
    frame #2: ruby`iseq_setup_insn(...) at compile.c:1619:5
    frame #3: ruby`pm_iseq_compile_node(...) at prism_compile.c:10139:5
    frame #4: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
    frame #5: ruby`rb_protect(...) at eval.c:1033:18
    frame #6: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
    frame #7: ruby`pm_iseq_new_main(...) at iseq.c:930:12
    frame #8: ruby`process_options(...) at ruby.c:2616:20
    frame #9: ruby`ruby_process_options(...) at ruby.c:3169:12
    frame #10: ruby`ruby_options(...) at eval.c:117:16
    frame #11: ruby`rb_main(...) at main.c:43:26
    frame #12: ruby`main(...) at main.c:68:12

From here we can inspect the current instruction using `expr`:

(lldb) expr *(iobj)
(INSN) $4 = {
  link = {
    type = ISEQ_ELEMENT_INSN
    next = 0x000000011f6568d0
    prev = 0x000000011f656850
  }
  insn_id = YARVINSN_newarray
  operand_size = 1
  sc_state = 0
  operands = 0x000000011f640118
  insn_info = (line_no = 1, node_id = 3, events = 0)
}

We can see that `iobj` contains links to the neighboring instructions, its `insn_id`, and some other metadata. The instruction is currently `YARVINSN_newarray`. If we run `next`, that should execute `iobj->insn_id = BIN(opt_ary_freeze);` and our instruction should change:

(lldb) next
(lldb) expr *(iobj)
(INSN) $5 = {
  //...
  insn_id = YARVINSN_opt_ary_freeze
  //...
}

It does! The instruction has changed from `newarray` to `opt_ary_freeze`! The optimization is at least partially complete (I’m not sure whether there is more to it).
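What we just watched in the debugger, an instruction rewritten in place and its neighbor unlinked, can be modelled in a few lines of Ruby. This is a rough sketch under my own assumptions: `Insn`, `link`, `elem_remove`, and `optimize_in_place` are hypothetical names, and the operands are simplified (the real `opt_ary_freeze` gets two operands, as the lldb listing shows).

```ruby
# Toy model of a doubly linked instruction list. All names are made up for
# this sketch; CRuby's real structures live in compile.c.
Insn = Struct.new(:insn_id, :operands, :prev_insn, :next_insn)

# Wire up prev/next links and return the head of the list.
def link(insns)
  insns.each_cons(2) do |a, b|
    a.next_insn = b
    b.prev_insn = a
  end
  insns.first
end

# Rough analogue of ELEM_REMOVE: unlink a node from its neighbors.
def elem_remove(node)
  node.prev_insn.next_insn = node.next_insn if node.prev_insn
  node.next_insn.prev_insn = node.prev_insn if node.next_insn
end

def optimize_in_place(head)
  iobj = head
  while iobj
    following = iobj.next_insn
    if iobj.insn_id == :newarray && iobj.operands == [0] &&
       following && following.insn_id == :send && following.operands == [:freeze, 0]
      iobj.insn_id = :opt_ary_freeze # rewrite the current instruction in place
      iobj.operands = []             # simplified; the real one stores operands
      elem_remove(following)         # the `send` is no longer needed
    end
    iobj = iobj.next_insn
  end
  head
end

# Walk the list and collect instruction names.
def insn_ids(head)
  ids = []
  node = head
  while node
    ids << node.insn_id
    node = node.next_insn
  end
  ids
end

head = link([
  Insn.new(:putself, []),
  Insn.new(:newarray, [0]),
  Insn.new(:send, [:freeze, 0]),
  Insn.new(:leave, []),
])

optimize_in_place(head)
ids = insn_ids(head)
p ids
```

Mutating the node rather than building a new list mirrors what the C code does with `iobj->insn_id` and `ELEM_REMOVE(next)`.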

Taking a small step toward `opt_respond_to`

This is already the longest and densest post in the series. But I’d love to make some real progress on the new instruction. Let’s try pattern matching a `respond_to?` call in the peephole optimizer.

This is our sample program:

puts "Did you know you can write to $stdout?" if $stdout.respond_to?(:write)

Running it with `RUNOPT0=--dump=insns make runruby`, we get the following instructions:

== disasm: #<ISeq:<main>@./test.rb:1 (1,0)-(1,76)>
0000 getglobal                              :$stdout                  (   1)[Li]
0002 putobject                              :write
0004 opt_send_without_block                 <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
0006 branchunless                           14
0008 putself
0009 putchilledstring                       "Did you know you can write to $stdout?"
0011 opt_send_without_block                 <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE>
0013 leave
0014 putnil
0015 leave

I want to match on this line:

0004 opt_send_without_block                 <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>

Here is my attempt. I want to copy what the `newarray`/`freeze` optimization is doing, changing just a few things to match my example. Right below the `newarray` code we’ve been debugging, I add this:

// If the instruction is a `send`. The peephole pass runs before `send` is
// specialized, so this is the call the dump shows as `0004 opt_send_without_block`
if (IS_INSN_ID(iobj, send)) {
    // Pull the same info the `newarray` optimization does
    const struct rb_callinfo *ci = (struct rb_callinfo *)OPERAND_AT(iobj, 0);
    const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(iobj, 1);

    // Call data from the dump: <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
    // 1. We have ARGS_SIMPLE, which is probably what `vm_ci_simple(ci)` checks for
    // 2. We have argc:1, which should match `vm_ci_argc(ci) == 1`
    // 3. We send without a block, hence blockiseq == NULL
    // 4. The method id (mid) for `vm_ci_mid(ci)` matches `idRespond_to`. I searched around for names
    //    that seemed similar to idFreeze, replacing `idFreeze` with `idRespond`, and found `idRespond_to`
    if (vm_ci_simple(ci) && vm_ci_argc(ci) == 1 && blockiseq == NULL && vm_ci_mid(ci) == idRespond_to) {
        // Placeholder statement so there is a line to set a breakpoint on
        int i = 0;
    }
}

Now I’ll follow the same debugging steps as before, but I’ll add a breakpoint in `compile.c` inside the new code I added. Specifically, I set the breakpoint on `int i = 0;` so that I only stop once I’m inside the `if` statement:

(lldb) break set --file ruby.c --line 2616
Breakpoint 1: where = ruby`process_options + 4068 at ruby.c:2616:38
(lldb) run
(lldb) break set --file compile.c --line 3491
Breakpoint 2: where = ruby`iseq_peephole_optimize + 2536 at compile.c:3491:17
(lldb) continue
Process 61925 resuming
Process 61925 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
    frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3491:17
   3488         const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(iobj, 1);
   3489
   3490         if (vm_ci_simple(ci) && vm_ci_argc(ci) == 1 && blockiseq == NULL && vm_ci_mid(ci) == idRespond_to) {
-> 3491             int i = 0;
   3492         }
   3493     }
   3494

I think it works! It pattern matches the characteristics of the `respond_to?` call and hits the breakpoint set on `int i = 0;`. It’s a small step, but it’s the first step toward adding the optimization.
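For a quick sanity check from the Ruby side, the same characteristics can be read back out of a compiled instruction sequence. This is a sketch assuming CRuby, where `RubyVM::InstructionSequence#to_a` returns the instruction body as its last element and represents call data as a Hash with `:mid` and `:orig_argc` keys. It does not touch the C optimizer; it only confirms what a matcher should be looking for.

```ruby
# CRuby-specific sketch: find call instructions that look like `respond_to?`
# with exactly one argument, mirroring the checks in the C `if` statement.
source = 'puts "Did you know you can write to $stdout?" if $stdout.respond_to?(:write)'
iseq = RubyVM::InstructionSequence.compile(source)

# The body mixes line numbers, labels, and instruction arrays; it is the last element.
body = iseq.to_a.last

candidates = body.select do |insn|
  next false unless insn.is_a?(Array)

  # Call data shows up as a Hash operand carrying the method id and argc.
  calldata = insn.drop(1).find { |operand| operand.is_a?(Hash) && operand.key?(:mid) }
  calldata && calldata[:mid] == :respond_to? && calldata[:orig_argc] == 1
end

candidates.each { |insn| p insn }
```

The instruction name printed here depends on the Ruby version, which is why the sketch matches on the call data rather than on the instruction name.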

Using gdb

For anyone who wants to do the same thing with `gdb`, it is very similar. First, create a `breakpoints.gdb` file in the root of the project. It sets an initial breakpoint for you, much like setting a breakpoint in `lldb` before calling `run`. Here that initial breakpoint is on the same `ruby.c` line 2616 as before.

When you run `make gdb-ruby`, it stops there, and you can use the same backtrace command, `bt`:

> make gdb-ruby
Thread 1 "ruby" hit Breakpoint 4, process_options (...) at ../ruby.c:2616
2616	            iseq = pm_iseq_new_main(&pm->node, opt->script_name, path, parent, optimize, &error_state);
(gdb) bt
#0  process_options (...) at ../ruby.c:2616
#1  in ruby_process_options (...) at ../ruby.c:3169
#2  in ruby_options (...) at ../eval.c:117
#3  in rb_main (...) at ../main.c:43
#4  in main (...) at ../main.c:68
(gdb)

From here you can set the next breakpoint, so that you only see the `newarray` instruction from our `test.rb` program being compiled:

(gdb) break compile.c:3476
Breakpoint 5 at 0xaaaabaa22f14: file ../compile.c, line 3476
(gdb) continue
Continuing.

Thread 1 "ruby" hit Breakpoint 5, iseq_peephole_optimize (...) at ../compile.c:3476
3476	                iobj->insn_id = BIN(opt_ary_freeze);

Similar to the `lldb` command `expr`, we can inspect local values using `p` or `print` in `gdb`:

(gdb) p *(iobj)
$2 = {link = {type = ISEQ_ELEMENT_INSN, next = 0xaaaace797ef0, prev = 0xaaaace797e70}, insn_id = YARVINSN_newarray,
  operand_size = 1, sc_state = 0, operands = 0xaaaace796ac8, insn_info = {line_no = 1, node_id = 3, events = 0}}

Finished

Well, that was a long one. So glad you stuck with me! We’ve found the optimizer, and we’ve pattern matched a `respond_to?` call. Next, we need to add the new instruction definition and try to actually replace the `send` with our new instruction. See you next time! 👋🏼

2024-12-28 10:32:36
