
Peephole optimizations: adding `opt_respond_to` to the Ruby VM, part 4
In The Ruby Syntax Holy Grail: adding opt_respond_to to the Ruby VM, part 3, I discovered what I call the “Holy Grail” of Ruby syntax. I’m exaggerating a bit, but it’s a readable, sequential way to see how most Ruby syntax is compiled. Here’s a snippet from it as a reminder:
// prism_compile.c
static void
pm_compile_node(rb_iseq_t *iseq, const pm_node_t *node, LINK_ANCHOR *const ret, bool popped, pm_scope_node_t *scope_node)
{
    const pm_parser_t *parser = scope_node->parser;
    //...
    switch (PM_NODE_TYPE(node)) {
      //...
      case PM_ARRAY_NODE: {
        // [foo, bar, baz]
        // ^^^^^^^^^^^^^^^
        const pm_array_node_t *cast = (const pm_array_node_t *) node;
        pm_compile_array_node(iseq, (const pm_node_t *) cast, &cast->elements, &location, ret, popped, scope_node);
        return;
      }
      //...
      case PM_MODULE_NODE: {
        // module Foo; end
        //...
      }
      //...
    }
}
The file where that code lives, prism_compile.c, is huge. pm_compile_node itself is 1800+ lines, and the overall file is 11 thousand lines. It’s daunting to say the least, but there are some obvious directions I can ignore: I’m trying to optimize a method call to respond_to?, so I can skip most of the Ruby syntax.
But where exactly am I going?
Sage wisdom
Helpfully, I got two pointers in the same direction after Part 3. One came from Kevin Newton, creator of Prism.
Another came from byroot, who inspired this entire series:
https://bsky.app/profile/byroot.bsky.social/post/3le6xypzykc2x
I don’t want to jump to conclusions, but I think I need to take a look at the Peephole Optimizer 😆.
What exactly is a “peephole optimizer”? Kevin described the process as “specialization comes after compilation.” From Wikipedia:
Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions (called peepholes or windows) that involves replacing the instructions with a logically equivalent set with better performance.
https://en.wikipedia.org/wiki/Peephole_optimization
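To make that definition concrete, here is a minimal sketch of a peephole pass in Ruby. The instruction names (push, pop, dup, add, ret) and the peephole method are made up for illustration and are not CRuby’s instruction format. The point is only the shape of the technique: slide a small window over the instruction list and swap a matched pattern for something cheaper but equivalent.

```ruby
# Toy peephole optimizer over a made-up stack-machine instruction list.
# Instructions are arrays such as [:push, 1] or [:pop]. This is an
# illustration of the technique, not CRuby's real instruction format.
def peephole(insns)
  out = []
  insns.each do |insn|
    prev = out.last
    if prev && prev[0] == :push && insn[0] == :pop
      # A push immediately followed by a pop has no net effect: drop both.
      out.pop
    elsif prev && prev[0] == :push && insn == [:push, prev[1]]
      # Pushing the same constant twice: duplicate the top of the stack instead.
      out << [:dup]
    else
      out << insn
    end
  end
  out
end

program = [[:push, 1], [:pop], [:push, 2], [:push, 2], [:add], [:ret]]
p peephole(program)
# prints [[:push, 2], [:dup], [:add], [:ret]]
```

The window here is just two instructions wide (the previous one and the current one), which is the same scale the CRuby optimizer works at when it looks at an instruction and its neighbor.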
This fits my goal very well. I want to replace the current opt_send_without_block instruction with a specialized opt_respond_to instruction, optimized for respond_to? method calls.
Find an optimizer
So where does peephole optimization happen in CRuby today? In Étienne’s PR, he added the optimization code to a function called… iseq_peephole_optimize. It’s a little on the nose, don’t you think? Kevin’s comment also mentioned iseq_peephole_optimize, so it seems to be the winner.
I want to connect iseq_peephole_optimize to where we left off in pm_compile_node. Let’s dig into some code!
Dissecting an existing optimization
I’ll use Étienne’s frozen array optimization as a way into the optimizer, to see how it connects to the rest of the compiler. If you’d like to follow along, start with the setup instructions in Part 3.
His optimization only applies to frozen array and hash literals. So we’re going to write a very small Ruby program that demonstrates it and put it in test.rb, in the root directory of our CRuby project.
The best way to run test.rb is with make. Not only does it run the file, it also makes sure things like C files are recompiled as needed when you make changes. Let’s run our file, but dump the instructions it produces for the Ruby VM:
RUNOPT0=--dump=insns make runruby
RUNOPT0 lets us add an option to the ruby call, so this is effectively ruby --dump=insns test.rb. Here are the instructions we see. We can confirm that we are getting the optimized opt_ary_freeze instruction from Étienne’s PR:
== disasm: #<ISeq:<main>@./test.rb:3 (3,0)-(3,12)>
0000 putself ( 3)[Li]
0001 opt_ary_freeze [], <calldata!mid:freeze, argc:0, ARGS_SIMPLE>
0004 opt_send_without_block
0006 leave
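If you want to see a similar dump without going through make, a hedged alternative is to ask a running CRuby for it. This sketch assumes CRuby (RubyVM::InstructionSequence does not exist on other implementations) and uses pp [].freeze as an assumed stand-in for test.rb. Whether you see opt_ary_freeze or a newarray followed by a send of freeze depends on the Ruby version you run it with.

```ruby
# Assumes CRuby: RubyVM::InstructionSequence is implementation-specific.
# dump_insns is a helper name chosen for this sketch.
def dump_insns(source)
  # Compile the source string and return the human-readable instruction listing.
  RubyVM::InstructionSequence.compile(source).disasm
end

# "pp [].freeze" is an assumed stand-in for the contents of test.rb.
puts dump_insns("pp [].freeze")
```

This only inspects the result of compilation. To change what gets compiled we still have to go into the C code.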
You never know what your code is really doing until you run it. So far, I have only read and browsed the CRuby source code. iseq_peephole_optimize lives in compile.c, so let’s set a breakpoint and take a look 🕵🏼♂️.
Use debugging tools
We can debug C code in CRuby almost as easily as we can use debugger/binding.pry.
On macOS you can use lldb, and on Docker/Linux you can use gdb. I’ll do everything in lldb first, then show some equivalent gdb commands afterwards.
Let’s first look at the peephole optimization code for [].freeze, in iseq_peephole_optimize. I’ve added comments above each line to explain what I think it’s doing:
// compile.c
static int
iseq_peephole_optimize(rb_iseq_t *iseq, LINK_ELEMENT *list, const int do_tailcallopt)
{
      // ...
      // if the instruction is a `newarray` of zero length
3469: if (IS_INSN_ID(iobj, newarray) && iobj->operands[0] == INT2FIX(0)) {
          // grab the next element after the current instruction
3470:     LINK_ELEMENT *next = iobj->link.next;
          // if `next` is an instruction, and the instruction is `send`
3471:     if (IS_INSN(next) && (IS_INSN_ID(next, send))) {
3472:         const struct rb_callinfo *ci = (struct rb_callinfo *)OPERAND_AT(next, 0);
3473:         const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
3474:
              // if the callinfo is "simple", with zero arguments,
              // and there isn't a block provided(?), and the method id (mid) is `freeze`
              // which is represented by `idFreeze`
3475:         if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
                  // change the instruction to `opt_ary_freeze`
3476:             iobj->insn_id = BIN(opt_ary_freeze);
                  // ...
                  // remove the `send` instruction, we don't need it anymore
3481:             ELEM_REMOVE(next);
                  // ...
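To check my reading of that C code, here is a toy model of the same rewrite in Ruby. Everything in it is a stand-in: Insn and CallInfo are plain Structs I made up, not CRuby’s INSN or rb_callinfo, and keeping the call info as the second operand of the new instruction is an assumption of this sketch. It only mirrors the four conditions in the if statement and the “rewrite one instruction, remove the next” shape.

```ruby
# Toy model only: Struct stand-ins for CRuby's INSN and rb_callinfo.
Insn = Struct.new(:id, :operands)
CallInfo = Struct.new(:mid, :argc, :simple)

def optimize_empty_array_freeze(insns)
  out = []
  i = 0
  while i < insns.length
    insn = insns[i]
    nxt = insns[i + 1]
    # A zero-length newarray followed directly by a send...
    if insn.id == :newarray && insn.operands[0] == 0 && nxt && nxt.id == :send
      ci, blockiseq = nxt.operands
      # ...that is a simple, zero-argument, block-less call to freeze.
      if ci.simple && ci.argc == 0 && blockiseq.nil? && ci.mid == :freeze
        # Rewrite the pair into one instruction. Keeping ci as the second
        # operand is an assumption made for this sketch.
        out << Insn.new(:opt_ary_freeze, [[].freeze, ci])
        i += 2 # skip the send, the equivalent of ELEM_REMOVE(next)
        next
      end
    end
    out << insn
    i += 1
  end
  out
end

freeze_ci = CallInfo.new(:freeze, 0, true)
program = [
  Insn.new(:putself, []),
  Insn.new(:newarray, [0]),
  Insn.new(:send, [freeze_ci, nil]),
  Insn.new(:leave, [])
]
p optimize_empty_array_freeze(program).map(&:id)
# prints [:putself, :opt_ary_freeze, :leave]
```

A non-empty array (newarray with an operand other than 0) or a send with a block falls through untouched, which matches how narrow the real check is.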
Now I’ll use lldb to see where this code runs relative to our Prism compilation. In CRuby, to debug you run make lldb-ruby instead of make runruby. You’ll see some setup code run, and then a prompt prefixed with (lldb):
> make lldb-ruby
lldb -o 'command script import -r ../misc/lldb_cruby.py' ruby -- ../test.rb
(lldb) target create "ruby"
Current executable set to '/Users/johncamara/Projects/ruby/build/ruby' (arm64).
(lldb) settings set -- target.run-args "../test.rb"
(lldb) command script import -r ../misc/lldb_cruby.py
lldb scripts for ruby has been installed.
(lldb)
At this point, we haven’t actually run anything yet. We can now set breakpoints and run the program. I’ll add a breakpoint just inside the if statement, on the line that runs once the check has succeeded:
(lldb) break set --file compile.c --line 3476
Breakpoint 1: where = ruby`iseq_peephole_optimize + 2276 at compile.c:3476:17
With the breakpoint set, we call run to run the program.
You’ll see something similar to the following. It runs the program until it hits the breakpoint, just after identifying the frozen array literal:
(lldb) run
Process 50923 launched: '/ruby/build/ruby' (arm64)
Process 50923 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:17
3473 const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
3474
3475 if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
-> 3476 iobj->insn_id = BIN(opt_ary_freeze);
3477 iobj->operand_size = 2;
3478 iobj->operands = compile_data_calloc2(iseq, iobj->operand_size, sizeof(VALUE));
3479 iobj->operands[0] = rb_cArray_empty_frozen;
I’d like to see where we are relative to all of our Prism compilation code. We can use bt to get a backtrace:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:29
frame #1: ruby`iseq_optimize(...) at compile.c:4352:17
frame #2: ruby`iseq_setup_insn(...) at compile.c:1619:5
frame #3: ruby`pm_iseq_compile_node(...) at prism_compile.c:10139:5
frame #4: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
frame #5: ruby`rb_protect(...) at eval.c:1033:18
frame #6: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
frame #7: ruby`pm_new_child_iseq(...) at prism_compile.c:1271:27
frame #8: ruby`pm_compile_node(...) at prism_compile.c:9458:40
frame #9: ruby`pm_compile_node(...) at prism_compile.c:9911:17
frame #10: ruby`pm_compile_scope_node(...) at prism_compile.c:6598:13
frame #11: ruby`pm_compile_node(...) at prism_compile.c:9784:9
frame #12: ruby`pm_iseq_compile_node(...) at prism_compile.c:10122:9
frame #13: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
frame #14: ruby`rb_protect(...) at eval.c:1033:18
frame #15: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
frame #16: ruby`pm_iseq_new_top(...) at iseq.c:906:12
frame #17: ruby`load_iseq_eval(...) at load.c:756:24
frame #18: ruby`require_internal(...) at load.c:1296:21
frame #19: ruby`rb_require_string_internal(...) at load.c:1402:22
frame #20: ruby`rb_require_string(...) at load.c:1388:12
frame #21: ruby`rb_f_require(...) at load.c:1029:12
frame #22: ruby`ractor_safe_call_cfunc_1(...) at vm_insnhelper.c:3624:12
frame #23: ruby`vm_call_cfunc_with_frame_(...) at vm_insnhelper.c:3801:11
frame #24: ruby`vm_call_cfunc_with_frame(...) at vm_insnhelper.c:3847:12
frame #25: ruby`vm_call_cfunc_other(...) at vm_insnhelper.c:3873:16
frame #26: ruby`vm_call_cfunc(...) at vm_insnhelper.c:3955:12
frame #27: ruby`vm_call_method_each_type(...) at vm_insnhelper.c:4779:16
frame #28: ruby`vm_call_method(...) at vm_insnhelper.c:4916:20
frame #29: ruby`vm_call_general(...) at vm_insnhelper.c:4949:12
frame #30: ruby`vm_sendish(...) at vm_insnhelper.c:5968:15
frame #31: ruby`vm_exec_core(...) at insns.def:898:11
frame #32: ruby`rb_vm_exec(...) at vm.c:2595:22
frame #33: ruby`rb_iseq_eval(...) at vm.c:2850:11
frame #34: ruby`rb_load_with_builtin_functions(...) at builtin.c:54:5
frame #35: ruby`Init_builtin_features at builtin.c:74:5
frame #36: ruby`ruby_init_prelude at ruby.c:1750:5
frame #37: ruby`ruby_opt_init(...) at ruby.c:1811:5
frame #38: ruby`prism_script(...) at ruby.c:2215:13
frame #39: ruby`process_options(...) at ruby.c:2538:9
frame #40: ruby`ruby_process_options(...) at ruby.c:3169:12
frame #41: ruby`ruby_options(...) at eval.c:117:16
frame #42: ruby`rb_main(...) at main.c:43:26
frame #43: ruby`main(...) at main.c:68:12
Wow. That is big! It’s also not the backtrace I was expecting. It seems I missed a code path in my earlier exploration. I had it right up until prism_script:
- main calls rb_main
- which calls ruby_options, then ruby_process_options, then process_options
- which calls prism_script
- The next call I expected was pm_iseq_new_main, but instead we enter ruby_opt_init
- which calls ruby_init_prelude and then Init_builtin_features
This path goes through some gem preloading logic, which is why we see the require calls (rb_f_require, rb_require_string) in the backtrace:
void
Init_builtin_features(void)
{
rb_load_with_builtin_functions("gem_prelude", NULL);
}
By default CRuby loads gem_prelude, which lives in ruby/gem_prelude.rb. Here is the file, shortened for brevity:
require 'rubygems'
require 'error_highlight'
require 'did_you_mean'
require 'syntax_suggest/core_ext'
Compiling on load
I learned something here that seems obvious in hindsight, but I hadn’t considered it. Ruby only compiles what is actually loaded, and only at the moment it is loaded. If I never load a specific piece of code, it never gets compiled. If I defer loading it until later, it isn’t compiled until later.
We can actually prove this by deferring a require:
sleep 10
require "net/http"
If we run this using make lldb-ruby, we can see the delayed compilation happen:
(lldb) break set --file ruby.c --line 2616
(lldb) run
// hits our prism compile code
(lldb) next
(lldb) break set --file compile.c --line 3476
(lldb) continue
// waits 10 seconds, then compiles the contents of "net/http"
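Another way to convince yourself of this, without a debugger, is to require a file that cannot be compiled at all. This is a small sketch using only the standard library. The method name lazy_compile_events and the temporary file are my own choices for the demonstration. If Ruby parsed and compiled files ahead of time, the program could not get as far as the line before the require.

```ruby
require "tempfile"

# Records the order of events when a file containing a syntax error is
# required late in a program. lazy_compile_events is a name made up here.
def lazy_compile_events
  events = []
  Tempfile.create(["broken", ".rb"]) do |file|
    file.write("def oops(\n") # deliberately invalid Ruby
    file.flush
    # This line runs fine: the broken file has not been parsed or compiled yet.
    events << :before_require
    begin
      require file.path
    rescue SyntaxError
      # Parsing and compilation happen here, at load time.
      events << :syntax_error_at_require
    end
  end
  events
end

p lazy_compile_events
# prints [:before_require, :syntax_error_at_require]
```

The error only surfaces at the require call, which is the same moment the breakpoint in iseq_peephole_optimize fires for net/http after the sleep.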
Enter our test.rb file
I’d rather only see the code from test.rb being compiled, so I set a breakpoint directly on the pm_iseq_new_main call, which for me is in ruby.c on line 2616:
(lldb) break set --file ruby.c --line 2616
(lldb) run
Process 32534 launched: '/ruby/build/ruby' (arm64)
Process 32534 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: ruby`process_options(...) at ruby.c:2616:38
2613 if (!result.ast) {
2614 pm_parse_result_t *pm = &result.prism;
2615 int error_state;
-> 2616 iseq = pm_iseq_new_main(&pm->node, opt->script_name, path, parent, optimize, &error_state);
2617
2618 pm_parse_result_free(pm);
2619
Now when we run the backtrace I see what I expected, since we have skipped past the gem_prelude compilation. This is the exact path I walked through in Part 2:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: ruby`process_options(...) at ruby.c:2616:38
frame #1: ruby`ruby_process_options(...) at ruby.c:3169:12
frame #2: ruby`ruby_options(...) at eval.c:117:16
frame #3: ruby`rb_main(...) at main.c:43:26
frame #4: ruby`main(...) at main.c:68:12
From here, we can set our iseq_peephole_optimize breakpoint and see only our specific code being compiled. Since we are already running the program, we call continue to resume execution:
(lldb) break set --file compile.c --line 3476
Breakpoint 2: where = ruby`iseq_peephole_optimize + 2276 at compile.c:3476:17
(lldb) continue
Process 55336 resuming
Process 55336 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
frame #0: ruby`iseq_peephole_optimize() at compile.c:3476:17
3473 const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(next, 1);
3474
3475 if (vm_ci_simple(ci) && vm_ci_argc(ci) == 0 && blockiseq == NULL && vm_ci_mid(ci) == idFreeze) {
-> 3476 iobj->insn_id = BIN(opt_ary_freeze);
3477 iobj->operand_size = 2;
3478 iobj->operands = compile_data_calloc2(iseq, iobj->operand_size, sizeof(VALUE));
3479 iobj->operands[0] = rb_cArray_empty_frozen;
If we call bt from here, we finally see the connection between prism_compile.c and compile.c. pm_iseq_compile_node calls iseq_setup_insn, which runs the optimization logic. In a previous post I saw iseq_setup_insn, but I had no idea what it meant or what it did. Now we know. This is what Kevin Newton mentioned before: specialization comes after compilation. Prism compiles the node in the standard way, and then the peephole optimization layer (the specialization) is applied afterwards:
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
* frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3476:17
frame #1: ruby`iseq_optimize(...) at compile.c:4352:17
frame #2: ruby`iseq_setup_insn(...) at compile.c:1619:5
frame #3: ruby`pm_iseq_compile_node(...) at prism_compile.c:10139:5
frame #4: ruby`pm_iseq_new_with_opt_try(...) at iseq.c:1029:5
frame #5: ruby`rb_protect(...) at eval.c:1033:18
frame #6: ruby`pm_iseq_new_with_opt(...) at iseq.c:1082:5
frame #7: ruby`pm_iseq_new_main(...) at iseq.c:930:12
frame #8: ruby`process_options(...) at ruby.c:2616:20
frame #9: ruby`ruby_process_options(...) at ruby.c:3169:12
frame #10: ruby`ruby_options(...) at eval.c:117:16
frame #11: ruby`rb_main(...) at main.c:43:26
frame #12: ruby`main(...) at main.c:68:12
From here we can use expr to inspect the current instruction:
(lldb) expr *(iobj)
(INSN) $4 = {
link = {
type = ISEQ_ELEMENT_INSN
next = 0x000000011f6568d0
prev = 0x000000011f656850
}
insn_id = YARVINSN_newarray
operand_size = 1
sc_state = 0
operands = 0x000000011f640118
insn_info = (line_no = 1, node_id = 3, events = 0)
}
We see that iobj contains links to the neighboring instructions, an insn_id, and some other metadata. The instruction is currently YARVINSN_newarray. If we run next, which should execute iobj->insn_id = BIN(opt_ary_freeze);, our instruction should change:
(lldb) next
(lldb) expr *(iobj)
(INSN) $5 = {
//...
insn_id = YARVINSN_opt_ary_freeze
//...
}
Indeed! The instruction has changed from newarray to opt_ary_freeze! The optimization is at least partially done (I’m not sure whether more is involved).
A small step toward opt_respond_to
This is already the longest and densest post in the series. But I’d love to make some real progress on the new instruction. Let’s pattern match a respond_to? call in the peephole optimizer.
This is our sample program:
puts "Did you know you can write to $stdout?" if $stdout.respond_to?(:write)
Running it with RUNOPT0=--dump=insns make runruby, we get the following instructions:
== disasm: #<ISeq:<main>@./test.rb:1 (1,0)-(1,76)>
0000 getglobal :$stdout ( 1)[Li]
0002 putobject :write
0004 opt_send_without_block <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
0006 branchunless 14
0008 putself
0009 putchilledstring "Did you know you can write to $stdout?"
0011 opt_send_without_block <calldata!mid:puts, argc:1, FCALL|ARGS_SIMPLE>
0013 leave
0014 putnil
0015 leave
I want to match on this line:
0004 opt_send_without_block <calldata!mid:respond_to?, argc:1, ARGS_SIMPLE>
This is my attempt. I’m copying what the newarray/freeze optimization does, changing just a few things to match my example. Right below the newarray code we’ve been debugging, I add this:
// The call shows up as `0004 opt_send_without_block` in the dump, but while the
// peephole optimizer runs it is still a plain `send`, the same instruction the
// `newarray` optimization checks for, so that is what we match on
if (IS_INSN_ID(iobj, send)) {
    // Pull the same info the `newarray` optimization does
    const struct rb_callinfo *ci = (struct rb_callinfo *)OPERAND_AT(iobj, 0);
    const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(iobj, 1);

    // 1. We have ARGS_SIMPLE, which is probably what `vm_ci_simple(ci)` checks for
    // 2. We have argc:1, which should match `vm_ci_argc(ci) == 1`
    // 3. We send without a block, hence blockiseq == NULL
    // 4. The method id (mid) from `vm_ci_mid(ci)` matches `idRespond_to`. I searched for names
    //    similar to `idFreeze`, trying `idRespond`, and found `idRespond_to`
    if (vm_ci_simple(ci) && vm_ci_argc(ci) == 1 && blockiseq == NULL && vm_ci_mid(ci) == idRespond_to) {
        // placeholder statement, only here so there is a line to set a breakpoint on
        int i = 0;
    }
}
Now I’ll follow the same debugging steps as before, but I’ll add a breakpoint in compile.c where I added the new code. Specifically, I set the breakpoint at int i = 0; so that it only triggers inside the if statement:
(lldb) break set --file ruby.c --line 2616
Breakpoint 1: where = ruby`process_options + 4068 at ruby.c:2616:38
(lldb) run
(lldb) break set --file compile.c --line 3491
Breakpoint 2: where = ruby`iseq_peephole_optimize + 2536 at compile.c:3491:17
(lldb) continue
Process 61925 resuming
Process 61925 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
frame #0: ruby`iseq_peephole_optimize(...) at compile.c:3491:17
3488 const rb_iseq_t *blockiseq = (rb_iseq_t *)OPERAND_AT(iobj, 1);
3489
3490 if (vm_ci_simple(ci) && vm_ci_argc(ci) == 1 && blockiseq == NULL && vm_ci_mid(ci) == idRespond_to) {
-> 3491 int i = 0;
3492 }
3493 }
3494
I think it works! It pattern matches the characteristics of the respond_to? call and hits the breakpoint set on int i = 0;. It’s a small step, but it’s the first step toward adding the optimization.
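As a cross-check from the Ruby side, the same conditions (method id respond_to?, one argument) can be looked for in the compiled instructions without a debugger. This is a sketch that assumes CRuby. The array layout returned by to_a and the :mid and :orig_argc keys of the call data hash are implementation details that may change between versions, and respond_to_calls is a helper name I made up.

```ruby
# Assumes CRuby. The layout returned by #to_a is an implementation detail.
# respond_to_calls is a helper name chosen for this sketch.
def respond_to_calls(source)
  # The last element of #to_a is the instruction body: a mix of line numbers,
  # symbols (events and labels) and instruction arrays.
  body = RubyVM::InstructionSequence.compile(source).to_a.last
  body.select do |insn|
    next false unless insn.is_a?(Array)
    # Call data is exposed as a Hash operand with the method id and argc.
    insn.any? do |operand|
      operand.is_a?(Hash) && operand[:mid] == :respond_to? && operand[:orig_argc] == 1
    end
  end
end

source = 'puts "Did you know you can write to $stdout?" if $stdout.respond_to?(:write)'
respond_to_calls(source).each { |insn| p insn }
```

This looks at the finished instruction sequence, so it sees the instruction after specialization, while the breakpoint in iseq_peephole_optimize sees it before. Both should agree on the call data.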
Using gdb
For anyone who wants to do the same work with gdb, it’s very similar. First, create a breakpoints.gdb file in the root directory of the project. It sets an initial breakpoint for you, much like setting a breakpoint in lldb before calling run. To match the session below, it needs a breakpoint at the same place as before, for example break ruby.c:2616.
When you run make gdb-ruby, you can use the same backtrace command, bt:
> make gdb-ruby
Thread 1 "ruby" hit Breakpoint 4, process_options (...) at ../ruby.c:2616
2616 iseq = pm_iseq_new_main(&pm->node, opt->script_name, path, parent, optimize, &error_state);
(gdb) bt
#0 process_options (...) at ../ruby.c:2616
#1 in ruby_process_options (...) at ../ruby.c:3169
#2 in ruby_options (...) at ../eval.c:117
#3 in rb_main (...) at ../main.c:43
#4 in main (...) at ../main.c:68
(gdb)
From here you can set the next breakpoint, so that you only see the newarray instruction from our test.rb program being compiled:
(gdb) break compile.c:3476
Breakpoint 5 at 0xaaaabaa22f14: file ../compile.c, line 3476
(gdb) continue
Continuing.
Thread 1 "ruby" hit Breakpoint 5, iseq_peephole_optimize (...) at ../compile.c:3476
3476 iobj->insn_id = BIN(opt_ary_freeze);
Similar to the expr command in lldb, we can inspect local values using p or print in gdb:
(gdb) p *(iobj)
$2 = {link = {type = ISEQ_ELEMENT_INSN, next = 0xaaaace797ef0, prev = 0xaaaace797e70}, insn_id = YARVINSN_newarray,
operand_size = 1, sc_state = 0, operands = 0xaaaace796ac8, insn_info = {line_no = 1, node_id = 3, events = 0}}
Finished
Well, that was a long one. I’m so glad you stuck with me! We’ve found the optimizer, and we’ve pattern matched a respond_to? call. Next, we need to add a new instruction definition and try to actually replace send with our new instruction. See you next time! 👋🏼
2024-12-28 10:32:36