A look at how Ruby interprets your code

Welcome to a new Ruby Magic article! This time we'll be looking at how Ruby interprets our code, and how we can use this knowledge to our advantage. This post will help you understand how code is interpreted, and how this can help lead to faster code.

A subtle difference between symbols

In a previous Ruby Magic article about Escaping characters in Ruby there was an example about escaping line breaks.

In the example below you see how two strings are combined as one String across multiple lines, with either the plus + symbol or with the backslash \.

"foo" +
  "bar"
=> "foobar"

# versus

"foo" \
  "bar"
=> "foobar"

These two examples may look similar, but they behave quite differently. To know the difference between how these are read and interpreted, you'd normally need to know the nitty-gritty about the Ruby interpreter. Or, we can just ask Ruby what the difference is.

InstructionSequence

Using the RubyVM::InstructionSequence class we can ask Ruby how it interprets some code we give it. This class gives us a tool set which we can use to get a glimpse of Ruby's internals.

The examples later in this article show Ruby code as it's understood by the YARV interpreter.

YARV interpreter

YARV (Yet Another Ruby VM) is the bytecode virtual machine introduced in Ruby version 1.9. It replaced the original interpreter inside MRI (Matz's Ruby Interpreter), which executed code by walking the abstract syntax tree directly.

Interpreted languages execute code without a separate ahead-of-time compilation step. This means that Ruby does not first compile a program to an optimized machine language program, which compiled languages such as C, Rust and Go do.

In Ruby, a program is first translated to an instruction set for the Ruby VM, and is then executed immediately after. These instructions are an intermediate step between your Ruby code and the code being executed in the Ruby VM.
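The two phases described above can be observed separately. The following is a minimal sketch, assuming CRuby (the RubyVM constant is specific to that implementation): compile turns source text into an instruction sequence object without running it, and eval then hands those instructions to the VM.

```ruby
# Phase 1: translate Ruby source into YARV instructions. Nothing runs yet.
iseq = RubyVM::InstructionSequence.compile("1 + 2")

# The result is an ordinary Ruby object that wraps the instructions.
puts iseq.class # prints RubyVM::InstructionSequence

# Phase 2: hand the instructions to the VM for execution.
result = iseq.eval
puts result # prints 3
```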

These instructions make it easier for the Ruby VM to understand Ruby code without having to deal with syntax-specific interpretation. That's handled while creating these instructions. Instruction sequences are optimized operations which represent the interpreted code.

During the normal execution of a Ruby program we don't see these instructions, but by viewing them we can review if Ruby has interpreted our code correctly. With InstructionSequence it's possible to see what kind of instructions YARV creates before it executes them.

It's not necessary to understand every YARV instruction to follow along. Most of them speak for themselves.

"foo" +
  "bar"
RubyVM::InstructionSequence.compile('"foo" + "bar"').to_a
# ... [:putstring, "foo"], [:putstring, "bar"] ...

# versus

"foo" \
  "bar"
RubyVM::InstructionSequence.compile('"foo" "bar"').to_a
# ... [:putstring, "foobar"] ...

The real output contains a few more setup instructions that we will look at later, but here we can see the real difference between "foo" + "bar" and "foo" "bar".

The former creates two strings and combines them into a third. The latter creates one string. This means that with "foo" "bar" we only create one string, rather than three with "foo" + "bar": the two operands plus the result.

  1       2           3
  ↓       ↓           ↓
"foo" + "bar" # => "foobar"
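To check this without reading the full instruction listing, we can pull just the string operands out of the compiled code. This is a rough sketch: string_operands is a hypothetical helper, and it assumes the layout of the array returned by to_a, where the last element holds the instruction body. That layout is an implementation detail of CRuby rather than a stable API.

```ruby
# Collect every String operand from the instructions Ruby generates for `source`.
# The last element of #to_a is the instruction body. Entries that are Arrays
# are instructions; the rest are line numbers and labels.
def string_operands(source)
  body = RubyVM::InstructionSequence.compile(source).to_a.last
  body.grep(Array).flat_map { |_name, *operands| operands.grep(String) }
end

p string_operands('"foo" + "bar"') # prints ["foo", "bar"]
p string_operands('"foo" "bar"')   # prints ["foobar"]
```

The helper looks at operands instead of instruction names on purpose, since the name of the instruction that pushes a string literal can differ between Ruby versions.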

Of course, this is just about the most basic example we can use, but it shows how a small detail in the Ruby language could potentially have a lot of impact:

More allocations: every String object is allocated separately.
More memory usage: every allocated String object takes up memory.
Longer garbage collection: every object, even when short-lived, takes up time to be cleaned by the garbage collector. More allocations means longer garbage collection times.
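The allocation difference can also be measured at runtime. The sketch below assumes that the :total_allocated_objects counter from GC.stat is a good enough proxy for allocations; allocations is a hypothetical helper, and the exact numbers will vary with the Ruby version and settings such as frozen string literals.

```ruby
# Count objects allocated while the block runs, using the cumulative
# :total_allocated_objects counter from GC.stat.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

iterations = 10_000

with_plus     = allocations { iterations.times { "foo" + "bar" } }
with_adjacent = allocations { iterations.times { "foo" "bar" } }

puts "with +:   #{with_plus} objects"
puts "adjacent: #{with_adjacent} objects"
```

We would expect the first number to be noticeably higher than the second, in line with the instruction listings earlier in the article.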

Disassembling

Another use case is debugging a logic issue. The following is an easy mistake to make, which can have big consequences. Can you spot the difference?

1 + 2 * 3
# versus
(1 + 2) * 3

We can use Ruby to help us find out the difference in this slightly more complex example.

By disassembling this code example we can get Ruby to print a more readable table of the commands it's performing.

1 + 2 * 3
# => 7
puts RubyVM::InstructionSequence.compile("1 + 2 * 3").disasm
# == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========
# 0000 trace            1                                               (   1)
# 0002 putobject_OP_INT2FIX_O_1_C_
# 0003 putobject        2
# 0005 putobject        3
# 0007 opt_mult         <callinfo!mid:*, argc:1, ARGS_SIMPLE>
# 0009 opt_plus         <callinfo!mid:+, argc:1, ARGS_SIMPLE>
# 0011 leave

# versus

(1 + 2) * 3
# => 9
puts RubyVM::InstructionSequence.compile("(1 + 2) * 3").disasm
# == disasm: <RubyVM::InstructionSequence:<compiled>@<compiled>>==========
# 0000 trace            1                                               (   1)
# 0002 putobject_OP_INT2FIX_O_1_C_
# 0003 putobject        2
# 0005 opt_plus         <callinfo!mid:+, argc:1, ARGS_SIMPLE>
# 0007 putobject        3
# 0009 opt_mult         <callinfo!mid:*, argc:1, ARGS_SIMPLE>
# 0011 leave

The example above involves a few more YARV instructions, but just from the order in which they are printed and executed we can see the difference a pair of parentheses makes.

With the parentheses around 1 + 2 we make sure the addition is performed first, overriding the usual order of operations in which multiplication comes before addition.

Note that you don't actually see the parentheses in the disassembly output itself, only their effect on the rest of the code.
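The ordering can also be extracted programmatically. Instruction names and the disassembly layout differ between Ruby versions, so this sketch reads the method names from the call information instead. call_order is a hypothetical helper, and it assumes that to_a exposes call information as a Hash with a :mid key, which is an implementation detail of CRuby.

```ruby
# Return the method names (:+, :*) in the order the VM will call them.
# Call information appears in #to_a as a Hash operand with a :mid key.
def call_order(source)
  body = RubyVM::InstructionSequence.compile(source).to_a.last
  body.grep(Array).flat_map do |_name, *operands|
    operands.grep(Hash).map { |call_info| call_info[:mid] }.compact
  end
end

p call_order("1 + 2 * 3")   # prints [:*, :+]
p call_order("(1 + 2) * 3") # prints [:+, :*]
```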

Now that we know how the Ruby interpreter converts our developer-friendly and readable Ruby code to YARV instructions, we can use this to optimize our applications.

It's possible to pass along entire methods and even entire files to RubyVM::InstructionSequence.

puts RubyVM::InstructionSequence.disasm(method(:foo))
puts RubyVM::InstructionSequence.compile_file("/tmp/hello.rb").disasm
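A self-contained version of those two calls might look like the sketch below. The greet method and the temporary file are made up for illustration; the article's foo method and /tmp/hello.rb are not shown, so we create stand-ins here.

```ruby
require "tempfile"

# Hypothetical method to disassemble.
def greet(name)
  "Hello, #{name}!"
end

# Disassemble an existing method by passing its Method object.
method_listing = RubyVM::InstructionSequence.disasm(method(:greet))

# Disassemble a whole file: write a small script to a temporary file,
# compile it from its path, and keep the listing.
file_listing = Tempfile.create(["hello", ".rb"]) do |file|
  file.write(%(puts "Hello from a file"\n))
  file.flush
  RubyVM::InstructionSequence.compile_file(file.path).disasm
end

puts method_listing
puts file_listing
```

Note that compile_file only compiles the file. The script itself is not executed unless we call eval on the result.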

Find out why some piece of code works and why another doesn't. Learn why certain symbols make code behave differently than others. The devil is in the details, and it's good to know how your Ruby code is behaving in your app and if you can optimize it in any way.

Optimizations

Other than being able to view your code at the interpreter level and optimize for it, you can use InstructionSequence to optimize your code even further.

With InstructionSequence, it's possible to optimize certain instructions with Ruby's built-in performance optimizations. The full list of optimizations can be found in the RubyVM::InstructionSequence.compile_option = method documentation.
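The options that are currently in effect can also be read back at runtime with the matching reader method. A small sketch, with the caveat that the set of keys depends on the Ruby version:

```ruby
# Ask Ruby for the compile options currently in effect.
options = RubyVM::InstructionSequence.compile_option

# Print each option with its current value.
options.each do |name, value|
  puts "#{name}: #{value}"
end
```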

One of these optimizations is Tail Call Optimization.

The RubyVM::InstructionSequence.compile method accepts options to enable this optimization, like so:

some_code = <<-EOS
def fact(n, acc=1)
  return acc if n <= 1
  fact(n-1, n*acc)
end
EOS
# Compile once, then inspect and run the same instruction sequence.
iseq = RubyVM::InstructionSequence.compile(some_code, nil, nil, nil, tailcall_optimization: true, trace_instruction: false)
puts iseq.disasm
iseq.eval
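To see the effect, we can call a tail-recursive method far more deeply than the stack would normally allow. The sketch below uses a hypothetical sum_to method instead of fact to keep the numbers small, and it assumes a default CRuby setup where 200,000 nested calls would otherwise end in a SystemStackError.

```ruby
# Hypothetical tail-recursive method: the recursive call is the last thing it does.
source = <<~RUBY
  def sum_to(n, acc = 0)
    return acc if n.zero?
    sum_to(n - 1, acc + n)
  end
RUBY

# Compile with tail call optimization enabled, then run the result so the
# method gets defined.
RubyVM::InstructionSequence.compile(
  source, nil, nil, nil, tailcall_optimization: true
).eval

# Each recursive call reuses the current frame, so the depth is not a problem.
total = sum_to(200_000)
puts total # prints 20000100000
```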

You can even turn this optimization on for all your code with RubyVM::InstructionSequence.compile_option =. Just make sure to load this before any of your other code.

RubyVM::InstructionSequence.compile_option = {
  tailcall_optimization: true,
  trace_instruction: false
}

For more information about how Tail Call Optimization works in Ruby check out these articles: Tail Call Optimization in Ruby and Tail Call Optimization in Ruby: Background.

Conclusion

Learn more about how Ruby interprets your code with RubyVM::InstructionSequence and see what your code is really doing so you can make it more performant.

This introduction to InstructionSequence might also be a fun way to learn more about how Ruby works under the hood. Who knows? You might even be interested in working on some of Ruby's code itself.

That concludes our short introduction to code compilation in Ruby. We'd love to know how you liked this article, if you have any questions about it, and what you'd like to read about next, so be sure to let us know at @AppSignal.