Sunday, March 11, 2007

Ruby builtin in pure Ruby

[Update 03/12/2007 If you know how to implement Interge#times in pure ruby and make it have the same behavior as Ruby 1.8.5, please let me know. Thank you!]

One of the best things I love about rubinius project is: their developers try to keep the dependency on system language (C in their case) minimal, and they take it seriously. Take a look of http://code.fallingsnow.net/svn/rubinius/trunk/kernel/core/, you will find out lots of buitlin libraries are implemented in pure ruby, even lots of string functions.

Pure ruby builtin has some drawbacks, though. One is performance penalty, the other is potential side effects. For example, basically Integer#times can be implemented as simple as this in pure ruby:


class Integer
def times
i = 0
while i < self
yield i
i += 1;
end
self
end
end


But this version of Integer#times does not work exactly the same as Ruby 1.8.5. If an user want to override Fixnum#+ (this may never happen in real life):


class Fixnum
def + x
return 9999
end
end


Our Integer#times's behavior will change, while Ruby 1.8.5 won't. That is because Ruby 1.8.5's implementation (int_dotimes() in Numeric.c) optimizes for Fixnum: it does not call '+' method dynamically for Fixnum, instead, it just increases the integer directly. If you want to implement this method as same as Ruby 1.8.5 then you have to write code in system language.

This kind of optimization is all over the place in c ruby. I am not clear about its motivation, but I guess performance is one of the reasons. For example, "30000000.times {|x|}" is about twice faster than "i = 0; while i < 30000000; i +=1; end" in Ruby 1.8.5.

Difference people may have different opinions on what the 'right' behavior should be. As for me, I like the behavior of the pure ruby implementation better.

Tuesday, March 6, 2007

XRuby is faster than Ruby 1.8.5 in most benchmarks

Two weeks ago, Antonio Cangiano compared the performance of different ruby implementations using Ruby 1.9 (YARV)'s benchmark suite. His numbers got me thinking: all alternative implementations performed badly -- most are even way slower than ruby 1.8.5. Does it signal that JVM and .NET are bad platform for Ruby language?

With this doubt I tried the benchmark with XRuby. XRuby is a ruby compiler. Unlike other implementations, it generates Java bytecode that run directly on JVM. But at first the numbers are not impressive: the 0.1.2 version is still slower than Ruby 1.8.5 in most of the cases.

Maybe I should mention that the XRuby team had done virtually nothing for performance before, and we would avoid optimization as long as possible if it makes our code complicated. But after doing some measurements, it turns out our bad performance are largely caused by a logic 'error': as we know Ruby Fixnum can not have singleton methods, but in 0.1.2 it still lookup an empty method table. And along with some bad code practices (iterating an empty ArrayList without checking if it is empty first etc), it makes method lookup much slower than it should be.

I fixed the problem by adding about 10 lines of code, and got great result: In most benchmarks, XRuby 0.1.3 is faster than Ruby 1.8.5. For some, faster in a significant way. There are still some tests in which we are slower, but it looks like caused by poorly implemented builtin.

The following table shows the benchmark result for XRuby 0.1.3. The best part is: we did it without a method cache. YARV is still faster than XRuby, but we have lots of room to improve too.

>java -Xmx512m -jar xruby-0.1.3.jar benchmark\run.rb

TestRuby 1.8.5XRuby 0.1.3
bm_app_answer.rbfailfail
bm_app_factorial.rbfailfail
bm_app_fib.rb20.0212.29
bm_app_mandelbrot.rb7.0998.252
bm_app_pentomino.rb289.8538.5
bm_app_raise.rb4.8463.986
bm_app_strconcat.rb5.8983.234
bm_app_tak.rb26.1422.12
bm_app_tarai.rb20.8918.35
bm_loop_times.rb14.2819.30
bm_loop_whileloop.rb26.0319.27
bm_loop_whileloop2.rb5.2574.786
bm_so_ackermann.rbfailfail
bm_so_array.rb19.1746.84
bm_so_concatenate.rb5.7279.684
bm_so_count_words.rb2.94445.50
bm_so_exception.rb9.7937.399
bm_so_lists.rb3.66624.59
bm_so_matrix.rb6.2498.452
bm_so_nested_loop.rb15.1713.45
bm_so_object.rb21.497.991
bm_so_random.rb6.1695.888
bm_so_sieve.rb2.0422.753
bm_vm1_block.rb64.5738.69
bm_vm1_const.rb47.4725.57
bm_vm1_ensure.rb45.5420.01
bm_vm1_length.rb55.5040.89
bm_vm1_rescue.rb39.6120.64
bm_vm1_simplereturn.rb56.0229.06
bm_vm1_swap.rb76.3530.52
bm_vm2_array.rb19.348.532
bm_vm2_method.rb33.7219.63
bm_vm2_poly_method.rb45.2320.62
bm_vm2_poly_method_ov.rb12.648.261
bm_vm2_proc.rb21.0817.86
bm_vm2_regexp.rb13.0930.87
bm_vm2_send.rb11.7115.75
bm_vm2_super.rb13.927.510
bm_vm2_unif1.rb11.308.292
bm_vm2_zsuper.rb15.717.740
bm_vm3_thread_create_join.rb0.1101.331


* The test environment is Intel Pentium M 1G CPU, 1G Memory, Windows XP SP2, Java 1.5.0_09.