Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: JRuby 1.6
Fix Version/s: None
Component/s: Performance
Labels: None
Description
Another case where we are comparing 100% "native" impls, this time with String#to_i.
Results are mixed: each implementation is fastest on at least one case. Combining the best of each would make one implementation fastest on all of them.
~/projects/rubinius ➔ jruby --server -I benchmark/lib/ benchmark/core/string/bench_to_i.rb
#to_i with an integer in a string
7633077.1 (±7.4%) i/s - 37354603 in 4.934960s (cycle=29623)
#to_i with a float in a string
6733175.0 (±4.9%) i/s - 33241488 in 4.954528s (cycle=31242)
#to_i with an empty string
11830673.8 (±5.1%) i/s - 58289616 in 4.942069s (cycle=47313)
#to_i with an integer and extra text
3821345.2 (±5.7%) i/s - 18957678 in 4.981885s (cycle=41757)
~/projects/rubinius ➔ ruby1.9 -I benchmark/lib/ benchmark/core/string/bench_to_i.rb
#to_i with an integer in a string
7261663.4 (±1.5%) i/s - 36318960 in 5.002643s (cycle=55280)
#to_i with a float in a string
7046664.4 (±1.6%) i/s - 35246844 in 5.003213s (cycle=59139)
#to_i with an empty string
9097651.9 (±1.6%) i/s - 45490088 in 5.001590s (cycle=59542)
#to_i with an integer and extra text
3204945.1 (±9.6%) i/s - 15882468 in 5.002829s (cycle=54022)
~/projects/rubinius ➔ bin/rbx -I benchmark/lib/ benchmark/core/string/bench_to_i.rb
#to_i with an integer in a string
7926463.3 (±3.7%) i/s - 39419937 in 4.983585s (cycle=38761)
#to_i with a float in a string
7818289.9 (±1.9%) i/s - 38956995 in 4.984837s (cycle=33729)
#to_i with an empty string
7941361.7 (±4.5%) i/s - 39475787 in 4.987574s (cycle=42769)
#to_i with an integer and extra text
7955217.3 (±2.2%) i/s - 39643632 in 4.987127s (cycle=42264)
Here's the bench from Rubinius's suite:
require 'benchmark'
require 'benchmark/ips'

Benchmark.ips do |x|
  int = "5"
  float = "5.0"
  empty = ""
  with_extra_text = "5 and some extra characters"

  x.report "#to_i with an integer in a string" do |times|
    i = 0
    while i < times
      int.to_i
      i += 1
    end
  end

  x.report "#to_i with a float in a string" do |times|
    i = 0
    while i < times
      float.to_i
      i += 1
    end
  end

  x.report "#to_i with an empty string" do |times|
    i = 0
    while i < times
      empty.to_i
      i += 1
    end
  end

  x.report "#to_i with an integer and extra text" do |times|
    i = 0
    while i < times
      with_extra_text.to_i
      i += 1
    end
  end
end
Ok, we are now fastest on all but the last one, and there's a simple reason for it.
The logic in ConvertBytes.bytelistToInum tries to determine, before doing a full parse, whether the result will fit in Long.SIZE (64) bits. In the last case the trailing garbage makes the number look longer than it is, so the logic falls back to BigInteger parsing. This creates an intermediate BigInteger that ends up normalizing back to a Fixnum anyway.
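To make the effect concrete, here is a rough Ruby model of a length-based size check. It is only an illustration, not JRuby's actual code: the method name `naive_path` is hypothetical, and the cost of 4 bits per decimal digit is an assumed conservative estimate.

```ruby
# Illustrative model only; this is not the code in ConvertBytes.
LONG_BITS      = 64  # value of Java's Long.SIZE
BITS_PER_DIGIT = 4   # assumed conservative cost of one base-10 digit

# Hypothetical pre-check: estimate the result size from the whole
# string length, trailing garbage included.
def naive_path(str)
  str.length * BITS_PER_DIGIT <= LONG_BITS ? :long : :big_integer
end

puts naive_path("5")                            # prints "long"
puts naive_path("5 and some extra characters")  # prints "big_integer"
```

Under these assumptions the 27-character string is estimated at 108 bits, even though only its first character contributes to the result.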
This would be fixable either by doing a better job of calculating which characters will actually be used for the resulting number, or by going ahead and calculating the result and checking how many bytes were actually used before falling back to the BigInteger path.
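The second option could look something like the following sketch, written in Ruby for readability even though the real change would be made in JRuby's Java code. The method `parse_prefix` is hypothetical, and it is deliberately simplified: it assumes base 10, skips only leading spaces, and ignores underscores.

```ruby
LONG_MAX = 2**63 - 1  # value of Java's Long.MAX_VALUE

# Hypothetical sketch: parse the numeric prefix optimistically and report
# the slow path only when the accumulated value really exceeds a long.
# Returns [value, path], where path is :long or :big_integer.
def parse_prefix(str)
  i = 0
  i += 1 while i < str.length && str[i] == " "  # skip leading spaces (simplified)

  negative = false
  if i < str.length && (str[i] == "-" || str[i] == "+")
    negative = (str[i] == "-")
    i += 1
  end

  value = 0
  path = :long
  # Consume digits only; parsing stops at the first non-digit byte.
  while i < str.length && str[i] >= "0" && str[i] <= "9"
    value = value * 10 + (str[i].ord - 48)
    path = :big_integer if value > LONG_MAX  # conservative overflow check
    i += 1
  end

  [negative ? -value : value, path]
end

p parse_prefix("5 and some extra characters")  # [5, :long]
p parse_prefix("9223372036854775808")          # [9223372036854775808, :big_integer]
```

Because the decision is made from the digits actually consumed, trailing text no longer influences which path is taken.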