Paweł Świątkowski


Ruby comes distributed with a vast standard library. We only use a fraction of it, usually. Everyone knows date, set, json, maybe even csv. But there are many more things hidden there.

Some time ago a discussion somewhere (Reddit perhaps?) prompted me to take a deep plunge into the Ruby stdlib, and in this post I describe what I found. Not all of these things were new to me; some were simply forgotten. I chose the ones I found most entertaining, interesting or standing out in any other way.

While reading through those and asking yourself “why would I use it in a web app?”, please bear in mind that Ruby was not designed to be a language powering one of the most important web frameworks in history. Things listed here are more suitable for system scripting etc.

Parsing command-line options

These days when we write a Ruby script it usually comes as a Rake task. But even if it’s a standalone file, it is usually steered in a similar way: via environment variables or just by positional parameters accessed via ARGV array. However, in stdlib we can find two libraries for handling more complex input.

GetoptLong

One of them is GetoptLong. Let’s see it in action:

require 'getoptlong'

opts = GetoptLong.new(
  [ '--url', GetoptLong::REQUIRED_ARGUMENT ],
  [ '--count', '-c', GetoptLong::OPTIONAL_ARGUMENT ],
  [ '--verbose', GetoptLong::NO_ARGUMENT ]
)

opts.each do |option, value|
  p [option, value]
end

As you see, I defined three options:

  • url - it is a required argument
  • count - which is optional
  • verbose which serves as a flag

After that there is code that for each option prints its name and value. So when I test it with ruby getoptlong.rb -c 5 --verbose --url http://github.com I get:

["--count", "5"]
["--verbose", ""]
["--url", "http://github.com"]

There are a few interesting quirks here. For example, if I omit url entirely, nothing happens: REQUIRED_ARGUMENT means that the option must be given a value when it is used, not that the option itself is mandatory. Only if I use it as a flag (ruby getoptlong.rb --url) do I get an exception. Also, if I use an option that is not defined, it throws an error as well.

You can find docs for GetoptLong here.

OptionParser

This solution is much more robust and advanced. Let’s see it in action with a similar example:

require 'optparse'

OptionParser.new do |opts|
  opts.banner = 'OptionParser example script'

  opts.on('--url URL') do |url|
    puts "url: #{url}"
  end

  opts.on('-c N', '--count N') do |n|
    puts "#{n} times"
  end

  opts.on('--verbose') do
    puts 'Verbose mode ON'
  end

  opts.on('-h', '--help') do
    puts opts
  end
end.parse!

The code is much more idiomatic here. The result is as expected. Behaviour regarding extra options etc. is the same as with GetoptLong. One thing we get for (almost) free here is a help message. Try it with ruby optparse.rb -h:

OptionParser example script
        --url URL
    -c, --count N
        --verbose
    -h, --help

But there’s much more to OptionParser than that - coercing types, something called conventions etc. Read more in the docs.
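To give a taste of the type coercion mentioned above, here is a minimal sketch. The parse_options helper and the options hash are my own names for illustration, not part of the earlier script. Passing Integer as an extra argument to opts.on asks OptionParser to convert the value, and to reject input that is not a number.

```ruby
require 'optparse'

# Hypothetical helper (not from the script above): collects the parsed
# options into a Hash instead of printing from inside the handlers.
def parse_options(argv)
  options = { count: 1, verbose: false }

  parser = OptionParser.new do |opts|
    opts.banner = 'OptionParser example script'

    # No coercion requested: the value arrives as a String.
    opts.on('--url URL') { |url| options[:url] = url }

    # Integer asks OptionParser to convert and validate the value.
    opts.on('-c N', '--count N', Integer) { |n| options[:count] = n }

    opts.on('--verbose') { options[:verbose] = true }
  end

  # parse (without the bang) does not modify argv; it returns the
  # positional arguments that were not consumed as options.
  rest = parser.parse(argv)
  [options, rest]
end

options, rest = parse_options(%w[-c 5 --verbose --url http://github.com notes.txt])
p options[:count] # an Integer now, not the String "5"
p rest
```

With this in place, something like -c five raises OptionParser::InvalidArgument instead of silently passing a string along.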

Simple persistent key-value store

When we, Ruby developers, think about a key-value store, we usually have some kind of server-based solution in mind, such as Redis or Riak. However, when writing a simple application it’s usually more reasonable to use an embedded store. Lately, RocksDB from Facebook became famous as one such solution. But with Ruby, we are lucky to have an embedded key-value store right in the standard library.

And, there’s more… It’s not one KV store. It’s three of them: DBM, GDBM and SDBM. They are really similar to one another, so I will only quickly outline differences:

  • DBM relies on what’s installed on your system. It can use many things under the hood and most of the time it will be incompatible between different machines (or even on the same machine when system configuration changes). Therefore it’s not well-suited for persistent storage but is good for temporary applications.
  • GDBM is based on one particular implementation of a KV store called, not surprisingly, GDBM. The aforementioned DBM may, in some cases, choose to use GDBM as its underlying storage. It should be compatible between different systems.
  • SDBM’s code, contrary to the previous ones, is shipped with Ruby, so it should be the same on all machines.

How do we use it? For example with SDBM (because we don’t need to install anything extra to have it):

require 'sdbm'

SDBM.open 'fruits' do |db|
  db['apple'] = 'fruit'
  db['pear'] = 'fruit'
  db['carrot'] = 'vegetable'
  db['tomato'] = 'vegetable'

  db.update('peach' => 'fruit', 'tomato' => 'fruit')

  db.each do |key, value|
    puts "Key: #{key}, Value: #{value}"
  end
end

This creates two files in the current directory. fruits.dir is empty (I really don’t know why), but the real data is in fruits.pag. You can peek into it with hexdump -C fruits.pag:

00000000 08 00 fb 03 f6 03 f2 03 ed 03 e7 03 de 03 d8 03 |................|
00000010 cf 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
000003c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 76 |...............v|
000003d0 65 67 65 74 61 62 6c 65 74 6f 6d 61 74 6f 76 65 |egetabletomatove|
000003e0 67 65 74 61 62 6c 65 63 61 72 72 6f 74 66 72 75 |getablecarrotfru|
000003f0 69 74 70 65 61 72 66 72 75 69 74 61 70 70 6c 65 |itpearfruitapple|
00000400

The data is actually there.

Usefulness of this solution is probably quite limited. You can use it when you want to persist some state between script runs. Or when you really care about memory. Having some big hashes loaded in RAM all the time can slow down your program. With (S/G)DBM you can dump data which is unused for a while to disk and pick it up later when you need it.

Persisting whole object hierarchies with PStore

Speaking of persisting… In examples above we could only use strings. That’s ok in many cases, but not always. What if you want to save part of your application state - with objects, their states, and relations?

Ruby stdlib has you covered! PStore is exactly what you are looking for. In this example we are going to create some very simple Finite-State-Machine-like structure with states connected via named edges to each other:

class State
  def initialize(name)
    @name = name
    @edges = {}
  end

  def connect_to(word, state)
    @edges[word] = state
  end

  def traverse(indent = 0)
    tab = " " * indent
    puts "#{tab}State #{@name}:"
    @edges.each do |word, state|
      puts "#{tab} '#{word}':"
      state.traverse(indent + 4)
    end
  end
end

A traverse method simply displays connections from start node to the end (watch out, we don’t handle loops!). So now let’s create some structure and traverse it:

s0 = State.new('start')
s1 = State.new('first')
s2 = State.new('second')
s3 = State.new('third')
s4 = State.new('fourth')

s0.connect_to('aa', s1)
s0.connect_to('aaa', s2)
s1.connect_to('b', s3)
s3.connect_to('c', s4)
s2.connect_to('d', s4)

s0.traverse

What we got is:

State start:
 'aa':
    State first:
     'b':
        State third:
         'c':
            State fourth:
 'aaa':
    State second:
     'd':
        State fourth:

Now let’s save it using PStore to a file on disk:

require "pstore"

storage = PStore.new('fsm.pstore')
storage.transaction do
  storage['start'] = s0
end

And then, in a different script, we load and traverse:

class State
  # omitting, definition same as above
end

require "pstore"

storage = PStore.new('fsm.pstore')
storage.transaction do
  start = storage['start']
  start.traverse
end

And output is exactly the same! If you’re curious, like me, you can peek into fsm.pstore file using hexdump again:

00000000 04 08 7b 06 49 22 0a 73 74 61 72 74 06 3a 06 45 |..{.I".start.:.E|
00000010 54 6f 3a 0a 53 74 61 74 65 07 3a 0a 40 6e 61 6d |To:.State.:.@nam|
00000020 65 49 22 0a 73 74 61 72 74 06 3b 00 54 3a 0b 40 |eI".start.;.T:.@|
00000030 65 64 67 65 73 7b 07 49 22 07 61 61 06 3b 00 54 |edges{.I".aa.;.T|
00000040 6f 3b 06 07 3b 07 49 22 0a 66 69 72 73 74 06 3b |o;..;.I".first.;|
00000050 00 54 3b 08 7b 06 49 22 06 62 06 3b 00 54 6f 3b |.T;.{.I".b.;.To;|
00000060 06 07 3b 07 49 22 0a 74 68 69 72 64 06 3b 00 54 |..;.I".third.;.T|
00000070 3b 08 7b 06 49 22 06 63 06 3b 00 54 6f 3b 06 07 |;.{.I".c.;.To;..|
00000080 3b 07 49 22 0b 66 6f 75 72 74 68 06 3b 00 54 3b |;.I".fourth.;.T;|
00000090 08 7b 00 49 22 08 61 61 61 06 3b 00 54 6f 3b 06 |.{.I".aaa.;.To;.|
000000a0 07 3b 07 49 22 0b 73 65 63 6f 6e 64 06 3b 00 54 |.;.I".second.;.T|
000000b0 3b 08 7b 06 49 22 06 64 06 3b 00 54 40 13 |;.{.I".d.;.T@.|
000000be

Useful? Perhaps not, but maybe? I can see the potential to save a state of some simple game this way, for example.
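Two more details are worth a quick sketch: relations between objects survive the round trip (an object referenced twice comes back as one object, not two copies), and a transaction can be rolled back with abort. The Node struct and the pstore_round_trip helper below are made up for this illustration, and the file lives in a temporary directory so nothing is left behind.

```ruby
require 'pstore'
require 'tmpdir'

# A stripped-down stand-in for the State class above: just a name and edges.
Node = Struct.new(:name, :edges)

# Hypothetical helper: saves a tiny graph, reloads it and reports what survived.
def pstore_round_trip
  Dir.mktmpdir do |dir|
    store = PStore.new(File.join(dir, 'fsm.pstore'))

    # The same node is reachable through two different edges.
    shared = Node.new('fourth', {})
    start = Node.new('start', { 'c' => shared, 'd' => shared })

    store.transaction { store['start'] = start }

    # abort discards everything written in the current transaction.
    store.transaction do
      store['start'] = nil
      store.abort
    end

    # Passing true opens a read-only transaction.
    store.transaction(true) do
      loaded = store['start']
      {
        name: loaded.name,
        same_object: loaded.edges['c'].equal?(loaded.edges['d'])
      }
    end
  end
end

p pstore_round_trip
```

The aborted write leaves the stored graph untouched, and same_object comes back true because both edges point at a single restored object.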

Observer pattern

Usage of Ruby’s Observable was actually part of the first (?) book from which I learned Ruby back in 2008 (?). So it’s not new to me, but it’s worth reminding that we have such a thing built-in. It actually can make the code cleaner in some cases.

To illustrate how it works, I’m going to implement yet another FizzBuzz (it will be a bit incorrect though, because it will print the number every time):

require 'observer'

class Incrementor
  include Observable

  def initialize
    @number = 0
  end

  def runto(num)
    loop do
      @number += 1
      changed # note this!
      print "#{@number} "
      notify_observers(@number)
      puts ""
      break if @number >= num
    end
  end
end

class FizzObserver
  def update(num)
    print "Fizz" if num % 3 == 0
  end
end

class BuzzObserver
  def update(num)
    print "Buzz" if num % 5 == 0
  end
end

inc = Incrementor.new
inc.add_observer(FizzObserver.new)
inc.add_observer(BuzzObserver.new)
inc.runto(30)

If you run this code, you’ll see it works. There are just two things to remember: call changed to indicate that the object has changed, and call notify_observers when you want to emit new values. Without a preceding changed, notify_observers does nothing.

Why useful? You can abstract some things (such as logging) outside of your main class. Note, however, that abusing it will lead to callback hell, which would be hard to debug and understand. Just like ActiveRecord callbacks.
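As a rough sketch of the logging idea, the Account and AuditLog classes below are invented for this example. The account knows nothing about logging; it only announces what happened. The sketch also uses the optional second argument of add_observer, which names the method to call instead of update, so a plain lambda can observe too.

```ruby
require 'observer'

# Hypothetical domain class: it only announces changes.
class Account
  include Observable

  attr_reader :balance

  def initialize
    @balance = 0
  end

  def deposit(amount)
    @balance += amount
    changed # mark as changed, otherwise notify_observers is a no-op
    notify_observers(:deposit, amount, @balance)
  end
end

# Hypothetical observer that keeps log lines in memory.
class AuditLog
  attr_reader :entries

  def initialize
    @entries = []
  end

  def update(event, amount, balance)
    @entries << "#{event} #{amount} -> balance #{balance}"
  end
end

account = Account.new
log = AuditLog.new
account.add_observer(log)

# A lambda responds to call, so it can be registered with :call.
printer = ->(event, amount, _balance) { puts "got #{event} of #{amount}" }
account.add_observer(printer, :call)

account.deposit(50)
account.deposit(25)
puts log.entries
```

Swapping the in-memory log for a real logger would not require touching Account at all, which is the whole point.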

DRb

DRb or dRuby is a real gem in the standard library. Described simply as “distributed object system for Ruby”, it can give you a lot of fun. To see it live, I decided to go with something really useful: a service that prints a random number from 0 up to (but not including) @max_num every @interval seconds. Here is the code, with DRb included:

require 'drb/drb'

class RandomService
  def initialize
    set_max_num(100)
    set_interval(1)
  end

  def run
    while @should_stop.nil?
      puts rand(@max_num)
      sleep(@interval)
    end
  end

  def set_max_num(num)
    @max_num = num
  end

  def set_interval(time)
    @interval = time
  end

  def stop!
    @should_stop = true
  end
end

service = RandomService.new
DRb.start_service('druby://localhost:9394', service)
service.run

The class itself is really straightforward and I’m not going into details about it. The only (hopefully) unfamiliar thing here is the call to DRb, where we wrap our service in the dRuby protocol. Basically, it exposes our interface on localhost on port 9394. With that in mind, I recommend starting the service and splitting your terminal in two (iTerm can do it on Mac, I recommend Tilix for Linux).

Now, when we have our little service running, fire up irb in the second terminal and type:

irb(main):001:0> require 'drb/drb'
=> true
irb(main):002:0> service = DRbObject.new_with_uri('druby://localhost:9394')
=> #<DRb::DRbObject:0x007fd51a8072c0 @uri="druby://localhost:9394", @ref=nil>

When it’s done, you can start to play by calling methods on service. Decrease the interval to 0.1, set max_num to 1000, whatever you want. Finally, stop the show by running service.stop!. Everything you do in irb is reflected immediately in the service, which is running in a completely different process in a different terminal! Needless to say, you can also do it over the network, if you wish.

You may think right now that this is just a nice toy. But I’ve actually seen things like that used in practice. Probably the most notable example was an IRC bot where from the Ruby console you could do many things, starting from temporarily adding admins to some array usually populated on start (so, no downtime for restart required!), and ending with defining completely new methods and commands to test them out before actually putting them in the code. I can also imagine exposing such an interface to, for example, manipulate the size of some worker pool etc. Actually, the sky is pretty much the limit here.

Other

There are many more things in stdlib. I’m going to mention a few of them, but without such detailed descriptions.

tsort

I had a bit of trouble understanding what tsort is really for. What it does is a topological sorting of directed acyclic graphs. If this sounds pretty specific, that’s because it is. This kind of sorting is mostly useful in dependency sorting, when you have a graph of dependencies (A depends on B and C, B depends on D, E depends on A) and you need to determine an order of installing those dependencies, so that every item has its dependencies already installed when being installed.

There is a great article by Lawson Kurtz explaining how it’s used in Bundler.
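The dependency example above (A depends on B and C, B depends on D, E depends on A) can be sketched with the TSort.tsort module method, which takes two callables: one that iterates over all nodes and one that iterates over the children of a node. The DEPENDENCIES hash and the install_order helper are my own names, and I assume that C and D have no dependencies of their own.

```ruby
require 'tsort'

# Each key depends on the items listed as its value.
# Assumption: C and D depend on nothing.
DEPENDENCIES = {
  'A' => %w[B C],
  'B' => %w[D],
  'C' => [],
  'D' => [],
  'E' => %w[A]
}.freeze

# Hypothetical helper: returns the items ordered so that every item
# comes after all of its dependencies.
def install_order(graph)
  each_node  = ->(&block) { graph.each_key(&block) }
  each_child = ->(node, &block) { graph.fetch(node).each(&block) }
  TSort.tsort(each_node, each_child)
end

puts install_order(DEPENDENCIES).join(' -> ')
```

If the graph contains a cycle (say, A depends on B and B depends on A), there is no valid order and TSort.tsort raises TSort::Cyclic.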

Math

Some math-related classes in Ruby standard library:

  • Matrix has methods for matrix operations, such as (but not limited to): conjugate, determinant, eigensystem, inverse and many more (see the docs)
  • Prime represents an infinite set of all prime numbers. You don’t need to implement this Eratosthenes sieve yourself!
  • [sidenote] I was surprised that there is no Complex class in stdlib, especially after I learned that it used to be there, but was removed. It turns out that it actually made it to core (so it is automatically required). Check this out by firing up your irb and writing: (2 + 3i) * (-6i) (spoiler: it won’t be a NameError because of undefined i)

abbrev

This is probably more of a toy than a really useful tool, but in case you need it, it’s there. The Abbrev module has one method, abbrev, that takes a list of strings and returns possible abbreviations that are non-ambiguous. For example:

require 'abbrev'

Abbrev.abbrev(%w[ruby rubic russia])
#=> {"ruby"=>"ruby", "rubic"=>"rubic", "rubi"=>"rubic", "russia"=>"russia", "russi"=>"russia", "russ"=>"russia", "rus"=>"russia"}

So, you know you can’t use ru as an abbreviation.

zlib

Last but not least, there is zlib. To quote:

Zlib is designed to be a portable, free, general-purpose, legally unencumbered – that is, not covered by any patents – lossless data-compression library for use on virtually any computer hardware and operating system.

For me, it sounds quite good. Compared to gzip:

The zlib format was designed to be compact and fast for use in memory and on communications channels. The gzip format was designed for single-file compression on file systems, has a larger header than zlib to maintain directory information, and uses a different, slower check method than zlib.

So zlib could actually be a good choice to reduce overhead when you send something over the network. To check it, I took Pride and Prejudice from Gutenberg and checked how it can be compressed:

require 'zlib'

source = File.read('path/to/pride-and-prejudice.txt')
compressed = Zlib::Deflate.deflate(source)
decompressed = Zlib::Inflate.inflate(compressed)

puts "Source size: #{source.bytesize}"
puts "Compressed: #{compressed.bytesize}"
puts "Decompressed: #{decompressed.bytesize}"
puts "Compression: #{(1 - (compressed.bytesize.to_f / source.bytesize)).round(4)}"

The result was:

Source size: 724725
Compressed: 260549
Decompressed: 724725
Compression: 0.6405

I say it’s pretty impressive!
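To see the header difference from the quote above in practice, here is a small sketch that packs the same data into both containers. The container_sizes helper is a made-up name, and the sample string is just a stand-in for a real file, so the numbers will not match the ones above.

```ruby
require 'zlib'

# Hypothetical helper: compresses the same data into the zlib and gzip
# containers and reports the sizes in bytes.
def container_sizes(data)
  {
    source: data.bytesize,
    zlib: Zlib::Deflate.deflate(data).bytesize, # zlib format
    gzip: Zlib.gzip(data).bytesize              # gzip format
  }
end

# Stand-in text; use File.read on a real file for a more meaningful comparison.
sample = 'It is a truth universally acknowledged. ' * 500
p container_sizes(sample)
```

Both formats wrap the same deflate algorithm, so the gzip result should come out only slightly larger. The difference is the bigger header and trailer, which matters mostly when you send many small payloads.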

More?

Yes! There is more hidden in Ruby stdlib. Have I missed something? Do you think something is even more interesting? Let me know.