Some time ago someone asked on Stackoverflow how Ruby’s
Enumerators
do their job. Consider there is an implementation of an arbitrary Object
that features an each method, then it is possible to get an Enumerator for that object
via Kernel#to_enum:
1 2 3 4 5 6 7 8 | |
Enumerators come in very handy, especially in situations where you want the basic
functionality of each, but with a twist - let’s say you would like to iterate
through the characters of a String but at the same time you want access to the index
of the current character, without having to maintain that index manually. That’s what
each_with_index does, something you probably use a dozen times a day. The importance
here is that the each_with_index functionality is not something that is provided
by the String class, but something that comes with Enumerator. In fact, String#each_char
returns an instance of Enumerator.
1 2 3 4 5 6 7 8 | |
Besides String#each_char, each_byte, each_line etc. there are other classes typically
returning Enumerators as well when each is called with no arguments, or they have special
each_xxx methods similar to String#each_char.
1 2 3 4 5 | |
So it turns out that we use Enumerators in quite a few places. It would be a real bummer if
these weren’t implemented efficiently, right? Now with those Enumerators that are built-in
we can certainly trust that they are implemented efficiently because we are in the context
of a specific class and probably know how to do things clean and fast in that context.
But an interesting question is how to proceed in the case of arbitrary Objects that
implement an each method, as was the case in our first example. A simple, straight-forward
implementation of an Enumerator in that case immediately jumps to mind: iterate through all
the elements in the object, store them in an Array and then hand them out one by one. This
certainly works, an implementation could look like the following. Let’s assume that the object
implements nothing but a bare each method that takes the usual block.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | |
So it works, but it has one major drawback: All the references of the values in the enumerable
are duplicated into an Array immediately when the EagerEnumerator is created. That’s not
a biggie if the enumerable is small, but it’s certainly a waste of memory and could become
a problem if the enumerable is quite large or the time to retrieve a single object in each
is significant. That’s a problem that is probably well known to Pythonistas, and they solved
this by introducing Generators. The goal is evident.
Rather than doing the whole work at once, we would like to do the work only on request, only
when it is necessary. As it turns out, Ruby has a very nice feature that allows us to do
exactly that, and this is also how an Enumerator is implemented for Object#to_enum.
Continuation is the tool that fulfills our
requirements perfectly, and Ruby offers it in the form of Fibers. Let’s have a look at
how Object#to_enum is implemented
in CRuby:
1 2 3 4 5 6 7 8 9 10 11 | |
As we can see, to_enum only works for Objects that have an each method (the sym_each).
So what rb_enumeratorize now does is it creates an Enumerator instance, and the really
interesting part is what happens in calls to next:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | |
The real interesting part is what happens in get_next_values. There, initially
a Fiber is set up that pulls out the values from the each method one by one.
It’s here where the beauty of Fibers and Continuations really shines. Without
them, we would be stuck here, because calls to each are of a all-or-nothing
nature, we either process all of the values immediately or we don’t. But with
the additional functionality of a Fiber we can elegantly “pause” its execution
immediately after we retrieved a single value from each, and return that to
the caller. Later, if another value is requested, we can simply pick up in the
Fiberfrom where we left off, get the next value, pause again, return
the value to the caller and so on.
The benefit of this approach is that we don’t have to do the housekeeping for any
excess values, we simply pass on the current value. Unlike before, where each
has practically “flooded” us with all its values at once, we now have more
control over the process and with this finer-grained control we are now able to
tell each: “Stop. One is enough. I’ll ask for more later!”.
This pattern can be really useful in everyday tasks, and saving some memory or
time is probably never a bad idea. Fortunately for us, it’s really easy to
implement in Ruby itself. Let’s have a look at a lazy version of our Enumerator:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | |
This is pretty much the equivalent Ruby code for how Enumerators are implemented internally
in CRuby.