Some time ago someone asked on Stackoverflow how Ruby’s
Enumerators
do their job. Consider there is an implementation of an arbitrary Object
that features an each
method, then it is possible to get an Enumerator
for that object
via Kernel#to_enum
:
1 2 3 4 5 6 7 8 |
|
Enumerators
come in very handy, especially in situations where you want the basic
functionality of each
, but with a twist - let’s say you would like to iterate
through the characters of a String
but at the same time you want access to the index
of the current character, without having to maintain that index manually. That’s what
each_with_index
does, something you probably use a dozen times a day. The importance
here is that the each_with_index
functionality is not something that is provided
by the String
class, but something that comes with Enumerator
. In fact, String#each_char
returns an instance of Enumerator
.
1 2 3 4 5 6 7 8 |
|
Besides String#each_char
, each_byte
, each_line
etc. there are other classes typically
returning Enumerators
as well when each
is called with no arguments, or they have special
each_xxx
methods similar to String#each_char
.
1 2 3 4 5 |
|
So it turns out that we use Enumerators
in quite a few places. It would be a real bummer if
these weren’t implemented efficiently, right? Now with those Enumerators
that are built-in
we can certainly trust that they are implemented efficiently because we are in the context
of a specific class and probably know how to do things clean and fast in that context.
But an interesting question is how to proceed in the case of arbitrary Objects
that
implement an each
method, as was the case in our first example. A simple, straight-forward
implementation of an Enumerator
in that case immediately jumps to mind: iterate through all
the elements in the object, store them in an Array
and then hand them out one by one. This
certainly works, an implementation could look like the following. Let’s assume that the object
implements nothing but a bare each
method that takes the usual block.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
|
So it works, but it has one major drawback: All the references of the values in the enumerable
are duplicated into an Array
immediately when the EagerEnumerator
is created. That’s not
a biggie if the enumerable is small, but it’s certainly a waste of memory and could become
a problem if the enumerable is quite large or the time to retrieve a single object in each
is significant. That’s a problem that is probably well known to Pythonistas, and they solved
this by introducing Generators. The goal is evident.
Rather than doing the whole work at once, we would like to do the work only on request, only
when it is necessary. As it turns out, Ruby has a very nice feature that allows us to do
exactly that, and this is also how an Enumerator
is implemented for Object#to_enum
.
Continuation is the tool that fulfills our
requirements perfectly, and Ruby offers it in the form of Fibers
. Let’s have a look at
how Object#to_enum
is implemented
in CRuby:
1 2 3 4 5 6 7 8 9 10 11 |
|
As we can see, to_enum
only works for Objects
that have an each
method (the sym_each
).
So what rb_enumeratorize
now does is it creates an Enumerator
instance, and the really
interesting part is what happens in calls to next
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 |
|
The real interesting part is what happens in get_next_values
. There, initially
a Fiber
is set up that pulls out the values from the each
method one by one.
It’s here where the beauty of Fibers
and Continuations really shines. Without
them, we would be stuck here, because calls to each
are of a all-or-nothing
nature, we either process all of the values immediately or we don’t. But with
the additional functionality of a Fiber
we can elegantly “pause” its execution
immediately after we retrieved a single value from each
, and return that to
the caller. Later, if another value is requested, we can simply pick up in the
Fiber
from where we left off, get the next value, pause again, return
the value to the caller and so on.
The benefit of this approach is that we don’t have to do the housekeeping for any
excess values, we simply pass on the current value. Unlike before, where each
has practically “flooded” us with all its values at once, we now have more
control over the process and with this finer-grained control we are now able to
tell each
: “Stop. One is enough. I’ll ask for more later!”.
This pattern can be really useful in everyday tasks, and saving some memory or
time is probably never a bad idea. Fortunately for us, it’s really easy to
implement in Ruby itself. Let’s have a look at a lazy version of our Enumerator
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
|
This is pretty much the equivalent Ruby code for how Enumerators
are implemented internally
in CRuby.