Ruby enumerables considered helpful
Ruby's Enumerable methods help you make powerful code simple — by filtering, transforming, and processing data like the best engineers do. These methods are available on Arrays, Hashes, and many (many) other objects, and similarly-named methods are available on even more. If you don't already know these methods well, then the most valuable time you can spend in Ruby is on mastering them.
But not all data can take advantage of Enumerable methods, at least not directly. What if you don't have an Array or Hash? Can you still use the Enumerable methods? With all the time you spent becoming an Enumerable master, wouldn't it be nice if you could treat more things like Enumerables — like lists of items across multiple pages, or even items that slowly stream in from an API over time?
Ruby's Enumerator class is exactly what you need to do this. It's not the easiest thing to understand, but a few examples can show you how to use it in your own applications.
Automatic pagination
Sometimes your data seems like a natural fit for an Enumerable, but it doesn't quite work. What if you're using an API that paginates? The best you can get is an Enumerable per page, which makes it annoying to work with multiple pages at once. You could fetch all the pages and then flatten the list of lists… But now you're always fetching all the data (even if you don't need it) and you have to wait for every page to load before you can do anything.
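For contrast, here is a sketch of that eager approach, assuming a hypothetical in-memory `fetch_page` helper in place of real HTTP calls:

```ruby
# A hypothetical in-memory stand-in for a paged API:
# each page knows its results and the URL of the next page.
ALL_PAGES = {
  "/things?page=1" => { "results" => [1, 2], "next_url" => "/things?page=2" },
  "/things?page=2" => { "results" => [3, 4], "next_url" => "/things?page=3" },
  "/things?page=3" => { "results" => [5], "next_url" => nil }
}.freeze

def fetch_page(url)
  ALL_PAGES.fetch(url)
end

# Eager pagination: every page is fetched up front,
# even when the caller only needs the first record.
def all_items(initial_url)
  items = []
  url = initial_url
  while url
    page = fetch_page(url)
    items.concat(page["results"])
    url = page["next_url"]
  end
  items
end

all_items("/things?page=1")  # => [1, 2, 3, 4, 5]
```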
Instead, wrap your API calls in an Enumerator:
def paginated_list(initial_url, params = {})
  url = initial_url
  Enumerator.new do |yielder|
    # ...
  end
end
This returns an Enumerator, an object that can be used like an Enumerable, responding to all of the core Enumerable methods you know and love: map, filter, first, and all the rest.
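For example, even a hand-built Enumerator over an infinite sequence responds to these methods. A minimal sketch, unrelated to the pagination example (note that an eager map would never finish on an infinite source, so lazy is used there):

```ruby
# A hand-built Enumerator over an infinite stream of integers.
numbers = Enumerator.new do |yielder|
  n = 1
  loop do
    yielder.yield(n)  # hand the next value to whoever is iterating
    n += 1
  end
end

numbers.first(3)                          # => [1, 2, 3]
numbers.lazy.map { |n| n * n }.first(4)   # => [1, 4, 9, 16]
```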
When you call a method like map on that Enumerator, that method will ask for the next object as it needs it. When asked, the body of the Enumerator yields the next object using the yielder:
def paginated_list(initial_url, params = {})
  url = initial_url
  Enumerator.new do |yielder|
    loop do
      response = get(url, params)
      body = response.body

      # Yield each result to the caller
      body["results"].each { |result| yielder.yield(result) }

      break if last_page?(body)

      # get ready to fetch the next page
      url = next_page_url(body)
    end
  end
end
This Enumerator makes the request for the first page. Then, any time a record is needed from the first page, the Enumerator yields it to the caller. When you run out of records on the first page, the Enumerator fetches the next page and so on until you run out. If you don't need the next page, it won't fetch it.
How does this look for the caller?
paginated_list("/things").filter { |t| t["selected"] == "true" }.sort_by { |t| t["name"].downcase }
Easy — you don't even have to care that the data spans multiple pages across multiple API requests. From the caller's point of view, it's just an ordinary list.
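You can see the laziness directly with a small in-memory stand-in for the API. The `PAGES` array and `REQUESTS` log here are invented for illustration, taking the place of the hypothetical `get`, `last_page?`, and `next_page_url` helpers:

```ruby
# A hypothetical three-page API held in memory, plus a log of
# which pages actually get fetched.
PAGES = [
  { "results" => ["a", "b"], "next" => 1 },
  { "results" => ["c", "d"], "next" => 2 },
  { "results" => ["e"], "next" => nil }
].freeze
REQUESTS = []

def paginated_list(page_index = 0)
  Enumerator.new do |yielder|
    loop do
      REQUESTS << page_index  # record the "API request"
      body = PAGES[page_index]
      body["results"].each { |result| yielder.yield(result) }
      break if body["next"].nil?
      page_index = body["next"]
    end
  end
end

paginated_list.first(3)  # => ["a", "b", "c"]
REQUESTS                 # => [0, 1] (the third page was never fetched)
```

Methods like first and take stop pulling as soon as they have enough, so later pages are never requested; a method that consumes everything, like sort_by, will still walk all the pages.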
Save a block for later
While working on the Aha! AI writing assistant, we had two different ways of working with AI responses. For development and testing, the response should be returned all at once because it is easier to work with. For a user, though, it feels better for the response to arrive incrementally — streaming in so you can know early on if you're getting the result you wanted.
Everything else was exactly the same, except for how the response was dealt with. In one case, a string was returned. In another, bits of a string were yielded to the caller as they came in:
client = Client.new(stream: stream?)

# Burst response
response = client.get(query: params[:query], context: params[:context] ...)
render json: { text: response }

# Stream response
# all kinds of streaming setup work
client.get(query: params[:query], context: params[:context] ...) do |chunk|
  # stream chunk to browser
end
# all kinds of streaming teardown work, stream error handling, etc.
This is fine. But the API method calls look exactly the same, except one takes a block and the other doesn't. And for streaming, you have extra work to do before and after the method call.
It's hard to untangle this. You could pass around lambdas, get could ignore a lambda parameter in stream mode, or you could wrap the setup and teardown in their own block:
client = Client.new(stream: stream?)

response = streaming_setup(stream?) do
  client.get(query: params[:query], context: params[:context]) do |chunk|
    # stream chunk to browser
  end
end

render json: { text: response } unless stream?
Maybe this will work. Again, it's fine. But this is Ruby, so we can do better than fine.
Object#to_enum is a method available on any object. You give it a method name and arguments and it returns an Enumerator. Whenever that Enumerator is enumerated, it will call the method you gave it, on the object you called it on, with the arguments you gave it, and it will pass along anything your method yields as the next value of the Enumerator.
At first, it's hard to see how this is useful. In practice, this means you can capture the arguments a method was called with, but defer needing the block the method yields to until the part that processes your data needs it.
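A minimal sketch of the mechanics, using a made-up Countdown class:

```ruby
class Countdown
  def initialize(from)
    @from = from
  end

  # Without a block, capture this call as an Enumerator;
  # with a block, actually do the work and yield values.
  def tick(step)
    return to_enum(:tick, step) unless block_given?

    n = @from
    while n > 0
      yield n
      n -= step
    end
  end
end

enum = Countdown.new(10).tick(3)  # no block yet: nothing runs
enum.to_a                         # => [10, 7, 4, 1]
```

The arguments (10 and 3) are captured when tick is called, but the loop doesn't run until something enumerates the result.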
Using to_enum, the caller can now look like this:
def show
  client = Client.new(stream: stream?)
  response = client.get(query: params[:query], context: params[:context])

  if stream?
    stream_response response
  else
    render json: { text: response }
  end
end

def stream_response(response)
  # all kinds of setup work
  response.each do |chunk|
    # stream each chunk to browser as it arrives
  end
  # all kinds of teardown work, stream error handling, etc.
end
client.get now returns either a string or an Enumerator. If it's an Enumerator, it will yield each part of the response as it's received. Because this code no longer needs the block at the same time as the arguments, all of the logic around streaming a response can go somewhere else. This is much less tangled up than the last version. How is this possible? By using to_enum in the client when it's not given a block:
class Client
  def initialize(stream:)
    @stream = stream
  end

  def get(query:, context:, &block)
    if @stream
      get_stream(query, context, &block)
    else
      get_burst(query, context)
    end
  end

  def get_stream(query, context, &block)
    return to_enum(:get_stream, query, context) unless block_given?

    some_api_request do |chunk|
      # ...
      yield chunk
    end
  end

  def get_burst(query, context)
    # nothing interesting here, just a basic request and response
    response
  end
end
If the client is in streaming mode and isn't given a block, get_stream returns an Enumerator that calls get_stream with a block. The yielded chunk is passed back through to the Enumerator when the caller, or any other code that the caller hands the Enumerator to, uses it.
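To see the round trip without a real API, here is a self-contained sketch; the FakeClient class and its in-memory chunks are invented for illustration:

```ruby
class FakeClient
  # In-memory stand-in for a streaming API response.
  CHUNKS = ["Hello", ", ", "world"].freeze

  # Without a block, capture this call as an Enumerator;
  # with a block, "stream" each chunk to it.
  def get_stream
    return to_enum(:get_stream) unless block_given?

    CHUNKS.each { |chunk| yield chunk }
  end
end

response = FakeClient.new.get_stream  # no block: an Enumerator comes back
response.map(&:upcase)                # => ["HELLO", ", ", "WORLD"]
response.to_a.join                    # => "Hello, world"
```

Every Enumerable method on `response` re-invokes get_stream under the hood, feeding each yielded chunk through as the next value.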
This pattern — decoupling the place where you need the arguments from the place where you have the block — becomes even more powerful the further apart those places are. It's a nice pattern to understand, and it's possible because of Enumerators.
Enumerators can be tricky at first. It's hard to see the use when you read about them in the docs. But they have an amazing ability to wrap Ruby code into objects that take advantage of the full power of Enumerable methods. It's valuable to use the knowledge you already have in more places, and Enumerators will help you do exactly that.