Ruby RBS made easy with codegen
Ruby 3 ships with support for static type checking via rbs (“Ruby Signatures”). This post details
a technique you can use to get started quickly with RBS using code generation,
so that you can have statically-typed code when processing JSON data.
As a quick reminder of the Ruby static typing landscape, the big actors to be aware of are:
- rbs is the standardized format for Ruby type signatures. They live in separate files that by convention have the extension .rbs.
- steep is a static type checker. It reads your Ruby code and makes sure it’s kosher with respect to your rbs files.
Related actors that aren’t really covered here, but you should probably know
about, are typeprof (it analyzes your code to guess .rbs files) and Sorbet,
Stripe’s Ruby type checker. Sorbet’s maintainers have promised that Sorbet will
work with RBS nicely, although at the time of writing it’s not all quite there yet.
Motivating Example
To motivate this article, let’s imagine a little CLI application that takes a query string, and then fetches the top 3 results from GitHub about that request. For example:
ghsearch "ruby types"
nateware/redis-objects ⭐ 2012 🐛 19 ⚖️ Artistic License 2.0
Updated 12 days ago, created over 11 years ago
Map Redis types directly to Ruby objects
https://github.com/nateware/redis-objects
sorbet/sorbet ⭐ 2542 🐛 462 ⚖️ Apache License 2.0
Updated about 2 hours ago, created over 2 years ago
A fast, powerful type checker designed for Ruby
https://github.com/sorbet/sorbet
ruby/rbs ⭐ 1049 🐛 62 ⚖️ Other
Updated about 5 hours ago, created almost 2 years ago
Type Signature for Ruby
https://github.com/ruby/rbs
This sort of code is relatively mundane, but it’s a big part of what I get paid to do at work:
- Take an input (usually a JSON message, but in this case just an ordinary string)
- Do some business logic to convert that input into an API request
- Do some business logic on the API response
- Rinse and repeat
So if we could have a good story around static type checking for these sorts of applications, I would be able to move faster, without spending so much time worrying that a typo in my JSON-handling logic will cause an incident in the middle of the night.
GitHub has an API for searching repos off of a query string, so what we need to do is to call that API, parse the results, and then display them in a pretty fashion. I’m aware that GitHub has an SDK that lets you do this, but I’m not going to use it here because GitHub’s SDK doesn’t support RBS yet, and because I want to emphasize how the technique we’re gonna use here works with any JSON-based API.
The pretty displaying we can do like this:
require 'rainbow/refinement'
require 'action_view'
class SearchResult
  using Rainbow
  include ActionView::Helpers::DateHelper
  attr_accessor :url
  attr_accessor :name
  attr_accessor :description
  attr_accessor :stars
  attr_accessor :issues
  attr_accessor :created_at
  attr_accessor :updated_at
  attr_accessor :license
  def to_pretty_s
    <<~PRETTY
      #{name.bright} ⭐ #{stars.to_s.yellow} 🐛 #{issues.to_s.cyan} ⚖️ #{license || "Unknown".red}
      Updated #{ago(updated_at)}, created #{ago(created_at)}
      #{description}
      #{url}
    PRETTY
  end
  private
  def ago(date_time)
    "#{distance_of_time_in_words_to_now(date_time)} ago"
  end
end
So all we need to do now is to construct some SearchResults from a query.
Here’s the vanilla Ruby way to do that:
require 'http' # the http gem, which provides HTTP.get
require 'date' # for DateTime.rfc3339
class Searcher
  SEARCH_ENDPOINT = "https://api.github.com/search/repositories"
  MAX_RESULTS = 3
  def self.search(q)
    params = { q: q, per_page: MAX_RESULTS }
    res = HTTP.get(SEARCH_ENDPOINT, params: params).parse
    res['items'].map do |item|
      search_result = SearchResult.new
      search_result.url = item['html_url']
      search_result.name = item['full_name']
      search_result.description = item['description']
      search_result.stars = item['stargazers_count']
      search_result.issues = item['open_issues']
      search_result.created_at = DateTime.rfc3339(item['created_at'])
      search_result.updated_at = DateTime.rfc3339(item['updated_at'])
      search_result.license = item['license']['name']
      search_result
    end
  end
end
If you’ve integrated with 3rd-party tools with Ruby, this sort of stuff is probably your bread and butter. It’s this sort of code that would gain the most from static type analysis, because there are so many little ways it can go wrong. For instance, how do I make sure that:
- I can’t spell a property wrong by accident
- I can’t forget to parse the dates instead of leaving them as strings
- I don’t assume somewhere that something is a String when it’s actually an Integer
- I handle the possibility of a field sometimes being null
In fact, the code I showed above is broken! It doesn’t handle the possibility
that license may be null (in JSON; in Ruby, nil). Here’s what happens if I
search for something where one of the results doesn’t have a license field:
Traceback (most recent call last):
3: from lib/ghsearch.rb:85:in `<main>'
2: from lib/ghsearch.rb:16:in `search'
1: from lib/ghsearch.rb:16:in `map'
lib/ghsearch.rb:25:in `block in search': undefined method `[]' for nil:NilClass (NoMethodError)
What we need to do is change:
search_result.license = item['license']['name']
Into:
search_result.license = item['license']['name'] if item['license']
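To make the difference concrete, here is a minimal sketch using only Ruby's standard library. The payload is made up for illustration (it is not real GitHub output), and the two helper method names are hypothetical; the point is only that the guarded form returns nil where the unguarded one raises.

```ruby
require 'json'

# Made-up payload for illustration (not real GitHub output): the second
# item has a null license.
SAMPLE_RESPONSE = <<~JSON
  {
    "items": [
      { "full_name": "a/licensed", "license": { "name": "MIT License" } },
      { "full_name": "b/unlicensed", "license": null }
    ]
  }
JSON

# Unguarded access: raises NoMethodError when 'license' is nil.
def license_name_unguarded(item)
  item['license']['name']
end

# Guarded access, same behavior as the `if item['license']` fix above.
def license_name_guarded(item)
  item['license']['name'] if item['license']
end

items = JSON.parse(SAMPLE_RESPONSE)['items']

items.each do |item|
  puts "#{item['full_name']}: #{license_name_guarded(item).inspect}"
end

begin
  items.each { |item| license_name_unguarded(item) }
rescue NoMethodError => e
  puts "unguarded access failed: #{e.class}"
end
```

`item.dig('license', 'name')` is an equivalent standard-library shorthand for the guarded version, since `Hash#dig` stops at the first nil.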
It’s not rocket science, it’s just a footgun. How do we do better?
Code generation to the rescue
What we really want to do is generate classes that exist just to safely hold JSON data, and have strongly-typed attributes that we can confidently access. Thankfully, there’s an emerging standard (that I contribute to) called RFC 8927: JSON Type Definition that can help here.
The idea with JSON Type Definition is that you describe a schema for JSON data,
and then you can generate code from those schemas. The jtd-codegen tool can
generate rb and rbs files from a schema.
For example, if you have this JSON Type Definition schema in user.jtd.json:
{
  "properties": {
    "id": { "type": "string" },
    "createdAt": { "type": "timestamp" },
    "karma": { "type": "int32" },
    "isAdmin": { "type": "boolean" }
  }
}
Then you can run:
jtd-codegen user.jtd.json \
--ruby-out lib --ruby-module User \
--ruby-sig-out sig --ruby-sig-module User
Which will generate this in lib/user.rb:
# Code generated by jtd-codegen for Ruby v0.1.0
require 'json'
require 'time'
module User
  class User
    attr_accessor :created_at
    attr_accessor :id
    attr_accessor :is_admin
    attr_accessor :karma
    def self.from_json_data(data)
      out = User.new
      out.created_at = User::from_json_data(DateTime, data["createdAt"])
      out.id = User::from_json_data(String, data["id"])
      out.is_admin = User::from_json_data(TrueClass, data["isAdmin"])
      out.karma = User::from_json_data(Integer, data["karma"])
      out
    end
    def to_json_data
      data = {}
      data["createdAt"] = User::to_json_data(created_at)
      data["id"] = User::to_json_data(id)
      data["isAdmin"] = User::to_json_data(is_admin)
      data["karma"] = User::to_json_data(karma)
      data
    end
  end
  private
  # plus some internal utility stuff (from_json_data, to_json_data)
end
And the following in sig/user.rbs:
# Code generated by jtd-codegen for Ruby Type Signatures v0.1.0
module User
  class User
    attr_accessor created_at: DateTime
    attr_accessor id: String
    attr_accessor is_admin: bool
    attr_accessor karma: Integer
    def self.from_json_data: (untyped) -> User
    def to_json_data: () -> untyped
  end
  private
  def self.from_json_data: (untyped, untyped) -> untyped
  def self.to_json_data: (untyped) -> untyped
end
For me, this is a game-changer. It not only handles all the mundane type stuff,
it also handles doing things like normalizing camelCase stuff like isAdmin
into a Ruby is_admin on the way in (and vice-versa on the way out).
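To show what that round trip looks like in use, here is a hand-written, simplified stand-in for such a class. It is not jtd-codegen's output (the real file delegates to internal helpers that are elided above); the class name `SketchUser` and the sample data are assumptions made up for illustration.

```ruby
require 'json'
require 'date'

# Simplified, hand-written stand-in for a generated class.
# NOT jtd-codegen output; it only illustrates the key mapping.
class SketchUser
  attr_accessor :id, :created_at, :karma, :is_admin

  # Build an instance from already-parsed JSON (a Hash with camelCase keys).
  def self.from_json_data(data)
    out = new
    out.id = data['id']
    out.created_at = DateTime.rfc3339(data['createdAt']) # String -> DateTime
    out.karma = data['karma']
    out.is_admin = data['isAdmin']
    out
  end

  # Convert back to a JSON-ready Hash, restoring the camelCase keys.
  def to_json_data
    {
      'id' => id,
      'createdAt' => created_at.rfc3339, # DateTime -> String
      'karma' => karma,
      'isAdmin' => is_admin
    }
  end
end

raw = '{"id":"abc","createdAt":"2021-01-02T03:04:05+00:00","karma":42,"isAdmin":false}'
user = SketchUser.from_json_data(JSON.parse(raw))

puts user.created_at.class # a DateTime, not a String
puts user.is_admin         # snake_case accessor for "isAdmin"
puts JSON.generate(user.to_json_data)
```

Callers only ever touch snake_case accessors with real Ruby types; the camelCase keys and string timestamps stay at the JSON boundary.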
Using JSON Type Definition on someone else’s API
But how do we use JSON Type Definition with GitHub’s API? GitHub doesn’t publish
JSON Type Definition schemas (yet). Thankfully, JSON Type Definition ships with
a tool called jtd-infer that can guess a JSON typedef schema from data, so we
don’t need to write a schema ourselves, and we don’t need to wait for GitHub to
write one either:
curl https://api.github.com/search/repositories\?q\=rails \
| jtd-infer \
> search_repositories_response.jtd.json
cat search_repositories_response.jtd.json | jq | head -n 20
{
  "properties": {
    "incomplete_results": {
      "type": "boolean"
    },
    "items": {
      "elements": {
        "properties": {
          "archive_url": {
            "type": "string"
          },
          "archived": {
            "type": "boolean"
          },
          "assignees_url": {
            "type": "string"
          },
          "blobs_url": {
            "type": "string"
          },
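If you're curious how a schema can be guessed from data at all, here is a toy sketch of the general idea: walk a parsed JSON value and emit a matching JSON Type Definition form. This is not jtd-infer's algorithm, which isn't described in this post; the method names `infer_schema` and `timestamp?`, the int32/float64 rule, and the "look only at the first array element" shortcut are all simplifications of mine. The real tool has to deal with things like nullable values and array elements that differ from each other.

```ruby
require 'json'
require 'date'

INT32_RANGE = (-2**31..2**31 - 1).freeze

# True if the string parses as an RFC 3339 timestamp.
def timestamp?(str)
  DateTime.rfc3339(str)
  true
rescue ArgumentError # Date::Error is a subclass of ArgumentError
  false
end

# Toy schema inference; NOT jtd-infer's algorithm.
def infer_schema(value)
  case value
  when true, false
    { 'type' => 'boolean' }
  when Integer
    { 'type' => INT32_RANGE.cover?(value) ? 'int32' : 'float64' }
  when Float
    { 'type' => 'float64' }
  when String
    { 'type' => timestamp?(value) ? 'timestamp' : 'string' }
  when Array
    # Assumption: every element looks like the first one.
    { 'elements' => value.empty? ? {} : infer_schema(value.first) }
  when Hash
    { 'properties' => value.to_h { |key, child| [key, infer_schema(child)] } }
  else
    {} # the empty form accepts any value (used here for null)
  end
end

sample = JSON.parse(<<~JSON)
  {
    "incomplete_results": false,
    "items": [
      { "archived": false, "created_at": "2011-01-26T19:01:12Z", "stargazers_count": 3 }
    ]
  }
JSON

puts JSON.pretty_generate(infer_schema(sample))
```

The output has the same overall shape as the schema above (a "properties" form containing an "elements" form), which is all this sketch is meant to show.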
So now we can generate Ruby code from this schema like so:
jtd-codegen search_repositories_response.jtd.json \
--ruby-out lib --ruby-module Github \
--ruby-sig-out sig --ruby-sig-module Github
Which generates a pretty big (~400 LOC) but super mundane Ruby file, filled with code that looks like this:
module Github
  # ...
  class SearchRepositoriesResponseItemLicense
    attr_accessor :key
    attr_accessor :name
    attr_accessor :node_id
    attr_accessor :spdx_id
    attr_accessor :url
    def self.from_json_data(data)
      out = SearchRepositoriesResponseItemLicense.new
      out.key = Github::from_json_data(String, data["key"])
      out.name = Github::from_json_data(String, data["name"])
      out.node_id = Github::from_json_data(String, data["node_id"])
      out.spdx_id = Github::from_json_data(String, data["spdx_id"])
      out.url = Github::from_json_data(String, data["url"])
      out
    end
    def to_json_data
      data = {}
      data["key"] = Github::to_json_data(key)
      data["name"] = Github::to_json_data(name)
      data["node_id"] = Github::to_json_data(node_id)
      data["spdx_id"] = Github::to_json_data(spdx_id)
      data["url"] = Github::to_json_data(url)
      data
    end
  end
  # ...
end
But now, we can refactor this stringly-typed code:
params = { q: q, per_page: MAX_RESULTS }
res = HTTP.get(SEARCH_ENDPOINT, params: params).parse
res['items'].map do |item|
  search_result = SearchResult.new
  search_result.url = item['html_url']
  search_result.name = item['full_name']
  search_result.description = item['description']
  search_result.stars = item['stargazers_count']
  search_result.issues = item['open_issues']
  search_result.created_at = DateTime.rfc3339(item['created_at'])
  search_result.updated_at = DateTime.rfc3339(item['updated_at'])
  search_result.license = item['license']['name'] if item['license']
  search_result
end
Into this strongly-typed code:
params = { q: q, per_page: MAX_RESULTS }
res = HTTP.get(SEARCH_ENDPOINT, params: params).parse
res = Github::SearchRepositoriesResponse.from_json_data(res)
res.items.map do |item|
  search_result = SearchResult.new
  search_result.url = item.html_url
  search_result.name = item.full_name
  search_result.description = item.description
  search_result.stars = item.stargazers_count
  search_result.issues = item.open_issues
  search_result.created_at = item.created_at
  search_result.updated_at = item.updated_at
  search_result.license = item.license&.name
  search_result
end
What do I mean by “strongly typed”? I mean that if I were to make the same mistake as with the stringly-typed code, i.e. changing this:
search_result.license = item.license&.name
To this:
search_result.license = item.license.name # no more "&." operator
Then I’ll get a static analysis error from steep!
lib/ghsearch.rb:45:45: [error] Type `(::Github::SearchRepositoriesResponseItemLicense | nil)` does not have method `name`
│ Diagnostic ID: Ruby::NoMethod
│
└ search_result.license = item.license.name
Takeaways
Overall, it’s still very early days for Ruby static type analysis, but I believe many halcyon days are before us. My biggest takeaways from working with Ruby Signatures so far are:
- steep is a little rough around the edges today. To me, the biggest issue is that it crashes on many rbs syntax errors, instead of giving friendly errors. It also has some smaller issues, like incomplete coverage of the standard library and some issues with things like private methods.
  I have every confidence these issues will be sorted out in the months and years to come. Unless you’ve tried writing one yourself, it’s hard to appreciate just how incredibly hard it is to get to the level of production quality steep has already achieved. As enterprises like Stripe and Square invest in RBS tooling, these sharp corners will be sanded down.
- JSON Type Definition in combination with steep seems to work pretty well. I’ve tried using rbs prototype and typeprof, but neither tool seems to work as well as jtd-codegen's mundane code generation approach.
  I think jtd-codegen's Ruby and Ruby Signatures generators can be a great item in the RBS toolbox. It definitely doesn’t hurt to know about it!
If you want to play around with RBS and jtd-codegen yourself, here are some
resources:
- I’ve put the working, final code from this blog post in a GitHub repo here: https://github.com/ucarion/rbs-jtd-codegen-demo
  Give it a whirl!
- The JSON Type Definition website has detailed documentation on jtd-codegen, available here: https://jsontypedef.com/docs/ruby-codegen/
  Some exciting JSON Type Definition features for Ruby not covered in this blog post include validating JSON data at runtime, automatically adding documentation comments to the generated Ruby code, and customizing the Ruby code to use a custom data type.