Ruby RBS made easy with codegen

Ruby 3 ships with support for static type checking via rbs (“Ruby Signatures”). This post will detail a technique you can use to get started quickly with RBS using code generation, so that you can have statically-typed code when processing JSON data.

As a quick reminder of the Ruby static typing landscape, the big actors to be aware of are:

  • rbs is the standardized format for Ruby type signatures. They live in separate files that by convention have the extension .rbs.
  • steep is a static type checker. It reads your Ruby code and makes sure it’s kosher with respect to your rbs files.

Related actors that aren’t really covered here, but you should probably know about, are typeprof (it analyzes your code to guess .rbs files) and Sorbet, Stripe’s Ruby type checker. Sorbet’s maintainers have promised that Sorbet will work with RBS nicely, although at the time of writing it’s not all quite there yet.

Motivating Example

To motivate this article, let’s imagine a little CLI application that takes a query string, and then fetches the top 3 results from GitHub about that request. For example:

ghsearch "ruby types"
nateware/redis-objects ⭐ 2012 🐛 19 ⚖️  Artistic License 2.0
Updated 12 days ago, created over 11 years ago
Map Redis types directly to Ruby objects
https://github.com/nateware/redis-objects

sorbet/sorbet ⭐ 2542 🐛 462 ⚖️  Apache License 2.0
Updated about 2 hours ago, created over 2 years ago
A fast, powerful type checker designed for Ruby
https://github.com/sorbet/sorbet

ruby/rbs ⭐ 1049 🐛 62 ⚖️  Other
Updated about 5 hours ago, created almost 2 years ago
Type Signature for Ruby
https://github.com/ruby/rbs

This sort of code is relatively mundane, but it’s a big part of what I get paid to do at work:

  1. Take an input (usually a JSON message, but in this case just an ordinary string)
  2. Do some business logic to convert that input into an API request
  3. Do some business logic on the API response
  4. Rinse and repeat

So if we could have a good story around static type checking for these sorts of applications, I would be able to move faster without having to spend so much time worrying that a typo in my JSON-handling logic will cause an incident in the middle of the night.

GitHub has an API for searching repos off of a query string, so what we need to do is to call that API, parse the results, and then display them in a pretty fashion. I’m aware that GitHub has an SDK that lets you do this, but I’m not going to use it here because GitHub’s SDK doesn’t support RBS yet, and because I want to emphasize how the technique we’re gonna use here works with any JSON-based API.

The pretty displaying we can do like this:

require 'rainbow/refinement'
require 'action_view'

class SearchResult
  using Rainbow
  include ActionView::Helpers::DateHelper

  attr_accessor :url
  attr_accessor :name
  attr_accessor :description
  attr_accessor :stars
  attr_accessor :issues
  attr_accessor :created_at
  attr_accessor :updated_at
  attr_accessor :license

  def to_pretty_s
    <<~PRETTY
      #{name.bright} ⭐ #{stars.to_s.yellow} 🐛 #{issues.to_s.cyan} ⚖️  #{license || "Unknown".red}
      Updated #{ago(updated_at)}, created #{ago(created_at)}
      #{description}
      #{url}
    PRETTY
  end

  private

  def ago(date_time)
    "#{distance_of_time_in_words_to_now(date_time)} ago"
  end
end

So all we need to do now is to construct some SearchResults from a query. Here’s the vanilla Ruby way to do that:

require 'date'
require 'http' # the http.rb gem, which provides HTTP.get and Response#parse

class Searcher
  SEARCH_ENDPOINT = "https://api.github.com/search/repositories"
  MAX_RESULTS = 3

  def self.search(q)
    params = { q: q, per_page: MAX_RESULTS }
    res = HTTP.get(SEARCH_ENDPOINT, params: params).parse

    res['items'].map do |item|
      search_result = SearchResult.new
      search_result.url = item['html_url']
      search_result.name = item['full_name']
      search_result.description = item['description']
      search_result.stars = item['stargazers_count']
      search_result.issues = item['open_issues']
      search_result.created_at = DateTime.rfc3339(item['created_at'])
      search_result.updated_at = DateTime.rfc3339(item['updated_at'])
      search_result.license = item['license']['name']

      search_result
    end
  end
end

If you’ve integrated with 3rd-party tools with Ruby, this sort of stuff is probably your bread and butter. It’s this sort of code that would gain the most from static type analysis, because there are so many little ways it can go wrong. For instance, how do I make sure that:

  1. I can’t spell a property wrong by accident
  2. I can’t forget to parse the dates instead of leaving them as strings
  3. I don’t assume somewhere that something is a String when it’s actually an Integer
  4. I handle the possibility of a field sometimes being null

In fact, the code I showed above is broken! It doesn’t handle the possibility that license may be null (in JSON; in Ruby, nil). Here’s what happens if I search for something where one of the results doesn’t have a license field:

Traceback (most recent call last):
	3: from lib/ghsearch.rb:85:in `<main>'
	2: from lib/ghsearch.rb:16:in `search'
	1: from lib/ghsearch.rb:16:in `map'
lib/ghsearch.rb:25:in `block in search': undefined method `[]' for nil:NilClass (NoMethodError)

What we need to do is change:

search_result.license = item['license']['name']

Into:

search_result.license = item['license']['name'] if item['license']
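
To make the failure modes concrete, here is a minimal sketch using only Ruby's standard library. The two items are hypothetical, heavily trimmed stand-ins for what GitHub returns, and the variable names are made up for illustration:

```ruby
require 'json'

# Hypothetical, trimmed-down items; real responses carry many more fields.
items = JSON.parse(<<~JSON)
  [
    { "full_name": "ruby/rbs", "license": { "name": "Other" } },
    { "full_name": "example/unlicensed", "license": null }
  ]
JSON

# The guarded lookup from the fix above: nil when there is no license.
guarded = items.map { |item| item['license']['name'] if item['license'] }

# Hash#dig is an equivalent spelling: it returns nil as soon as an
# intermediate value is nil instead of raising NoMethodError.
dug = items.map { |item| item.dig('license', 'name') }

# A misspelled key is not an error either. It silently yields nil,
# which is exactly the kind of typo nothing warns you about.
typo = items.first['fullname']

p guarded
p dug
p typo
```

Both lookups work, but nothing forces you to remember them, and the misspelled key goes unnoticed until something downstream trips over the nil.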

It’s not rocket science, it’s just a footgun. How do we do better?

Code generation to the rescue

What we really want to do is generate classes that exist just to safely hold JSON data, and have strongly-typed attributes that we can confidently access. Thankfully, there’s an emerging standard (that I contribute to) called RFC 8927: JSON Type Definition that can help here.

The idea with JSON Type Definition is that you describe a schema for JSON data, and then you can generate code from those schemas. The jtd-codegen tool can generate rb and rbs files from a schema.

For example, if you have this JSON Type Definition schema in user.jtd.json:

{
  "properties": {
    "id": { "type": "string" },
    "createdAt": { "type": "timestamp" },
    "karma": { "type": "int32" },
    "isAdmin": { "type": "boolean" }
  }
}

Then you can run:

jtd-codegen user.jtd.json \
  --ruby-out lib --ruby-module User \
  --ruby-sig-out sig --ruby-sig-module User

Which will generate this in lib/user.rb:

# Code generated by jtd-codegen for Ruby v0.1.0

require 'json'
require 'time'

module User
  class User
    attr_accessor :created_at
    attr_accessor :id
    attr_accessor :is_admin
    attr_accessor :karma

    def self.from_json_data(data)
      out = User.new
      out.created_at = User::from_json_data(DateTime, data["createdAt"])
      out.id = User::from_json_data(String, data["id"])
      out.is_admin = User::from_json_data(TrueClass, data["isAdmin"])
      out.karma = User::from_json_data(Integer, data["karma"])
      out
    end

    def to_json_data
      data = {}
      data["createdAt"] = User::to_json_data(created_at)
      data["id"] = User::to_json_data(id)
      data["isAdmin"] = User::to_json_data(is_admin)
      data["karma"] = User::to_json_data(karma)
      data
    end
  end

  private

  # plus some internal utility stuff (from_json_data, to_json_data)
end

And the following in sig/user.rbs:

# Code generated by jtd-codegen for Ruby Type Signatures v0.1.0

module User
  class User
    attr_accessor created_at: DateTime
    attr_accessor id: String
    attr_accessor is_admin: bool
    attr_accessor karma: Integer

    def self.from_json_data: (untyped) -> User
    def to_json_data: () -> untyped
  end

  private

  def self.from_json_data: (untyped, untyped) -> untyped
  def self.to_json_data: (untyped) -> untyped
end

For me, this is a game-changer. It not only handles all the mundane type stuff, it also normalizes camelCase names like isAdmin into a Ruby-style is_admin on the way in (and vice versa on the way out).
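
Here is a rough sketch of that round trip. To keep it self-contained, it uses a hand-written stand-in class (SketchUser, a hypothetical name) rather than the real jtd-codegen output, and it skips the internal helper functions the generated module relies on. Treat it as an illustration of the shape of the generated code, not a copy of it:

```ruby
require 'json'
require 'date'

# Hypothetical, simplified stand-in for the generated User::User class.
class SketchUser
  attr_accessor :id, :created_at, :karma, :is_admin

  # JSON (camelCase keys, string timestamps) -> Ruby (snake_case, DateTime)
  def self.from_json_data(data)
    out = new
    out.id = data['id']
    out.created_at = DateTime.rfc3339(data['createdAt'])
    out.karma = data['karma']
    out.is_admin = data['isAdmin']
    out
  end

  # Ruby -> JSON-ready Hash, restoring the original key names
  def to_json_data
    {
      'id' => id,
      'createdAt' => created_at.rfc3339,
      'karma' => karma,
      'isAdmin' => is_admin
    }
  end
end

payload = '{"id":"abc","createdAt":"2021-01-02T03:04:05+00:00","karma":42,"isAdmin":false}'
user = SketchUser.from_json_data(JSON.parse(payload))
round_tripped = user.to_json_data

puts user.created_at.class
puts JSON.generate(round_tripped)
```

The calling code only ever touches user.created_at and user.is_admin; the JSON key names and the timestamp parsing stay inside the data class.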

Using JSON Type Definition on someone else’s API

But how do we use JSON Type Definition with GitHub’s API? GitHub doesn’t publish JSON Type Definition schemas (yet). Thankfully, JSON Type Definition ships with a tool called jtd-infer that can guess a JSON typedef schema from data, so we don’t need to write a schema ourselves, and we don’t need to wait for GitHub to write one either:

curl https://api.github.com/search/repositories\?q\=rails \
  | jtd-infer \
  > search_repositories_response.jtd.json
cat search_repositories_response.jtd.json | jq | head -n 20
{
  "properties": {
    "incomplete_results": {
      "type": "boolean"
    },
    "items": {
      "elements": {
        "properties": {
          "archive_url": {
            "type": "string"
          },
          "archived": {
            "type": "boolean"
          },
          "assignees_url": {
            "type": "string"
          },
          "blobs_url": {
            "type": "string"
          },

So now we can generate Ruby code from this schema like so:

jtd-codegen search_repositories_response.jtd.json \
  --ruby-out lib --ruby-module Github \
  --ruby-sig-out sig --ruby-sig-module Github

Which generates a pretty big (~400 LOC) but super mundane Ruby file, filled with code that looks like this:

module Github
  # ...

  class SearchRepositoriesResponseItemLicense
    attr_accessor :key
    attr_accessor :name
    attr_accessor :node_id
    attr_accessor :spdx_id
    attr_accessor :url

    def self.from_json_data(data)
      out = SearchRepositoriesResponseItemLicense.new
      out.key = Github::from_json_data(String, data["key"])
      out.name = Github::from_json_data(String, data["name"])
      out.node_id = Github::from_json_data(String, data["node_id"])
      out.spdx_id = Github::from_json_data(String, data["spdx_id"])
      out.url = Github::from_json_data(String, data["url"])
      out
    end

    def to_json_data
      data = {}
      data["key"] = Github::to_json_data(key)
      data["name"] = Github::to_json_data(name)
      data["node_id"] = Github::to_json_data(node_id)
      data["spdx_id"] = Github::to_json_data(spdx_id)
      data["url"] = Github::to_json_data(url)
      data
    end
  end

  # ...
end

But now, we can refactor this stringly-typed code:

params = { q: q, per_page: MAX_RESULTS }
res = HTTP.get(SEARCH_ENDPOINT, params: params).parse

res['items'].map do |item|
  search_result = SearchResult.new
  search_result.url = item['html_url']
  search_result.name = item['full_name']
  search_result.description = item['description']
  search_result.stars = item['stargazers_count']
  search_result.issues = item['open_issues']
  search_result.created_at = DateTime.rfc3339(item['created_at'])
  search_result.updated_at = DateTime.rfc3339(item['updated_at'])
  search_result.license = item['license']['name'] if item['license']

  search_result
end

Into this strongly-typed code:

params = { q: q, per_page: MAX_RESULTS }
res = HTTP.get(SEARCH_ENDPOINT, params: params).parse
res = Github::SearchRepositoriesResponse.from_json_data(res)

res.items.map do |item|
  search_result = SearchResult.new
  search_result.url = item.html_url
  search_result.name = item.full_name
  search_result.description = item.description
  search_result.stars = item.stargazers_count
  search_result.issues = item.open_issues
  search_result.created_at = item.created_at
  search_result.updated_at = item.updated_at
  search_result.license = item.license&.name

  search_result
end

What do I mean by “strongly typed”? I mean that if I were to make the same mistake as with the stringly-typed code, i.e. changing this:

search_result.license = item.license&.name

To this:

search_result.license = item.license.name # no more "&." operator

Then I’ll get a static analysis error from steep!

lib/ghsearch.rb:45:45: [error] Type `(::Github::SearchRepositoriesResponseItemLicense | nil)` does not have method `name`
│ Diagnostic ID: Ruby::NoMethod
│
└         search_result.license = item.license.name

Takeaways

Overall, it’s still very early days for Ruby static type analysis, but I believe many halcyon days are before us. My biggest takeaways from working with Ruby Signatures so far are:

  1. steep is a little rough around the edges today. To me, the biggest issue is that it crashes on many rbs syntax errors, instead of giving friendly errors. It also has some smaller issues, like incomplete coverage of the standard library and some issues with things like private methods.

    I have every confidence these issues will be sorted out in the months and years to come. Unless you’ve tried writing one yourself, it’s hard to appreciate just how incredibly hard it is to get to the level of production quality steep has already achieved. As enterprises like Stripe and Square invest in RBS tooling, these sharp corners will be sanded down.

  2. JSON Type Definition in combination with steep seems to work pretty well. I’ve tried using rbs prototype and typeprof, but neither tool seems to work as well as jtd-codegen's mundane code generation approach.

    I think jtd-codegen's Ruby and Ruby Signatures generators can be a great item in the RBS toolbox. It definitely doesn’t hurt to know about it!

If you want to play around with RBS and jtd-codegen yourself, here are some resources:

  • I’ve put the working, final code in this blog in a GitHub repo here:

    https://github.com/ucarion/rbs-jtd-codegen-demo

    Give it a whirl!

  • The JSON Type Definition website has detailed documentation on jtd-codegen available here:

    https://jsontypedef.com/docs/ruby-codegen/

    Some exciting JSON Type Definition features for Ruby not covered in this blog post include validating JSON data at runtime, automatically adding documentation comments to the generated Ruby code, and customizing the generated code to use a custom data type.