Keep your IDs in lowercase

by Ulysse Carion 21 Aug 2020

When you’re designing a system that needs to assign IDs to things, keep your ID format entirely lowercase. Doing so is a small way to make your life easier in the long term.

You should do this because you will inevitably end up ETLing your IDs into a system whose default behavior is to lowercase things. Datadog and AWS Athena do this, for instance.

If you make your IDs case-sensitive and use both upper and lower case, then systems that have a lowercasing step will lose information, and two IDs that differ only by case will collide. In most cases, a key collision is either a problem or a disaster.
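To make the collision risk concrete, here's a small Python sketch. The IDs and the `lowercase_collisions` helper are made up for illustration; the lowercasing step stands in for whatever a downstream system does by default.

```python
# Hypothetical mixed-case IDs; the values are invented for illustration.
ids = ["usr_Ab3x", "usr_aB3x", "usr_9kq2"]


def lowercase_collisions(ids):
    """Group IDs by their lowercased form; return the groups with more than one member."""
    groups = {}
    for id_ in ids:
        groups.setdefault(id_.lower(), []).append(id_)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Two distinct IDs become indistinguishable once lowercased.
collisions = lowercase_collisions(ids)
print(collisions)
```

Running a check like this over your existing IDs is one way to find out whether a lowercasing system has already merged some of them.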

If you make your IDs use only upper case, then you give yourself extra work whenever you take an ID from a system like Datadog or Athena and look it up inside your own systems: you have to convert it back to upper case first.

It’s fine if your IDs use only lower case but you also accept upper-case ASCII on input and normalize it down to lower case. The important thing is that you always output IDs in lower case, so that your output format is byte-for-byte equal to what lowercasing systems will output.
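Here's a rough sketch of that "lenient on input, strict on output" idea in Python. The function name `normalize_id` and the allowed character set (ASCII letters, digits, hyphen, underscore) are assumptions for illustration, not a prescribed format.

```python
import re

# Assumed input alphabet: ASCII letters, digits, hyphen, underscore.
# Validation runs before lowercasing so that non-ASCII characters whose
# lowercase form happens to be ASCII are rejected rather than silently accepted.
_INPUT_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def normalize_id(raw: str) -> str:
    """Accept upper- or lower-case ASCII on input; always return lower case."""
    if not _INPUT_PATTERN.fullmatch(raw):
        raise ValueError(f"invalid ID: {raw!r}")
    return raw.lower()


# An ID pasted from an upper-casing source and one copied from a
# lowercasing system normalize to the same bytes.
assert normalize_id("USR_9KQ2") == normalize_id("usr_9kq2") == "usr_9kq2"
```

Calling `normalize_id` at every boundary where an ID enters your system means everything you store and emit is already in the form a lowercasing system would produce.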

Since you’re probably limiting your IDs to the alphanumerics plus maybe hyphen and underscore, this just means you limit yourself to: [a-z], [0-9], -, and _. If you’re using UUIDs, or something like xyz_$UUID, where xyz indicates what type of data the ID is for (this is the pattern Stripe’s API popularized), you’re already good here: spec-following UUID implementations use lowercase on output, but accept uppercase on input:

Declaration of syntactic structure:

A UUID is an identifier that is unique across both space and time, with respect to the space of all UUIDs. […]

The hexadecimal values “a” through “f” are output as lower case characters and are case insensitive on input.

(From Section 3 of RFC 4122.)
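As a quick illustration, Python's standard-library `uuid` module behaves this way: it parses upper-case hex and formats in lower case. The `xyz_` prefix below is just the placeholder from above, not a real ID type.

```python
import uuid

# Upper-case hex is accepted on input...
parsed = uuid.UUID("A987FBC9-4BED-3078-CF07-9141BA07C9F3")

# ...and the string form comes out in lower case.
text = str(parsed)
assert text == "a987fbc9-4bed-3078-cf07-9141ba07c9f3"

# A prefixed ID in the xyz_$UUID style is therefore already all lower case,
# as long as the prefix itself is.
prefixed = f"xyz_{uuid.uuid4()}"
assert prefixed == prefixed.lower()
```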

Again: it’s a small thing. Not the end of the world if you ignore it. But getting the little things right often makes getting the big things right so much easier.