Share tokens instead of IDs

2026-05-26

Header image: a drawing of a table surface. On the left are four keys neatly organised vertically. On the right is a pile of about 15 bronze tokens, each with a unique design. Caption: Converting IDs to tokens is easy.

When tracking a collection of entities in a relational database, it's common to use sequential identifiers (IDs) as the primary key, starting from 1 and counting up. These IDs are fine for uniquely identifying an entity in a local database, or upholding a foreign key relationship with other entities in the same database, but start causing problems once they cross a service boundary:

They're ambiguous. IDs just look like numbers. A database error log mentions 1779590843... is that an order or an invoice? It kinda looks like a timestamp.
They expose your internals. A customer who receives Invoice 0038 can immediately estimate how many invoices you've sent. Anyone with this information can reason about your data structure, and it may be possible to use enumeration attacks to access records belonging to other customers.
They're bound to a single source. ID sequences are tightly bound to a single entity table in a database. Merge entities, or split a monolith into new services, and the same identifier may exist in multiple sources with no way to tell them apart.

Two service domains, each owning its own entity tables.

Let's take a simple e-commerce example, with two services. We have an Orders Service, which serves as the domain for Order and Payment entities. We have the Deliveries Service, which serves as the domain for Delivery entities.

Integer IDs are fine for primary and foreign keys in a single service...

...but they become less valuable bridging between services, as seen here.

🧑Customer 9032
im still waiting for 3044 from 2185
Sure. I'll need the delivery and order IDs.
i just told you!

Outside of these domains, these IDs completely lose their meaning.

Given how poorly raw IDs travel across service boundaries, a better approach is to share tokens: identifiers that are safe to share externally and do not leak information about how entities are stored. This article shares a strategy for using sequential entity IDs to generate compact, non-enumerable, type-prefixed tokens.

Order ID	→	Order Token
1		`O_Q4BDJQ`
2		`O_ZCXQ8F`
3		`O_2PN7M8`
1000		`O_WHE1FY`
1073741823		`O_VNT7CW`

Instead, we prefer to share tokens...

🧑Customer C_7XEV43
im still waiting for D_1hj4ln from O_32onOi
Sure, let me look up your delivery and order.

...which makes everything much clearer.

These tokens are:

Ciphered. Sequential IDs result in completely different tokens.
Encoded. The compact format uses non-ambiguous characters, and is forgiving in its decoding process.
Prefixed. A known prefix tracks the entity type. Even with no other context, D_NM0X1X is clearly a Delivery token, and O_Q4BDJQ is clearly an Order token.

These tokens are NOT:

Secure. The ciphering and encoding process obfuscates a numeric ID into a compact token, but the process can be reversed and doesn't meaningfully hide the underlying ID.
A replacement for IDs. We don't want to rely on deriving IDs from tokens, in case we change the process in the future to support multiple data sources.

Keen to try it out? Use the generator below, or check out the library code at GitHub: TassSinclair/tokens.

Interactive generator (inputs: Prefix, ID, Seed, Length; example output: C_71VHDH). Try generating a token with your own inputs.

Worked example: Token lookups

Consider the example above, where a support agent is investigating a customer's delivery issue:

🧑Customer C_7XEV43
im still waiting for D_1hj4ln from O_32onOi
Sure, let me look up your delivery and order.

We start with a Delivery Token and Order Token.

Service calls quietly canonicalise the tokens and return the correct entities.

🧑Customer C_7XEV43
I see Order O_320N01 is still open, Delivery D_1HJ41N is due to arrive on Wednesday.
ok thanks!

The support agent can respond with minimal overhead.

In this example, entity IDs are not shared outside of the core domain, and service APIs rely on tokens. Tokens are used in customer-facing artefacts, such as invoices.

Why tokens, versus UUIDs?

Random (version 4) UUIDs also avoid enumeration and information leakage, but at 36 characters they're hard to communicate verbally and carry no type information. They're still valuable for uniquely identifying entity records, but similar to IDs, should not be handled outside of the service that owns the entities.

How does this work?

Ciphering, encoding, and prefixing each operate as independent steps chained in sequence. Let's take customer entities as an example.

Customer ID	Cipher (seed 41434539)	Encoding	Prefixed Customer token	Canonicalisation (example)
`1`	`236832177`	`71VHDH`	`C_71VHDH`	`C_7iuhdh`
`2`	`55488065`	`1MXBJ1`	`C_1MXBJ1`	`C_imxbjl`
`3`	`1065536837`	`ZR5KA5`	`C_ZR5KA5`	`C_zr5ka5`
`10`	`185554103`	`5GYN5Q`	`C_5GYN5Q`	`C_5gyn5q`
`1000`	`346451463`	`AACVG7`	`C_AACVG7`	`C_aacvg7`
`53765282`	`2`	`000002`	`C_000002`
`379690282`	`1`	`000001`	`C_000001`	`C_00000i`
`1073741823`	`421086739`	`CHJHGK`	`C_CHJHGK`	`C_chjhgk`

Converting Customer IDs to Customer tokens, and canonicalising Customer Tokens from user input.

Ciphering

In a process where a sequence of customers with IDs [ 100, 101, 102 ] would normally map to customer tokens [ C_000034, C_000035, C_000036 ], it's trivial to start guessing nearby tokens. We can mask this relationship by using a substitution cipher to obscure the ID before converting it into a token.

We want a deterministic substitution cipher that uses the same input and output domain. This kind of cipher takes a number in a range, and maps it to somewhere else in the same range. To visualise an example, the Caesar cipher shifts the input "left" by a certain distance, wrapping around to the end of the domain:

Interactive demo: Caesar cipher (inputs: Domain, Left shift). Try different domain sizes and left-shift values.
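As a rough illustration of the left shift described above, a Caesar cipher over an integer domain can be sketched in a few lines of Kotlin. The function name `caesarShift` and its parameters are hypothetical and are not taken from the library.

```kotlin
// Hypothetical helper: shifts a value "left" within [0, domain), wrapping around.
fun caesarShift(value: Int, leftShift: Int, domain: Int): Int {
    require(domain > 0) { "domain must be positive" }
    require(value in 0 until domain) { "value must be inside the domain" }
    // mod() returns a non-negative result for a positive divisor, so values
    // shifted below zero wrap around to the end of the domain.
    return (value - leftShift).mod(domain)
}
```

With a domain of 10 and a left shift of 3, the inputs 0 to 9 map to 7, 8, 9, 0, 1, 2, 3, 4, 5, 6. Neighbouring inputs stay neighbours, which is why a simple shift is not enough to hide a sequence.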

Feistel cipher

In practice, we want a cipher that scrambles the mapping, so we use a Feistel cipher to encrypt the integer ID before the Base32 encoding step. A Feistel network provides diffusion: after a few rounds, a one-bit change in the input can change bits anywhere in the output, so neighbouring IDs produce unrelated-looking values.

Interactive demo: Feistel cipher (inputs: Domain, Seed). Try different domain sizes and seed values.

For our customer tokens, we've set the "domain" space as 32⁶, which matches our imposed token constraint of six Base32 characters. We can set a different seed for each entity type, so they follow different token sequences. The "seed" is not a secret; it is hard-coded for each token type to ensure consistent results.

Customer ID	Cipher (seed 41434539)	Cipher (seed 41434540)
`1`	`236832177`	`82650702`
`2`	`55488065`	`773480176`
`3`	`1065536837`	`592394359`
`10`	`185554103`	`1064651039`
`1000`	`346451463`	`6045569`
`53765282`	`2`	`874379223`
`379690282`	`1`	`285382121`
`1073741823`	`421086739`	`1063701256`

Ciphering masks the relationship between IDs and tokens

See GitHub: TassSinclair/tokens/FeistelCipher.kt for full implementation details.
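Separately from the library's implementation, the general shape of a Feistel network over the 32⁶ (2³⁰) domain can be sketched as below. This is a minimal sketch under assumptions: the class name `FeistelSketch`, the round count, and the round function are all made up for illustration, so its outputs will not match the values in the tables above. What it does show is the property we rely on: the mapping stays inside the domain and is fully reversible.

```kotlin
// Illustrative balanced Feistel network over 2 * halfBits bits (30 bits by default).
// The round function and constants are assumptions, not the library's.
class FeistelSketch(
    private val seed: Int,
    private val halfBits: Int = 15,
    private val rounds: Int = 4,
) {
    private val mask = (1 shl halfBits) - 1
    private val maxValue = (1 shl (2 * halfBits)) - 1

    // Any deterministic function of (half, round) works here; it does not
    // need to be invertible for the network as a whole to be invertible.
    private fun round(half: Int, roundIndex: Int): Int {
        val key = seed + roundIndex * 0x9E3779B
        var x = (half xor key) * 0x45D9F3B
        x = x xor (x ushr 16)
        return x and mask
    }

    fun encrypt(value: Int): Int {
        require(value in 0..maxValue) { "value outside cipher domain" }
        var left = (value ushr halfBits) and mask
        var right = value and mask
        for (i in 0 until rounds) {
            val next = left xor round(right, i)
            left = right
            right = next
        }
        return (left shl halfBits) or right
    }

    fun decrypt(value: Int): Int {
        require(value in 0..maxValue) { "value outside cipher domain" }
        var left = (value ushr halfBits) and mask
        var right = value and mask
        // Undo the rounds in reverse order.
        for (i in rounds - 1 downTo 0) {
            val previous = right xor round(left, i)
            right = left
            left = previous
        }
        return (left shl halfBits) or right
    }
}
```

Because every round is undone exactly by `decrypt`, two different IDs can never produce the same ciphered value. Supporting longer tokens, such as eight Base32 characters, would need wider halves and `Long` arithmetic.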

Encoding

Base32 encoding lets us represent identifiers with a larger set of human-parsable characters. We use Douglas Crockford's Base32 implementation, which maps Base10 integers (0123456789) to a Base32 range (0123456789ABCDEFGHJKMNPQRSTVWXYZ). Note that we've constrained the domain to six Base32 characters, so the maximum ID value supported is 32⁶ - 1 = 1,073,741,823.

When converting an ID to a token, this step also left-pads the resulting token, so all tokens become the same length.
When parsing a token from input, this step canonicalises confusing and lowercased characters into their formal representation (for example, i and l are canonicalised to 1).

ID	Encoding without cipher	Canonicalisation (example)
`1`	`000001`	`00000i`
`2`	`000002`
`3`	`000003`
`10`	`00000A`	`00000a`
`1000`	`0000Z8`	`0000z8`
`1073741823`	`ZZZZZZ`	`ZzzZzz`

Encoding IDs as fixed-width Base32 tokens
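The encoding and canonicalisation steps can be sketched as follows. The function names are hypothetical, and the sketch only handles the substitutions named in Crockford's specification (I and L to 1, O to 0, and lowercase to uppercase); the library may accept more.

```kotlin
// Crockford's Base32 alphabet: digits, then letters without I, L, O, and U.
val BASE32_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// Encodes a non-negative value as a fixed-width, left-padded Base32 string.
fun encodeBase32(value: Long, length: Int): String {
    require(value >= 0) { "value must not be negative" }
    val digits = StringBuilder()
    var remaining = value
    while (remaining > 0) {
        digits.append(BASE32_ALPHABET[(remaining % 32).toInt()])
        remaining /= 32
    }
    require(digits.length <= length) { "value does not fit in $length characters" }
    return digits.reverse().toString().padStart(length, '0')
}

// Maps easily confused or lowercased characters to their formal representation.
fun canonicalise(input: String): String =
    input.uppercase().map { c ->
        when (c) {
            'I', 'L' -> '1'
            'O' -> '0'
            else -> c
        }
    }.joinToString("")

// Decodes user input back to a number, canonicalising it first.
fun decodeBase32(input: String): Long =
    canonicalise(input).fold(0L) { acc, c ->
        val digit = BASE32_ALPHABET.indexOf(c)
        require(digit >= 0) { "invalid character '$c'" }
        acc * 32 + digit
    }
```

For example, `encodeBase32(1000, 6)` gives `0000Z8`, `encodeBase32(236832177, 6)` gives `71VHDH`, and `decodeBase32("00000i")` gives 1.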

Interactive demo: token canonicaliser (input: Token, for example C_71VHDH). Try canonicalising a prefixed token.

Encoding makes tokens safer to communicate over lossy media, such as over the phone, or scribbled on pieces of paper. One last step reduces confusion further.

Prefixing

Finally, each entity is represented by a short prefix, such as "C" for customer, "O" for order, or "INV" for invoice. In code, we create a token type for each of these, which also hard-codes the cipher seed.

This gives us instant type identification, reduces the risk of accidentally using the wrong token in the wrong place, and ensures tokens from different types never collide even if they share an underlying ID.

Entity-scoped tokens, with prefix, token length and cipher seed constants.
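One way to keep these constants together is a small token type definition, sketched below. The names are hypothetical. The prefixes and lengths follow the examples in this article, and the customer seed is the one used in the tables, but the order and invoice seeds are placeholders invented for the sketch.

```kotlin
// Hypothetical token types, each with its prefix, body length, and cipher seed.
enum class TokenType(val prefix: String, val length: Int, val seed: Int) {
    CUSTOMER("C", 6, 41434539),
    ORDER("O", 6, 1111),      // placeholder seed, not the real one
    INVOICE("INV", 8, 2222);  // placeholder seed, not the real one

    // Attaches the prefix to an already ciphered and encoded body.
    fun format(encodedBody: String): String {
        require(encodedBody.length == length) { "expected $length characters" }
        return "${prefix}_$encodedBody"
    }

    companion object {
        // Identifies the token type from the text before the first underscore.
        fun of(token: String): TokenType {
            val prefix = token.substringBefore('_', missingDelimiterValue = "").uppercase()
            return values().firstOrNull { it.prefix == prefix }
                ?: throw IllegalArgumentException("unknown token prefix in '$token'")
        }
    }
}
```

Here `TokenType.CUSTOMER.format("71VHDH")` produces `C_71VHDH`, and `TokenType.of("INV_ZJP9WNJA")` resolves to `INVOICE` without any other context.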

In previous examples, we've set the cipher domain and token size to six Base32 characters, allowing us to represent IDs up to 32⁶ - 1 = 1,073,741,823. But this is an arbitrary limit. Tokens with seven or eight Base32 characters would push the ID ceiling higher.

With entity-specific prefixes, lengths, and cipher seeds, see how the same ID sequence is represented against our example token types:

Example ID	Prefixed Customer token	Prefixed Order token	Prefixed Invoice token
`1`	`C_71VHDH`	`O_Q4BDJQ`	`INV_ZJP9WNJA`
`2`	`C_1MXBJ1`	`O_ZCXQ8F`	`INV_M9GZ29A2`
`3`	`C_ZR5KA5`	`O_2PN7M8`	`INV_D9MEBYC4`
`10`	`C_5GYN5Q`	`O_W88WFZ`	`INV_FE3T0SS5`
`1000`	`C_AACVG7`	`O_WHE1FY`	`INV_4PGCFN8B`
`1073741823`	`C_CHJHGK`	`O_VNT7CW`	`INV_8R95M5RK`
`1099511627775`	`(out of bounds)`	`(out of bounds)`	`INV_9G26G99K`

Token prefixes, seeds, and lengths result in very different mappings.

Bonus: Type safety

As an added benefit, these strongly-typed entity tokens improve type safety in our code. Consider the method signatures below:

fun getDeliveryForOrder(deliveryToken: String, orderToken: String) // these could be anything!

fun getDeliveryForOrder(deliveryToken: Token, orderToken: Token) // better, but still possible to switch them.

fun getDeliveryForOrder(deliveryToken: DeliveryToken, orderToken: OrderToken) // best, mistakes are caught immediately.
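The third signature relies on there being distinct types for each token. A minimal sketch of such types is shown below; the class bodies and the `describeLookup` function are hypothetical and only illustrate the idea.

```kotlin
// Hypothetical wrapper types; each rejects tokens carrying the wrong prefix.
data class DeliveryToken(val value: String) {
    init {
        require(value.startsWith("D_")) { "not a delivery token: $value" }
    }
}

data class OrderToken(val value: String) {
    init {
        require(value.startsWith("O_")) { "not an order token: $value" }
    }
}

// Swapping the two arguments no longer compiles, because the types differ.
fun describeLookup(deliveryToken: DeliveryToken, orderToken: OrderToken): String =
    "delivery ${deliveryToken.value} for order ${orderToken.value}"
```

The compiler catches swapped arguments, and the `init` checks catch a string with the wrong prefix being wrapped in the wrong type at runtime.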

End-to-end considerations

Each of the steps above becomes part of the process in converting an ID to a token.

Now that we have a solid understanding of how each step works, let's review some end-to-end considerations.

Security

Tokens solve an operational problem, not a security problem. The ciphering step prevents casual inference: a customer seeing C_71VHDH would not easily guess how many customers signed up before them, or that C_1MXBJ1 belongs to the next customer.

This cipher is fully reversible. Anyone with the Feistel seed can recover the original integer ID from any token. The seed is hard-coded as a configuration constant, visible in source code and deployment artefacts, and shouldn't be treated as a cryptographic secret.

Abstraction

Tokens don't replace IDs as the internal identifier. Each token is persisted alongside its entity rather than derived on demand, so if the token conversion process changes (for example, if a service migrates from sequential integer IDs to UUIDs), tokens that have already been issued remain stable.

Consider a situation where a second customer database is brought into scope, using UUIDs as primary keys rather than sequential integers. Both databases could represent customer entities using CustomerTokens, with a different output format to avoid clashes with the original format. Code that accepts a CustomerToken doesn't need to know which source a given token came from, as long as it matches one of the expected formats.

Customer entities with integer IDs use the process discussed above, and end up with six-character Base32-encoded tokens.
Customer entities with UUIDs use a different process (for example, with uuid-base58), and end up with 22-character Base58-encoded tokens.

Source	Key	Customer token
DB 1 (integer ID)	`10`	`C_5GYN5Q`
DB 2 (UUID)	`550e8400-e29b-41d4-a716-446655440000`	`C_3NRpJeT1HaqFxQ5dJRaZBG`

Token types can span multiple sources; callers only ever see a CustomerToken.
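To illustrate how one token type might accept both formats, here is a rough sketch that classifies a customer token by its shape. The names are hypothetical, the input is assumed to be already canonicalised, and the Base58 pattern assumes the common Bitcoin-style alphabet (no 0, O, I, or l).

```kotlin
// Hypothetical sources for a customer token.
enum class CustomerSource { INTEGER_ID, UUID }

// Six Crockford Base32 characters after the prefix.
val BASE32_CUSTOMER = Regex("C_[0-9A-HJKMNP-TV-Z]{6}")

// Twenty-two Base58 characters after the prefix (assumed alphabet).
val BASE58_CUSTOMER = Regex("C_[1-9A-HJ-NP-Za-km-z]{22}")

// Returns the source a token format belongs to, or null if it matches neither.
fun customerSourceOf(token: String): CustomerSource? = when {
    BASE32_CUSTOMER.matches(token) -> CustomerSource.INTEGER_ID
    BASE58_CUSTOMER.matches(token) -> CustomerSource.UUID
    else -> null
}
```

Note that Base58 is case-sensitive, so uppercasing during canonicalisation only makes sense for the six-character Base32 format.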

Persistence also allows individual tokens to be assigned directly, without going through the cipher at all. In test environments you often want predictable, readable tokens for known fixtures, rather than whatever the cipher happens to produce for a given ID:

INSERT INTO customers (token, name, ...) VALUES ('C_TEST_SENDER', 'Sandy Sender', ...);

Error detection

Crockford's Base32 specification defines an optional check symbol that can be appended to any encoded value. It is computed as value mod 37 and mapped to one of 37 symbols (the standard 32 characters plus five extras: *~$=U). This catches the most common transcription mistakes, such as a single wrong character, or two adjacent characters swapped. We can try this out using the generator from before:

Interactive generator (inputs: Prefix, ID, Seed, Length; example output: C_71VHDHR). The check symbol is the final character in the token.

Appending a check symbol lengthens the token by one character, but means a service entrypoint can reject malformed tokens during validation, instead of performing a database lookup that will never match:

Interactive validator (inputs: Token, Check symbol; example token: C_71VHDH). The check symbol helps us validate tokens easily.
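The check symbol calculation is small enough to sketch directly. The function names below are hypothetical, and the sketch works on the token body without its prefix.

```kotlin
// The 32 Base32 characters followed by the five extra check symbols.
val CHECK_SYMBOLS = "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~\$=U"

// Computes the check symbol for a value: value mod 37, mapped to a symbol.
fun checkSymbol(value: Long): Char {
    require(value >= 0) { "value must not be negative" }
    return CHECK_SYMBOLS[(value % 37).toInt()]
}

// Validates a canonicalised body whose final character is the check symbol.
fun hasValidCheckSymbol(bodyWithCheck: String): Boolean {
    if (bodyWithCheck.length < 2) return false
    var value = 0L
    for (c in bodyWithCheck.dropLast(1)) {
        val digit = CHECK_SYMBOLS.indexOf(c)
        // Only the first 32 symbols are valid inside the body.
        if (digit !in 0..31) return false
        value = value * 32 + digit
    }
    return checkSymbol(value) == bodyWithCheck.last()
}
```

For the example token, the body 71VHDH decodes to 236832177, and 236832177 mod 37 is 24, which maps to R. A single mistyped character, such as 71VHDJR, fails validation before any database lookup is attempted.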

Check symbols are an optional extension. Use them if you need to communicate tokens over the phone, or on handwritten notes. Skip them when tokens are only ever generated and consumed programmatically.

In conclusion

This approach works most effectively in systems where entity identifiers cross service boundaries, or appear in customer-facing contexts.

Ciphering makes tokens visually distinct, so people are less likely to confuse sequential entities.
Encoding makes tokens more visually compact, and easier to communicate.
Prefixes reduce token ambiguity when working with multiple entity types.

Keep IDs doing what they do best: uniquely identifying entity records in a database, and enforcing referential integrity between entities. Outside of the service boundary, prefer tokens. For further reading, check out my reference implementation at GitHub: TassSinclair/tokens. This is the approach we use in OmniTabz systems, inspired by the Base32 tokens we used at Cash App.

If you have feedback or questions about this article, let's catch up via Mastodon, LinkedIn, or email.