
Share tokens instead of IDs
When tracking a collection of entities in a relational database, it's common to use sequential identifiers (IDs) as the primary key, starting from 1 and counting up. These IDs are fine for uniquely identifying an entity in a local database, or upholding a foreign key relationship with other entities in the same database, but start causing problems once they cross a service boundary:
- They're ambiguous. IDs just look like numbers. A database error log mentions 1779590843... is that a user, or a tenant? It kinda looks like a timestamp.
- They expose your internals. A customer who receives Invoice 0038 immediately knows how many invoices you've sent. Anyone with this information can reason about your data structure, and a savvy attacker can use enumeration attacks to access records belonging to other customers.
- They're bound to a single source. ID sequences are tightly bound to a single entity table in a database. Merge entities, or split a monolith into new services, and the same identifier may exist in multiple sources with no way to tell them apart.
- 🧑Customer 9032 / Support ticket 80299
- im still waiting for 3044 from 2185
- Sure. I'll need the delivery and order IDs.
- i just told you!
Given the limitations and risks of leaking IDs, a better approach is to share tokens: identifiers that are safe to share externally and don't leak information about how entities are stored. This article shares a strategy for using sequential entity IDs to generate compact, non-enumerable, type-prefixed tokens.
| User ID | User token |
|---|---|
| 1 | U_71VHDH |
| 2 | U_1MXBJ1 |
| 3 | U_ZR5KA5 |
| 1000 | U_AACVG7 |
| 1073741823 | U_CHJHGK |
- 🧑Customer C_7XEV43 / Support ticket ST_20SDKE
- im still waiting for D_1hj4ln from O_32onOi
- Sure, let me look up your delivery and order.
These tokens are:
- Ciphered. Sequential IDs result in completely different tokens. All tokens are the same length. The ciphering algorithm we use supports decoding a token back to an ID.
- Encoded. The compact format uses unambiguous characters and is forgiving in its decoding process, so U_abcol2 is canonicalised as U_ABC012.
- Prefixed. A known prefix tracks the entity type. Even with no other context, U_NM0X1X is clearly a User token, and T_DCPRTD is a Tenant token.
Keen to try it out? The library code is available at GitHub: TassSinclair/tokens.
Worked example: Token lookups
Consider the example above, where a support agent is investigating a customer's delivery issue:
- 🧑Customer C_7XEV43 / Support ticket ST_20SDKE
- im still waiting for D_1hj4ln from O_32onOi
- Sure, let me look up your delivery and order.
- 🧑Customer C_7XEV43 / Support ticket ST_20SDKE
- I see Order O_320N01 is still open, Delivery D_1HJ41N is due to arrive on Wednesday.
- ok thanks!
In this example, integer IDs are never shared outside the core domain. Service layers convert between tokens and IDs in both directions, and users of the broader system work exclusively with tokens: they're what APIs return, and what gets printed on invoices.
How does this work?
Ciphering, encoding, and prefixing each operate as independent, reversible steps chained in sequence.
| User ID | Cipher (seed 0x2783dab) | Encoding | Prefixed User token | Canonicalisation (example) |
|---|---|---|---|---|
| 1 | 236832177 | 71VHDH | U_71VHDH | U_7iuhdh |
| 2 | 55488065 | 1MXBJ1 | U_1MXBJ1 | U_imxbjl |
| 3 | 1065536837 | ZR5KA5 | U_ZR5KA5 | U_zr5ka5 |
| 10 | 185554103 | 5GYN5Q | U_5GYN5Q | U_5gyn5q |
| 1000 | 346451463 | AACVG7 | U_AACVG7 | U_aacvg7 |
| 53765282 | 2 | 000002 | U_000002 | |
| 379690282 | 1 | 000001 | U_000001 | U_00000i |
| 1073741823 | 421086739 | CHJHGK | U_CHJHGK | U_chjhgk |
Ciphering
In a process where a sequence of users with IDs [100, 101, 102, ...] would normally map to user tokens [U_000034, U_000035, U_000036, ...], it's trivial to start guessing nearby tokens. We can mask this relationship by using a bi-directional cipher to obscure the ID before converting it into a token.
This step uses a Feistel cipher to encrypt and decrypt the integer ID before the Base32 encoding step. With enough rounds and a well-mixing round function, a Feistel network provides diffusion: a one-bit change in the input changes many bits of the output, so adjacent IDs produce unrelated-looking values.
For our user tokens, we've set the "domain" space as 32⁶, which matches our imposed token constraint (six Base32 characters). Given a preconfigured seed, we can encrypt and decrypt any ID within the domain. We can set a different seed for each entity type, so each type follows a different token sequence.
| User ID | Cipher (seed 0x2783dab) | Cipher (seed 0x2783dac) |
|---|---|---|
| 1 | 236832177 | 82650702 |
| 2 | 55488065 | 773480176 |
| 3 | 1065536837 | 592394359 |
| 10 | 185554103 | 1064651039 |
| 1000 | 346451463 | 6045569 |
| 53765282 | 2 | 874379223 |
| 379690282 | 1 | 285382121 |
| 1073741823 | 421086739 | 1063701256 |
See GitHub: TassSinclair/tokens/FeistelCipher.kt for full implementation details, but in summary the encryption process works by:
- Splitting the number into two halves.
- Running several rounds, where in each round, one half is XORed with a keyed hash of the other half, then the halves swap.
Decryption runs the same rounds in reverse order, undoing each XOR and swap in turn. The result is a fully reversible permutation: every input maps to exactly one output, with no collisions. Now we can proceed to encoding without risk of creating enumerable tokens.
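The summary above can be sketched in Kotlin as a balanced Feistel network over a 30-bit domain (32⁶ = 2³⁰, so each half is 15 bits). This is a minimal illustration, not the code in FeistelCipher.kt: the class name, the round count (four) and the keyed round function (a multiply-and-shift mix of the half, the seed and the round index) are assumptions, so its outputs won't match the tables in this article. Invertibility comes from the Feistel structure itself, so the round function does not need to be invertible.

```kotlin
/**
 * Minimal balanced Feistel cipher over the domain [0, 2^bits).
 * Hypothetical sketch: round count and round function are illustrative
 * assumptions, not taken from the TassSinclair/tokens library.
 */
class FeistelSketch(
    private val seed: Long,
    private val bits: Int = 30,   // 32^6 = 2^30 for six Base32 characters
    private val rounds: Int = 4,
) {
    init {
        require(bits in 2..62 && bits % 2 == 0) { "bits must be even and in 2..62" }
        require(rounds >= 1) { "need at least one round" }
    }

    private val halfBits = bits / 2
    private val halfMask = (1L shl halfBits) - 1
    val maxValue = (1L shl bits) - 1

    // Keyed round function: mixes one half with the seed and round index,
    // then truncates to the width of a half. It need not be invertible.
    private fun roundFn(half: Long, round: Int): Long {
        var x = (half + 1) * 0x5DEECE66DL xor seed xor (round.toLong() shl 40)
        x = x xor (x ushr 31)
        x *= 0x27BB2EE687B0B0FDL
        x = x xor (x ushr 29)
        return x and halfMask
    }

    fun encrypt(value: Long): Long {
        require(value in 0L..maxValue) { "value out of domain" }
        var left = value ushr halfBits
        var right = value and halfMask
        for (r in 0 until rounds) {
            // One round: (L, R) -> (R, L xor F(R))
            val newRight = left xor roundFn(right, r)
            left = right
            right = newRight
        }
        return (left shl halfBits) or right
    }

    fun decrypt(value: Long): Long {
        require(value in 0L..maxValue) { "value out of domain" }
        var left = value ushr halfBits
        var right = value and halfMask
        for (r in rounds - 1 downTo 0) {
            // Undo one round: (L', R') -> (R' xor F(L'), L')
            val prevLeft = right xor roundFn(left, r)
            right = left
            left = prevLeft
        }
        return (left shl halfBits) or right
    }
}
```

Constructing one FeistelSketch per entity type with a different seed gives each type its own sequence, in the same way as the two seed columns in the table above.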
Encoding
Base32 encoding lets us represent identifiers compactly, using 32 human-readable symbols instead of the ten decimal digits, so each character carries five bits. We use Douglas Crockford's Base32 alphabet (0123456789ABCDEFGHJKMNPQRSTVWXYZ), which omits I, L, O and U. Note that we've constrained the domain to six Base32 characters, so the maximum supported ID is 32⁶ - 1 = 1,073,741,823.
- When converting an ID to a token, this step also left-pads the result with zeros, so all tokens are the same length.
- When converting a token to an ID, this step also canonicalises confusable and lowercase characters into their formal representation (e.g. i and l are canonicalised to 1).
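Both behaviours can be sketched with a small Kotlin encoder and decoder. This is an assumption of how the step might look, not the library's code: the function names are hypothetical, the width defaults to six characters, and only the mappings seen in this article (lowercase to uppercase, I and L to 1, O to 0) are canonicalised. Crockford's specification also allows hyphens as ignored separators, which this sketch does not handle.

```kotlin
// Crockford's Base32 alphabet: 32 symbols, omitting I, L, O and U.
val CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// Encode a non-negative value as exactly `length` characters.
// Leading positions come out as '0', so every token body has the same width.
fun encodeBase32(value: Long, length: Int = 6): String {
    require(length in 1..12) { "length must be in 1..12 to fit in a Long" }
    require(value >= 0 && value < (1L shl (5 * length))) { "value out of range for $length characters" }
    val chars = CharArray(length)
    var remaining = value
    for (i in length - 1 downTo 0) {
        chars[i] = CROCKFORD_ALPHABET[(remaining and 31L).toInt()] // low 5 bits
        remaining = remaining ushr 5
    }
    return String(chars)
}

// Canonicalise lowercase and easily confused characters.
fun canonicalise(c: Char): Char = when (val upper = c.uppercaseChar()) {
    'O' -> '0'
    'I', 'L' -> '1'
    else -> upper
}

// Decode a token body back to a number, canonicalising each character first.
fun decodeBase32(text: String): Long {
    require(text.length in 1..12) { "length must be in 1..12 to fit in a Long" }
    var value = 0L
    for (c in text) {
        val digit = CROCKFORD_ALPHABET.indexOf(canonicalise(c))
        require(digit >= 0) { "invalid Base32 character '$c'" }
        value = (value shl 5) or digit.toLong()
    }
    return value
}
```

For example, encodeBase32(1000) returns "0000Z8", and decodeBase32("0000z8") returns 1000, matching the table below.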
| ID | Encoding without cipher | Canonicalisation (example) |
|---|---|---|
| 1 | 000001 | 00000i |
| 2 | 000002 | |
| 3 | 000003 | |
| 10 | 00000A | 00000a |
| 1000 | 0000Z8 | 0000z8 |
| 1073741823 | ZZZZZZ | ZzzZzz |
Encoding makes tokens safer to communicate over lossy media, such as over the phone, or scribbled on pieces of paper. One last step reduces confusion further.
Prefixing
Finally, each entity type is represented by a short prefix, such as "U" for user, "T" for tenant, or "INV" for invoice. In code, we create a token type for each of these, which also sets the cipher seed.
This gives us instant type identification, reduces the risk of accidentally using the wrong token in the wrong place, and ensures tokens from different types never collide even if they share an underlying ID.
With entity-specific prefixes and cipher seeds, here's how the same ID sequence is represented by each of our example token types:
| Example ID | Prefixed User token | Prefixed Tenant token | Prefixed Invoice token |
|---|---|---|---|
| 1 | U_71VHDH | T_Q4BDJQ | INV_ZJP9WNJA |
| 2 | U_1MXBJ1 | T_ZCXQ8F | INV_M9GZ29A2 |
| 3 | U_ZR5KA5 | T_2PN7M8 | INV_D9MEBYC4 |
| 10 | U_5GYN5Q | T_W88WFZ | INV_FE3T0SS5 |
| 1000 | U_AACVG7 | T_WHE1FY | INV_4PGCFN8B |
| 1073741823 | U_CHJHGK | T_VNT7CW | INV_8R95M5RK |
| 1099511627775 | (out of bounds) | (out of bounds) | INV_9G26G99K |
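A prefix registry like the one described above might look like the following Kotlin sketch. TokenType, addPrefix, splitToken and requireType are hypothetical names; the User seed is taken from the tables above, while the Tenant and Invoice seeds are placeholders. The sketch covers only the prefix layer, so bodies are assumed to be already ciphered and encoded.

```kotlin
// Hypothetical token type: prefix, Base32 body length, and cipher seed.
data class TokenType(val name: String, val prefix: String, val bodyLength: Int, val seed: Long)

val USER = TokenType("User", "U", 6, 0x2783dabL)       // seed from the article's tables
val TENANT = TokenType("Tenant", "T", 6, 0x1001L)      // placeholder seed
val INVOICE = TokenType("Invoice", "INV", 8, 0x2002L)  // placeholder seed
val TOKEN_TYPES = listOf(USER, TENANT, INVOICE)

// Attach the prefix to an already ciphered and encoded body.
fun addPrefix(type: TokenType, body: String): String {
    require(body.length == type.bodyLength) { "${type.name} bodies have ${type.bodyLength} characters" }
    return "${type.prefix}_$body"
}

// Identify a token's type from its prefix alone, returning the type and the body.
fun splitToken(token: String): Pair<TokenType, String> {
    val separator = token.indexOf('_')
    require(separator > 0) { "missing prefix in '$token'" }
    val prefix = token.substring(0, separator)
    val body = token.substring(separator + 1)
    val type = TOKEN_TYPES.firstOrNull { it.prefix == prefix }
        ?: throw IllegalArgumentException("unknown prefix '$prefix'")
    require(body.length == type.bodyLength) { "${type.name} bodies have ${type.bodyLength} characters" }
    return type to body
}

// Reject a token of the wrong type before any decoding happens.
fun requireType(token: String, expected: TokenType): String {
    val (type, body) = splitToken(token)
    require(type == expected) { "expected a ${expected.name} token, got a ${type.name} token" }
    return body
}
```

Because the prefix is checked before decoding, passing a Tenant token where a User token is expected fails fast instead of silently decoding to an unrelated ID.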
Bonus: Type safety
As an added benefit, these strongly-typed entity tokens improve type safety in our code. Consider the method signatures below:
```kotlin
fun getUserForTenant(userToken: String, tenantToken: String) // these could be anything!
fun getUserForTenant(userToken: Token, tenantToken: Token) // better, but still possible to switch them.
fun getUserForTenant(userToken: UserToken, tenantToken: TenantToken) // best, mistakes are caught at compile-time.
```

The full pipeline
Each step is reversible. Encoding chains them left to right; decoding reverses the chain.
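Chaining the three steps gives a round trip like the Kotlin sketch below. It folds a Feistel cipher, Crockford Base32 and a prefix into one hypothetical TokenType class, plus typed wrappers in the spirit of the type-safety signatures above. The round function, round count and Tenant seed are assumptions, so the tokens it produces differ from the article's examples. It also requires an even body length, so the bits split into two equal Feistel halves.

```kotlin
val ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

// Hypothetical token type: prefix, Base32 body length and cipher seed.
class TokenType(val prefix: String, val length: Int, private val seed: Long, private val rounds: Int = 4) {
    init { require(length % 2 == 0 && length in 2..12) { "this sketch needs an even length in 2..12" } }

    private val halfBits = 5 * length / 2
    private val halfMask = (1L shl halfBits) - 1
    val maxId = (1L shl (5 * length)) - 1   // 32^length - 1

    // Keyed round function (illustrative, not the library's).
    private fun f(half: Long, round: Int): Long {
        var x = (half + 1) * 0x5DEECE66DL xor seed xor (round.toLong() shl 40)
        x = x xor (x ushr 31)
        x *= 0x27BB2EE687B0B0FDL
        return (x xor (x ushr 29)) and halfMask
    }

    // Step 1: balanced Feistel cipher over [0, 32^length).
    private fun cipher(v: Long, decrypt: Boolean): Long {
        var left = v ushr halfBits
        var right = v and halfMask
        if (!decrypt) {
            for (r in 0 until rounds) { val next = left xor f(right, r); left = right; right = next }
        } else {
            for (r in rounds - 1 downTo 0) { val prev = right xor f(left, r); right = left; left = prev }
        }
        return (left shl halfBits) or right
    }

    fun encode(id: Long): String {
        require(id in 0L..maxId) { "ID $id is out of bounds for $prefix tokens" }
        var v = cipher(id, decrypt = false)
        // Step 2: Base32, fixed width (leading positions become '0').
        val chars = CharArray(length)
        for (i in length - 1 downTo 0) { chars[i] = ALPHABET[(v and 31L).toInt()]; v = v ushr 5 }
        // Step 3: prefix.
        return "${prefix}_${String(chars)}"
    }

    fun decode(token: String): Long {
        // Step 3 reversed: check and strip the prefix.
        require(token.startsWith("${prefix}_")) { "not a $prefix token: '$token'" }
        val body = token.substring(prefix.length + 1)
        require(body.length == length) { "expected $length characters after the prefix" }
        // Step 2 reversed: canonicalise and decode Base32.
        var v = 0L
        for (c in body) {
            val canonical = when (val u = c.uppercaseChar()) {
                'O' -> '0'
                'I', 'L' -> '1'
                else -> u
            }
            val digit = ALPHABET.indexOf(canonical)
            require(digit >= 0) { "invalid character '$c'" }
            v = (v shl 5) or digit.toLong()
        }
        // Step 1 reversed: decrypt.
        return cipher(v, decrypt = true)
    }
}

// Distinct wrapper types turn a swapped argument into a compile-time error.
data class UserToken(val value: String)
data class TenantToken(val value: String)

val USER = TokenType("U", 6, 0x2783dabL)
val TENANT = TokenType("T", 6, 0x1001L)   // placeholder seed

fun toUserToken(id: Long) = UserToken(USER.encode(id))
fun toUserId(token: UserToken) = USER.decode(token.value)
fun toTenantToken(id: Long) = TenantToken(TENANT.encode(id))
fun toTenantId(token: TenantToken) = TENANT.decode(token.value)
```

With wrappers like these, calling getUserForTenant with its UserToken and TenantToken arguments swapped no longer compiles, because the two types are unrelated.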
Notes and caveats
First, note that tokens obfuscate the underlying IDs, but it's not true encryption. Anyone with access to the Feistel cipher seed can reverse this process.
In these examples we've limited the token domain to six Base32 characters, allowing us to represent IDs up to 32⁶ - 1 = 1,073,741,823. Tokens with seven or eight Base32 characters would push the ID ceiling to 32⁷ - 1 or 32⁸ - 1 respectively.
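The ceiling for any token length follows from each Crockford Base32 character carrying five bits. A one-function Kotlin helper (the name maxIdFor is hypothetical) makes the arithmetic explicit:

```kotlin
// Each Base32 character carries 5 bits, so n characters can represent
// 32^n = 2^(5n) distinct values, from 0 up to 2^(5n) - 1.
fun maxIdFor(chars: Int): Long {
    require(chars in 1..12) { "more than 12 characters would overflow a Long" }
    return (1L shl (5 * chars)) - 1
}
```

maxIdFor(6) is 1,073,741,823, maxIdFor(7) is 34,359,738,367 and maxIdFor(8) is 1,099,511,627,775, which is why that last ID is out of bounds for six-character User and Tenant tokens but still fits an eight-character Invoice token.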
While we can convert IDs to tokens pretty easily, it's better practice to persist the token (with a UNIQUE constraint) alongside the integer ID in a dedicated column. Internal joins and indexes continue using the integer as primary key and foreign key target, while external lookups can use an index on the token. This also protects you from issues if you re-sequence the entities, or calculate the token using a different strategy in the future.
```sql
CREATE TABLE users (
    id SERIAL PRIMARY KEY,             -- internal key for joins and foreign keys
    token VARCHAR(8) UNIQUE NOT NULL,  -- external identifier, e.g. U_71VHDH
    name TEXT NOT NULL,
    created_at TIMESTAMP DEFAULT now()
);
```

In conclusion
- Ciphering makes tokens visually distinct, so people are less likely to confuse sequential entities.
- Encoding makes tokens more visually compact, and easier to communicate.
- Prefixes reduce token ambiguity when working with multiple entity types.
Keep IDs doing what they do best - uniquely identifying entity records in a database, and enforcing referential integrity between entities. Outside of the service boundary, prefer tokens. For further reading, check out my reference implementation at GitHub: TassSinclair/tokens. This is the approach we use in OmniTabz systems, inspired by the Base32 tokens we used at Cash App.
If you have feedback or questions about this article, let's catch up via email.

About Sinclair Studios