Share tokens instead of IDs

2026-05-26

When tracking a collection of entities in a relational database, it's common to use sequential identifiers (IDs) as the primary key, starting from 1 and counting up. These IDs are fine for uniquely identifying an entity in a local database, or upholding a foreign key relationship with other entities in the same database, but start causing problems once they cross a service boundary:

  • They're ambiguous. IDs just look like numbers. A database error log mentions 1779590843... is that a user, or tenant? It kinda looks like a timestamp.
  • They expose your internals. A customer who receives Invoice 0038 immediately knows how many invoices you've sent. Anyone with this information can reason about your data structure, and a savvy attacker can use enumeration attacks to access records belonging to other customers.
  • They're bound to a single source. ID sequences are tightly bound to a single entity table in a database. Merge entities, or split a monolith into new services, and the same identifier may exist in multiple sources with no way to tell them apart.
Integer IDs are fine for primary and foreign keys, sure...
  1. 🧑Customer 9032 / Support ticket 80299
  2. im still waiting for 3044 from 2185
  3. Sure. I'll need the delivery and order IDs.
  4. i just told you!
...but we really don't want this situation.

Given the limitations and risks of leaking IDs, a better approach is to share tokens: generate tokens that are safe to share externally and don't leak information about how entities are stored. This article shares a strategy for using sequential entity IDs to generate compact, non-enumerable, type-prefixed tokens.

| User ID | User token |
|---|---|
| 1 | U_71VHDH |
| 2 | U_1MXBJ1 |
| 3 | U_ZR5KA5 |
| 1000 | U_AACVG7 |
| 1073741823 | U_CHJHGK |
Instead, we prefer to share tokens...
  1. 🧑Customer C_7XEV43 / Support ticket ST_20SDKE
  2. im still waiting for D_1hj4ln from O_32onOi
  3. Sure, let me look up your delivery and order.
...which makes everything much clearer.

These tokens are:

  • Ciphered. Sequential IDs result in completely different tokens. All tokens are the same length. The ciphering algorithm we use supports decoding a token back to an ID.
  • Encoded. The compact format uses non-ambiguous characters, and is forgiving in its decoding process, so U_abcol2 is canonicalised as U_ABC012.
  • Prefixed. A known prefix tracks the entity type. Even with no other context, U_NM0X1X is clearly a User token, and T_DCPRTD is a Tenant token.

Keen to try it out? The library code is available at GitHub: TassSinclair/tokens.

Worked example: Token lookups

Consider the example above, where a support agent is investigating a customer's delivery issue:

  1. 🧑Customer C_7XEV43 / Support ticket ST_20SDKE
  2. im still waiting for D_1hj4ln from O_32onOi
  3. Sure, let me look up your delivery and order.
We start with a Delivery Token and Order Token.
Service calls quietly canonicalise the tokens and return the correct entities.
  1. 🧑Customer C_7XEV43 / Support ticket ST_20SDKE
  2. I see Order O_320N01 is still open, Delivery D_1HJ41N is due to arrive on Wednesday.
  3. ok thanks!
The support agent can respond with minimal overhead.

In this example, integer IDs are not shared outside of the core domain. Service layers can bi-directionally convert tokens and IDs. Users of the broader system use tokens exclusively - this is what APIs return, and what gets printed on invoices.

How does this work?

Ciphering, encoding, and prefixing each operate as independent, reversible steps chained in sequence.

| User ID | Cipher (seed 0x2783dab) | Encoding | Prefixed user token | Canonicalisation (example) |
|---|---|---|---|---|
| 1 | 236832177 | 71VHDH | U_71VHDH | U_7iuhdh |
| 2 | 55488065 | 1MXBJ1 | U_1MXBJ1 | U_imxbjl |
| 3 | 1065536837 | ZR5KA5 | U_ZR5KA5 | U_zr5ka5 |
| 10 | 185554103 | 5GYN5Q | U_5GYN5Q | U_5gyn5q |
| 1000 | 346451463 | AACVG7 | U_AACVG7 | U_aacvg7 |
| 53765282 | 2 | 000002 | U_000002 | |
| 379690282 | 1 | 000001 | U_000001 | U_00000i |
| 1073741823 | 421086739 | CHJHGK | U_CHJHGK | U_chjhgk |
Converting User IDs to User tokens, and vice versa.

Ciphering

In a process where a sequence of users with IDs [ 100, 101, 102 ...] would normally map to user tokens [ U_000034, U_000035, U_000036 ...], it's trivial to start guessing nearby tokens. We can mask this relationship by using a bi-directional cipher to obscure the ID before converting it into a token.

This step uses a Feistel cipher to encrypt and decrypt the integer ID before the Base32 encoding step. A Feistel network provides diffusion: over several rounds, a one-bit change in the input spreads across the whole output, so neighbouring IDs produce unrelated-looking values.

Encryption and decryption with a small domain size (6) and fixed seed (0x123abc).

For our user tokens, we've set the "domain" space as $32^6$, which matches our imposed token constraint (six Base32 characters). Given a preconfigured seed, we can encrypt and decrypt any ID within that domain. We can set a different seed for each entity type, so each type follows a different token sequence.
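The six-character domain is an exact power of two, which suits a balanced Feistel network (one that splits the value into two halves of equal width). Assuming a balanced split, which is a common construction but may not be exactly how the library divides the value, the arithmetic works out as:

```latex
\begin{aligned}
32^6 &= (2^5)^6 = 2^{30} = 1{,}073{,}741{,}824 &&\Rightarrow \text{two 15-bit halves} \\
32^8 &= (2^5)^8 = 2^{40} = 1{,}099{,}511{,}627{,}776 &&\Rightarrow \text{two 20-bit halves}
\end{aligned}
```

The cipher permutes values within $[0, 32^6)$, so every ciphered value still fits in six Base32 characters. An odd character count gives an odd number of bits and would need an unbalanced split; seven characters, for example, gives 35 bits.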

| User ID | Cipher (seed 0x2783dab) | Cipher (seed 0x2783dac) |
|---|---|---|
| 1 | 236832177 | 82650702 |
| 2 | 55488065 | 773480176 |
| 3 | 1065536837 | 592394359 |
| 10 | 185554103 | 1064651039 |
| 1000 | 346451463 | 6045569 |
| 53765282 | 2 | 874379223 |
| 379690282 | 1 | 285382121 |
| 1073741823 | 421086739 | 1063701256 |
Ciphering masks the relationship between IDs and tokens

See GitHub: TassSinclair/tokens/FeistelCipher.kt for full implementation details, but in summary the encryption process works by:

  1. Splitting the number into two halves.
  2. Running several rounds, where in each round, one half is XORed with a keyed hash of the other half, then the halves swap.

Decryption reverses the order: swap halves first, then XOR each round in reverse. The result is a fully reversible permutation, so every input maps to exactly one output, with no collisions. Now we can proceed to encoding without risk of creating enumerable tokens.
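The two steps above can be sketched in Kotlin as a minimal balanced Feistel cipher over a 30-bit domain, using two 15-bit halves. This is not the library's FeistelCipher.kt. The class name `FeistelSketch`, the round function and the four rounds are all assumptions: the round function is a multiply-and-xorshift mix of one half, the seed and the round number. As a result it will not reproduce the ciphered values in the tables above.

```kotlin
// Minimal balanced Feistel cipher over [0, 2^(2 * halfBits)).
// Assumptions: class name, round function and round count are illustrative,
// not the library's FeistelCipher.kt, so outputs differ from the tables above.
class FeistelSketch(
    private val seed: Long,
    private val halfBits: Int = 15,   // 15 + 15 bits = 2^30 = 32^6 values
    private val rounds: Int = 4,
) {
    init { require(halfBits in 1..31) { "halfBits must be in 1..31" } }

    private val halfMask = (1L shl halfBits) - 1
    val domainSize = 1L shl (2 * halfBits)

    // Keyed hash of one half, truncated to halfBits so it can be XORed into the other half.
    private fun roundFn(half: Long, round: Int): Long {
        var x = half + seed * 0x5851F42D4C957F2DL + round
        x = (x xor (x ushr 29)) * 0x2545F4914F6CDD1DL
        return (x xor (x ushr 32)) and halfMask
    }

    fun encrypt(value: Long): Long {
        require(value in 0L until domainSize) { "value out of domain" }
        var left = value ushr halfBits        // 1. split into two halves
        var right = value and halfMask
        repeat(rounds) { r ->                 // 2. XOR one half with a keyed hash of the other, then swap
            val next = left xor roundFn(right, r)
            left = right
            right = next
        }
        return (left shl halfBits) or right
    }

    fun decrypt(value: Long): Long {
        require(value in 0L until domainSize) { "value out of domain" }
        var left = value ushr halfBits
        var right = value and halfMask
        for (r in rounds - 1 downTo 0) {      // undo the rounds in reverse order
            val previousLeft = right xor roundFn(left, r)
            right = left
            left = previousLeft
        }
        return (left shl halfBits) or right
    }
}
```

Reversibility comes from the structure rather than from the hash. Each round XORs one half with a function of the other half, and that other half is still available when undoing the round. So any round function, even a weak one, yields a permutation. The round function's quality only affects how thoroughly neighbouring IDs are mixed.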

Encoding

Base32 encoding lets us represent identifiers with a larger set of human-readable characters. We use Douglas Crockford's Base32 encoding, which writes integers with 32 symbols (0123456789ABCDEFGHJKMNPQRSTVWXYZ) instead of Base10's ten digits (0123456789). It leaves out I, L and O, which are easily confused with 1 and 0, as well as U. Note that we've constrained the domain to six Base32 characters, so the maximum ID value supported is $32^6 - 1$ = 1,073,741,823.

  • When converting an ID to a token, this step also left-pads the resulting token, so all tokens become the same length.
  • When converting a token to an ID, this step also canonicalises confusing and lowercase characters into their formal representation (e.g. i and l are canonicalised to 1, and o to 0).
| ID | Encoding (without cipher) | Canonicalisation (example) |
|---|---|---|
| 1 | 000001 | 00000i |
| 2 | 000002 | |
| 3 | 000003 | |
| 10 | 00000A | 00000a |
| 1000 | 0000Z8 | 0000z8 |
| 1073741823 | ZZZZZZ | ZzzZzz |
Encoding IDs as fixed-width Base32 tokens
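Here is a sketch of the encoding step with fixed-width left-padding and forgiving decoding. The `CrockfordBase32` object is hypothetical. Crockford's specification decodes I and L as 1 and O as 0. Mapping U to V is an extra assumption, inferred from the U_7iuhdh example above and not part of the specification.

```kotlin
// Hypothetical Crockford Base32 helper: fixed-width encoding plus forgiving decoding.
object CrockfordBase32 {
    private const val ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

    // Encode a non-negative value, left-padded with '0' to exactly `width` characters.
    fun encode(value: Long, width: Int): String {
        require(value >= 0) { "value must be non-negative" }
        val chars = CharArray(width) { '0' }
        var v = value
        for (i in width - 1 downTo 0) {
            chars[i] = ALPHABET[(v and 31L).toInt()]   // low 5 bits -> one character
            v = v ushr 5
        }
        require(v == 0L) { "value does not fit in $width characters" }
        return String(chars)
    }

    // Uppercase, then map look-alike characters onto the canonical alphabet.
    fun canonicalise(text: String): String = text.uppercase().map { ch ->
        when (ch) {
            'O' -> '0'
            'I', 'L' -> '1'
            'U' -> 'V'   // assumption: inferred from the examples, not part of Crockford's spec
            else -> ch
        }
    }.joinToString("")

    // Decode a (possibly lowercase or ambiguous) token body back to its integer value.
    fun decode(text: String): Long {
        require(text.length <= 12) { "too long to fit in a Long" }
        var result = 0L
        for (ch in canonicalise(text)) {
            val digit = ALPHABET.indexOf(ch)
            require(digit >= 0) { "invalid character '$ch'" }
            result = (result shl 5) or digit.toLong()
        }
        return result
    }
}
```

For example, `CrockfordBase32.encode(1000, 6)` returns "0000Z8", and `CrockfordBase32.decode("ZzzZzz")` returns 1073741823, matching the table above.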

Encoding makes tokens safer to communicate over lossy media, such as over the phone, or scribbled on pieces of paper. One last step reduces confusion further.

Prefixing

Finally, each entity type is represented by a short prefix, such as "U" for user, "T" for tenant, or "INV" for invoice. In code, we create a token type for each of these, which also sets the cipher seed.

This gives us instant type identification, reduces the risk of accidentally using the wrong token in the wrong place, and ensures tokens from different types never collide even if they share an underlying ID.

Entity-scoped tokens, with prefix and cipher seed constants.
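The code behind that caption isn't reproduced here, so what follows is a hypothetical sketch of entity-scoped token types. Each constant carries a prefix, a cipher seed and a token length. Several values come from the examples: the prefixes, the user seed (0x2783dab) and the eight-character invoice length. The tenant and invoice seeds are made-up placeholders.

```kotlin
// Hypothetical token types; the enum name and the placeholder seeds are illustrative only.
enum class TokenType(val prefix: String, val seed: Long, val length: Int) {
    USER("U", 0x2783dabL, 6),       // seed from the user examples above
    TENANT("T", 0x1111111L, 6),     // placeholder seed
    INVOICE("INV", 0x2222222L, 8);  // placeholder seed; eight characters, like the invoice tokens above

    // Largest ID that fits: 32^length - 1 (each Base32 character holds 5 bits).
    val maxId: Long get() = (1L shl (5 * length)) - 1

    fun format(body: String): String {
        require(body.length == length) { "$name tokens have $length characters" }
        return "${prefix}_$body"
    }

    companion object {
        // Identify the type from the prefix alone. Matching "T_" from the start
        // means a support-ticket token like "ST_..." is not mistaken for a tenant.
        fun of(token: String): TokenType? =
            TokenType.values().firstOrNull { token.startsWith("${it.prefix}_", ignoreCase = true) }
    }
}
```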

With entity-specific prefixes and cipher seeds, see how the same ID sequence is represented against our example token types:

| Example ID | Prefixed user token | Prefixed tenant token | Prefixed invoice token |
|---|---|---|---|
| 1 | U_71VHDH | T_Q4BDJQ | INV_ZJP9WNJA |
| 2 | U_1MXBJ1 | T_ZCXQ8F | INV_M9GZ29A2 |
| 3 | U_ZR5KA5 | T_2PN7M8 | INV_D9MEBYC4 |
| 10 | U_5GYN5Q | T_W88WFZ | INV_FE3T0SS5 |
| 1000 | U_AACVG7 | T_WHE1FY | INV_4PGCFN8B |
| 1073741823 | U_CHJHGK | T_VNT7CW | INV_8R95M5RK |
| 1099511627775 | (out of bounds) | (out of bounds) | INV_9G26G99K |
Token prefixes, seeds, and lengths result in very different mappings.

Bonus: Type safety

As an added benefit, these strongly-typed entity tokens improve type safety in our code. Consider the method signatures below:

fun getUserForTenant(userToken: String, tenantToken: String) // these could be anything!

fun getUserForTenant(userToken: Token, tenantToken: Token) // better, but still possible to switch them.

fun getUserForTenant(userToken: UserToken, tenantToken: TenantToken) // best, mistakes are caught at compile-time.
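One way to get that compile-time checking in Kotlin is a value class per token type. A value class wraps a String, usually without an extra allocation. `UserToken`, `TenantToken`, their prefix checks and the string returned by `getUserForTenant` are illustrative assumptions, not necessarily how the library defines its token types.

```kotlin
// Hypothetical token wrappers: one value class per entity type.
@JvmInline
value class UserToken(val value: String) {
    init { require(value.startsWith("U_")) { "not a user token: $value" } }
}

@JvmInline
value class TenantToken(val value: String) {
    init { require(value.startsWith("T_")) { "not a tenant token: $value" } }
}

// Placeholder body standing in for a real lookup.
// Swapping the arguments, e.g. getUserForTenant(tenantToken, userToken), no longer compiles.
fun getUserForTenant(userToken: UserToken, tenantToken: TenantToken): String =
    "user ${userToken.value} in tenant ${tenantToken.value}"
```

The `init` checks add a runtime guard as well. A tenant token wrapped as a `UserToken` fails at construction instead of reaching a query.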

The full pipeline

Each step is reversible. Converting an ID to a token chains the steps left to right (cipher, encode, prefix); converting a token back to an ID runs the chain in reverse.

The full process of converting an ID to a Token, followed in either direction.
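Putting the three steps together, here is a self-contained sketch of a codec for one token type. Converting an ID runs cipher, encode and prefix in that order, and converting a token back runs the same steps in reverse. The class name `TokenCodec`, the round function and the four rounds are assumptions, so its output will not match the tables above. It also only supports even token lengths, which keep the Feistel halves equal.

```kotlin
// Sketch of the full ID <-> token pipeline for one token type.
// Assumptions: class name, round function, round count and seeds are illustrative,
// so the tokens produced will not match the tables in this article.
private const val ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

class TokenCodec(private val prefix: String, private val seed: Long, private val length: Int = 6) {
    init {
        // Even lengths give an even bit count, so the Feistel halves are equal.
        require(length in 2..12 && length % 2 == 0) { "length must be even, 2..12" }
    }

    private val halfBits = 5 * length / 2   // 15 bits per half for six characters
    private val halfMask = (1L shl halfBits) - 1
    private val rounds = 4

    // Keyed round function: mixes one half with the seed and round number.
    private fun f(half: Long, round: Int): Long {
        var x = half + seed * 0x5851F42D4C957F2DL + round
        x = (x xor (x ushr 29)) * 0x2545F4914F6CDD1DL
        return (x xor (x ushr 32)) and halfMask
    }

    // Step 1: Feistel permutation of [0, 32^length).
    private fun cipher(v: Long): Long {
        var l = v ushr halfBits
        var r = v and halfMask
        for (i in 0 until rounds) { val n = l xor f(r, i); l = r; r = n }
        return (l shl halfBits) or r
    }

    private fun decipher(v: Long): Long {
        var l = v ushr halfBits
        var r = v and halfMask
        for (i in rounds - 1 downTo 0) { val p = r xor f(l, i); r = l; l = p }
        return (l shl halfBits) or r
    }

    // ID -> token: cipher, encode as fixed-width Crockford Base32, then prefix.
    fun toToken(id: Long): String {
        require(id in 0L until (1L shl (2 * halfBits))) { "id $id is out of bounds" }
        var v = cipher(id)
        val body = CharArray(length)
        for (i in length - 1 downTo 0) { body[i] = ALPHABET[(v and 31L).toInt()]; v = v ushr 5 }
        return "${prefix}_${String(body)}"
    }

    // Token -> ID: check the prefix, canonicalise and decode the body, then decipher.
    fun toId(token: String): Long {
        require(token.startsWith("${prefix}_", ignoreCase = true)) { "expected a ${prefix}_ token" }
        val body = token.substring(prefix.length + 1)
        require(body.length == length) { "expected $length characters after the prefix" }
        var v = 0L
        for (ch in body.uppercase()) {
            val c = when (ch) { 'O' -> '0'; 'I', 'L' -> '1'; 'U' -> 'V'; else -> ch }
            val digit = ALPHABET.indexOf(c)
            require(digit >= 0) { "invalid character '$ch'" }
            v = (v shl 5) or digit.toLong()
        }
        return decipher(v)
    }
}
```

Checking the prefix on the way back in is where it earns its keep. A T_ token passed to the user codec fails fast instead of silently decoding to some unrelated user ID.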

Notes and caveats

First, note that tokens obfuscate the underlying IDs, but it's not true encryption. Anyone with access to the Feistel cipher seed can reverse this process.

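In these examples we've limited the token domain to six Base32 characters, allowing us to represent IDs up to $32^6 - 1$ = 1,073,741,823. Tokens with seven or eight Base32 characters push the ceiling to $32^7 - 1$ = 34,359,738,367 or $32^8 - 1$ = 1,099,511,627,775 respectively, which is why the eight-character invoice tokens above can still represent ID 1099511627775.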

While we can convert IDs to tokens pretty easily, it's better practice to persist the token (with a UNIQUE constraint) alongside the integer ID in a dedicated column. Internal joins and indexes continue using the integer as primary key and foreign key target, while external lookups can use an index on the token. This also protects you from issues if you re-sequence the entities, or calculate the token using a different strategy in the future.

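CREATE TABLE users (
  id SERIAL PRIMARY KEY,              -- internal: joins and foreign key targets
  token VARCHAR(8) UNIQUE NOT NULL,   -- external: e.g. U_71VHDH; UNIQUE also indexes token lookups
  name TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT now()
);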
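Here is a minimal sketch of that persistence strategy, with an in-memory class standing in for the users table. The token is computed once at insert time and stored alongside the ID, and a HashMap plays the role of the UNIQUE index on the token column. `UserRow`, `UserTable` and the injected `tokenFor` function are hypothetical. In practice `tokenFor` would be the cipher, encode and prefix pipeline. The padded placeholder used in the checks below is deliberately not ciphered.

```kotlin
// Hypothetical in-memory stand-in for the users table above.
data class UserRow(val id: Long, val token: String, val name: String)

class UserTable(private val tokenFor: (Long) -> String) {
    private var nextId = 1L                          // like SERIAL: sequential internal IDs
    private val byId = HashMap<Long, UserRow>()      // primary key
    private val byToken = HashMap<String, UserRow>() // stands in for the UNIQUE token index

    // Compute the token once and persist it, so later changes to tokenFor don't affect stored rows.
    fun insert(name: String): UserRow {
        val id = nextId++
        val row = UserRow(id, tokenFor(id), name)
        check(row.token !in byToken) { "duplicate token ${row.token}" } // UNIQUE constraint
        byId[id] = row
        byToken[row.token] = row
        return row
    }

    // External lookups canonicalise the token, then use the token index; no ID decoding needed.
    fun findByToken(token: String): UserRow? = byToken[canonicalise(token)]

    // Internal lookups keep using the integer ID.
    fun findById(id: Long): UserRow? = byId[id]

    // Uppercase everything, and map look-alike characters in the body after the prefix.
    private fun canonicalise(token: String): String {
        val upper = token.uppercase()
        val sep = upper.indexOf('_')
        if (sep < 0) return upper
        val body = upper.substring(sep + 1)
            .map { when (it) { 'O' -> '0'; 'I', 'L' -> '1'; else -> it } }
            .joinToString("")
        return upper.substring(0, sep + 1) + body
    }
}
```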

In conclusion

  • Ciphering makes tokens visually distinct, so people are less likely to confuse sequential entities.
  • Encoding makes tokens more visually compact, and easier to communicate.
  • Prefixes reduce token ambiguity when working with multiple entity types.

Keep IDs doing what they do best - uniquely identifying entity records in a database, and enforcing referential integrity between entities. Outside of the service boundary, prefer tokens. For further reading, check out my reference implementation at GitHub: TassSinclair/tokens. This is the approach we use in OmniTabz systems, inspired by the Base32 tokens we used at Cash App.


If you have feedback or questions about this article, let's catch up via email.
