Keycloak Benchmark
A small benchmark setup for Keycloak multi-tenancy: how does Keycloak's performance behave as you add more tenants? Specifically, how does the classic realm-per-tenant model compare to the newer organizations feature (KC 26+) that keeps everything in one realm?
I'd heard repeatedly that "Keycloak gets weird with many realms" but never had concrete numbers. This repo is what came out of trying to make that concrete — a docker-compose stack, a REST-based seeder, three k6 workloads, a sweep driver, and a Python plotter. Everything's reproducible.
TL;DR: for realm-per-tenant multi-tenancy on a single node, Keycloak is not viable past ~500 tenants. Admin API throughput collapses by roughly four orders of magnitude (7,744 req/s at N=1 to 0.48 req/s at N=1000), p95 latency goes from 2 ms to 16.6 s at N=500, and somewhere around 1200 realms the server runs itself out of memory chasing scheduled cache reloads. Organizations mode degrades only mildly (admin p95 stays under 10 ms) through at least 1500 tenants under the same workload.
Hardware
All numbers below come from a single laptop:
- CPU: AMD Ryzen 5 7640U (6 cores / 12 threads, ~4.3 GHz boost)
- RAM: 60 GB
- Storage: NVMe SSD, encrypted (LUKS)
- OS: Guix System, Linux 6.19
- Container runtime: Podman 5.8 (rootless), podman-compose
- Keycloak: 26.0, production `start` mode, Postgres backend
- Postgres: 16-alpine, default config
- Network: k6 → KC over `--network host` (effectively localhost, no TLS)
No artificial resource limits on the containers — KC gets the whole box. Numbers will vary on other hardware, but the shape of the curves should not. Take the absolute throughput figures with a grain of salt; the realms-vs-orgs delta is what matters.
What's measured
| Workload | What it does | Why it matters |
|---|---|---|
| `token-issuance` | OIDC password grant against a uniform-random tenant, 50 VUs | The hot path most apps actually hit |
| `jwks` | `GET /realms/{r}/protocol/openid-connect/certs`, 50 VUs | Cached path; sanity check that not everything degrades |
| `admin-api` | List realms + get realm + list users, 10 VUs, admin token | Where the multi-tenancy pain shows up first |
| `seed` | REST loop creating realms/orgs + clients + roles + groups + users | The provisioning angle: "how slow is automated tenant onboarding" |
| `boot` | Restart KC, time to `/health/ready`, snapshot container RSS | Cold-start latency and memory footprint vs realm count |
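The `boot` row boils down to "poll the readiness endpoint until it answers, and time it". Here is a minimal sketch of that loop. It is not the repo's actual script: `http_ready`, `time_until_ready`, the `probe` callback, and the default deadline and interval are all illustrative assumptions.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True when `url` answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def time_until_ready(
    probe: Callable[[], bool],
    deadline_s: float = 180.0,
    interval_s: float = 0.25,
) -> float:
    """Poll `probe` until it returns True and return the elapsed seconds.

    Raises TimeoutError if the deadline passes first.
    """
    start = time.monotonic()
    while True:
        if probe():
            return time.monotonic() - start
        if time.monotonic() - start >= deadline_s:
            raise TimeoutError(f"not ready after {deadline_s} s")
        time.sleep(interval_s)
```

In practice you would pass something like `lambda: http_ready(url)`, where `url` is the `/health/ready` endpoint of your deployment. The host and port depend on how Keycloak is configured, so they are left out here on purpose.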
Each k6 workload is run cold (immediately after a KC restart with N tenants already seeded) and warm (after a 60 s warmup pass that's discarded). Cold/warm matters because production failure modes — the admin console going unresponsive after a redeploy — are cold-only.
Each tenant has the same content in both modes:
- 2 OIDC clients (1 public web, 1 confidential backend)
- 5 realm roles + 2 client roles per client
- 2 groups
- 10 users, each assigned to a group and a role
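To get a feel for how much the seeder has to create, the per-tenant content above can be written down as a small spec. This is a sketch for back-of-the-envelope counting only; `TenantSpec` and its field names are made up here and are not taken from the seeder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantSpec:
    """Per-tenant content as listed above; names are illustrative."""

    clients: int = 2                  # 1 public web + 1 confidential backend
    realm_roles: int = 5
    client_roles_per_client: int = 2
    groups: int = 2
    users: int = 10

    def objects_per_tenant(self) -> int:
        """Count every client, role, group, and user created for one tenant."""
        client_roles = self.clients * self.client_roles_per_client
        return (
            self.clients
            + self.realm_roles
            + client_roles
            + self.groups
            + self.users
        )

    def objects_total(self, tenants: int) -> int:
        """Objects across all tenants, excluding the realm/org container itself."""
        return tenants * self.objects_per_tenant()
```

With the defaults, `TenantSpec().objects_per_tenant()` is 23, so a full N=1500 seed creates 34,500 objects on top of the 1500 realms or organizations themselves. Group and role assignments are extra API calls and are not counted.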
How to run it
There's a Claude Code skill (.claude/skills/keycloak-benchmark/) that
walks through this end-to-end. Drop into Claude Code in this repo and ask
"run the keycloak benchmark" — it'll handle the prerequisites, suggest a
scope, run the sweep, and aggregate the results.
If you'd rather do it by hand:

    cp .env.example .env
    cd seeder && npm install && cd ..

    # Bring up the stack
    podman compose up -d
    ./scripts/wait-ready.sh 180

    # Run the sweep (~3–4 hours on Ryzen 7640U for NS=1..1500 both modes)
    NS="1 100 500 1000 1500" MODES="realms orgs" ./scripts/run-sweep.sh

    # Aggregate + plot
    node scripts/aggregate.mjs
    guix shell python python-matplotlib python-pandas -- python3 plots/generate.py

The skill is the better path — it knows about the N > 1500 realms-mode trap (see below) and won't let you walk into it unattended.
Findings
Admin API: realm-per-tenant collapses fast
This is the headline. Listing realms, fetching a realm's metadata, and listing users in a realm — the things the admin console does on every page load — degrade catastrophically as realm count grows.
| N | realms throughput (req/s) | orgs throughput (req/s) | realms p95 | orgs p95 |
|---|---|---|---|---|
| 1 | 7,744 | 4,710 | 2 ms | 5 ms |
| 100 | 367 | 4,250 | 76 ms | 6 ms |
| 500 | 1.96 | 4,545 | 16.6 s | 5 ms |
| 1000 | 0.48 | 3,032 | 60.0 s (timeout) | 8 ms |
| 1500 | — | 2,777 | — | 9 ms |
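Orgs mode actually starts slower at N=1 (fewer optimisations in the per-org code path) and loses about 40 % of its throughput by N=1500, but its p95 stays under 10 ms throughout. Realms drops about four orders of magnitude by N=1000 and has no measurement at N=1500.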
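The "orders of magnitude" figure is just the base-10 log of the throughput ratio. A small sketch using the numbers from the table above (the function name and the dictionaries are mine, not part of the repo):

```python
import math

# Admin API throughput (req/s) copied from the table above.
REALMS = {1: 7744.0, 100: 367.0, 500: 1.96, 1000: 0.48}
ORGS = {1: 4710.0, 100: 4250.0, 500: 4545.0, 1000: 3032.0, 1500: 2777.0}


def orders_of_magnitude_drop(
    series: dict[int, float], n: int, baseline: int = 1
) -> float:
    """log10 of the throughput ratio between the baseline and tenant count n."""
    return math.log10(series[baseline] / series[n])


realms_drop = orders_of_magnitude_drop(REALMS, 1000)  # a bit over 4
orgs_drop = orders_of_magnitude_drop(ORGS, 1500)      # well under 1
```

Realms at N=1000 comes out slightly above 4, while orgs at N=1500 stays a small fraction of one order of magnitude.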
Token issuance: degrades gracefully in both modes
The user-facing OIDC login flow holds up much better. By N=1000, realms loses about 43 % of its N=1 throughput and orgs about 34 %, but neither falls off a cliff.
| N | realms (req/s, warm) | orgs (req/s, warm) |
|---|---|---|
| 1 | 185 | 147 |
| 100 | 151 | 143 |
| 500 | 146 | 144 |
| 1000 | 106 | 97 |
| 1500 | — | 92 |
Worth noting: the cold-start tail is much uglier than the warm steady state. At N=100 realms, max latency for the first 30 seconds after a restart hits 6.4 seconds — that's a user staring at a hung login form for 6 seconds. By warm phase it's back under 600 ms.
JWKS: cached, stays fast in both modes
The JWKS endpoint (`/protocol/openid-connect/certs`) is heavily cached. Both modes hold above 11k req/s at all tenant counts. Not a differentiator, included as a sanity check that "the host machine isn't just falling over."
Memory: where realms-mode dies
| N | realms RSS (MB) | orgs RSS (MB) |
|---|---|---|
| 1 | 882 | 895 |
| 100 | 891 | 892 |
| 500 | 1,055 | 901 |
| 1000 | 1,062 | 877 |
| 1500 | (killed at 43 GB during seed) | 882 |
In realms mode, RSS at ready jumps about 18 % between N=100 and N=500
(891 MB to 1,055 MB). But the real failure was during seeding at N≈1200: KC's
`ClearExpiredUserSessions` scheduled task started iterating every realm on
every tick, and each call rehydrated the realm cache. Within minutes,
process RSS went from ~1 GB to 43 GB (72 % of total system memory) and
CPU pegged at 600 %. Seeding rate collapsed from ~1 s/tenant to ~25 s/tenant
and we killed it.
Orgs mode shows none of this — RSS is indistinguishable from N=1 at N=1500.
Provisioning speed: REST loop is brutal on realms
We provisioned each tenant via the standard Keycloak admin REST API
(@keycloak/keycloak-admin-client) with 50 parallel requests. This is what
you'd actually run in real tenant-onboarding automation.
| N | realms seed (s) | orgs seed (s) |
|---|---|---|
| 1 | 2 | 2 |
| 100 | 29 | 11 |
| 500 | 387 | 37 |
| 1000 | 3,524 | 84 |
| 1500 | (killed) | 159 |
At N=1000 realms, seeding took 59 minutes vs orgs' 84 seconds, a 42× gap, and the realms curve is super-linear. Average per-tenant time grows from 0.3 s at N=100 to 3.5 s at N=1000, and the instantaneous rate had reached ~25 s per tenant when the N=1500 run was killed around N≈1200. Same admin REST API, same content per tenant, same concurrency.
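The per-tenant and ratio figures can be recomputed straight from the table. A short sketch (the `SEED_S` dictionary and helper names are illustrative, not from the repo's aggregation script):

```python
# Seed wall-clock times in seconds, copied from the table above.
SEED_S = {
    "realms": {1: 2, 100: 29, 500: 387, 1000: 3524},
    "orgs": {1: 2, 100: 11, 500: 37, 1000: 84, 1500: 159},
}


def per_tenant_seconds(mode: str, n: int) -> float:
    """Average seeding cost per tenant over a full run to n tenants."""
    return SEED_S[mode][n] / n


def realms_to_orgs_ratio(n: int) -> float:
    """How many times slower realms seeding is than orgs at the same n."""
    return SEED_S["realms"][n] / SEED_S["orgs"][n]
```

Note that `per_tenant_seconds` is an average over the whole run. The marginal cost of the last tenants in a realms run is higher than the average, which is why the seeder can be crawling at tens of seconds per tenant while the run-wide average still looks modest.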
Boot time
| N | realms boot (s) | orgs boot (s) |
|---|---|---|
| 1 | 6.83 | 7.13 |
| 100 | 7.73 | 7.41 |
| 500 | 8.33 | 7.83 |
| 1000 | 12.74 | 9.55 |
| 1500 | — | 9.71 |
Realms mode adds ~6 seconds of cold-start by N=1000 — meaningful in a restart-after-deploy context. Orgs adds <3 seconds across the whole range.
What this means
If you need true tenant isolation — separate password policies, identity providers, themes, and login flows per tenant — and you expect to grow past ~500 tenants, single-node realm-per-tenant Keycloak is not the answer. The admin API ceiling alone makes the admin console unusable; the memory and provisioning curves make routine operations painful before that.
For the same scale, the organizations feature (KC 26+) holds up cleanly, at the cost of sharing realm-level config across tenants. For most SaaS-style multi-tenancy that trade is fine: tenants don't need their own themes or their own OAuth flows, they need a logical grouping with isolated membership and policies, which is exactly what orgs give you.
The Keycloak team has been clear about this — see their Sizing guide and the Keycloak Benchmark project they maintain — but it's still useful to have your own numbers from your own hardware.
Recommended specs
Sizing suggestions inferred from the numbers above, plus the usual "give the JVM some headroom" rule of thumb. Worth re-stating up front: this repo only measured idle, post-boot RSS — heap behaviour under sustained load is not directly observed. Treat these as starting points, not guarantees.
| Scenario | Mode | Minimum | Recommended |
|---|---|---|---|
| Dev / staging, ≤10 tenants, <100 users | either | 1 vCPU / 1.5 GB | 2 vCPU / 2 GB |
| Small SaaS, ≤100 tenants | either (orgs preferred) | 2 vCPU / 2 GB | 4 vCPU / 4 GB |
| Mid SaaS, 100–500 tenants | orgs | 2 vCPU / 3 GB | 4 vCPU / 4 GB |
| Large SaaS, 500–1500 tenants | orgs | 4 vCPU / 4 GB | 8 vCPU / 8 GB |
| 1500+ tenants | orgs, HA cluster | benchmark per node first | scale horizontally |
| Real realm isolation past 500 | realms, HA cluster | don't attempt single-node | see the 43 GB seed failure at N≈1200 |
Add a separate small Postgres instance — 1 vCPU / 1 GB is plenty for these loads. The in-process KC cache is the bottleneck, not the database.
Translating to a cloud SKU:
- Token issuance is CPU-bound (password hashing, Argon2 by default in Keycloak 26). Bursty logins? Size CPU first, memory second.
- Set `-Xmx` explicitly. The benchmark let the JVM auto-size against host RAM. In a container with a limit, JVM ergonomics will pick a much smaller heap than you'd expect; aim for ~75 % of the container limit.
- Don't size for cold-start. The 6-second token tail at N=100 (and worse beyond) is a one-time hit after deploys, not steady state. Use a readiness probe with slack rather than oversizing the box.
- Active sessions matter more than total users. Heap scales with the session/refresh-token cache, roughly `active_users × refresh_lifetime`, not the size of your user table.
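The ~75 % heap rule of thumb is easy to turn into a flag. A minimal sketch, assuming the container limit is known in MiB; `xmx_flag` is a hypothetical helper, and how the flag reaches the JVM depends on your image and launch configuration.

```python
def xmx_flag(container_limit_mib: int, heap_fraction: float = 0.75) -> str:
    """Build an explicit -Xmx flag as a fraction of the container memory limit.

    The 0.75 default mirrors the ~75 % rule of thumb above; it is a starting
    point, not a measured optimum.
    """
    if container_limit_mib <= 0:
        raise ValueError("container_limit_mib must be positive")
    if not 0.0 < heap_fraction < 1.0:
        raise ValueError("heap_fraction must be between 0 and 1")
    # Round down so the heap never exceeds the intended share of the limit.
    return f"-Xmx{int(container_limit_mib * heap_fraction)}m"
```

For the "4 vCPU / 4 GB" row, `xmx_flag(4096)` gives `-Xmx3072m`, leaving the remaining quarter for metaspace, thread stacks, and other off-heap memory.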
Caveats
Take these with a grain of salt:
- Single-node only. Multi-node Keycloak with distributed Infinispan behaves differently — and possibly worse in some ways (cache invalidation traffic scales with realm count). Not measured here.
- HTTP, not HTTPS. Pinned to plain HTTP for benchmark consistency. TLS adds per-request cost that varies with cipher and session-resumption config.
- Uniform random tenant selection. Real traffic is usually heavily skewed, with a small fraction of tenants (say 10 %) receiving most of the load (say 90 %). Uniform distribution stresses the cache the most (worst case for hit rate), so these numbers are conservative for typical SaaS shapes.
- Default Postgres tuning. No `shared_buffers`/`max_connections`/etc. tweaks. Production deployments with tuned Postgres may push the ceiling out somewhat, but not by orders of magnitude, since the bottleneck is KC's in-process cache, not the database.
- One snapshot of one KC release. 26.0; newer releases may improve this. Re-running the sweep on a newer version is a good first contribution.
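On the uniform-random caveat: if you want to rerun the workloads with a skewed tenant mix, one option is a Zipf-like weighting. This is a sketch of the idea only. The benchmark itself uses uniform selection, and the exponent `s` and the function names below are assumptions, not measured traffic shapes.

```python
def zipf_weights(n_tenants: int, s: float = 1.0) -> list[float]:
    """Normalised Zipf-like weights: the tenant at rank r gets weight ~ 1 / r**s."""
    raw = [1.0 / (rank ** s) for rank in range(1, n_tenants + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def top_share(weights: list[float], top_fraction: float = 0.10) -> float:
    """Fraction of traffic landing on the busiest `top_fraction` of tenants."""
    k = max(1, int(len(weights) * top_fraction))
    return sum(sorted(weights, reverse=True)[:k])


uniform = [1.0 / 1000] * 1000          # what the benchmark does today
skewed = zipf_weights(1000, s=1.0)     # hypothetical skewed alternative
```

With uniform weights the top 10 % of tenants get 10 % of requests by construction; with `s=1.0` they get well over half. To draw tenants from the skewed distribution, `random.choices(range(1000), weights=skewed, k=...)` from the standard library does the job.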







