ARC-ADR-037 - BYO-Credentials: a Secrets Broker for Abstracted Systems
| Field | Value |
|---|---|
| ID | ARC-ADR-037 |
| Status | Accepted |
| Date | 2026-05-29 |
| Deciders | Hub owner (Nicky Clarke) — accepted 2026-05-29 via HITL selector (chose Option A, Infisical CE) → revised same day to Option C (Azure Key Vault) on self-host friction |
| Supersedes | — (enables ARC-ADR-036 at multi-system / multi-user scale) |
| Tags | secrets, credentials, byo-keys, secrets-broker, anti-corruption-layer, security, multi-tenant, openbao, infisical, azure-key-vault |
Context and Problem Statement
Abstracting a real third-party system (GitHub, Jira, Linear, Stripe, …) implies the user wants it abstracted and has granted the appropriate access. Our MCP tools — generated against the canonical APIs (ARC-ADR-036) and proxied through backend-core — can only call those real producers with that system's credentials. So the platform needs a place for users to register their keys/secrets per abstracted system, under a firm principle: raw keys stay server-side — a broker issues scoped, short-lived credentials and the tools never hold the raw third-party key.
Today this is ad-hoc and operator-only: secrets live in Azure Key Vault akv01-agentarmy, resolved at runtime by akv: reference (e.g. akv:GithubPAT). That works for a single operator but offers no per-user / per-system onboarding, scoping, rotation, or audit as a first-class surface.
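The akv: reference scheme can be illustrated with a small sketch. This is not the actual resolver in the codebase: the function names `resolve_reference` and `make_akv_fetcher` are hypothetical, and the only details taken from this ADR are the `akv:` prefix and the use of `DefaultAzureCredential`. The vault lookup is injected so the parsing logic can run without Azure installed.

```python
from typing import Callable

AKV_PREFIX = "akv:"  # reference scheme named in the ADR, e.g. "akv:GithubPAT"


def resolve_reference(value: str, fetch_secret: Callable[[str], str]) -> str:
    """Return `value` unchanged unless it is an akv: reference.

    `fetch_secret` is injected so the lookup can be faked in tests; in
    production it would wrap a Key Vault client (see make_akv_fetcher).
    """
    if not value.startswith(AKV_PREFIX):
        return value  # plain literal, not a vault reference
    name = value[len(AKV_PREFIX):]
    if not name:
        raise ValueError("empty akv: reference")
    return fetch_secret(name)


def make_akv_fetcher(vault_url: str) -> Callable[[str], str]:
    """Build a fetcher backed by Azure Key Vault (needs the azure-* packages)."""
    # Imported lazily so the parsing logic above runs without Azure installed.
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())
    return lambda name: client.get_secret(name).value


# Usage with an in-memory fake standing in for the vault.
fake_vault = {"GithubPAT": "ghp_example"}
token = resolve_reference("akv:GithubPAT", fake_vault.__getitem__)
```

A resolver of this shape treats every caller identically, which is the gap the paragraph above describes: nothing in it is scoped per user or per system.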
Threat-model note (current vs. horizon): the fleet today is single-operator and trusted (private, no untrusted collaborators). Hard multi-tenant isolation between untrusted users is a design horizon, not today's threat — which argues for adopting fast now with a documented upgrade path, rather than paying heavy multi-tenant ops cost prematurely.
Decision Drivers
- Keys stay server-side — the broker issues scoped/short-lived credentials; tools proxy and never hold raw keys (extends the pattern the abstraction MCP already uses).
- Fast to adopt — the capability/tool cadence is aggressive ([[serve-capabilities-via-mcp]]); the credential surface must not be a multi-day yak-shave.
- Managed + supported + indemnified — a vendor-run, SLA-backed service with Microsoft IP indemnification is preferred over self-hosting OSS and owning its operational + legal risk; it must fit the existing Azure footprint (Entra, ACA, Key Vault) and be evolvable there. (Self-hostable OSS was the initial lean; managed Azure won — see Decision Outcome.)
- Onboarding UX — a real surface where a user registers a key for a system.
- Don't hand-roll security — prefer a battle-tested store over custom crypto/rotation/audit code (secure-by-default).
- Multi-tenant isolation — a driver, but weighted for the horizon, not the solo/trusted present.
Considered Options
Option A: Infisical Community Edition (MIT) (initially selected, then revised; see Decision Outcome)
Self-hosted secrets platform: single Docker stack (Postgres + Redis + app), first-class Python SDK (infisicalsdk 1.x), REST API ideal for a thin registration endpoint, Org→Project→Environment→Path hierarchy that maps to per-user/per-system scoping, RBAC + audit in CE, Azure auth method.
- Pros: fastest to a working broker (an afternoon); MIT core; clean fit with the fleet's Docker/ACA pattern; the onboarding surface is a thin wrapper over its API so users never touch the vault; active project (weekly releases).
- Cons: tenant isolation is RBAC/path-based, not cryptographic-namespace; dynamic-secret engines are fewer than Vault/OpenBao (third-party API-key rotation is partly hand-coded); SSO (SAML/SCIM) is Enterprise-tier.
Option B: OpenBao v2.5.4+ (MPL 2.0, Linux Foundation)
The truly-OSS fork of Vault. Hard multi-tenant namespaces, 50+ dynamic-secret engines, leased credentials with automatic TTL/revocation, mature audit, JWT/OIDC auth (exchange an Entra token for a scoped OpenBao token), Azure Key Vault as backing KMS.
- Pros: the "do it right" multi-tenant answer; cryptographic namespace isolation; dynamic secrets + leasing out of the box; no brokering logic to write; clean license (no Vault BSL trap).
- Cons: operational weight — HA Raft clustering, an unseal ceremony, namespace administration (~1–2 days setup + runbooks); hvac is Vault-branded (works, but won't track OpenBao-specific features); must run ≥2.5.4 to avoid the May-2026 cross-namespace CVEs.
Option C: Azure Key Vault + a thin broker ← chosen
Store per-user keys in Azure Key Vault (cred-{user}-{system} naming); a thin backend-core broker authenticates users and injects the resolved secret server-side. Reuses the box's DefaultAzureCredential + AZURE_KEYVAULT_URL (the akv: resolver machinery).
- Pros: zero new infra / no container / no bootstrap; native to the existing Azure + Entra footprint; fully managed, SLA-backed, with Microsoft IP indemnification — lean on MS rather than own self-host ops + OSS legal exposure; reuses the existing azure-* deps + auth so no new dependency and no extra runtime creds; clear evolution path on Azure (dedicated per-tenant vaults → RBAC → Managed HSM).
- Cons: flat namespace + one shared vault (per-user keys are name-prefixed, co-mingled with operator secrets), no per-user RBAC boundary, no dynamic-secret leasing. The earlier "you own all the broker code" worry proved minor — the broker was built backend-agnostic, so only store.py is KV-specific. Isolation is the real residual, accepted for the solo/trusted stage.
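Because isolation in the shared vault rests entirely on the name prefix, the naming function carries real weight. The sketch below is one possible way to build `cred-{user}-{system}` names; it is not the code in store.py. It assumes user and system identifiers are already alphanumeric slugs, and it relies on the Key Vault rule that secret names may contain only alphanumerics and hyphens, up to 127 characters. Hyphens inside a segment are rejected because `cred-a-b-c` could otherwise mean user `a-b` with system `c` or user `a` with system `b-c`.

```python
import re

_SEGMENT = re.compile(r"[A-Za-z0-9]+")  # no hyphens inside a segment
_MAX_NAME_LEN = 127  # Key Vault secret-name length limit


def secret_name(user: str, system: str) -> str:
    """Build the cred-{user}-{system} name, rejecting ambiguous segments."""
    for label, value in (("user", user), ("system", system)):
        if not _SEGMENT.fullmatch(value):
            raise ValueError(f"{label} must be alphanumeric, got {value!r}")
    name = f"cred-{user}-{system}"
    if len(name) > _MAX_NAME_LEN:
        raise ValueError("secret name exceeds the Key Vault length limit")
    return name


def parse_secret_name(name: str) -> tuple[str, str]:
    """Inverse of secret_name; only valid because segments contain no hyphens."""
    prefix, user, system = name.split("-")
    if prefix != "cred":
        raise ValueError(f"not a broker secret: {name!r}")
    return user, system
```

Identifiers such as email addresses or hyphenated system names would need an encoding step before they reach this function; that step is left out here.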
Ruled out
- HashiCorp Vault — BSL 1.1 since 2023 (IBM-owned); the "no competing product" clause is a legal trap if credential-brokering ever becomes a sold feature. OpenBao removes this.
- Doppler — SaaS-only, no self-host, not a multi-tenant BYO-keys broker.
- SOPS+age, Bitwarden/Vaultwarden — static-secret stores, no runtime lease/scoped-credential issuance.
- Teleport/Pomerium — infra-access certs / identity proxy, not third-party key brokering.
Decision Outcome
Chosen: Option C — Azure Key Vault + a thin broker (revised 2026-05-29). Option A (Infisical CE) was selected first via the HITL selector, but its self-host reality — a Postgres+Redis+app container plus an admin/project/machine-identity/client-secret bootstrap, compounded by a flaky local Docker engine — was real operational friction. We switched to Azure Key Vault: the broker API and the "raw keys stay server-side, inject server-side" contract are identical (only the storage backend changed), and it reuses the box's existing DefaultAzureCredential + AZURE_KEYVAULT_URL (the akv: resolver machinery) — no container, no new dependency, no extra credentials. Per-user keys are name-prefixed secrets (cred-{user}-{system}) in akv01-agentarmy.
Option C's original caveat ("you build all the broker code") proved minor here because the broker was already built backend-agnostic; only store.py swapped. We lean on Microsoft for the hard part — a managed, SLA-backed, IP-indemnified secrets store — rather than owning self-host operations + OSS legal/operational risk, and we can evolve on Azure (dedicated per-tenant vaults → RBAC → Managed HSM). Tradeoffs accepted for the solo/trusted stage ([[threat-model-no-forks]]): flat namespace + one shared vault (co-mingled with operator secrets), no per-user RBAC boundary. Documented upgrade path: a dedicated per-tenant vault, then OpenBao (Option B) for cryptographic namespace isolation only if untrusted multi-tenancy demands it. This honors velocity + "don't over-engineer for the horizon" + "don't hand-roll security."
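The "inject server-side" contract can be sketched as follows. This is a minimal illustration using only the standard library, not the backend-core implementation: `build_outbound_request` and `tool_facing_summary` are hypothetical names, and Bearer authentication is an assumption, since each real system has its own auth scheme.

```python
import urllib.request
from typing import Callable

Resolver = Callable[[str, str], str]  # (user, system) -> raw secret


def build_outbound_request(
    resolve: Resolver, user: str, system: str, url: str
) -> urllib.request.Request:
    """Attach the user's credential to an outbound call, server-side only."""
    secret = resolve(user, system)
    request = urllib.request.Request(url)
    # Bearer auth is an assumption; each real system has its own auth scheme.
    request.add_header("Authorization", f"Bearer {secret}")
    return request


def tool_facing_summary(request: urllib.request.Request) -> dict:
    """What may be echoed back to a tool or a log: never the header values."""
    return {"url": request.full_url, "auth": "injected"}


# Usage with an in-memory resolver standing in for the store.
fake = {("nicky", "github"): "ghp_example"}
req = build_outbound_request(
    lambda u, s: fake[(u, s)], "nicky", "github", "https://api.github.com/user"
)
```

The resolver is passed in as a callable, which is what lets the storage backend change without touching the injection path.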
Consequences
- Positive: a real BYO-keys surface; tools never hold raw third-party keys, because backend-core resolves and injects them server-side; rotation + audit become first-class; the abstraction tools can finally call real producers per-user, not just mocks.
- Negative / cost: per-user keys share one flat-namespace vault with operator secrets, so isolation is by name prefix only (no per-user RBAC boundary, no dynamic-secret leasing) until/unless we move to dedicated per-tenant vaults or OpenBao; a backend-core broker layer to build (thin) + an onboarding endpoint.
Implementation sketch (as built)
- No store to deploy: reuse Azure Key Vault (akv01-agentarmy) via DefaultAzureCredential + AZURE_KEYVAULT_URL, the same machinery as app/secrets.py's akv: resolver.
- backend-core /api/v1/credentials/* (app/credentials/): a user registers a {system} key (auth via require_roles); it is stored as the KV secret cred-{user}-{system}; the raw key never returns to the client. GET lists registered system names (never values); DELETE removes.
- store.resolve(user, system) is server-side only: backend-core injects the secret into the outbound call (option A). There is no endpoint that returns a raw secret.
- Audit every register/delete (audit.emit); rotation = a new KV secret version. Live-verified end-to-end against akv01-agentarmy (put/list/resolve/delete).
- The store is backend-agnostic: swapping to a dedicated vault or OpenBao later changes only store.py. Dev input surface (a small console UI / /_dev helper) is a follow-up as we iterate.
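The backend-agnostic store described above might look roughly like this. It is a sketch under stated assumptions, not the contents of store.py: the ADR confirms only `store.resolve(user, system)` and the put/list/resolve/delete verbs, so the `CredentialStore` protocol, the `list_systems` method name, and the `InMemoryStore` test double are hypothetical. The audit callback stands in for audit.emit and deliberately never receives the secret value.

```python
from typing import Callable, Protocol


class CredentialStore(Protocol):
    """Hypothetical shape of store.py; the ADR confirms only resolve(user, system)."""

    def put(self, user: str, system: str, secret: str) -> None: ...
    def list_systems(self, user: str) -> list[str]: ...
    def resolve(self, user: str, system: str) -> str: ...
    def delete(self, user: str, system: str) -> None: ...


class InMemoryStore:
    """Test double; a Key Vault backend would implement the same four methods."""

    def __init__(self, audit: Callable[[str, str, str], None]) -> None:
        self._versions: dict[tuple[str, str], list[str]] = {}
        self._audit = audit  # stands in for audit.emit

    def put(self, user: str, system: str, secret: str) -> None:
        # Re-registering appends a version, mirroring rotation-as-new-version.
        self._versions.setdefault((user, system), []).append(secret)
        self._audit("register", user, system)

    def list_systems(self, user: str) -> list[str]:
        # Names only: values never leave the store through this method.
        return sorted(s for (u, s) in self._versions if u == user)

    def resolve(self, user: str, system: str) -> str:
        # Server-side only; returns the latest version.
        return self._versions[(user, system)][-1]

    def delete(self, user: str, system: str) -> None:
        del self._versions[(user, system)]
        self._audit("delete", user, system)


events: list[tuple[str, str, str]] = []
store: CredentialStore = InMemoryStore(
    lambda action, user, system: events.append((action, user, system))
)
store.put("nicky", "github", "old-key")
store.put("nicky", "github", "new-key")  # rotation
```

Callers that depend only on the protocol are unaffected when the backend changes, which is the property that made the Infisical to Key Vault switch a single-file swap.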
Worked example: BYO graph backend (Neo4j / Aura)
The broker generalizes past SaaS APIs to infrastructure a user brings. The hub default graph DB is ArcadeDB (Platform tier, openCypher) — but a user/spoke may prefer their own Neo4j or a managed Neo4j Aura. That is a brokered system like any other, not a fork of the hub: the hub never runs Neo4j itself, so there is no split-brain with ArcadeDB.
- Register: cred-{user}-neo4j (a NEO4J_PASSWORD for a self-hosted/external instance) or cred-{user}-aura (a NEO4J_AURA_CLIENT_SECRET); the raw value stays server-side per the standard contract.
- Proxy tools: the Docker MCP Toolkit Neo4j connectors (neo4j, neo4j-cypher, neo4j-memory, neo4j-cloud-aura-api) are enabled opt-in, off by default in a dedicated agentarmy-byo-neo4j profile. Two credential paths: the brokered path (this ADR) keeps the raw key server-side in cred-{user}-neo4j and resolves it via backend-core; the local-dev path lets a solo operator set the secret into Docker's local keychain directly. Bridging the broker into a Docker MCP secrets engine is follow-up work. Wire-up + tiering + both flows: tools/docker-mcp-gordon-wireup.md.
- Design-time bridge stays hub-side: neo4j-data-modeling (no DB, no secret) projects a property-graph model to OWL Turtle / Pydantic / Cypher ingest that targets ArcadeDB or a BYO Neo4j alike, and feeds the ontology pipeline (ARC-ADR-030, ARC-ADR-032).
- Guardrail: never run a hub-side Neo4j alongside ArcadeDB for the same data in the same layer; BYO external/Aura, scoped by broker creds, is the intended pattern.
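For the brokered path, the mapping from a registered system to the variable its value represents could be sketched like this. Only the two system names and the two variable names come from the list above; `connector_env` is a hypothetical helper, and how the resulting environment reaches a connector is exactly the follow-up work the ADR leaves open.

```python
from typing import Callable

# System name -> the variable the ADR says each registered value represents.
SECRET_ENV_VAR = {
    "neo4j": "NEO4J_PASSWORD",
    "aura": "NEO4J_AURA_CLIENT_SECRET",
}


def connector_env(
    resolve: Callable[[str, str], str], user: str, system: str
) -> dict[str, str]:
    """Build the server-side environment for one brokered graph connector."""
    if system not in SECRET_ENV_VAR:
        raise KeyError(f"no graph-backend mapping for system {system!r}")
    return {SECRET_ENV_VAR[system]: resolve(user, system)}


# Usage with an in-memory resolver standing in for the broker store.
fake = {("nicky", "neo4j"): "example-password"}
env = connector_env(lambda u, s: fake[(u, s)], "nicky", "neo4j")
```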