ARC-ADR-037 — BYO-Credentials: a Secrets Broker for Abstracted Systems

ID: ARC-ADR-037
Status: Accepted
Date: 2026-05-29
Deciders: Hub owner (Nicky Clarke); accepted 2026-05-29 via HITL selector (chose Option A, Infisical CE) → revised same day to Option C (Azure Key Vault) on self-host friction
Supersedes: none (enables ARC-ADR-036 at multi-system / multi-user scale)
Tags: secrets, credentials, byo-keys, secrets-broker, anti-corruption-layer, security, multi-tenant, openbao, infisical, azure-key-vault

Context and Problem Statement

Abstracting a real third-party system (GitHub, Jira, Linear, Stripe, …) implies the user wants it abstracted and has granted the appropriate access. Our MCP tools — generated against the canonical APIs (ARC-ADR-036) and proxied through backend-core — can only call those real producers with that system's credentials. So the platform needs a place for users to register their keys/secrets per abstracted system, under a firm principle: raw keys stay server-side — a broker issues scoped, short-lived credentials and the tools never hold the raw third-party key.

Today this is ad-hoc and operator-only: secrets live in Azure Key Vault akv01-agentarmy, resolved at runtime by akv: reference (e.g. akv:GithubPAT). That works for a single operator but offers no per-user / per-system onboarding, scoping, rotation, or audit as a first-class surface.
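A minimal sketch of how an akv: reference such as akv:GithubPAT might be resolved. This is not the code in app/secrets.py; the function names resolve_reference and make_key_vault_fetcher are hypothetical. The Azure wiring uses the real SecretClient and DefaultAzureCredential classes and is imported lazily, so the parsing logic can be exercised without the azure-* packages.

```python
from typing import Callable

AKV_PREFIX = "akv:"


def resolve_reference(value: str, fetch_secret: Callable[[str], str]) -> str:
    """Return the secret behind an 'akv:<name>' reference, or the value unchanged."""
    if not value.startswith(AKV_PREFIX):
        return value  # plain config value, nothing to resolve
    name = value[len(AKV_PREFIX):]
    if not name:
        raise ValueError("empty akv: reference")
    return fetch_secret(name)


def make_key_vault_fetcher(vault_url: str) -> Callable[[str], str]:
    """Build a fetcher backed by Azure Key Vault.

    Requires the azure-identity and azure-keyvault-secrets packages.
    """
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

    def fetch(name: str) -> str:
        secret_value = client.get_secret(name).value
        if secret_value is None:
            raise KeyError(name)
        return secret_value

    return fetch
```

In a deployed process the fetcher would presumably be built once from the AZURE_KEYVAULT_URL environment variable and reused; in tests any callable that maps a name to a value can stand in for it.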

Threat-model note (current vs. horizon): the fleet today is single-operator and trusted (private, no untrusted collaborators). Hard multi-tenant isolation between untrusted users is a design horizon, not today's threat — which argues for adopting fast now with a documented upgrade path, rather than paying heavy multi-tenant ops cost prematurely.

Decision Drivers

  • Keys stay server-side — the broker issues scoped/short-lived credentials; tools proxy and never hold raw keys (extends the pattern the abstraction MCP already uses).
  • Fast to adopt — the capability/tool cadence is aggressive ([[serve-capabilities-via-mcp]]); the credential surface must not be a multi-day yak-shave.
  • Managed + supported + indemnified — a vendor-run, SLA-backed service with Microsoft IP indemnification is preferred over self-hosting OSS and owning its operational + legal risk; it must fit the existing Azure footprint (Entra, ACA, Key Vault) and be evolvable there. (Self-hostable OSS was the initial lean; managed Azure won — see Decision Outcome.)
  • Onboarding UX — a real surface where a user registers a key for a system.
  • Don't hand-roll security — prefer a battle-tested store over custom crypto/rotation/audit code (secure-by-default).
  • Multi-tenant isolation — a driver, but weighted for the horizon, not the solo/trusted present.

Considered Options

Option A — Infisical Community Edition (MIT) (initially selected, then revised — see Decision Outcome)

Self-hosted secrets platform: single Docker stack (Postgres + Redis + app), first-class Python SDK (infisicalsdk 1.x), REST API ideal for a thin registration endpoint, Org→Project→Environment→Path hierarchy that maps to per-user/per-system scoping, RBAC + audit in CE, Azure auth method.

  • Pros: fastest to a working broker (an afternoon); MIT core; clean fit with the fleet's Docker/ACA pattern; the onboarding surface is a thin wrapper over its API so users never touch the vault; active project (weekly releases).
  • Cons: tenant isolation is RBAC/path-based, not cryptographic-namespace; dynamic-secret engines are fewer than Vault/OpenBao (third-party API-key rotation is partly hand-coded); SSO (SAML/SCIM) is Enterprise-tier.

Option B — OpenBao v2.5.4+ (MPL 2.0, Linux Foundation)

The truly-OSS fork of Vault. Hard multi-tenant namespaces, 50+ dynamic-secret engines, leased credentials with automatic TTL/revocation, mature audit, JWT/OIDC auth (exchange an Entra token for a scoped OpenBao token), Azure Key Vault as backing KMS.

  • Pros: the "do it right" multi-tenant answer; cryptographic namespace isolation; dynamic secrets + leasing out of the box; no brokering logic to write; clean license (no Vault BSL trap).
  • Cons: operational weight (HA Raft clustering, an unseal ceremony, namespace administration; ~1–2 days setup + runbooks); hvac is Vault-branded (works, but won't track OpenBao-specific features); must run ≥2.5.4 to avoid the May-2026 cross-namespace CVEs.

Option C — Azure Key Vault + a thin broker ← chosen

Store per-user keys in Azure Key Vault (cred-{user}-{system} naming); a thin backend-core broker authenticates users and injects the resolved secret server-side. Reuses the box's DefaultAzureCredential + AZURE_KEYVAULT_URL (the akv: resolver machinery).

  • Pros: zero new infra / no container / no bootstrap; native to the existing Azure + Entra footprint; fully managed, SLA-backed, with Microsoft IP indemnification (lean on MS rather than own self-host ops + OSS legal exposure); reuses the existing azure-* deps + auth so no new dependency and no extra runtime creds; clear evolution path on Azure (dedicated per-tenant vaults → RBAC → Managed HSM).
  • Cons: flat namespace + one shared vault (per-user keys are name-prefixed, co-mingled with operator secrets), no per-user RBAC boundary, no dynamic-secret leasing. The earlier "you own all the broker code" worry proved minor: the broker was built backend-agnostic, so only store.py is KV-specific. Isolation is the real residual, accepted for the solo/trusted stage.
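Because every per-user key lands in one flat namespace, the cred-{user}-{system} name is the only thing separating entries, so it is worth validating. The sketch below is illustrative, not the as-built store.py. It relies on the Key Vault rule that secret names are 1 to 127 characters of alphanumerics and dashes. It also assumes (our choice, not stated in the ADR) that system identifiers contain no dashes, so a name can be split back unambiguously even when the user id contains dashes.

```python
import re
from typing import Tuple

PREFIX = "cred-"
MAX_NAME_LEN = 127  # Key Vault secret-name length limit

# Assumption: user ids may contain dashes (e.g. GUID-style ids), systems may not.
_USER_RE = re.compile(r"[0-9a-z]+(?:-[0-9a-z]+)*")
_SYSTEM_RE = re.compile(r"[0-9a-z]+")


def secret_name(user: str, system: str) -> str:
    """Build the Key Vault secret name for a user's key to one system."""
    user, system = user.strip().lower(), system.strip().lower()
    if not _USER_RE.fullmatch(user):
        raise ValueError(f"invalid user id: {user!r}")
    if not _SYSTEM_RE.fullmatch(system):
        raise ValueError(f"invalid system id: {system!r}")
    name = f"{PREFIX}{user}-{system}"
    if len(name) > MAX_NAME_LEN:
        raise ValueError("secret name exceeds the Key Vault length limit")
    return name


def split_name(name: str) -> Tuple[str, str]:
    """Invert secret_name: the system is everything after the last dash."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not a per-user credential: {name!r}")
    user, _, system = name[len(PREFIX):].rpartition("-")
    if not user or not system:
        raise ValueError(f"malformed credential name: {name!r}")
    return user, system
```

Without a rule like the no-dash system id, cred-a-b-c could mean user a with system b-c or user a-b with system c, which matters once listing a user's systems is implemented by scanning names in the shared vault.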

Ruled out

  • HashiCorp Vault — BSL 1.1 since 2023 (IBM-owned); the "no competing product" clause is a legal trap if credential-brokering ever becomes a sold feature. OpenBao removes this.
  • Doppler — SaaS-only, no self-host, not a multi-tenant BYO-keys broker.
  • SOPS+age, Bitwarden/Vaultwarden — static-secret stores, no runtime lease/scoped-credential issuance.
  • Teleport/Pomerium — infra-access certs / identity proxy, not third-party key brokering.

Decision Outcome

Chosen: Option C — Azure Key Vault + a thin broker (revised 2026-05-29). Option A (Infisical CE) was selected first via the HITL selector, but its self-host reality — a Postgres+Redis+app container plus an admin/project/machine-identity/client-secret bootstrap, compounded by a flaky local Docker engine — was real operational friction. We switched to Azure Key Vault: the broker API and the "raw keys stay server-side, inject server-side" contract are identical (only the storage backend changed), and it reuses the box's existing DefaultAzureCredential + AZURE_KEYVAULT_URL (the akv: resolver machinery) — no container, no new dependency, no extra credentials. Per-user keys are name-prefixed secrets (cred-{user}-{system}) in akv01-agentarmy.

Option C's original caveat ("you build all the broker code") proved minor here because the broker was already built backend-agnostic; only store.py swapped. We lean on Microsoft for the hard part — a managed, SLA-backed, IP-indemnified secrets store — rather than owning self-host operations + OSS legal/operational risk, and we can evolve on Azure (dedicated per-tenant vaults → RBAC → Managed HSM). Tradeoffs accepted for the solo/trusted stage ([[threat-model-no-forks]]): flat namespace + one shared vault (co-mingled with operator secrets), no per-user RBAC boundary. Documented upgrade path: a dedicated per-tenant vault, then OpenBao (Option B) for cryptographic namespace isolation only if untrusted multi-tenancy demands it. This honors velocity + "don't over-engineer for the horizon" + "don't hand-roll security."

Consequences

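  • Positive: a real BYO-keys surface; tools never hold raw third-party keys, because backend-core resolves each key and injects it server-side (Key Vault itself issues no leased, short-lived credentials; see the Option C cons); register/delete are audited and rotation is a new Key Vault secret version; the abstraction tools can finally call real producers per-user, not just mocks.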
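  • Negative / cost: no new platform-tier service to run (that cost applied only to the superseded Option A, Infisical: Postgres + Redis + app), but per-user keys share one flat-namespace vault with operator secrets, with no per-user RBAC boundary and no dynamic-secret leasing until/unless we move to dedicated per-tenant vaults or OpenBao; a thin backend-core broker layer + an onboarding endpoint to build and maintain.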

Implementation sketch (as built)

  1. No store to deploy — reuse Azure Key Vault (akv01-agentarmy) via DefaultAzureCredential + AZURE_KEYVAULT_URL, the same machinery as app/secrets.py's akv: resolver.
  2. backend-core /api/v1/credentials/* (app/credentials/): a user registers a {system} key (auth via require_roles); it is stored as the KV secret cred-{user}-{system}; the raw key never returns to the client. GET lists registered system names (never values); DELETE removes.
  3. store.resolve(user, system) is server-side only — backend-core injects the secret into the outbound call (option A). There is no endpoint that returns a raw secret.
  4. Audit every register/delete (audit.emit); rotation = a new KV secret version. Live-verified end-to-end against akv01-agentarmy (put/list/resolve/delete).
  5. The store is backend-agnostic — swapping to a dedicated vault or OpenBao later changes only store.py. Dev input surface (a small console UI / /_dev helper) is a follow-up as we iterate.
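The steps above can be sketched as a small backend-agnostic interface plus a broker that audits writes and injects the secret server-side. This is a sketch under stated assumptions, not the as-built app/credentials/ code: CredentialStore, InMemoryStore, Broker, and call_upstream are hypothetical names, the audit callable stands in for audit.emit, and the Bearer header is only one possible injection scheme (real systems differ).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol, Tuple, TypeVar

T = TypeVar("T")


class CredentialStore(Protocol):
    """What the broker needs from any backend (Key Vault, OpenBao, ...)."""

    def put(self, user: str, system: str, secret: str) -> None: ...
    def list_systems(self, user: str) -> List[str]: ...
    def resolve(self, user: str, system: str) -> str: ...
    def delete(self, user: str, system: str) -> None: ...


class InMemoryStore:
    """Stand-in backend for tests; a Key Vault backend would replace only this."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], str] = {}

    def put(self, user: str, system: str, secret: str) -> None:
        self._data[(user, system)] = secret

    def list_systems(self, user: str) -> List[str]:
        return sorted(s for (u, s) in self._data if u == user)

    def resolve(self, user: str, system: str) -> str:
        return self._data[(user, system)]

    def delete(self, user: str, system: str) -> None:
        self._data.pop((user, system), None)


@dataclass
class Broker:
    store: CredentialStore
    audit: Callable[[str, str, str], None]  # (action, user, system)

    def register(self, user: str, system: str, secret: str) -> Dict[str, str]:
        self.store.put(user, system, secret)
        self.audit("register", user, system)
        return {"system": system}  # the raw key is never echoed back

    def list_systems(self, user: str) -> List[str]:
        return self.store.list_systems(user)  # names only, never values

    def delete(self, user: str, system: str) -> None:
        self.store.delete(user, system)
        self.audit("delete", user, system)

    def call_upstream(
        self, user: str, system: str, send: Callable[[Dict[str, str]], T]
    ) -> T:
        # Server-side only: the secret goes into the outbound call, not to the caller.
        token = self.store.resolve(user, system)
        return send({"Authorization": f"Bearer {token}"})
```

Keeping resolve behind call_upstream mirrors the "no endpoint returns a raw secret" rule: the only thing that ever sees the key is the function that performs the outbound request.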

Worked example — BYO graph backend (Neo4j / Aura)

The broker generalizes past SaaS APIs to infrastructure a user brings. The hub default graph DB is ArcadeDB (Platform tier, openCypher) — but a user/spoke may prefer their own Neo4j or a managed Neo4j Aura. That is a brokered system like any other, not a fork of the hub: the hub never runs Neo4j itself, so there is no split-brain with ArcadeDB.

  • Register: cred-{user}-neo4j (a NEO4J_PASSWORD for a self-hosted/external instance) or cred-{user}-aura (a NEO4J_AURA_CLIENT_SECRET) — raw value stays server-side per the standard contract.
  • Proxy tools: the Docker MCP Toolkit Neo4j connectors (neo4j, neo4j-cypher, neo4j-memory, neo4j-cloud-aura-api) are enabled opt-in, off by default in a dedicated agentarmy-byo-neo4j profile. Two credential paths: the brokered path (this ADR) keeps the raw key server-side in cred-{user}-neo4j and resolves it via backend-core; the local-dev path lets a solo operator set the secret into Docker's local keychain directly. Bridging the broker into a Docker MCP secrets engine is follow-up work. Wire-up + tiering + both flows: tools/docker-mcp-gordon-wireup.md.
  • Design-time bridge stays hub-side: neo4j-data-modeling (no DB, no secret) projects a property-graph model to OWL Turtle / Pydantic / Cypher ingest that targets ArcadeDB or a BYO Neo4j alike — feeds the ontology pipeline (ARC-ADR-030, ARC-ADR-032).
  • Guardrail: never run a hub-side Neo4j alongside ArcadeDB for the same data in the same layer; BYO external/Aura, scoped by broker creds, is the intended pattern.
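As an illustration of the brokered path for a BYO Neo4j, the sketch below resolves cred-{user}-neo4j server-side and hands it straight to the driver. The helper names are hypothetical, the resolve callable stands in for store.resolve(user, system), and the database username default of "neo4j" is an assumption about the user's instance. GraphDatabase.driver with an auth tuple is the official Python driver's API; it is imported lazily so the rest runs without the neo4j package.

```python
from typing import Callable, Tuple

Resolver = Callable[[str, str], str]  # (user, system) -> raw secret, server-side only


def neo4j_auth_for(user: str, resolve: Resolver, db_user: str = "neo4j") -> Tuple[str, str]:
    """Build the (username, password) pair from the brokered credential."""
    password = resolve(user, "neo4j")
    return (db_user, password)


def open_byo_driver(uri: str, user: str, resolve: Resolver):
    """Open a driver to the user's own Neo4j; the hub never hosts the database."""
    from neo4j import GraphDatabase  # requires the neo4j package

    return GraphDatabase.driver(uri, auth=neo4j_auth_for(user, resolve))
```

The password exists only inside the backend process for the lifetime of the driver; a proxy tool would receive query results, not the credential.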