Cloud Serving Landscape¶

When you fork AgentArmy and build a spoke (UI layer, API layer, worker, infra layer), the first question is: which cloud, which database, which LLM? This guide makes that choice fast with opinionated defaults, hard limits, and a routing table for the right agent.

TL;DR — Opinionated Starter Stacks¶

Pick one and go. You can always migrate later; the LLM abstraction principle (below) keeps your options open.

Stack	Components	Best for
Zero-ops	GitHub Pages + Vercel Functions + Neon Postgres + Claude API	Solo dev, zero infra management, pay-as-you-go
GCP	Cloud Run + Cloud SQL + Vertex AI (Gemini)	Google-ecosystem teams, per-request billing, strong IAM
Azure	Container Apps + Azure SQL + Azure OpenAI	Enterprise M365/Azure shops, existing EA agreements
AWS	App Runner/Fargate + RDS Aurora + AWS Bedrock	AWS-native teams, compliance-sensitive workloads
GitHub-native	GitHub Pages + GitHub Actions compute	Fully GitHub, no external accounts, documentation-first

Default recommendation for new spokes: Start with the Zero-ops stack. Vercel + Neon costs nothing at small scale, deploys in minutes, and you can move compute to Cloud Run or ECS later with a container swap.

GitHub Pages: Capabilities and Limits¶

GitHub Pages is already wired up in this template (MkDocs auto-deployed via deploy-docs.yml). Here is what it can and cannot do:

What it does well¶

Serves static files: HTML, CSS, JS, images, JSON, PDFs
Supports Jekyll natively or any SSG (Hugo, Astro, MkDocs, Eleventy) via GitHub Actions
Custom domains with automatic HTTPS via Let's Encrypt
Free on public repos; included in GitHub plans for private repos
Ideal for: documentation sites, project portals, OpenAPI spec browsers, marketing pages

Hard limits¶

Limit	Value
Repository size	1 GB
Site size	1 GB
Bandwidth	100 GB/month (soft limit — GitHub may throttle, not block)
Build timeout	10 minutes
Deploys per hour	10

What it cannot do¶

No server-side execution — no Node.js, Python, PHP, or any runtime at request time
No API endpoints — any dynamic behavior must come from an external service called via client-side JS
No server-side secrets — anything in a Pages site is public; never embed API keys
No auth at the edge — anyone can access any URL; use a separate identity provider + SPA auth
No databases — all data access must go through a client-side API call to an external backend

Rule of thumb: If a request needs to read a database or call an LLM, it does not belong on GitHub Pages — it belongs in a Vercel Function, Cloud Run service, or similar compute layer.

Decision Matrix: Static / Frontend Hosting¶

Provider	Free tier	Custom domain	Build included	Server-side	Notes
GitHub Pages	Yes (public repos)	Yes (CNAME)	Via Actions	No	Best for docs and project portals
Vercel	Yes (Hobby)	Yes	Yes (Vercel CI)	Yes (Functions)	Best for full-stack Next.js/SvelteKit
Netlify	Yes (Starter)	Yes	Yes	Yes (Functions)	JAMstack, form handling, identity
Azure SWA	Yes (Free tier)	Yes	Yes	Yes (Azure Functions)	Best within existing Azure EA
Cloudflare Pages	Yes	Yes	Yes	Yes (Workers)	Global edge, Workers KV, R2

When to choose Vercel over GitHub Pages for a spoke frontend: any time the spoke needs server-side rendering, API routes, or LLM streaming — even if it's "mostly static."

Decision Matrix: Compute / API Hosting¶

Provider	Model	Cold start	Max duration	Best for
Vercel Functions	Serverless + Edge	~50ms (Edge), ~300ms (Node)	300s (Pro), 30s (Edge)	Full-stack apps co-located with frontend
Google Cloud Run	Container, autoscale-to-zero	~500ms	3600s	Any container, long-running tasks, GCP IAM
Azure Container Apps	Container, KEDA autoscale	~1s	Unlimited	Azure ecosystem, Dapr sidecar, KEDA scaling
AWS App Runner	Container, managed	~1s	Unlimited	AWS-native, simpler than Fargate
AWS Fargate	Container, task-based	~30s	Unlimited	AWS-native, fine-grained task control
Railway	Container, always-on option	Minimal	Unlimited	Solo dev, zero-config Dockerfile deploys
Fly.io	Container, anycast	Minimal	Unlimited	Multi-region, persistent volumes, ops-aware teams

Recommendation: Cloud Run is the most flexible managed option — any container, true scale-to-zero, per-request billing. Vercel Functions wins when the spoke is a Next.js / SvelteKit app. Railway and Fly.io are great for teams that want Git-push deploys without a full cloud account setup.

Decision Matrix: Managed Databases¶

Service	Engine	Free tier	Serverless	Best for
Supabase	Postgres	Yes (500 MB)	Yes (pause on inactivity)	Full-stack: Auth + Storage + Realtime + Postgres in one
Neon	Postgres	Yes (0.5 CU)	Yes	Vercel + Neon canonical pair; branch-per-PR databases
PlanetScale	MySQL (Vitess)	No (free tier paused)	Yes	High-scale MySQL, schema branching, no foreign keys
Firestore	NoSQL document	Yes (Spark plan)	Yes	GCP-native, mobile/serverless, event-driven
DynamoDB	NoSQL KV + doc	Free tier	On-demand billing	AWS-native, massive scale, single-digit ms latency
Upstash Redis	Redis	Yes (10k cmds/day)	Yes	Cache, sessions, rate limiting alongside a primary DB
Turso (libSQL)	SQLite (distributed)	Yes	Yes	Edge databases, extremely low-latency reads globally

Recommendation for most spokes: Neon (with Vercel) or Supabase (standalone). Both are Postgres, serverless, and have generous free tiers. Add Upstash Redis for session storage or rate limiting if needed.

Decision Matrix: LLM Providers¶

AgentArmy uses Claude for orchestration. Spoke applications may use any provider — abstract the client so you can swap without code changes.

Provider	Models	Pricing style	Strengths	Best for
Anthropic (Claude API)	Haiku, Sonnet, Opus	Per token (input/output)	Reasoning, code, long context	Default for AgentArmy-built agents
OpenAI	GPT-4o, GPT-4o-mini	Per token	Ecosystem breadth, vision, function calling	Existing OpenAI integrations
Vertex AI (GCP)	Gemini, Claude via Vertex	Per token	Audit logging, VPC Service Controls, GCP IAM	GCP-native compliance workloads
AWS Bedrock	Claude, Llama, Titan	Per token	PrivateLink, SCPs, AWS compliance	AWS-native, HIPAA/FedRAMP workloads
Ollama (self-hosted)	Llama, Mistral, Qwen, Phi	Compute cost only	Air-gapped, cost ceiling, no data egress	Privacy-sensitive, on-prem, cost-capped
Groq / Together.ai	Llama, Mixtral, Gemma	Per token (low cost)	Very high throughput, low latency	Budget inference, high-volume PoCs

Cost guidance: Haiku-class models (Claude Haiku, GPT-4o-mini) are 10–20× cheaper than flagship models and handle most classification, extraction, and light reasoning tasks. Reserve Sonnet/Opus/GPT-4o for tasks that measurably need them.

The LLM Abstraction Principle¶

The AgentArmy roadmap (Play 5) says: consume, abstract provider, avoid lock-in, stay multi-vendor.

In practice this means:

# BAD — hardcoded provider
from anthropic import Anthropic
client = Anthropic()

# GOOD — parameterized via env var + abstraction layer
import litellm  # or Vercel AI SDK, LangChain, custom wrapper
response = litellm.completion(
    model=os.environ["LLM_MODEL"],  # e.g. "claude-3-5-sonnet-latest" or "gpt-4o"
    messages=[...]
)

Options for the abstraction layer: - Vercel AI SDK (ai npm) — provider-agnostic, built for streaming, ideal for Vercel spokes - LiteLLM (Python) — unified OpenAI-compatible API across 100+ providers - LangChain — broader orchestration, more abstraction overhead - Custom env-var wrapper — for simple spokes that call one model at a time

Set LLM_PROVIDER and LLM_MODEL as environment variables per spoke environment. The model changes in config, not code.

Which Agent to Use¶

Scenario	Agent
GCP infrastructure (Cloud Run, Cloud SQL, GKE, IAM, Vertex AI, Cloud Build)	`gcp-infra-engineer`
AWS infrastructure (Fargate, RDS, Bedrock, EKS, CDK/CloudFormation, IAM/SCP)	`aws-infra-engineer`
Azure infrastructure (Container Apps, Bicep, Entra ID, Azure OpenAI)	`azure-infra-engineer`
Vercel platform (Functions, Postgres/KV/Blob, edge middleware, monorepo, AI SDK)	`vercel-engineer`
Multi-cloud strategy, cloud provider selection, landing zone design	`cloud-architect`
LLM system design, RAG, multi-model orchestration, inference serving	`llm-architect`
Cloud cost governance, unit economics, RI/Savings Plan commitments	`finops-engineer`
IaC module design, Terraform state management, Terragrunt orchestration	`terraform-engineer`

Coming Soon (Backlog)¶

These features are tracked as GitHub Issues on the project board:

IaC starter templates — Terraform modules for Cloud Run + Cloud SQL, Fargate + RDS, Container Apps
Spoke stack builder — a CLI/Actions workflow that scaffolds a new spoke with provider-specific IaC and pipelines
CI/CD workflow templates — reusable deploy-gcp.yml, deploy-aws.yml, deploy-vercel.yml for spoke repos
LLM provider abstraction layer — shared wrapper library for RT2 Play 5
Cloud-native CI/CD research spike — GCP Cloud Build/Deploy vs AWS CodePipeline vs GitHub Actions evaluation

Azure Dev Container Lane¶

For Azure-first spoke development, use Azure Container Apps Dev Deploy. It pairs a trusted local PC or self-hosted runner with Azure Container Registry and Azure Container Apps Dev, while keeping production promotion as a separate environment-gated workflow.

For multi-cloud lifecycle routing, use Lifecycle Promotion Management. It defines the local Docker gate and target-adapter pattern that can route a spoke to Azure Container Apps, GCP Cloud Run, Vertex AI Agent Engine, or future runtime targets.