KYAS turns Indonesian cultural subjects into finished, fully-sourced articles for PDBI — automatically. A chain of specialised stages assembles each article, and any stage can reject it, so only verified work is kept.
| Component | Status | Notes |
|---|---|---|
| Production chain (sources → vault) | ✓ running | The full chain runs end to end in staging. |
| Evidence & Grounding | ✓ AI, live | Both call AI (gpt-4o-mini) — part of the 18 calls/article. |
| Draft writing | ✓ AI, live | gpt-5.4-mini for prose, gpt-4o-mini for the plan. |
| Stability | ✓ hardened | Recovery sweepers + queue fixes; ~70 changes in 8 days. |
| Sourcing | ⚠ Wikipedia only | Free & reliable today; broader coverage needs paid search. |
| Publish to PDBI | ⏸ gated | Dry-run, awaiting canary approval before anything goes public. |
KYAS spans more than ten components and depends on external paid, rate-limited services, with real spend at two points: AI generation and paid search (publishing is just an internal write). It is genuinely complex to operate, and because money flows through it, it carries real operational risk.
Each stage transforms the work and can reject it. Gold = AI step, clay = quality gate; the small labels are the artifact passed to the next stage. Only a draft that clears every gate is sealed — and only then is it ready to publish.
Three stages call AI: Evidence (+4), Draft (+10) and Grounding (+4). That totals 18 calls, or about $0.04, by the time an article is sealed and ready to publish.
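The gate-and-count behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not the KYAS implementation: the names `Article`, `ai_stage` and `run_chain` are hypothetical, the stage order (evidence, draft, grounding) is assumed from the order the stages are listed in, and the non-AI quality gates are left out for brevity. Only the per-stage call counts come from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# AI calls per stage, as quoted in the text (4 + 10 + 4 = 18).
AI_CALLS = {"evidence": 4, "draft": 10, "grounding": 4}


@dataclass
class Article:
    """Hypothetical work item passed from stage to stage."""
    subject: str
    ai_calls: int = 0
    rejected_by: Optional[str] = None
    sealed: bool = False


# A stage returns True to pass the article on, False to reject it.
Stage = Callable[[Article], bool]


def ai_stage(name: str, accept: bool = True) -> Stage:
    """Build a stand-in AI stage that records its calls, then accepts or rejects."""
    def run(article: Article) -> bool:
        article.ai_calls += AI_CALLS[name]
        return accept
    run.__name__ = name
    return run


def run_chain(article: Article, stages: List[Stage]) -> Article:
    """Run stages in order; stop at the first rejection, seal only if all pass."""
    for stage in stages:
        if not stage(article):
            article.rejected_by = stage.__name__
            return article
    article.sealed = True
    return article


passed = run_chain(
    Article("batik"),
    [ai_stage("evidence"), ai_stage("draft"), ai_stage("grounding")],
)
failed = run_chain(
    Article("wayang"),
    [ai_stage("evidence"), ai_stage("draft"), ai_stage("grounding", accept=False)],
)
print(passed.sealed, passed.ai_calls)     # True 18
print(failed.sealed, failed.rejected_by)  # False grounding
```

Note that the rejected article still consumed all 18 calls before grounding turned it down, which is why draft acceptance affects cost per accepted article.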
Writing is the easy half. The hard half is feeding the line enough good sources: every claim must be backed by a real, quotable source, and grounding rejects the draft if it isn't. At scale, finding and fetching those sources — cheaply — is where the system starves.
To reach 1,000,000 articles, both must be funded: ~$40k of AI credits (≈$6k/mo over the window to year-end) and the infrastructure and search to feed the line. Today we run on ~$150/mo of manual top-ups, about 40× below the AI line alone.
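The AI budget figures follow from simple arithmetic, sketched here in Python. The inputs are the numbers quoted in this report ($0.04 per article, a 197-day window, ~$150/mo of top-ups); the average month length of 30.44 days is an assumption introduced only for this illustration.

```python
TARGET_ARTICLES = 1_000_000
COST_PER_ARTICLE_USD = 0.04   # measured per-article cost quoted in the report
WINDOW_DAYS = 197             # window to year-end quoted in the report
CURRENT_MONTHLY_USD = 150     # today's manual top-ups
DAYS_PER_MONTH = 30.44        # assumption: average calendar month

total_ai_usd = TARGET_ARTICLES * COST_PER_ARTICLE_USD
window_months = WINDOW_DAYS / DAYS_PER_MONTH
monthly_ai_usd = total_ai_usd / window_months
funding_gap = monthly_ai_usd / CURRENT_MONTHLY_USD

print(f"total AI spend: ${total_ai_usd:,.0f}")      # about $40,000
print(f"per month:      ${monthly_ai_usd:,.0f}")    # about $6.2k
print(f"gap vs today:   {funding_gap:.0f}x")        # about 41x
```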
| Provider | Approx. cost (USD per 1k searches) | How it helps |
|---|---|---|
| Brave Search API | $3–5 | Independent, fresh index — broad coverage. |
| Google (SerpAPI / CSE) | $5–10 | Best relevance for niche Indonesian subjects. |
| Tavily | $5–8 | Returns clean content — skips the slow fetch step. |
| Exa (semantic) | $3–5 | Finds sources that actually support a claim → better grounding. |
~10–30 searches per accepted article → at 1M scale, ~$40k–$150k in search alone. SearXNG is free, but stale & blocked.
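The search estimate can be reproduced with a small helper. This is a sketch under stated assumptions: the function name `search_cost_usd` is hypothetical, and the blended price of $4–5 per 1k searches is inferred here because it reproduces the ~$40k–$150k range; the report does not state which price blend was used.

```python
def search_cost_usd(articles: int, searches_per_article: int, usd_per_1k: float) -> float:
    """Total paid-search spend for a given volume and price per 1,000 searches."""
    return articles * searches_per_article / 1000 * usd_per_1k


ARTICLES = 1_000_000

# Assumed blended price of $4-5 per 1k searches (inferred, not stated in the report).
quoted_low = search_cost_usd(ARTICLES, 10, 4)
quoted_high = search_cost_usd(ARTICLES, 30, 5)

# Widest envelope using the cheapest and most expensive prices in the table.
envelope_low = search_cost_usd(ARTICLES, 10, 3)
envelope_high = search_cost_usd(ARTICLES, 30, 10)

print(f"quoted range:  ${quoted_low:,.0f} to ${quoted_high:,.0f}")
print(f"full envelope: ${envelope_low:,.0f} to ${envelope_high:,.0f}")
```

With the full price spread in the table the envelope widens to roughly $30k–$300k, so the choice of provider matters as much as the number of searches per article.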
| Item | Approx. cost (USD per month) |
|---|---|
| Worker compute · 2–4 vCPU / 4–8 GB | $40–80 / worker |
| PostgreSQL (+ read replica) | $150–400 |
| SeaweedFS / object storage | $30–80 |
| NATS + bandwidth / egress | $40–80 |
~4 workers ≈ $500/mo · ~10–12 workers ≈ $1.2k/mo — before search.
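As a rough cross-check, the table rows can be summed per worker count. The sketch below assumes that PostgreSQL, object storage and NATS/egress are shared costs that do not grow with the number of workers; that split and the name `infra_range` are assumptions made for illustration, not something the report states.

```python
from typing import Tuple

# (low, high) monthly USD ranges taken from the table above.
WORKER_USD = (40, 80)
SHARED_USD = {
    "postgresql": (150, 400),
    "object_storage": (30, 80),
    "nats_and_egress": (40, 80),
}


def infra_range(workers: int) -> Tuple[int, int]:
    """Monthly infrastructure cost range, assuming shared items stay fixed."""
    low = workers * WORKER_USD[0] + sum(lo for lo, _ in SHARED_USD.values())
    high = workers * WORKER_USD[1] + sum(hi for _, hi in SHARED_USD.values())
    return low, high


for n in (4, 10, 12):
    print(n, infra_range(n))
# 4 (380, 880)
# 10 (620, 1360)
# 12 (700, 1520)
```

The quoted ≈$500/mo for ~4 workers and ≈$1.2k/mo for ~10–12 workers both fall inside these ranges.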
Infrastructure is the predictable cost (~$0.5–1.2k/mo). Paid search is the wildcard — at full scale it can match or exceed the entire AI bill, which is exactly why staying on free Wikipedia and widening coverage cheaply matters so much.
Past a point, pushing volume harder yields repetitive or weakly-supported articles. The cheapest real lever is improving source coverage & draft acceptance — not simply spending more on generation.
Cost, funnel and throughput figures are measured from the KYAS staging environment, June 2026; they are not estimates. Unit cost comes from logged and actual OpenRouter spend; the funnel and acceptance rates from a rolling 7-day window; throughput from daily vault output; queue depth from live database counts. Per-article cost ($0.04) uses real provider charges, ~1.8× the internally-logged figure. The ~18 AI calls per accepted article span evidence extraction, drafting, and grounding. Window to year-end: 197 days from 17 Jun 2026; the 5,076/day figure is 1,000,000 ÷ 197. The exception is the infrastructure and paid-search figures, which are planning estimates and marked as such. Publishing remains gated pending canary approval; no machine-written article has been released to PDBI.
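The window and daily-rate figures can be checked with Python's standard `datetime` module. The sketch assumes "year-end" means 31 Dec 2026.

```python
from datetime import date

start = date(2026, 6, 17)
year_end = date(2026, 12, 31)  # assumption: "year-end" is 31 Dec 2026

window_days = (year_end - start).days
daily_target = 1_000_000 / window_days

print(window_days, round(daily_target))  # 197 5076
```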