Internal review

KYAS

An automated factory that writes Indonesian cultural articles for PDBI — status, architecture, and the road to scale.

Prepared by Rizky Azmi Swandy · Basis: measured staging data · Jun 2026

01 What it is

A production line for cultural articles

KYAS turns Indonesian cultural subjects into finished, fully-sourced articles for PDBI — automatically. A chain of specialised stages assembles each article, and any stage can reject it, so only verified work is kept.

Input

58k subjects

Indonesian cultural entries in PDBI waiting to be written.

Engine

11 staged factories

Find sources → extract evidence → draft → verify → seal.

Output

Vault articles

Immutable, fully-cited, certified records.

Goal

1,000,000

The target volume for the public database.

02 Where we are

The factory is built and running

Component	Status	Notes
Production chain (sources → vault)	✓ running	The full chain runs end to end in staging.
Evidence & Grounding	✓ AI, live	Both call AI (gpt-4o-mini) — part of the 18 calls/article.
Draft writing	✓ AI, live	gpt-5.4-mini for prose, gpt-4o-mini for the plan.
Stability	✓ hardened	Recovery sweepers + queue fixes; ~70 changes in 8 days.
Sourcing	⚠ Wikipedia only	Free & reliable today; broader coverage needs paid search.
Publish to PDBI	⏸ gated	Dry-run, awaiting canary approval before anything goes public.

03 Architecture

A real distributed system, with money in the loop

More than ten components plus external paid, rate-limited services, with real spend at two points: AI generation and paid search (publishing is only an internal write). That makes the system genuinely complex to operate, and because money flows through it, it carries real operational risk.

04 Pipeline detail · source to accepted draft

From a found source to a publish-ready draft

Each stage transforms the work and can reject it. Gold = AI step, clay = quality gate; the small labels are the artifact passed to the next stage. Only a draft that clears every gate is sealed — and only then is it ready to publish.
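The reject-anywhere chain can be sketched as a loop over stages that stops at the first rejection. This is a minimal illustration under stated assumptions, not KYAS code: the `Rejected` type, `run_chain`, the four toy stages and the example URL are all hypothetical, and the real stages fetch pages and call AI instead of returning fixed values.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Rejected:
    """Hypothetical marker a stage returns when it refuses the article."""
    stage: str
    reason: str


Artifact = dict  # whatever one stage hands to the next (sources, evidence, draft, ...)
StageFn = Callable[[Artifact], Union[Artifact, Rejected]]


def run_chain(subject: str, stages: list[tuple[str, StageFn]]) -> Union[Artifact, Rejected]:
    """Run one subject through the stages in order; the first rejection ends it."""
    artifact: Artifact = {"subject": subject}
    for _name, stage in stages:
        result = stage(artifact)
        if isinstance(result, Rejected):
            return result  # later stages never run and nothing is sealed
        artifact = result
    return {**artifact, "sealed": True}  # cleared every gate


# Toy stages with fixed outputs (hypothetical stand-ins for the real factories).
def find_sources(a: Artifact) -> Union[Artifact, Rejected]:
    return {**a, "sources": ["https://example.org/batik"]}


def extract_evidence(a: Artifact) -> Union[Artifact, Rejected]:
    return {**a, "evidence": ["Batik is a wax-resist dyeing technique."]}


def draft(a: Artifact) -> Union[Artifact, Rejected]:
    return {**a, "draft": ["Batik is a wax-resist dyeing technique."]}


def verify(a: Artifact) -> Union[Artifact, Rejected]:
    # Quality gate: every drafted sentence must appear in the extracted evidence.
    missing = [s for s in a["draft"] if s not in a["evidence"]]
    return Rejected("verify", f"unsupported: {missing}") if missing else a


STAGES = [
    ("find_sources", find_sources),
    ("extract_evidence", extract_evidence),
    ("draft", draft),
    ("verify", verify),
]

print(run_chain("Batik", STAGES))
```

Returning a value instead of raising keeps the rejection reason attached to the stage that produced it, which is the information a funnel report needs.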

05 The flow · one article, end to end

Watch the 18 AI calls add up

Three stages call AI — Evidence (+4), Draft (+10), Grounding (+4) — totalling 18 calls ≈ $0.04 by the time the article is sealed and published.
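The 18-call tally can be reproduced from the deck's own figures: calls per stage from this slide, and each stage's share of AI spend from the cost breakdown in section 07. Only the dictionary layout and variable names are ours.

```python
from itertools import accumulate

COST_PER_ARTICLE = 0.04  # USD, measured AI cost per accepted article

# stage: (AI calls, share of AI spend); calls from this slide, shares from section 07
STAGES = {
    "evidence": (4, 0.28),
    "draft": (10, 0.70),
    "grounding": (4, 0.02),
}

calls = [n for n, _share in STAGES.values()]
running = list(accumulate(calls))  # tally after each stage finishes: 4, 14, 18
total_calls = running[-1]

for (stage, (n, share)), so_far in zip(STAGES.items(), running):
    stage_cost = share * COST_PER_ARTICLE
    print(f"{stage:<10} +{n:>2} calls ({so_far:>2}/{total_calls})  "
          f"${stage_cost:.4f} total, ${stage_cost / n:.4f} per call")
```

On these numbers an evidence call and a draft call each cost about $0.0028, while a grounding call costs about $0.0002, so the 18 calls are far from equal in price.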

06 The hard part

The bottleneck is sourcing & grounding

Writing is the easy half. The hard half is feeding the line enough good sources: every claim must be backed by a real, quotable source, and grounding rejects the draft if it isn't. At scale, finding and fetching those sources — cheaply — is where the system starves.
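The all-or-nothing reject rule can be sketched as a gate over claims and sources. KYAS's Grounding stage calls gpt-4o-mini; this stand-in swaps that for naive normalised substring matching, purely to show the rule that one unsupported claim sinks the whole draft. The function names, example URL and source text are hypothetical.

```python
from __future__ import annotations

import re


def normalise(text: str) -> str:
    """Lower-case and turn every run of non-word characters into one space."""
    return re.sub(r"\W+", " ", text.lower()).strip()


def unsupported_claims(claims: list[str], sources: dict[str, str]) -> list[str]:
    """Claims that no source contains verbatim (after normalisation)."""
    # Pad with spaces so a claim only matches on whole-word boundaries.
    bodies = [f" {normalise(body)} " for body in sources.values()]
    return [c for c in claims if not any(f" {normalise(c)} " in b for b in bodies)]


def grounding_gate(claims: list[str], sources: dict[str, str]) -> bool:
    """Accept the draft only if every claim is supported; a single gap rejects it."""
    return not unsupported_claims(claims, sources)


sources = {"https://example.org/batik": "Batik is a dyeing technique using wax resist. It is common in Java."}
claims = ["Batik is a dyeing technique using wax resist", "Batik is common in Sumatra"]
print(unsupported_claims(claims, sources))  # ['Batik is common in Sumatra']
print(grounding_gate(claims, sources))      # False
```

Exact matching is far stricter than an AI judge (it misses paraphrases), but the economics are the same: whatever the check, a failed claim discards a draft that has already been paid for.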

Input need

4+ sources / article

Several quality URLs across 2+ domains before drafting can start.

Grounding

No support → reject

If a claim isn't found in a source, the paid-for draft is discarded.

Paid search

Costly at scale

Commercial search APIs charge per query — millions of articles, big bill.

Free search

SearXNG = poor

Stale, low-quality results; CAPTCHA-blocked on datacenter IPs.

Fetch

Slow & serial

Polite per-domain fetching is I/O-bound and hard to parallelize.

Coverage

Thin for niche

Many subjects have few pages that truly support claims.
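The input-need rule (4+ sources across 2+ domains before drafting) can be sketched as a simple pre-draft check. The names are hypothetical, and counting distinct hostnames (with a leading `www.` stripped) is our assumption about what "domain" means here.

```python
from __future__ import annotations

from urllib.parse import urlparse

MIN_SOURCES = 4  # "4+ sources / article"
MIN_DOMAINS = 2  # "across 2+ domains"


def host(url: str) -> str:
    """Hostname without a leading 'www.' ('' if the URL has none).

    urlparse(...).hostname is already lower-cased by the standard library.
    """
    h = urlparse(url).hostname or ""
    return h[4:] if h.startswith("www.") else h


def ready_to_draft(urls: list[str]) -> bool:
    """True when there are enough distinct source URLs spread over enough hosts."""
    distinct = set(urls)
    hosts = {host(u) for u in distinct} - {""}
    return len(distinct) >= MIN_SOURCES and len(hosts) >= MIN_DOMAINS


urls = [
    "https://id.wikipedia.org/wiki/Batik",
    "https://id.wikipedia.org/wiki/Batik_Solo",
    "https://id.wikipedia.org/wiki/Batik_Pekalongan",
    "https://en.wikipedia.org/wiki/Batik",
]
print(ready_to_draft(urls))      # True: 4 URLs on 2 hostnames
print(ready_to_draft(urls[:3]))  # False: only 3 URLs
```

Under hostname counting, two Wikipedia language editions pass as two domains. Whether that should count is a policy choice the deck does not settle; grouping by registered domain instead would need a public-suffix list.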

07 Cost · the reality check

Reaching the target costs on two fronts

$0.04

measured AI cost per accepted article

18

AI calls per article (evidence + draft + grounding)

$40,000

AI credits to reach 1,000,000

AI credits · ~$6,000 / month

draft 70% · evidence 28% · grounding 2%

Infrastructure + sourcing · ~$1–4k / month + paid-search risk

worker fleet · DB · storage

paid search at scale (can rival the AI bill)

The budget, plainly

To reach 1,000,000 you fund both: ~$40k of AI credits (≈$6k/mo) and the infrastructure + search to feed it. Today we run on ~$150/mo of manual top-ups — about 40× under the AI line alone.
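The budget arithmetic is simple enough to check directly. The figures are the deck's own; only the variable names are ours.

```python
TARGET_ARTICLES = 1_000_000
AI_COST_PER_ARTICLE = 0.04  # USD, measured per accepted article
AI_MONTHLY_LINE = 6_000     # USD / month, the proposed AI budget line
CURRENT_TOPUPS = 150        # USD / month, today's manual top-ups

total_ai = TARGET_ARTICLES * AI_COST_PER_ARTICLE  # total AI credits to reach the target
months_on_line = total_ai / AI_MONTHLY_LINE       # time for that budget line to spend it
shortfall = AI_MONTHLY_LINE / CURRENT_TOPUPS      # today's funding vs the AI line

print(f"AI credits: ${total_ai:,.0f}")
print(f"Months at ${AI_MONTHLY_LINE:,}/mo: {months_on_line:.1f}")
print(f"Current top-ups are {shortfall:.0f}x below the AI line")
```

At $6k/month the $40k is spent in roughly 6.7 months, close to the ~6.5-month full-scale timeline in section 09.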

08 Cost in detail · search + infrastructure

Two cost engines: search providers and the worker fleet

Paid search — to widen coverage beyond Wikipedia

Provider	~$ / 1k searches	How it helps
Brave Search API	$3–5	Independent, fresh index — broad coverage.
Google (SerpAPI / CSE)	$5–10	Best relevance for niche Indonesian subjects.
Tavily	$5–8	Returns clean content — skips the slow fetch step.
Exa (semantic)	$3–5	Finds sources that actually support a claim → better grounding.

~10–30 searches per accepted article → at 1M scale, ~$40k–$150k in search alone. SearXNG is free, but stale & blocked.
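Search spend is total queries times price per 1,000. The deck's $40k–$150k band is reproduced if the low end is ~10 searches at ~$4 per 1k and the high end ~30 searches at ~$5 per 1k; those price points are our reading, not stated in the deck. Using the table's full $3–$10 spread instead stretches the band to roughly $30k–$300k.

```python
def search_cost_usd(articles: int, searches_per_article: float, usd_per_1k: float) -> float:
    """Total paid-search spend: total queries times the per-1k-query price."""
    return articles * searches_per_article * usd_per_1k / 1_000


ARTICLES = 1_000_000
print(search_cost_usd(ARTICLES, 10, 4))   # 40000.0, low end on cheaper providers
print(search_cost_usd(ARTICLES, 30, 5))   # 150000.0, high end
print(search_cost_usd(ARTICLES, 30, 10))  # 300000.0, 30 searches on the priciest tier
```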

Infrastructure — mostly per worker

Item	~$ / month
Worker compute · 2–4 vCPU / 4–8 GB	$40–80 / worker
PostgreSQL (+ read replica)	$150–400
SeaweedFS / object storage	$30–80
NATS + bandwidth / egress	$40–80

~4 workers ≈ $500/mo · ~10–12 workers ≈ $1.2k/mo — before search.
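Adding up the table's rows shows where the headline figures come from. The sketch treats PostgreSQL, object storage and NATS/egress as shared costs that do not grow with the worker count, which is our assumption.

```python
from __future__ import annotations

# (low, high) USD per month, copied from the infrastructure table
PER_WORKER = (40, 80)
SHARED = {
    "postgresql_with_replica": (150, 400),
    "object_storage": (30, 80),
    "nats_and_egress": (40, 80),
}


def monthly_infra(workers: int) -> tuple[int, int]:
    """(low, high) monthly infrastructure cost for a given worker count, before search."""
    shared_low = sum(low for low, _high in SHARED.values())
    shared_high = sum(high for _low, high in SHARED.values())
    return (shared_low + workers * PER_WORKER[0], shared_high + workers * PER_WORKER[1])


for n in (4, 10, 12):
    low, high = monthly_infra(n)
    print(f"{n:>2} workers: ${low:,}–${high:,} / month")
```

The deck's ~$500 (4 workers) and ~$1.2k (10–12 workers) both sit inside these bands.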

The complexity, in one line

Infrastructure is the predictable cost (~$0.5–1.2k/mo). Paid search is the wildcard — at full scale it can match or exceed the entire AI bill, which is exactly why staying on free Wikipedia and widening coverage cheaply matters so much.

09 Throughput · output per day

What each level of investment produces

Today · 1 worker, credit-capped · measured · ~64 / day

~13k by year-end · 1M in ~43 years

Funded · ~4 workers · ~2,250 / day

~440k by year-end · 1M in ~15 months

Full-scale · ~10–12 workers, multi-key · ~5,100 / day

~1.0M by year-end · 1M in ~6.5 months

~80× faster than today · needed to reach 1M by year-end · peak so far 134/day · target 5,076/day
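The throughput tiers follow from rate times window. This sketch assumes "year-end" means 31 Dec 2026, which reproduces the 197-day window given in the provenance note; the names are ours.

```python
from datetime import date

TARGET = 1_000_000
# Window to year-end, assumed to be 31 Dec 2026, counted from 17 Jun 2026.
WINDOW_DAYS = (date(2026, 12, 31) - date(2026, 6, 17)).days  # 197

required_per_day = TARGET / WINDOW_DAYS  # about 5,076
speedup_needed = required_per_day / 64   # vs today's measured ~64/day

TIERS = {"today": 64, "funded": 2_250, "full-scale": 5_100}  # articles / day
for name, per_day in TIERS.items():
    by_year_end = per_day * WINDOW_DAYS
    days_to_target = TARGET / per_day
    print(f"{name:<10} {by_year_end:>9,} by year-end, 1M after {days_to_target:,.0f} days")

print(f"needed: {required_per_day:,.0f}/day, {speedup_needed:.0f}x today")
```

The exact multiple is about 79×, which the deck rounds to ~80×.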

10 Risks to flag

Two limits that get worse at scale

Risk 01 · variety

AI output saturates

When the AI is asked for fresh angles and prose across hundreds of thousands of subjects, its responses start to converge and repeat. More output is not automatically more distinct output.

Risk 02 · evidence

Claims lack support

For many subjects, few websites truly back the claims; grounding then fails. Raw volume is capped by how much trustworthy source material exists — not by how fast we write.
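One way to watch for Risk 01 (our suggestion, not something the deck says KYAS does) is to measure word overlap between finished articles. The sketch compares every pair with Jaccard similarity over 3-word shingles; the threshold, names and example texts are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """All runs of k consecutive lower-cased words in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Overlap of two shingle sets: 1.0 identical, 0.0 nothing shared."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(articles: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Every pair of articles whose shingle overlap reaches the threshold."""
    sets = {name: shingles(body) for name, body in articles.items()}
    pairs = []
    for x, y in combinations(sets, 2):  # every unordered pair
        score = jaccard(sets[x], sets[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs


docs = {
    "batik-a": "Batik is a traditional Indonesian textile made with wax resist dyeing",
    "batik-b": "Batik is a traditional Indonesian textile made with wax resist dyeing techniques",
    "gamelan": "Gamelan is an ensemble of percussion instruments from Java and Bali",
}
print(near_duplicates(docs))  # [('batik-a', 'batik-b', 0.9)]
```

Pairwise comparison grows with the square of the article count, so at hundreds of thousands of articles a MinHash/LSH index would replace the nested loop.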

Implication

Past a point, pushing volume harder yields repetitive or weakly-supported articles. The cheapest real lever is improving source coverage & draft acceptance — not simply spending more on generation.

11 Realistic expectations

What we can commit to — and the path

Status quo

12–25k

Current funding & one worker, run in bursts.

Funded + ~4 workers

~440k

The realistic, fundable target for this year.

Full mandate

1,000,000

Needs funding + ~80× throughput + better source coverage.

Fund AI + sourcing properly
~$6k/mo AI on a funded account, plus a search/fetch budget — the prerequisite for continuous running.
Scale workers horizontally
More parallel workers & keys to lift throughput toward the target rate.
Raise source coverage & acceptance
Cut the grounding-reject rate and widen sources — lowers cost and lifts every tier at once.
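Why cutting grounding rejects lowers cost: rejected drafts are already paid for, so spend per accepted article is spend per attempt divided by the acceptance rate. This is a simplification that assumes a rejected attempt costs as much as an accepted one (a draft rejected early may cost less), and the inputs are hypothetical because the deck does not publish a per-attempt cost or an acceptance rate.

```python
def cost_per_accepted(cost_per_attempt: float, acceptance_rate: float) -> float:
    """Spend per accepted article when every attempt is paid for, accepted or not."""
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    return cost_per_attempt / acceptance_rate


# Hypothetical inputs: $0.02 per attempt at three acceptance rates.
for rate in (0.4, 0.5, 0.8):
    print(f"acceptance {rate:.0%}: ${cost_per_accepted(0.02, rate):.3f} per accepted article")
```

The same ratio applies to throughput: at a fixed number of attempts per worker per day, accepted output scales with the acceptance rate, which is why this lever lifts every tier at once.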

12 Provenance & basis

Figures describing the current system are measured from the KYAS staging environment, June 2026, not estimated. Unit cost comes from logged and actual OpenRouter spend; the funnel and acceptance rates from a rolling 7-day window; throughput from daily vault output; queue depth from live database counts. Per-article cost ($0.04) uses real provider charges, ~1.8× the internally-logged figure. The ~18 AI calls per accepted article span evidence extraction, drafting, and grounding. The window to year-end is 197 days from 17 Jun 2026, so the 5,076/day target is 1,000,000 ÷ 197. Infrastructure and paid-search figures are planning estimates, and the funded and full-scale throughput tiers are projections. Publishing remains gated pending canary approval: no machine-written article has been released to PDBI.