What this brings up
The interpreter enrichment pipeline is built and deployed across four workers. This runbook lights it up end to end on real data:
| Stage (output) | Hand-off / mechanism |
|---|---|
| google-places (domains) | POST /api/dispatch-scrape |
| domain-scraper (evidence) | scrape-tasks queue -> broker_contents |
| openai-interpreter (facts + display/filter metadata) | /admin/api/migration/scraper-import-d1, then the enrichment workflow (OpenAI via AI Gateway) |
| yp-site (renders) | field_visibility + filter_signals_json |
Verified live 2026-06-25 (R-10). Read these gotchas from the first real run before re-running:
- `ADMIN_AUTH_TOKENS_JSON` shape: each token object must include `principalId` and a non-empty `scopes` array, e.g.
  `{"realm":"gpn-interpreter","tokens":[{"token":"…","principalId":"operator","scopes":["admin:read","maintenance:write"]}]}`.
  Set it via a stdin pipe, not the interactive paste: `printf '%s' '<json>' | wrangler secret put ADMIN_AUTH_TOKENS_JSON`.
- A `default` dataset must exist (`scraper-import-d1` resolves it via the dashboard payload, which prefers slug `default`), and an **active `guidance_bundles` row** must exist (the `enrichment_runs.guidance_bundle_id` FK requires one).
- Brokers stay `PENDING` until a full crawl finalizes, so import with `{"acceptStatuses":["PENDING","SUCCESS"]}` to ingest pre-scraped evidence.
- Manual per-entity enrichment:
  `wrangler workflows trigger gpn-openai-interpreter-business-enrichment '{"scheduleId":"manual","datasetId":"dataset-default","rootEntityId":"<entityId>","jobType":"stale_business_enrichment","queuedAt":"<iso>","cronExpression":"manual"}'`.
- The enrichment workflow now loads the entity's real R2 evidence. Runs land in `waiting_review` (human-accept gate). The discovery cron (`0 */6 * * *`) does not yet fan out per business (R-11).
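The manual trigger payload is hand-assembled JSON with an ISO timestamp, so a small helper can cut down on typos. Below is a minimal POSIX sh sketch. It assumes the six fields in the gotcha above are the complete payload contract. It also assumes entity ids contain no quotes or backslashes, because it does no JSON escaping. The function names `enrichment_params` and `trigger_enrichment` are hypothetical and not part of the repo.

```sh
#!/bin/sh
# Hypothetical helpers (not in the repo): build and send the manual
# enrichment payload shown in the gotcha above.

# enrichment_params ENTITY_ID [QUEUED_AT]
# Prints the workflow params as one line of JSON. QUEUED_AT defaults to the
# current UTC time in ISO-8601. Assumes ENTITY_ID needs no JSON escaping.
enrichment_params() {
  [ "$#" -ge 1 ] || { echo "usage: enrichment_params ENTITY_ID [QUEUED_AT]" >&2; return 1; }
  queued_at=${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  printf '{"scheduleId":"manual","datasetId":"dataset-default","rootEntityId":"%s","jobType":"stale_business_enrichment","queuedAt":"%s","cronExpression":"manual"}' \
    "$1" "$queued_at"
}

# trigger_enrichment ENTITY_ID
# Starts one BusinessEnrichmentWorkflow run; run it from the interpreter app dir.
trigger_enrichment() {
  wrangler workflows trigger gpn-openai-interpreter-business-enrichment \
    "$(enrichment_params "$1")"
}
```

For example, running `trigger_enrichment <entityId>` from `apps/gpn-openai-interpreter-worker` should start one run, which then waits in `waiting_review`.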
All code is committed and deployed; the steps below are operator actions (secrets and triggers) that cannot live in the repo. Plan of record: .project/plans/INTERPRETER-ENRICHMENT-PLAN.md.
Prerequisites (one-time)
Set these secrets/vars. Run each from the worker's app directory.
# 1. google-places admin token (gates /api/dispatch-scrape and provisioning)
cd apps/gpn-google-places-worker
wrangler secret put ADMIN_TOKEN # choose a strong value; you send it as x-admin-token
# 2. interpreter admin auth (gates /admin/api/*). JSON of tokens + scopes:
cd ../gpn-openai-interpreter-worker
wrangler secret put ADMIN_AUTH_TOKENS_JSON
# value, single line, e.g.:
# {"realm":"gpn-openai-interpreter-admin","tokens":[{"token":"<bearer>","principalId":"ops","scopes":["admin:read","guidance:write","evaluation:run","review:write","maintenance:write","run:write"]}]}
# 3. interpreter OpenAI access — CENTRAL Cloudflare AI Gateway (authenticated + BYOK).
# The gateway stores the OpenAI key; the worker authenticates to the gateway with a
# gateway token and never sends the OpenAI key itself.
#
# a. Store the OpenAI key in the gateway (BYOK): AI Gateway -> <gateway> ->
# Provider keys -> add your OpenAI key. One central key serves every caller.
#
# b. Create the gateway token — this becomes CF_AIG_TOKEN. In the Cloudflare AI
# Gateway screen click "Create token" and fill in the permissions as:
# Account -> AI Gateway -> Run (the ONLY required permission)
# Account Resources: Include -> your account.
# Notes: AI Gateway Read/Run/Edit cannot be scoped to a single gateway, and
# the token is shown only once. Workers AI Read/Edit are NOT needed here.
#
# c. Put the token on the interpreter worker (worker-specific secret):
wrangler secret put CF_AIG_TOKEN # paste the gateway token at the prompt
#
# OPENAI_BASE_URL is already set in wrangler.jsonc to the gateway OpenAI endpoint
# (.../v1/<account_id>/<gateway_id>/openai); the worker sends CF_AIG_TOKEN as the
# `cf-aig-authorization` header. Fallback: the OPENAI_API_KEY Secrets Store binding
# still exists — blank OPENAI_BASE_URL to call api.openai.com directly with it.
# "OPENAI_MODEL": "gpt-4.1-mini" (per-stage overridable in guidance)
npm run deploy
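The gotchas call for a single-line `ADMIN_AUTH_TOKENS_JSON` value that includes `principalId` and a non-empty `scopes` array. A helper can assemble that value and pipe it straight into `wrangler secret put`. This is a minimal sketch under a few assumptions:

- One token entry is enough.
- No argument needs JSON escaping.
- `admin_tokens_json` is a hypothetical name.
- `openssl rand -hex 32` is just one way to generate a bearer.

```sh
#!/bin/sh
# Hypothetical helper (not in the repo): build the single-line
# ADMIN_AUTH_TOKENS_JSON value with one token entry.
# admin_tokens_json REALM TOKEN PRINCIPAL_ID SCOPE [SCOPE...]
# Assumes no argument contains a quote or backslash (no JSON escaping).
admin_tokens_json() {
  # principalId and a non-empty scopes array are both required (see gotchas).
  [ "$#" -ge 4 ] || { echo "usage: admin_tokens_json REALM TOKEN PRINCIPAL_ID SCOPE..." >&2; return 1; }
  realm=$1 token=$2 principal=$3
  shift 3
  scopes=""
  for s in "$@"; do
    # Append each scope as a quoted JSON string, comma-separated.
    scopes="${scopes:+$scopes,}\"$s\""
  done
  printf '{"realm":"%s","tokens":[{"token":"%s","principalId":"%s","scopes":[%s]}]}' \
    "$realm" "$token" "$principal" "$scopes"
}

# Example (not executed here): generate a bearer, keep it as $INTERP_TOKEN,
# and pipe the JSON in via stdin as the gotchas recommend.
#   INTERP_TOKEN=$(openssl rand -hex 32)
#   admin_tokens_json gpn-openai-interpreter-admin "$INTERP_TOKEN" ops \
#     admin:read maintenance:write | wrangler secret put ADMIN_AUTH_TOKENS_JSON
```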
Interpreter route is live. The interpreter is reachable at `interpreter.mondial-it.nl` (custom domain). `/health` is open; `/admin/api/*` is bearer-gated by `ADMIN_AUTH_TOKENS_JSON` (401 until that secret is set). The google-places and yp-site workers are also reachable (`google-places.mondial-it.nl`, `gpn-yp-site-worker.mondial-it.workers.dev`).
Step 1 — Seed discovered businesses (google-places)
Run a Places search so agent_places is populated in gpn-data-d1. From the workspace UI (https://google-places.mondial-it.nl/) or via the API:
curl -sS -X POST https://google-places.mondial-it.nl/api/places/search \
-H "x-admin-token: $ADMIN_TOKEN" -H 'content-type: application/json' \
-d '{"project_name":"makelaars-groningen","queries":["makelaar Groningen"]}'
A Places search consumes Google Places API quota. Keep the query list small
for a first run.
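Each search spends Places quota, so a thin wrapper can refuse oversized query lists before anything is sent. This sketch assumes a hypothetical cap, `MAX_QUERIES` (default 3), which is an operator-side guard rather than a worker setting. `places_search_body` and `places_search` are illustrative names, and queries are assumed to need no JSON escaping.

```sh
#!/bin/sh
# Hypothetical helpers (not in the repo) for the Places search call.
# MAX_QUERIES is an illustrative quota guard, not a worker setting.
MAX_QUERIES=${MAX_QUERIES:-3}

# places_search_body PROJECT QUERY [QUERY...]
# Prints the request body; refuses more than MAX_QUERIES queries.
# Assumes no argument contains a quote or backslash (no JSON escaping).
places_search_body() {
  [ "$#" -ge 2 ] || { echo "usage: places_search_body PROJECT QUERY..." >&2; return 1; }
  project=$1
  shift
  if [ "$#" -gt "$MAX_QUERIES" ]; then
    echo "refusing $# queries (MAX_QUERIES=$MAX_QUERIES)" >&2
    return 1
  fi
  queries=""
  for q in "$@"; do
    queries="${queries:+$queries,}\"$q\""
  done
  printf '{"project_name":"%s","queries":[%s]}' "$project" "$queries"
}

# places_search PROJECT QUERY [QUERY...]
# Sends the search; needs $ADMIN_TOKEN in the environment.
places_search() {
  body=$(places_search_body "$@") || return 1
  curl -sS -X POST https://google-places.mondial-it.nl/api/places/search \
    -H "x-admin-token: $ADMIN_TOKEN" -H 'content-type: application/json' \
    -d "$body"
}
```

For example, `places_search makelaars-groningen 'makelaar Groningen'` sends the same body as the curl call above.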
Step 2 — Hand domains to the scraper
Enqueues one scrape-tasks message per resolved domain and seeds brokers.
curl -sS -X POST https://google-places.mondial-it.nl/api/dispatch-scrape \
-H "x-admin-token: $ADMIN_TOKEN" -H 'content-type: application/json' \
-d '{"project_name":"makelaars-groningen","limit":10}'
# -> { ok:true, dispatched:N, skipped:M, upload_id:"places-..." }
The scraper consumes the queue (Cloudflare Browser) and writes broker_contents (+ broker_info/broker_listings when extraction succeeds). Verify:
cd apps/gpn-domain-scraper-worker
wrangler d1 execute gpn-data-d1 --remote \
--command "SELECT status, COUNT(*) FROM brokers GROUP BY status"
wrangler d1 execute gpn-data-d1 --remote \
--command "SELECT COUNT(*) FROM broker_contents"
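Before querying D1, it can help to confirm that the dispatch call actually enqueued something. Below is a jq-free sketch. It assumes the response is compact JSON with the `dispatched` and `skipped` keys shown in the sample output above. `json_number` and `dispatch_and_report` are hypothetical helpers.

```sh
#!/bin/sh
# Hypothetical helpers (not in the repo): check that a dispatch actually
# queued work before querying D1.

# json_number KEY  (reads JSON on stdin)
# Prints the integer value of "KEY"; empty if absent. A jq-free sketch:
# assumes the key appears once and its value is on the same line.
json_number() {
  sed -n "s/.*\"$1\":[[:space:]]*\([0-9][0-9]*\).*/\1/p" | head -n 1
}

# dispatch_and_report PROJECT [LIMIT]
# Calls /api/dispatch-scrape and fails if nothing was dispatched.
dispatch_and_report() {
  resp=$(curl -sS -X POST https://google-places.mondial-it.nl/api/dispatch-scrape \
    -H "x-admin-token: $ADMIN_TOKEN" -H 'content-type: application/json' \
    -d "{\"project_name\":\"$1\",\"limit\":${2:-10}}") || return 1
  n=$(printf '%s\n' "$resp" | json_number dispatched)
  skipped=$(printf '%s\n' "$resp" | json_number skipped)
  if [ "${n:-0}" -eq 0 ]; then
    echo "nothing dispatched (skipped=${skipped:-?}): $resp" >&2
    return 1
  fi
  echo "dispatched $n domain(s), skipped ${skipped:-0}"
}
```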
Step 3 — Ingest scraper evidence into the interpreter
Reads the broker_* tables straight from gpn-data-d1 (no manifest needed) and creates source snapshots + candidate entities. acceptStatuses lets you import crawled-but-unfinalized brokers.
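INTERP=<interpreter base url> # e.g. https://interpreter.mondial-it.nl (custom domain), or https://gpn-openai-interpreter-worker.<acct>.workers.dev if workers_dev is enabled
INTERP_TOKEN=<bearer> # a token from ADMIN_AUTH_TOKENS_JSON whose scopes include maintenance:write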
curl -sS -X POST "$INTERP/admin/api/migration/scraper-import-d1" \
-H "authorization: Bearer $INTERP_TOKEN" -H 'content-type: application/json' \
-d '{"limit":50,"requireContents":true,"acceptStatuses":["SUCCESS","PENDING"]}'
# -> { totalProcessed, snapshotsCreated, entitiesCreated, skipped, errors }
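The import body is easy to mistype when switching between `SUCCESS`-only and `PENDING`-inclusive runs. Below is a minimal sketch of a body builder and caller. It assumes `requireContents` should stay `true` as in the example, and that `INTERP` and `INTERP_TOKEN` are already set. `scraper_import_body` and `scraper_import` are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helpers (not in the repo) for the scraper-import-d1 call.

# scraper_import_body LIMIT STATUS [STATUS...]
# Prints the request body; requireContents is always true, matching the
# example above. LIMIT must be a positive integer without leading zeros.
scraper_import_body() {
  [ "$#" -ge 2 ] || { echo "usage: scraper_import_body LIMIT STATUS..." >&2; return 1; }
  case $1 in
    ''|*[!0-9]*|0*) echo "limit must be a positive integer: $1" >&2; return 1 ;;
  esac
  limit=$1
  shift
  statuses=""
  for st in "$@"; do
    statuses="${statuses:+$statuses,}\"$st\""
  done
  printf '{"limit":%s,"requireContents":true,"acceptStatuses":[%s]}' "$limit" "$statuses"
}

# scraper_import LIMIT STATUS [STATUS...]
# Needs $INTERP and $INTERP_TOKEN (a bearer with maintenance:write).
scraper_import() {
  body=$(scraper_import_body "$@") || return 1
  curl -sS -X POST "$INTERP/admin/api/migration/scraper-import-d1" \
    -H "authorization: Bearer $INTERP_TOKEN" -H 'content-type: application/json' \
    -d "$body"
}
```

For example, `scraper_import 50 SUCCESS PENDING` reproduces the request above.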
Step 4 — Run enrichment (OpenAI interpretation)
The BusinessEnrichmentWorkflow runs from the scheduled-discovery path: the cron scheduled handler enqueues work for enabled rows in processing_schedules, and the queue handler starts a workflow per business. As the gotchas note, the discovery cron does not yet fan out per business (R-11), so for now trigger runs manually per entity. With CF_AIG_TOKEN set, the model cascade calls OpenAI through the central AI Gateway (BYOK) and validates the output against the pilot schema. Without it, and without the fallback (OPENAI_API_KEY with a blank OPENAI_BASE_URL), the deterministic heuristic provider runs instead. Each run also:
- attaches `displayMetadata` (policy fields + derived `filterSignals`) to the canonical projection, and
- writes a `publication` projection (`{domain, filterSignals}`) into gpn-data-d1 (`projection_versions`, kind `publication`).
R-4 / ADR-0001: the interpreter no longer writes `gpn-user-d1` directly. Per-business `filter_signals_json` reaches the site only when `gpn-data-publisher-worker` reads that `publication` projection and projects it (see Step 5). The old direct write from the workflow was removed.
Confirm a run landed:
curl -sS "$INTERP/admin/api/runs/<runId>" -H "authorization: Bearer $INTERP_TOKEN"
# or inspect projection_versions / enrichment_runs in gpn-data-d1
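Runs pause at the `waiting_review` gate (see the gotchas), so a poll loop can wait for that state instead of re-running the curl by hand. This is a sketch under two loud assumptions:

- The `/admin/api/runs/<runId>` response is JSON whose first `status` key is the run's own status.
- `failed` and `error` are the terminal state names; these are guesses.

`run_status` and `wait_for_run` are hypothetical helpers.

```sh
#!/bin/sh
# Hypothetical helpers (not in the repo). ASSUMPTION: the run response is
# JSON whose first "status" key is the run's own status.

# run_status  (reads JSON on stdin) -> prints the first "status" value
run_status() {
  grep -o '"status":[[:space:]]*"[^"]*"' | head -n 1 |
    sed 's/^"status":[[:space:]]*"\(.*\)"$/\1/'
}

# wait_for_run RUN_ID [TRIES]
# Polls every 10s until the run reaches waiting_review (the human-accept
# gate). "failed" and "error" are guessed terminal state names.
wait_for_run() {
  tries=${2:-30}
  state=""
  while [ "$tries" -gt 0 ]; do
    state=$(curl -sS "$INTERP/admin/api/runs/$1" \
      -H "authorization: Bearer $INTERP_TOKEN" | run_status)
    case $state in
      waiting_review) echo "run $1 is waiting_review"; return 0 ;;
      failed|error) echo "run $1 ended as $state" >&2; return 1 ;;
    esac
    tries=$((tries - 1))
    sleep 10
  done
  echo "run $1 still '${state:-unknown}' after polling" >&2
  return 1
}
```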
Step 5 — Configure & deliver the display/filter policy
The policy is the dashboard-configurable contract for what the site shows and how it filters (using yp-site shadcn filter templates). View / edit it:
# current policy (stored or default)
curl -sS "$INTERP/admin/api/display-policy" -H "authorization: Bearer $INTERP_TOKEN"
# override it (operator edit)
curl -sS -X PUT "$INTERP/admin/api/display-policy" \
-H "authorization: Bearer $INTERP_TOKEN" -H 'content-type: application/json' \
-d '{"policy":{"version":"v2","fields":[
{"fieldKey":"displayName","label":"Name","surfaces":["list","detail"],"displayComponent":"field.display.name","filterTemplate":"filter.template.text-search","visible":true,"sortOrder":10},
{"fieldKey":"rating","label":"Rating","surfaces":["list","filter"],"displayComponent":"field.display.rating","filterTemplate":"filter.template.numeric-range","visible":true,"sortOrder":60}
]}}'
Deliver to the public site via the publisher (the sole gpn-user-d1 writer). One call projects both the baseline rows and the enrichment overlay — field_visibility from the stored policy and business_listings.filter_signals_json from the publication projections:
PUBLISHER="https://publisher.mondial-it.nl"
curl -sS -X POST "$PUBLISHER/publish?project=<projectName>&dataset=<datasetId>&category=*" \
-H "x-admin-token: $PUBLISHER_TOKEN"
# -> { ok:true, baseline:{...}, enrichment:{ fieldVisibilityRows:N, filterSignalBusinesses:M, policyFound:true } }
The interpreter's old `POST /admin/api/project-to-site` is retired (returns `410`). The interpreter still authors the policy (`PUT /admin/api/display-policy` above); the publisher delivers it.
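The publish URL takes three query parameters, and a missing one is easy to overlook. Below is a minimal sketch that builds the URL and flags a response without `"policyFound":true`, based on the sample response above. `publish_url` and `publish` are hypothetical names. Project and dataset values are assumed to be slug-like, since no URL-encoding is done.

```sh
#!/bin/sh
# Hypothetical helpers (not in the repo) for the publisher call.

# publish_url BASE PROJECT DATASET [CATEGORY]
# CATEGORY defaults to "*" (all categories), as in the example above.
# Assumes slug-like PROJECT/DATASET values that need no URL-encoding.
publish_url() {
  [ "$#" -ge 3 ] || { echo "usage: publish_url BASE PROJECT DATASET [CATEGORY]" >&2; return 1; }
  printf '%s/publish?project=%s&dataset=%s&category=%s' "$1" "$2" "$3" "${4:-*}"
}

# publish PROJECT DATASET [CATEGORY]
# Needs $PUBLISHER and $PUBLISHER_TOKEN; warns when the response does not
# report policyFound:true (field name taken from the sample response).
publish() {
  url=$(publish_url "$PUBLISHER" "$@") || return 1
  resp=$(curl -sS -X POST "$url" -H "x-admin-token: $PUBLISHER_TOKEN") || return 1
  printf '%s\n' "$resp"
  if ! printf '%s\n' "$resp" | grep -q '"policyFound":[[:space:]]*true'; then
    echo "warning: policyFound was not true; check the stored display policy" >&2
  fi
}
```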
Step 6 — Verify on the site
curl -s "https://gpn-yp-site-worker.mondial-it.workers.dev/?q=" | grep -o 'Business status\|Rating and reviews\|Category and type'
The directory sidebar renders only the filter groups the policy marks visible on the filter surface; per-business filter_signals_json is available on each listing row for downstream filtering. Removing the field_visibility rows restores the default "show all filters" behavior.
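To compare what the site renders against the policy, the grep above can be wrapped so that it reports each filter-group label once. This sketch assumes the three labels in that grep are the groups of interest; the mapping from labels to policy `fieldKey`s is not documented here. `visible_filter_groups` and `check_site_filters` are hypothetical helpers.

```sh
#!/bin/sh
# Hypothetical helpers (not in the repo): list which known filter-group
# labels appear in the rendered directory page.

# visible_filter_groups  (reads HTML on stdin) -> unique labels, sorted
visible_filter_groups() {
  grep -oE 'Business status|Rating and reviews|Category and type' | sort -u
}

# check_site_filters
# Fetches the directory page and prints the filter groups it renders.
check_site_filters() {
  curl -s "https://gpn-yp-site-worker.mondial-it.workers.dev/?q=" | visible_filter_groups
}
```

A group missing from the output while its policy field is marked visible on the filter surface suggests the publisher has not yet delivered the current policy.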
Cleanup
- If you enabled `workers_dev` on the interpreter for the admin calls, set it back to `false` and redeploy.
- Treat tokens as secrets; never paste them into committed files or logs.
Endpoint reference
| Worker | Endpoint | Auth | Purpose |
|---|---|---|---|
| google-places | POST /api/places/search | x-admin-token | seed agent_places |
| google-places | POST /api/dispatch-scrape | x-admin-token | enqueue domains to scraper |
| interpreter | POST /admin/api/migration/scraper-import-d1 | Bearer (maintenance:write) | ingest broker_* evidence |
| interpreter | GET/PUT /admin/api/display-policy | Bearer (admin:read / maintenance:write) | view/edit display+filter policy |
| interpreter | GET /admin/api/runs/{id} | Bearer (admin:read) | inspect an enrichment run |
| publisher | POST /publish?project=&dataset=&category= | x-admin-token | project baseline + enrichment into gpn-user-d1 (sole writer) |
| yp-site | GET / | public | renders filters from field_visibility |