GPN BusinessDiscoverer

Bring up the enrichment pipeline (live)

Run the full chain on real data — google-places domains -> scraper evidence -> OpenAI interpretation -> display/filter metadata -> the public site. Covers the secrets, endpoints, and exact commands.

How-to Slug: pipeline-bringup Updated: 26 Jun 2026

What this brings up

The interpreter enrichment pipeline is built and deployed across four workers. This runbook lights it up end to end on real data:

google-places (domains)            POST /api/dispatch-scrape
  -> domain-scraper (evidence)     scrape-tasks queue -> broker_contents
    -> openai-interpreter (facts   /admin/api/migration/scraper-import-d1
       + display/filter metadata)  enrichment workflow (OpenAI via AI Gateway)
      -> yp-site (renders)         field_visibility + filter_signals_json

Verified live 2026-06-25 (R-10). Gotchas the first real run surfaced; read before re-running:

- ADMIN_AUTH_TOKENS_JSON shape: each token object must include principalId and a non-empty scopes array, e.g.
  {"realm":"gpn-interpreter","tokens":[{"token":"…","principalId":"operator","scopes":["admin:read","maintenance:write"]}]}
  Set it via stdin pipe, not the interactive paste:
  printf '%s' '<json>' | wrangler secret put ADMIN_AUTH_TOKENS_JSON

- A default dataset must exist (scraper-import-d1 resolves it via the dashboard payload, which prefers slug default), and an active guidance_bundles row must exist (the enrichment_runs.guidance_bundle_id FK requires one).

- Brokers are PENDING until a full crawl finalizes, so import with {"acceptStatuses":["PENDING","SUCCESS"]} to ingest pre-scraped evidence.

- Manual per-entity enrichment:
  wrangler workflows trigger gpn-openai-interpreter-business-enrichment '{"scheduleId":"manual","datasetId":"dataset-default","rootEntityId":"<entityId>","jobType":"stale_business_enrichment","queuedAt":"<iso>","cronExpression":"manual"}'

- The enrichment workflow now loads the entity's real R2 evidence; runs land in waiting_review (human-accept gate). The discovery cron (0 */6 * * *) does not yet fan out per business (R-11).
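The manual trigger params in the gotchas above are easy to mistype by hand. A minimal sketch of a helper that assembles them, assuming the field names shown in the example; build_trigger_params is a hypothetical name, not something in the repo, and it assumes the ids contain no characters that need JSON escaping:

```shell
# Hypothetical helper: build the JSON params for a manual enrichment trigger.
# Field names are copied from the runbook example; nothing here is validated
# against the worker itself.
# Usage: build_trigger_params ENTITY_ID [DATASET_ID] [QUEUED_AT]
build_trigger_params() {
  entity_id=$1
  dataset_id=${2:-dataset-default}
  # Default queuedAt to the current UTC time in ISO 8601 form.
  queued_at=${3:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
  printf '{"scheduleId":"manual","datasetId":"%s","rootEntityId":"%s","jobType":"stale_business_enrichment","queuedAt":"%s","cronExpression":"manual"}' \
    "$dataset_id" "$entity_id" "$queued_at"
}

# Example call (the entity id is a placeholder):
# wrangler workflows trigger gpn-openai-interpreter-business-enrichment \
#   "$(build_trigger_params '<entityId>')"
```

Passing the timestamp as an optional third argument keeps the output reproducible when the same payload needs to be logged or replayed.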

All code is committed and deployed; the steps below are operator actions (secrets and triggers) that cannot live in the repo. Plan of record: .project/plans/INTERPRETER-ENRICHMENT-PLAN.md.

Prerequisites (one-time)

Set these secrets/vars. Run each from the worker's app directory.

# 1. google-places admin token (gates /api/dispatch-scrape and provisioning)
cd apps/gpn-google-places-worker
wrangler secret put ADMIN_TOKEN            # choose a strong value; you send it as x-admin-token

# 2. interpreter admin auth (gates /admin/api/*). JSON of tokens + scopes:
cd ../gpn-openai-interpreter-worker
# Pipe the single-line JSON via stdin rather than pasting it at the interactive
# prompt (see the gotchas above). Replace <bearer> with your token:
printf '%s' '{"realm":"gpn-openai-interpreter-admin","tokens":[{"token":"<bearer>","principalId":"ops","scopes":["admin:read","guidance:write","evaluation:run","review:write","maintenance:write","run:write"]}]}' | wrangler secret put ADMIN_AUTH_TOKENS_JSON

# 3. interpreter OpenAI access — CENTRAL Cloudflare AI Gateway (authenticated + BYOK).
# The gateway stores the OpenAI key; the worker authenticates to the gateway with a
# gateway token and never sends the OpenAI key itself.
#
#  a. Store the OpenAI key in the gateway (BYOK): AI Gateway -> <gateway> ->
#     Provider keys -> add your OpenAI key. One central key serves every caller.
#
#  b. Create the gateway token — this becomes CF_AIG_TOKEN. In the Cloudflare AI
#     Gateway screen click "Create token" and fill in the permissions as:
#        Account  ->  AI Gateway  ->  Run         (the ONLY required permission)
#     Account Resources: Include -> your account.
#     Notes: AI Gateway Read/Run/Edit cannot be scoped to a single gateway, and
#     the token is shown only once. Workers AI Read/Edit are NOT needed here.
#
#  c. Put the token on the interpreter worker (worker-specific secret):
wrangler secret put CF_AIG_TOKEN            # paste the gateway token at the prompt
#
# OPENAI_BASE_URL is already set in wrangler.jsonc to the gateway OpenAI endpoint
# (.../v1/<account_id>/<gateway_id>/openai); the worker sends CF_AIG_TOKEN as the
# `cf-aig-authorization` header. Fallback: the OPENAI_API_KEY Secrets Store binding
# still exists — blank OPENAI_BASE_URL to call api.openai.com directly with it.
#   "OPENAI_MODEL": "gpt-4.1-mini"   (per-stage overridable in guidance)
npm run deploy

Interpreter route is live. The interpreter is reachable at interpreter.mondial-it.nl (custom domain). /health is open; /admin/api/* is bearer-gated by ADMIN_AUTH_TOKENS_JSON (401 until that secret is set). The google-places and yp-site workers are also reachable (google-places.mondial-it.nl, gpn-yp-site-worker.mondial-it.workers.dev).
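The curl commands in the steps below read ADMIN_TOKEN, INTERP_TOKEN, and PUBLISHER_TOKEN from the shell, and an unset variable silently produces an empty header. A small preflight sketch that catches this early; require_env is a hypothetical helper name, and it only checks that the variables are non-empty in the current shell, not that the tokens are valid:

```shell
# Hypothetical preflight helper: report which of the named shell variables
# are unset or empty. Returns non-zero if any are missing.
# Usage: require_env NAME [NAME...]
require_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: stop before Step 1 if a token is not set.
# require_env ADMIN_TOKEN INTERP_TOKEN PUBLISHER_TOKEN || exit 1
```

The helper uses eval for the indirect lookup, so pass it literal variable names only, never untrusted input.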

Step 1 — Seed discovered businesses (google-places)

Run a Places search so agent_places is populated in gpn-data-d1. From the workspace UI (https://google-places.mondial-it.nl/) or via the API:

curl -sS -X POST https://google-places.mondial-it.nl/api/places/search \
  -H "x-admin-token: $ADMIN_TOKEN" -H 'content-type: application/json' \
  -d '{"project_name":"makelaars-groningen","queries":["makelaar Groningen"]}'

A Places search consumes Google Places API quota. Keep the query list small for a first run.

Step 2 — Hand domains to the scraper

Enqueues one scrape-tasks message per resolved domain and seeds brokers.

curl -sS -X POST https://google-places.mondial-it.nl/api/dispatch-scrape \
  -H "x-admin-token: $ADMIN_TOKEN" -H 'content-type: application/json' \
  -d '{"project_name":"makelaars-groningen","limit":10}'
# -> { ok:true, dispatched:N, skipped:M, upload_id:"places-..." }

The scraper consumes the queue (Cloudflare Browser) and writes broker_contents (+ broker_info/broker_listings when extraction succeeds). Verify:

cd apps/gpn-domain-scraper-worker
wrangler d1 execute gpn-data-d1 --remote \
  --command "SELECT status, COUNT(*) FROM brokers GROUP BY status"
wrangler d1 execute gpn-data-d1 --remote \
  --command "SELECT COUNT(*) FROM broker_contents"

Step 3 — Ingest scraper evidence into the interpreter

Reads the broker_* tables straight from gpn-data-d1 (no manifest needed) and creates source snapshots + candidate entities. acceptStatuses lets you import crawled-but-unfinalized brokers.

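INTERP="https://interpreter.mondial-it.nl"   # custom domain from Prerequisites; or https://gpn-openai-interpreter-worker.<acct>.workers.dev
# INTERP_TOKEN must hold a bearer token listed in ADMIN_AUTH_TOKENS_JSON (scope maintenance:write for this call)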
curl -sS -X POST "$INTERP/admin/api/migration/scraper-import-d1" \
  -H "authorization: Bearer $INTERP_TOKEN" -H 'content-type: application/json' \
  -d '{"limit":50,"requireContents":true,"acceptStatuses":["SUCCESS","PENDING"]}'
# -> { totalProcessed, snapshotsCreated, entitiesCreated, skipped, errors }
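The request body above changes between runs mainly in limit and acceptStatuses. A minimal sketch that builds it from arguments, assuming the three fields shown in the example are the only ones needed; import_body is a hypothetical helper name, and requireContents is fixed to true as in the runbook's call:

```shell
# Hypothetical helper: build the scraper-import-d1 request body.
# Usage: import_body LIMIT STATUS [STATUS...]
import_body() {
  limit=$1
  shift
  statuses=
  for s in "$@"; do
    # Join the statuses as a comma-separated list of JSON strings.
    statuses="${statuses:+$statuses,}\"$s\""
  done
  printf '{"limit":%s,"requireContents":true,"acceptStatuses":[%s]}' \
    "$limit" "$statuses"
}

# Example: the same request as above, with the body generated.
# curl -sS -X POST "$INTERP/admin/api/migration/scraper-import-d1" \
#   -H "authorization: Bearer $INTERP_TOKEN" -H 'content-type: application/json' \
#   -d "$(import_body 50 SUCCESS PENDING)"
```

Once the crawl has finalized, calling it with SUCCESS alone restricts the import to finalized brokers.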

Step 4 — Run enrichment (OpenAI interpretation)

The BusinessEnrichmentWorkflow runs from the scheduled-discovery path: the cron scheduled handler enqueues work for enabled rows in processing_schedules, and the queue handler starts a workflow per business. With CF_AIG_TOKEN set, the model cascade calls OpenAI through the central AI Gateway (BYOK) and validates output against the pilot schema; without it (or without a fallback OPENAI_API_KEY plus a blank OPENAI_BASE_URL), the deterministic heuristic provider runs. Each run also writes a canonical projection and a publication projection to gpn-data-d1 (projection_versions, kind publication).

R-4 / ADR-0001: the interpreter no longer writes gpn-user-d1 directly. Per-business filter_signals_json reaches the site only when gpn-data-publisher-worker reads that publication projection and projects it (see Step 5). The old workflow direct-write was removed.

Confirm a run landed:

curl -sS "$INTERP/admin/api/runs/<runId>" -H "authorization: Bearer $INTERP_TOKEN"
# or inspect projection_versions / enrichment_runs in gpn-data-d1

Step 5 — Configure & deliver the display/filter policy

The policy is the dashboard-configurable contract for what the site shows and how it filters (using yp-site shadcn filter templates). View / edit it:

# current policy (stored or default)
curl -sS "$INTERP/admin/api/display-policy" -H "authorization: Bearer $INTERP_TOKEN"

# override it (operator edit)
curl -sS -X PUT "$INTERP/admin/api/display-policy" \
  -H "authorization: Bearer $INTERP_TOKEN" -H 'content-type: application/json' \
  -d '{"policy":{"version":"v2","fields":[
        {"fieldKey":"displayName","label":"Name","surfaces":["list","detail"],"displayComponent":"field.display.name","filterTemplate":"filter.template.text-search","visible":true,"sortOrder":10},
        {"fieldKey":"rating","label":"Rating","surfaces":["list","filter"],"displayComponent":"field.display.rating","filterTemplate":"filter.template.numeric-range","visible":true,"sortOrder":60}
      ]}}'

Deliver to the public site via the publisher (the sole gpn-user-d1 writer). One call projects both the baseline rows and the enrichment overlay — field_visibility from the stored policy and business_listings.filter_signals_json from the publication projections:

PUBLISHER="https://publisher.mondial-it.nl"
curl -sS -X POST "$PUBLISHER/publish?project=<projectName>&dataset=<datasetId>&category=*" \
  -H "x-admin-token: $PUBLISHER_TOKEN"
# -> { ok:true, baseline:{...}, enrichment:{ fieldVisibilityRows:N, filterSignalBusinesses:M, policyFound:true } }
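The publish call takes its three inputs as query parameters, so a typo in the URL is the most likely failure. A minimal sketch that assembles the URL, assuming the project and dataset values are already URL-safe (no encoding is applied); publish_url is a hypothetical helper name, and the project and dataset values in the example are the ones used earlier in this runbook:

```shell
# Hypothetical helper: assemble the publisher URL from its parts.
# Usage: publish_url BASE PROJECT DATASET [CATEGORY]
publish_url() {
  base=$1
  project=$2
  dataset=$3
  # Default to all categories, as in the runbook's example call.
  category=${4:-*}
  # ${base%/} drops one trailing slash so the path is not doubled.
  printf '%s/publish?project=%s&dataset=%s&category=%s' \
    "${base%/}" "$project" "$dataset" "$category"
}

# Example:
# curl -sS -X POST "$(publish_url "$PUBLISHER" makelaars-groningen dataset-default)" \
#   -H "x-admin-token: $PUBLISHER_TOKEN"
```

Quote the command substitution as shown, so the shell does not try to glob the * or background on the & characters.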

The interpreter's old POST /admin/api/project-to-site is retired (returns 410). The interpreter still authors the policy (PUT /admin/api/display-policy above); the publisher delivers it.

Step 6 — Verify on the site

curl -s "https://gpn-yp-site-worker.mondial-it.workers.dev/?q=" | grep -oE 'Business status|Rating and reviews|Category and type'

The directory sidebar renders only the filter groups the policy marks visible on the filter surface; per-business filter_signals_json is available on each listing row for downstream filtering. Removing the field_visibility rows restores the default "show all filters" behavior.
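The grep in this step prints the labels it finds, which makes a missing group easy to overlook. A minimal sketch that inverts the check and lists the absent labels instead; missing_filter_groups is a hypothetical helper name, and the three labels are the ones from the grep above. A policy that hides a group on the filter surface will legitimately report it as missing:

```shell
# Hypothetical helper: read page HTML on stdin and print each expected
# filter-group label that does not appear in it. Returns non-zero if any
# label is missing.
missing_filter_groups() {
  html=$(cat)
  rc=0
  for label in "Business status" "Rating and reviews" "Category and type"; do
    case $html in
      *"$label"*) ;;                 # label found, nothing to report
      *) echo "$label"; rc=1 ;;      # label absent from the page
    esac
  done
  return "$rc"
}

# Example:
# curl -s "https://gpn-yp-site-worker.mondial-it.workers.dev/?q=" | missing_filter_groups
```

This is a plain substring match on the rendered HTML, so it cannot tell a sidebar heading from the same words appearing elsewhere on the page.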

Cleanup

back to false and redeploy.

Endpoint reference

| Worker | Endpoint | Auth | Purpose |
|---|---|---|---|
| google-places | POST /api/places/search | x-admin-token | seed agent_places |
| google-places | POST /api/dispatch-scrape | x-admin-token | enqueue domains to scraper |
| interpreter | POST /admin/api/migration/scraper-import-d1 | Bearer (maintenance:write) | ingest broker_* evidence |
| interpreter | GET/PUT /admin/api/display-policy | Bearer (admin:read / maintenance:write) | view/edit display+filter policy |
| interpreter | GET /admin/api/runs/{id} | Bearer (admin:read) | inspect an enrichment run |
| publisher | POST /publish?project=&dataset=&category= | x-admin-token | project baseline + enrichment into gpn-user-d1 (sole writer) |
| yp-site | GET / | public | renders filters from field_visibility |