D1 database schema · gpn-yp-site-worker

Storage model

The YP site worker uses a Cloudflare D1 database bound as DB (named gpn-user-d1, the shared user-facing database). D1 is SQLite at the edge — queries run inside the Worker with no network hop. The schema is defined in schema.sql at the worker root.

Tables

source_batches

Tracks each data import into the system. Every business record links back to the batch that created it.

id — primary key
source_name — human label (e.g. "Google Places import Q2")
source_kind — type of source (GOOGLE_PLACES, MANUAL, SCRAPER)
status — PENDING, PROCESSING, COMPLETE, FAILED
record_count — number of records in the batch
uploaded_at, processed_at — lifecycle timestamps

businesses

The core business identity table. One row per business.

id, slug — unique identifiers (slug used in URLs)
domain — unique website domain
business_name, entity_name — display and legal names
primary_category, business_type — classification
status — ACTIVE, INACTIVE, BLOCKED, UNVERIFIED
city, region, country_code, latitude, longitude — location
website_url, phone_number, email_address — contact
featured_rank — promoted placement (null = organic)
search_tokens_json — pre-computed search tokens for fast matching
source_batch_id — FK to the batch that created this record

business_profiles

Extended profile data, one-to-one with businesses.

summary, description — text content
services_json, hours_json, highlights_json — structured JSON fields
trust_json — verification and trust signals
seo_title, seo_description, canonical_url — SEO metadata
hero_image_url, logo_image_url, cover_image_url — media

business_contents

Page-level content extracted from crawled websites. Many-to-one with businesses.

content_type — PAGE, REVIEW, FAQ, SERVICE
source_url — the crawled URL
markdown_content — cleaned content
extracted_json — structured data extracted by the scraper
content_hash — deduplication hash
verified — whether content has been reviewed

business_listings

Pre-computed listing cards for the search page. Many-to-one with businesses.

title, subtitle, snippet — display text
location_label, status_label, category_label — badge/chip labels
rating_label, review_count_label, distance_label — metrics
promoted — boolean for featured placement
sort_rank — pre-computed sort order
filter_signals_json — pre-indexed filter values for fast filtering

leads

Captured leads from the conversion flow.

full_name, email_address, phone_number — contact info
region, country_code — location
budget_value, needs_financing — qualification fields
business_matches_json — which businesses the lead was viewing
source_business_id — FK to the originating business page

Data flow

Google Places API
  -> gpn-google-places-worker (resolve, normalize)
    -> source_batches row created
    -> businesses rows inserted
      -> gpn-domain-scraper-worker (crawl websites)
        -> business_contents rows inserted
        -> business_profiles updated with extracted data
          -> business_listings rows computed for search display

The search page loader queries businesses and business_listings with joins. The detail page loader queries businesses joined to business_profiles and business_contents.

Indexes

D1 automatically indexes primary keys and unique columns. The slug and domain columns on businesses are unique-indexed for fast lookups. For full-text search, the loader currently uses LIKE queries on business_name — Vectorize will replace this with semantic search.