Storage model
The YP site worker uses a Cloudflare D1 database bound as DB (named gpn-user-d1, the shared user-facing database). D1 is SQLite at the edge — queries run inside the Worker with no network hop. The schema is defined in schema.sql at the worker root.
Tables
source_batches
Tracks each data import into the system. Every business record links back to the batch that created it.
id— primary keysource_name— human label (e.g. "Google Places import Q2")source_kind— type of source (GOOGLE_PLACES, MANUAL, SCRAPER)status— PENDING, PROCESSING, COMPLETE, FAILEDrecord_count— number of records in the batchuploaded_at,processed_at— lifecycle timestamps
businesses
The core business identity table. One row per business.
id,slug— unique identifiers (slug used in URLs)domain— unique website domainbusiness_name,entity_name— display and legal namesprimary_category,business_type— classificationstatus— ACTIVE, INACTIVE, BLOCKED, UNVERIFIEDcity,region,country_code,latitude,longitude— locationwebsite_url,phone_number,email_address— contactfeatured_rank— promoted placement (null = organic)search_tokens_json— pre-computed search tokens for fast matchingsource_batch_id— FK to the batch that created this record
business_profiles
Extended profile data, one-to-one with businesses.
summary,description— text contentservices_json,hours_json,highlights_json— structured JSON fieldstrust_json— verification and trust signalsseo_title,seo_description,canonical_url— SEO metadatahero_image_url,logo_image_url,cover_image_url— media
business_contents
Page-level content extracted from crawled websites. Many-to-one with businesses.
content_type— PAGE, REVIEW, FAQ, SERVICEsource_url— the crawled URLmarkdown_content— cleaned contentextracted_json— structured data extracted by the scrapercontent_hash— deduplication hashverified— whether content has been reviewed
business_listings
Pre-computed listing cards for the search page. Many-to-one with businesses.
title,subtitle,snippet— display textlocation_label,status_label,category_label— badge/chip labelsrating_label,review_count_label,distance_label— metricspromoted— boolean for featured placementsort_rank— pre-computed sort orderfilter_signals_json— pre-indexed filter values for fast filtering
leads
Captured leads from the conversion flow.
full_name,email_address,phone_number— contact inforegion,country_code— locationbudget_value,needs_financing— qualification fieldsbusiness_matches_json— which businesses the lead was viewingsource_business_id— FK to the originating business page
Data flow
Google Places API
-> gpn-google-places-worker (resolve, normalize)
-> source_batches row created
-> businesses rows inserted
-> gpn-domain-scraper-worker (crawl websites)
-> business_contents rows inserted
-> business_profiles updated with extracted data
-> business_listings rows computed for search display
The search page loader queries businesses and business_listings with joins. The detail page loader queries businesses joined to business_profiles and business_contents.
Indexes
D1 automatically indexes primary keys and unique columns. The slug and domain columns on businesses are unique-indexed for fast lookups. For full-text search, the loader currently uses LIKE queries on business_name — Vectorize will replace this with semantic search.