Define a Schema. Get Back Structured Data.
Tell CrawlHQ which fields you want and point it at any URL. You get back a JSON object that matches your schema on every run. No regex, no XPath, no fragile selectors.
What makes it production-grade
Every module is built for pipelines that run without you watching.
Schema-Defined Output
You define exactly which fields you want. CrawlHQ returns a JSON object matching your schema, including nested objects, arrays, and enums.
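As a rough illustration of what such a schema could look like, here is a minimal sketch written as a JSON Schema style Python dict. The JSON Schema format, the field names, and the `schema_to_json` helper are assumptions made for this example; the schema format CrawlHQ actually accepts is not specified here.

```python
import json

# Hypothetical product schema in JSON Schema style.
# Assumption: CrawlHQ's real schema format may differ from this.
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "sku": {"type": "string"},
        # Nested object: price split into amount and currency.
        "price": {
            "type": "object",
            "properties": {
                "amount": {"type": "number"},
                "currency": {"type": "string"},
            },
            "required": ["amount", "currency"],
        },
        # Array of strings.
        "tags": {"type": "array", "items": {"type": "string"}},
        # Enum: only these three values are allowed.
        "availability": {"enum": ["in_stock", "out_of_stock", "preorder"]},
    },
    "required": ["name", "price"],
}


def schema_to_json(schema: dict) -> str:
    """Serialise a schema dict so it can be placed in a request body."""
    return json.dumps(schema, indent=2, sort_keys=True)
```

Keeping the schema as plain data means it can be versioned alongside the pipeline code and reused for checking results on your side.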
LLM-Powered Extraction
No fragile CSS selectors or XPath. Extraction relies on semantic understanding of the page, so it keeps working when a site is redesigned or its content is reorganised.
Source Attribution
Every extracted field traces back to a source URL and timestamp. Full audit trail — know exactly where every data point came from.
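One way to picture field-level attribution is a value that carries its own provenance. The sketch below is illustrative only: the `AttributedField` class and the names `source_url` and `extracted_at` are hypothetical and do not describe CrawlHQ's actual response format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AttributedField:
    """An extracted value plus where and when it was found (hypothetical shape)."""
    value: object
    source_url: str         # assumed field name
    extracted_at: datetime  # assumed field name


def audit_trail(record: dict[str, AttributedField]) -> list[str]:
    """Return one human-readable line per field, sorted by field name."""
    return [
        f"{name}: {field.value!r} from {field.source_url} "
        f"at {field.extracted_at.isoformat()}"
        for name, field in sorted(record.items())
    ]
```

A trail like this can be written to a log or stored next to the record so any data point can be traced back later.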
PDF Support
Extract structured data from PDF documents such as Election Commission of India (ECI) affidavits, financial reports, government filings, and contracts, using the same schema-driven approach.
Batch Extraction
Pass an array of URLs and get back an array of structured objects. Extract from 100 product pages in a single API call.
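A batch call could be prepared and unpacked along these lines. This is a sketch under stated assumptions: the payload keys `urls` and `schema`, both helper functions, and the idea that results come back in the same order as the input URLs are all hypothetical, not documented CrawlHQ behaviour.

```python
def build_batch_request(urls: list[str], schema: dict) -> dict:
    """Build a request body for a batch of URLs (hypothetical payload shape)."""
    if not urls:
        raise ValueError("at least one URL is required")
    return {"urls": list(urls), "schema": schema}


def pair_results(urls: list[str], results: list[dict]) -> dict[str, dict]:
    """Map each URL to its extracted object.

    Assumption: results are returned in the same order as the input URLs.
    """
    if len(urls) != len(results):
        raise ValueError("expected one result per URL")
    return dict(zip(urls, results))
```

Checking the lengths before zipping avoids silently dropping results if the response is shorter than the request.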
Schema Validation
Output is validated against your schema before it is returned. You get a match_confidence score and field-level validation results.
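In a pipeline, these signals can gate what gets stored and what goes to manual review. The sketch below assumes `match_confidence` is a number between 0 and 1 and that field-level results arrive as a `field_validation` mapping of field name to pass or fail. Both the scale and the `field_validation` key are assumptions for illustration.

```python
def failed_fields(result: dict) -> list[str]:
    """Names of fields that did not validate (assumed `field_validation` mapping)."""
    checks = result.get("field_validation", {})
    return sorted(name for name, ok in checks.items() if not ok)


def needs_review(result: dict, min_confidence: float = 0.9) -> bool:
    """Flag a result when confidence is low or any field failed validation.

    Assumption: `match_confidence` is on a 0 to 1 scale; a missing score
    is treated as 0 so it is always reviewed.
    """
    low_confidence = result.get("match_confidence", 0.0) < min_confidence
    return low_confidence or bool(failed_fields(result))
```

The 0.9 threshold is an arbitrary starting point; tune it against a sample of results you have checked by hand.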
Use Cases
What teams build with extract
Competitor Pricing Intelligence
Define a pricing schema. Point at 20 competitor pricing pages. Get back a structured comparison table — automatically, daily.
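Once the extracted objects are in hand, turning them into a comparison table takes only a few lines. The record fields `competitor`, `plan`, and `monthly_price_usd` below are hypothetical examples of what a pricing schema might define.

```python
import csv
import io


def pricing_table(records: list[dict]) -> str:
    """Render extracted pricing records as CSV, cheapest plan first."""
    fields = ["competitor", "plan", "monthly_price_usd"]  # hypothetical schema fields
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    # Sort by price, then by competitor name for a stable order.
    ordered = sorted(records, key=lambda r: (r["monthly_price_usd"], r["competitor"]))
    for record in ordered:
        writer.writerow({key: record[key] for key in fields})
    return buffer.getvalue()
```

Running this on each day's results and diffing the output is a simple way to spot price changes.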
ECI Affidavit Data Extraction
Extract candidate criminal history, declared assets, and liabilities from public ECI PDF affidavits. 500 candidates processed in minutes.
Product Catalogue Scraping
Define a product schema with name, price, SKU, availability. Extract from any e-commerce site — even JS-rendered ones.
Job Listing Extraction
Extract structured job data — title, company, salary, requirements, location — from any job board. Build market intelligence tools.
Lead Data Enrichment
Point at a company's About page. Extract company size, founding year, tech stack, leadership team. Feed into your CRM automatically.
Financial Filing Analysis
Extract key metrics from annual reports, quarterly results, and investor presentations. Structure unstructured financial data at scale.