The Challenge: Intelligence at Election Speed
Indian elections move fast. From nomination filing to polling day, the window to gather, analyse, and act on candidate intelligence can be as short as 14 days. For political consultancies managing constituencies across multiple states, this creates an impossible timeline.
Before CrawlHQ, a typical candidate vetting workflow looked like this:
- Day 1–3: Analysts manually download ECI affidavits as PDFs, one at a time
- Day 4–7: Data entry team transcribes criminal history, asset declarations, and liabilities from PDFs into spreadsheets
- Day 8–10: Cross-reference with news archives for any undisclosed cases or controversies
- Day 11–14: Review completed — often with 20–30% data gaps due to handwriting, poor scan quality, or time pressure
For a consultancy tracking 500 candidates across a state assembly election, this required a team of 12–15 analysts working 12-hour days for the entire filing period. Mistakes happened. Coverage was incomplete. The intelligence arrived too late to inform candidate selection decisions.
The Pipeline
With CrawlHQ, the same pipeline runs in 4 hours and produces more complete, more auditable data.
Stage 1: Discovery — Scrape the Nomination List
The ECI publishes candidate lists as they're filed. Using /v1/scrape with JavaScript rendering, the pipeline automatically retrieves the live nomination list for each constituency as new candidates file. The /v1/extract endpoint maps each listing to a structured schema: name, party, affidavit URL, nomination status.
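The discovery step can be sketched as a pair of request builders. The endpoint paths (/v1/scrape, /v1/extract) and the schema fields come from this article; the host, the `render_js` flag, and the exact request-body shape are assumptions for illustration, not documented API:

```python
# Stage 1 sketch: request payloads for scraping and structuring the
# nomination list. Field names below are illustrative assumptions.

BASE_URL = "https://api.crawlhq.example"  # placeholder host


def build_scrape_request(constituency_url: str) -> dict:
    """Body for a /v1/scrape call with JavaScript rendering enabled."""
    return {
        "url": constituency_url,
        "render_js": True,  # assumed name for the JS-rendering flag
    }


def build_extract_request(page_url: str) -> dict:
    """Body for a /v1/extract call mapping listings to the article's schema."""
    return {
        "url": page_url,
        "schema": {
            "name": "string",
            "party": "string",
            "affidavit_url": "string",
            "nomination_status": "string",
        },
    }


payload = build_extract_request("https://ceo.example.gov.in/nominations/AC-101")
print(sorted(payload["schema"]))
```

A real pipeline would POST these bodies to the two endpoints and loop as new candidates file; the sketch only shows the shape of the calls.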
Stage 2: Affidavit Extraction
For each accepted candidate, the pipeline fetches their Form 26 affidavit PDF. The extraction schema captures criminal cases (section, court, status, year), total declared assets for self and spouse, total liabilities, declared income, and educational qualifications. /v1/extract handles scanned PDFs — even handwritten ones — and returns a match_confidence score that flags documents requiring human review.
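The extraction schema and the confidence-based review gate described above might look like this. The field names mirror the article's list; the 0.8 review threshold is an assumed cut-off, not a documented default:

```python
# Stage 2 sketch: the affidavit extraction schema and a review gate on
# the match_confidence score returned by /v1/extract.

AFFIDAVIT_SCHEMA = {
    "criminal_cases": [
        {"section": "string", "court": "string", "status": "string", "year": "integer"}
    ],
    "assets_self": "number",
    "assets_spouse": "number",
    "total_liabilities": "number",
    "declared_income": "number",
    "education": "string",
}


def needs_human_review(record: dict, threshold: float = 0.8) -> bool:
    """Flag low-confidence extractions (handwritten or poorly scanned pages).

    The threshold is an assumption; tune it against your own review queue.
    """
    return record.get("match_confidence", 0.0) < threshold


print(needs_human_review({"match_confidence": 0.62}))  # handwritten scan
print(needs_human_review({"match_confidence": 0.95}))
```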
For a 500-candidate election, extraction runs in parallel batches and completes in under 2 hours. Every data point carries a source URL and timestamp — a full audit trail for any downstream review.
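The parallel-batch run with a per-record audit trail can be sketched with a thread pool. The extraction call is stubbed out here so the sketch runs offline; a real run would issue the HTTP request inside `extract_affidavit`:

```python
# Stage 2 sketch: 500 affidavits extracted in parallel, each record
# stamped with its source URL and retrieval time for the audit trail.
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


def extract_affidavit(affidavit_url: str) -> dict:
    # Stub standing in for a POST to /v1/extract.
    record = {"assets_self": None, "total_liabilities": None}
    record["source_url"] = affidavit_url            # audit: where it came from
    record["retrieved_at"] = datetime.now(timezone.utc).isoformat()  # audit: when
    return record


urls = [f"https://eci.example.gov.in/affidavits/{i}.pdf" for i in range(500)]

with ThreadPoolExecutor(max_workers=20) as pool:
    records = list(pool.map(extract_affidavit, urls))

print(len(records), all("source_url" in r and "retrieved_at" in r for r in records))
```

The worker count and batch shape are choices for the caller; the point is that every record carries its provenance fields from the moment it is created.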
Stage 3: News Cross-Reference
The /v1/search endpoint queries news archives for each candidate name, surfacing:
- Controversy coverage not in official records
- Party affiliation history and defections
- Previous election results and margins
- Endorsements or opposition from notable figures
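Fanning one candidate out into those four query families might look like the sketch below. The `query` and `freshness` parameter names are assumptions about the /v1/search request shape:

```python
# Stage 3 sketch: build one /v1/search request body per coverage topic
# for a candidate. Parameter names are illustrative, not documented API.

def build_search_queries(candidate: str, constituency: str) -> list:
    topics = ["controversy", "defection", "election result", "endorsement"]
    return [
        {
            "query": f'"{candidate}" {constituency} {topic}',
            "freshness": "12m",  # assumed: restrict to the last 12 months
        }
        for topic in topics
    ]


queries = build_search_queries("A. B. Candidate", "Mysuru")
print(len(queries))  # → 4
```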
Stage 4: Nomination Monitoring
After initial extraction, /v1/watch monitors the ECI portal every 30 minutes for nomination withdrawals and rejections. When status changes, the system updates the candidate database and alerts the relevant constituency team — no manual refresh required.
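A minimal sketch of the watch registration and the status-change handler, assuming /v1/watch takes a URL, a polling interval, and a webhook target (all field names here are guesses at the request shape):

```python
# Stage 4 sketch: register a watch on the ECI portal page and handle
# the status-change events it would deliver to a webhook.

WATCH_REQUEST = {
    "url": "https://ceo.example.gov.in/nominations/AC-101",
    "interval_minutes": 30,
    "webhook": "https://pipeline.example.com/hooks/nominations",  # assumed field
}


def handle_status_change(db: dict, event: dict):
    """Update the candidate database; return an alert message on any change."""
    candidate_id = event["candidate_id"]
    new_status = event["nomination_status"]
    if db.get(candidate_id) != new_status:
        db[candidate_id] = new_status
        return f"{candidate_id}: status changed to {new_status}"
    return None  # no change, no alert


db = {"AC101-007": "accepted"}
alert = handle_status_change(
    db, {"candidate_id": "AC101-007", "nomination_status": "withdrawn"}
)
print(alert)  # → AC101-007: status changed to withdrawn
```

In a real deployment the alert would be routed to the relevant constituency team's channel rather than printed.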
Results
| Data Point | Completeness (CrawlHQ) | Completeness (Manual) |
|---|---|---|
| Criminal cases | 94% | 70–80% |
| Total declared assets | 97% | 75–85% |
| Total liabilities | 95% | 70–80% |
| News mentions (12 months) | 100% | 60–70% |
| Party affiliation | 99% | 95% |
Speed: 4 hours vs. 3 weeks. Cost: roughly ₹17,000 in API credits vs. roughly ₹63 lakh in analyst time per election cycle. Every data point is auditable to a source URL and timestamp.
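The manual-cost figure is consistent with the team described earlier. Taking the midpoint of the 12–15 analyst team, 12-hour days, and the 3-week cycle from the speed comparison, the ₹63 lakh works out to an implied blended rate per analyst-hour (the midpoint and the 21-day count are assumptions for this back-of-envelope check):

```python
# Back-of-envelope check on the manual-cost figure, using only the
# numbers stated in the article plus two stated assumptions.

analysts = 14            # assumed midpoint of the 12-15 analyst team
hours_per_day = 12       # "12-hour days"
days = 21                # assumed: "3 weeks" from the speed comparison

total_hours = analysts * hours_per_day * days
manual_cost = 63_00_000  # ₹63 lakh

implied_rate = manual_cost / total_hours  # ₹ per analyst-hour
print(total_hours, round(implied_rate))   # → 3528 1786
```

About ₹1,800 per analyst-hour fully loaded, which is the scale at which the API-credit cost becomes a rounding error.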
What Gets Built on Top
With structured candidate data, consultancies have built:
- Candidate ranking dashboards that score by criminal record severity and wealth signals
- Voter-facing tools that show citizens a clean summary of every candidate's declared record
- Red-flag alerts for offences scheduled under Section 8 of the Representation of the People Act, 1951 (RPA)
- Trend analysis comparing declared wealth and criminal case patterns across election cycles
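The ranking idea reduces to a scoring function over the extracted records. This is a minimal sketch of one such score; the status weights are entirely illustrative, not a methodology any consultancy has published:

```python
# Sketch of a criminal-record severity score over extracted case records.
# The weights are illustrative assumptions, not a real scoring rubric.

def severity_score(cases: list) -> int:
    weights = {"convicted": 10, "charges framed": 5, "pending": 2}  # assumed
    return sum(weights.get(case["status"], 1) for case in cases)


def rank_candidates(candidates: list) -> list:
    """Least severe record first; extend the key with wealth signals as needed."""
    ranked = sorted(candidates, key=lambda c: severity_score(c["criminal_cases"]))
    return [c["name"] for c in ranked]


candidates = [
    {"name": "X", "criminal_cases": [{"status": "convicted"}]},
    {"name": "Y", "criminal_cases": []},
]
print(rank_candidates(candidates))  # → ['Y', 'X']
```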
Building something similar? Get API access →