Beyond the Buzz: 4 Core Ways Vendors Build “Synthetic” Survey Respondents

Written by:
Michael Hess
May 11, 2025

The synthetic-respondent gold rush is here. Every week a new platform claims its AI can replace—or at least super-charge—traditional sample. But how those respondents are actually generated makes a night-and-day difference in data quality. Below are the four dominant build-methods you’ll meet on the market, followed by a look at “agentic” add-ons that some vendors layer on top.
1. Rule-Based Personas
Oldest, simplest, and easiest to spot:
- Engine: hard-coded if/then decision trees tied to a static profile (a minimal sketch follows this list).
- Upside: 100% explainable and fast for demos.
- Watch-out: zero surprise or nuance; answers stay glued to the stereotype the designer wrote. Conjointly’s critique calls these offerings “homeopathy for market research.”
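
To ground the description, here is a minimal sketch of a rule-based persona, assuming a hypothetical profile and question format; the names and rules are illustrative, not any vendor’s implementation:

```python
# A minimal sketch of a rule-based persona: hard-coded if/then rules
# keyed off a static profile. All names and rules are illustrative.

PROFILE = {"age": 45, "role": "procurement manager", "budget_conscious": True}

def answer(question: str) -> str:
    """Route the question through hand-written rules tied to the profile."""
    q = question.lower()
    if "price" in q or "cost" in q:
        # Budget-conscious profiles always flag price sensitivity.
        return "Very important" if PROFILE["budget_conscious"] else "Somewhat important"
    if "brand" in q:
        return "I prefer established vendors."
    # No matching rule: the persona falls back to a canned neutral answer,
    # which is exactly the "zero surprise or nuance" failure mode above.
    return "Neutral"

print(answer("How important is price in your purchase decision?"))
```

Note how every answer is fully traceable to a rule, which is the upside, and how the fallback exposes the lack of nuance, which is the watch-out.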
2. Generative-LLM Simulations
Prompt a large language model (GPT-4, Claude 3, etc.) to “act like a 45-year-old procurement manager” and let it complete the survey; a minimal sketch of the pattern follows the bullets.
- Evidence of value: Academic work shows LLM-generated brand perceptual maps overlap with human maps by ~75% (Harvard Business School).
- Gaps: Fumbles on domain-specific jargon, recent events, or truly niche B2B topics. Bias mirrors whatever is over-represented in the training data.
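
For illustration, here is a minimal sketch of this prompting pattern using the OpenAI Python SDK; the model name, persona wording, and temperature are assumptions for the example, not any vendor’s recipe:

```python
# A minimal sketch of a generative-LLM respondent via the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

persona = "You are a 45-year-old procurement manager at a mid-size manufacturer."
question = "How likely are you to switch suppliers for a 5% price cut? (1-5, then explain)"

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "system", "content": persona},
        {"role": "user", "content": question},
    ],
    temperature=0.7,  # some variance so repeated "respondents" don't collapse to one answer
)
print(response.choices[0].message.content)
```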
3. Reinforcement-Learned Respondents
Here the synthetic respondent learns through reward signals, e.g., “match hold-out human answers” or “fool a discriminator” (a toy reward sketch follows the bullets).
- Proof-point: Meta’s CICERO combined RL with a language model to rank in the top 10% of human players in the negotiation game Diplomacy (Science).
- Status: Impressive for goal-directed, consistent dialogue, but still experimental for large-scale survey work (requires big benchmark datasets to train the reward).
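
To show the core idea without the heavy machinery, here is a toy sketch of the “match hold-out human answers” reward; a real system would feed this signal into a policy-gradient update rather than the brute-force candidate search used here:

```python
# Toy sketch of an RL-style reward for synthetic respondents:
# score a candidate answer by agreement with held-out human answers.

def reward(candidate: str, holdout_answers: list[str]) -> float:
    """Fraction of held-out humans who gave the same answer."""
    return sum(a == candidate for a in holdout_answers) / len(holdout_answers)

holdout = ["Agree", "Agree", "Neutral", "Agree", "Disagree"]  # invented data

# A trained policy would be updated from this signal; here we just
# pick the highest-reward answer from a fixed candidate set.
candidates = ["Agree", "Neutral", "Disagree"]
best = max(candidates, key=lambda c: reward(c, holdout))
print(best, reward(best, holdout))  # -> Agree 0.6
```

This also makes the “big benchmark datasets” caveat concrete: without a large, representative hold-out pool, the reward has nothing trustworthy to optimise against.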
4. Data-Driven Digital Twins
A synthetic clone built from an individual’s own historical surveys, interviews, CRM logs, or behavioral exhaust.
- Stanford’s “Generative Agent Simulations of 1,000 People” interviewed 1,052 participants and reproduced each person’s General Social Survey answers with 85% test–retest fidelity, on par with asking the same human two weeks later (arXiv).
- Toluna’s HarmonAIze Personas uses panel data from 19 million members to build distinct, memory-rich twins that “reason and express emotion” (Business Wire).
- Trade-off: Requires rich first-party data, but delivers the highest realism and traceability (a retrieval-grounded sketch follows this list).
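
As a rough illustration of how a 1:1 twin grounds answers in one person’s own history, here is a naive retrieval-augmented sketch; the keyword overlap stands in for the embedding search a production system would use, and all data is invented:

```python
# Naive sketch of a 1:1 digital twin: retrieve the respondent's own past
# answers and fold them into the prompt, so predictions are grounded in
# actual behavior rather than a population stereotype.

HISTORY = [  # one person's consented historical Q&A (invented)
    ("How do you evaluate new suppliers?", "I run a 90-day pilot before committing."),
    ("What matters most in a vendor?", "Reliability over price, every time."),
]

def retrieve(question: str, k: int = 1) -> list[tuple[str, str]]:
    """Return the k past Q&A pairs sharing the most words with the new question."""
    words = set(question.lower().split())
    ranked = sorted(HISTORY, key=lambda qa: -len(words & set(qa[0].lower().split())))
    return ranked[:k]

def twin_prompt(question: str) -> str:
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in retrieve(question))
    return (
        "Answer as this specific respondent, consistent with their past answers:\n"
        f"{context}\n\nNew question: {question}"
    )

print(twin_prompt("What matters most when you pick a vendor?"))
```

Because every prediction is anchored to retrievable history, this design is also what makes line-by-line validation and audit-ready provenance possible, as discussed below.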

Understanding Digital Twins
Marketing copy loves the term “digital twin,” but most products using it are really just aggregated look-alikes masquerading as individual humans. Only true 1-to-1 twins, models built from the complete, consented data exhaust of a single respondent, can:
- Capture the quirks that shape real decisions.
- Let you validate answers line-by-line against historical surveys.
- Slash stereotype drift by grounding every prediction in actual human behavior.
- Provide audit-ready provenance for regulated studies.

Layering on “Agentic” Abilities
Autonomy (memory, planning, even teamwork) can sit on top of any build-method. Here are three popular add-ons; a minimal memory sketch follows the list:
- Add vector-store memory + a scheduler → the twin remembers yesterday’s answers and evolves over time (see Stanford’s “Smallville” agents).
- Give the agent a goal plus an RL planner → multiple synthetic stakeholders can bargain like a real buying committee (CICERO in Diplomacy).
- Schedule web-scrapes + re-prompts → personas read the news and update opinions without re-fielding (PersonaPanels’ “living” segments).
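
Here is a minimal sketch of the first add-on, vector-store memory; the bag-of-words “embedding” is a deliberate stand-in for the real embedding model and vector database an actual agent would use:

```python
# Minimal sketch of agent memory: log each answer, retrieve relevant
# memories next session so the persona stays consistent over time.
from collections import Counter

memory: list[tuple[str, str]] = []  # (question, answer) log across sessions

def embed(text: str) -> Counter:
    """Stand-in for an embedding: a bag-of-words vector."""
    return Counter(text.lower().split())

def similarity(a: Counter, b: Counter) -> int:
    return sum((a & b).values())  # word overlap as a crude similarity score

def remember(question: str, answer: str) -> None:
    memory.append((question, answer))

def recall(question: str, k: int = 2) -> list[tuple[str, str]]:
    q_vec = embed(question)
    return sorted(memory, key=lambda qa: -similarity(q_vec, embed(qa[0])))[:k]

remember("How satisfied are you with Vendor X?", "Satisfied, though delivery slipped.")
# Next session: the agent sees its earlier answer and can stay consistent.
print(recall("Are you still satisfied with Vendor X?"))
```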

Key takeaway: agentic layers magnify whatever realism (or bias) is already inside the model. A 1:1 twin with memory becomes a living, evolving customer clone. A rule-based bot with memory just remembers its stereotypes.
Bottom Line for Insight Teams
- Always ask how the answers are generated. Methodology matters more than marketing.
- Prioritise data provenance. The closer the model is to real, individual data, the higher the signal.
- Validate before you trust. Demand test–retest or hold-out benchmarks.
- Use agentic layers as a force-multiplier, not as a crutch.
Synthetic respondents are only as good as the data—and modelling discipline—behind them. For high-stakes B2B work, pricing studies, or anything where nuance trumps averages, true 1:1 digital twins are the safest synthetic seatbelt you can buy. Add agentic layers once you trust the driver; otherwise you’re just giving bad data a longer leash.
Curious how to turn these insights into concrete brand value? Download our new white paper, “Positioning Synthetic, Agentic B2B Audiences,” for frameworks, case studies, and ROI benchmarks you can take straight to your next stakeholder meeting. Get your copy here.