3,000 statistically grounded synthetic Japanese personas — free and open on Hugging Face. Pre-test any question on the synthetic panel in seconds, then put the same question to real Japanese respondents. Every real answer is ground truth that sharpens the open dataset.
No signup: load_dataset() and go · Free under CC BY 4.0
A synthetic panel is a statistically grounded model of Japanese consumers — free, instant, ideal for wide early exploration. But a model is not the people themselves: the consumer-behavior layer is aligned with the direction of official statistics, not yet calibrated against real category-level answers. So this is an open project with two layers, run in a loop.
3,000 personas grounded in Japan's demographics and household-income distributions. Query, segment, and simulate in seconds.
Put the same question to real people when a decision rides on it — no full research project to stand up.
Every persona is grounded in public data with fixed seeds, so the whole pipeline reproduces. Base personas come from NVIDIA's census-grounded set; income and consumer behavior are conditioned on Japanese government statistics.
Stratified sample by age-band × sex to match national population proportions → 3,000 personas.
P(income | head-of-household age) combined with a prefecture income index — joint age × region conditioning.
Price sensitivity, brand orientation, purchase channels, and e-commerce (EC) adoption, each conditioned on income tier. Both poles of every scale are always kept, so personas are not homogenized.
A name-based, first-person life story per persona — attributes dissolved into a life, not a label list.
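The sampling and income steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the function names `stratified_sample` and `sample_income` are hypothetical, every number is a made-up placeholder rather than a real statistic, and scaling the bracket midpoint by the prefecture index is only one plausible way to combine the two sources.

```python
import random
from collections import Counter

def stratified_sample(pool, shares, n, seed=0):
    """Draw about n rows so each (age_band, sex) cell matches its target share."""
    rng = random.Random(seed)  # fixed seed so the draw reproduces
    by_cell = {}
    for row in pool:
        by_cell.setdefault((row["age_band"], row["sex"]), []).append(row)
    sample = []
    for cell, share in shares.items():
        k = round(n * share)  # rounding can leave the total off by a few rows
        sample.extend(rng.sample(by_cell[cell], k))
    return sample

def sample_income(age_band, prefecture, p_income_given_age, pref_index, rng):
    """Sample a bracket midpoint from P(income | age), then scale it by region.

    Multiplying by the prefecture index is an assumption made for this sketch.
    """
    brackets, probs = zip(*p_income_given_age[age_band].items())
    midpoint = rng.choices(brackets, weights=probs, k=1)[0]
    return midpoint * pref_index[prefecture]

# Placeholder numbers for illustration only, not real statistics.
SHARES = {("30s", "F"): 0.25, ("30s", "M"): 0.25,
          ("60s", "F"): 0.25, ("60s", "M"): 0.25}
P_INCOME = {"30s": {300: 0.5, 600: 0.4, 1000: 0.1},
            "60s": {300: 0.6, 600: 0.3, 1000: 0.1}}
PREF_INDEX = {"Tokyo": 1.2, "Okinawa": 0.8}

# A toy candidate pool: 100 rows per (age_band, sex) cell.
pool = [{"age_band": a, "sex": s, "prefecture": p}
        for (a, s) in SHARES for p in PREF_INDEX for _ in range(50)]

panel = stratified_sample(pool, SHARES, n=40, seed=42)
rng = random.Random(42)
for row in panel:
    row["income_manyen"] = sample_income(
        row["age_band"], row["prefecture"], P_INCOME, PREF_INDEX, rng)

print(Counter((r["age_band"], r["sex"]) for r in panel))
```

Because both random generators are seeded, rerunning the sketch yields the same panel, which is the property the fixed-seed design relies on.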
uuid, sex, age, age_band, marital_status, education_level, occupation, prefecture, region, persona, hobbies_and_interests, skills_and_expertise …and more
household_income_bracket, household_income_midpoint_manyen, income_tier, household_income_source
price_sensitivity, brand_orientation, promotion_responsiveness, bulk_buy_tendency, ec_adoption, primary_purchase_channels, media_contact, disposable_income_feel
backstory_250w: a ~220–260 char first-person account of daily life (avg 278.8 chars).
Condition an LLM on a demographic list alone and it drifts to the population average and reproduces stereotypes, collapsing the diversity that simulation depends on. A concrete life story conditions the model far more richly. This is a design choice with a research basis.
Conditioning a model on detailed real backstories lets it emulate the response distributions of many human subgroups — the basis of “silicon sampling.”
Open-ended, naturalistic backstories yield more consistent and representative virtual personas — up to +18% representativeness and +27% consistency on Pew benchmarks.
Grounding an agent in a person's own first-person interview predicts that individual's real survey answers at ~85% of their own test–retest reliability.
Column names are English; values are native Japanese (with a full JA→EN reference in the data card, so it's usable without reading Japanese).
from datasets import load_dataset
ds = load_dataset("furuchanchan/japan-synthetic-personas", split="train")
print(len(ds), "personas") # 3000
print(ds[0]["backstory_250w"]) # first-person narrative (Japanese)
# segment: women in their 30s, high income, high EC adoption
seg = ds.filter(
    lambda r: r["sex"] == "女"          # 女 = female
    and r["age_band"] == "30代"         # 30代 = thirties
    and r["income_tier"] == "high"
    and r["ec_adoption"] == "高"        # 高 = high
)
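Once a segment is filtered, a natural next step is to see how one behavior column is distributed within it. The helper below is a small sketch: `column_shares` is a hypothetical name, and the toy rows and their values are placeholders standing in for real dataset rows. It accepts any iterable of row dicts, so a filtered segment should work in place of the toy list.

```python
from collections import Counter

def column_shares(rows, column):
    """Return {value: share} for one column over an iterable of row dicts."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty segment: nothing to summarize
    return {value: count / total for value, count in counts.items()}

# Toy rows standing in for a filtered segment; the values are placeholders.
toy_segment = [
    {"price_sensitivity": "高"},
    {"price_sensitivity": "高"},
    {"price_sensitivity": "中"},
    {"price_sensitivity": "低"},
]

shares = column_shares(toy_segment, "price_sensitivity")
print(shares)
```

Comparing these shares across segments (for example, by `income_tier`) is a quick check that both poles of a scale are still represented.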
Load the dataset and pull a demographic segment in 30 seconds.
Run an LLM-driven concept test over the personas — the core use case.
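A concept test over the personas might look roughly like the sketch below. It is an assumption-laden illustration, not the project's notebook: `build_prompt`, `run_concept_test`, and `fake_ask` are hypothetical names, the prompt wording is invented, and `ask` stands for whatever LLM client you plug in (here replaced by a stub so the example runs offline).

```python
from collections import Counter

def build_prompt(persona, question, options):
    """Condition the model on the persona's first-person backstory."""
    choices = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    return (
        "You are the person described below. Answer as this person would.\n\n"
        f"{persona['backstory_250w']}\n\n"
        f"Question: {question}\n{choices}\n"
        "Reply with the number of one option only."
    )

def run_concept_test(personas, question, options, ask):
    """Ask every persona the question and tally the chosen options.

    `ask` is any callable mapping a prompt string to a reply string.
    """
    tally = Counter()
    for persona in personas:
        reply = ask(build_prompt(persona, question, options)).strip()
        if reply.isdigit() and 1 <= int(reply) <= len(options):
            tally[options[int(reply) - 1]] += 1
        else:
            tally["(unparsed)"] += 1  # keep malformed replies visible
    return tally

def fake_ask(prompt):
    """Stub for a real model call: picks option 1 if the prompt mentions saving."""
    return "1" if "節約" in prompt else "2"

# Toy personas with placeholder backstories, not rows from the dataset.
toy_personas = [
    {"backstory_250w": "基本は節約を心がけています。"},
    {"backstory_250w": "話題のお店にはよく行きます。"},
]

result = run_concept_test(
    toy_personas, "Would you try this product?", ["No", "Yes"], fake_ask)
print(result)
```

Passing the model call in as a parameter keeps the tallying logic testable without network access, and unparsed replies are counted rather than silently dropped.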
The synthetic panel is free and instant. When a decision rides on it, you ask real Japanese respondents — and this is what comes back: anonymized, structured, with English translations. Below is a live sample from an actual survey (N=115).
基本は節約、衣服には流行があるし、家電も当たり外れがあるので安さ優先。ただ、友人とランチに行ったり、ご近所づきあいには、必要以上にケチりたくない。話題のお店に一度は行く、どうしても食べたいものは少々高くても食べる、くらいの贅沢はする
I basically economize: clothes go in and out of fashion and appliances can be hit or miss, so I prioritize low prices. But I don't want to be stingier than necessary about lunches with friends or keeping up with the neighbors. I allow myself small luxuries, like visiting a talked-about restaurant at least once or eating something I really want even if it costs a bit more.
コスパを重視し、事前に口コミを調べてから購入します。
Focus on cost performance and check reviews before purchasing.
推し活
Supporting my favorite idols/creators (oshi-katsu)
常にお金の計算をしながら買う
I'm always doing the math on my money as I shop.
This is an open initiative. Take the free data and build with it, ask real Japanese respondents when it counts, or contribute answers and expertise that make the open dataset more accurate for everyone.
Download 3,000 personas under CC BY 4.0 — commercial and research use welcome. Reproduction code and the full data card are on GitHub.
Put your question to real Japanese respondents. Target by age, gender, region, occupation, education and income; get individual answers plus segment aggregates. Every answer also calibrates the open dataset.
No charge to ask — we confirm scope & price first, and send a secure link only if you proceed.
Have real survey data, a panel, or research to contribute? Or a question about the method? Bring it to the community — contributions that sharpen calibration are credited.
3,000 grounded synthetic Japanese consumers, open under CC BY 4.0. Pre-test on synthetic — then ask real people when the decision rides on it.
Facing a real decision? Ask real Japanese people →