TY - JOUR
T1 - Combining the strengths of Dutch survey and register data in a data challenge to predict fertility (PreFer)
AU - Sivak, Elizaveta
AU - Pankowska, Paulina
AU - Mendrik, Adriënne
AU - Emery, Tom
AU - Garcia-Bernardo, Javier
AU - Höcük, Seyit
AU - Karpinska, Kasia
AU - Maineri, Angelica
AU - Mulder, Joris
AU - Nissim, Malvina
AU - Stulp, Gert
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/4/13
Y1 - 2024/4/13
AB - The social sciences have produced an impressive body of research on determinants of fertility outcomes, or whether and when people have children. However, the strength of these determinants and underlying theories are rarely evaluated on their predictive ability on new data. This prevents us from systematically comparing studies, hindering the evaluation and accumulation of knowledge. In this paper, we present two datasets which can be used to study the predictability of fertility outcomes in the Netherlands. One dataset is based on the LISS panel, a longitudinal survey which includes thousands of variables on a wide range of topics, including individual preferences and values. The other is based on the Dutch register data which lacks attitudinal data but includes detailed information about the life courses of millions of Dutch residents. We provide information about the datasets and the samples, and describe the fertility outcome of interest. We also introduce the fertility prediction data challenge PreFer which is based on these datasets and will start in Spring 2024. We outline the ways in which measuring the predictability of fertility outcomes using these datasets and combining their strengths in the data challenge can advance our understanding of fertility behaviour and computational social science. We further provide details for participants on how to take part in the data challenge.
KW - Benchmark
KW - Data challenge
KW - Fertility
KW - Out-of-sample prediction
KW - Register data
KW - Survey data
UR - http://www.scopus.com/inward/record.url?scp=85190402603&partnerID=8YFLogxK
U2 - 10.1007/s42001-024-00275-6
DO - 10.1007/s42001-024-00275-6
M3 - Article
AN - SCOPUS:85190402603
SN - 2432-2717
JO - Journal of Computational Social Science
JF - Journal of Computational Social Science
ER -