TY - JOUR
T1 - To interact or not to interact
T2 - The pros and cons of including interactions in linear regression models
AU - Rimpler, Aljoscha
AU - Kiers, H.A.L.
AU - van Ravenzwaaij, Don
N1 - Publisher Copyright: © 2025. The Author(s).
PY - 2025/3
Y1 - 2025/3
AB - Interaction effects are very common in the psychological literature. However, interaction effects are typically very small and often fail to replicate. In this study, we conducted a simulation comparing the generalizability and estimability of two linear regression models: one correctly specified to account for interaction effects and one misspecified including simple effects only. We manipulated noise levels, predictor variable correlations, and different sets of regression weights, resulting in 9216 different conditions. From each dataset, we drew 1000 samples of N = 25, 50, 100, 250, 500, and 1000, resulting in a total of 55,296,000 analyses for each model. Our results show that misspecification can drastically bias regression estimates, sometimes leading to zero or reversed simple effects. Furthermore, we found that when models are generalized to the entire population, the difference between the explained variance in the sample and in the population is often smaller for the misspecified model than for the correctly specified model. However, the comparison between models shows that the correctly specified model explains the data at the population level better overall. These results emphasize the importance of theory in modeling choices and show that it is important to provide a rationale for why interactions are included or excluded in an analysis.
KW - Generalizability
KW - Moderation
KW - Statistical interactions
KW - Theory
UR - http://www.scopus.com/inward/record.url?scp=85218358893&partnerID=8YFLogxK
U2 - 10.3758/s13428-025-02613-6
DO - 10.3758/s13428-025-02613-6
M3 - Article
C2 - 39920441
AN - SCOPUS:85218358893
SN - 1554-351X
VL - 57
JO - Behavior Research Methods
JF - Behavior Research Methods
IS - 3
M1 - 92
ER -