In the field of neurobiology of language, neuroimaging studies are generally based on stimulation paradigms consisting of at least two different conditions. Designing those paradigms can be very time-consuming and this traditional approach is necessarily data-limited. In contrast, in computational and corpus linguistics, analyses are often based on large text corpora, which allow a vast variety of hypotheses to be tested by repeatedly re-evaluating the data set. Furthermore, text corpora also allow exploratory data analysis in order to generate new hypotheses. By drawing on the advantages of both fields, neuroimaging and computational corpus linguistics, we here present a unified approach combining continuous natural speech and MEG to generate a corpus of speech-evoked neuronal activity.