Measurement of speech parameters in casual speech of dementia patients

Roelant Ossewaarde, Roel Jonkers, Fedor Jalvingh, Yvonne Bastiaanse

    Research output: Contribution to conference; Abstract; Academic


    Roelant Adriaan Ossewaarde1,2, Roel Jonkers1, Fedor Jalvingh1,3, Roelien Bastiaanse1

    1CLCG, University of Groningen (NL); 2HU University of Applied Sciences Utrecht (NL); 3St. Marienhospital - Vechta, Geriatric Clinic Vechta (DE)


    Individuals with dementia often experience a decline in their ability to use language. Language problems have been reported in individuals with dementia caused by Alzheimer’s disease, Parkinson’s disease or degeneration of the fronto-temporal area.

    Acoustic properties are relatively easy to measure with software, which promises a cost-effective way to analyze larger discourses. We study the usefulness of acoustic features to distinguish the speech of German-speaking controls and patients with dementia caused by (a) Alzheimer’s disease, (b) Parkinson’s disease or (c) PPA/FTD. Previous studies have shown that each of these types affects speech parameters such as prosody, voice quality and fluency (Schulz 2002; Ma, Whitehill, and Cheung 2010; Rusz et al. 2016; Kato et al. 2013; Peintner et al. 2008).

    Prior work on the characteristics of the speech of individuals with dementia is usually based on samples from clinical tests, such as the Western Aphasia Battery or the Wechsler Logical Memory task. Spontaneous day-to-day speech may be different, because participants may show less of their vocal abilities in casual speech than in specifically designed test scenarios. It is unclear to what extent the previously reported speech characteristics are still detectable in casual conversations by software.

    The research question in this study is: how useful are acoustic properties measured in spontaneous speech for classification?

    Methods

    Participant recruitment and data

    The speech data used in this study were collected during a larger study of the processing of verbs and nouns in speakers with different types of dementia, currently being conducted by one of the co-authors (FJ). Participant recruitment, data elicitation and manual CLAN-annotation were performed in the context of that study. Spontaneous speech fragments were elicited from German controls (n=7) and patients with a clinical diagnosis of a form of dementia: (probable) Alzheimer’s disease (AD, n=9), PPA (n=3), bvFTD (n=4), Parkinson’s disease (PD, n=6), PD with MCI (n=4), PD with dementia (n=3). In this study, only data on controls and participants diagnosed with PPA, AD or PD are reported.

    For each participant, discourses on three different topics (past, present, future) were elicited. Because the ultimate goal of the larger study is to track the long-term decline of the linguistic system in non-controls, the elicitation of the three topics was repeated three times with non-controls, with about 6 months between elicitation sessions.

    Narrative sampling

    The interviewer asked participants in separate sessions to speak of childhood memories (topic: past), of a typical day in the present (topic: present), and of plans that they might have for the next week, month or year (topic: future).

    Elicitation was done in the participant’s own environment. This affects recording quality: background noises are present in the signal, such as children playing, telephones ringing and papers being shuffled. Segments of the interviewer giving instructions or asking questions were removed, but only if they were of significant length and truly interrupted the flow of the participant’s discourse. This was judged by the researcher (RO). The resulting discourses (μ=6m47s; σ=3m30s) are long enough that minor utterances by the interviewer to move the discourse along (hmm-hmm, oh yeah, etc.) do not significantly affect the analysis of the speaker’s voice characteristics.

    Acoustic feature extraction

    Audio recordings were analyzed for voice activity using an unsupervised learning framework (Ying et al. 2011), and for pitch using an automatic pitch extraction algorithm implemented in REAPER[1].
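    As an illustration of how voice-activity output can feed the fluency measures, the sketch below derives pause length, pause frequency and speech duration from a list of detected speech segments. It is a minimal sketch, not the authors' pipeline: the function name `pause_features`, the (start, end) segment format and the 0.2 s minimum pause threshold are all assumptions made for the example.

```python
def pause_features(speech_segments, min_pause=0.2):
    """Summarise pauses between detected speech segments.

    speech_segments: sorted, non-overlapping (start, end) times in seconds.
    min_pause: shortest gap counted as a pause (assumed value, not from the source).
    """
    if not speech_segments:
        return {"mean_pause_length": 0.0, "pauses_per_minute": 0.0, "speech_duration": 0.0}

    # A pause is the silent gap between the end of one segment and the start of the next.
    pauses = []
    for (_, prev_end), (next_start, _) in zip(speech_segments, speech_segments[1:]):
        gap = next_start - prev_end
        if gap >= min_pause:
            pauses.append(gap)

    total_time = speech_segments[-1][1] - speech_segments[0][0]
    speech_time = sum(end - start for start, end in speech_segments)
    return {
        "mean_pause_length": sum(pauses) / len(pauses) if pauses else 0.0,
        "pauses_per_minute": 60.0 * len(pauses) / total_time if total_time > 0 else 0.0,
        "speech_duration": speech_time,
    }


if __name__ == "__main__":
    segments = [(0.0, 2.0), (2.5, 4.0), (4.125, 6.0)]
    print(pause_features(segments))
```

    Gaps shorter than the threshold are treated as part of continuous speech, so the choice of `min_pause` directly changes both pause measures.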

    Results of the analyses were stored in a database and then read by R scripts for further statistical analysis. REAPER’s output (in Hz) was converted to pitch interval (in cents, P0) as proposed by Matteson, Olness, & Caplow (2013).
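    The conversion from Hz to cents can be sketched with the standard definition of the cent, 1200 times the base-2 logarithm of a frequency ratio. The reference frequency is not stated in the text, so `ref_hz` below is a hypothetical parameter, and the treatment of non-positive values as unvoiced frames is likewise an assumption.

```python
import math


def hz_to_cents(f0_hz, ref_hz):
    """Convert pitch values in Hz to intervals in cents relative to ref_hz.

    Uses the standard definition: cents = 1200 * log2(f / ref).
    Non-positive values are skipped, assuming they mark unvoiced frames.
    """
    if ref_hz <= 0:
        raise ValueError("ref_hz must be positive")
    return [1200.0 * math.log2(f / ref_hz) for f in f0_hz if f > 0]


if __name__ == "__main__":
    # One octave above the reference is 1200 cents; one octave below is -1200 cents.
    print(hz_to_cents([110.0, 220.0, -1.0, 55.0], ref_hz=110.0))
```

    Because cents are logarithmic, the same musical interval yields the same value for low and high voices, which makes pitch measures more comparable across speakers than raw Hz.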

    The following variables were used for analysis:

    Fluency: pause length; pause frequency
    Phonation: duration of speech
    Voice quality: jitter; shimmer
    Prosody, pitch level: mean, median, maximum and minimum P0
    Prosody, pitch range: SD; four standard deviations around the mean (SD4); max-min P0; the difference between the 95th and the 5th percentile (HDI, 90% span); the difference between the 16th and the 84th percentile (HDI, 68% span)
    Prosody, distribution shape: skewness and kurtosis
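    The pitch range and distribution measures listed above can be computed from a sequence of P0 values roughly as follows. This is a sketch under stated assumptions: SD4 is read as four times the standard deviation, percentiles use linear interpolation, the spans are plain percentile differences, and kurtosis is the non-excess fourth standardised moment. The function names are hypothetical.

```python
import math
import statistics


def percentile(sorted_vals, q):
    """Percentile q (0-100) of pre-sorted values, with linear interpolation."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def pitch_range_stats(p0):
    """Pitch level, range and shape statistics for P0 values (e.g. in cents)."""
    vals = sorted(p0)
    n = len(vals)
    mean = statistics.mean(vals)
    sd = statistics.pstdev(vals)  # population standard deviation
    # Central moments for skewness and kurtosis.
    m3 = sum((v - mean) ** 3 for v in vals) / n
    m4 = sum((v - mean) ** 4 for v in vals) / n
    return {
        "mean": mean,
        "median": statistics.median(vals),
        "min": vals[0],
        "max": vals[-1],
        "sd": sd,
        "sd4": 4 * sd,  # assumed reading of "four standard deviations around the mean"
        "max_min": vals[-1] - vals[0],
        "span90": percentile(vals, 95) - percentile(vals, 5),
        "span68": percentile(vals, 84) - percentile(vals, 16),
        "skewness": m3 / sd ** 3 if sd > 0 else 0.0,
        "kurtosis": m4 / sd ** 4 if sd > 0 else 0.0,
    }


if __name__ == "__main__":
    print(pitch_range_stats([float(v) for v in range(101)]))
```

    The percentile spans are less sensitive to isolated pitch-tracking errors than the max-min range, which is one reason to report several range measures side by side.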

    Machine learning

    We trained a generalized linear multilevel model using R and Stan (McElreath, 2016; R Core Team, 2017) and evaluated its performance. Evaluation was conducted using five-fold cross-validation over the set of fragments. In each of the five folds, the parameters of the model were first learned in a training phase using 80% of the data, and then applied to the held-out 20% to predict the participant’s diagnosis. This procedure was repeated for each of the five folds, and accuracy is reported as the average performance on the held-out data across all folds.
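    The cross-validation procedure can be sketched as below. The sketch is in Python rather than R and Stan, and a placeholder majority-class predictor stands in for the multilevel model; `cross_validate`, `fit_majority` and `predict_majority` are hypothetical names, and the random fold assignment is an assumption about how fragments were split.

```python
import random
from collections import Counter


def k_fold_indices(n_items, k=5, seed=0):
    """Shuffle item indices and deal them into k roughly equal folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the accuracy."""
    accuracies = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, features[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Placeholder model (NOT the multilevel model): always predicts the majority training label.
def fit_majority(train_x, train_y):
    return Counter(train_y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


if __name__ == "__main__":
    labels = ["AD"] * 8 + ["control"] * 2
    features = [[0.0] for _ in labels]
    print(cross_validate(features, labels, fit_majority, predict_majority))
```

    Any model can be plugged in through the `fit` and `predict` arguments, as long as `fit` returns an object that `predict` accepts.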


    Results are compared to a baseline (“Zero Rule”) strategy that always predicts the majority class. The classifier is considered informative if it performs better than the baseline strategy. Machine learning results suggest that the proposed model is superior to the baseline standard of predicting the majority class, measured as the area under curve (AUROC), cf. figure 1. Individual univariate Wilcoxon rank sum tests, adapted with Benjamini-Hochberg correction for false discovery rate (Benjamini and Hochberg 1995), show that SD patients have significantly shorter pauses than controls, and PD patients have significantly lower values for voice quality parameters than controls.
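    For readers unfamiliar with the Benjamini-Hochberg correction, the step-up procedure can be sketched as follows. The significance level of 0.05 is an assumption for the example, since the text does not state the level used, and `benjamini_hochberg` is a hypothetical function name.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Step-up procedure: sort the p-values, find the largest rank r with
    p_(r) <= alpha * r / m, and reject every hypothesis ranked r or lower.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            max_rank = rank

    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected


if __name__ == "__main__":
    # The two smallest p-values miss their own thresholds but are still
    # rejected, because the third-ranked p-value meets its threshold.
    print(benjamini_hochberg([0.02, 0.03, 0.035, 0.9]))
```

    The procedure controls the expected proportion of false discoveries among the rejected tests, which is less conservative than controlling the chance of any false positive when many features are tested at once.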


    Post-hoc analyses show that most influence in the model comes from fluency and voice quality variables, while prosody variables contribute the least. Monopitch has frequently been associated with dementia speech, but the role of pitch is very limited in the model for this convenience sample. A possible explanation is that casual spontaneous speech invites less pitch variation, both in controls and in patients.

    [1] David Talkin,

    Original language: English
    Number of pages: 4
    Publication status: Published - Sep-2017
    Event: 18th International Science of Aphasia Conference - Geneva, Switzerland
    Duration: 11-Sep-2017 to 14-Sep-2017
    Conference number: 18


    Conference: 18th International Science of Aphasia Conference
    Abbreviated title: Science of Aphasia