Abstract
In this study we ask whether simplifying the data in dialectometrical
studies by removing infrequent forms is advantageous for uncovering the geographical
structure in dialect data. By investigating lexical variation in a large corpus of
Tuscan dialect data via hierarchical bipartite spectral graph partitioning, we are
able to identify the main geographical areas together with their linguistic basis. In
order to assess the influence of infrequent forms, we conduct two analyses: one
which includes only lexical variants used by at least 0.5% of the informants, and
another which includes all lexical variants in the data. Using this approach, we show
that including all data yields a geographical characterization with a more
adequate linguistic basis than using the trimmed data.
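The trimming step that separates the two analyses can be illustrated with a short sketch. It assumes a hypothetical data layout that is not described in the abstract: a dictionary mapping each informant to the set of lexical variants they used (a variant could be a plain string or a `(concept, form)` tuple). The function name `trim_infrequent` and its `min_share` parameter are illustrative, not taken from the study.

```python
from collections import Counter
from typing import Dict, Hashable, Set


def trim_infrequent(
    responses: Dict[str, Set[Hashable]], min_share: float = 0.005
) -> Dict[str, Set[Hashable]]:
    """Keep only variants used by at least `min_share` of all informants.

    Hypothetical sketch: `responses` maps informant IDs to the set of
    lexical variants each informant produced. The default of 0.005
    corresponds to the 0.5% threshold mentioned in the abstract.
    """
    n_informants = len(responses)
    # Count each variant once per informant who uses it.
    usage = Counter(v for variants in responses.values() for v in set(variants))
    # A variant survives if its share of informants reaches the threshold.
    kept = {v for v, count in usage.items() if count / n_informants >= min_share}
    # Remove the infrequent variants from every informant's set.
    return {inf: set(variants) & kept for inf, variants in responses.items()}
```

With 400 informants, the 0.5% threshold means a variant must be used by at least 2 informants to survive. A variant produced by a single informant is dropped from the trimmed data but would remain in the analysis that includes all lexical variants. Counting informants rather than raw tokens follows the abstract's wording ("used by at least 0.5% of the informants").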
Original language | English |
---|---|
Title of host publication | The Future of Dialects |
Subtitle of host publication | selected papers from Methods in Dialectology XV |
Editors | Marie-Hélène Côté, Remco Knooihuizen, John Nerbonne |
Publisher | Language Science Press |
Pages | 215-224 |
Number of pages | 10 |
ISBN (Print) | 9783946234197 |
Publication status | Published - 2016 |