Abstract
This paper presents a Parallel Universal Dependency (PUD) treebank for the Indo-Aryan
language Bengali. The treebank consists of 1000 Bengali sentences created using a
parallel corpus of English-Bengali and Hindi-Bengali text. The 200 manually annotated
sentences contain 2622 tokens. Both the English and the Hindi corpora were taken from
the Parallel Universal Dependency (PUD) repository, and the English corpus was
subsequently chosen as the source text. The corpus was then translated into Bengali
from scratch by the author, who is a native speaker of the language, and thereafter
annotated with universal part-of-speech tags, language-specific part-of-speech tags,
and syntactic-level annotation. The paper also presents a linguistic analysis of the
PUD treebank and concludes by reporting the kappa score.
| Original language | English |
|---|---|
| Publication status | Published - 11-Nov-2021 |
| Externally published | Yes |
| Event | Widening Natural Language Processing (WiNLP), Hybrid, Punta Cana, Dominican Republic. Duration: 11-Nov-2021 → 11-Nov-2021 |
Workshop
| Workshop | Widening Natural Language Processing (WiNLP) |
|---|---|
| Country/Territory | Dominican Republic |
| City | Punta Cana |
| Period | 11/11/2021 → 11/11/2021 |