Bengali Parallel Universal Dependency Treebank

Pritha Majumdar*

*Corresponding author for this work

Research output: Academic

Abstract

This paper presents a Parallel Universal Dependency (PUD) treebank for Bengali, an
Indo-Aryan language. The treebank consists of 1000 Bengali sentences created using
English-Bengali and Hindi-Bengali parallel corpora. The number of tokens reported for
the 200 manually annotated sentences is 2622. Both the English and the Hindi corpora
were taken from the Parallel Universal Dependency (PUD) repository, and the English
corpus was subsequently chosen as the source text. The corpus was then translated into
Bengali from scratch by the author, who is a native speaker of the language, and
thereafter annotated with universal part-of-speech tags and language-specific
part-of-speech tags, and at the syntactic level. The paper also presents a linguistic
analysis of the PUD treebank and concludes with the kappa score.
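As an illustration of the kind of agreement measure a kappa score refers to, the following is a minimal sketch of Cohen's kappa over two annotators' part-of-speech tags. It assumes a two-annotator setting, which the abstract does not specify; the function name `cohen_kappa` and the tag sequences are hypothetical and are not taken from the treebank.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of tokens given the same tag by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's own tag distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[tag] * count_b[tag] for tag in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical tag throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical universal POS tags for six tokens (illustrative only).
annotator_1 = ["NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB"]
annotator_2 = ["NOUN", "VERB", "ADJ", "ADJ", "NOUN", "NOUN"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Kappa discounts the agreement that two annotators would reach by chance, so it is lower than the raw proportion of matching tags whenever the tag distributions overlap.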
Original language: English
Status: Published - 11 Nov 2021
Externally published: Yes
Event: Widening Natural Language Processing (WiNLP) - Hybrid, Punta Cana, Dominican Republic
Duration: 11 Nov 2021 to 11 Nov 2021

Workshop

Workshop: Widening Natural Language Processing (WiNLP)
Country/Region: Dominican Republic
City: Punta Cana
Period: 11/11/2021 to 11/11/2021
