TY - GEN
T1 - MaCoCu
T2 - 23rd Annual Conference of the European Association for Machine Translation, EAMT 2022
AU - Bañón, Marta
AU - Esplà-Gomis, Miquel
AU - Forcada, Mikel L.
AU - García-Romero, Cristian
AU - Kuzman, Taja
AU - Ljubešić, Nikola
AU - van Noord, Rik
AU - Sempere, Leopoldo Pla
AU - Ramírez-Sánchez, Gema
AU - Rupnik, Peter
AU - Suchomel, Vít
AU - Toral, Antonio
AU - van der Werff, Tobias
AU - Zaragoza, Jaume
N1 - Funding Information:
This action has received funding from the European Union's Connecting Europe Facility 2014-2020 - CEF Telecom, under Grant Agreement No. INEA/CEF/ICT/A2020/2278341. The contents of this publication are the sole responsibility of its authors and do not necessarily reflect the opinion of the European Union.
Publisher Copyright:
© 2022 The authors.
PY - 2022
Y1 - 2022
N2 - We introduce the project MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages, funded by the Connecting Europe Facility, which is aimed at building monolingual and parallel corpora for under-resourced European languages. The approach followed consists of crawling large amounts of textual data from selected top-level domains of the Internet, and then applying a curation and enrichment pipeline. In addition to corpora, the project will release the free/open-source web crawling and curation software used.
AB - We introduce the project MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages, funded by the Connecting Europe Facility, which is aimed at building monolingual and parallel corpora for under-resourced European languages. The approach followed consists of crawling large amounts of textual data from selected top-level domains of the Internet, and then applying a curation and enrichment pipeline. In addition to corpora, the project will release the free/open-source web crawling and curation software used.
UR - http://www.scopus.com/inward/record.url?scp=85137715124&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85137715124
SP - 303
EP - 304
BT - EAMT 2022 - Proceedings of the 23rd Annual Conference of the European Association for Machine Translation
A2 - Macken, Lieve
A2 - Rufener, Andrew
A2 - Van den Bogaert, Joachim
A2 - Daems, Joke
A2 - Tezcan, Arda
A2 - Vanroy, Bram
A2 - Fonteyne, Margot
A2 - Barrault, Loic
A2 - Costa-Jussa, Marta R.
A2 - Kemp, Ellie
A2 - Pilos, Spyridon
A2 - Declercq, Christophe
A2 - Koponen, Maarit
A2 - Forcada, Mikel L.
A2 - Scarton, Carolina
A2 - Moniz, Helena
PB - European Association for Machine Translation
Y2 - 1 June 2022 through 3 June 2022
ER -