Fill-up versus Interpolation Methods for Phrase-based SMT Adaptation

Arianna Bisazza, Nick Ruiz, Marcello Federico

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

49 Citations (Scopus)

Abstract

This paper compares techniques to combine diverse parallel corpora for domain-specific phrase-based SMT system training. We address a common scenario where little in-domain data is available for the task, but where large background models exist for the same language pair. In particular, we focus on phrase table fill-up: a method that effectively exploits background knowledge to improve model coverage, while preserving the more reliable information coming from the in-domain corpus. We present experiments on an emerging transcribed speech translation task – the TED talks. While performing similarly in terms of BLEU and NIST scores to the popular log-linear and linear interpolation techniques, filled-up translation models are more compact and easier to tune by minimum error training.
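The fill-up idea described in the abstract can be illustrated with a minimal sketch: keep every in-domain phrase pair as-is, and add a background pair only when that source–target pair is missing from the in-domain table, marking its provenance with an extra feature. The dictionary-based phrase table format, feature values, and the binary provenance flag below are illustrative assumptions, not the paper's actual implementation.

```python
def fill_up(in_domain, background):
    """Merge two phrase tables, giving priority to in-domain entries.

    Each table maps (source, target) -> list of feature scores.
    The merged table appends a provenance flag to each entry:
    1.0 = entry comes from the in-domain table,
    0.0 = entry was filled up from the background table.
    """
    merged = {}
    # In-domain entries are kept unchanged (more reliable scores).
    for pair, feats in in_domain.items():
        merged[pair] = feats + [1.0]
    # Background entries only fill coverage gaps.
    for pair, feats in background.items():
        if pair not in merged:
            merged[pair] = feats + [0.0]
    return merged

# Hypothetical toy tables (German -> English):
in_dom = {("hallo", "hello"): [0.9]}
bg = {("hallo", "hello"): [0.5], ("welt", "world"): [0.7]}
table = fill_up(in_dom, bg)
# The in-domain score for ("hallo", "hello") is preserved;
# ("welt", "world") is filled up from the background table.
```

Because entries never overwrite each other, the merged table stays a single compact model, and the provenance feature gives minimum error training one extra weight to balance the two sources.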
Original language: English
Title of host publication: Proceedings of the International Workshop on Spoken Language Translation (IWSLT) 2011
Pages: 136-143
Number of pages: 8
Publication status: Published - 2011
Externally published: Yes
Event: International Workshop on Spoken Language Translation 2011 - San Francisco
Duration: 8-Dec-2011 → 9-Dec-2011

Conference

Conference: International Workshop on Spoken Language Translation 2011
City: San Francisco
Period: 08/12/2011 → 09/12/2011
