Fill-up versus Interpolation Methods for Phrase-based SMT Adaptation

Arianna Bisazza, Nick Ruiz, Marcello Federico

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

49 Citations (Scopus)


This paper compares techniques for combining diverse parallel corpora to train domain-specific phrase-based SMT systems. We address a common scenario where little in-domain data is available for the task, but large background models exist for the same language pair. In particular, we focus on phrase table fill-up: a method that effectively exploits background knowledge to improve model coverage, while preserving the more reliable information coming from the in-domain corpus. We present experiments on an emerging transcribed speech translation task: the TED talks. While performing similarly to the popular log-linear and linear interpolation techniques in terms of BLEU and NIST scores, filled-up translation models are more compact and easier to tune by minimum error training.
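The abstract contrasts phrase table fill-up with interpolation. A minimal sketch of the two strategies, assuming phrase tables are represented as plain dicts mapping (source, target) phrase pairs to a single translation probability; the function names and the provenance-flag representation are illustrative, not taken from the paper's implementation:

```python
def fill_up(in_domain, background):
    """Keep all in-domain entries as-is; add background entries only for
    phrase pairs missing from the in-domain table. The second element of
    each value is a provenance flag (1.0 = in-domain, 0.0 = filled-up),
    which can act as the extra binary feature tuned by minimum error
    training. (Representation is a simplification for illustration.)"""
    combined = {pair: (prob, 1.0) for pair, prob in in_domain.items()}
    for pair, prob in background.items():
        if pair not in combined:
            combined[pair] = (prob, 0.0)
    return combined


def linear_interpolation(in_domain, background, lam=0.7):
    """Mix the two tables with weight `lam` on the in-domain model;
    a pair absent from one table contributes probability 0 there."""
    combined = {}
    for pair in set(in_domain) | set(background):
        combined[pair] = (lam * in_domain.get(pair, 0.0)
                          + (1.0 - lam) * background.get(pair, 0.0))
    return combined
```

Note how fill-up never alters an in-domain score, so the more reliable in-domain estimates survive unchanged, while interpolation rescores every shared pair.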
Original language: English
Title of host publication: Proceedings International Workshop on Spoken Language Translation (IWSLT) 2011
Number of pages: 8
Publication status: Published - 2011
Externally published: Yes
Event: International Workshop on Spoken Language Translation 2011 - San Francisco
Duration: 8-Dec-2011 to 9-Dec-2011


