Abstract
Massively multilingual models are promising for transfer learning across tasks and languages. However, existing methods are unable to fully leverage training data when it is available in different task-language combinations. To exploit such heterogeneous supervision, we propose Hyper-X, a single hypernetwork that unifies multi-task and multilingual learning with efficient adaptation. It generates weights for adapter modules conditioned on both task and language embeddings. By learning to combine task- and language-specific knowledge, our model enables zero-shot transfer for unseen languages and task-language combinations. Our experiments on a diverse set of languages demonstrate that Hyper-X achieves the best or most competitive gains when a mixture of multiple resources is available, while remaining on par with strong baselines in the standard scenario. Hyper-X is also considerably more efficient in terms of parameters and resources compared to methods that train separate adapters. Finally, Hyper-X consistently produces strong results in few-shot scenarios for new languages, showing the versatility of our approach beyond zero-shot transfer.
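To make the core mechanism concrete, the minimal PyTorch sketch below shows one way a hypernetwork can generate bottleneck-adapter weights from concatenated task and language embeddings, as the abstract describes. All class and parameter names, the dimensions, and the single-layer weight generator are illustrative assumptions, not the paper's exact architecture or adapter placement.

```python
import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    """Sketch of a hypernetwork-generated adapter conditioned on
    task and language embeddings (illustrative, not the paper's code)."""

    def __init__(self, n_tasks, n_langs, emb_dim=64, hidden_dim=768, bottleneck=32):
        super().__init__()
        self.task_emb = nn.Embedding(n_tasks, emb_dim)
        self.lang_emb = nn.Embedding(n_langs, emb_dim)
        self.hidden_dim = hidden_dim
        self.bottleneck = bottleneck
        # One generator head per adapter weight matrix.
        self.gen_down = nn.Linear(2 * emb_dim, hidden_dim * bottleneck)
        self.gen_up = nn.Linear(2 * emb_dim, bottleneck * hidden_dim)

    def forward(self, hidden_states, task_id, lang_id):
        # Combine task- and language-specific knowledge into one vector.
        source = torch.cat([self.task_emb(task_id), self.lang_emb(lang_id)], dim=-1)
        # Generate the adapter's down- and up-projection weights on the fly.
        w_down = self.gen_down(source).view(self.hidden_dim, self.bottleneck)
        w_up = self.gen_up(source).view(self.bottleneck, self.hidden_dim)
        # Standard bottleneck adapter with a residual connection.
        return hidden_states + torch.relu(hidden_states @ w_down) @ w_up

adapter = HyperAdapter(n_tasks=2, n_langs=100)
h = torch.randn(4, 16, 768)  # (batch, seq_len, hidden)
out = adapter(h, torch.tensor(0), torch.tensor(42))
```

Under this framing, zero-shot transfer to an unseen task-language combination amounts to passing a new (task_id, lang_id) pair at inference time: the hypernetwork composes the two embeddings it learned separately during training.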
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing |
| Editors | Yoav Goldberg, Zornitsa Kozareva, Yue Zhang |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 7934–7949 |
| Number of pages | 16 |
| Publication status | Published - 2022 |