Abstract
CLIP (Contrastive Language-Image Pre-training) is a recent multi-modal model that jointly learns representations of images and texts. The model is trained on a massive amount of English data and shows impressive performance on zero-shot classification tasks. Training the same model for a different language is not trivial: data in other languages may be scarce, and the model needs high-quality translations of the texts to guarantee good performance. In this paper, we present the first CLIP model for the Italian language (CLIP-Italian), trained on more than 1.4 million image-text pairs. Results show that CLIP-Italian outperforms the multilingual CLIP model on the tasks of image retrieval and zero-shot classification.
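The zero-shot classification task mentioned in the abstract can be sketched in a few lines: the image and each candidate caption are embedded in a shared space, and the caption with the highest cosine similarity wins. The toy embeddings and Italian caption strings below are illustrative placeholders, not values or APIs from the paper.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """CLIP-style zero-shot classification over toy embeddings.

    Embeddings are L2-normalised so the dot product equals cosine
    similarity; a softmax over similarities gives per-caption scores.
    """
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = text_embs @ image_emb                 # cosine similarity per caption
    probs = np.exp(sims) / np.exp(sims).sum()    # softmax over captions
    return int(np.argmax(probs)), probs

# Toy example: the image embedding aligns with the second caption.
image = np.array([0.1, 0.9, 0.0])
captions = np.array([
    [1.0, 0.0, 0.0],   # "una foto di un gatto"  (a photo of a cat)
    [0.0, 1.0, 0.0],   # "una foto di un cane"   (a photo of a dog)
    [0.0, 0.0, 1.0],   # "una foto di un albero" (a photo of a tree)
])
best, probs = zero_shot_classify(image, captions)  # best == 1
```

In the actual model, the image and text embeddings come from a vision encoder and a text encoder trained jointly with a contrastive loss; only the similarity-and-argmax step is shown here.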
Original language | English |
---|---|
Publisher | arXiv |
Publication status | Submitted - 19-Aug-2021 |
Publication series
Name | ArXiv |
---|---|
Publisher | Cornell University Press |
ISSN (Print) | 2331-8422 |
Keywords
- contrastive learning
- zero-shot image classification
- deep learning
- natural language processing
- Italian language
- image retrieval
Press/Media
-
The AI that explains images in Italian
Bianchi, F., Attanasio, G., Pisoni, R., Terragni, S., Sarti, G. & Lakshmi, S.
13/09/2021
1 item of Media coverage
Press/Media: Research › Academic
-
CLIP-Italian: A new AI model to connect images and texts in Italian
Bianchi, F., Attanasio, G., Pisoni, R., Terragni, S., Sarti, G. & Lakshmi, S.
26/08/2021
1 item of Media coverage
Press/Media: Research › Academic