Description

Annotation of Text and Pictures in Health Care Instructions

Instructions are ubiquitous in modern life. In health communication, for instance, we use instructions when we apply a sling, take medication or operate an automated external defibrillator in an emergency. These instructions often contain both text and pictures. While the benefit of including pictures in such multimodal instructions (MIs) is well established, the design of text and pictures and the relations between them have not been researched systematically. We present a corpus-based annotation scheme that allows for a methodical investigation of the use and functions of textual and pictorial information in MIs. To develop the scheme, we analysed a corpus of 227 health care instructions, coded by 13 annotators in four main categories: function, text, pictures, and the relationship between text and pictures, with 4, 13, 10 and 3 subcategories, respectively. A theoretical evaluation of the scheme prompted changes in the definitions and names of some categories. For instance, the label ‘affordances’ in annotating pictures was replaced with the more concrete and transparent label ‘presence of human body parts’. In an evaluation of the annotated corpus, 12 of the 30 subcategories were found to be insufficiently reliable. For instance, annotators differed in the number of acts they discerned in an instruction. In those cases, the definition, the annotation procedure, or both were reformulated for greater precision and specificity. The result of these adjustments is an improved annotation scheme and a manual with examples for all categories, intended to ensure a systematic description of MIs by multiple annotators. Such a description will support empirical investigations of the effectiveness of text-picture combinations in their context of use, as well as computational modelling for annotating and generating multimodal health care instructions.