This paper describes the search for the most suitable setting and method for assessing the naturalness of the output of an existing algorithm for generating multimodal referring expressions. To evaluate this algorithm, a setting was built in Second Life. The paper reports on a pilot study that aimed to assess (1) the suitability of the setting and (2) the design of our evaluation method. Results show that subjects are able to discriminate between the different types of referring expressions the algorithm produces. Lessons learnt in designing the questionnaires are also reported.
|Title of host publication||Proc. of IVA-2009|
|Place of Publication||Amsterdam, Netherlands|
|Publication status||Published - 2009|