This paper presents an assessment of automatically generated multimodal referring expressions as produced by embodied conversational agents in a virtual world. The algorithm used for this purpose employs general principles of human motor control and of cooperativity in dialogue, and can be parameterised to vary the precision of the pointing gestures and the amount of linguistic information included in the referring expressions. The study assessed how native speakers of English and Japanese perceived three different algorithmic outputs for multimodal referring behaviour in terms of understandability, human-likeness and a social practice (selling). Results show that users generally prefer mobile agents that are economical in their linguistic descriptions over stationary, verbose agents. They also reveal the need for further calibration of the algorithm to accommodate the differences between the two groups. In addition to a detailed description of the setup and results of the study, the paper discusses implications for the design and use of agents, methodological issues that arose while conducting the cross-cultural study, and directions for future work. © 2012 Elsevier Ltd. All rights reserved.