Many application areas, ranging from serious games for health to learning by demonstration in robotics, could benefit from large body-movement datasets extracted from textual instructions accompanied by images. Interpreting such instructions to automatically generate the corresponding motions (e.g., exercises) and validating the resulting movements are difficult tasks. In this article we describe a first step towards automated extraction. Using a Kinect, we recorded seven amateur performers executing five different exercises in random order. During the recordings, we found that each performer interpreted the same exercise differently, even though all of them were given identical textual instructions. Based on these data, we performed a crowdsourced quality-assessment study and tested the inter-rater agreement for different types of visualizations; the RGB-based visualization showed the best agreement among the annotators.
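The article does not specify which agreement statistic was used. As an illustration only, the sketch below computes Fleiss' kappa, a common choice for measuring agreement among multiple crowd annotators rating items on a fixed scale; the function name and the example ratings are hypothetical.

```python
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa for a (n_items, n_categories) matrix where
    counts[i, j] is the number of raters who assigned item i to
    category j. Every item must be rated by the same number of raters."""
    n_items, _ = counts.shape
    n_raters = counts[0].sum()
    assert (counts.sum(axis=1) == n_raters).all(), "ratings per item must be constant"

    # Per-item agreement: fraction of rater pairs that agree on the item.
    p_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()

    # Chance agreement from the marginal category proportions.
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    p_e = np.square(p_j).sum()

    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: 4 exercise clips, each quality-rated by 5 crowd
# workers on a 3-point scale (columns: bad / ok / good).
ratings = np.array([
    [0, 1, 4],
    [1, 3, 1],
    [4, 1, 0],
    [0, 0, 5],
])
print(f"Fleiss' kappa: {fleiss_kappa(ratings):.3f}")  # 0.400 for this toy data
```

In this setup, kappa near 0 indicates agreement no better than chance and values towards 1 indicate strong agreement, which makes it suitable for comparing annotator consistency across visualization types.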