Enabling multi-purpose robots to follow textual instructions is an important challenge on the path to automating skill acquisition. As a contribution to this goal, we work with physical exercise instructions, an everyday activity domain whose textual descriptions focus on body movements. Body movements are a common element across a broad range of activities of interest for robotic automation. Developing a text-to-animation system is thus an important first step towards machine language understanding. The task requires natural language understanding (NLU), including non-declarative sentences, and the extraction of semantic information from complex syntactic structures with many potential interpretations. Despite a comparatively high density of semantic references to body movements, exercise instructions still leave much information underspecified. Detecting and bridging or filling such underspecified elements is extremely challenging when relying on methods from NLU alone. Humans, however, can often supply this implicit information with ease, owing to its embodied nature. We present a system that combines a semantic parser with a Bayesian network. It explicates the information contained in textual movement instructions so that the described motion sequences can be rendered as animations performed by a virtual humanoid character. Human computation is then employed to select the best candidates and to further inform the models, increasing the adequacy of the resulting performances.
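The role of the Bayesian network in this pipeline can be illustrated with a minimal sketch. All variable names, exercise labels, and probability values below are illustrative assumptions, not the paper's actual model: a tiny discrete network fills in an underspecified movement parameter (the target arm angle, which an instruction like "raise your arms" leaves implicit) by taking the most probable value given the exercise type, or marginalising over a prior when the exercise is unobserved.

```python
# Hypothetical toy model: P(exercise) prior and P(arm_angle | exercise) CPT.
# All labels and numbers are invented for illustration.
P_EXERCISE = {"jumping_jack": 0.5, "arm_circle": 0.5}

P_ANGLE_GIVEN_EXERCISE = {
    "jumping_jack": {"45_deg": 0.1, "90_deg": 0.2, "180_deg": 0.7},
    "arm_circle":   {"45_deg": 0.2, "90_deg": 0.6, "180_deg": 0.2},
}

def most_probable_angle(exercise=None):
    """Return the MAP estimate of the underspecified angle parameter.

    If the exercise type is observed, read its CPT row directly;
    otherwise marginalise: P(angle) = sum_e P(e) * P(angle | e).
    """
    if exercise is not None:
        dist = P_ANGLE_GIVEN_EXERCISE[exercise]
    else:
        dist = {}
        for ex, p_ex in P_EXERCISE.items():
            for angle, p in P_ANGLE_GIVEN_EXERCISE[ex].items():
                dist[angle] = dist.get(angle, 0.0) + p_ex * p
    return max(dist, key=dist.get)

if __name__ == "__main__":
    print(most_probable_angle("jumping_jack"))  # -> 180_deg
    print(most_probable_angle())                # marginal MAP over exercises
```

In the full system, the semantic parser would supply the observed variables (exercise type, body parts, directions) extracted from the text, and the network would resolve the remaining underspecified slots before animation; human judgments on rendered candidates can then refine the probability tables.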