Requirements for action-oriented information processing to support imitation

2. Next, let us look at the requirements for translating perception-based information into action-related information

The 'descriptive' information from perception must be transformed into a flow of instructions to the muscles. The perceived movement of limbs must be translated into instructions to our limbs to follow approximately the same movement sequence.

We do not require millisecond-by-millisecond accuracy, but we do want the same progression, such as lifting the knee followed by ...
Two movements appear to be the same if an 18-frame-per-second video would show them as the same

We therefore need at most 18 similar positions per second rather than 1000 similar instructions per second
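
As a rough illustration of this criterion, the sketch below takes two versions of a movement recorded at 1000 values per second, keeps only the 18 positions per second that a video would capture, and compares those. The function names, the `knee_angle` signal, and the 5-degree tolerance are assumptions made up for the example; they are not part of the argument above.

```python
import math

INSTRUCTION_RATE = 1000  # samples per second (one per millisecond)
FRAME_RATE = 18          # positions per second, as in the video analogy

def sample_frames(trajectory, source_rate=INSTRUCTION_RATE, frame_rate=FRAME_RATE):
    """Keep only the samples that a frame_rate video would capture."""
    n_frames = len(trajectory) * frame_rate // source_rate
    return [trajectory[i * source_rate // frame_rate] for i in range(n_frames)]

def appear_same(a, b, tolerance=5.0):
    """True if both movements match, frame by frame, within the tolerance."""
    frames_a, frames_b = sample_frames(a), sample_frames(b)
    return len(frames_a) == len(frames_b) and all(
        abs(x - y) <= tolerance for x, y in zip(frames_a, frames_b)
    )

def knee_angle(t_ms, wobble=0.0):
    """Hypothetical knee angle in degrees: one slow lift per second plus fast wobble."""
    slow = 45.0 * math.sin(math.pi * t_ms / 1000.0) ** 2
    fast = wobble * math.sin(2.0 * math.pi * t_ms / 7.0)
    return slow + fast

reference = [knee_angle(t) for t in range(2000)]            # two seconds of movement
wobbly = [knee_angle(t, wobble=2.0) for t in range(2000)]   # same lift, fine jitter
shifted = [knee_angle(t - 500) for t in range(2000)]        # lift starts half a second late

print(len(reference), "samples reduce to", len(sample_frames(reference)), "positions")
print(appear_same(reference, wobbly))   # the fine jitter stays within the tolerance
print(appear_same(reference, shifted))  # the late lift already differs at the first frame
```

The millisecond-scale wobble disappears from the comparison, while a change in the progression of the movement does not, which is the distinction the 18-frame criterion is meant to draw.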

Even the most precisely choreographed ballet would probably be satisfied by a performance that is accurate to within a quarter of a second

Measured against the music accompanying the ballet movements, this would be roughly beat-by-beat accuracy.

We should also look at the accuracy or precision required for the instructions to the muscles

Continuing with our precisely choreographed ballet, no one would care about precise muscle tension. The choreographer and the dancers would want to control the position of the limbs relative to the stage.

Bodybuilding competitions might be a better example of wanting to show muscle tension, but it seems unlikely that precise repeatability is required there. Ballet seems the better example, since other dancers rely on the accuracy of the positions and the repeatability of the motions.

Both our ballet example and the imprinting 'follow the target' example would lead us to speculate that it is more relevant to specify action relative to the environment rather than in terms of repeatable sequences of muscle tension specifications

3. Next, let us look at the transformation requirements for action-related information

The 'raw' information consists of a flow of instructions to the muscles. Based on tremendous simplification, we estimated a minimum of 84 such muscles just for gross body movements (no hands, jaw, tongue, etc.). We also estimated that we would need a new set of instructions approximately every millisecond.
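
To give a feel for the size of this raw flow, the short calculation below multiplies out the two estimates and sets them beside the 18 positions per second suggested by the video criterion in point 2. The 10-second duration of the walk is a hypothetical value chosen only for the example.

```python
MUSCLES = 84                    # gross body movements only, per the estimate above
INSTRUCTIONS_PER_SECOND = 1000  # one new set of instructions per millisecond
POSITIONS_PER_SECOND = 18       # the video-frame criterion from point 2

def raw_instructions(seconds):
    """Individual muscle instructions needed to replay a movement verbatim."""
    return MUSCLES * INSTRUCTIONS_PER_SECOND * seconds

def body_positions(seconds):
    """Whole-body positions needed to describe the same movement."""
    return POSITIONS_PER_SECOND * seconds

walk_seconds = 10  # hypothetical duration of a short walk

print(raw_instructions(1))             # 84000 muscle instructions per second
print(raw_instructions(walk_seconds))  # 840000 for the whole walk
print(body_positions(walk_seconds))    # 180 positions for the whole walk
```

The two counts are not directly comparable, since a body position is a richer unit than a single muscle instruction, but the gap in the number of time steps (1000 versus 18 per second) is what matters for the argument.
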
Let us look at the shortcomings of our initial model of simply replaying a stored set of instructions from memory

The detailed instructions to the muscles need to adapt to the current circumstances

For example, it is unlikely that two walks are precisely the same, millisecond by millisecond. Walking from the bed to the bathroom at night, however routine, will not be identical millisecond by millisecond and muscle by muscle, even though it is dark and we do not need vision to find the way. The feet are unlikely to touch down in precisely the same spots. The speed is unlikely to be precisely the same. Arm movements are likely to differ. Successive head orientations and eye movements are unlikely to follow exactly the same sequence.

We need to find a way of expressing the action at a higher level that captures its routine nature but does not require such precise repeatability (the same number of steps at exactly the same time intervals, etc.). We may not even get out of bed onto the same foot every time, half asleep or not.

One attractive analogy for this problem comes from computing, from computer software instructions

At the simplest level, where we don't need to meet conditions from perception, we have the concept of macro expansion. This does not deal with variability and adaptability.
At the next level, we have the concept of program instructions with conditions, etc.
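
To make the analogy concrete, here is a minimal sketch contrasting the two levels. The action names (`swing_leg`, `plant_foot`), the `MACROS` table, and the toy one-dimensional world measured in centimetres are all hypothetical, invented for illustration; nothing here is meant as a model of how actions are actually encoded.

```python
# Level 1: macro expansion. A named action expands into a fixed list of
# lower-level steps; the expansion never looks at the world.
MACROS = {
    "step": ["swing_leg", "plant_foot"],
    "walk_to_bathroom": ["step"] * 6,
}

def expand(action):
    """Recursively expand a macro into primitive instructions."""
    if action not in MACROS:
        return [action]  # already a primitive
    primitives = []
    for part in MACROS[action]:
        primitives.extend(expand(part))
    return primitives

# Level 2: a program with conditions. The number of steps is not stored;
# it is decided while walking, from the remaining distance to the target.
def walk_to(target_cm, position_cm, stride_cm):
    """Step until the target is reached; return the instructions and final position."""
    instructions = []
    while target_cm - position_cm > 0:  # condition that perception would supply
        instructions.extend(expand("step"))
        position_cm += min(stride_cm, target_cm - position_cm)
    return instructions, position_cm

print(len(expand("walk_to_bathroom")))  # always the same 12 primitives
print(len(walk_to(400, 0, 70)[0]))      # 6 steps, so 12 primitives
print(len(walk_to(400, 0, 90)[0]))      # longer stride: 5 steps, so 10 primitives
```

The macro always replays the same twelve primitives, whereas the conditional version reaches the same target with a different number of steps when the stride changes, which is the kind of adaptability the replay model lacks.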

Finally, let us look at residual problems in translating perception-based information into action-related information

Perception, as discussed above, has to go beyond simple patches of two-dimensional rasterized images.
- For action, it would be ideal if perception could yield three-dimensional descriptions of limbs that could readily be converted into instructions for action.