Requirements for visual information processing to support imitation

-- Leftovers --

Analyze perception so that the essential elements to be imitated are recognized
Analyze perception so that the movements of the recognized elements are translated into position goals for the limb that matches each perceived element
The position goals derived from perception must then be translated into detailed instructions for the muscles that control the corresponding limb
Identifying targets and separating them from 'background'

1. Let us look at the minimal requirements for the transformation of visual information

Both a foreground (target) patch and a background patch may be required for something as simple as following the leader in imprinting

The foreground and the background may each be further separated into sub-patches. For example, chicks in imprinting do not suddenly switch to following another chick instead of the target.

Reviewing our earlier learning model, it must be possible to search memory and find the foreground patch even though the background differs

Patches must be recognized and remembered separately, so that reoccurrence of a foreground patch in front of a different background can be recognized as the target to follow.
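
As a minimal sketch of this requirement, assume that each patch has already been reduced to a small tuple of coarse features. The names Scene and PatchMemory and the feature values are hypothetical and serve only to illustrate storing the foreground separately from the background:

```python
from dataclasses import dataclass

# Hypothetical representation: a patch is reduced to a tuple of coarse
# features (e.g. colour, relative size). Real perception is far richer.
@dataclass(frozen=True)
class Scene:
    foreground: tuple
    background: tuple

class PatchMemory:
    """Stores foreground patches on their own, without their backgrounds."""

    def __init__(self):
        self.targets = set()

    def remember(self, scene):
        # Only the foreground patch is stored; the background is discarded.
        self.targets.add(scene.foreground)

    def is_target(self, scene):
        # Recognition looks at the foreground patch only.
        return scene.foreground in self.targets

memory = PatchMemory()
memory.remember(Scene(foreground=("brown", "large"), background=("grass",)))

# The same target in front of a different background, and a different object.
same_target_new_place = Scene(foreground=("brown", "large"), background=("gravel",))
other_object = Scene(foreground=("white", "small"), background=("grass",))

print(memory.is_target(same_target_new_place))
print(memory.is_target(other_object))
```

Because the background never enters memory, a change of background cannot prevent recognition. The sketch says nothing about how the separation into patches is achieved in the first place.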

It must be possible to estimate the approximate distance to the foreground patch

Reexamining our imprinting example, chicks can follow at a relatively constant distance from the target and from the preceding chick

Chicks can peck at kernels of grain, and go to the next one. They must therefore be able to judge the distance to the next kernel in order to know whether to peck at it from where they stand or to take a few steps first.
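
The decision described above needs only an approximate distance. A minimal sketch, assuming hypothetical values for the reach of the neck and the length of a step (neither is given in these notes):

```python
import math

def steps_before_pecking(distance_cm, reach_cm, step_cm):
    """Return how many steps are needed before the kernel is within reach.

    All three parameters are hypothetical; only an approximate distance
    estimate is assumed, as in the text.
    """
    if distance_cm <= reach_cm:
        return 0  # close enough: peck from here
    return math.ceil((distance_cm - reach_cm) / step_cm)

print(steps_before_pecking(distance_cm=3, reach_cm=4, step_cm=5))
print(steps_before_pecking(distance_cm=14, reach_cm=4, step_cm=5))
```

Rounding up to whole steps means that an error of a centimetre or two in the estimate rarely changes the outcome, which fits the requirement of an approximate rather than exact distance.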

All of the above capabilities must be relatively unaffected by the angle of the head, and therefore by the viewpoint of the eye

Chicks move their heads side to side as well as forward and back, yet still recognize the target

The last two requirements suggest that just separating vision into patches or blobs of perceptual information is not sufficient

Complex comparisons would be required to identify patches as equivalent under viewpoint changes
The comparison would be even more difficult if the target changes orientation, e.g., back view vs. side view of the hen.
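
To illustrate why such comparisons are complex, here is a minimal sketch that handles only the easiest case: a patch outline that is rotated and shifted within the image plane. The outline points and the descriptor (sorted distances from the centroid) are illustrative assumptions, not a proposed model of vision:

```python
import math

def rotate_and_shift(points, angle_deg, dx, dy):
    """Simulate a change of viewpoint: rotate in the image plane, then shift."""
    a = math.radians(angle_deg)
    return [(x * math.cos(a) - y * math.sin(a) + dx,
             x * math.sin(a) + y * math.cos(a) + dy) for x, y in points]

def descriptor(points):
    """Sorted distances of the outline points from their centroid.

    This is unchanged by in-plane rotation and translation, but NOT by
    scale or by a change of target orientation (back view vs. side view).
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return sorted(math.hypot(x - cx, y - cy) for x, y in points)

def same_patch(a, b, tol=1e-6):
    """Compare two outlines through their descriptors, within a tolerance."""
    da, db = descriptor(a), descriptor(b)
    return len(da) == len(db) and all(abs(p - q) <= tol for p, q in zip(da, db))

outline = [(0, 0), (4, 0), (4, 1), (0, 3)]
tilted = rotate_and_shift(outline, angle_deg=30, dx=5, dy=-2)
stretched = [(2 * x, y) for x, y in outline]  # crude stand-in for a new orientation

print(same_patch(outline, tilted))
print(same_patch(outline, stretched))
```

Even this simple invariance requires computing a centroid and a set of distances before any comparison can be made. The stretched outline, a crude stand-in for seeing the target from a different side, already defeats the descriptor.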

2. Next, let us look at the requirements for translating perception-based information into action-related information

The 'descriptive' information from perception must be transformed into a flow of instructions to the muscles. The perceived movement of limbs must be translated into instructions to our limbs to follow approximately the same movement sequence.

We do not require millisecond-by-millisecond accuracy, but we do want the same progression such as lifting the knee followed by ...
Two movements appear to be the same if an 18-frame-per-second video would show them as the same

We therefore need at most 18 similar positions per second rather than 1000 similar instructions per second
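
The arithmetic can be sketched as follows, assuming a hypothetical movement trace recorded once per millisecond. The function name key_positions and the trace itself are illustrative only:

```python
def key_positions(trajectory, source_hz=1000, target_hz=18):
    """Pick evenly spaced samples so that target_hz positions remain per second."""
    n_target = len(trajectory) * target_hz // source_hz
    return [trajectory[i * source_hz // target_hz] for i in range(n_target)]

# One second of a hypothetical knee-height trace sampled every millisecond.
one_second = [t / 1000 for t in range(1000)]
positions = key_positions(one_second)

# Ratio between the fine-grained trace and the positions that must match.
reduction = len(one_second) / len(positions)

print(len(positions))
print(round(reduction, 1))
```

Matching 18 positions per second instead of 1000 instructions per second reduces the amount of information to be reproduced by a factor of roughly 55.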

Even the most precisely choreographed ballet would probably be satisfied if the performance were precise to within a quarter of a second

Relative to the music accompanying the ballet movements, this would be roughly beat-by-beat accuracy.

We should also look at the accuracy or precision required for the instructions to the muscles

Continuing with our precisely choreographed ballet, no one would care about precise muscle tension. What matters is control of the position of the limbs relative to the stage.

Bodybuilding competitions might be a better example of wanting to show muscle tension, but it seems unlikely that precise repeatability is required there. Ballet seems the better example, since other dancers rely on the accuracy of the positions and the repeatability of the motions.

Both our ballet example and the imprinting 'follow the target' example lead us to speculate that it is more relevant to specify action relative to the environment than in terms of repeatable sequences of muscle tension specifications
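
A minimal sketch of this speculation, assuming a one-dimensional limb coordinate and a simple proportional correction rule. Both are illustrative assumptions and are not meant as a model of motor control:

```python
def reach(start, goal, gain=0.5, tol=0.01, max_steps=100):
    """Move a limb coordinate toward a goal fixed in the environment.

    Returns the final position and the list of corrections issued.
    The proportional rule and the gain are illustrative assumptions.
    """
    position, commands = start, []
    for _ in range(max_steps):
        error = goal - position
        if abs(error) <= tol:
            break
        command = gain * error  # correction depends on where the limb is now
        commands.append(command)
        position += command
    return position, commands

# Same goal relative to the environment, two different starting positions.
end_a, commands_a = reach(start=0.0, goal=1.0)
end_b, commands_b = reach(start=2.5, goal=1.0)

print(round(end_a, 2), round(end_b, 2))
print(commands_a == commands_b)
```

Both runs end at the same position relative to the environment although the sequences of corrections differ. A fixed, replayed sequence of commands would reach the goal from only one of the two starting positions.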

3. Finally, let us look at residual problems in translating perception-based information into action-related information

Perception, as discussed above, has to go beyond simple patches of two-dimensional rasterized images.
- For action, it would be ideal if perception could yield three-dimensional descriptions of limbs that could readily be converted into instructions for action.
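
To indicate what "readily converted" might mean, here is a minimal sketch that assumes perception has already delivered three-dimensional positions for the hip, knee and ankle. The coordinates and the function name joint_angle are hypothetical. A joint angle is one candidate for a position goal that is closer to action than a raster image:

```python
import math

def joint_angle(proximal, joint, distal):
    """Angle in degrees at `joint` between the two adjoining limb segments."""
    u = [p - j for p, j in zip(proximal, joint)]
    v = [d - j for d, j in zip(distal, joint)]
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

# Hypothetical perceived 3D positions (metres) of hip, knee and ankle.
straight_leg = joint_angle((0, 1.0, 0), (0, 0.5, 0), (0, 0.0, 0))
lifted_knee = joint_angle((0, 1.0, 0), (0.5, 1.0, 0), (0.5, 0.5, 0))

print(straight_leg, lifted_knee)
```

The angle does not depend on where the observed limb stands or on the direction from which it is seen, which is what makes a three-dimensional description attractive for imitation. Obtaining such positions from vision remains the unsolved part.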