Requirements for visual information processing to support imitation
-- Leftovers --
- to analyze the perception in such a way that the essential elements to be imitated are recognized
- to analyze the perception so that the movements of the perceived and recognized elements are translated into position goals for the limb that matches the perceived element
- The position goals derived from perception as above must be translated into detailed instructions for the muscles that control the corresponding limb
- Identifying targets and separating them from 'background'
1. Let us look at the minimal requirements for the transformation of visual information
- Foreground and background need to be separated
- The foreground may move relative to the background, or the background may move
- Both may be required for something as simple as following the leader in imprinting
- The foreground and the background may be seen as separate visual patches
- Both may be further separated into sub-patches. For example, chicks in imprinting do not suddenly follow another chick rather than the target.
- Reviewing our earlier learning model, it must be possible to search memory and find the foreground patch even though the background differs
- Patches must be recognized and remembered separately, so that reoccurrence of a foreground patch in front of a different background can be recognized as the target to follow.
- Chicks follow the target in a variety of settings
- It must be possible to estimate the approximate distance to the foreground patch
- Reexamining our imprinting example, chicks can follow at a relatively constant distance from the target and from the preceding chick
- Chicks can peck at kernels of grain, and go to the next one. They must therefore be able to judge the distance to know whether to pick it from here or whether to take a few steps.
- All of the above capabilities must be relatively unaffected by the angle of the head, and therefore by the viewpoint of the eye
- Reexamining our imprinting example, chicks can follow at a relatively constant distance from the target and from the preceding chick
- Chicks move their heads side to side as well as forward and back while still recognizing the target
- The last two requirements suggest that just separating vision into patches or blobs of perceptual information is not sufficient
- Except possibly for stereo vision, patches do not help to estimate distances
- Patches are viewpoint-dependent: they change whenever the viewpoint changes
- Complex comparisons would be required to identify patches as equivalent under viewpoint changes
- The comparison would be even more difficult if the target changes orientation, e.g., back view vs. side view of the hen.
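The two objections to patches above (no distance information, and viewpoint dependence) can be illustrated with a small sketch. It assumes a simple pinhole model of the eye and stands in for the hen with a plain box; the function names and the box dimensions are hypothetical and chosen only for illustration.

```python
import math

def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (x, y, z) onto the image plane; z is depth."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)

def box_corners(width, height, depth):
    """Eight corners of a box centred on the origin (a crude stand-in for the hen)."""
    return [(sx * width / 2, sy * height / 2, sz * depth / 2)
            for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]

def view(corners, yaw_degrees, distance):
    """Rotate the object about the vertical axis, then place it `distance` in front of the eye."""
    a = math.radians(yaw_degrees)
    placed = []
    for x, y, z in corners:
        xr = x * math.cos(a) + z * math.sin(a)
        zr = -x * math.sin(a) + z * math.cos(a)
        placed.append((xr, y, zr + distance))
    return placed

def patch_size(corners, yaw_degrees, distance):
    """Width and height of the bounding box of the projected two-dimensional patch."""
    pts = [project(p) for p in view(corners, yaw_degrees, distance)]
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (max(xs) - min(xs), max(ys) - min(ys))

# Assumed dimensions: a box that is narrow from behind and long from the side.
hen = box_corners(1.0, 1.0, 3.0)
back_view = patch_size(hen, yaw_degrees=0, distance=10.0)
side_view = patch_size(hen, yaw_degrees=90, distance=10.0)

# A half-size object at half the distance.
small_near = patch_size(box_corners(0.5, 0.5, 1.5), yaw_degrees=0, distance=5.0)
```

Under these assumptions the side view yields a much wider patch than the back view of the same object, and the half-size object at half the distance yields the same patch as the full-size one, so the patch alone fixes neither identity nor distance.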
2. Next, let us look at the requirements for translating perception-based information into action-related information
- The 'descriptive' information from perception must be transformed into a flow of instructions to the muscles. The perceived movement of limbs must be translated into instructions to our limbs to follow approximately the same movement sequence.
- We do not require millisecond-by-millisecond accuracy, but we do want the same progression, such as lifting the knee followed by ...
- Two movements appear to be the same if an 18-frame-per-second video would show them as the same
- We therefore need at most 18 similar positions per second rather than 1000 similar instructions per second
- Even the most precisely choreographed ballet would probably be satisfied if the performance were accurate to within a quarter of a second
- Judged against the music accompanying the ballet movements, this would be roughly beat-by-beat accuracy.
- We should also look at the accuracy or precision required for the instructions to the muscles
- Continuing with our precisely choreographed ballet, no one would care about precise muscle tension. They would want to control the position of the limbs relative to the stage.
- Bodybuilding competitions might be a better example of wanting to display muscle tension, but it seems unlikely that precise repeatability is required there. Ballet seems the better example, since other dancers rely on the accuracy of the positions and the repeatability of the motions.
- Both our ballet example and the imprinting 'follow the target' example lead us to speculate that it is more relevant to specify action relative to the environment than in terms of repeatable sequences of muscle tension specifications
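The timing argument above (at most 18 positions per second instead of 1000 instructions per second, i.e. roughly 55 times fewer values) can be made concrete with a minimal sketch. It assumes a movement can be described as a function from time to a single position value; the trajectories, the tolerance, and all names are hypothetical.

```python
import math

FRAME_RATE = 18  # frames per second, the figure used in the text

def sample(trajectory, duration_s, rate_hz=FRAME_RATE):
    """Sample a trajectory (time in seconds -> position) at a fixed frame rate."""
    n = int(round(duration_s * rate_hz))
    return [trajectory(i / rate_hz) for i in range(n + 1)]

def look_the_same(traj_a, traj_b, duration_s, tolerance, rate_hz=FRAME_RATE):
    """True if every sampled frame of the two movements agrees within `tolerance`."""
    frames_a = sample(traj_a, duration_s, rate_hz)
    frames_b = sample(traj_b, duration_s, rate_hz)
    return all(abs(a - b) <= tolerance for a, b in zip(frames_a, frames_b))

# Hypothetical knee-height trajectories in metres.
def smooth_lift(t):
    """A smooth lift of the knee and return over one second."""
    return 0.5 + 0.2 * math.sin(math.pi * t)

def lift_with_tremor(t):
    """The same lift with a small, fast tremor superimposed."""
    return smooth_lift(t) + 0.002 * math.sin(2 * math.pi * 200 * t)

def delayed_lift(t):
    """The same lift started a quarter of a second late."""
    return smooth_lift(max(0.0, t - 0.25))

same_despite_tremor = look_the_same(smooth_lift, lift_with_tremor, 1.0, tolerance=0.01)
same_despite_delay = look_the_same(smooth_lift, delayed_lift, 1.0, tolerance=0.01)
```

In this sketch the fine-grained tremor does not affect the comparison at the assumed one-centimetre tolerance, while a quarter-second delay does. The positions are expressed relative to the environment, not as muscle tensions, in line with the speculation above.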
3. Finally, let us look at residual problems in translating perception-based information into action-related information
- Perception, as discussed above, has to go beyond simple patches of two-dimensional rasterized images.
- For action, it would be ideal if perception could yield three-dimensional descriptions of limbs that could readily be converted into instructions for action.
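As a rough illustration of what "readily converted" could mean, the sketch below turns a position goal for the tip of a limb into joint angles. It borrows the standard two-link inverse kinematics calculation from robotics and simplifies the limb to two rigid segments moving in a plane; this is an assumed stand-in for illustration, not a claim about how muscles are actually instructed, and all names are hypothetical.

```python
import math

def joint_angles_for_goal(x, y, upper_len, lower_len):
    """Two-link planar limb: return (shoulder, elbow) angles in radians that
    place the limb tip at (x, y), or None if the goal is out of reach."""
    d2 = x * x + y * y
    cos_elbow = (d2 - upper_len ** 2 - lower_len ** 2) / (2 * upper_len * lower_len)
    if not -1.0 <= cos_elbow <= 1.0:
        return None  # goal is too far away or too close for these segment lengths
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(
        lower_len * math.sin(elbow),
        upper_len + lower_len * math.cos(elbow),
    )
    return shoulder, elbow

def tip_position(shoulder, elbow, upper_len, lower_len):
    """Forward calculation: where the limb tip ends up for the given joint angles."""
    elbow_x = upper_len * math.cos(shoulder)
    elbow_y = upper_len * math.sin(shoulder)
    return (elbow_x + lower_len * math.cos(shoulder + elbow),
            elbow_y + lower_len * math.sin(shoulder + elbow))

# Assumed segment lengths (metres) and a position goal derived from perception.
angles = joint_angles_for_goal(0.3, 0.4, upper_len=0.3, lower_len=0.3)
reached = tip_position(angles[0], angles[1], 0.3, 0.3)
```

The goal is stated as a position relative to the environment, and the joint angles follow from it; a two-dimensional patch would first have to be turned into such a position before a calculation of this kind could be applied.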