Body, brain and language
A new paradigm for language comprehension and learning
Overview
Rainer von Königslöw, Ph.D.
We start
with the assumption that human language is an evolutionary trait that
facilitates rapid adaptation and learning.
In other words, we assume that from an evolutionary perspective,
language is primarily for self-programming and only secondarily for
communication. To explore how this might
work we assume that computers were built in man’s image, and explore how the
concept of stored instructions in computers could help explain the role of
language in humans.
Current
research places the age of modern humans at approximately 150 thousand years. The general assumption seems to be that the
genetic makeup and the general structures have not changed, presumably
including the brain structures and capabilities we are discussing here. The capability for individual (solipsistic)
use of language-like constructs for learning and adaptation would presumably be
there from the beginning. The beginning
of spoken language communication is hard to estimate. It requires shared conventions of sound
production and references within a language community. It is a cultural property and likely came
later. However, one of the main
functions of language is to help in the organization of complexity, such as
complex tasks and social organizations.
We can therefore speculate on the size of social organizations that
could not have functioned without spoken language, and through that estimate the
beginnings of organized spoken communication.
Self-consciousness enables self-evaluation, speeding learning and faster
adaptation to hostile environments.
Julian Jaynes (1976) suggests that consciousness and self-consciousness
are fairly recent. Written language is 4
to 6 thousand years old. We conjecture
that spoken and written communication help in establishing a sense of shared
reality, and through that allow for planned activities that commit individuals
to predictable future activities. A
group with shared reality can predict the consequences of coordinated
activities, and thus motivate individuals in the group to cooperate to obtain
the predicted benefits.
According
to this theory, these are the main topics that an account of language comprehension must address.
Let us
decompose the problem of language comprehension by identifying boxes within
boxes, each associated with topics to be explained, and relevant methods for
explanation. The first type of box, the
largest container, is the external world.
All observable phenomena, including ‘real’ objects and events, as well
as ‘public’ language objects and events are contained within this box. Bodies corresponding to individuals are
separate boxes within this box. The body
representing an individual has a variety of ways of acting and interacting with
the external world, including behaviour and perception. We will posit the brain as a box within the
body of an individual, i.e., a third type of container. These three boxes are physical and
visible. The next box is functional,
representing the capacity to remember.
We are assuming that the memory we are concerned about is a function of
the brain, i.e., that it is a box within the brain, a fourth type of
container.
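The container structure can also be sketched in code. The following is a minimal sketch (all class and field names are illustrative assumptions, not claims about anatomy) showing only the nesting of the four boxes:

```python
# A minimal sketch of the boxes-within-boxes decomposition.
# All class and field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Memory:                 # box 4: functional, located within the brain
    stored_items: list = field(default_factory=list)

@dataclass
class Brain:                  # box 3: physical, within the body
    memory: Memory = field(default_factory=Memory)

@dataclass
class Body:                   # box 2: physical, bounded by the skin
    brain: Brain = field(default_factory=Brain)

@dataclass
class ExternalWorld:          # box 1: the largest container
    bodies: list = field(default_factory=list)   # individuals are separate boxes within it

world = ExternalWorld(bodies=[Body(), Body()])    # two individuals in one external world
```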
Let us go
through the 4 boxes and briefly indicate the contents and functions:
The
‘external world’ contains physical and chemical events. Language is present as marks on paper or as
acoustic events. Space and time are key
parameters. Individuals are contained
within this box, with skin as boundary.
The functions are described by the usual theories in the natural and
social sciences.
The ‘body’
of an individual contains muscles and bones etc. within the skin. Perceptors such as eyes and ears belong in
this box, as well as actuators such as arms and legs. The basic structure is assumed to be
primarily genetically determined, but history (experience) will affect details
such as the strength of muscles, etc.
Language is represented as tongue and vocal cord movements, as
vibrations of the tympanic membrane in the ear, or as patterns on the retinas. The functions are described by the usual
theories in medical sciences and psychology.
The ‘brain’
of an individual contains neural cells, linked to receptors and actuators
through nerve cells. The basic structure is assumed to be primarily genetically
determined, but history (experience) will affect details such as acuity of
hearing, etc. Language is represented as
patterned neural activity in the speech cortex, or as patterns in the visual
cortex, etc. Both function and
localization of function are described in the relevant scientific literature.
The ‘memory’
function of the brain has been investigated extensively. Neural connectivity, chemistry, and other
mechanisms have been proposed as physical and chemical representation of the
memory function.
Let us go
through the 4 boxes and briefly indicate the phenomena to be explained:
The ‘external world’ contains objects and events, as well as
language-based objects and events that refer to them as part of
communication between individuals. To
make the language objects, e.g., books, useful and the communication events
meaningful we have to speculate about shared reality. It is also clear that there are limits to
shared reality, since not all individuals can read the same books or have
meaningful conversations.
The ‘body’ of the individual may turn the head to listen to language,
may mutter to himself as he concentrates on doing something, write and read
notes to himself, and talk to and listen to other individuals.
The ‘brain’ of the individual shows a lot of neural activity during
language use, with some localization.
The ‘memory’ of the individual is clearly involved in storing ephemeral
speech sounds, in making up complex sentences, and in recalling and using
explicitly memorized sentences and phrases.
At this
point we can explore an analogy with computers to make a conjecture about some
brain functionality. Traditionally
programmes in artificial intelligence have been constructed to simulate human
functions. For this analysis, we will
start with the general assumption that both hardware and software of computers
have been constructed in the image of man, and therefore can be used as analogy
in reverse.
A single
computer might be analogous to an individual.
Computers
coexist in an environment, and are frequently connected with networks so that they
can communicate. This communication is
somewhat analogous to individuals communicating in the external world.
Computers
have central processing units that are roughly analogous to brains in
individuals.
Computers
have memories. These can be distributed
across several locations, and generally are not within the central processing
unit. However, they can be seen as
functionally analogous to human memory.
Using this
analogy we can make our first conjecture. Computers of the same general type can have
very different central processing units, based on different chipsets, with
different clock speeds etc., but they share the same instruction set capable of
reading and interpreting stored instructions.
This allows them to function equivalently. We conjecture that the human brain also has
the capability of interpreting stored instructions, with the same basic
instruction set. Furthermore we
conjecture that the brain stores programmes in memory for this instruction set,
and that it uses or ‘executes’ these programmes to control some of its
functions.
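As an illustration of this conjecture, and only as an illustration, the following toy sketch shows two ‘brains’ that differ in speed but share one invented instruction set, and therefore produce the same behaviour from the same stored programme:

```python
# Toy illustration of the first conjecture: different "processors" that share one
# instruction set can execute the same stored programme.  The two-instruction set
# below is invented purely for illustration.
def run(programme, speed_hz):
    state = {"steps": 0, "output": []}
    for op, arg in programme:                  # read and interpret stored instructions
        if op == "SAY":
            state["output"].append(arg)
        elif op == "REPEAT_LAST":
            state["output"].append(state["output"][-1])
        state["steps"] += 1
    state["duration_s"] = state["steps"] / speed_hz   # hardware differs, behaviour does not
    return state

programme = [("SAY", "chair"), ("REPEAT_LAST", None)]
print(run(programme, speed_hz=10))    # a "slow" brain
print(run(programme, speed_hz=100))   # a "fast" brain: same output, different timing
```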
Supporting
argument and evidence: if we assume that
the human capacity for language reflects this basic capacity for interpreting
stored instructions, then the findings of Noam Chomsky and others supporting a
universal grammar provide indirect evidence that the underlying instruction
set may be universal. Psychological
evidence for universal developmental stages may provide further support.
The analogy
and conjecture above lead to a second conjecture. The machine language (assembler)
representation of programmes suitable for the instruction set of the central
processing units of computers has proved to be very awkward and unintuitive to
use. Higher-level languages were devised
that represented the instructions in a more symbolic and easier to understand
form. Examples of such languages are
Cobol, Fortran, Lisp, Basic, C, and Prolog.
The symbolic textual representations of software written in these
languages are easier to generate and to modify.
The text representation (source) has to be translated into machine
language form by special programmes called compilers and interpreters. We can conjecture by analogy that there are
equivalent higher-level symbolic representations of brain instructions for
humans, that they are easier to modify, but that they have to be translated
into basic universal brain instructions to be executed. Furthermore we conjecture that natural
languages such as English serve as the high-level symbolic programming languages
for humans.
Support for
the second conjecture is based on the experience that machine language or even
assembler programmes are very difficult to maintain. It is also difficult to transfer these
programmes to another person, or to work together as a team. For programmes written in higher level
symbolically expressed languages it is much easier to share programmes and to
work together as a team. We would
therefore conjecture that natural language is the symbolic representation of
brain instructions. By extension,
communication in natural language on tasks may be analogous to exchanging and
working together on symbolically expressed software programmes.
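By analogy with a compiler, the translation from the high-level form to the low-level form might be sketched as follows; the instruction names and the mapping table are hypothetical, chosen only to make the layering concrete:

```python
# Toy "compiler" for the second conjecture: a natural-language-like instruction is
# translated into low-level "brain-code" operations before it can be executed.
# The instructions and the mapping are hypothetical.
BRAIN_CODE = {
    "walk forward": ["SHIFT_WEIGHT", "SWING_LEFT_LEG", "SWING_RIGHT_LEG"],
    "turn left":    ["ROTATE_HIPS_LEFT", "STEP_TO_ALIGN"],
    "stop":         ["PLANT_FEET", "BALANCE"],
}

def compile_to_brain_code(sentence):
    """Translate a high-level symbolic instruction into executable low-level steps."""
    if sentence not in BRAIN_CODE:
        raise ValueError(f"no translation for: {sentence!r}")
    return BRAIN_CODE[sentence]

print(compile_to_brain_code("walk forward"))
# The high-level form is the one that is edited and shared; the low-level form is the one that runs.
```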
Further
following our computer analogy, high-level symbolic programmes normally utilize
lower-level programmes as well as hardware capabilities and functions. On the other hand, there are computer
functions and capabilities that only require firmware and hardware. In looking at computer capabilities and
functions, we normally look at the highest level at which these capabilities
are represented. In other words, to
modify a C programme, we edit the symbolically represented source code. We do not edit the compiled code or try to
change CPU instructions or the computer hardware on the motherboard etc. Analogously, in looking at human capabilities
and functions, we infer that it would make sense to look at the highest level
of programming or representation.
We now need
to look at human functions and capabilities to see how they might utilize and
depend on brain programmes.
Conjecture
2a: the more universal a human behaviour,
function, or capability, the more likely it is to depend primarily on genetically
built-in rather than programmed functions.
The corollary is that the more diverse behaviours, functions, or
capabilities are, the more likely they are to be represented primarily in
high-level, symbolic form.
Conjecture
2b: On computers, programmes that need very fast
responses are more likely to be written in low-level assembler code or
equivalent. Conversely, if speed or high
performance is not an issue, the programme might be kept in a high-level
language. Similar reasoning might be
true for human programmes.
Conjecture
2c: On computers, programmes that must be
modified often and/or extensively are usually kept in high-level symbolic
form. Similar reasoning might be true
for human programmes.
Conjecture
2d: Plans and
descriptions that are accessible to consciousness, i.e., can readily be
described in natural language and reasoned about, are likely to be in high-level
symbolic form. Similarly for actions
that may be accompanied by ‘talking aloud’ about the steps.
Conjecture
2e: Behaviour and capabilities that cannot be
brought to consciousness, and that seem very ‘automatic’ may be a low-level
brain-code programme or a ‘hard-wired’ brain function. If they are universal and appear to have
existed for centuries, they may be ‘hard-wired’. If not, they are more likely to be a
low-level brain-code programme.
Further
following our computer analogy, computers generally function under control of
some programmes, when they are turned on and functioning normally. In most computers these programmes boot
automatically once they have been loaded initially. Humans are never fully shut off, and any
initial programme must be supplied and loaded genetically.
A third
conjecture follows the conjecture on brain programming. Computers are essentially passive. One encounters them in association with
operators/programmers. Humans, on the
other hand, are active and there are no obvious ways of entering programmes,
especially into babies and young children.
We therefore conjecture that humans are self-programming. We conjecture that all normal activities such
as play, conscious problem solving, and dreaming all contribute to an ongoing
activity of self-programming. Handling
language and handling objects in the environment may be among the early
programmes to be self-constructed.
Learning in school from language instruction requires these basic
programmes to be in place. Learning in
general suggests that the programmes are self-modifying.
Note: Some level of self-programming may exist in other species. Imprinting in chicks appears to be a time-window dependent self-programming task.
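A toy sketch of what self-programming could look like follows; the ‘success’ test and the stored routine are invented, and the point is only that the system writes new programmes into its own memory as a side effect of ordinary activity:

```python
# Toy sketch of the third conjecture: during play, a successful action sequence is
# stored in the system's own programme memory for later reuse.  The success signal
# and the action names are assumptions made for illustration.
programme_memory = {}

def play(actions, succeeded):
    """Store a successful action sequence as a new, reusable programme."""
    if succeeded and tuple(actions) not in programme_memory.values():
        name = f"routine_{len(programme_memory)}"
        programme_memory[name] = tuple(actions)       # the self-programming step
        return name
    return None

play(["reach", "grasp", "pull"], succeeded=True)
play(["reach", "push"], succeeded=False)              # nothing is stored
print(programme_memory)    # {'routine_0': ('reach', 'grasp', 'pull')}
```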
Conjecture
3a: Some computers can boot into alternative operating
systems. Humans may have an equivalent
‘dual-boot’ architecture, between waking mode and sleep mode. This daily switch in operating modes may have
a variety of functions, including the ‘reboot’ function common to
computers. The ‘wake’ system appears to
function under thought control, the highest level of programming, while the
‘sleep’ system appears to function under ‘brainware’ control, and to have
maintenance functionality, including potentially cleaning up and maintaining
the ‘thoughtware’.
Further
following our computer analogy, computers generally function in a network, in
constant contact with other computers, with which they
communicate and exchange data etc.
Similarly, humans function in a social network, with constant
communication, primarily language based.
A fourth
conjecture follows the conjecture on self-programming and operating
modes. When humans communicate they
generally believe that they can understand one another. Special language and communication processes
allow humans within a language and cultural group to function with the
assumption and belief in a shared reality.
Conjecture
4a: Synchronizing shared reality derives in part
from shared educational and other experiences.
Conjecture
4b: Synchronizing shared reality is an active
process, involving the detection of mismatches.
Conjecture
4c: There are processes for disengaging from
communication when synchronization fails, as exemplified by some
discussions on religion and politics.
Conjecture
4d: Faults in synchronizing shared reality can
lead to major problems and errors in the workplace.
Conjecture
4e: Part of the art of diplomacy and politics
deals with imperfectly synchronized shared reality.
We
conjecture that ‘conscious’, ‘planned’, ‘willed’, or ‘intentional’ behaviour
involves the invocation and use of functions at all four levels (in all four
boxes) we have identified.
Action,
sensation, perception, and communication
We are
particularly interested in communication.
We can differentiate the following categories:
Simulating
shared cognitive (symbolic, language) reality
In
communication through natural language, there is an important implicit
assumption that the words we use mean the same thing, i.e., that they refer to
the same objects in the external world, the same action sequences, the same
concepts, etc.
The
simplest case is a type of correspondence we might get when a young child is
learning the names of objects. The adult
gestures by pointing to an object, e.g., a chair and says “chair”. The child perceives the adult as pointing to
the chair. Both the adult and the child
hear the speech sound “chair”. Then the
child gestures by pointing to the same object, the chair and says “chair”. The adult sees the child pointing at the
chair. Both the adult and the child hear
the speech sound “chair”.
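The exchange can be simulated in a few lines. In the sketch below the object identifiers and the correspondence test are hypothetical stand-ins for much richer processes; the point is only that both parties end up with the same word-to-object pairing:

```python
# Toy simulation of the naming exchange: adult and child each hold a private
# word-to-object mapping, and agreement is tested by pointing and naming.
# Object identifiers are arbitrary placeholders.
adult_lexicon = {"chair": "object_17"}
child_lexicon = {}

# The adult points at object_17 and says "chair"; the child records the pairing.
child_lexicon["chair"] = "object_17"

# The child points and names; the adult checks the correspondence.
def shared(word, pointed_at, lexicon):
    return lexicon.get(word) == pointed_at

print(shared("chair", "object_17", adult_lexicon))   # True: a small piece of shared reality
```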
Learning
In general
what we mean by learning is that an individual is intentionally extending
functionality through structured (systematic) interaction with the environment.
Body
functions: strengthen muscles through
exercises, improve auditory or visual acuity with practice
Brain
functions: basic Pavlovian conditioning
– makes neural connections?; learn
language functions with babbling?
Brainware
functions: ?
Thoughtware
functions: Memorize the script and stage
directions for a character in a play, then act in the play and make the
character believable.
Time: history, present, future
Body functions:
· Past affecting present: short term effects from fatigue with recovery to ‘normal’, longer term effects from practicing, but only incremental effects, a slow drift of ‘normal’. There can be ‘mode switching’ with waking vs. sleep, or drug use. There can be maturational effects and step switches from ‘trauma’, with or without recovery.
· Future affecting present: there is no complex pattern stored in the body (not brain-memory) controlling future muscle or perceptor patterns.
Brain functions:
· Past affecting present: repeated pattern fatigue, sensitization, all short term except for maturational effects.
· Future affecting present: plans for the future are in memory, not likely in neural firing patterns.
Memory –
brainware & thoughtware functions:
We can work
through an example, to illustrate how this layered architecture might
work. We shall look at the example of an
actor learning a script, both with dialogue and stage directions. We shall assume that the stage directions are
much more explicit than they might usually be.
At the
beginning there is the script, in written (printed) language form in a
book. This is an object in the external
world that the actor is bringing along to the stage. Let us also assume that the stage is empty,
that the lights are on so that the actor can read the book, and that the actor goes to the
middle of the stage, facing the front.
Process: the actor reads the script and acts out the
part, in segments. He also memorizes the
script and stage directions, all more or less simultaneously.
Process 1: the actor reads the script and understands
it, both dialogue and stage-directions
‘body
functions’ include holding the book, turning the pages, and appropriate eye
movements to focus on the writing and scan (left-to-right and top to bottom for
English)
‘brain
functions’ in the visual cortex involve pattern recognition and feature
extraction
‘brain-ware
functions’, i.e., brain-language memory-stored instructions, involve letter and
word recognition, as well as guiding the reading-scan of the eyes. Language recognition is another component,
where the visual representation in printed sentences is switched to meaningful
thought
‘thought-ware
functions’, i.e., natural-language memory-stored instructions, involve
following the implications of what is written to what can reasonably be assumed
or inferred from context or other indications.
For instance, goal-specified stage directions such as “walk to the
centre of the stage” have to be translated into “turn toward the centre, start
walking toward …, avoid obstacle …, stop when … is reached, turn toward …”.
Process
2: the actor acts out the part
‘body
functions’ involve the walking motions as well as any hand or body gestures
that are indicated.
‘brain
functions’ involve the motor controls involved in walking, balancing and
maintaining a posture, etc.
‘brain-ware
functions’, i.e., the brain-language memory-stored instructions
‘thought-ware
functions’, i.e., natural-language memory-stored instructions
Process
3: the actor speaks out the part
‘body
functions’: The mouth, tongue etc. may
be involved if some speech has to be delivered during the motion.
‘brain
functions’ involve the speech controls for the vocal cords, throat, mouth,
tongue, etc.
‘brain-ware
functions’, i.e., the brain-language memory-stored instructions, involved in
translating a ‘thought-sentence’ into a ‘speech-sentence’ along with
appropriate intonation, projection, enunciation, etc.
‘thought-ware
functions’, i.e., natural-language memory-stored instructions, involved in
framing a ‘thought-sentence’ along with the attitude to be conveyed, dialect to
be used, etc.
Process
4: the actor pays attention to visual
and auditory cues
‘body
functions’: Eyes may be involved to get
visual cues for orientation. Ears may
be involved to synchronize movements with music or dialogue coming from other
actors.
‘brain
functions’ involve the visual and/or auditory cortex, etc.
‘brain-ware
functions’, i.e., the brain-language memory-stored instructions, used to
interpret the sights and sounds. Some
filtering may be applied at this level.
‘thought-ware
functions’, i.e., natural-language memory-stored instructions, involved in
interpreting the cues for relevance.
Unanticipated cues also have to be processed, some of which can change
the context, such as if someone yells “Fire!”.
Process
5: the actor memorizes the script and
stage directions
‘body
functions’ are unlikely to be memorized directly, at least for a play.
‘brain
functions’, outside of stored instructions, are unlikely to be memorized
independently.
‘brain-ware
functions’, i.e., the brain-language memory-stored instructions, deal with
basic capabilities such as language processing.
They are unlikely to be changed for highly intentional and fairly
short-term activity such as acting in a play.
‘thought-ware
functions’, i.e., natural-language memory-stored instructions, are likely to
provide the memory capacity for intentional activity such as pretending to be
someone else, going through a fairly fixed sequence of behaviour in a fixed
context, in a simulated reality shared during the performance.
How would it
work: interactive game simulations
Interactive
multi-player video-games played on PCs over the internet provide a good
context in which to explore this theory.
We can simulate the play-acting scenario described above. The players are at remote locations, but they
share a virtual reality. The players are
represented in this virtual reality, and can see other players.
Let us
start with a single player with a script as described above. This essentially reduces to animation. Let us assume the player has memorized the
script, and is acting it out. Let us further
assume that a video camera is running in the external environment, recording
the action. Let us assume that the
action corresponding to the script takes 30 seconds, and that the video camera
is running at 30 frames per second. Let
us assume that we are 5 seconds into the action. Let us start by focussing on action in space:
In the ‘external world’, we can describe the location of the actor in
terms of spatial dimensions relative to the stage, just like in high school
physics. In theory, and with endless
patience, we could record the action (both the movements and the gestures) in
terms of the location of the parts of the body in three dimensional space over
time. We record the outside of the body,
i.e., the skin and the clothing. This is
the record of the action as captured on video, especially if we had several
video cameras running from different angles.
For the ‘body functions’, the action is represented by the skeleton,
joints, and muscles. The actor can flex
muscles to move bones or create a facial expression to represent an emotion. The body provides some feedback on this
muscle activity, but it does not have built-in GPS (global positioning
system). The bones and the skin can be
bent and flexed, but the body cannot move them to an absolute spatial location
without feedback from the eyes or the stage director. Motion of skin and bones is relative to other
bones, and only very indirectly relative to the stage.
The ‘brain functions’ control the body motions and positions and
receive feedback through electrical control signals (nerve signals). This is somewhat like a process control
system in a highly automated factory. We
imagine that there is a set of built-in (genetically set) universal control
sequences, such as for walking, grasping, swallowing, talking. Each of the control sequences has relevant
feedback signals to control the trajectories and the smoothness of the
action. We assume these to be relatively
universal, with slight genetic variations.
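For illustration only, the feedback idea can be written as a small proportional controller that moves a single joint smoothly toward a target angle; the one-joint model and the gain are assumptions, not a claim about motor physiology:

```python
# Toy feedback loop standing in for a built-in control sequence: on each cycle the
# "brain" compares the sensed joint angle with the target and issues a correction.
# The single-joint model and the gain are assumptions made for illustration.
def control_sequence(target_deg, start_deg=0.0, gain=0.3, cycles=20):
    angle = start_deg
    trajectory = []
    for _ in range(cycles):
        error = target_deg - angle        # feedback signal
        angle += gain * error             # smooth, incremental correction
        trajectory.append(round(angle, 1))
    return trajectory

print(control_sequence(90.0))   # the angle converges smoothly toward 90 degrees
```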
The ‘brain-ware functions’, i.e., the ‘brain-language stored
instructions’, we conjecture, extend the brain functions for learnt movements
that go beyond ‘native’ movements, e.g., riding (and balancing on) a
bicycle. Another example might be a set
of gestures learnt in acting school, and remembered through much practice till
they are more or less automatic and not open to introspection or conscious
control.
The ‘thought-ware functions’, i.e., the ‘natural-language stored
instructions’ are the stage directions.
These include methods and goals such as “walk to the centre of the stage
and stop, facing the audience with both arms straight forward and palms
up”. Reference to absolute position such
as centre stage can be interpreted through visual feedback. This instruction is then interpreted into a
series of movements and gestures. The
conscious sequence of detailed stylistic movements and gestures is then both
remembered and acted out, with awareness of where the actor is in the sequence
and on what needs to be done next. The
detailed movements and gestures must correspond to ‘brain-ware functions’ that
are invoked at the appropriate time in the sequence.
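A sketch of this two-level interpretation follows, with every name invented: the upper table stands for thoughtware expanding the goal-level direction into a conscious sequence of gestures, and the lower table stands for the practised brain-ware routines that each gesture invokes in order:

```python
# Toy two-level interpretation of a stage direction.  Both tables are invented:
# the upper one stands for thoughtware (natural-language instructions), the
# lower one for brain-ware (practised, automatic routines).
THOUGHTWARE = {
    "walk to the centre of the stage and stop, facing the audience":
        ["step_forward", "step_forward", "stop", "turn_to_audience", "raise_arms_palms_up"],
}
BRAINWARE = {
    "step_forward":        "invoke walking control sequence",
    "stop":                "invoke balance/stop control sequence",
    "turn_to_audience":    "invoke turning control sequence",
    "raise_arms_palms_up": "invoke practised gesture",
}

def perform(direction):
    for beat, gesture in enumerate(THOUGHTWARE[direction]):       # conscious sequence
        print(f"beat {beat}: {gesture} -> {BRAINWARE[gesture]}")  # automatic execution

perform("walk to the centre of the stage and stop, facing the audience")
```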
Let us now
focus on the time, past, present, and future, and look only at the action,
without feedback:
The ‘external world’ is represented by the video camera. At 30 fps over 5s we have recorded 150
frames.
The ‘body functions’ are represented by skeleton positions and joint
angles at the present moment. Since
there is momentum and muscles take time to react, we can assume that the next
few milliseconds of action are already on their way from the brain to the
muscles, or being processed by the muscles.
In other words, the next 10 frames or so of future action are committed
to and being processed by the body.
The ‘brain functions’ are represented by another few frames of action
being prepared by the motor centre of the brain for sending to the respective
muscles. Let us assume 10 frames prior
to those already being processed by the body.
The ‘brain-ware functions’, i.e., the ‘brain-language stored
instructions’, are busy interpreting sentences such as “walk 3 steps
forward”. We assume that this layer handles
somewhat larger time chunks, since our reaction time is not that fast if we
wanted to change direction. Let us
assume a 1 second minimum time frame, i.e., 30 frames. This presumably is prior to activating the
neurons in the motor centre.
The ‘thought-ware functions’, i.e., the ‘natural-language stored
instructions’ deal with retrieving the script from memory and possibly
interpreting the general instructions such as “walk across the stage” into more
specific instructions such as “walk one step forward, leading with the left
leg”. Assuming the actor can recall the
stage directions in sequence, there may be a chunk of which he is conscious
(e.g., 3 seconds = 90 frames), and other chunks that will become conscious
next. If the actor can remember all of
the stage directions for the remainder of the action, this takes us to the end
of the sequence. (If not, he may have to
pick up the book and read to remind himself.)
Snapshot of what we would find at the present – according to the theory,
and if we were Superman
Past: 0-149 frames – recorded in video camera as
video-frames 0-149
Present:
frame 150 – being recorded, and also reflecting the present location of bones
and joints.
Future 1:
frame 151-160 – already being processed by the body, bone momentum, muscles,
nerves
Future 2:
frame 161-170 – being processed by the brain in the motor centre
Future 3:
frame 171-200 – being processed by brainware
Future 4:
frame 201-290 – ‘consciously’ being processed by thoughtware
Future 5:
frame 291-899 – in memory, to be accessed and processed by thoughtware
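The snapshot can be reproduced by simple arithmetic on the assumptions already stated (30 fps, 5 seconds into a 30-second action, and per-layer lags of roughly 10, 10, 30, and 90 frames); the sketch below just recomputes the frame windows:

```python
# Recomputing the snapshot from the stated assumptions: 30 fps, 5 s into a 30 s
# action, and the per-layer lags assumed in the text.
FPS = 30
present = 5 * FPS                      # frame 150
script_end = 30 * FPS - 1              # frame 899, counting frames from 0

layers = [
    ("body (momentum, muscles, nerves)", 10),   # committed, already under way
    ("brain motor centre",               10),   # next chunk being prepared
    ("brainware",                        30),   # roughly 1-second chunks
    ("thoughtware (conscious chunk)",    90),   # roughly 3 seconds
]

start = present + 1
for name, width in layers:
    print(f"{name}: frames {start}-{start + width - 1}")
    start += width
print(f"memory (rest of the script): frames {start}-{script_end}")
```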
So far we
have not included any perceptual feedback.
We have assumed that the stage is clear and large enough, so that the
actor can act out the script without running into obstacles or falling off the
stage. We can add perceptual (visual)
feedback by assuming a similar progression of the perceived information through
the layers of information processing.
At the
present, frame 150 – image on the retina, reflecting the present focus of the
eyes
Past 1:
frame 145 – 149 – image processed in the brain, by the visual cortex, for
low-level feature extraction
Past 2:
frame 140 – 144 – features processed by brainware for image recognition
Past 3:
frame 130 – 139 – recognized image is named and interpreted by
thoughtware. Inferences are drawn in
language.
In the
example above, we have made the simplifying assumption that images are
processed in batches of a fraction of a second and then passed to the higher processing
stage. We have also assumed that the actor
is not turning his head and/or refocusing the eyes. Better evidence comes from tachistoscopic and
other experiments.
Let us now
add the assumption that the actor has noticed a possible obstacle (based on
Past 3) and takes a closer look. At the
same time he may plan to change course.
Let us assume that taking a closer look is automatic, not consciously
planned. We are adding parallel activity
to future time periods – future frames.
Future 1: frame 151-160
– Original walk-action already being processed by the body, bone momentum, muscles, nerves
– Eye-muscle direction being processed by brainware (frame 151 – 155)
– Eye-muscle direction being processed by brain (frame 156 – 160)
– Planning change in direction by thoughtware
Future 2: frame 161-170
– Original walk-action being processed by the brain in the motor centre
– Eye-muscle direction being processed by eye (frame 161 – 165)
– image on the retina, reflecting the focus of the eyes (frame 165)
– image processed in the brain, by the visual cortex, for low-level feature extraction (frame 166 – 170)
– Planning change in direction by thoughtware
Future 3: frame 171-200
– Original walk-action being processed by brainware
– features processed by brainware for image recognition (frame 171 – 175)
– recognized image is named and interpreted by thoughtware (frame 176 – 200)
Future 4: frame 201-290
– Original walk-action ‘consciously’ being processed by thoughtware
– Walk-action changed to integrate planned change and recognized obstacle
Future 5: frame 291-899
– in memory, to be accessed and processed by thoughtware
– may need to be edited to fit the change to avoid the recognized obstacle
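The parallel bookkeeping above can be collected into a single frame-indexed table; nothing new is computed below, and the dictionary form is simply one convenient way to hold the walking pipeline and the perceptual pipeline side by side:

```python
# The parallel future activity above, collected into one frame-indexed table.
# The windows restate the lists above; the data structure is just for illustration.
timeline = {
    (151, 160): ["walk action in body (momentum, muscles, nerves)",
                 "eye-muscle direction in brainware (151-155) and brain (156-160)",
                 "thoughtware planning the change of direction"],
    (161, 170): ["walk action in brain motor centre",
                 "eye movement executed (161-165), retinal image at 165",
                 "visual cortex feature extraction (166-170)",
                 "thoughtware planning the change of direction"],
    (171, 200): ["walk action in brainware",
                 "brainware image recognition (171-175)",
                 "thoughtware names and interprets the obstacle (176-200)"],
    (201, 290): ["walk action consciously processed by thoughtware",
                 "walk action edited to integrate the planned change and the obstacle"],
    (291, 899): ["remaining script in memory, awaiting thoughtware",
                 "may need editing to avoid the recognized obstacle"],
}

for (first, last), activities in sorted(timeline.items()):
    print(f"frames {first}-{last}: " + "; ".join(activities))
```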
At this
point we have a simulation of a very simple adaptation. We could also have a change because the stage
director speaks to the actor (at the present).
We have not modelled learning, but the change in the walking script
could be stored in memory, and thus affect future acting behaviour.