Body, brain and language
A new paradigm for language comprehension and learning
Overview
Rainer von Königslöw, Ph.D.
We start with the assumption that human language is an evolutionary trait that facilitates rapid adaptation and learning. In other words, we assume that from an evolutionary perspective, language is primarily for self-programming and only secondarily for communication. To explore how this might work, we take computers to have been built in man's image and examine how the concept of stored instructions in computers could help explain the role of language in humans.
Current research puts modern man at approximately 150 thousand years old. The general assumption seems to be that the genetic makeup and the general structures have not changed since then, presumably including the brain structures and capabilities we are discussing here. The capability for individual (solipsistic) use of language-like constructs for learning and adaptation would presumably have been there from the beginning. The beginning of spoken language communication is harder to estimate. It requires shared conventions of sound production and reference within a language community; it is a cultural property and likely came later. However, one of the main functions of language is to help in the organization of complexity, such as complex tasks and social organizations. We can therefore speculate on the size of social organizations that could not have functioned without spoken language, and through that estimate the beginnings of organized spoken communication. Self-consciousness enables self-evaluation, speeding learning and adaptation to hostile environments. Julian Jaynes (1976) suggests that consciousness and self-consciousness are fairly recent. Written language is 4 to 6 thousand years old. We conjecture that spoken and written communication help establish a sense of shared reality, and through that allow for planned activities that commit individuals to predictable future activities. A group with a shared reality can predict the consequences of coordinated activities, and thus motivate individuals in the group to cooperate to obtain the predicted benefits.
According to this theory, the main topics of language comprehension are:
* Learning
o Self-motivated learning, systematic purposive learning through experimentation
* Purpose, personal and social control
o Predicting, planning, and controlling future activities
* The appearance of shared reality
o The feeling that the other person understands what we are thinking
o A mental model of what another person thinks, feels, is likely to do
* Teamwork and coordination
o Communication and persuasion to follow a common plan
Framework for the theory
Let us decompose the problem of language comprehension by identifying boxes within boxes, each associated with topics to be explained, and relevant methods for explanation. The first type of box, the largest container, is the external world. All observable phenomena, including 'real' objects and events, as well as 'public' language objects and events are contained within this box. Bodies corresponding to individuals are separate boxes within this box. The body representing an individual has a variety of ways of acting and interacting with the external world, including behaviour and perception. We will posit the brain as a box within the body of an individual, i.e., a third type of container. These three boxes are physical and visible. The next box is functional, representing the capacity to remember. We are assuming that the memory we are concerned about is a function of the brain, i.e., that it is a box within the brain, a fourth type of container.
Let us go through the 4 boxes and briefly indicate the contents and functions:
The 'external world' contains physical and chemical events. Language is present as marks on paper or as acoustic events. Space and time are key parameters. Individuals are contained within this box, with skin as boundary. The functions are described by the usual theories in the natural and social sciences.
The 'body' of an individual contains muscles and bones etc. within skin. Perceptors such as eyes and ears belong in this box, as well as actuators such as arms and legs. The basic structure is assumed to be primarily genetically determined, but history (experience) will affect details such as the strength of muscles, etc. Language is represented as tongue and vocal cord movements, as vibrations of the tympanic membrane in the ear, or as patterns on the retinas. The functions are described by the usual theories in medical sciences and psychology.
The 'brain' of an individual contains neural cells, linked to receptors and actuators through nerve cells. The basic structure is assumed to be primarily genetically determined, but history (experience) will affect details such as acuity of hearing, etc. Language is represented as patterned neural activity in the speech cortex, or as patterns in the visual cortex, etc. Both function and localization of function are described in the relevant scientific literature.
The 'memory' function of the brain has been investigated extensively. Neural connectivity, chemistry, and other mechanisms have been proposed as physical and chemical representation of the memory function.
Let us go through the 4 boxes and briefly indicate the phenomena to be explained:
The 'external world' contains objects and events, as well as language-based objects and events that refer to them as part of communication between individuals. To make the language objects, e.g., books, useful and the communication events meaningful we have to speculate about shared reality. It is also clear that there are limits to shared reality, since not all individuals can read the same books or have meaningful conversations.
The 'body' of the individual turns the head to listen to language, mutters to itself while concentrating on doing something, writes and reads notes to itself, and talks to and listens to other individuals.
The 'brain' of the individual shows a lot of neural activity during language use, with some localization.
The 'memory' of the individual is clearly involved in storing ephemeral speech sounds, in making up complex sentences, and in recalling and using explicitly memorized sentences and phrases.
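To keep the four containers straight in what follows, here is a minimal structural sketch in Python (the class and field names, such as ExternalWorld and Memory, are purely illustrative labels for the boxes, not a claim about how any of them is implemented):

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Memory:                      # box 4: functional container within the brain
        stored_items: List[str] = field(default_factory=list)   # e.g., memorized sentences, programmes

    @dataclass
    class Brain:                       # box 3: physical container within the body
        memory: Memory = field(default_factory=Memory)
        regions: List[str] = field(default_factory=lambda: ["visual cortex", "speech cortex", "motor centre"])

    @dataclass
    class Body:                        # box 2: the individual, bounded by skin
        brain: Brain = field(default_factory=Brain)
        perceptors: List[str] = field(default_factory=lambda: ["eyes", "ears"])
        actuators: List[str] = field(default_factory=lambda: ["arms", "legs", "vocal cords"])

    @dataclass
    class ExternalWorld:               # box 1: the largest container
        bodies: List[Body] = field(default_factory=list)
        public_language_objects: List[str] = field(default_factory=list)   # books, acoustic events

    # Two individuals sharing one external world, each with a private memory.
    world = ExternalWorld(bodies=[Body(), Body()], public_language_objects=["script (book)"])
    world.bodies[0].brain.memory.stored_items.append("the word 'chair' refers to chairs")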
At this point we can explore an analogy with computers to make a conjecture about some brain functionality. Traditionally programmes in artificial intelligence have been constructed to simulate human functions. For this analysis, we will start with the general assumption that both hardware and software of computers have been constructed in the image of man, and therefore can be used as analogy in reverse.
A single computer might be analogous to an individual.
Computers coexist in an environment, and are frequently connected with networks so that they can communicate. This communication is somewhat analogous to individuals communicating in the external world.
Computers have central processing units that are roughly analogous to brains in individuals.
Computers have memories. These can be distributed across several locations, and generally are not within the central processing unit. However, they can be seen as functionally analogous to human memory.
Using this analogy we can make our first conjecture. Computers of the same general type can have very different central processing units, based on different chipsets, with different clock speeds etc., but they share the same instruction set capable of reading and interpreting stored instructions. This allows them to function equivalently. We conjecture that the human brain also has the capability of interpreting stored instructions, with the same basic instruction set. Furthermore we conjecture that the brain stores programmes in memory for this instruction set, and that it uses or 'executes' these programmes to control some of its functions.
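A minimal sketch of this first conjecture, assuming a toy instruction set invented for illustration (the opcodes 'step', 'turn', and 'speak' are not taken from the text): two 'processors' with very different clock speeds execute the same stored programme and end in the same state, because they share the instruction set. The point is only that equivalence of function rests on the shared instruction set, not on the underlying hardware.

    # A toy shared instruction set: each instruction updates a simple state.
    INSTRUCTION_SET = {
        "step":  lambda s: {**s, "position": s["position"] + 1},
        "turn":  lambda s: {**s, "facing": (s["facing"] + 90) % 360},
        "speak": lambda s: {**s, "said": s["said"] + ["chair"]},
    }

    def execute(stored_programme, clock_speed_hz):
        """Interpret stored instructions; the clock speed differs, the behaviour does not."""
        state = {"position": 0, "facing": 0, "said": []}
        for opcode in stored_programme:
            state = INSTRUCTION_SET[opcode](state)        # same instruction set on any 'hardware'
        time_taken_s = len(stored_programme) / clock_speed_hz
        return state, time_taken_s

    programme = ["step", "step", "turn", "speak"]          # stored instructions held in 'memory'
    fast_state, _ = execute(programme, clock_speed_hz=3e9)
    slow_state, _ = execute(programme, clock_speed_hz=1e6)
    assert fast_state == slow_state                        # different processors, equivalent function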
Supporting argument and evidence: if we assume that the human capacity for language reflects this basic capacity for interpreting stored instructions, then the findings of Noam Chomsky and others supporting a universal grammar provide indirect evidence that the underlying instruction set may be universal. Psychological evidence for universal developmental stages may provide further support.
The analogy and conjecture above lead to a second conjecture. The machine language (assembler) representation of programmes suitable for the instruction set of the central processing units of computers has proved to be very awkward and unintuitive to use. Higher-level languages were devised that represented the instructions in a more symbolic and easier-to-understand form. Examples of such languages are Cobol, Fortran, Lisp, Basic, C, and Prolog. The symbolic textual representations of software written in these languages are easier to generate and to modify. The text representation (source) has to be translated into machine language form by special programmes called compilers and interpreters. We can conjecture by analogy that there are equivalent higher-level symbolic representations of brain instructions for humans, that they are easier to modify, but that they have to be translated into basic universal brain instructions to be executed. Furthermore, we conjecture that natural languages such as English serve as the high-level symbolic programming languages for humans.
Support for the second conjecture is based on the experience that machine language or even assembler programmes are very difficult to maintain. It is also difficult to transfer these programmes to another person, or to work on them together as a team. For programmes written in higher-level, symbolically expressed languages it is much easier to share the code and to work together as a team. We would therefore conjecture that natural language is the symbolic representation of brain instructions. By extension, communication in natural language on tasks may be analogous to exchanging and working together on symbolically expressed software programmes.
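To make the compilation analogy concrete, here is a minimal, purely hypothetical 'compiler' that translates an easy-to-read symbolic instruction into the toy low-level opcodes sketched above. Changing the symbolic source (say, '3 steps' to '5 steps') is trivial; editing the compiled opcode list by hand would be the awkward, assembler-like alternative.

    import re

    def compile_direction(sentence):
        """Translate a symbolic, natural-language-like instruction into low-level opcodes."""
        match = re.match(r"walk (\d+) steps forward", sentence)
        if match:
            return ["step"] * int(match.group(1))
        if sentence == "turn toward the audience":
            return ["turn"]
        raise ValueError(f"cannot compile: {sentence!r}")

    source = ["walk 3 steps forward", "turn toward the audience"]   # high-level, easy to share and edit
    compiled = [op for line in source for op in compile_direction(line)]
    print(compiled)   # ['step', 'step', 'step', 'turn'] -- awkward to maintain by hand, easy to execute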
Further following our computer analogy, high-level symbolic programmes normally utilize lower-level programmes as well as hardware capabilities and functions. On the other hand, there are computer functions and capabilities that only require firmware and hardware. In looking at computer capabilities and functions, we normally look at the highest level at which these capabilities are represented. In other words, to modify a C programme, we edit the symbolically represented source code. We do not edit the compiled code or try to change CPU instructions or the computer hardware on the motherboard etc. Analogously, in looking at human capabilities and functions, we infer that it would make sense to look at the highest level of programming or representation.
We now need to look at human functions and capabilities to see how they might utilize and depend on brain programmes.
Conjecture 2a: the more universal a human behaviour, function, or capability, the more likely it is to depend primarily on genetically built-in rather than programmed functions. The corollary is that the more diverse behaviours, functions, or capabilities are, the more likely they are to be represented primarily in high-level, symbolic form.
Conjecture 2b: On computers, programmes that need very fast responses are more likely to be written in low-level assembler code or equivalent. Conversely, if speed or high performance is not an issue, the programme might be kept in a high-level language. Similar reasoning might be true for human programmes.
Conjecture 2c: On computers, programmes that must be modified often and/or extensively are usually kept in high-level symbolic form. Similar reasoning might be true for human programmes.
Conjecture 2d: Plans and descriptions that are accessible to consciousness, i.e., that can readily be described in natural language and reasoned about, are likely to be in high-level symbolic form. The same holds for actions that may be accompanied by 'talking aloud' about the steps.
Conjecture 2e: Behaviours and capabilities that cannot be brought to consciousness, and that seem very 'automatic', may be low-level brain-code programmes or 'hard-wired' brain functions. If they are universal and appear to have existed for centuries, they may be 'hard-wired'. If not, they are more likely to be low-level brain-code programmes.
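Conjectures 2a-2e can be restated as a rough decision heuristic. The sketch below only paraphrases the conjectures in executable form; the cues (universality, speed requirements, frequency of modification, accessibility to consciousness) come from the text, while the order in which they are weighed is an assumption.

    def likely_level(universal, fast_response, often_modified, conscious_access):
        """A rough restatement of conjectures 2a-2e as a decision heuristic."""
        if conscious_access or often_modified:
            return "thought-ware (high-level symbolic programme)"    # 2c, 2d
        if universal and fast_response:
            return "hard-wired brain function"                        # 2a, 2b, 2e
        if fast_response:
            return "brain-ware (low-level brain-code programme)"      # 2b, 2e
        return "brain-ware or thought-ware (these cues do not decide)"

    # Purely illustrative examples:
    print(likely_level(universal=True, fast_response=True, often_modified=False, conscious_access=False))   # e.g., swallowing
    print(likely_level(universal=False, fast_response=False, often_modified=True, conscious_access=True))   # e.g., a rehearsed script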
Further following our computer analogy, computers, when they are turned on and functioning normally, generally operate under the control of programmes. In most computers these programmes boot automatically once they have been loaded initially. Humans are never fully shut off, and any initial programme must be supplied and loaded genetically.
A third conjecture follows the conjecture on brain programming. Computers are essentially passive. One encounters them in association with operators/programmers. Humans, on the other hand, are active, and there are no obvious ways of entering programmes, especially into babies and young children. We therefore conjecture that humans are self-programming. We conjecture that normal activities such as play, conscious problem solving, and dreaming all contribute to an ongoing activity of self-programming. Handling language and handling objects in the environment may be among the early programmes to be self-constructed. Learning in school from language instruction requires these basic programmes to be in place. Learning in general suggests that the programmes are self-modifying.
Note: Some level of self-programming may exist in other species. Imprinting in chicks appears to be a time-window dependent self-programming task.
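As a sketch of what self-programming might look like (all names and the toy task are invented), the following agent constructs and revises its own stored procedures through trial-and-error 'play', rather than having programmes entered by an outside operator:

    import random

    class SelfProgrammingAgent:
        def __init__(self):
            self.procedures = {}                      # self-written programmes held in 'memory'

        def play(self, task, primitives, evaluate, attempts=20):
            """Construct or revise a procedure for a task by trying variations ('play')."""
            best = self.procedures.get(task)
            best_score = evaluate(best) if best is not None else float("-inf")
            for _ in range(attempts):
                trial = [random.choice(primitives) for _ in range(3)]
                score = evaluate(trial)
                if score > best_score:                # keep whichever variant works better
                    best, best_score = trial, score
            self.procedures[task] = best              # the agent rewrites its own programme
            return best

    # Toy task: end up at position 3 using primitive steps.
    def evaluate(steps):
        position = sum(+1 if s == "forward" else -1 for s in steps)
        return -abs(position - 3)

    agent = SelfProgrammingAgent()
    print(agent.play("reach the mark", ["forward", "back"], evaluate))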
Conjecture 3a: Some computers can boot into alternative operating systems. Humans may have an equivalent 'dual-boot' architecture, switching between waking mode and sleep mode. This daily switch in operating modes may have a variety of functions, including the 'reboot' function common to computers. The 'wake' system appears to function under thought control, the highest level of programming, while the 'sleep' system appears to function under 'brainware' control and to have maintenance functionality, including potentially cleaning up and maintaining the 'thoughtware'.
Further following our computer analogy, computers generally function in a network, in constant contact with other computers, with which they communicate and exchange data. Similarly, humans function in a social network, with constant, primarily language-based communication.
A fourth conjecture follows the conjecture on self-programming and operating modes. When humans communicate they generally believe that they can understand one another. Special language and communication processes allow humans within a language and cultural group to function with the assumption and belief in a shared reality.
Conjecture 4a: Synchronizing shared reality derives in part from shared educational and other experiences.
Conjecture 4b: Synchronizing shared reality is an active process, involving the detection of mismatches.
Conjecture 4c: There are processes for disengaging from communication when synchronization fails, such as exemplified by some discussions on religion and politics.
Conjecture 4d: Faults in synchronizing shared reality can lead to major problems and errors in the workplace.
Conjecture 4e: Part of the art of diplomacy and politics deals with imperfectly synchronized shared reality.
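Conjecture 4b can be sketched as an active comparison of the referents two individuals attach to the same words; the data are invented, and a detected mismatch is what, per conjecture 4c, might trigger disengagement or repair.

    def synchronize(lexicon_a, lexicon_b):
        """Actively compare two word-to-referent mappings and report mismatches."""
        shared, mismatches = {}, []
        for word in lexicon_a.keys() & lexicon_b.keys():
            if lexicon_a[word] == lexicon_b[word]:
                shared[word] = lexicon_a[word]          # part of the apparent shared reality
            else:
                mismatches.append(word)                 # detected mismatch (conjecture 4b)
        return shared, mismatches

    adult = {"chair": "object-17", "liberty": "concept-A"}
    child = {"chair": "object-17", "liberty": "concept-B"}
    shared, mismatches = synchronize(adult, child)
    if mismatches:                                      # conjecture 4c: disengage or repair
        print("synchronization failed for:", mismatches)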
Interaction with the external world: a layered architecture
We conjecture that 'conscious', 'planned', 'willed', or 'intentional' behaviour involves the invocation and use of functions at all four levels (in all four boxes) we have identified, as sketched after the list below.
* 'body functions' involve muscles, the skeleton, and perceptors
* 'brain functions' involve activation in some of the different brain regions
* 'brain-ware functions', or low-level programming, must be involved, more by conjecture than by direct observation
* 'thought-ware functions' are indicated through conscious reflection and by putting intent and plans into sentences.
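The following sketch (all function names invented) shows one way an intentional act might pass through the four levels in order: a thought-ware plan is expanded by brain-ware into control sequences, the brain turns these into nerve signals, and the body carries them out.

    def thoughtware(intention):
        # highest level: a natural-language plan
        return ["walk to the centre of the stage", "face the audience"]

    def brainware(plan):
        # low-level stored instructions: expand each symbolic step into control sequences
        expansion = {
            "walk to the centre of the stage": ["step", "step", "step"],
            "face the audience": ["turn"],
        }
        return [op for step in plan for op in expansion[step]]

    def brain(control_sequence):
        # motor centre: turn each control instruction into a nerve signal (here, just a string)
        return [f"nerve-signal({op})" for op in control_sequence]

    def body(nerve_signals):
        # muscles and skeleton carry out the signals
        for signal in nerve_signals:
            print("executing", signal)

    body(brain(brainware(thoughtware("perform the opening scene"))))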
Action, sensation, perception, and communication
We are particularly interested in communication. We can differentiate the following categories:
* writing, typing – producing symbolic expressions on paper, etc.
* reading
* speaking
* listening to speech
* gesturing
* watching and interpreting gestures
Simulating shared cognitive (symbolic, language) reality
In communication through natural language, there is an important implicit assumption that the words we use mean the same thing, i.e., that they refer to the same objects in the external world, the same action sequences, the same concepts, etc.
The simplest case is a type of correspondence we might get when a young child is learning the names of objects. The adult gestures by pointing to an object, e.g., a chair and says “chair”. The child perceives the adult as pointing to the chair. Both the adult and the child hear the speech sound “chair”. Then the child gestures by pointing to the same object, the chair and says “chair”. The adult sees the child pointing at the chair. Both the adult and the child hear the speech sound “chair”.
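The exchange can be written out schematically (the record structure is invented): each party registers what it perceives, and the appearance of shared reference rests on the two records involving the same object and the same speech sound.

    def exchange(speaker, listener, pointed_at, word):
        """One pointing-and-naming event, as recorded from each side."""
        speaker_view = (speaker, "points at", pointed_at, "and says", word)
        listener_view = (listener, "sees pointing at", pointed_at, "and hears", word)
        return speaker_view, listener_view

    # The adult names the object; the child then repeats the gesture and the word.
    first = exchange("adult", "child", pointed_at="the chair", word="chair")
    second = exchange("child", "adult", pointed_at="the chair", word="chair")

    # The appearance of shared reference rests on the two episodes involving
    # the same object and the same speech sound.
    print(first[0][2] == second[0][2] and first[0][4] == second[0][4])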
Learning
In general what we mean by learning is that an individual is intentionally extending functionality through structured (systematic) interaction with the environment.
Body functions: strengthen muscles through exercises, improve auditory or visual acuity with practice
Brain functions: basic Pavlovian conditioning – makes neural connections?; learn language functions with babbling?
Brainware functions: ?
Thoughtware functions: Memorize the script and stage directions for a character in a play, then act in the play and make the character believable.
Time: history, present, future
Body functions:
* Past affecting present: short term effects from fatigue with recovery to 'normal', longer term effects from practicing, but only incremental effects, a slow drift of 'normal'. There can be 'mode switching' with waking vs. sleep, or drug use. There can be maturational effects and step switches from 'trauma', with or without recovery.
* Future affecting present: there is no complex pattern stored in the body (as opposed to brain-memory) controlling future muscle or perceptor patterns.
Brain functions:
* Past affecting present: repeated pattern fatigue, sensitization, all short term except for maturational effects.
* Future affecting present: plans for the future are in memory, not likely in neural firing patterns.
Memory – brainware & thoughtware functions:
* Past, present, and future can all be represented as programs or as data – accessible as symbolic representations through language, or not accessible except possibly through dreams etc.
An illustrative example
We can work through an example, to illustrate how this layered architecture might work. We shall look at the example of an actor learning a script, both with dialogue and stage directions. We shall assume that the stage directions are much more explicit than they might usually be.
At the beginning there is the script, in written (printed) language form in a book. This is an object in the external world that the actor brings along to the stage. Let us assume that the stage is empty, that the lights are on so the actor can read the book, and that the actor goes to the middle of the stage, facing the front.
Process: the actor reads the script and acts out the part, in segments. He also memorizes the script and stage directions, all more or less simultaneously.
Process 1: the actor reads the script and understands it, both dialogue and stage-directions
'body functions' include holding the book, turning the pages, and appropriate eye movements to focus on the writing and scan (left-to-right and top to bottom for English)
'brain functions' in the visual cortex involve pattern recognition and feature extraction
'brain-ware functions', i.e., brain-language memory-stored instructions, involve letter and word recognition, as well as guiding the reading-scan of the eyes. Language recognition is another component, where the visual representation in printed sentences is converted into meaningful thought.
'thought-ware functions', i.e., natural-language memory-stored instructions, involve following the implications of what is written to what can reasonably be assumed or inferred from context or other indications. For instance, goal-specified stage directions such as “walk to the centre of the stage” have to be translated into “turn toward the centre, start walking toward …, avoid obstacle …, stop when … is reached, turn toward …”.
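That translation step might be sketched as follows; the substeps mirror the ones listed in the text, and the targets left open there (the '…' placeholders) are filled in here only with illustrative values.

    def interpret_direction(direction):
        """Expand a goal-specified stage direction into concrete substeps."""
        if direction == "walk to the centre of the stage":
            return [
                "turn toward the centre",
                "start walking toward the centre",
                "avoid any obstacle on the way",        # inferred from context, not stated in the script
                "stop when the centre is reached",
                "turn toward the audience",             # illustrative target; the text leaves it open
            ]
        return [direction]                              # directions that need no expansion

    for substep in interpret_direction("walk to the centre of the stage"):
        print(substep)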
Process 2: the actor acts out the part
'body functions' involve the walking motions as well as any hand or body gestures that are indicated.
'brain functions' involve the motor controls involved in walking, balancing and maintaining a posture, etc.
'brain-ware functions', i.e., the brain-language memory-stored instructions, involve learnt movement patterns that go beyond 'native' movements, such as gestures rehearsed until they run more or less automatically.
'thought-ware functions', i.e., natural-language memory-stored instructions, involve recalling the stage directions and invoking the appropriate movements and gestures at the appropriate points in the sequence.
Process 3: the actor speaks out the part
'body functions': The mouth, tongue etc. may be involved if some speech has to be delivered during the motion.
'brain functions' involve the speech controls for the vocal cords, throat, mouth, tongue, etc.
'brain-ware functions', i.e., the brain-language memory-stored instructions, involved in translating a 'thought-sentence' into a 'speech-sentence' along with appropriate intonation, projection, enunciation, etc.
'thought-ware functions', i.e., natural-language memory-stored instructions, involved in framing a 'thought-sentence' along with the attitude to be conveyed, dialect to be used, etc.
Process 4: the actor pays attention to visual and auditory clues
'body functions': Eyes may be involved to get visual clues for orientation. Ears may be involved to synchronize movements with music or dialogue coming from other actors.
'brain functions' involve the visual and/or auditory cortex, etc.
'brain-ware functions', i.e., the brain-language memory-stored instructions, used to interpret the sights and sounds. Some filtering may be applied at this level.
'thought-ware functions', i.e., natural-language memory-stored instructions, involved in interpreting the clues for relevance. Unanticipated clues also have to be processed, some of which can change the context, such as if someone yells “Fire!”.
Process 5: the actor memorizes the script and stage directions
'body functions' are unlikely to be memorized directly, at least for a play.
'brain functions', outside of stored instructions, are unlikely to be memorized independently
'brain-ware functions', i.e., the brain-language memory-stored instructions, deal with basic capabilities such as language processing. They are unlikely to be changed for highly intentional and fairly short-term activity such as acting in a play.
'thought-ware functions', i.e., natural-language memory-stored instructions, are likely to provide the memory capacity for intentional activity such as pretending to be someone else, going through a fairly fixed sequence of behaviour in a fixed context, in a simulated reality shared during the performance.
How would it work: interactive game simulations
Interactive multi-player video-games played on PCs over the internet provide a good context in which to explore this theory. We can simulate the play-acting scenario described above. The players are at remote locations, but they share a virtual reality. The players are represented in this virtual reality, and can see other players.
Let us start with a single player with a script as described above. This essentially reduces to animation. Let us assume the player has memorized the script, and is acting it out. Let us further assume that a video camera is running in the external environment, recording the action. Let us assume that the action corresponding to the script takes 30 seconds, and that the video camera is running at 30 frames per second. Let us assume that we are 5 seconds into the action. Let us start by focussing on action in space:
Space
In the 'external world', we can describe the location of the actor in terms of spatial dimensions relative to the stage, just like in high school physics. In theory, and with endless patience, we could record the action (both the movements and the gestures) in terms of the location of the parts of the body in three dimensional space over time. We record the outside of the body, i.e., the skin and the clothing. This is the record of the action as captured on video, especially if we had several video cameras running from different angles.
For the 'body functions', the action is represented by the skeleton, joints, and muscles. The actor can flex muscles to move bones or create a facial expression to represent an emotion. The body provides some feedback on this muscle activity, but it does not have built-in GPS (global positioning system). The bones and the skin can be bent and flexed, but the body cannot move them to an absolute spatial location without feedback from the eyes or the stage director. Motion of skin and bones is relative to other bones, and only very indirectly relative to the stage.
The 'brain functions' control the body motions and positions and receive feedback through electrical control signals (nerve signals). This is somewhat like a process control system in a highly automated factory. We imagine that there is a set of built-in (genetically set) universal control sequences, such as for walking, grasping, swallowing, talking. Each of the control sequences has relevant feedback signals to control the trajectories and the smoothness of the action. We assume these to be relatively universal, with slight genetic variations.
The 'brain-ware functions', i.e., the 'brain-language stored instructions', are conjectured to extend the brain functions to learnt movements that go beyond 'native' movements, e.g., riding (and balancing on) a bicycle. Another example might be a set of gestures learnt in acting school, and remembered through much practice till they are more or less automatic and not open to introspection or conscious control.
The 'thought-ware functions', i.e., the 'natural-language stored instructions', are the stage directions. These include methods and goals such as “walk to the centre of the stage and stop, facing the audience with both arms straight forward and palms up”. Reference to absolute position such as centre stage can be interpreted through visual feedback. This instruction is then interpreted into a series of movements and gestures. The conscious sequence of detailed stylistic movements and gestures is then both remembered and acted out, with awareness of where the actor is in the sequence and of what needs to be done next. The detailed movements and gestures must correspond to 'brain-ware functions' that are invoked at the appropriate time in the sequence.
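The process-control comparison made above for the 'brain functions' can be sketched as a simple feedback loop (gain, noise level, and step count are arbitrary): the control sequence repeatedly compares where the limb is with where it should be and issues a correcting signal, which keeps the trajectory smooth without any absolute positioning.

    import random

    def feedback_control(target, position=0.0, gain=0.5, steps=20):
        """A toy proportional controller: approach the target using feedback alone."""
        trajectory = [position]
        for _ in range(steps):
            error = target - position                      # feedback: where we are vs. where we should be
            command = gain * error                         # control signal sent to the 'muscles'
            position += command + random.gauss(0, 0.02)    # execution is slightly noisy
            trajectory.append(position)
        return trajectory

    print(round(feedback_control(target=1.0)[-1], 2))      # ends close to the target despite the noise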
Time
Let us now focus on the time, past, present, and future, and look only at the action, without feedback:
The 'external world' is represented by the video camera. At 30 fps over 5s we have recorded 150 frames.
The 'body functions' are represented by skeleton positions and joint angles at the present moment. Since there is momentum and muscles take time to react, we can assume that the next few milliseconds of action are already on their way from the brain to the muscles, or being processed by the muscles. In other words, the next 10 frames or so of future action are committed to and being processed by the body.
The 'brain functions' are represented by another few frames of action being prepared by the motor centre of the brain for sending to the respective muscles. Let us assume 10 frames prior to those already being processed by the body.
The 'brain-ware functions', i.e., the 'brain-language stored instructions', are busy interpreting sentences such as “walk 3 steps forward”. We assume that they handle somewhat larger time chunks, since our reaction time is not that fast if we wanted to change direction. Let us assume a 1 second minimum time frame, i.e., 30 frames. This presumably is prior to activating the neurons in the motor centre.
The 'thought-ware functions', i.e., the 'natural-language stored instructions', deal with retrieving the script from memory and possibly interpreting the general instructions such as “walk across the stage” into more specific instructions such as “walk one step forward, leading with the left leg”. Assuming the actor can recall the stage directions in sequence, there may be a chunk of which he is conscious (e.g., 3 seconds = 90 frames), and other chunks that will become conscious next. If the actor can remember all of the stage directions for the remainder of the action, this takes us to the end of the sequence. (If not, he may have to pick up the book and read to remind himself.)
Snapshot of what we would find at the present – according to the theory, and if we were Superman
Past: 0-149 frames – recorded in video camera as video-frames 0-149
Present: frame 150 – being recorded, and also reflecting the present location of bones and joints.
Future 1: frame 151-160 – already being processed by the body, bone momentum, muscles, nerves
Future 2: frame 161-170 – being processed by the brain in the motor centre
Future 3: frame 171-200 – being processed by brainware
Future 4: frame 201-290 – 'consciously' being processed by thoughtware
Future 5: frame 291-899 – in memory, to be accessed and processed by thoughtware
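The snapshot follows directly from the assumed per-layer latencies, and can be recomputed from the stated numbers (30 fps, 5 seconds elapsed, 10 future frames in the body, 10 in the brain, 30 in brainware, 90 consciously in thoughtware, the remainder in memory); the figures are the text's assumptions, not measurements.

    FPS = 30
    ELAPSED_S = 5
    TOTAL_FRAMES = 30 * FPS                       # the 30-second action: frames 0-899

    # Assumed number of future frames 'in flight' at each layer, taken from the text.
    LATENCIES = [("body", 10), ("brain (motor centre)", 10),
                 ("brainware", 30), ("thoughtware, conscious", 90)]

    present = ELAPSED_S * FPS                     # frame 150
    print(f"past: frames 0-{present - 1}; present: frame {present}")

    start = present + 1
    for layer, frames in LATENCIES:
        end = start + frames - 1
        print(f"{layer}: frames {start}-{end}")
        start = end + 1
    print(f"memory, awaiting thoughtware: frames {start}-{TOTAL_FRAMES - 1}")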
So far we have not included any perceptual feedback. We have assumed that the stage is clear and large enough, so that the actor can act out the script without running into obstacles or falling off the stage. We can add perceptual (visual) feedback by assuming a similar progression of the perceived information through the layers of information processing.
At the present, frame 150 – image on the retina, reflecting the present focus of the eyes
Past 1: frame 145 – 149 – image processed in the brain, by the visual cortex, for low-level feature extraction
Past 2: frame 140 – 144 – features processed by brainware for image recognition
Past 3: frame 130 – 139 – recognized image is named and interpreted by thoughtware. Inferences are drawn in language.
In the example above, we have made the simplifying assumption that images are processed in 1/3 second batches and then passed to the higher processing stage. We have also assumed that the actor is not turning his head and/or refocusing the eyes. Better evidence comes from tachistoscopic and other experiments.
Let us now add the assumption that the actor has noticed a possible obstacle (based on Past 3) and takes a closer look. At the same time he may plan to change course. Let us assume that taking a closer look is automatic, not consciously planned. We are adding parallel activity to future time periods – future frames.
Future 1: frame 151-160
– Original walk-action already being processed by the body, bone momentum, muscles, nerves
– Eye-muscle direction being processed by brainware (frame 151 – 155)
– Eye-muscle direction being processed by brain (frame 156 – 160)
– Planning change in direction by thoughtware
Future 2: frame 161-170
– Original walk-action being processed by the brain in the motor centre
– Eye-muscle direction being processed by eye (frame 161 – 165)
– image on the retina, reflecting the focus of the eyes (frame 165)
– image processed in the brain, by the visual cortex, for low-level feature extraction (frame 166 – 170)
– Planning change in direction by thoughtware
Future 3: frame 171-200
– Original walk-action being processed by brainware
– features processed by brainware for image recognition (frame 171 – 175)
– recognized image is named and interpreted by thoughtware (frame 176 – 200)
Future 4: frame 201-290
– Original walk-action 'consciously' being processed by thoughtware
– Walk-action changed to integrate planned change and recognized obstacle
Future 5: frame 291-899
– in memory, to be accessed and processed by thoughtware
– may need to be edited to fit change to avoid recognized obstacle
At this point we have a simulation of a very simple adaptation. We could also have a change because the stage director speaks to the actor (at the present). We have not modelled learning, but the change in the walking script could be stored in memory, and thus affect future acting behaviour.
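The adaptation can be sketched end to end with invented data: the obstacle reaches thoughtware a fraction of a second after it appears on the retina, and only the part of the walking script that has not yet been committed to the lower layers gets edited.

    # The remaining stage directions held in memory, as (starting_frame, direction).
    script = [(291, "walk straight toward the far wing"),
              (400, "stop and face the audience")]

    COMMITTED_UNTIL = 290            # frames already committed to body, brain, and brainware (see snapshot)

    def adapt(script, obstacle_seen_at_frame, recognition_delay=20):
        """Edit only the not-yet-committed part of the plan once the obstacle reaches thoughtware."""
        recognized_at = obstacle_seen_at_frame + recognition_delay    # delay through the perception pipeline
        adapted = []
        for start_frame, direction in script:
            if start_frame > max(recognized_at, COMMITTED_UNTIL) and direction.startswith("walk straight"):
                direction = "walk around the obstacle, then continue toward the far wing"
            adapted.append((start_frame, direction))
        return adapted

    print(adapt(script, obstacle_seen_at_frame=150))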