

The formal system of this Chapter is called the pq-system. It is not important to mathematicians or logicians-in fact, it is just a simple invention of mine. Its importance lies only in the fact that it provides an excellent example of many ideas that play a large role in this book. There are three distinct symbols of the pq-system: 

p q -

-the letters p, q, and a hyphen


The pq-system has an infinite number of axioms. Since we can't write them all down, we have to have some other way of describing what they are. Actually, we want more than just a description of the axioms; we want a way to tell whether some given string is an axiom or not. A mere description of axioms might characterize them fully and yet weakly-which was the problem with the way theorems in the MIU-system were characterized. We don't want to have to struggle for an indeterminate, possibly infinite, length of time, just to find out if some string is an axiom or not. Therefore, we will define axioms in such a way that there is an obvious decision procedure for axiomhood of a string composed of p's, q's, and hyphens.

DEFINITION: xp-qx- is an axiom, whenever x is composed of hyphens only.

Note that 'x' must stand for the same string of hyphens in both occurrences. For example, --p-q--- is an axiom. The literal expression 'xp-qx-' is not an axiom, of course (because 'x' does not belong to the pq-system); it is more like a mold in which all axioms are cast-and it is called an axiom schema.
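Axiomhood really is decidable at a glance, as promised. Here is a minimal sketch of the decision procedure in Python (my rendering, not the book's; the function name is made up):

```python
import re

def is_axiom(s: str) -> bool:
    """Recognize strings cast from the schema xp-qx-,
    where x is a nonempty hyphen-string."""
    m = re.fullmatch(r"(-+)p-q(-+)", s)
    # The group after 'q' must be x followed by exactly one extra hyphen.
    return bool(m) and m.group(2) == m.group(1) + "-"
```

For example, `is_axiom("--p-q---")` is true, while the literal string `"xp-qx-"` is rejected, since 'x' is not a symbol of the system.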


The pq-system has only one rule of production: 

RULE: Suppose x, y, and z all stand for particular strings containing only hyphens. And suppose that xpyqz is known to be a theorem. Then xpy-qz- is a theorem. 

For example, take x to be '--', y to be '---', and z to be '-'. The rule tells us:

If --p---q- turns out to be a theorem, then so will --p----q--.

As is typical of rules of production, the statement establishes a causal connection between the theoremhood of two strings, but without asserting theoremhood for either one on its own. A most useful exercise for you is to find a decision procedure for the theorems of the pq-system. It is not hard; if you play around for a while you will probably pick it up. Try it.


I presume you have tried it. First of all, though it may seem too obvious to mention, I would like to point out that every theorem of the pq-system has three separate groups of hyphens, and the separating elements are one p and one q, in that order. (This can be shown by an argument based on "heredity", just the way one could show that all MIU-system theorems had to begin with M.) This means that we can rule out, from its form alone, a string such as --p--p--p--q--------.


Now, stressing the phrase "from its form alone" may seem silly; what else is there to a string except its form? What else could possibly play a role in determining its properties? Clearly nothing could. But bear this in mind as the discussion of formal systems goes on; the notion of "form" will start to get rather more complicated and abstract, and we will have to think more about the meaning of the word "form". In any case, let us give the name well-formed string to any string which begins with a hyphen-group, then has one p, then has a second hyphen-group, then a q, and then a final hyphen-group.


Back to the decision procedure. The criterion for theoremhood is that the first two hyphen-groups should add up, in length, to the third hyphen-group. For instance, --p--q---- is a theorem, since 2 plus 2 equals 4, whereas --p--q- is not, since 2 plus 2 is not 1. To see why this is the proper criterion, look first at the axiom schema. Obviously, it only manufactures axioms which satisfy the addition criterion. Second, look at the rule of production. If the first string satisfies the addition criterion, so must the second one-and conversely, if the first string does not satisfy the addition criterion, then neither does the second string. The rule makes the addition criterion into a hereditary property of theorems: any theorem passes the property on to its offspring. This shows why the addition criterion is correct.
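The addition criterion is itself a complete decision procedure, and it can be sketched in a few lines (an illustrative rendering; the function name is mine):

```python
import re

def is_theorem(s: str) -> bool:
    """Addition criterion: the first two hyphen-groups must add up,
    in length, to the third."""
    m = re.fullmatch(r"(-+)p(-+)q(-+)", s)
    if not m:
        return False          # not even a well-formed string
    x, y, z = (len(g) for g in m.groups())
    return x + y == z
```

Note that ill-formed strings, such as one containing two p's, fail the regular-expression match and are rejected outright, "from their form alone".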


There is, incidentally, a fact about the pq-system which would enable us to say with confidence that it has a decision procedure, even before finding the addition criterion. That fact is that the pq-system is not complicated by the opposing currents of lengthening and shortening rules; it has only lengthening rules. Any formal system which tells you how to make longer theorems from shorter ones, but never the reverse, has got to have a decision procedure for its theorems. For suppose you are given a string. First check whether it’s an axiom or not (I am assuming that there is a decision procedure for axiomhood-otherwise, things are hopeless). If it is an axiom, then it is by definition a theorem, and the test is over. So suppose instead that it’s not an axiom. Then, to be a theorem, it must have come from a shorter string, via one of the rules. By going over the various rules one by one, you can pinpoint not only the rules that could conceivably produce that string, but also exactly which shorter strings could be its forebears on the “family tree”. In this way, you “reduce” the problem to determining whether any of several new but shorter strings is a theorem. Each of them can in turn be subjected to the same test. The worst that can happen is a proliferation of more and more, but shorter and shorter, strings to test. As you continue inching your way backwards in this fashion, you must be getting closer to the source of all theorems-the axiom schemata. You just can’t get shorter and shorter indefinitely; therefore, eventually either you will find that one of your short strings is an axiom, or you’ll come to a point where you’re stuck, in that none of your short strings is an axiom, and none of them can be further shortened by running some rule or other backwards. 
This points out that there really is not much deep interest in formal systems with lengthening rules only; it is the interplay of lengthening and shortening rules that gives formal systems a certain fascination.
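The backward-reduction argument above can be traced in a short sketch (my rendering, assuming the recognizer names below; for the pq-system the "family tree" never branches, since each string has at most one possible forebear under the single rule):

```python
import re

def is_axiom(s: str) -> bool:
    # Schema: xp-qx-, with x a nonempty hyphen-string.
    m = re.fullmatch(r"(-+)p-q(-+)", s)
    return bool(m) and m.group(2) == m.group(1) + "-"

def is_theorem_topdown(s: str) -> bool:
    """Run the lengthening rule backwards until an axiom
    (or a dead end) is reached."""
    if is_axiom(s):
        return True
    m = re.fullmatch(r"(-+)p(-+)q(-+)", s)
    if not m:
        return False                      # ill-formed: no forebears at all
    x, y, z = m.groups()
    if len(y) < 2 or len(z) < 2:
        return False                      # too short to have a forebear
    # The only possible forebear under the rule xpyqz -> xpy-qz-:
    return is_theorem_topdown(x + "p" + y[:-1] + "q" + z[:-1])
```

The recursion must terminate, because each step strictly shortens the string: exactly the "you just can't get shorter and shorter indefinitely" argument.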


The method above might be called a top-down decision procedure, to be contrasted with a bottom-up decision procedure, which I give now. It is very reminiscent of the genie’s systematic theorem-generating method for the MIU-system, but is complicated by the presence of an axiom schema. We are going to form a “bucket” into which we throw theorems as they are generated. Here is how it is done:


(1a) Throw the simplest possible axiom (-p-q--) into the bucket.
(1b) Apply the rule of inference to the item in the bucket, and put the result into the bucket.
(2a) Throw the second-simplest axiom into the bucket.
(2b) Apply the rule to each item in the bucket, and throw all results into the bucket. 
(3a) Throw the third-simplest axiom into the bucket. 
(3b) Apply the rule to each item in the bucket, and throw all results into the bucket. 


A moment's reflection will show that you can't fail to produce every theorem of the pq-system this way. Moreover, the bucket is getting filled with longer and longer theorems, as time goes on. It is again a consequence of the lack of shortening rules. So if you have a particular string, such as --p---q----, which you want to test for theoremhood, just follow the numbered steps, checking all the while for the string in question. If it turns up-theorem! If at some point everything that goes into the bucket is longer than the string in question, forget it-it is not a theorem. This decision procedure is bottom-up because it is working its way up from the basics, which is to say the axioms. The previous decision procedure is top-down because it does precisely the reverse: it works its way back down towards the basics.
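The bucket procedure lends itself to a short sketch as well (a hypothetical rendering, not from the text; duplicates may land in the bucket more than once, which is harmless):

```python
def bucket_after(n_stages: int) -> list:
    """Bottom-up 'bucket': at stage k, throw in the k-th simplest
    axiom, then apply the rule to every item in the bucket."""
    bucket = []
    for k in range(1, n_stages + 1):
        x = "-" * k
        bucket.append(x + "p-q" + x + "-")        # step (ka): k-th simplest axiom
        results = []
        for t in bucket:                          # step (kb): rule xpyqz -> xpy-qz-
            left, rest = t.split("p")
            mid, right = rest.split("q")
            results.append(left + "p" + mid + "-q" + right + "-")
        bucket.extend(results)
    return bucket
```

To test a string, run stages until everything newly thrown in is longer than the string in question; if it has not appeared by then, it never will.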


Now we come to a central issue of this Chapter-indeed of the book. Perhaps you have already thought to yourself that the pq-theorems are like additions. The string --p---q----- is a theorem because 2 plus 3 equals 5. It could even occur to you that the theorem --p---q----- is a statement, written in an odd notation, whose meaning is that 2 plus 3 is 5. Is this a reasonable way to look at things? Well, I deliberately chose 'p' to remind you of 'plus', and 'q' to remind you of 'equals'. So, does the string --p---q----- actually mean "2 plus 3 equals 5"?


In this case, we have an excellent prototype for the concept of isomorphism. 
There is a “lower level” of our isomorphism-that is, a mapping between the parts of the two structures:

p ⇐⇒ plus
q ⇐⇒ equals
- ⇐⇒ one
-- ⇐⇒ two
--- ⇐⇒ three
etc.

This symbol-word correspondence has a name: interpretation.
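Under this interpretation, translating a well-formed string into its arithmetic reading is purely mechanical. A small illustrative sketch (the function name is mine):

```python
import re

def interpret(s: str) -> str:
    """Read a well-formed pq-string under the interpretation
    p <=> 'plus', q <=> 'equals', hyphen-string <=> its length."""
    m = re.fullmatch(r"(-+)p(-+)q(-+)", s)
    if m is None:
        raise ValueError("not a well-formed pq-string")
    a, b, c = (len(g) for g in m.groups())
    return f"{a} plus {b} equals {c}"
```

Notice that interpretation says nothing about truth: `interpret` happily reads out "2 plus 2 equals 1" for the nontheorem --p--q-.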



It follows from the above result that:

There exist formal systems for which there is no typographical decision procedure.

How does this follow? Very simply. A typographical decision procedure is a method which tells theorems from nontheorems. The existence of such a test allows us to generate all nontheorems systematically, simply by going down a list of all strings and performing the test on them one at a time, discarding ill-formed strings and theorems along the way. This amounts to a typographical method for generating the set of nontheorems. But according to the earlier statement (which we here accept on faith), for some systems this is not possible. So we must conclude that typographical decision procedures do not exist for all formal systems.


Suppose we found a set F of natural numbers ('F' for 'Figure') which we could generate in some formal way-like the composite numbers. Suppose its complement is the set G (for 'Ground')-like the primes. Together F and G make up all the natural numbers, and we know a rule for making all the numbers in set F, but we know no such rule for making all the numbers in set G. It is important to understand that if the members of F were always generated in order of increasing size, then we could always characterize G. The problem is that many r.e. sets are generated by methods which throw in elements in an arbitrary order, so you never know if a number which has been skipped over for a long time will get included if you just wait a little longer.


We answered no to the artistic question, "Are all figures recursive?" We have now seen that we must likewise answer no to the analogous question in mathematics: "Are all sets recursive?" With this perspective, let us now come back to the elusive word "form". Let us take our figure-set F and our ground-set G again. We can agree that all the numbers in set F have some common "form"-but can the same be said about numbers in set G? It is a strange question. When we are dealing with an infinite set to start with-the natural numbers-the holes created by removing some subset may be very hard to define in any explicit way. And so it may be that they are not connected by any common attribute or "form". In the last analysis, it is a matter of taste whether you want to use the word "form"-but just thinking about it is provocative. Perhaps it is best not to define "form", but to leave it with some intuitive fluidity.

Here is a puzzle to think about in connection with the above matter. Can you characterize the following set of integers (or its negative space)?

1  3  7  12  18  26  35  45  56  69 ...


Finally, what about a formal system for generating primes? How is it done? The trick is to skip right over multiplication, and to go directly to nondivisibility as the thing to represent positively. Here are an axiom schema and a rule for producing theorems which represent the notion that one number does not divide (DND) another number exactly:

AXIOM SCHEMA: xyDNDx, where x and y are hyphen-strings.

For example, ----DND--, where x has been replaced by '--' and y by '--'.


RULE: If xDNDy is a theorem, then so is xDNDxy.

If you use the rule twice, you can generate this theorem:

-----DND------------

which is interpreted as "5 does not divide 12". But ---DND------------ is not a theorem. What goes wrong if you try to produce it?
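Reasoning from the schema and the rule (my inference, not a claim made explicitly in the text): the axioms give (x+y)DNDx, which always leaves a nonzero remainder, and the rule adds the first number to the second, which preserves that remainder. So a string of a hyphens, then DND, then b hyphens should be derivable exactly when a does not divide b. A sketch:

```python
import re

def is_dnd_theorem(s: str) -> bool:
    """aDNDb (in hyphen notation) is derivable exactly when
    a does not divide b: axioms give (x+y)DNDx, and the rule
    xDNDy -> xDND(x+y) preserves the remainder of b mod a."""
    m = re.fullmatch(r"(-+)DND(-+)", s)
    if not m:
        return False
    a, b = len(m.group(1)), len(m.group(2))
    return b % a != 0
```

This answers the question above: ---DND------------ cannot be produced, because 3 does divide 12, so every attempted derivation chain misses it.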


Now in order to determine that a given number is prime, we have to build up some knowledge about its nondivisibility properties. In particular, we want to know that it is not divisible by 2 or 3 or 4, etc., all the way up to 1 less than the number itself. But we can't be so vague in formal systems as to say "et cetera". We must spell things out. We would like to have a way of saying, in the language of the system, "the number Z is divisor-free up to X", meaning that no number between 2 and X divides Z. This can be done, but there is a trick to it. Think about it if you want.

Here is the solution:

RULE: If --DNDz is a theorem, so is zDF--.
RULE: If zDFx is a theorem and also x-DNDz is a theorem, then zDFx- is a theorem.

These two rules capture the notion of divisor-freeness. All we need to do is to say that primes are numbers which are divisor-free up to 1 less than themselves:

RULE: If z-DFz is a theorem, then Pz- is a theorem.

Oh-let's not forget that 2 is prime!

AXIOM: P--
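The rules can be traced in ordinary arithmetic. The sketch below (my rendering, with a made-up function name) mirrors the DF-chain exactly: it starts from zDF-- and extends one hyphen at a time, just as the rules allow:

```python
def derives_P(n: int) -> bool:
    """Trace the rules to see whether P followed by n hyphens
    is derivable; the chain succeeds exactly when n is prime."""
    if n == 2:
        return True                  # AXIOM: P--
    if n < 2:
        return False                 # no rule yields such a string
    if n % 2 == 0:
        return False                 # --DNDn fails, so no nDF-- to start from
    x = 2                            # we now hold the theorem nDFx
    while x + 1 < n:
        if n % (x + 1) == 0:
            return False             # (x+1)DNDn is not a theorem: chain stops
        x += 1                       # the second DF-rule extends nDFx to nDFx-
    return True                      # nDF(n-1) reached, so Pn follows
```

The loop only ever moves upward, testing divisibility by 2, then 3, and so on: the "monotonicity" the next paragraph describes.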


And there you have it. The principle of representing primality formally is that there is a test for divisibility which can be done without any backtracking. You march steadily upward, testing first for divisibility by 2, then by 3, and so on. It is this "monotonicity" or unidirectionality-this absence of cross-play between lengthening and shortening, increasing and decreasing-that allows primality to be captured. And it is this potential complexity of formal systems to involve arbitrary amounts of backwards-forwards interference that is responsible for such limitative results as Gödel's Theorem, Turing's Halting Problem, and the fact that not all recursively enumerable sets are recursive.


FIGURE 19. The last page of Bach's Art of the Fugue. In the original manuscript, in the handwriting of Bach's son Carl Philipp Emanuel, is written: "N.B. In the course of this fugue, at the point where the name B.A.C.H. was brought in as countersubject, the composer died." (B-A-C-H in box.) I have let this final page of Bach's last fugue serve as an epitaph. [Music printed by Donald Byrd's program "SMUT", developed at Indiana University.]


Achilles: Look at the inscription on the inside-do you see where the letters 'B', 'A', 'C', 'H' have been etched?

Tortoise: Sure enough! What an extraordinary thing. (Gently sets Goblet G down on a shelf.) By the way, did you know that each of the four letters in Bach's name is the name of a musical note?

Achilles: 'Tisn't possible, is it? After all, musical notes only go from 'A' through 'G'.

Tortoise: Just so; in most countries, that's the case. But in Germany, Bach's own homeland, the convention has always been similar, except that what we call 'B', they call 'H', and what we call 'B-flat', they call 'B'. For instance, we talk about Bach's "Mass in B Minor", whereas they talk about his "H-moll Messe". Is that clear?

Achilles: ... hmm ... I guess so. It's a little confusing: H is B, and B is B-flat. I suppose his name actually constitutes a melody, then.

Tortoise: Strange but true. In fact, he worked that melody subtly into one of his most elaborate musical pieces-namely, the final Contrapunctus in his Art of the Fugue. It was the last fugue Bach ever wrote. When I heard it for the first time, I had no idea how it would end. Suddenly, without warning, it broke off. And then ... dead silence. I realized immediately that was where Bach died. It is an indescribably sad moment, and the effect it had on me was-shattering. In any case, B-A-C-H is the last theme of that fugue. It is hidden inside the piece. Bach didn't point it out explicitly, but if you know about it, you can find it without much trouble. Ah, me-there are so many clever ways of hiding things in music ...


Achilles: ... or in poems. Poets used to do very similar things, you know (though it's rather out of style these days). For instance, Lewis Carroll often hid words and names in the first letters (or characters) of the successive lines in poems he wrote. Poems which conceal messages that way are called "acrostics".

Tortoise: Bach, too, occasionally wrote acrostics, which isn't surprising. After all, counterpoint and acrostics, with their levels of hidden meaning, have quite a bit in common. Most acrostics, however, have only one hidden level-but there is no reason that one couldn't make a double-decker-an acrostic on top of an acrostic. Or one could make a "contracrostic"-where the initial letters, taken in reverse order, form a message. Heavens! There's no end to the possibilities inherent in the form. Moreover, it's not limited to poets; anyone could write acrostics-even a dialogician.

Achilles: A dial-a-logician? That's a new one on me.

Tortoise: Correction: I said "dialogician", by which I meant a writer of dialogues. Hmm ... something just occurred to me. In the unlikely event that a dialogician should write a contrapuntal acrostic in homage to J. S. Bach, do you suppose it would be more proper for him to acrostically embed his OWN name-or that of Bach? Oh, well, why worry about such frivolous matters? Anybody who wanted to write such a piece could make up his own mind. Now getting back to Bach's melodic name, did you know that the melody B-A-C-H, if played upside down and backwards, is exactly the same as the original?

Achilles: How can anything be played upside down? Backwards, I can see-you get H-C-A-B-but upside down? You must be pulling my leg.

Tortoise: 'Pon my word, you're quite a skeptic, aren't you? Well, I guess I'll have to give you a demonstration. Let me just go and fetch my fiddle- (Walks into the next room, and returns in a jiffy with an ancient-looking violin.) -and play it for you forwards and backwards and every which way. Let's see, now ... (Places his copy of the Art of the Fugue on his music stand and opens it to the last page.) ... here's the last Contrapunctus, and here's the last theme ...


Chapter IV

Consistency, Completeness, and Geometry


IN CHAPTER II, we saw how meaning-at least in the relatively simple context of formal 
systems-arises when there is an isomorphism between rule-governed symbols, and things 
in the real world. The more complex the isomorphism, in general, the more “equipment”-
both hardware and software-is required to extract the meaning from the symbols. If an 
isomorphism is very simple (or very familiar), we are tempted to say that the meaning 
which it allows us to see is explicit. We see the meaning without seeing the isomorphism. 
The most blatant example is human language, where people often attribute meaning to 
words in themselves, without being in the slightest aware of the very complex 
“isomorphism” that imbues them with meanings. This is an easy enough error to make. It 
attributes all the meaning to the object (the word), rather than to the link between that object and the real world. You might compare it to the naive belief that noise is a
necessary side effect of any collision of two objects. This is a false belief; if two objects 
collide in a vacuum, there will be no noise at all. Here again, the error stems from 
attributing the noise exclusively to the collision, and not recognizing the role of the medium, which carries it from the objects to the ear.


Above, I used the word “isomorphism” in quotes to indicate that it must be taken 
with a grain of salt. The symbolic processes which underlie the understanding of human 
language are so much more complex than the symbolic processes in typical formal 
systems, that, if we want to continue thinking of meaning as mediated by isomorphisms, 
we shall have to adopt a far more flexible conception of what isomorphisms can be than 
we have up till now. In my opinion, in fact, the key element in answering the question 
“What is consciousness?” will be the unraveling of the nature of the “isomorphism” 
which underlies meaning.


All this is by way of preparation for a discussion of the Contracrostipunctus-a study in levels of meaning. The Dialogue has both explicit and implicit meanings. Its most explicit meaning is simply the story which was related. This "explicit" meaning is, strictly speaking, extremely implicit, in the sense that the brain processes required to understand the events in the story, given only the black marks on paper, are incredibly complex. Nevertheless, we shall consider the events in the story to be the explicit meaning of the Dialogue, and assume that every reader of English uses more or less the same "isomorphism" in sucking that meaning from the marks on the paper.


Even so, I'd like to be a little more explicit about the explicit meaning of the story. First I'll talk about the record players and the records. The main point is that there are two levels of meaning for the grooves in the records. Level One is that of music. Now what is "music"-a sequence of vibrations in the air, or a succession of emotional responses in a brain? It is both. But before there can be emotional responses, there have to be vibrations. Now the vibrations get "pulled" out of the grooves by a record player, a relatively straightforward device; in fact you can do it with a pin, just pulling it down the grooves. After this stage, the ear converts the vibrations into firings of auditory neurons in the brain. Then ensue a number of stages in the brain, which gradually transform the linear sequence of vibrations into a complex pattern of interacting emotional responses-far too complex for us to go into here, much though I would like to. Let us therefore content ourselves with thinking of the sounds in the air as the "Level One" meaning of the grooves.


What is the Level Two meaning of the grooves? It is the sequence of vibrations induced in the record player. This meaning can only arise after the Level One meaning has been pulled out of the grooves, since the vibrations in the air cause the vibrations in the phonograph. Therefore, the Level Two meaning depends upon a chain of two isomorphisms:

(1) Isomorphism between arbitrary groove patterns and air vibrations;
(2) Isomorphism between arbitrary air vibrations and phonograph vibrations.

This chain of two isomorphisms is depicted in Figure 20. Notice that isomorphism (1) is the one which gives rise to the Level One meaning. The Level Two meaning is more implicit than the Level One meaning, because it is mediated by the chain of two isomorphisms. It is the Level Two meaning which "backfires", causing the record player to break apart. What is of interest is that the production of the Level One meaning forces the production of the Level Two meaning simultaneously-there is no way to have Level One without Level Two. So it was the implicit meaning of the record which turned back on it, and destroyed it.


Similar comments apply to the goblet. One difference is that the mapping from letters of the alphabet to musical notes is one more level of isomorphism, which we could call "transcription". That is followed by "translation"-conversion of musical notes into musical sounds. Thereafter, the vibrations act back on the goblet just as they did on the e


FIGURE 20. Visual rendition of the principle underlying Gödel's Theorem: two back-to-back mappings which have an unexpected boomeranging effect. The first is from groove patterns to sounds, carried out by a phonograph. The second-familiar, but usually ignored-is from sounds to vibrations of the phonograph. Note that the second mapping exists independently of the first one, for any sound in the vicinity, not just ones produced by the phonograph itself, will cause such vibrations. The paraphrase of Gödel's Theorem says that for any record player, there are records which it cannot play because they will cause its indirect self-destruction.


What about implicit meanings of the Dialogue? (Yes, it has more than one of these.) The simplest of these has already been pointed out in the paragraphs above-namely, that the events in the two halves of the Dialogue are roughly isomorphic to each other: the phonograph becomes a violin, the Tortoise becomes Achilles, the Crab becomes the Tortoise, the grooves become the etched autograph, etc. Once you notice this simple isomorphism, you can go a little further. Observe that in the first half of the story, the Tortoise is the perpetrator of all the mischief, while in the second half, he is the victim. What do you know, but his own method has turned around and backfired on him! Reminiscent of the backfiring of the records' music-or the goblet's inscription-or perhaps of the Tortoise's boomerang collection? Yes, indeed. The story is about backfiring on two levels, as follows ...

Level One: Goblets and records which backfire;

Level Two: The Tortoise's devilish method of exploiting implicit meaning to cause backfires-which backfires.

Therefore we can even make an isomorphism between the two levels of the story, in which we equate the way in which the records and goblet boomerang back to destroy themselves, with the way in which the Tortoise's own fiendish method boomerangs back to get him in the end. Seen this way, the story itself is an example of the backfirings which it discusses. So we can think of the Contracrostipunctus as referring to itself indirectly, in that its own structure is isomorphic to the events it portrays. (Exactly as the goblet and records refer implicitly to themselves via the back-to-back "isomorphisms" of playing and vibration-causing.) One may read the Dialogue without perceiving this fact, of course-but it is there all the time.


Now you may feel a little dizzy-but the best is yet to come. (Actually, some levels of implicit meaning will not even be discussed here-they will be left for you to ferret out.) The deepest reason for writing this Dialogue was to illustrate Gödel's Theorem, which, as I said in the Introduction, relies heavily on two different levels of meaning of statements of number theory. Each of the two halves of the Contracrostipunctus is an "isomorphic copy" of Gödel's Theorem. Because this mapping is the central idea of the Dialogue and is rather elaborate, I have carefully charted it out below:

phonograph ⇐⇒ axiomatic system for number theory
low-fidelity phonograph ⇐⇒ "weak" axiomatic system
high-fidelity phonograph ⇐⇒ "strong" axiomatic system
"perfect" phonograph ⇐⇒ complete system for number theory
"blueprint" of phonograph ⇐⇒ axioms and rules of formal system
record ⇐⇒ string of the formal system
playable record ⇐⇒ theorem of the axiomatic system
unplayable record ⇐⇒ nontheorem of the axiomatic system
sound ⇐⇒ true statement of number theory
reproducible sound ⇐⇒ interpreted theorem of the system
unreproducible sound ⇐⇒ true statement which isn't a theorem
song title ⇐⇒ implicit meaning of Gödel's string:
"I Cannot Be Played on Record Player X" ⇐⇒ "I Cannot Be Derived in Formal System X"


This is not the full extent of the isomorphism between Gödel's Theorem and the Contracrostipunctus, but it is the core of it. You need not worry if you don't fully grasp Gödel's Theorem by now-there are still Chapters to go before we reach it! Nevertheless, having read this Dialogue, you have already tasted some of the flavor of Gödel's Theorem without necessarily being aware of it. I now leave you to look for any other types of implicit meaning in the Contracrostipunctus. "Quaerendo invenietis!"


The Tortoise says that no sufficiently powerful record player can be perfect, in the sense of being able to reproduce every possible sound from a record. Gödel says that no sufficiently powerful formal system can be perfect, in the sense of reproducing every single true statement as a theorem. But as the Tortoise pointed out with respect to phonographs, this fact only seems like a defect if you have unrealistic expectations of what formal systems should be able to do. Nevertheless, mathematicians began this century with just such unrealistic expectations, thinking that axiomatic reasoning was the cure to all ills. They found out otherwise in 1931. The fact that truth transcends theoremhood, in any given formal system, is called "incompleteness" of that system.


A most puzzling fact about Gödel's method of proof is that he uses reasoning methods which seemingly cannot be "encapsulated"-they resist being incorporated into any formal system. Thus, at first sight, it seems that Gödel has unearthed a hitherto unknown, but deeply significant, difference between human reasoning and mechanical reasoning. This mysterious discrepancy in the power of living and nonliving systems is mirrored in the discrepancy between the notion of truth, and that of theoremhood ... or at least that is a "romantic" way to view the situation.


In order to see the situation more realistically, it is necessary to see in more depth why and how meaning is mediated, in formal systems, by isomorphisms. And I believe that this leads to a more romantic way to view the situation. So we now will proceed to investigate some further aspects of the relation between meaning and form. Our first step is to make a new formal system by modifying our old friend, the pq-system, very slightly. We add one more axiom schema (retaining the original one, as well as the single rule of inference):

AXIOM SCHEMA II: If x is a hyphen-string, then xp-qx is an axiom.

Clearly, then, --p-q-- is a theorem in the new system, and so is --p--q---. And yet, their interpretations are, respectively, "2 plus 1 equals 2", and "2 plus 2 equals 3". It can be seen that our new system contains a lot of false statements (if you consider strings to be statements). Thus, our new system is inconsistent with the external world.

As if this weren't bad enough, we also have internal problems with our new system, since it contains statements which disagree with one another, such as -p-q-- (an old axiom) and -p-q- (a new axiom). So our system is inconsistent in a second sense: internally.


Would, therefore, the only reasonable thing to do at this point be to drop the new system entirely? Hardly. I have deliberately presented these "inconsistencies" in a wool-pulling manner: that is, I have tried to press fuzzy-headed arguments as strongly as possible, with the purpose of misleading. In fact, you may well have detected the fallacies in what I have said. The crucial fallacy came when I unquestioningly adopted the very same interpreting words for the new system as I had for the old one. Remember that there was only one reason for adopting those words in the last Chapter, and that reason was that the symbols acted isomorphically to the concepts which they were matched with, by the interpretation. But when you modify the rules governing the system, you are bound to damage the isomorphism. It just cannot be helped. Thus all the problems which we lamented over in preceding paragraphs were bogus problems; they can be made to vanish in no time, by suitably reinterpreting some of the symbols of the system. Notice that I said "some"; not necessarily all symbols will have to be mapped onto new notions. Some may very well retain their "meanings", while others change.


Suppose, for instance, that we reinterpret just the symbol q, leaving all the others constant; in particular, interpret q by the phrase "is greater than or equal to". Now, our "contradictory" theorems -p-q- and -p-q-- come out harmlessly as: "1 plus 1 is greater than or equal to 1", and "1 plus 1 is greater than or equal to 2". We have simultaneously gotten rid of (1) the inconsistency with the external world, and (2) the internal inconsistency. And our new interpretation is a meaningful interpretation; of course the original one is meaningless. That is, it is meaningless for the new system; for the original pq-system, it is fine. But it now seems as pointless and arbitrary to apply it to the new pq-system as it was to apply the "horse-apple-happy" interpretation to the old pq-system.
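The harmlessness of the reinterpretation is easy to check mechanically. A small sketch, assuming the new reading of q (the function name is mine):

```python
import re

def true_under_geq(s: str) -> bool:
    """Truth of a well-formed string when q is read as
    'is greater than or equal to' (p is still 'plus')."""
    m = re.fullmatch(r"(-+)p(-+)q(-+)", s)
    if not m:
        raise ValueError("not a well-formed string")
    a, b, c = (len(g) for g in m.groups())
    return a + b >= c

# -p-q-- (old-schema axiom) and -p-q- (new-schema axiom) both come out true:
print(true_under_geq("-p-q--"), true_under_geq("-p-q-"))  # prints: True True
```

Every axiom of either schema, and everything the rule produces from them, passes this check, so the inconsistency with the external world evaporates under the new reading.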

Although I have tried to catch you off guard and surprise you a little, this lesson about how to interpret symbols by words may not seem terribly difficult once you have the hang of it. In fact, it is not. And yet it is one of the deepest lessons of all of nineteenth-century mathematics! It all begins with Euclid, who, around 300 B.C., compiled and systematized all of what was known about plane and solid geometry in his day. The resulting work, Euclid’s Elements, was so solid that it was virtually a bible of geometry for over two thousand years-one of the most enduring works of all time. Why was this so?

The principal reason was that Euclid was the founder of rigor in mathematics. The Elements began with very simple concepts, definitions, and so forth, and gradually built up a vast body of results organized in such a way that any given result depended only on foregoing results. Thus, there was a definite plan to the work, an architecture which made it strong and sturdy. Nevertheless, the architecture was of a different type from that of, say, a skyscraper. (See Fig. 21.) In the latter, that it is standing is proof enough that its structural elements are holding it up. But in a book on geometry, when each proposition is claimed to follow logically from earlier propositions, there will be no visible crash if one of the proofs is invalid. The girders and struts are not physical, but abstract. In fact, in Euclid’s Elements, the stuff out of which proofs were constructed was human language-that elusive, tricky medium of communication with so many hidden pitfalls. What, then, of the architectural strength of the Elements? Is it certain that it is held up by solid structural elements, or could it have structural weaknesses?

Every word which we use has a meaning to us, which guides us in our use of it. The more common the word, the more associations we have with it, and the more deeply rooted is its meaning. Therefore, when someone gives a definition for a common word in the hopes that we will abide by that definition, it is a foregone conclusion that we will not do so but will instead be guided, largely unconsciously, by what our minds find in their associative stores. I mention this because it is the sort of problem which Euclid created in his Elements, by attempting to give definitions of ordinary, common words such as “point”, “straight line”, “circle”, and so forth. How can you define something of which everyone already has a clear concept? The only way is if you can make it clear that your word is supposed to be a technical term, and is not to be confused with the everyday word with the same spelling. You have to stress that the connection with the everyday word is only suggestive. Well, Euclid did not do this, because he felt that the points and lines of his Elements were indeed the points and lines of the real world. So by not making sure that all associations were dispelled, Euclid was inviting readers to let their powers of association run free.

This sounds almost anarchic, and is a little unfair to Euclid. He did set down axioms, or postulates, which were supposed to be used in the proofs of propositions. In fact, nothing other than those axioms and postulates was supposed to be used. But this is where he slipped up, for an inevitable consequence of his using ordinary words was that some of the images conjured up by those words crept into the proofs which he created. However, if you read proofs in the Elements, do not by any means expect to find glaring “jumps” in the reasoning. On the contrary, they are very subtle, for Euclid was a penetrating thinker, and would not have made any simpleminded errors. Nonetheless, gaps are there, creating slight imperfections in a classic work. But this is not to be complained about. One should merely gain an appreciation for the difference between absolute rigor and relative rigor. In the long run, Euclid’s lack of absolute rigor was the cause of some of the most fertile path-breaking in mathematics, over two thousand years after he wrote his work.

Euclid gave five postulates to be used as the “ground story” of the infinite skyscraper of geometry, of which his Elements constituted only the first several hundred stories. The first four postulates are rather terse and elegant:

(1) A straight line segment can be drawn joining any two points.
(2) Any straight line segment can be extended indefinitely in a straight line.
(3) Given any straight line segment, a circle can be drawn having the segment as radius and one end point as center.
(4) All right angles are congruent.

The fifth, however, did not share their grace:

(5) If two lines are drawn which intersect a third in such a way that the sum of the inner angles on one side is less than two right angles, then the two lines inevitably must intersect each other on that side if extended far enough.

Though he never explicitly said so, Euclid considered this postulate to be somehow inferior to the others, since he managed to avoid using it in the proofs of the first twenty-eight propositions. Thus, the first twenty-eight propositions belong to what might be called “four-postulate geometry”-that part of geometry which can be derived on the basis of the first four postulates of the Elements, without the help of the fifth postulate. (It is also often called absolute geometry.) Certainly Euclid would have found it far preferable to prove this ugly duckling, rather than to have to assume it. But he found no proof, and therefore adopted it.

But the disciples of Euclid were no happier about having to assume this fifth postulate. Over the centuries, untold numbers of people gave untold years of their lives in attempting to prove that the fifth postulate is itself part of four-postulate geometry. By 1763, at least twenty-eight deficient proofs had been published-all erroneous! (They were all criticized in the dissertation of one G. S. Klügel.) All of these erroneous proofs involved a confusion between everyday intuition and strictly formal properties. It is safe to say that today, hardly any of these “proofs” holds any mathematical or historical interest-but there are certain exceptions.

Girolamo Saccheri (1667-1733) lived around Bach’s time. He had the ambition to free Euclid of every flaw. Based on some earlier work he had done in logic, he decided to try a novel approach to the proof of the famous fifth: suppose you assume its opposite; then work with that as your fifth postulate… Surely after a while you will create a contradiction. Since no mathematical system can support a contradiction, you will have shown the unsoundness of your own fifth postulate, and therefore the soundness of Euclid’s fifth postulate. We need not go into details here. Suffice it to say that with great skill, Saccheri worked out proposition after proposition of “Saccherian geometry” and eventually became tired of it. At one point, he decided he had reached a proposition which was “repugnant to the nature of the straight line”. That was what he had been hoping for-to his mind, it was the long-sought contradiction. At that point, he published his work under the title Euclid Freed of Every Flaw, and then expired.

But in so doing, he robbed himself of much posthumous glory, since he had unwittingly discovered what came later to be known as “hyperbolic geometry”. Fifty years after Saccheri, J. H. Lambert repeated the “near miss”, this time coming even closer, if possible. Finally, forty years after Lambert, and ninety years after Saccheri, non-Euclidean geometry was recognized for what it was-an authentic new brand of geometry, a bifurcation in the hitherto single stream of mathematics. In 1823, non-Euclidean geometry was discovered simultaneously, in one of those inexplicable coincidences, by a Hungarian mathematician, Janos (or Johann) Bolyai, age twenty-one, and a Russian mathematician, Nikolay Lobachevskiy, age thirty. And, ironically, in that same year, the great French mathematician Adrien-Marie Legendre came up with what he was sure was a proof of Euclid’s fifth postulate, very much along the lines of Saccheri.

The clue to non-Euclidean geometry was “thinking straight” about the propositions which emerge in geometries like Saccheri’s and Lambert’s. The Saccherian propositions are only “repugnant to the nature of the straight line” if you cannot free yourself of preconceived notions of what “straight line” must mean. If, however, you can divest yourself of those preconceived images, and merely let a “straight line” be something which satisfies the new propositions, then you have achieved a radically new viewpoint.

This should begin to sound familiar. In particular, it harks back to the pq-system, and its variant, in which the symbols acquired passive meanings by virtue of their roles in theorems. The symbol q is especially interesting, since its “meaning” changed when a new axiom schema was added. In the very same way, one can let the meanings of “point”, “line”, and so on be determined by the set of theorems (or propositions) in which they occur. This was the great realization of the discoverers of non-Euclidean geometry. They found different sorts of non-Euclidean geometries by denying Euclid’s fifth postulate in different ways and following out the consequences. Strictly speaking, they (and Saccheri) did not deny the fifth postulate directly, but rather, they denied an equivalent postulate, called the parallel postulate, which runs as follows: Given any straight line, and a point not on it, there exists one, and only one, straight line which passes through that point and never intersects the first line, no matter how far they are extended. The second straight line is then said to be parallel to the first. If you assert that no such line exists, then you reach elliptical geometry; if you assert that at least two such lines exist, you reach hyperbolic geometry. Incidentally, the reason that such variations are still called “geometries” is that the core element-absolute, or four-postulate, geometry-is embedded in them. It is the presence of this minimal core which makes it sensible to think of them as describing properties of some sort of geometrical space, even if the space is not as intuitive as ordinary space.

Actually, elliptical geometry is easily visualized. All “points”, “lines”, and so forth are to be parts of the surface of an ordinary sphere. Let us write “POINT” when the technical term is meant, and “point” when the everyday sense is desired. Then, we can say that a POINT consists of a pair of diametrically opposed points of the sphere’s surface. A LINE is a great circle on the sphere (a circle which, like the equator, has its center at the center of the sphere). Under these interpretations, the propositions of elliptical geometry, though they contain words like “POINT” and “LINE”, speak of the goings-on on a sphere, not a plane. Notice that two LINES always intersect in exactly two antipodal points of the sphere’s surface-that is, in exactly one single POINT! And just as two LINES determine a POINT, so two POINTS determine a LINE.

By treating words such as “POINT” and “LINE” as if they had only the meaning instilled in them by the propositions in which they occur, we take a step towards complete formalization of geometry. This semiformal version still uses a lot of words in English with their usual meanings (words such as “the”, “if”, “and”, “join”, “have”), although the everyday meaning has been drained out of special words like “POINT” and “LINE”, which are consequently called undefined terms. Undefined terms, like the p and q of the pq-system, do get defined in a sense: implicitly-by the totality of all propositions in which they occur.

One could maintain that a full definition of the undefined terms resides in the postulates alone, since the propositions which follow from them are implicit in the postulates already. This view would say that the postulates are implicit definitions of all the undefined terms, all of the undefined terms being defined in terms of the others.

A full formalization of geometry would take the drastic step of making every term undefined-that is, turning every term into a “meaningless” symbol of a formal system. I put quotes around “meaningless” because, as you know, the symbols automatically pick up passive meanings in accordance with the theorems they occur in. It is another question, though, whether people discover those meanings, for to do so requires finding a set of concepts which can be linked by an isomorphism to the symbols in the formal system. If one begins with the aim of formalizing geometry, presumably one has an intended interpretation for each symbol, so that the passive meanings are built into the system. That is what I did for p and q when I first created the pq-system. But there may be other passive meanings which are potentially perceptible, which no one has yet noticed. For instance, there were the surprise interpretations of p as “equals” and q as “taken from”, in the original pq-system. Although this is rather a trivial example, it contains the essence of the idea that symbols may have many meaningful interpretations-it is up to the observer to look for them.
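
That the “surprise” reading works can be spot-checked by brute force. In this small sketch of mine (the function names are made up, not from the book), both readings come out true on exactly the same hyphen-count triples, namely those with x + y = z, which is precisely the condition for theoremhood:

```python
def is_theorem(x, y, z):
    """Theoremhood in the original pq-system: x p y q z (hyphen counts)
    is a theorem exactly when x + y = z."""
    return x + y == z

def plus_equals(x, y, z):
    """Original reading: p = "plus", q = "equals"."""
    return x + y == z

def equals_taken_from(x, y, z):
    """Surprise reading: p = "equals", q = "taken from"."""
    return x == z - y

# Both readings agree with theoremhood on every small triple, so each is
# a faithful passive interpretation of the same formal system.
triples = [(x, y, z) for x in range(1, 9) for y in range(1, 9) for z in range(1, 9)]
assert all(plus_equals(*t) == is_theorem(*t) for t in triples)
assert all(equals_taken_from(*t) == is_theorem(*t) for t in triples)
```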

We can summarize our observations so far in terms of the word “consistency”. We began our discussion by manufacturing what appeared to be an inconsistent formal system-one which was internally inconsistent, as well as inconsistent with the external world. But a moment later we took it all back, when we realized our error: that we had chosen unfortunate interpretations for the symbols. By changing the interpretations, we regained consistency! It now becomes clear that consistency is not a property of a formal system per se, but depends on the interpretation which is proposed for it. By the same token, inconsistency is not an intrinsic property of any formal system.

We have been speaking of “consistency” and “inconsistency” all along, without defining them. We have just relied on good old everyday notions. But now let us say exactly what is meant by consistency of a formal system (together with an interpretation): that every theorem, when interpreted, becomes a true statement. And we will say that inconsistency occurs when there is at least one false statement among the interpreted theorems.

This definition appears to be talking about inconsistency with the external world-what about internal inconsistencies? Presumably, a system would be internally inconsistent if it contained two or more theorems whose interpretations were incompatible with one another, and internally consistent if all interpreted theorems were compatible with one another. Consider, for example, a formal system which has only the following three theorems: TbZ, ZbE, and EbT. If T is interpreted as “the Tortoise”, Z as “Zeno”, E as “Egbert”, and xby as “x always beats y at chess”, then we have the following interpreted theorems:

The Tortoise always beats Zeno at chess.
Zeno always beats Egbert at chess.
Egbert always beats the Tortoise at chess.

The statements are not incompatible, although they describe a rather bizarre circle of chess players. Hence, under this interpretation, the formal system in which those three strings are theorems is internally consistent, although, in point of fact, none of the three statements is true! Internal consistency does not require all theorems to come out true, but merely that they come out compatible with one another.
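
Internal consistency amounts to the existence of some imaginable world in which all the interpreted theorems hold at once. For the chess reading, such a world is trivial to exhibit; here is a sketch of mine, with invented names:

```python
# The three theorems TbZ, ZbE, EbT under the reading
# xby = "x always beats y at chess": each claims an ordered pair.
theorems = [("Tortoise", "Zeno"), ("Zeno", "Egbert"), ("Egbert", "Tortoise")]

# Imagine a world whose "beats" relation contains exactly the claimed pairs.
# Nothing about chess forbids a cycle of winners, so this world is coherent.
beats = set(theorems)

# Every theorem is true in that world: the trio is internally consistent,
# even though none of the statements is true in our world.
assert all(pair in beats for pair in theorems)
```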

Now suppose instead that xby is to be interpreted as “x was invented by y”. Then we would have:

The Tortoise was invented by Zeno.
Zeno was invented by Egbert.
Egbert was invented by the Tortoise.

In this case, it doesn’t matter whether the individual statements are true or false-and perhaps there is no way to know which ones are true, and which are not. What is nevertheless certain is that not all three can be true at once. Thus, the interpretation makes the system internally inconsistent. The internal inconsistency depends not on the interpretations of the three capital letters, but only on that of b, and on the fact that the three capitals are cyclically permuted around the occurrences of b. Thus, one can have internal inconsistency without having interpreted all of the symbols of the formal system. (In this case it sufficed to interpret a single symbol.) By the time sufficiently many symbols have been given interpretations, it may be clear that there is no way that the rest of them can be interpreted so that all theorems will come out true. But it is not just a question of truth-it is a question of possibility. All three theorems would come out false if the capitals were interpreted as the names of real people-but that is not why we would call the system internally inconsistent; our grounds for doing so would be the circularity, combined with the interpretation of the letter b. (By the way, you’ll find more on this “authorship triangle” in a later Chapter.)
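
The invention reading, by contrast, cannot be satisfied in any imaginable world: an inventor must exist before his invention, and a cycle of “before” is impossible. The time-stamp model below is my own device for making that mechanical, not anything from the book:

```python
from itertools import product

# TbZ, ZbE, EbT under the reading xby = "x was invented by y".
theorems = [("Tortoise", "Zeno"), ("Zeno", "Egbert"), ("Egbert", "Tortoise")]
characters = ["Tortoise", "Zeno", "Egbert"]

def possible(times):
    """A world is modeled by creation times; an inventor must already
    exist (strictly earlier time) before his invention appears."""
    return all(times[inventor] < times[invention]
               for invention, inventor in theorems)

# No assignment of times makes all three theorems true: the cycle would
# force each character to be strictly older than himself.
assert not any(possible(dict(zip(characters, ts)))
               for ts in product(range(3), repeat=3))
```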

We have given two ways of looking at consistency: the first says that a system-plus-interpretation is consistent with the external world if every theorem comes out true when interpreted; the second says that a system-plus-interpretation is internally consistent if all theorems come out mutually compatible when interpreted. Now there is a close relationship between these two types of consistency. In order to determine whether several statements are mutually compatible, you try to imagine a world in which all of them could be simultaneously true. Therefore, internal consistency depends upon consistency with the external world-only now, “the external world” is allowed to be any imaginable world, instead of the one we live in. But this is an extremely vague, unsatisfactory conclusion. What constitutes an “imaginable” world? After all, it is possible to imagine a world in which three characters invent each other cyclically. Or is it? Is it possible to imagine a world in which there are square circles? Is a world imaginable in which Newton’s laws, and not relativity, hold? Is it possible to imagine a world in which something can be simultaneously green and not green? Or a world in which animals exist which are not made of cells? In which Bach improvised an eight-part fugue on a theme of King Frederick the Great? In which mosquitoes are more intelligent than people? In which tortoises can play football-or talk? A tortoise talking football would be an anomaly, of course.

Some of these worlds seem more imaginable than others, since some seem to embody logical contradictions-for example, green and not green-while some of them seem, for want of a better word, “plausible”-such as Bach improvising an eight-part fugue, or animals which are not made of cells. Or even, come to think of it, a world in which the laws of physics are different. Roughly, then, it should be possible to establish different brands of consistency. For instance, the most lenient would be “logical consistency”, putting no restraints on things at all, except those of logic. More specifically, a system-plus-interpretation would be logically consistent just as long as no two of its theorems, when interpreted as statements, directly contradict each other; mathematically consistent just as long as interpreted theorems do not violate mathematics; and physically consistent just as long as all its interpreted theorems are compatible with physical law; then comes biological consistency, and so on. In a biologically consistent system, there could be a theorem whose interpretation is the statement “Shakespeare wrote an opera”, but no theorem whose interpretation is the statement “Cell-less animals exist”. Generally speaking, these fancier kinds of inconsistency are not studied, for the reason that they are very hard to disentangle from one another. What kind of inconsistency, for example, should one say is involved in the problem of the three characters who invent each other cyclically? Logical? Physical? Biological? Literary?

Usually, the borderline between uninteresting and interesting is drawn between physical consistency and mathematical consistency. (Of course, it is the mathematicians and logicians who do the drawing-hardly an impartial crew…) This means that the kinds of inconsistency which “count”, for formal systems, are just the logical and mathematical kinds. According to this convention, then, we haven’t yet found an interpretation which makes the trio of theorems TbZ, ZbE, EbT inconsistent. We can do so by interpreting b as “is bigger than”. What about T and Z and E? They can be interpreted as natural numbers-for example, Z as 0, T as 2, and E as 11. Notice that two theorems come out true this way, one false. If, instead, we had interpreted Z as 3, there would have been two falsehoods and only one truth. But either way, we’d have had inconsistency. In fact, the values assigned to T, Z, and E are irrelevant, as long as it is understood that they are restricted to natural numbers. Once again we see a case where only some of the interpretations need to be made before inconsistency can be recognized.
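
That the values are irrelevant is easy to verify exhaustively over a small range (and the general argument is just the transitivity of >: t > z > e > t would give t > t). A sketch of mine:

```python
from itertools import product

def all_theorems_true(t, z, e):
    """b read as "is bigger than": the theorems TbZ, ZbE, EbT."""
    return t > z and z > e and e > t

# No assignment of natural numbers makes all three theorems true at once,
# so inconsistency follows from the interpretation of b alone.
assert not any(all_theorems_true(t, z, e)
               for t, z, e in product(range(30), repeat=3))
```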

The preceding example, in which some symbols could have interpretations while others didn’t, is reminiscent of doing geometry in natural language, using some words as undefined terms. In such a case, words are divided into two classes: those whose meaning is fixed and immutable, and those whose meaning is to be adjusted until the system is consistent (these are the undefined terms). Doing geometry in this way requires that meanings have already been established for words in the first class, somewhere outside of geometry. Those words form a rigid skeleton, giving an underlying structure to the system; filling in that skeleton comes other material, which can vary (Euclidean or non-Euclidean geometry).

Formal systems are often built up in just this type of sequential, or hierarchical, manner. For example, Formal System I may be devised, with rules and axioms that give certain intended passive meanings to its symbols. Then Formal System I is incorporated fully into a larger system with more symbols-Formal System II. Since Formal System I’s axioms and rules are part of Formal System II, the passive meanings of Formal System I’s symbols remain valid; they form an immutable skeleton which then plays a large role in the determination of the passive meanings of the new symbols of Formal System II. The second system may in turn play the role of a skeleton with respect to a third system, and so on. It is also possible-and geometry is a good example of this-to have a system (e.g., absolute geometry) which only partly pins down the passive meanings of its undefined terms, and which can be supplemented by extra rules or axioms, which then further restrict the passive meanings of the undefined terms. This is the case with Euclidean versus non-Euclidean geometry.

In a similar, hierarchical way, we acquire new knowledge, new vocabulary, or perceive unfamiliar objects. It is particularly interesting in the case of understanding drawings by Escher, such as Relativity (Fig. 22), in which there occur blatantly impossible images. You might think that we would seek to reinterpret the picture over and over again until we came to an interpretation of its parts which was free of contradictions-but we don’t do that at all. We sit there amused and puzzled by staircases which go every which way, and by people going in inconsistent directions on a single staircase. Those staircases are “islands of certainty” upon which we base our interpretation of the overall picture. Having once identified them, we try to extend our understanding, by seeking to establish the relationship which they bear to one another. At that stage, we encounter trouble. But if we attempted to backtrack-that is, to question the “islands of certainty”-we would also encounter trouble, of another sort. There’s no way of backtracking and “undeciding” that they are staircases. They are not fishes, or whips, or hands-they are just staircases. (There is, actually, one other way out-to leave all the lines of the picture totally uninterpreted, like the “meaningless symbols” of a formal system. This ultimate escape route is an example of a “U-mode” response-a Zen attitude towards symbolism.)

So we are forced, by the hierarchical nature of our perceptive processes, to see either a crazy world or just a bunch of pointless lines. A similar analysis could be made of dozens of Escher pictures, which rely heavily upon the recognition of certain basic forms, which are then put together in nonstandard ways; and by the time the observer sees the paradox on a high level, it is too late-he can’t go back and change his mind about how to interpret the lower-level objects. The difference between an Escher drawing and non-Euclidean geometry is that in the latter, comprehensible interpretations can be found for the undefined terms, resulting in a comprehensible total system, whereas for the former, the end result is not reconcilable with one’s conception of the world, no matter how long one stares at the pictures. Of course, one can still manufacture hypothetical worlds, in which Escherian events can happen… but in such worlds, the laws of biology, physics, mathematics, or even logic will be violated on one level, while simultaneously being obeyed on another, which makes them extremely weird worlds. (An example of this is in Waterfall (Fig. 5), where normal gravitation applies to the moving water, but where the nature of space violates the laws of physics.)

We have stressed the fact, above, that internal consistency of a formal system (together with an interpretation) requires that there be some imaginable world-that is, a world whose only restriction is that in it, mathematics and logic should be the same as in our world-in which all the interpreted theorems come out true. External consistency, however-consistency with the external world-requires that all theorems come out true in the real world. Now in the special case where one wishes to create a consistent formal system whose theorems are to be interpreted as statements of mathematics, it would seem that the difference between the two types of consistency should fade away, since, according to what we said above, all imaginable worlds have the same mathematics as the real world. Thus, in every conceivable world, 1 plus 1 would have to be 2; likewise, there would have to be infinitely many prime numbers; furthermore, in every conceivable world, all right angles would have to be congruent; and of course, through any point not on a given line there would have to be exactly one parallel line.

But wait a minute! That’s the parallel postulate-and to assert its universality would be a mistake, in light of what’s just been said. If in all conceivable worlds the parallel postulate is obeyed, then we are asserting that non-Euclidean geometry is inconceivable, which puts us back in the same mental state as Saccheri and Lambert-surely an unwise move. But what, then, if not all of mathematics, must all conceivable worlds share? Could it be as little as logic itself? Or is even logic suspect? Could there be worlds where contradictions are normal parts of existence-worlds where contradictions are not contradictions?

Well, in some sense, by merely inventing the concept, we have shown that such worlds are indeed conceivable; but in a deeper sense, they are also quite inconceivable. (This in itself is a little contradiction.) Quite seriously, however, it seems that if we want to be able to communicate at all, we have to adopt some common base, and it pretty well has to include logic. (There are belief systems which reject this point of view-it is too logical. In particular, Zen embraces contradictions and non-contradictions with equal eagerness. This may seem inconsistent, but then being inconsistent is part of Zen, and so… what can one say?)

If we assume that logic is part of every conceivable world (and note that we have not defined logic, but we will in Chapters to come), is that all? Is it really conceivable that, in some worlds, there are not infinitely many primes? Would it not seem necessary that numbers should obey the same laws in all conceivable worlds? Or… is the concept “natural number” better thought of as an undefined term, like “POINT” or “LINE”? In that case, number theory would be a bifurcated theory, like geometry: there would be standard and nonstandard number theories. But there would have to be some counterpart to absolute geometry: a “core” theory, an invariant ingredient of all number theories which identified them as number theories rather than, say, theories about cocoa or rubber or bananas. It seems to be the consensus of most modern mathematicians and philosophers that there is such a core number theory, which ought to be included, along with logic, in what we consider to be “conceivable worlds”. This core of number theory, the counterpart to absolute geometry, is called Peano arithmetic, and we shall formalize it in Chapter VIII. Also, it is now well established-as a matter of fact as a direct consequence of Gödel’s Theorem-that number theory is a bifurcated theory, with standard and nonstandard versions. Unlike the situation in geometry, however, the number of “brands” of number theory is infinite, which makes the situation of number theory considerably more complex.

For practical purposes, all number theories are the same. In other words, if bridge building depended on number theory (which in a sense it does), the fact that there are different number theories would not matter, since in the aspects relevant to the real world, all number theories overlap. The same cannot be said of different geometries; for example, the sum of the angles in a triangle is 180 degrees only in Euclidean geometry; it is greater in elliptical geometry, less in hyperbolic. There is a story that Gauss once attempted to measure the sum of the angles in a large triangle defined by three mountain peaks, in order to determine, once and for all, which kind of geometry really rules our universe. It was a hundred years later that Einstein gave a theory (general relativity) which said that the geometry of the universe is determined by its content of matter, so that no one geometry is intrinsic to space itself. Thus to the question, “Which geometry is true?”, nature gives an ambiguous answer not only in mathematics, but also in physics. As for the corresponding question, “Which number theory is true?”, we shall have more to say on it after going through Gödel’s Theorem in detail.

If consistency is the minimal condition under which symbols acquire passive meanings, then its complementary notion, completeness, is the maximal confirmation of those passive meanings. Where consistency is the property that “Everything produced by the system is true”, completeness is the other way round: “Every true statement is produced by the system”. Now I refine the notion slightly. We can’t mean every true statement in the world-we mean only those which belong to the domain which we are attempting to represent in the system. Therefore, completeness means: “Every true statement which can be expressed in the notation of the system is a theorem.”

Consistency: when every theorem, upon interpretation, comes out true (in some imaginable world).
Completeness: when all statements which are true (in some imaginable world), and which can be expressed as well-formed strings of the system, are theorems.

An example of a formal system which is complete on its own modest level is the original pq-system, with the original interpretation. All true additions of two positive integers are represented by theorems of the system. We might say this another way: "All true additions of two positive integers are provable within the system." (Warning: When we start using the term "provable statements" instead of "theorems", it shows that we are beginning to blur the distinction between formal systems and their interpretations. This is all right, provided we are very conscious of the blurring that is taking place, and provided that we remember that multiple interpretations are sometimes possible.) The pq-system with the original interpretation is complete; it is also consistent, since no false statement is-to use our new phrase-provable within the system.
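The decision procedure for theoremhood (the exercise posed when the pq-system was first introduced) can be sketched in a few lines of Python. The function name is illustrative; the criterion assumed here is the familiar one, that xpyqz is a theorem exactly when x, y, and z are nonempty hyphen-strings whose counts satisfy x + y = z:

```python
import re

def is_pq_theorem(s: str) -> bool:
    """Decide theoremhood in the pq-system.

    A string is a theorem exactly when it has the form x p y q z,
    where x, y, z are nonempty hyphen-strings and the hyphens in z
    number as many as those in x and y together -- i.e. the string,
    under the interpretation, expresses a true addition.
    """
    m = re.fullmatch(r"(-+)p(-+)q(-+)", s)
    return bool(m) and len(m.group(1)) + len(m.group(2)) == len(m.group(3))

print(is_pq_theorem("--p---q-----"))  # 2 + 3 = 5: True
print(is_pq_theorem("--p---q-"))      # 2 + 3 is not 1: False
```

Note that well-formedness (exactly one p, one q, nonempty hyphen groups) and truth of the interpreted addition are checked together.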


Someone might argue that the system is incomplete, on the grounds that additions of three positive integers (such as 2 + 3 + 4 = 9) are not represented by theorems of the pq-system, despite being translatable into the notation of the system (e.g., --p---p----q---------). However, this string is not well-formed, and hence should be considered to be just as devoid of meaning as is p q p---q p q. Triple additions are simply not expressible in the notation of the system-so the completeness of the system is preserved.


Despite the completeness of the pq-system under this interpretation, it certainly falls far short of capturing the full notion of truth in number theory. For example, there is no way that the pq-system tells us how many prime numbers there are. Gödel's Incompleteness Theorem says that any system which is "sufficiently powerful" is, by virtue of its power, incomplete, in the sense that there are well-formed strings which express true statements of number theory, but which are not theorems. (There are truths belonging to number theory which are not provable within the system.) Systems like the pq-system, which are complete but not very powerful, are more like low-fidelity phonographs; they are so poor to begin with that it is obvious that they cannot do what we would wish them to do-namely, tell us everything about number theory.


What does it mean to say, as I did above, that "completeness is the maximal confirmation of passive meanings"? It means that if a system is consistent but incomplete, there is a mismatch between the symbols and their interpretations. The system does not have the power to justify being interpreted that way. Sometimes, if the interpretations are "trimmed" a little, the system can become complete. To illustrate this idea, let's look at the modified pq-system (including Axiom Schema II) and the interpretation we used for it.


After modifying the pq-system, we modified the interpretation for q from "equals" to "is greater than or equal to". We saw that the modified pq-system was consistent under this interpretation; yet something about the new interpretation is not very satisfying. The problem is simple: there are now many expressible truths which are not theorems. For instance, "2 plus 3 is greater than or equal to 1" is expressed by the nontheorem --p---q-. The interpretation is just too sloppy! It doesn't accurately reflect what the theorems in the system do. Under this sloppy interpretation, the pq-system is not complete. We could repair the situation either by (1) adding new rules to the system, making it more powerful, or by (2) tightening up the interpretation. In this case, the sensible alternative seems to be to tighten the interpretation. Instead of interpreting q as "is greater than or equal to", we should say "equals or exceeds by 1". Now the modified pq-system becomes both consistent and complete. And the completeness confirms the appropriateness of the interpretation.
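The modified system also admits a decision procedure, and it differs from the original one by a single line. This sketch assumes the standard characterization of the modified system's theorems-that xpyqz is a theorem exactly when the hyphen-counts satisfy x + y = z or x + y = z + 1, matching the "equals or exceeds by 1" reading (the function name is illustrative):

```python
import re

def is_modified_pq_theorem(s: str) -> bool:
    """Decision procedure for the modified pq-system (with Axiom Schema II).

    Under the tightened interpretation, x p y q z is a theorem exactly
    when x + y equals z, or exceeds it by exactly 1.
    """
    m = re.fullmatch(r"(-+)p(-+)q(-+)", s)
    if not m:
        return False
    x, y, z = (len(g) for g in m.groups())
    return x + y in (z, z + 1)
```

Under the sloppy "greater than or equal to" reading, --p---q- would be a truth with no corresponding theorem; under the tightened reading, it expresses a falsehood, and the mismatch disappears.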


In number theory, we will encounter incompleteness again; but there, to remedy the situation, we will be pulled in the other direction-towards adding new rules, to make the system more powerful. The irony is that we think, each time we add a new rule, that we surely have made the system complete now! The nature of the dilemma can be illustrated by the following allegory.

We have a record player, and we also have a record tentatively labeled "Canon on B-A-C-H". However, when we play the record on the record player, the feedback-induced vibrations (as caused by the Tortoise's records) interfere so much that we do not even recognize the tune. We conclude that something is defective-either our record, or our record player. In order to test our record, we would have to play it on friends' record players, and listen to its quality. In order to test our phonograph, we would have to play friends' records on it, and see if the music we hear agrees with the labels. If our record player passes its test, then we will say the record was defective; contrariwise, if the record passes its test, then we will say our record player was defective. What, however, can we conclude when we find out that both pass their respective tests? That is the moment to remember the chain of two isomorphisms (Fig. 20), and think carefully!


Little Harmonic Labyrinth


Achilles: Oh, thank you so very much, Genie. But my curiosity is provoked. Before I make my wish, would you mind telling me who-or what-GOD is?
Genie: Not at all. "GOD" is an acronym which stands for "GOD Over Djinn". The word "Djinn" is used to designate Genies, Meta-Genies, Meta-Meta-Genies, etc. It is a Typeless word.
Achilles: But-but-how can "GOD" be a word in its own acronym? That doesn't make any sense!
Genie: Oh, aren't you acquainted with recursive acronyms? I thought everybody knew about them. You see, "GOD" stands for "GOD Over Djinn"-which can be expanded as "GOD Over Djinn, Over Djinn"-and that can, in turn, be expanded to "GOD Over Djinn, Over Djinn, Over Djinn"-which can, in its turn, be further expanded. You can go as far as you like.
Achilles: But I'll never finish!
Genie: Of course not. You can never totally expand GOD.
Achilles: Hmm... That's puzzling. What did you mean when you said to the Meta-Genie, "I have a special wish to make of you, O Djinn, and of GOD"?
Genie: I wanted not only to make a request of the Meta-Genie, but also of all the Djinns over her. The recursive acronym method accomplishes this quite naturally. You see, when the Meta-Genie received my request, she then had to pass it upwards to GOD. So she forwarded a similar message to the Meta-Meta-Genie, who then did likewise to the Meta-Meta-Meta-Genie... Ascending the chain this way transmits the message to GOD.


Achilles: I see. You mean GOD sits up at the top of the ladder of djinns?
Genie: No, no, no! There is nothing "at the top", for there is no top. That is why GOD is a recursive acronym. GOD is not some ultimate djinn; GOD is the tower of djinns above any given djinn.
Tortoise: It seems to me that each and every djinn would have a different concept of what GOD is, then, since to any djinn, GOD is the set of djinns above him or her, and no two djinns share that set.
Genie: You're absolutely right-and since I am the lowest djinn of all, my notion of GOD is the most exalted one. I pity the higher djinns, who fancy themselves somehow closer to GOD. What blasphemy!
Achilles: By gum, it must have taken genies to invent GOD.
Tortoise: Do you really believe all this stuff about GOD, Achilles?
Achilles: Why certainly, I do. Are you atheistic, Mr. T? Or are you agnostic?
Tortoise: I don't think I'm agnostic. Maybe I'm meta-agnostic.
Achilles: Whaaat? I don't follow you at all.
Tortoise: Let's see... If I were meta-agnostic, I'd be confused over whether I'm agnostic or not-but I'm not quite sure if I feel THAT way; hence I must be meta-meta-agnostic (I guess). Oh, well. Tell me, Genie, does any djinn ever make a mistake, and garble up a message moving up or down the chain?
Genie: This does happen; it is the most common cause for Typeless Wishes not being granted. You see, the chances are infinitesimal that a garbling will occur at any PARTICULAR link in the chain-but when you put an infinite number of them in a row, it becomes virtually certain that a garbling will occur SOMEWHERE. In fact, strange as it seems, an infinite number of garblings usually occur, although they are very sparsely distributed in the chain.
Achilles: Then it seems a miracle that any Typeless Wish ever gets carried out.
Genie: Not really. Most garblings are inconsequential, and many garblings tend to cancel each other out. But occasionally-in fact, rather seldom-the nonfulfillment of a Typeless Wish can be traced back to a single unfortunate djinn's garbling. When this happens, the guilty djinn is forced to run the Gauntlet and get paddled on his or her rump, by GOD. It's good fun for the paddlers, and quite harmless for the paddlee. You might be amused by the sight.
Achilles: I would love to see that! But it only happens when a Typeless Wish goes ungranted?
Genie: That's right.
Achilles: Hmm... That gives me an idea for my wish.
Tortoise: Oh, really? What is it?
Achilles: I wish my wish would not be granted!

(At that moment, an event-or is "event" the word for it?-takes place which cannot be described, and hence no attempt will be made to describe it.)


FIGURE 25. Cretan Labyrinth (Italian engraving; School of Finiguerra). [From W. H. Matthews, Mazes and Labyrinths: Their History and Development (New York: Dover Publications, 1970).]


Tortoise: They say-although I personally never believed it myself-that an Evil Majotaur has created a tiny labyrinth and sits in a pit in the middle of it, waiting for innocent victims to get lost in its fearsome complexity. Then, when they wander, dazed, into the center, he laughs and laughs at them-so hard, that he laughs them to death!


Tortoise: Very simple. When I heard the melody B-A-C-H in the top voice, I immediately realized that the grooves we're walking through could only be the Little Harmonic Labyrinth, one of Bach's lesser known organ pieces. It is so named because of its dizzyingly frequent modulations.
Achilles: Wh-what are they?
Tortoise: Well, you know that most music pieces are written in a key, or tonality, such as C major, which is the key of this one.
Achilles: I had heard the term before. Does that mean that C is the note you want to end on?
Tortoise: Yes, C acts like a home base, in a way. Actually, the usual word is "tonic".
Achilles: Does one then stray away from the tonic with the aim of eventually returning?
Tortoise: That's right. As the piece develops, ambiguous chords and melodies are used which lead away from the tonic. Little by little, tension builds up-you feel an increasing desire to return home, to hear the tonic.
Achilles: Is that why, at the end of a piece, I always feel so satisfied, as if I had been waiting my whole life to hear the tonic?
Tortoise: Exactly. The composer has used his knowledge of harmonic progressions to manipulate your emotions, and to build up hopes in you to hear that tonic.
Achilles: But you were going to tell me about modulations.
Tortoise: Oh, yes. One very important thing a composer can do is to "modulate" partway through a piece, which means that he sets up a temporary goal other than resolution into the tonic.
Achilles: I see... I think. Do you mean that some sequence of chords shifts the harmonic tension somehow so that I actually desire to resolve in a new key?
Tortoise: Right. This makes the situation more complex, for although in the short term you want to resolve in the new key, all the while at the back of your mind you retain the longing to hit that original goal-in this case, C major. And when the subsidiary goal is reached


Recursive Structures and Processes


WHAT IS RECURSION? It is what was illustrated in the Dialogue Little Harmonic Labyrinth: nesting, and variations on nesting. The concept is very general. (Stories inside stories, movies inside movies, paintings inside paintings, Russian dolls inside Russian dolls (even parenthetical comments inside parenthetical comments!)-these are just a few of the charms of recursion.) However, you should be aware that the meaning of "recursive" in this Chapter is only faintly related to its meaning in Chapter III. The relation should become clear by the end of this Chapter.


Sometimes recursion seems to brush paradox very closely. For example, there are recursive definitions. Such a definition may give the casual viewer the impression that something is being defined in terms of itself. That would be circular and lead to infinite regress, if not to paradox proper. Actually, a recursive definition (when properly formulated) never leads to infinite regress or paradox. This is because a recursive definition never defines something in terms of itself, but always in terms of simpler versions of itself. What I mean by this will become clearer shortly, when I show some examples of recursive definitions.


One of the most common ways in which recursion appears in daily life is when you postpone completing a task in favor of a simpler task, often of the same type. Here is a good example. An executive has a fancy telephone and receives many calls on it. He is talking to A when B calls. To A he says, "Would you mind holding for a moment?" Of course he doesn't really care if A minds; he just pushes a button, and switches to B. Now C calls. The same deferment happens to B. This could go on indefinitely, but let us not get too bogged down in our enthusiasm. So let's say the call with C terminates. Then our executive "pops" back up to B, and continues. Meanwhile A is sitting at the other end of the line, drumming his fingernails against some table, and listening to some horrible Muzak piped through the phone lines to placate him. Now the easiest case is if the call with B simply terminates, and the executive returns to A finally. But it could happen that after the conversation with B is resumed, a new caller-D-calls. B is once again pushed onto the stack of waiting callers, and D is taken care of. After D is done, back to B, then back to A. This executive is hopelessly mechanical, to be sure-but we are illustrating recursion in its most precise form.


In the preceding example, I have introduced some basic terminology of recursion-at least as seen through the eyes of computer scientists. The terms are push, pop, and stack (or push-down stack, to be precise) and they are all related. They were introduced in the late 1950's as part of IPL, one of the first languages for Artificial Intelligence. You have already encountered "push" and "pop" in the Dialogue. But I will spell things out anyway. To push means to suspend operations on the task you're currently working on, without forgetting where you are-and to take up a new task. The new task is usually said to be "on a lower level" than the earlier task. To pop is the reverse-it means to close operations on one level, and to resume operations exactly where you left off, one level higher.


But how do you remember exactly where you were on each different level? The answer is, you store the relevant information in a stack. So a stack is just a table telling you such things as (1) where you were in each unfinished task (jargon: the "return address"), and (2) what the relevant facts to know were at the points of interruption (jargon: the "variable bindings"). When you pop back up to resume some task, it is the stack which restores your context, so you don't feel lost. In the telephone-call example, the stack tells you who is waiting on each different level, and where you were in the conversation.
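The bookkeeping just described-return addresses and variable bindings stored on a stack-can be sketched in a few lines of Python. The callers and the snippets of conversation below are illustrative inventions, not details from the text:

```python
# A minimal sketch of push-down stack bookkeeping. Each entry records
# who we were talking to (a "variable binding") and where in the
# conversation we left off (a "return address").

stack = []

def push(caller, position):
    """Suspend the current task, remembering the caller and position."""
    stack.append((caller, position))

def pop():
    """Resume the most recently suspended task exactly where it stopped."""
    return stack.pop()

# The executive's afternoon: A calls, then B interrupts.
push("A", "discussing the budget")
push("B", "scheduling a meeting")

# C's call simply terminates, so we pop back to B, and finally to A.
print(pop())  # resumes B: ('B', 'scheduling a meeting')
print(pop())  # resumes A: ('A', 'discussing the budget')
```

The last-suspended task is always the first resumed-which is exactly why this structure, and not a simple queue, models the executive's telephone.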


By the way, the terms "push", "pop", and "stack" all come from the visual image of cafeteria trays in a stack. There is usually some sort of spring underneath which tends to keep the topmost tray at a constant height, more or less. So when you push a tray onto the stack, it sinks a little-and when you remove a tray from the stack, the stack pops up a little.


One more example from daily life. When you listen to a news report on the radio, oftentimes it happens that they switch you to some foreign correspondent. "We now switch you to Sally Swumpley in Peafog, England." Now Sally has got a tape of some local reporter interviewing someone, so after giving a bit of background, she plays it. "I'm Nigel Cadwallader, here on scene just outside of Peafog, where the great robbery took place, and I'm talking with..." Now you are three levels down. It may turn out that the interviewee also plays a tape of some conversation. It is not too uncommon to go down three levels in real news reports, and surprisingly enough, we scarcely have any awareness of the suspension. It is all kept track of quite easily by our subconscious mind. Probably the reason it is so easy is that each level is extremely different in flavor from each other level. If they were all similar, we would get confused in no time flat.


An example of a more complex recursion is, of course, our Dialogue. There, Achilles and the Tortoise appeared on all the different levels. Sometimes they were reading a story in which they appeared as characters. That is when your mind may get a little hazy on what's going on, and you have to concentrate carefully to get things straight. "Let's see, the real Achilles and Tortoise are still up there in Goodfortune's helicopter, but the secondary ones are in some Escher picture-and then they found this book and are reading in it, so it's the tertiary Achilles and Tortoise who are wandering around inside the grooves of the Little Harmonic Labyrinth. Wait a minute-I left out one level somewhere..." You have to have a conscious mental stack like this in order to keep track of the recursion in the Dialogue.


FIGURE 26. Diagram of the structure of the Dialogue Little Harmonic Labyrinth. Vertical descents are "pushes"; rises are "pops". Notice the similarity of this diagram to the indentation pattern of the Dialogue. From the diagram it is clear that the initial tension-Goodfortune's threat-never was resolved; Achilles and the Tortoise were just left dangling in the sky. Some readers might agonize over this unpopped push, while others might not bat an eyelash. In the story, Bach's musical labyrinth likewise was cut off too soon-but Achilles didn't even notice anything funny. Only the Tortoise was aware of the more global dangling tension.


While we're talking about the Little Harmonic Labyrinth, we should discuss something which is hinted at, if not stated explicitly, in the Dialogue: that we hear music recursively-in particular, that we maintain a mental stack of keys, and that each new modulation pushes a new key onto the stack. The implication is further that we want to hear that sequence of keys retraced in reverse order-popping the pushed keys off the stack, one by one, until the tonic is reached. This is an exaggeration. There is a grain of truth to it, however.


Any reasonably musical person automatically maintains a shallow stack with two keys. In that "short stack", the true tonic key is held, and also the most immediate "pseudotonic" (the key the composer is pretending to be in). In other words, the most global key and the most local key. That way, the listener knows when the true tonic is regained, and feels a strong sense of "relief". The listener can also distinguish (unlike Achilles) between a local easing of tension-for example a resolution into the pseudotonic-and a global resolution. In fact, a pseudoresolution should heighten the global tension, not relieve it, because it is a piece of irony-just like Achilles' rescue from his perilous perch on the swinging lamp, when all the while you know he and the Tortoise are really awaiting their fate.


Since tension and resolution are the heart and soul of music, there are many, many examples. But let us just look at a couple in Bach. Bach wrote many pieces in an "AABB" form-that is, where there are two halves, and each one is repeated. Let's take the gigue from the French Suite no. 5, which is quite typical of the form. Its tonic key is G, and we hear a gay dancing melody which establishes the key of G strongly. Soon, however, a modulation in the A-section leads to the closely related key of D (the dominant). When the A-section ends, we are in the key of D. In fact, it sounds as if the piece has ended in the key of D! (Or at least it might sound that way to Achilles.) But then a strange thing happens-we abruptly jump back to the beginning, back to G, and rehear the same transition into D.


Then comes the B-section. With the inversion of the theme for our melody, we begin in D as if that had always been the tonic-but we modulate back to G after all, which means that we pop back into the tonic, and the B-section ends properly. Then that funny repetition takes place, jerking us without warning back into D, and letting us return to G once more.


The psychological effect of all this key shifting-some jerky, some smooth-is very difficult to describe. It is part of the magic of music that we can automatically make sense of these shifts. Or perhaps it is the magic of Bach that he can write pieces with this kind of structure which have such a natural grace to them that we are not aware of exactly what is happening.


The original Little Harmonic Labyrinth is a piece by Bach in which he tries to lose you in a labyrinth of quick key changes. Pretty soon you are so disoriented that you don't have any sense of direction left-you don't know where the true tonic is, unless you have perfect pitch, or, like Theseus, have a friend like Ariadne who gives you a thread that allows you to retrace your steps. In this case, the thread would be a written score. This piece-another example is the Endlessly Rising Canon-goes to show that, as music listeners, we don't have very reliable deep stacks.


Our mental stacking power is perhaps slightly stronger in language. The grammatical structure of all languages involves setting up quite elaborate push-down stacks, though, to be sure, the difficulty of understanding a sentence increases sharply with the number of pushes onto the stack. The proverbial German phenomenon of the "verb-at-the-end", about which droll tales of absentminded professors who would begin a sentence, ramble on for an entire lecture, and then finish up by rattling off a string of verbs by which their audience, for whom the stack had long since lost its coherence, would be totally nonplussed, are told, is an excellent example of linguistic pushing and popping. The confusion among the audience that out-of-order popping from the stack onto which the professor's verbs had been pushed, is amusing to imagine, could engender. But in normal spoken German, such deep stacks almost never occur-in fact, native speakers of German often unconsciously violate certain conventions which force the verb to go to the end, in order to avoid the mental effort of keeping track of the stack. Every language has constructions which involve stacks, though usually of a less spectacular nature than German. But there are always ways of rephrasing sentences so that the depth of stacking is minimal.


The syntactical structure of sentences affords a good place to present a way of describing recursive structures and processes: the Recursive Transition Network (RTN). An RTN is a diagram showing various paths which can be followed to accomplish a particular task. Each path consists of a number of nodes, or little boxes with words in them, joined by arcs, or lines with arrows. The overall name for the RTN is written separately at the left, and the first and last nodes have the words begin and end in them. All the other nodes contain either very short explicit directions to perform, or else names of other RTN's. Each time you hit a node, you are to carry out the directions inside it, or to jump to the RTN named inside it, and carry it out.


Let's take a sample RTN, called ORNATE NOUN, which tells how to construct a certain type of English noun phrase. (See Fig. 27a.) If we traverse ORNATE NOUN purely horizontally, we begin; then we create an ARTICLE, an ADJECTIVE, and a NOUN; then we end. For instance, "the silly shampoo" or "a thankless brunch". But the arcs show other possibilities, such as skipping the article, or repeating the adjective. Thus we could construct "milk", or "big red blue green sneezes", etc.


When you hit the node NOUN, you are asking the unknown black box called NOUN to fetch any noun for you from its storehouse of nouns. This is known as a procedure call, in computer science terminology. It means you temporarily give control to a procedure (here, NOUN) which (1) does its thing (produces a noun) and then (2) hands control back to you. In the above RTN, there are calls on three such procedures: ARTICLE, ADJECTIVE, and NOUN. Now the RTN ORNATE NOUN could itself be called from some other RTN-for instance an RTN called SENTENCE. In this case, ORNATE NOUN would produce a phrase such as "the silly shampoo" and then return to the place inside SENTENCE from which it had been called. It is quite reminiscent of the way in which you resume where you left off in nested telephone calls or nested news reports.

139

FIGURE 27. Recursive Transition Networks for ORNATE NOUN and FANCY NOUN.

However, we have not exhibited any true recursion so far. Things get recursive-and seemingly circular-when you go to an RTN such as the one in Figure 27b, for FANCY NOUN. As you can see, every possible pathway in FANCY NOUN involves a call on ORNATE NOUN, so there is no way to avoid getting a noun of some sort or other. And it is possible to be no more ornate than that, coming out merely with "milk" or "big red blue green sneezes". But three of the pathways involve recursive calls on FANCY NOUN itself. It certainly looks as if something is being defined in terms of itself. Is that what is happening, or not?


The answer is "yes, but benignly". Suppose that, in the procedure SENTENCE, there is a node which calls FANCY NOUN, and we hit that node. This means that we commit to memory (viz., the stack) the location of that node inside SENTENCE, so we'll know where to return to-then we transfer our attention to the procedure FANCY NOUN. Now we must choose a pathway to take, in order to generate a FANCY NOUN. Suppose we choose the lower of the upper pathways-the one whose calling sequence goes: ORNATE NOUN; RELATIVE PRONOUN; FANCY NOUN; VERB.


So we spit out an ORNATE NOUN: "the strange bagels"; a RELATIVE PRONOUN: "that"; and now we are suddenly asked for a FANCY NOUN. But we are in the middle of FANCY NOUN! Yes, but remember that our executive was in the middle of one phone call when he got another one. He simply stored the old phone call's status on a stack, and began the new one as if nothing were unusual. So we shall do the same.


We first write down in our stack the node we are at in the outer call on FANCY NOUN, so that we have a "return address"; then we jump to the beginning of FANCY NOUN as if nothing were unusual. Now we must choose a pathway again. For variety's sake, let's choose the lower path: ORNATE NOUN; PREPOSITION; FANCY NOUN. That means we produce an ORNATE NOUN (say "the purple cow"), then a PREPOSITION (say "without"), and once again, we hit the recursion. So we hang onto our hats and descend one more level. To avoid complexity, let's assume that this time the pathway we take is the direct one: just ORNATE NOUN. For example, we might get "horns". We hit the node END in this call on FANCY NOUN, which amounts to popping out, and so we go to our stack to find the return address. It tells us that we were in the middle of executing FANCY NOUN one level up-and so we resume there. This yields "the purple cow without horns". On this level, too, we hit END, and so we pop up once more, this time finding ourselves in need of a VERB-so let's choose "gobbled". This ends the highest-level call on FANCY NOUN, with the result that the phrase "the strange bagels that the purple cow without horns gobbled" will get passed upwards to the patient SENTENCE, as we pop for the last time.


As you see, we didn't get into any infinite regress. The reason is that at least one pathway inside the RTN FANCY NOUN does not involve recursive calls on FANCY NOUN itself. Of course, we could have perversely insisted on always choosing the bottom pathway inside FANCY NOUN; then we would never have gotten finished, just as the acronym "GOD" never got fully expanded. But if the pathways are chosen at random, an infinite regress of that sort will not happen.
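The walkthrough above can be sketched as a pair of mutually calling Python procedures. The tiny word lists and the depth cap are illustrative assumptions, not part of the RTN; Python's own call stack plays the role of the push-down stack, and the always-available direct pathway is what lets the recursion bottom out:

```python
import random

def ornate_noun():
    """Stand-in for the ORNATE NOUN procedure: fetch a ready-made phrase."""
    return random.choice(["the strange bagels", "the purple cow", "horns", "milk"])

def fancy_noun(depth=0):
    """FANCY NOUN: every pathway calls ORNATE NOUN; some call FANCY NOUN again.

    Past a small depth we force the direct pathway, mirroring the fact
    that random choices terminate (a cap keeps the sketch predictable).
    """
    path = "direct" if depth > 2 else random.choice(["direct", "relative", "preposition"])
    if path == "direct":
        return ornate_noun()
    if path == "preposition":
        # pathway: ORNATE NOUN; PREPOSITION; FANCY NOUN
        return ornate_noun() + " without " + fancy_noun(depth + 1)
    # pathway: ORNATE NOUN; RELATIVE PRONOUN; FANCY NOUN; VERB
    return ornate_noun() + " that " + fancy_noun(depth + 1) + " gobbled"

print(fancy_noun())
```

Each recursive call is a push (the interpreter records the return address for us), and each return is a pop-so phrases like "the strange bagels that the purple cow without horns gobbled" are assembled from the inside out.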


This is the crucial fact which distinguishes recursive definitions from circular ones. There is always some part of the definition which avoids self-reference, so that the action of constructing an object which satisfies the definition will eventually "bottom out".


Now there are more oblique ways of achieving recursivity in RTN's than by self-calling. There is the analogue of Escher's Drawing Hands (Fig. 135), where each of two procedures calls the other, but not itself. For example, we could have an RTN named CLAUSE, which calls FANCY NOUN whenever it needs an object for a transitive verb, and conversely, the upper path of FANCY NOUN could call RELATIVE PRONOUN and then CLAUSE whenever it wants a relative clause. This is an example of indirect recursion. It is reminiscent also of the two-step version of the Epimenides paradox.


Needless to say, there can be a trio of procedures which call one another, cyclically-and so on. There can be a whole family of RTN's which are all tangled up, calling each other and themselves like crazy. A program which has such a structure, in which there is no single "highest level", or "monitor", is called a heterarchy (as distinguished from a hierarchy). The term is due, I believe, to Warren McCulloch, one of the first cyberneticists, and a reverent student of brains and minds.


One graphic way of thinking about RTN's is this. Whenever you are moving along some pathway and you hit a node which calls on an RTN, you "expand" that node, which means to replace it by a very small copy of the RTN it calls (see Fig. 28). Then you proceed into the very small RTN. When you pop out of it, you are automatically in the right place in the big one. While in the small one, you may wind up constructing even more miniature RTN's. But by expanding nodes only when you come across them, you avoid the need to make an infinite diagram, even when an RTN calls itself.

FIGURE 28. The FANCY NOUN RTN with one node recursively expanded.


Expanding a node is a little like replacing a letter in an acronym by the word it stands for. The "GOD" acronym is recursive but has the defect-or advantage-that you must repeatedly expand the 'G'; thus it never bottoms out. When an RTN is implemented as a real computer program, however, it always has at least one pathway which avoids recursivity (direct or indirect), so that infinite regress is not created. Even the most heterarchical program structure bottoms out-otherwise it couldn't run! It would just be constantly expanding node after node, but never performing any action.


Infinite geometrical structures can be defined in just this way-that is, by expanding node after node. For example, let us define an infinite diagram called "Diagram G". To do so, we shall use an implicit representation. In two nodes, we shall write merely the letter 'G', which, however, will stand for an entire copy of Diagram G. In Figure 29a, Diagram G is portrayed implicitly. Now if we wish to see Diagram G more explicitly, we expand each of the two G's-that is, we replace them by the same diagram, only reduced in scale (see Fig. 29b). This "second-order" version of Diagram G gives us an inkling of what the final, impossible-to-realize Diagram G really looks like. In Figure 30 is shown a larger portion of Diagram G, where all the nodes have been numbered from the bottom up, and from left to right. Two extra nodes-numbers 1 and 2-have been inserted at the bottom.

This infinite tree has some very curious mathematical properties. Running up its right-hand edge is the famous sequence of Fibonacci numbers:

1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, ...

discovered around the year 1202 by Leonardo of Pisa, son of Bonaccio, ergo "Filius Bonacci", or "Fibonacci" for short. These numbers are best


FIGURE 29. (a) Diagram G, unexpanded. (b) Diagram G, expanded once. (c) Diagram H, unexpanded. (d) Diagram H, expanded once.

FIGURE 30.

defined recursively by the pair of formulas

FIBO(n) = FIBO(n-1) + FIBO(n-2) for n > 2
FIBO(1) = FIBO(2) = 1

Notice how new Fibonacci numbers are defined in terms of previous Fibonacci numbers. We could represent this pair of formulas in an RTN (see Fig. 31).

FIGURE 31. An RTN for Fibonacci numbers.

Thus you can calculate FIBO(15) by a sequence of recursive calls on the procedure defined by the RTN above. This recursive definition bottoms out when you hit FIBO(1) or FIBO(2) (which are given explicitly), after you have worked your way backwards through descending values of n. It is slightly awkward to work your way backwards, when you could just as well work your way forwards, starting with FIBO(1) and FIBO(2) and always adding the most recent two values, until you reach FIBO(15). That way you don’t need to keep track of a stack.
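In a modern language the two strategies look like this-a minimal sketch, assuming nothing beyond the pair of formulas above (the function names are mine, not the text’s):

```python
def fibo(n):
    """Backwards strategy: recursive calls descend through smaller n,
    bottoming out at FIBO(1) = FIBO(2) = 1. The pending calls form
    exactly the "stack" the text mentions."""
    if n <= 2:
        return 1
    return fibo(n - 1) + fibo(n - 2)


def fibo_forward(n):
    """Forwards strategy: start with FIBO(1) and FIBO(2) and keep
    adding the most recent two values -- no stack needed."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a
```

Both yield the same sequence; the forward version simply trades the stack of pending calls for two running values.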


Now Diagram G has some even more surprising properties than this. Its entire structure can be coded up in a single recursive definition, as follows:


G(n) = n - G(G(n-1)) for n > 0
G(0) = 0

How does this function G(n) code for the tree-structure? Quite simply, if you construct a tree by placing G(n) below n, for all values of n, you will recreate Diagram G. In fact, that is how I discovered Diagram G in the first place. I was investigating the function G, and in trying to calculate its values quickly, I conceived of displaying the values I already knew in a tree. To my surprise, the tree turned out to have this extremely orderly recursive geometrical description.
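The definition transcribes almost word for word into code (a sketch; the memoization cache is my own addition, just to keep repeated calls cheap):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def G(n):
    """The nested recursion G(n) = n - G(G(n-1)), with G(0) = 0."""
    if n == 0:
        return 0
    return n - G(G(n - 1))

# Placing G(n) below n for every n gives each node's parent in
# Diagram G: node n hangs below node G(n).
```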


What is more wonderful is that if you make the analogous tree function H(n), defined with one more nesting than G-

H(n) = n - H(H(H(n-1))) for n > 0
H(0) = 0

-then the associated “Diagram H” is defined implicitly as shown in Figure 29c. The right-hand trunk contains one more node; that is the difference. The first recursive expansion of Diagram H is shown in Figure 29d. And so it goes, for any degree of nesting. There is a beautiful regularity to the recursive geometrical structures, which corresponds precisely to the recursive definitions.
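The whole family-G, H, and every deeper nesting-can be captured in one parametrized sketch. This generalization is my own, not something from the text, but it reduces to the two formulas above for nesting degrees 2 and 3:

```python
from functools import lru_cache

def nested_tree_function(depth):
    """Return the family member with the given degree of nesting:
    depth=2 gives G(n) = n - G(G(n-1)), depth=3 gives
    H(n) = n - H(H(H(n-1))), and so on. Each bottoms out at f(0) = 0."""
    @lru_cache(maxsize=None)
    def f(n):
        if n == 0:
            return 0
        value = n - 1
        for _ in range(depth):      # apply f `depth` times: f(f(...f(n-1)...))
            value = f(value)
        return n - value
    return f
```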


A problem for curious readers is: suppose you flip Diagram G around as if in a mirror, and label the nodes of the new tree so they increase left to right. Can you find a recursive algebraic definition for this “flip-tree”? What about for the “flip” of the H-tree? Etc.?

Another pleasing problem involves a pair of recursively intertwined functions F(n) and M(n) — “married” functions, you might say — defined this way:

F(n) = n - M(F(n-1)) for n > 0
M(n) = n - F(M(n-1)) for n > 0
F(0) = 1, and M(0) = 0

The RTN’s for these two functions call each other and themselves as well. The problem is simply to discover the recursive structures of Diagram F and Diagram M. They are quite elegant and simple.
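The married pair transcribes directly; each function calls the other and itself, just as the RTN’s do (a sketch; the caches are my own addition):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def F(n):
    """The "married" pair: F calls M, M calls F, and each calls itself."""
    return 1 if n == 0 else n - M(F(n - 1))

@lru_cache(maxsize=None)
def M(n):
    return 0 if n == 0 else n - F(M(n - 1))
```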


One last example of recursion in number theory leads to a small mystery. Consider the following recursive definition of a function:

Q(n) = Q(n - Q(n-1)) + Q(n - Q(n-2)) for n > 2
Q(1) = Q(2) = 1


It is reminiscent of the Fibonacci definition in that each new value is a sum of two previous values-but not of the immediately previous two values. Instead, the two immediately previous values tell how far to count back to obtain the numbers to be added to make the new value! The first 17 Q-numbers run as follows:

1, 1, 2, 3, 3, 4, 5, 5, 6, 6, 6, 8, 8, 8, 10, 9, 10, ...

To obtain the next one, move leftwards (from the three dots) respectively 10 and 9 terms; you will hit a 5 and a 6, shown by the arrows. Their sum-5 + 6 = 11-yields the new value: Q(18). This is the strange process by which the list of known Q-numbers is used to extend itself. The resulting sequence is, to put it mildly, erratic. The further out you go, the less sense it seems to make. This is one of those very peculiar cases where what seems to be a somewhat natural definition leads to extremely puzzling behavior: chaos produced in a very orderly manner. One is naturally led to wonder whether the apparent chaos conceals some subtle regularity. Of course, by definition, there is regularity, but what is of interest is whether there is another way of characterizing this sequence-and with luck, a nonrecursive way.
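The self-extending list is easy to sketch in code (the function name is mine; q[k] holds Q(k+1), since lists count from zero):

```python
def q_sequence(count):
    """First `count` Q-numbers: Q(n) = Q(n - Q(n-1)) + Q(n - Q(n-2)),
    with Q(1) = Q(2) = 1. The list of known values is used to
    "count back" into itself and so extend itself."""
    q = [1, 1]                                     # Q(1), Q(2)
    for n in range(3, count + 1):
        q.append(q[n - 1 - q[n - 2]] + q[n - 1 - q[n - 3]])
    return q
```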


The marvels of recursion in mathematics are innumerable, and it is not my purpose to present them all. However, there are a couple of particularly striking examples from my own experience which I feel are worth presenting. They are both graphs. One came up in the course of some number-theoretical investigations. The other came up in the course of my Ph.D. thesis work, in solid state physics. What is truly fascinating is that the graphs are closely related. The first one (Fig. 32) is a graph of a function which I call INT(x). It is plotted here for x between 0 and 1. For x between any other pair of integers n and n + 1, you just find INT(x-n), then add n back. The structure of the plot is quite jumpy, as you can see. It consists of an infinite number of curved pieces, which get smaller and smaller towards the corners-and incidentally, less and less curved. Now if you look closely at each such piece, you will find that it is actually a copy of the full graph, merely curved! The implications are wild. One of them is that the graph of INT consists of nothing but copies of itself, nested down infinitely deeply. If you pick up any piece of the graph, no matter how small, you are holding a complete copy of the whole graph-in fact, infinitely many copies of it!


The fact that INT consists of nothing but copies of itself might make you think it is too 
ephemeral to exist. Its definition sounds too circular.


FIGURE 32. Graph of the function INT(x). There is a jump discontinuity at every rational value of x.


How does it ever get off the ground? That is a very interesting matter. The main thing to notice is that, to describe INT to someone who hasn’t seen it, it will not suffice merely to say, “It consists of copies of itself.” The other half of the story-the nonrecursive half-tells where those copies lie in the square, and how they have been deformed, relative to the full graph. Only the combination of these two aspects of INT will specify the structure of INT. It is exactly as in the definition of the Fibonacci numbers, where you need two lines-one to define the recursion, the other to define the bottom (i.e., the values at the beginning). To be very concrete, if you make one of the bottom values 3 instead of 1, you will produce a completely different sequence, known as the Lucas sequence:

1, 3, 4, 7, 11, 18, 29, 47, 76, 123, ...

Here 1 and 3 form the “bottom”, and the recursive rule is the same as for the Fibonacci numbers (for instance, 29 + 47 = 76).
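One recursion, two bottoms: the point can be made in a few lines of code (the name snowball is mine, not the text’s):

```python
def snowball(bottom, count):
    """Same recursive rule, different "bottom": each new term is the
    sum of the two most recent terms, starting from the given pair."""
    seq = list(bottom)
    while len(seq) < count:
        seq.append(seq[-1] + seq[-2])
    return seq
```

With bottom (1, 1) this yields the Fibonacci numbers; with (1, 3), the Lucas sequence.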


What corresponds to the bottom in the definition of INT is a picture (Fig. 33a) composed of many boxes, showing where the copies go, and how they are distorted. I call it the “skeleton” of INT. To construct INT from its skeleton, you do the following. First, for each box of the skeleton, you do two operations: (1) put a small curved copy of the skeleton inside the box, using the curved line inside it as a guide; (2) erase the containing box and its curved line. Once this has been done for each box of the original skeleton, you are left with many “baby” skeletons in place of one big one. Next you repeat the process one level down, with all the baby skeletons. Then again, again, and again... What you approach in the limit is an exact graph of INT, though you never get there. By nesting the skeleton inside itself over and over again, you gradually construct the graph of INT “from out of nothing”. But in fact the “nothing” was not total. It was a picture.


To see this even more dramatically, imagine keeping the recursive part of the definition of INT, but changing the initial picture, the skeleton. A variant skeleton is shown in Figure 33b, again with boxes which get smaller and smaller as they trail off to the four corners. If you nest this second skeleton inside itself over and over again, you will create the key graph from my Ph.D. thesis, which I call Gplot (Fig. 34). (In fact, some complicated distortion of each copy is needed as well-but nesting is the basic idea.)


Gplot is thus a member of the INT-family. It is a distant relative, because its skeleton is quite different from-and considerably more complex than-that of INT. However, the recursive part of the definition is identical, and therein lies the family tie. I should not keep you too much in the dark about the origin of these beautiful graphs. INT-standing for “interchange”-comes from a problem involving “Eta-sequences”, which are related to continued fractions. The basic idea behind INT is that plus and minus signs are interchanged in a certain kind of continued fraction. As a consequence, INT(INT(x)) = x. INT has the property that if x is rational, so is INT(x); if x is quadratic, so is INT(x). I do not know if this trend holds for higher algebraic degrees. Another lovely feature of INT is that at all rational values of x, it has a jump discontinuity, but at all irrational values of x, it is continuous.


Gplot comes from a highly idealized version of the question, “What are the allowed energies of electrons in a crystal in a magnetic field?” This problem is interesting because it is a cross between two very simple and fundamental physical situations: an electron in a perfect crystal, and an electron in a homogeneous magnetic field. These two simpler problems are both well understood, and their characteristic solutions seem almost incompatible with each other. Therefore, it is of quite some interest to see how nature manages to reconcile the two. As it happens, the crystal-without-magnetic-field situation and the magnetic-field-without-crystal situation do have one feature in common: in each of them, the electron behaves periodically in time. It turns out that when the two situations are combined, the ratio of their two time periods is the key parameter. In fact, that ratio holds all the information about the distribution of allowed electron energies-but it only gives up its secret upon being expanded into a continued fraction.


FIGURE 33.


Gplot shows that distribution. The horizontal axis represents energy, and the vertical axis represents the above-mentioned ratio of time periods, which we can call “α”. At the bottom, α is zero, and at the top α is unity. When α is zero, there is no magnetic field. Each of the line segments making up Gplot is an “energy band”-that is, it represents allowed values of energy. The empty swaths traversing Gplot on all different size scales are therefore regions of forbidden energy. One of the most startling properties of Gplot is that when α is rational (say p/q in lowest terms), there are exactly q such bands (though when q is even, two of them “kiss” in the middle). And when α is irrational, the bands shrink to points, of which there are infinitely many, very sparsely distributed in a so-called “Cantor set”-another recursively defined entity which springs up in topology.


You might well wonder whether such an intricate structure would ever show up in an experiment. Frankly, I would be the most surprised person in the world if Gplot came out of any experiment. The physicality of Gplot lies in the fact that it points the way to the proper mathematical treatment of less idealized problems of this sort. In other words, Gplot is purely a contribution to theoretical physics, not a hint to experimentalists as to what to expect to see! An agnostic friend of mine once was so struck by Gplot’s infinitely many infinities that he called it “a picture of God”, which I don’t think is blasphemous at all.


We have seen recursion in the grammars of languages, we have seen recursive geometrical trees which grow upwards forever, and we have seen one way in which recursion enters the theory of solid state physics. Now we are going to see yet another way in which the whole world is built out of recursion. This has to do with the structure of elementary particles: electrons, protons, neutrons, and the tiny quanta of electromagnetic radiation called “photons”. We are going to see that particles are-in a certain sense which can only be defined rigorously in relativistic quantum mechanics-nested inside each other in a way which can be described recursively, perhaps even by some sort of “grammar”.


We begin with the observation that if particles didn’t interact with each other, things would be incredibly simple. Physicists would like such a world because then they could calculate the behavior of all particles easily (if physicists in such a world existed, which is a doubtful proposition). Particles without interactions are called bare particles, and they are purely hypothetical creations; they don’t exist.

Now when you “turn on” the interactions, then particles get tangled up together in the way that functions F and M are tangled together, or married people are tangled together. These real particles are said to be renormalized-an ugly but intriguing term. What happens is that no particle can even be defined without referring to all other particles, whose definitions in turn depend on the first particles, etc. Round and round, in a never-ending loop.


FIGURE 34. Gplot: a recursive graph, showing energy bands for electrons in an idealized crystal in a magnetic field. α, representing magnetic field strength, runs vertically from 0 to 1. Energy runs horizontally. The horizontal line segments are bands of allowed electron energies.


Let us be a little more concrete, now. Let’s limit ourselves to only two kinds of particles: electrons and photons. We’ll also have to throw in the electron’s antiparticle, the positron. (Photons are their own antiparticles.) Imagine first a dull world where a bare electron wishes to propagate from point A to point B, as Zeno did in my Three-Part Invention. A physicist would draw a picture like this: There is a mathematical expression which corresponds to this line and its endpoints, and it is easy to write down. With it, a physicist can understand the behavior of the bare electron.


Now let us “turn on” the electromagnetic interaction, whereby electrons and photons interact. Although there are no photons in the scene, there will nevertheless be profound consequences even for this simple trajectory. In particular, our electron now becomes capable of emitting and then reabsorbing virtual photons-photons which flicker in and out of existence before they can be seen. Let us show one such process: Now as our electron propagates, it may emit and reabsorb one photon after another, or it may even nest them, as shown below: The mathematical expressions corresponding to these diagrams-called “Feynman diagrams”-are easy to write down, but they are harder to calculate than that for the bare electron. But what really complicates matters is that a photon (real or virtual) can decay, for a brief moment, into an electron-positron pair. Then these two annihilate each other, and, as if by magic, the original photon reappears. This sort of process is shown below: The electron has a right-pointing arrow, while the positron’s arrow points leftwards.


As you might have anticipated, these virtual processes can be nested inside each other to arbitrary depth. This can give rise to some complicated-looking drawings, such as the one in Figure 35. In that Feynman diagram, a single electron enters on the left at A, does some acrobatics, and then a single electron emerges on the right at B. To an outsider who can’t see the inner mess, it looks as if one electron peacefully sailed from A to B. In the diagram, you can see how electron lines can get arbitrarily embellished, and so can the photon lines. This diagram would be ferociously hard to calculate.

FIGURE 35. A Feynman diagram showing the propagation of a renormalized electron from A to B. In this diagram, time increases to the right. Therefore, in the segments where the electron’s arrow points leftwards, it is moving “backwards in time”. A more intuitive way to say this is that an antielectron (positron) is moving forwards in time. Photons are their own antiparticles; their lines carry no arrows.


There is a sort of “grammar” to these diagrams, which allows only certain pictures to be realized in nature. For instance, the one below is impossible: You might say it is not a “well-formed” Feynman diagram. The grammar is a result of basic laws of physics, such as conservation of energy, conservation of electric charge, and so on. And, like the grammars of human languages, this grammar has a recursive structure, in that it allows nestings of structures inside each other. It would be possible to draw up a set of recursive transition networks defining the “grammar” of the electromagnetic interaction.
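Such a recursive “grammar” can be caricatured in a few lines of code. This is purely my own toy illustration, not real QED bookkeeping: an electron line either stays bare, or carries one emitted-and-reabsorbed virtual photon whose enclosed stretch of line is itself dressed one level less deep.

```python
def dressed(depth):
    """Build a bracketed sketch of an electron line: at each level the
    line may carry one virtual photon, written [γ:...], whose interior
    is dressed one level less deep. Bottoms out in a bare segment "e"."""
    if depth == 0:
        return "e"                            # bare electron: no photons left
    return "e[γ:" + dressed(depth - 1) + "]e"
```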


When bare electrons and bare photons are allowed to interact in arbitrarily tangled ways, the result is renormalized electrons and photons. Thus, to understand how a real, physical electron propagates from A to B,


the physicist has to be able to take a sort of average of all the infinitely many different possible drawings which involve virtual particles. This is Zeno with a vengeance! Thus the point is that a physical particle-a renormalized particle-involves (1) a bare particle and (2) a huge tangle of virtual particles, inextricably wound together in a recursive mess. Every real particle’s existence therefore involves the existence of infinitely many other particles, contained in a virtual “cloud” which surrounds it as it propagates. And each of the virtual particles in the cloud, of course, also drags along its own virtual cloud, and so on ad infinitum.


Particle physicists have found that this complexity is too much to handle, and in 
order to understand the behavior of electrons and photons, they use approximations 
which neglect all but fairly simple Feynman diagrams. Fortunately, the more complex a 
diagram, the less important its contribution. There is no known way of summing up all of 
the infinitely many possible diagrams, to get an expression for the behavior of a fully 
renormalized, physical electron. But by considering roughly the simplest hundred 
diagrams for certain processes, physicists have been able to predict one value (the so-called g-factor of the muon) to nine decimal places — correctly! 
Renormalization takes place not only among electrons and photons. Whenever 
any types of particle interact together, physicists use the ideas of renormalization to 
understand the phenomena. Thus protons and neutrons, neutrinos, pi-mesons, quarks-all the beasts in the subnuclear zoo-they all have bare and renormalized versions in physical theories. And from billions of these bubbles within bubbles are all the beasts and baubles of the world composed.


Let us now consider Gplot once again. You will remember that in the 
Introduction, we spoke of different varieties
of canons. Each type of canon exploited some manner of taking an original theme and copying it by an isomorphism, or 
information-preserving transformation. Sometimes the copies were upside down, 
sometimes backwards, sometimes shrunken or expanded... In Gplot we have all those
types of transformation, and more. The mappings between the full Gplot and the “copies” 
of itself inside itself involve size changes, skewings, reflections, and more. And yet there 
remains a sort of skeletal identity, which the eye can pick up with a bit of effort, particularly after it has practiced with INT.


Escher took the idea of an object’s parts being copies of the object itself and made 
it into a print: his woodcut Fishes and Scales 
(Fig. 36). Of course these fishes and scales are the same only when seen on a sufficiently 
abstract plane. Now everyone knows that a fish’s scales aren’t really small copies of the fish; and a fish’s cells aren’t small copies of 
the fish; however, a fish’s DNA, sitting inside each and every one of the fish’s cells, is a 
very convoluted “copy” of the entire fish-and so there is more than a grain of truth to the Escher picture.

FIGURE 36. Fish and Scales, by M. C. Escher (woodcut, 1959).


What is there that is the “same” about all butterflies? The mapping from one butterfly to another does not map cell onto cell; rather, it maps functional part onto functional part, and this may be partially on a macroscopic scale, partially on a microscopic scale. The exact proportions of parts are not preserved; just the functional relationships between parts. This is the type of isomorphism which links all butterflies in Escher’s wood engraving Butterflies (Fig. 37) to each other. The same goes for the more abstract butterflies of Gplot, which are all linked to each other by mathematical mappings that carry functional part onto functional part, but totally ignore exact line proportions and angles.


Taking this exploration of sameness to a yet higher plane of abstraction, we might well ask, “What is there that is the ‘same’ about all Escher drawings?” It would be quite ludicrous to attempt to map them piece by piece onto each other. The amazing thing is that even a tiny section of an


FIGURE 37. Butterflies, by M. C. Escher (wood-engraving, 1950).

Escher drawing or a Bach piece gives it away. Just as a fish’s DNA is contained inside every tiny bit of the fish, so a creator’s “signature” is contained inside every tiny section of his creations. We don’t know what to call it but “style” — a vague and elusive word.


We keep on running up against “sameness-in-differentness”, and the question 
When are two things the same? 
It will recur over and over again in this book. We shall come at it from all sorts of skew angles, and in the end, we shall see how deeply this simple question is connected with the nature of intelligence. That this issue arose in the Chapter on recursion is no accident, for recursion is a domain where “sameness-in-differentness” plays
a central role. Recursion is based on the “same” thing happening on several different levels at once. But the events on different levels aren’t exactly the same-rather, we find some invariant feature in them, despite many ways in which they differ. For example, in the Little Harmonic Labyrinth, all stories on different levels are quite unrelated-their “sameness” resides in only two facts: (1) they are stories, and (2) they involve the Tortoise and Achilles. Other than that, they are radically different from each other.


One of the essential skills in computer programming is to perceive when two processes are the same in this extended sense, for that leads to modularization-the breaking-up of a task into natural subtasks. For instance, one might want a sequence of many similar operations to be carried out one after another. Instead of writing them all out, one can write a loop, which tells the computer to perform a fixed set of operations and then loop back and perform them again, over and over, until some condition is satisfied. Now the body of the loop-the fixed set of instructions to be repeated-need not actually be completely fixed. It may vary in some predictable way.


An example is the most simple-minded test for the primality of a natural number N, in which you begin by trying to divide N by 2, then 3, 4, 5, etc. until N - 1. If N has survived all these tests without being divisible, it’s prime. Notice that each step in the loop is similar to, but not the same as, each other step. Notice also that the number of steps varies with N-hence a loop of fixed length could never work as a general test for primality. There are two criteria for “aborting” the loop: (1) if some number divides N exactly, quit with answer “NO”; (2) if N - 1 is reached as a test divisor and N survives, quit with answer “YES”. The general idea of loops, then, is this: perform some series of related steps over and over, and abort the process when specific conditions are met. Now sometimes, the maximum number of steps in a loop will be known in advance; other times, you just begin, and wait until it is aborted. The second type of loop-which I call a free loop-is dangerous, because the criterion for abortion may never occur, leaving the computer in a so-called “infinite loop”. This distinction between bounded loops and free loops is one of the most important concepts in all of computer science, and we shall devote an entire Chapter to it: “BlooP and FlooP and GlooP”.
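The simple-minded test might be sketched as follows (the function name is mine, not the text’s):

```python
def is_prime(n):
    """Trial division: try dividing n by 2, 3, 4, ... up to n - 1.
    Two ways out of the loop:
    (1) some divisor divides n exactly -> answer "NO";
    (2) n - 1 is reached and n survives -> answer "YES"."""
    if n < 2:
        return False
    for d in range(2, n):        # a loop whose length varies with n
        if n % d == 0:
            return False         # aborted: "NO"
    return True                  # loop ran to completion: "YES"
```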


Now loops may be nested inside each other. For instance, suppose that we wish to test all the numbers between 1 and 5000 for primality. We can write a second loop which uses the above-described test over and over, starting with N = 1 and finishing with N = 5000. So our program will have a “loop-the-loop” structure. Such program structures are typical-in fact they are deemed to be good programming style. This kind of nested loop also occurs in assembly instructions for commonplace items, and in such activities as knitting or crocheting-in which very small loops are repeated several times in larger loops, which in turn are carried out repeatedly. While the result of a low-level loop might be no more than a couple of stitches, the result of a high-level loop might be a substantial portion of a piece of clothing.
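The loop-the-loop structure, written out self-contained (a sketch; the inner loop is the trial-division test, inlined here):

```python
# Outer loop: run N from 1 to 5000. Each turn of it runs the inner
# trial-division loop of the primality test.
primes = []
for n in range(1, 5001):
    if n >= 2 and all(n % d != 0 for d in range(2, n)):
        primes.append(n)
```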


In music, too, nested loops often occur-as, for instance, when a scale (a small 
loop) is played several times in a row, perhaps displaced in pitch each new time. For 
example, the last movements of both the
Prokofiev fifth piano concerto and the Rachmaninoff second symphony contain extended passages in which fast, medium, and 
slow scale-loops are played simultaneously by different groups of instruments, to great 
effect. The Prokofiev scales go up; the Rachmaninoff scales, down. Take your pick.


A more general notion than loop is that of subroutine, or procedure, which we have already discussed somewhat. The basic idea here is that a group of operations are lumped together and considered a single unit with a name-such as the procedure ORNATE NOUN. As we saw in RTN’s, procedures can call each other by name, and thereby express very concisely sequences of operations which are to be carried out. This is the essence of modularity in programming. Modularity exists, of course, in hi-fi systems, furniture, living cells, human society-wherever there is hierarchical organization.


More often than not, one wants a procedure which will act variably, according to context. Such a procedure can either be given a way of peering out at what is stored in memory and selecting its actions accordingly, or it can be explicitly fed a list of parameters which guide its choice of what actions to take. Sometimes both of these methods are used. In RTN terminology, choosing the sequence of actions to carry out amounts to choosing which pathway to follow. An RTN which has been souped up with parameters and conditions that control the choice of pathways inside it is called an Augmented Transition Network (ATN). A place where you might prefer ATN’s to RTN’s is in producing sensible-as distinguished from nonsensical-English sentences out of raw words, according to a grammar represented in a set of ATN’s. The parameters and conditions would allow you to insert various semantic constraints, so that random juxtapositions like “a thankless brunch” would
be prohibited. More on this in Chapter X


A classic example of a recursive procedure with parameters is one for choosing the “best” move in chess. The best move would seem to be the one which leaves your opponent in the toughest situation. Therefore, a test for goodness of a move is simply this: pretend you’ve made the move, and now evaluate the board from the point of view of your opponent. But how does your opponent evaluate the position? Well, he looks for his best move. That is, he mentally runs through all possible moves and evaluates them from what he believes to be your point of view. But notice that we have now defined “best move” recursively, simply by the maxim that what is best for one side is worst for the other. The procedure which looks for the best move operates by trying a move and then calling on itself in the role of the opponent! As such, it tries another move and calls on itself in the role of its opponent’s opponent-that is, itself.


This recursion can go several levels deep-but it’s got to bottom out somewhere! How do you evaluate a board position without looking ahead? There are a number of useful criteria for this purpose, such as simply the number of pieces on each side, the number and type of pieces under attack, the control of the center, and so on. By using this kind of evaluation at the bottom, the recursive move-generator can pop back upwards and give an evaluation at the top level of each different move. One of the parameters in the self-calling, then, must tell how many moves to look ahead. The outermost call on the procedure will use some externally set value of the parameter. Thereafter, each time the procedure recursively calls itself, it must decrease this look-ahead parameter by 1. That way, when the parameter reaches zero, the procedure will follow the alternative pathway-the nonrecursive evaluation.
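The scheme can be sketched generically. This is a toy sketch of the idea, not a chess program: all names are my own, and the game itself is supplied by the caller as functions.

```python
def best_move(position, moves, result, evaluate, depth):
    """Pick the move that leaves the opponent worst off.

    `moves(pos)` lists legal moves, `result(pos, m)` is the position
    after move m, and `evaluate(pos)` is the nonrecursive evaluation,
    scored from the viewpoint of whoever is to move at `pos`.
    `depth` is the look-ahead parameter: it decreases by 1 on each
    recursive call, and at zero we fall back on `evaluate`."""
    def score(pos, d):
        if d == 0 or not moves(pos):
            return evaluate(pos)
        # what is best for the opponent is worst for us: negate
        return max(-score(result(pos, m), d - 1) for m in moves(pos))
    return max(moves(position),
               key=lambda m: -score(result(position, m), depth - 1))
```

Here the procedure calls itself in the role of the opponent, just as described, and the decreasing depth parameter guarantees that the recursion bottoms out.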


In this kind of game-playing program, each move investigated causes the generation of a so-called “look-ahead tree”, with the move itself as the trunk, responses as main branches, counter-responses as subsidiary branches, and so on. In Figure 38 I have shown a simple look-ahead tree depicting the start of a tic-tac-toe game. There is an art to figuring out how to avoid exploring every branch of a look-ahead tree out to its tip. In chess trees, people-not computers-seem to excel at this art; it is known that top-level players look ahead relatively little, compared to most chess programs-yet the people are far better! In the early days of computer chess, people used to estimate that it would be ten years until a computer (or

FIGURE 38.


program) was world champion. But after ten years had passed, it seemed that the day a computer would become world champion was still more than ten years away... This is just one more piece of evidence for the rather recursive

Hofstadter’s Law: It always takes longer than you expect, even when you take into account Hofstadter’s Law.


Now what is the connection between the recursive processes of this Chapter, and the recursive sets of the preceding Chapter? The answer involves the notion of a recursively enumerable set. For a set to be r.e. means that it can be generated from a set of starting points (axioms), by the repeated application of rules of inference. Thus, the set grows and grows, each new element being compounded somehow out of previous elements, in a sort of “mathematical snowball”. But this is the essence of recursion-something being defined in terms of simpler versions of itself, instead of explicitly. The Fibonacci numbers and the Lucas numbers are perfect examples of r.e. sets-snowballing from two elements by a recursive rule into infinite sets. It is just a matter of convention to call an r.e. set whose complement is also r.e. “recursive”.


Recursive enumeration is a process in which new things emerge from old things 
by fixed rules. There seem to be many surprises in such processes-for example the 
unpredictability of the Q-sequence. It might seem that recursively defined sequences of 
that type possess some sort of inherently increasing complexity of behavior, so that the 
further out you go, the less predictable they get. This kind of thought carried a little further suggests that suitably complicated recursive systems might be strong enough to
break out of any predetermined patterns. And isn’t this one of the defining properties of 
intelligence? Instead of just considering programs composed of procedures which can 
recursively call themselves, why not get really sophisticated, and invent programs which 
can modify themselves-programs which can act on programs, extending them, improving 
them, generalizing them, fixing them, and so on? This kind of “tangled recursion” 
probably lies at the heart of intelligence.


Achilles: A haiku is a Japanese seventeen-syllable poem-or minipoem, rather-which is evocative in the same way, perhaps, as a fragrant petal is, or a lily pond in a light drizzle. It generally consists of groups of five, then seven, then five syllables.

Tortoise: Such compressed poems with seventeen syllables can’t have much meaning...

Achilles: Meaning lies as much in the mind of the reader as in the haiku.


Achilles: It’s a little strange, for all the letters are run together, with no spaces in between. Perhaps it needs decoding in some way? Oh, now I see. If you put the spaces back in where they belong, it says, “ONE WAR TWO EAR EWE”. I can’t quite make head or tail of that. Maybe it was a haiku-like poem, of which I ate the majority of syllables.
Tortoise: In that case, your fortune is now a mere 5/17-haiku. And a curious image it evokes. If 5/17-haiku is a new art form, then I’d say woe, O, woe are we ...
May I look at it?
Achilles (handing the Tortoise the small slip of paper): Certainly.
Tortoise: Why, when I “decode” it, Achilles, it comes out completely different! It’s not a 5/17-haiku at all. It is a six-syllable message which says, “O NEW ART WOE ARE WE”. That sounds like an insightful commentary on the new art form of 5/17-haiku.
Achilles: You’re right. Isn’t it astonishing that the poem contains its own commentary!
Tortoise: All I did was to shift the reading frame by one unit - that is, shift all the spaces one unit to the left.
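The Tortoise’s trick - one letter string, two readings - can be sketched as follows; the word lengths are supplied by hand, since the letters themselves do not dictate where the spaces go:

```python
# One and the same letter sequence, segmented under two different
# "reading frames" (patterns of word lengths).
LETTERS = "ONEWARTWOEAREWE"

def segment(letters, word_lengths):
    """Cut the letter string into words of the given lengths."""
    words, pos = [], 0
    for n in word_lengths:
        words.append(letters[pos:pos + n])
        pos += n
    return " ".join(words)

print(segment(LETTERS, [3, 3, 3, 3, 3]))     # ONE WAR TWO EAR EWE
print(segment(LETTERS, [1, 3, 3, 3, 3, 2]))  # O NEW ART WOE ARE WE
```

The information distinguishing the two messages lies entirely in the segmentation, not in the letters - which is the point of the Dialogue.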


Achilles: Let me see. First it goes down one semitone, from B to A (where B is taken the German way); then it rises three semitones to C; and finally it falls one semitone, to H. That yields the pattern: -1, +3, -1.
Tortoise: Precisely. What about C-A-G-E, now?
Achilles: Well, in this case, it begins by falling three semitones, then rises ten semitones (nearly an octave), and finally falls three more semitones. That means the pattern is: -3, +10, -3. It’s very much like the other one, isn’t it?
Tortoise: Indeed it is. They have exactly the same “skeleton”, in a certain sense. You can make C-A-G-E out of B-A-C-H by multiplying all the intervals by 3 1/3, and taking the nearest whole number.
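The Tortoise’s recipe - multiply every interval by 3 1/3 and round to the nearest whole number - is easy to check in a short sketch (exact fractions avoid floating-point surprises; the function name is mine):

```python
# Sketch of "intervallic augmentation": scale each semitone interval
# by 3 1/3 and round to the nearest whole number.
from fractions import Fraction

def augment(intervals, factor=Fraction(10, 3)):
    """Multiply each interval by `factor` and round to the nearest integer."""
    return [round(i * factor) for i in intervals]

bach = [-1, +3, -1]        # the B-A-C-H pattern
cage = augment(bach)       # [-3, 10, -3]: the C-A-G-E pattern
wide = augment(cage)       # [-10, 33, -10]: the third melody's pattern
```

Applying the same operation twice thus carries the B-A-C-H skeleton through C-A-G-E to the enormously wide intervals of the third song.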


some sort of skeletal code is present in the grooves, and that the various record players add their own interpretations to that code?
Tortoise: I don’t know, for sure. The cagey Crab wouldn’t fill me in on the details. But I did get to hear a third song, when record player B-1 swiveled into place.
Achilles: How did it go?
Tortoise: The melody consisted of enormously wide intervals, and went B-C-A-H. The interval pattern in semitones was: -10, +33, -10. It can be gotten from the CAGE pattern by yet another multiplication by 3 1/3, and rounding to whole numbers.
Achilles: Is there a name for this kind of interval multiplication?
Tortoise: One could call it “intervallic augmentation”. It is similar to the canonic device of temporal augmentation, where all the time values of notes in a melody get multiplied by some constant. There, the effect is just to slow the melody down. Here, the effect is to expand the melodic range in a curious way.
Achilles: Amazing. So all three melodies you tried were intervallic augmentations of one single underlying groove-pattern in the record?
Tortoise: That’s what I concluded.
Achilles: I find it curious that when you augment BACH you get CAGE, and when you augment CAGE over again, you get BACH back, except jumbled up inside, as if BACH had an upset stomach after passing through the intermediate stage of CAGE.


CHAPTER VI

The Location of Meaning


In the last Chapter, we came upon the question, “When are two things the same?” In this Chapter, we will deal with the flip side of that question: “When is one thing not always the same?” The issue we are broaching is whether meaning can be said to be inherent in a message, or whether meaning is always manufactured by the interaction of a mind or a mechanism with a message - as in the preceding Dialogue. In the latter case, meaning could not be said to be located in any single place, nor could it be said that a message has any universal, or objective, meaning, since each observer could bring its own meaning to each message. But in the former case, meaning would have both location and universality. In this Chapter, I want to present the case for the universality of at least some messages, without, to be sure, claiming it for all messages. The idea of an “objective meaning” of a message will turn out to be related, in an interesting way, to the simplicity with which intelligence can be described.


I’ll begin with my favorite example: the relationship between records, music, and record players. We feel quite comfortable with the idea that a record contains the same information as a piece of music, because of the existence of record players, which can “read” records and convert the groove-patterns into sounds. In other words, there is an isomorphism between groove-patterns and sounds, and the record player is a mechanism which physically realizes that isomorphism. It is natural, then, to think of the record as an information-bearer, and the record-player as an information-revealer. A second example of these notions is given by the pq-system. There, the “information-bearers” are the theorems, and the “information-revealer” is the interpretation, which is so transparent that we don’t need any electrical machine to help us extract the information from pq-theorems.


One gets the impression from these two examples that isomorphisms and decoding mechanisms (i.e., information-revealers) simply reveal information which is intrinsically inside the structures, waiting to be “pulled out”. This leads to the idea that for each structure, there are certain pieces of information which can be pulled out of it, while there are other pieces of information which cannot be pulled out of it. But what does “pull out” really mean? How hard are you allowed to pull? There are cases where, by investing sufficient effort, you can pull very recondite pieces of information out of certain structures. In fact, the pulling-out may involve such complicated operations that it makes you feel you are putting in more information than you are pulling out.


Take the case of the genetic information commonly said to reside in the double helix of deoxyribonucleic acid (DNA). A molecule of DNA - a genotype - is converted into a physical organism - a phenotype - by a complex process, involving the manufacture of proteins, the replication of the DNA, the replication of cells, the gradual differentiation of cell types, and so on. Incidentally, this unrolling of phenotype from genotype - epigenesis - is the most tangled of tangled recursions, and in Chapter XVI we shall devote our full attention to it. Epigenesis is guided by a set of enormously complex cycles of chemical reactions and feedback loops. By the time the full organism has been constructed, there is not even the remotest similarity between its physical characteristics and its genotype.


And yet, it is standard practice to attribute the physical structure of an organism to the structure of its DNA, and to that alone. The first evidence for this point of view came from experiments conducted by Oswald Avery in 1946, and overwhelming corroborative evidence has since been amassed. Avery’s experiments showed that, of all the biological molecules, only DNA transmits hereditary properties. One can modify other molecules in an organism, such as proteins, but such modifications will not be transmitted to later generations. However, when DNA is modified, all successive generations inherit the modified DNA. Such experiments show that the only way of changing the instructions for building a new organism is to change the DNA - and this, in turn, implies that those instructions must be coded somehow in the structure of the DNA.


Therefore one seems forced into accepting the idea that the DNA’s structure contains the information of the phenotype’s structure, which is to say, the two are isomorphic. However, the isomorphism is an exotic one, by which I mean that it is highly nontrivial to divide the phenotype and genotype into “parts” which can be mapped onto each other. Prosaic isomorphisms, by contrast, would be ones in which the parts of one structure are easily mappable onto the parts of the other. An example is the isomorphism between a record and a piece of music, where one knows that to any sound in the piece there exists an exact “image” in the patterns etched into the grooves, and one could pinpoint it arbitrarily accurately, if the need arose. Another prosaic isomorphism is that between Gplot and any of its internal butterflies.


The isomorphism between DNA 
structure and phenotype structure is anything but prosaic, and the mechanism which carries it out physically is awesomely complicated. 
For instance, if you wanted to find some piece of your DNA 
which accounts for the shape of your nose or the shape of your fingerprint, you would have a very hard time. It would 
be a little like trying to pin down the note in a piece of music which is the carrier of the 
emotional meaning of the piece. Of course there is no such note, because the emotional 
meaning is carried on a very high level, by large “chunks” of the piece, not by single notes. Incidentally, such “chunks” are not necessarily sets of contiguous notes; there may
be disconnected sections which, taken together, carry some emotional meaning.


Similarly, “genetic meaning”-that is, information about phenotype structure-is 
spread all through the small parts of a molecule of DNA, although nobody understands 
the language yet. (Warning: Understanding this “language” would not at all be the same 
as cracking the Genetic Code, something which took place in the early 1960’s. The 
Genetic Code tells how to translate short portions of DNA into various amino acids. 
Thus, cracking the Genetic Code is comparable to figuring out the phonetic values of the letters of a foreign alphabet, without figuring out the grammar of the language or the meanings of any of its words. The cracking of the Genetic Code was a vital step on the way to extracting the meaning of DNA strands, but it was only the first on a long path.


The genetic meaning contained in DNA is one of the best possible examples of implicit meaning. In order to convert genotype into phenotype, a set of mechanisms far more complex than the genotype must operate on the genotype. The various parts of the genotype serve as triggers for those mechanisms. A jukebox - the ordinary type, not the Crab type! - provides a useful analogy here: a pair of buttons specifies a very complex action to be taken by the mechanism, so that the pair of buttons could well be described as “triggering” the song which is played. In the process which converts genotype into phenotype, cellular jukeboxes - if you will pardon the notion! - accept “button-pushings” from short excerpts from a long strand of DNA, and the “songs” which they play are often prime ingredients in the creation of further “jukeboxes”. It is as if the output of real jukeboxes, instead of being love ballads, were songs whose lyrics told how to build more complex jukeboxes ... Portions of the DNA trigger the manufacture of proteins; those proteins trigger hundreds of new reactions; they in turn trigger the replicating-operation which, in several steps, copies the DNA - and on and on. This gives a sense of how recursive the whole process is. The final result of these many-triggered triggerings is the
phenotype - the individual. And one says that the phenotype is the revelation - the “pulling-out” - of the information that was present in the DNA to start with, latently. (The term “revelation” in this context is due to Jacques Monod, one of the deepest and most original of twentieth-century molecular biologists.)

Now no one would say that a song coming out of the loudspeaker of a jukebox constitutes a “revelation” of information inherent in the pair of buttons which were pressed, for the pair of buttons seem to be mere triggers, whose purpose is to activate information-bearing portions of the jukebox mechanism. On the other hand, it seems perfectly reasonable to call the extraction of music from a record a “revelation” of information inherent in the record, for several reasons:
(1) the music does not seem to be concealed in the mechanism of the record player;
(2) it is possible to match pieces of the input (the record) with pieces of the output (the music) to an arbitrary degree of accuracy;
(3) it is possible to play other records on the same record player and get other sounds out;
(4) the record and the record player are easily separated from one another.


It is another question altogether whether the fragments of a smashed record contain intrinsic meaning. The edges of the separate pieces fit together and in that way allow the information to be reconstituted - but something much more complex is going on here. Then there is the question of the intrinsic meaning of a scrambled telephone call ... There is a vast spectrum of degrees of inherency of meaning. It is interesting to try to place epigenesis in this spectrum. As development of an organism takes place, can it be said that the information is being “pulled out” of its DNA? Is that where all of the information about the organism’s structure resides?


In one sense, the answer seems to be yes, thanks to experiments like Avery’s. But in another sense, the answer seems to be no, because so much of the pulling-out process depends on extraordinarily complicated cellular chemical processes, which are not coded for in the DNA itself. The DNA relies on the fact that they will happen, but does not seem to contain a code which brings them about. Thus we have two conflicting views on the nature of the information in a genotype. One view says that so much of the information is outside the DNA that it is not reasonable to look upon the DNA as anything more than a very intricate set of triggers, like a sequence of buttons to be pushed on a jukebox; another view says that the information is all there, but in a very implicit form.


Now it might seem that these are just two ways of saying the same thing, but that is not necessarily so. One view says that the DNA is quite meaningless out of context; the other says that even if it were taken out of context, a molecule of DNA from a living being has such a compelling inner logic to its structure that its message could be deduced anyway. To put it as succinctly as possible, one view says that in order for DNA to have meaning, chemical context is necessary; the other view says that only intelligence is necessary to reveal the “intrinsic meaning” of a strand of DNA.


We can get some perspective on this issue by considering a strange hypothetical event. A record of David Oistrakh and Lev Oborin playing Bach’s sonata in F Minor for violin and clavier is sent up in a satellite. From the satellite it is then launched on a course which will carry it outside of the solar system, perhaps out of the entire galaxy - just a thin plastic platter with a hole in the middle, swirling its way through intergalactic space. It has certainly lost its context. How much meaning does it carry?


If an alien civilization were to encounter it, they would almost certainly be struck 
by its shape, and would probably be very interested in it. Thus immediately its shape, 
acting as a trigger, has given them some information: that it is an artifact, perhaps an 
information-bearing artifact. This idea-communicated, or triggered, by the record itself-now creates a new context in which the record will henceforth be perceived. The next 
steps in the decoding might take considerably longer-but that is very hard for us to assess. 
We can imagine that if such a record had arrived on earth in Bach’s time, no one would have known what to make of it, and very likely it would not have gotten deciphered. But that does not diminish our conviction that the information was in principle there; we just know that human knowledge in those times was not very sophisticated with respect to the possibilities of storage, transformation, and revelation of information.


Nowadays, the idea of decoding is extremely widespread; it is a significant part of the activity of astronomers, linguists, archaeologists, military specialists, and so on. It is often suggested that we may be floating in a sea of radio messages from other civilizations, messages which we do not yet know how to decipher. And much serious thought has been given to the techniques of deciphering such a message. One of the main problems - perhaps the deepest problem - is the question, “How will we recognize the fact that there is a message at all? How to identify a frame?” The sending of a record seems to be a simple solution - its gross physical structure is very attention-drawing, and it is at least plausible to us that it would trigger, in any sufficiently great intelligence, the idea of looking for information hidden in it. However, for technological reasons, the sending of solid objects to other star systems seems to be out of the question. Still, that does not prevent our thinking about the idea.


Now suppose that an alien civilization hit upon the idea that the appropriate mechanism for translation of the record is a machine which converts the groove-patterns into sounds. This would still be a far cry from a true deciphering. What, indeed, would constitute a successful deciphering of such a record? Evidently, the civilization would have to be able to make sense out of the sounds. Mere production of sounds is in itself hardly worthwhile, unless they have the desired triggering effect in the brains (if that is the word) of the alien creatures. And what is that desired effect? It would be to activate structures in their brains which create emotional effects in them which are analogous to the emotional effects which we experience in hearing the piece. In fact, the production of sounds could even be bypassed, provided that they used the record in some other way to get at the appropriate structures in their brains. (If we humans had a way of triggering the appropriate structures in our brains in sequential order, as music does, we might be quite content to bypass the sounds - but it seems extraordinarily unlikely that there is any way to do that, other than via our ears. Deaf composers - Beethoven, Dvořák, Fauré - or musicians who can “hear” music by looking at a score, do not give the lie to this assertion, for such abilities are founded upon preceding decades of direct auditory experiences.)


Here is where things become very unclear. Will beings of an alien civilization have emotions? Will their emotions - supposing they have some - be mappable, in any sense, onto ours? If they do have emotions somewhat like ours, do the emotions cluster together in somewhat the same way as ours do? Will they understand such amalgams as tragic beauty or courageous suffering? If it turns out that beings throughout the universe do share cognitive structures with us to the extent that even emotions overlap, then in some sense, the record can never be out of its natural context; that context is part of the scheme of things, in nature. And if such is the case, then it is likely that a meandering record, if not destroyed en route, would eventually get picked up by a being or group of beings, and get deciphered.


In asking about the meaning of a molecule of DNA above, I used the phrase “compelling inner logic”; and I think this is a key notion. To illustrate this, let us slightly modify our hypothetical record-into-space event by substituting John Cage’s “Imaginary Landscape no. 4” for the Bach. This piece is a classic of aleatoric, or chance, music - music whose structure is chosen by various random processes, rather than by an attempt to convey a personal emotion. In this case, twenty-four performers attach themselves to the twenty-four knobs on twelve radios. For the duration of the piece they twiddle their knobs in aleatoric ways, so that each radio randomly gets louder and softer, switching stations all the while. The total sound produced is the piece of music. Cage’s attitude is expressed in his own words: “to let sounds be themselves, rather than vehicles for man-made theories or expressions of human sentiments.”


Now imagine that this is the piece on the record sent out into space. It would be extraordinarily unlikely - if not downright impossible - for an alien civilization to understand the nature of the artifact. They
would probably be very puzzled by the contradiction between the frame message (“I am a message; decode me”), and the chaos 
of the inner structure. There are few “chunks”
to seize onto in this Cage piece, few patterns which could guide a decipherer. On the other hand, there seems to be, in a Bach 
piece, much to seize onto-patterns, patterns of patterns, and so on. We have no way of 
knowing whether such patterns are universally appealing. We do not know enough about 
the nature of intelligence, emotions, or music
to say whether the inner logic of a piece by Bach is so universally compelling that its meaning could span galaxies.


However, whether Bach in particular has
enough inner logic is not the issue here; the issue is whether any message has, per se, enough compelling inner logic that its 
context will be restored automatically whenever intelligence of a high enough level comes in contact with it. If some message did
have that context-restoring property, then it would seem reasonable to consider the meaning of the message as an inherent property of 
the message.


Another illuminating example of these ideas is the decipherment of ancient texts written in unknown languages and unknown alphabets. Intuition tells us that there is information inherent in such texts, whether or not we succeed in revealing it. It is as strong a feeling as the belief that there is meaning inherent in a newspaper written in Chinese, even if we are completely ignorant of Chinese. Once the script or language of a text has been broken, then no one questions where the meaning resides: clearly it resides in the text, not in the method of decipherment - just as music resides in a record, not inside a record player! One of the ways that we identify decoding mechanisms is by the fact that they do not add any meaning to the signs or objects which they take as input; they merely reveal the intrinsic meaning of those signs or objects. A jukebox is not a decoding mechanism, for it does not reveal any meaning belonging to its input symbols; on the contrary, it supplies meaning concealed inside itself.


Now the decipherment of an ancient text may have involved decades of labor by several rival teams of scholars, drawing on knowledge stored in libraries all over the world. Doesn’t this process add information, too? Just how intrinsic is the meaning of a text, when such mammoth efforts are required in order to find the decoding rules? Has one put meaning into the text, or was that meaning already there? My intuition says that the meaning was always there, and that despite the arduousness of the pulling-out process, no meaning was pulled out that wasn’t in the text to start with. This intuition comes mainly from one fact: I feel that the result was inevitable; that, had the text not been deciphered by this group at this time, it would have been deciphered by that group at that time - and it would have come out the same way. That is why the meaning is part of the text itself; it acts upon intelligence in a predictable way. Generally, we can say: meaning is part of an object to the extent that it acts upon intelligence in a predictable way.


In Figure 39 is shown the Rosetta stone, one of the most precious of all historic discoveries. It was the key to the decipherment of Egyptian hieroglyphics, for it contains parallel text in three ancient scripts: hieroglyphics, demotic characters, and Greek. The inscription on this basalt stele was first deciphered in 1821 by Jean François Champollion, the “father of Egyptology”; it is a decree of priests assembled at Memphis in favor of Ptolemy Epiphanes.


In these examples of decipherment of out-of-context messages, we can separate out fairly clearly three levels of information: (1) the frame message; (2) the outer message; (3) the inner message. The one we are most familiar with is (3), the inner message; it is the message which is supposed to be transmitted: the emotional experiences in music, the phenotype in genetics, the royalty and rites of ancient civilizations in tablets, etc.
To understand the inner message is to have extracted the meaning intended by the sender.
The frame message is the message “I am a message; decode me if you can!”; and it is implicitly conveyed by the gross structural aspects of any information-bearer.
To understand the frame message is to recognize the need for a decoding-mechanism.
If the frame message is recognized as such, then attention is switched to level (2), the outer message. This is information, implicitly carried by symbol-patterns and structures in the message, which tells how to decode the inner message.
To understand the outer message is to build, or know how to build, the correct decoding mechanism for the inner message.
This outer level is perforce an implicit message, in the sense that the sender cannot ensure that it will be understood. It would be a vain effort to send instructions which tell how to decode the outer message, for they would have to be part of the inner message, which can only be understood once the decoding mechanism has been found. For this reason, the outer message is necessarily a set of triggers, rather than a message which can be revealed.


The formulation of these three “layers” is only a rather crude beginning at analyzing how meaning is contained in messages. There may be layers and layers of outer and inner messages, rather than just one of each. Think, for instance, of how intricately tangled are the inner and outer messages of the Rosetta stone. To decode a message fully, one would have to reconstruct the entire semantic structure which underlay its creation - and thus to understand the sender in every deep way. Hence one could throw away the inner message, because if one truly understood all the finesses of the outer message, the inner message would be reconstructible.


The book After Babel, by George Steiner, is a long discussion of the interaction between inner and outer messages (though he never uses that terminology). The tone of his book is given by this quote:

We normally use a shorthand beneath which there lies a wealth of subconscious, deliberately concealed or declared associations so extensive and intricate that they probably equal the sum and uniqueness of our status as an individual person.

Thoughts along the same lines are expressed by Leonard B. Meyer, in his book Music, the Arts, and Ideas:

The way of listening to a composition by Elliott Carter is radically different from the way of listening appropriate to a work by John Cage. Similarly, a novel by Beckett must in a significant sense be read differently from one by Bellow. A painting by Willem de Kooning and one by Andy Warhol require different perceptional-cognitive attitudes.

Perhaps works of art are trying to convey their style more than anything else. In that case, if you could ever plumb a style to its very bottom, you could dispense with the creations in that style. “Style”, “outer message”, “decoding technique” - all are ways of expressing the same basic idea.


What makes us see a frame message in certain objects, but none in others? Why should an alien civilization suspect, if they intercept an errant record, that a message lurks within? What would make a record any different from a meteorite? Clearly its geometric shape is the first clue that “something funny is going on”. The next clue is that, on a more microscopic scale, it consists of a very long aperiodic sequence of patterns, arranged in a spiral. If we were to unwrap the spiral, we would have one huge linear sequence (around 2000 feet long) of minuscule symbols. This is not so different from a DNA molecule, whose symbols, drawn from a meager “alphabet” of four different chemical bases, are arrayed in a one-dimensional sequence, and then coiled up into a helix. Before Avery had established the connection between genes and DNA, the physicist Erwin Schrödinger predicted, on purely theoretical grounds, that genetic information would have to be stored in “aperiodic crystals”, in his influential book What Is Life? In fact, books themselves are aperiodic crystals contained inside neat geometric forms. These examples suggest that, where an aperiodic crystal is found “packaged” inside a very regular geometric structure, there may lurk an inner message. (I don’t claim this is a complete characterization of frame messages; however, it is a fact that many common messages have frame messages of this description. See Figure 40 for some good examples.)


The three levels are very clear in the case of a message found in a bottle washed up on a beach. The first level, the frame message, is found when one picks up the bottle and sees that it is sealed, and contains a dry piece of paper. Even without seeing writing, one recognizes this type of artifact as an information-bearer, and at this point it would take an extraordinary - almost inhuman - lack of curiosity, to drop the bottle and not look further.


Next, one opens the bottle and examines the marks on the paper. Perhaps they are in Japanese; this can be discovered without any of the inner message being understood - it merely comes from a recognition of the characters. The outer message can be stated as an English sentence: “I am in Japanese.” Once this has been discovered, then one can proceed to the inner message, which may be a call for help, a haiku poem, a lover’s lament ...
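The three levels in the bottle example can be summarized in a small sketch - a hypothetical data structure of my own, not anything from the text, just separating the layers of information:

```python
# Illustrative only: the three levels of information carried by the
# message in the bottle.
from dataclasses import dataclass

@dataclass
class Message:
    frame: str  # "I am a message; decode me if you can!"
    outer: str  # how to decode the inner message
    inner: str  # the content the sender intended to transmit

bottle = Message(
    frame="a sealed bottle containing a dry piece of paper",
    outer="the marks on the paper are Japanese characters",
    inner="a call for help (or a haiku, or a lover's lament)",
)
```

The point of the taxonomy survives in the structure: the `frame` and `outer` fields are recognized, not read - only once both are understood does the `inner` field become accessible.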



It would be of no use to include in the inner message a translation of the sentence “This message is in Japanese”, since it would take someone who knew Japanese to read it. And before reading it, he would have to recognize the fact that, as it is in Japanese, he can read it. You might try to wriggle out of this by including translations of the statement “This message is in Japanese” into many different languages. That would help in a practical sense, but in a theoretical sense the same difficulty is there. An English-speaking person still has to recognize the “Englishness” of the message; otherwise it does no good. Thus one cannot avoid the problem that one has to find out how to decipher the inner message from the outside; the inner message itself may provide clues and confirmations, but those are at best triggers acting upon the bottle finder (or upon the people whom he enlists to help).


Similar kinds of problems confront the shortwave radio listener. First he has to decide whether the sounds he hears actually constitute a message, or are just static. The sounds in themselves do not give the answer, not even in the unlikely case that the inner message is in the listener’s own native language, and is saying, “These sounds actually constitute a message and are not just static!” If the listener recognizes a frame message in the sounds, then he tries to identify the language the broadcast is in - and clearly, he is still on the outside; he accepts triggers from the radio, but they cannot explicitly tell him the answer. It is in the nature of outer messages that they are not conveyed in any


FIGURE 40. A collage of scripts. Uppermost on the left is an inscription in the undeciphered boustrophedonic writing system from Easter Island, in which every second line is upside down. The characters are chiseled on a wooden tablet, 4 inches by 35 inches. Moving clockwise, we encounter vertically written Mongolian: above, present-day Mongolian, and below, a document dating from 1314. Then we come to a poem in Bengali by Rabindranath Tagore in the bottom righthand corner. Next to it is a newspaper headline in Malayalam (Kerala, southern India), above which is the elegant curvilinear language Tamil. The smallest entry is part of a folk tale in Buginese (Celebes Island, Indonesia). In the center of the collage is a paragraph in the Thai language, and above it a manuscript in Runic dating from the fourteenth century, containing a sample of the provincial law of Scania (southern Sweden). Finally, wedged in on the left is a section of the laws of Hammurabi, written in Assyrian cuneiform. As an outsider, I feel a deep sense of mystery as I wonder how meaning is cloaked in the strange curves and angles of each of these beautiful aperiodic crystals. In form, there is content. [From Hans Jensen, Sign, Symbol and Script (New York: G. P. Putnam’s Sons, 1969), pp. 89 (cuneiform), 356 (Easter Island), 386, 417 (Mongolian), 552 (Runic); from Kenneth Katzner, The Languages of the World (New York: Funk & Wagnalls, 1975), pp. 190 (Bengali), (Buginese); from I. A. Richards and Christine Gibson, English Through Pictures (New York: Washington Square Press, 1960), pp. 73 (Tamil), 82 (Thai).]

178

explicit language. To find an explicit language in which to convey outer messages would 
not be a breakthrough-it would be a contradiction in terms! It is always the listener’s 
burden to understand the outer message. Success lets him break through into the inside, at 
which point the ratio of triggers to explicit meanings shifts drastically towards the latter. 
By comparison with the previous stages, understanding the inner message seems 
effortless. It is as if it just gets pumped in.


These examples may appear to be evidence for the viewpoint that no message has 
intrinsic meaning, for in order to understand any inner message, no matter how simple it 
is, one must first understand its frame message and its outer message, both of which are 
carried only by triggers (such as being written in the Japanese alphabet, or having spiraling grooves, etc.). It begins to seem, then, that one cannot get away from a
“jukebox” theory of meaning-the doctrine that no message contains inherent meaning, 
because, before any message can be understood, it has to be used as the input to some 
“jukebox”, which means that information contained in the “jukebox” must be added to the 
message before it acquires meaning.


This argument is very similar to the trap which the Tortoise caught Achilles in, in Lewis Carroll's Dialogue. There, the trap was the idea that before you can use any rule, you have to have a rule which tells you how to use that rule; in other words, there is an infinite hierarchy of levels of rules, which prevents any rule from ever getting used. Here, the trap is the idea that before you can understand any message, you have to have a message which tells you how to understand that message; in other words, there is an infinite hierarchy of levels of messages, which prevents any message from ever getting understood. However, we all know that these paradoxes are invalid, for rules do get used, and messages do get understood. How come?


This happens because our intelligence is not disembodied, but is instantiated in physical objects: our brains. Their structure is due to the long process of evolution, and their operations are governed by the laws of physics. Since they are physical entities, our brains run without being told how to run. So it is at the level where thoughts are produced by physical law that Carroll's rule-paradox breaks down; and likewise, it is at the level where a brain interprets incoming data as a message that the message-paradox breaks down. It seems that brains come equipped with "hardware" for recognizing that certain things are messages, and for decoding those messages. This minimal inborn ability to extract inner meaning is what allows the highly recursive, snowballing process of language acquisition to take place. The inborn hardware is like a jukebox: it supplies the additional information which turns mere triggers into complete messages.


Now if different people's "jukeboxes" had different "songs" in them, and responded to given triggers in completely idiosyncratic ways, then we would have no inclination to attribute intrinsic meaning to those triggers. However, human brains are so constructed that one brain responds in much the same way to a given trigger as does another brain, all other things being equal. This is why a baby can learn any language; it responds to triggers in the same way as any other baby. This uniformity of "human jukeboxes" establishes a uniform "language" in which frame messages and outer messages can be communicated. If, furthermore, we believe that human intelligence is just one example of a general phenomenon in nature-the emergence of intelligent beings in widely varying contexts-then presumably the "language" in which frame messages and outer messages are communicated among humans is a "dialect" of a universal language by which intelligences can communicate with each other. Thus, there would be certain kinds of triggers which would have "universal triggering power", in that all intelligent beings would tend to respond to them in the same way as we do.


This would allow us to shift our description of where meaning is located. We could ascribe the meanings (frame, outer, and inner) of a message to the message itself, because of the fact that deciphering mechanisms are themselves universal-that is, they are fundamental forms of nature which arise in the same way in diverse contexts. To make it concrete, suppose that "A-5" triggered the same song in all jukeboxes-and suppose moreover that jukeboxes were not man-made artifacts, but widely occurring natural objects, like galaxies or carbon atoms. Under such circumstances, we would probably feel justified in calling the universal triggering power of "A-5" its "inherent meaning"; also, "A-5" would merit the name of "message", rather than "trigger", and the song would indeed be a "revelation" of the inherent, though implicit, meaning of "A-5".


This ascribing of meaning to a message comes from the invariance of the processing of the message by intelligences distributed anywhere in the universe. In that sense, it bears some resemblance to the ascribing of mass to an object. To the ancients, it must have seemed that an object's weight was an intrinsic property of the object. But as gravity became understood, it was realized that weight varies with the gravitational field the object is immersed in. Nevertheless, there is a related quantity, the mass, which does not vary according to the gravitational field; and from this invariance came the conclusion that an object's mass was an intrinsic property of the object itself. If it turns out that mass is also variable, according to context, then we will backtrack and revise our opinion that it is an intrinsic property of an object. In the same way, we might imagine that there could exist other


kinds of “jukeboxes”-intelligences-which communicate among each other via messages 
which we would never recognize as messages, 
and who also would never recognize our messages as messages. If that were the case, then the claim that meaning is an intrinsic 
property of a set of symbols would have to be reconsidered. On the other hand, how 
could we ever realize that such beings existed?


It is interesting to compare this argument for the inherency of meaning with a parallel argument for the inherency of weight. Suppose one defined an object's weight as "the magnitude of the downward force which the object exerts when on the surface of the planet Earth". Under this definition, the downward force which an object exerts when on the surface of Mars would have to be given another name than "weight". This definition makes weight an inherent property, but at the cost of geocentricity-"Earth chauvinism". It would be like "Greenwich chauvinism"-refusing to accept local time anywhere on the globe but in the GMT time zone. It is an unnatural way to think of time.


Perhaps we are unknowingly burdened with a similar chauvinism with respect to intelligence, and consequently with respect to meaning. In our chauvinism, we would call any being with a brain sufficiently much like our own "intelligent", and refuse to recognize other types of objects as intelligent. To take an extreme example, consider a meteorite which, instead of deciphering the outer-space Bach record, punctures it with colossal indifference, and continues in its merry orbit. It has interacted with the record in a way which we feel disregards the record's meaning. Therefore, we might well feel tempted to call the meteorite "stupid". But perhaps we would thereby do the meteorite a disservice. Perhaps it has a "higher intelligence" which we in our Earth chauvinism cannot perceive, and its interaction with the record was a manifestation of that higher intelligence. Perhaps, then, the record has a "higher meaning"-totally different from that which we attribute to it; perhaps its meaning depends on the type of intelligence perceiving it.


It would be nice if we could define intelligence in some other way than "that which gets the same meaning out of a sequence of symbols as we do". For if we can only define it this one way, then our argument that meaning is an intrinsic property is circular, hence content-free. We should try to formulate in some independent way a set of characteristics which deserve the name "intelligence". Such characteristics would constitute the uniform core of intelligence, shared by humans. At this point in history we do not yet have a well-defined list of those characteristics. However, it appears likely that within the next few decades there will be much progress made in elucidating what human intelligence is. In particular, perhaps cognitive psychologists, workers in Artificial Intelligence, and neuroscientists will be able to synthesize their understandings, and come up with a definition of intelligence. It may still be human-chauvinistic; there is no way around that. But to counterbalance that, there may be some elegant and beautiful-and perhaps even simple-abstract ways of characterizing the essence of intelligence. This would serve to lessen the feeling of having formulated an anthropocentric concept. And of course, if contact were established with an alien civilization from another star system, we would feel supported in our belief that our own type of intelligence is not just a fluke, but an example of a basic form which reappears in nature in diverse contexts, like stars and uranium nuclei. This in turn would support the idea of meaning being an inherent property.


To conclude this topic, let us consider some new and old examples, and discuss the degree of inherent meaning which they have, by putting ourselves, to the extent that we can, in the place of an alien civilization.


Consider a rectangular plaque made of an indestructible metallic alloy, on which are engraved two dots, one immediately above the other, as in a colon. Though the overall form of the plaque might suggest that it is an artifact, and therefore that it might conceal some message, two dots are simply not sufficient to convey anything. (Can you, before reading on, hypothesize what they are supposed to mean?) Suppose that we made a second plaque, containing more dots, as follows.


Now one of the most obvious things to do-so it might seem, to a terrestrial intelligence at least-would be to count the dots in the successive rows. The sequence obtained is: 1, 1, 2, 3, 5, 8, 13, 21, 34. Here there is evidence of a rule governing the progression from one line to the next. In fact, the recursive part of the definition of the Fibonacci numbers can be inferred, with some confidence, from this list. Suppose we think of the initial pair of values (1,1) as a "genotype" from which the "phenotype"-the full Fibonacci sequence-is pulled out by a recursive rule. By sending the genotype alone-namely the first version of the plaque-we fail to send the information which allows reconstitution of the phenotype. Thus, the genotype does not contain the full specification of


the phenotype. On the other hand, if we consider the second version of the plaque to be the genotype, then there is much better cause to suppose that the phenotype could actually be reconstituted. This new version of the genotype-a "long genotype"-contains so much information that the mechanism by which the phenotype is pulled out of the genotype can be inferred from it by intelligence alone.


Once this mechanism is firmly established as the way to pull phenotype from genotype, then we can go back to using "short genotypes"-like the first plaque. For instance, the "short genotype" (1,3) would yield the phenotype 1, 3, 4, 7, 11, 18, 29, 47, ...-the Lucas sequence. And for every set of two initial values-that is, for every short genotype-there will be a corresponding phenotype. But the short genotypes, unlike the long ones, are only triggers-buttons to be pushed on the jukeboxes into which the recursive rule has been built. The long genotypes are informative enough that they trigger, in an intelligent being, the recognition of what kind of "jukebox" to build. In that sense, the long genotypes contain the information of the phenotype, whereas the short genotypes do not. In other words, the long genotype transmits not only an inner message, but also an outer message, which enables the inner message to be read. It seems that the clarity of the outer message resides in the sheer length of the message. This is not unexpected; it parallels precisely what happens in deciphering ancient texts. Clearly, one's likelihood of success depends crucially on the amount of text available.
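The recursive rule in question can be sketched in a few lines of Python (the function name `phenotype` and the choice of nine terms are illustrative assumptions, not from the text):

```python
def phenotype(genotype, n=9):
    """Pull a "phenotype" (a number sequence) out of a two-value
    "genotype" by the Fibonacci-style recursive rule:
    each term is the sum of the two preceding terms."""
    a, b = genotype
    seq = [a, b]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq

# The short genotype (1, 1) yields the Fibonacci sequence ...
print(phenotype((1, 1)))  # [1, 1, 2, 3, 5, 8, 13, 21, 34]
# ... while (1, 3) yields the Lucas sequence.
print(phenotype((1, 3)))  # [1, 3, 4, 7, 11, 18, 29, 47, 76]
```

Note that the rule itself-the "jukebox"-lives in the function, not in the two numbers: the short genotype alone says nothing about how to extend it.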


But just having a long text may not be enough. Let us take up once more the difference between sending a record of Bach's music into space, and a record of John Cage's music. Incidentally, the latter, being a Composition of Aleatorically Generated Elements, might be handily called a "CAGE", whereas the former, being a Beautiful Aperiodic Crystal of Harmony, might aptly be dubbed a "BACH". Now let's consider what the meaning of a Cage piece is to ourselves. A Cage piece has to be taken in a large cultural setting-as a revolt against certain kinds of traditions. Thus, if we want to transmit that meaning, we must not only send the notes of the piece, but we must have earlier communicated an extensive history of Western culture. It is fair to say, then, that an isolated record of John Cage's music does not have an intrinsic meaning. However, for a listener who is sufficiently well versed in Western and Eastern cultures, particularly in the trends in Western music over the last few decades, it does carry meaning-but such a listener is like a jukebox, and the piece is like a pair of buttons. The meaning is mostly contained inside the listener to begin with; the music serves only to trigger it. And this "jukebox", unlike pure intelligence, is not at all universal; it is highly earthbound, depending on idiosyncratic sequences of events all over our globe for a long period of time. Hoping that John Cage's music will be understood by another civilization is like hoping that your favorite tune, on a jukebox on the moon, will have the same buttons as in a saloon in Saskatoon.


On the other hand, to appreciate Bach requires far less cultural knowledge. This may seem like high irony, for Bach is so much more complex and organized, and Cage is so devoid of intellectuality. But there is a strange reversal here: intelligence loves patterns and balks at randomness. For most people, the randomness in Cage's music requires much explanation; and even after explanations, they may feel they are missing the message-whereas with much of Bach, words are superfluous. In that sense, Bach's music is more self-contained than Cage's music. Still, it is not clear how much of the human condition is presumed by Bach.


For instance, music has three major dimensions of structure (melody, harmony, rhythm), each of which can be further divided into small-scale, intermediate, and overall aspects. Now in each of these dimensions, there is a certain amount of complexity which our minds can handle before boggling; clearly a composer takes this into account, mostly unconsciously, when writing a piece. These "levels of tolerable complexity" along different dimensions are probably very dependent on the peculiar conditions of our evolution as a species, and another intelligent species might have developed music with totally different levels of tolerable complexity along these many dimensions. Thus a Bach piece might conceivably have to be accompanied by a lot of information about the human species, which simply could not be inferred from the music's structure alone. If we equate the Bach music with a genotype, and the emotions which it is supposed to evoke with the phenotype, then what we are interested in is whether the genotype contains all the information necessary for the revelation of the phenotype.


The general question which we are facing, and which is very similar to the questions inspired by the two plaques, is this: "How much of the context necessary for its own understanding is a message capable of restoring?" We can now revert to the original biological meanings of "genotype" and "phenotype"-DNA and a living organism-and ask similar questions. Does DNA have universal triggering power? Or does it need a "biojukebox" to reveal its meaning? Can DNA evoke a phenotype without being embedded in the proper chemical context? To this question the answer is no-but a qualified no. Certainly a molecule of DNA in a vacuum will not create anything at all. However, if a molecule of DNA were set to seek its fortune in the universe, as we imagined the BACH and the CAGE were, it might be intercepted by an intelligent civilization. They might first of all recognize its frame message. Given that, they might try to deduce from its chemical structure what kind of chemical environment it seemed to want, and then supply such an environment. Successively more refined attempts along these lines might eventually lead to a full restoration of the chemical context necessary for the revelation of DNA's phenotypical meaning. This may sound a little implausible, but if one allows many millions of years for the experiment, perhaps the DNA's meaning would finally emerge.


On the other hand, if the sequence of bases which compose a strand of DNA were sent as abstract symbols (as in Fig. 41), not as a long helical molecule, the odds are virtually nil that this, as an outer message, would trigger the proper decoding mechanism which would enable the phenotype to be drawn out of the genotype. This would be a case of wrapping an inner message in such an abstract outer message that the context-restoring power of the outer message would be lost, and so in a very pragmatic sense, the set of symbols would have no intrinsic meaning. Lest you think this all sounds hopelessly abstract and philosophical, consider that the exact moment when phenotype can be said to be "available", or "implied", by genotype is a highly charged issue in our day: it is the issue of abortion.


FIGURE 41. This Giant Aperiodic Crystal is the base sequence for the chromosome of bacteriophage φX174. It is the first complete genome ever mapped out for any organism. About 2,000 of these boustrophedonic pages would be needed to show the base sequence of a single E. coli cell, and about one million pages to show the base sequence of the DNA of a single human cell. The book now in your hands contains roughly the same amount of information as a single E. coli chromosome.


CHAPTER VII

The Propositional Calculus


THE PRECEDING DIALOGUE is reminiscent of the Two-Part Invention by Lewis Carroll. In both, the Tortoise refuses to use normal, ordinary words in the normal, ordinary way-or at least he refuses to do so when it is to his advantage to do so. A way to think about the Carroll paradox was given in the last Chapter. In this Chapter we are going to make symbols do what Achilles couldn't make the Tortoise do with his words. That is, we are going to make a formal system one of whose symbols will do just what Achilles wished the word 'and' would do, when spoken by the Tortoise, and another of whose symbols will behave the way the words 'if ... then ...' ought to behave. There are only two other words which we will attempt to deal with: 'or' and 'not'. Reasoning which depends only on correct usage of these words is termed propositional reasoning.


I will present this new formal system, called the Propositional Calculus, like a puzzle: not explaining everything at once, but letting you figure things out to some extent. We begin with the list of symbols:

< > P Q R '
∧ ∹ ⊃ ~

The first rule of this system that I will reveal is the following:

RULE OF JOINING: If x and y are theorems of the system, then so is the string <x∧y>.

This rule takes two theorems and combines them into one. It should remind you of the Dialogue.


There will be several other rules of inference, and they will all be presented shortly-but first, it is important to define a subset of all strings, namely the well-formed strings. They will be defined in a recursive way. We begin with the

ATOMS: P, Q, and R are called atoms. New atoms are formed by appending primes onto the right of old atoms-thus, R', Q'', P''', etc. This gives an endless supply of atoms. All atoms are well-formed.


Then we have four recursive FORMATION RULES: If x and y are well-formed, then the following four strings are also well-formed:

(1) ~x
(2) <x∧y>
(3) <x∹y>
(4) <x⊃y>

For example, all of the following are well-formed:

P                      atom
~P                     by (1)
~~P                    by (1)
Q'                     atom
~Q'                    by (1)
<P⊃~Q'>                by (4)
<P∧Q'>                 by (2)
~~<P⊃~Q'>              by (1)
<<P∧Q'>∹~~<P⊃~Q'>>     by (3)

The last one may look quite formidable, but it is built up straightforwardly from two components-namely the two lines just above it. Each of them is in turn built up from previous lines, and so on. Every well-formed string can in this way be traced back to its elementary constituents-that is, atoms. You simply run the formation rules backwards until you can no more. This process is guaranteed to terminate, since each formation rule (when run forwards) is a lengthening rule, so that running it backwards always drives you towards atoms.


This method of decomposing strings thus serves as a check on the well-formedness of any string. It is a top-down decision procedure for well-formedness. You can test your understanding of this decision procedure by checking which of the following strings are well-formed:

(1) <P>
(2) <~P>
(3) <P∧Q∧R>
(4) <P∧Q>
(5) <<P∧Q>∧<Q~∧P>>
(6) <P∧~P>
(7) <<P∹<Q⊃R>>∧<P∹R'>>
(8) <P∧Q>∧<Q∧P>


(Answer: Those whose numbers are Fibonacci numbers are not well-formed. The rest are well-formed.)
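The top-down decision procedure can be sketched in Python (a rough illustration; the function name and parsing strategy are my own, and each connective is treated as a single character):

```python
def well_formed(s):
    """Decide well-formedness by running the formation rules
    backwards, as the text describes."""
    # An atom: P, Q, or R followed by any number of primes.
    if s and s[0] in "PQR" and all(c == "'" for c in s[1:]):
        return True
    # Rule (1): ~x is well-formed if x is.
    if s.startswith("~"):
        return well_formed(s[1:])
    # Rules (2)-(4): <x∧y>, <x∹y>, <x⊃y>, split at the one
    # connective that sits at bracket-nesting depth zero.
    if s.startswith("<") and s.endswith(">"):
        inner, depth = s[1:-1], 0
        for i, c in enumerate(inner):
            if c == "<":
                depth += 1
            elif c == ">":
                depth -= 1
            elif depth == 0 and c in "∧∹⊃":
                return well_formed(inner[:i]) and well_formed(inner[i + 1:])
        return False  # brackets with no top-level connective
    return False

tests = ["<P>", "<~P>", "<P∧Q∧R>", "<P∧Q>", "<<P∧Q>∧<Q~∧P>>",
         "<P∧~P>", "<<P∹<Q⊃R>>∧<P∹R'>>", "<P∧Q>∧<Q∧P>"]
for i, t in enumerate(tests, 1):
    print(i, well_formed(t))
```

Running this on the eight strings above marks exactly numbers 1, 2, 3, 5, and 8 as ill-formed-the Fibonacci numbers, as promised.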


Now we come to the rest of the rules by which theorems of this system are constructed. A few rules of inference follow. In all of them, the symbols 'x' and 'y' are always to be understood as restricted to well-formed strings.

RULE OF SEPARATION: If <x∧y> is a theorem, then both x and y are theorems.

Incidentally, you should have a pretty good guess by now as to what concept the symbol '∧' stands for. (Hint: it is the troublesome word in the preceding Dialogue.) From the following rule, you should be able to figure out what concept the tilde ('~') represents:

DOUBLE-TILDE RULE: The string '~~' can be deleted from any theorem. It can also be inserted into any theorem, provided that the resulting string is itself well-formed.


Now a special feature of this system is that it has no axioms-only rules. If you think back to the previous formal systems we've seen, you may wonder how there can be any theorems, then. How does everything get started? The answer is that there is one rule which manufactures theorems from out of thin air-it doesn't need an "old theorem" as input. (The rest of the rules do require input.) This special rule is called the fantasy rule. The reason I call it that is quite simple.

To use the fantasy rule, the first thing you do is to write down any well-formed string x you like, and then "fantasize" by asking, "What if this string x were an axiom, or a theorem?" And then, you let the system give an answer. That is, you go ahead and make a derivation with x as the opening line; let us suppose y is the last line. (Of course the derivation must strictly follow the rules of the system.) Everything from x to y (inclusive) is the fantasy; x is the premise of the fantasy, and y is its outcome. The next step is to jump out of the fantasy, having learned from it that

If x were a theorem, y would be a theorem.

Still, you might wonder, where is the real theorem? The real theorem is the string

<x⊃y>

Notice the resemblance of this string to the sentence printed above it.


To signal the entry into, and emergence from, a fantasy, one uses the square brackets '[' and ']', respectively. Thus, whenever you see a left square bracket, you know you are "pushing" into a fantasy, and the next line will contain the fantasy's premise. Whenever you see a right square bracket, you know you are "popping" back out, and the preceding line was the outcome. It is helpful (though not necessary) to indent those lines of a derivation which take place in fantasies.


Here is an illustration of the fantasy rule, in which the string P is taken as a premise. (It so happens that P is not a theorem, but that is of no import; we are merely inquiring, "What if it were?") We make the following fantasy:

[        push into fantasy
P        premise
~~P      double-tilde rule
]        pop out of fantasy

The fantasy shows that:

If P were a theorem, so would ~~P be one.

We now "squeeze" this sentence of English (the metalanguage) into the formal notation (the object language): <P⊃~~P>. This, our first theorem of the Propositional Calculus, should reveal to you the intended interpretation of the symbol '⊃'.
Here is another derivation using the fantasy rule:

[                push
<P∧Q>            premise
P                separation
Q                separation
<Q∧P>            joining
]                pop
<<P∧Q>⊃<Q∧P>>    fantasy rule

It is important to understand that only the last line is a genuine theorem, here-everything else is in the fantasy.
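The typographical character of the rules used in that fantasy can be sketched in Python (the function names `split_conjunction` and `joining` are my own illustrative labels): the rules of separation and joining are pure string surgery, consulting no meanings at all.

```python
def split_conjunction(s):
    """RULE OF SEPARATION, typographically: split a string of the
    form <x∧y> at its top-level '∧', returning (x, y)."""
    assert s.startswith("<") and s.endswith(">")
    inner, depth = s[1:-1], 0
    for i, c in enumerate(inner):
        if c == "<":
            depth += 1
        elif c == ">":
            depth -= 1
        elif c == "∧" and depth == 0:
            return inner[:i], inner[i + 1:]
    raise ValueError("not a conjunction")

def joining(x, y):
    """RULE OF JOINING, typographically."""
    return f"<{x}∧{y}>"

# Mimic the fantasy above: from the premise <P∧Q>, separation
# yields P and Q; joining them in the other order gives <Q∧P>.
x, y = split_conjunction("<P∧Q>")
print(joining(y, x))  # <Q∧P>
```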
 
 
As you might guess from the recursion terminology "push" and "pop", the fantasy rule can be used recursively-thus, there can be fantasies within fantasies, thrice-nested fantasies, and so on. This means that there are all sorts of "levels of reality", just as in nested stories or movies. When you pop out of a movie-within-a-movie, you feel for a moment as if you had reached the real world, though you are still one level away from the top. Similarly, when you pop out of a fantasy-within-a-fantasy, you are in a "realer" world than you had been, but you are still one level away from the top.
 
 
Now a "No Smoking" sign inside a movie theater does not apply to the characters in the movie-there is no carry-over from the real world into the fantasy world, in movies. But in the Propositional Calculus, there is carry-over from the real world into the fantasies; there is even carry-over from a fantasy to fantasies inside it. This is formalized by the following rule:

CARRY-OVER RULE: Inside a fantasy, any theorem from the "reality" one level higher can be brought in and used.

It is as if a "No Smoking" sign in a theater applied not only to all the moviegoers, but also to all the actors in the movie, and, by repetition of the same idea, to anyone inside multiply nested movies! (Warning: There is no carry-over in the reverse direction: theorems inside fantasies cannot be exported to the exterior! If it weren't for this fact, you could write anything as the first line of a fantasy, and then lift it out into the real world as a theorem.)
 
 
To show how carry-over works, and to show how the fantasy rule can be used recursively, we present the following derivation:

[                    push
  P                  premise of outer fantasy
  [                  push again
    Q                premise of inner fantasy
    P                carry-over of P into inner fantasy
    <P∧Q>            joining
  ]                  pop out of inner fantasy, regain outer fantasy
  <Q⊃<P∧Q>>          fantasy rule
]                    pop out of outer fantasy, reach real world!
<P⊃<Q⊃<P∧Q>>>        fantasy rule

Note that I've indented the outer fantasy once, and the inner fantasy twice, to emphasize the nature of these nested "levels of reality". One way to look at the fantasy rule is to say that an observation made about the system is inserted into the system. Namely, the theorem <x⊃y> which gets produced can be thought of as a representation inside the system of the statement about the system "If x is a theorem, then y is too". To be specific, the intended interpretation for <P⊃Q> is "if P, then Q", or equivalently, "P implies Q".
 
 
Now Lewis Carroll's Dialogue was all about "if-then" statements. In particular, Achilles had a lot of trouble in persuading the Tortoise to accept the second clause of an "if-then" statement, even when the "if-then" statement itself was accepted, as well as its first clause. The next rule allows you to infer the second "clause" of a '⊃'-string, provided that the '⊃'-string itself is a theorem, and that its first "clause" is also a theorem.
 
 
RULE OF DETACHMENT: If x and <x⊃y> are both theorems, then y is a theorem.

Incidentally, this rule is often called "Modus Ponens", and the fantasy rule is often called the "Deduction Theorem".
 
 
We might as well let the cat out of the bag at this point, and reveal the "meanings" of the rest of the symbols of our new system. In case it is not yet apparent, the symbol '∧' is meant to be acting isomorphically to the normal, everyday word 'and'. The symbol '~' represents the word 'not'-it is a formal sort of negation. The angle brackets '<' and '>' are groupers-their function being very similar to that of parentheses in ordinary algebra. The main difference is that in algebra, you have the freedom to insert parentheses or to leave them out, according to taste and style, whereas in a formal system, such anarchic freedom is not tolerated. The symbol '∹' represents the word 'or' ('vel' is a Latin word for 'or'). The 'or' that is meant is the so-called inclusive 'or', which means that the interpretation of <x∹y> is "either x or y-or both".
 
 
The only symbols we have not interpreted are the atoms. An atom has no single interpretation-it may be interpreted by any sentence of English (it must continue to be interpreted by the same sentence if it occurs multiply within a string or derivation). Thus, for example, the well-formed string <P∧~P> could be interpreted by the compound sentence

This mind is Buddha, and this mind is not Buddha.

Now let us look at each of the theorems so far derived, and interpret them. The first one was <P⊃~~P>. If we keep the same interpretation for P, we have the following interpretation:

If this mind is Buddha, then it is not the case that this mind is not Buddha.

Note how I rendered the double negation. It is awkward to repeat a negation in any natural language, so one gets around it by using two different ways of expressing negation. The second theorem we derived was <<P∧Q>⊃<Q∧P>>. If we let Q be interpreted by the sentence "This flax weighs three pounds", then our theorem reads as follows:

If this mind is Buddha and this flax weighs three pounds, then this flax weighs three pounds and this mind is Buddha.
 
 
The third theorem was <P⊃<Q⊃<P∧Q>>>. This one goes into the following nested "if-then" form:

If this mind is Buddha, then, if this flax weighs three pounds, then this mind is Buddha and this flax weighs three pounds.
You probably have noticed that each theorem, when interpreted, says something absolutely trivial and self-evident. (Sometimes they are so self-evident that they sound vacuous and-paradoxically enough-confusing or even wrong!) This may not be very impressive, but just remember that there are plenty of falsities out there which could have been produced-yet they weren't. This system-the Propositional Calculus-steps neatly from truth to truth, carefully avoiding all falsities, just as a person who is concerned with staying dry will step carefully from one stepping-stone in a creek to the next, following the layout of stepping-stones no matter how twisted and tricky it might be. What is impressive is that-in the Propositional Calculus-the whole thing is done purely typographically. There is nobody down "in there", thinking about the meaning of the strings. It is all done mechanically, thoughtlessly, rigidly, even stupidly.
 
 
We have not yet stated all the rules of the Propositional Calculus. The complete set of rules is listed below, including the three new ones.

JOINING RULE: If x and y are theorems, then <x∧y> is a theorem.
SEPARATION RULE: If <x∧y> is a theorem, then both x and y are theorems.
DOUBLE-TILDE RULE: The string '~~' can be deleted from any theorem. It can also be inserted into any theorem, provided that the resulting string is itself well-formed.
FANTASY RULE: If y can be derived when x is assumed to be a theorem, then <x⊃y> is a theorem.
CARRY-OVER RULE: Inside a fantasy, any theorem from the "reality" one level higher can be brought in and used.
RULE OF DETACHMENT: If x and <x⊃y> are both theorems, then y is a theorem.
CONTRAPOSITIVE RULE: <x⊃y> and <~y⊃~x> are interchangeable.
DE MORGAN'S RULE: <~x∧~y> and ~<x∨y> are interchangeable.
SWITCHEROO RULE: <x∨y> and <~x⊃y> are interchangeable.
 
 
(The Switcheroo rule is named after Q. q. Switcheroo, an Albanian railroad engineer who worked in logic on the siding.) By "interchangeable" in the foregoing rules, the following is meant: If an expression of one form occurs as either a theorem or part of a theorem, the other form may be substituted, and the resulting string will also be a theorem. It must be kept in mind that the symbols 'x' and 'y' always stand for well-formed strings of the system.
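
Since the rules are purely typographical, a few of them are easy to mimic in code. The following is my own illustrative sketch (not part of the original text), representing formulas as nested tuples, e.g. ("and", "P", ("not", "P")) for <P∧~P>:

```python
# Sketch of a few Propositional Calculus rules over formulas encoded as
# nested tuples: ("and", x, y), ("or", x, y), ("implies", x, y), ("not", x).

def joining(x, y):
    # JOINING RULE: from theorems x and y, form <x∧y>
    return ("and", x, y)

def separation(t):
    # SEPARATION RULE: from <x∧y>, both x and y are theorems
    op, x, y = t
    assert op == "and"
    return x, y

def contrapositive(t):
    # CONTRAPOSITIVE RULE: <x⊃y> and <~y⊃~x> are interchangeable
    op, x, y = t
    assert op == "implies"
    return ("implies", ("not", y), ("not", x))

def switcheroo(t):
    # SWITCHEROO RULE: <x∨y> becomes <~x⊃y>
    op, x, y = t
    assert op == "or"
    return ("implies", ("not", x), y)

# The contrapositive of <P⊃Q> is <~Q⊃~P>:
print(contrapositive(("implies", "P", "Q")))
# ('implies', ('not', 'Q'), ('not', 'P'))
```

Each function consumes a formula of the required shape and produces what the corresponding rule licenses; the assert lines enforce the rule's precondition.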
 
 
Before we see these rules used inside derivations, let us look at some very short justifications for them. You can probably justify them to yourself better than my examples, which is why I give only a couple.
The contrapositive rule expresses explicitly a way of turning around conditional statements which we carry out unconsciously. For instance, the "Zentence"

If you are studying it, then you are far from the Way

means the same thing as

If you are close to the Way, then you are not studying it.
 
 
De Morgan's rule can be illustrated by our familiar sentence "The flag is not moving and the wind is not moving". If P symbolizes "the flag is moving", and Q symbolizes "the wind is moving", then the compound sentence is symbolized by <~P∧~Q>, which, according to De Morgan's rule, is interchangeable with ~<P∨Q>, whose interpretation would be "It is not true that either the flag or the wind is moving".
 
 
For the Switcheroo rule, consider the sentence "Either a cloud is hanging over the mountain, or the moonlight is penetrating the waves of the lake", which might be spoken, I suppose, by a wistful Zen master remembering a familiar lake which he can visualize mentally but cannot see. Now hang on to your seat, for the Switcheroo rule tells us that this is interchangeable with the thought "If a cloud is not hanging over the mountain, then the moonlight is penetrating the waves of the lake." This may not be enlightenment, but it is the best the Propositional Calculus has to offer.
 
 
Now let us apply these rules to a previous theorem, and see what we get. For instance, take the theorem <P⊃~~P>:

<P⊃~~P>      old theorem
<~~~P⊃~P>    contrapositive
<~P⊃~P>      double-tilde
<P∨~P>       switcheroo

This new theorem, when interpreted, says:
 
 
Either this mind is Buddha, or this mind is not Buddha.

Once again, the interpreted theorem, though perhaps less than mind-boggling, is at least true.
 
 
It is natural, when one reads theorems of the Propositional Calculus out loud, to interpret everything but the atoms. I call this semi-interpreting. For example, the semi-interpretation of <P∨~P> would be:

P or not P.

Despite the fact that P is not a sentence, the above semisentence still sounds true, because you can very easily imagine sticking any sentence in for P; and the form of the semi-interpreted theorem assures you that however you make your choice, the resulting sentence will be true. And that is the key idea of the Propositional Calculus: it produces theorems which, when semi-interpreted, are seen to be "universally true semisentences", by which is meant that no matter how you complete the interpretation, the final result will be a true statement.
 
Now we can do a more advanced exercise, based on a Zen koan called "Ganto's Ax". Here is how it began:

One day Tokusan told his student Ganto, "I have two monks who have been here for many years. Go and examine them." Ganto picked up an ax and went to the hut where the two monks were meditating. He raised the ax, saying, "If you say a word, I will cut off your heads; and if you do not say a word, I will also cut off your heads."1

If you say a word I will cut off this koan, and if you do not say a word, I will also cut off this koan, because I want you to translate some of it into our notation. Let us symbolize "you say a word" by P and "I will cut off your heads" by Q. Then Ganto's ax threat is symbolized by the string <<P⊃Q>∧<~P⊃Q>>. What if this ax threat were an axiom? Here is a fantasy to answer that question.

(1) [                push
(2) <<P⊃Q>∧<~P⊃Q>>   Ganto's axiom
(3) <P⊃Q>            separation
(4) <~Q⊃~P>          contrapositive
(5) <~P⊃Q>           separation
(6) <~Q⊃~~P>         contrapositive
(7) [                push again
(8) ~Q               premise
(9) <~Q⊃~P>          carry-over of line 4
(10) ~P              detachment
(11) <~Q⊃~~P>        carry-over of line 6
(12) ~~P             detachment (lines 8 and 11)
(13) <~P∧~~P>        joining
(14) ~<P∨~P>         De Morgan
(15) ]               pop once
(16) <~Q⊃~<P∨~P>>    fantasy rule
(17) <<P∨~P>⊃Q>      contrapositive
(18) [               push
(19) ~P              premise (also outcome)
(20) ]               pop
(21) <~P⊃~P>         fantasy rule
(22) <P∨~P>          switcheroo
(23) Q               detachment (lines 22 and 17)
(24) ]               pop out
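
The derivation establishes Q purely typographically. As an outside, semantic cross-check (my own sketch, not part of the formal system), one can brute-force all four truth assignments and confirm that whenever the ax threat <<P⊃Q>∧<~P⊃Q>> comes out true, Q comes out true as well:

```python
# Brute-force check: in every truth assignment where Ganto's ax threat
# <<P⊃Q>∧<~P⊃Q>> holds, Q holds too.
def implies(a, b):
    return (not a) or b

for P in (True, False):
    for Q in (True, False):
        threat = implies(P, Q) and implies(not P, Q)
        if threat:
            assert Q  # the heads come off either way
```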
 
 
The power of the Propositional Calculus is shown in this example. Why, in but two dozen steps, we have deduced Q: that the heads will be cut off! (Ominously, the rule last invoked was "detachment"...) It might seem superfluous to continue the koan now, since we know what must ensue... However, I shall drop my resolve to cut the koan off; it is a true Zen koan, after all. The rest of the incident is here related:

Both monks continued their meditation as if he had not spoken. Ganto dropped the ax and said, "You are true Zen students." He returned to Tokusan and related the incident. "I see your side well," Tokusan agreed, "but tell me, how is their side?" "Tōzan may admit them," replied Ganto, "but they should not be admitted under Tokusan."2

Do you see my side well? How is the Zen side?
 
 
The Propositional Calculus gives us a set of rules for producing statements which would be true in all conceivable worlds. That is why all of its theorems sound so simple-minded; it seems that they have absolutely no content! Looked at this way, the Propositional Calculus might seem to be a waste of time, since what it tells us is absolutely trivial. On the other hand, it does it by specifying the form of statements that are universally true, and this throws a new kind of light onto the core truths of the universe: they are not only fundamental, but also regular: they can be produced by one set of typographical rules. To put it another way, they are all "cut from the same cloth". You might consider whether the same could be said about Zen koans: could they all be produced by one set of typographical rules?
 
 
It is quite relevant here to bring up the question of a decision procedure. That is, does there exist any mechanical method to tell nontheorems from theorems? If so, that would tell us that the set of theorems of the Propositional Calculus is not only r.e., but also recursive. It turns out that there is an interesting decision procedure: the method of truth tables. It would take us a bit afield to present it here; you can find it in almost any standard book on logic. And what about Zen koans? Could there conceivably be a mechanical decision procedure which distinguishes genuine Zen koans from other things?
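
To give the flavor of that method (my own sketch in Python; the truth-table technique itself is standard): a string of the Propositional Calculus is a theorem exactly when its interpretation comes out true under every assignment of truth values to its atoms, and that check is mechanical:

```python
from itertools import product

def is_tautology(formula, atoms):
    # formula: a function from an assignment (dict of atom -> bool) to bool.
    # Try every combination of truth values; a theorem must survive them all.
    return all(formula(dict(zip(atoms, values)))
               for values in product([True, False], repeat=len(atoms)))

# <P∨~P> is a theorem, and indeed checks out:
print(is_tautology(lambda v: v["P"] or not v["P"], ["P"]))  # True
# P by itself is not a theorem:
print(is_tautology(lambda v: v["P"], ["P"]))                # False
```

The loop runs through 2^n rows for n atoms, which is exactly the finite table a logic text would have you fill in by hand.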
 
 
Up till now, we have only presumed that all theorems, when interpreted as indicated, are true statements. But do we know that that is the case? Can we prove it to be? This is just another way of asking whether the intended interpretations ('and' for '∧', etc.) merit being called the "passive meanings" of the symbols. One can look at this issue from two very different points of view, which might be called the "prudent" and "imprudent" points of view. I will now present those two sides as I see them, personifying them as "Prudence" and "Imprudence".
 
 
Prudence: We will only KNOW that all theorems come out true under the intended interpretation if we manage to PROVE it. That is the cautious, thoughtful way to proceed.

Imprudence: On the contrary. It is OBVIOUS that all theorems will come out true. If you doubt me, look again at the rules of the system. You will find that each rule makes a symbol act exactly as the word it represents ought to be used. For instance, the joining rule makes the symbol '∧' act as 'and' ought to act; the rule of detachment makes '⊃' act as it ought to, if it is to stand for 'implies', or 'if-then'; and so on. Unless you are like the Tortoise, you will recognize in each rule a codification of a pattern you use in your own thought patterns. So if you trust your own thought patterns, then you HAVE to believe that all theorems come out true! That's the way I see it. I don't need any further proof. If you think that some theorem comes out false, then presumably you think that some rule must be wrong. Show me which one.

Prudence: I'm not sure that there is any faulty rule, so I can't point one out to you. Still, I can imagine the following kind of scenario. You, following the rules, come up with a theorem, say x. Meanwhile I, also following the rules, come up with another theorem; it happens to be ~x. Can't you force yourself to conceive of that?

Imprudence: All right; let's suppose it happened. Why would it bother you? Or let me put it another way. Suppose that in playing with the MIU-system, I came up with a theorem x, and you came up with xU. Can you force yourself to conceive of that?

Prudence: Of course; in fact both MI and MIU are theorems.

Imprudence: Doesn't that bother you?

Prudence: Of course not. Your example is ridiculous, because MI and MIU are not CONTRADICTORY, whereas two strings x and ~x in the Propositional Calculus ARE contradictory.
 
 
Imprudence: Well, yes, provided you wish to interpret '~' as 'not'. But what would lead you to think that '~' should be interpreted as 'not'?

Prudence: The rules themselves. When you look at them, you realize that the only conceivable interpretation for '~' is 'not'; and likewise, the only conceivable interpretation for '∧' is 'and', etc.

Imprudence: In other words, you are convinced that the rules capture the meanings of those words?

Prudence: Precisely.

Imprudence: And yet you are still willing to entertain the thought that both x and ~x could be theorems? Why not also entertain the notion that hedgehogs are frogs, or that 1 equals 2, or that the moon is made of green cheese? I for one am not prepared even to consider whether such basic ingredients of my thought processes are wrong, because if I entertained that notion, then I would also have to consider whether my modes of analyzing the entire question are also wrong, and I would wind up in a total tangle.

Prudence: Your arguments are forceful... Yet I would still like to see a PROOF that all theorems come out true, or that x and ~x can never both be theorems.

Imprudence: You want a proof. I guess that means that you want to be more convinced that the Propositional Calculus is consistent than you are convinced of your own sanity. Any proof I could think of would involve mental operations of a greater complexity than anything in the Propositional Calculus itself. So what would it prove? Your desire for a proof of consistency of the Propositional Calculus makes me think of someone who is learning English and insists on being given a dictionary which defines all the simple words in terms of complicated ones...
 
 
This little debate shows the difficulty of trying to use logic and reasoning to defend themselves. At some point, you reach rock bottom, and there is no defense except loudly shouting, "I know I'm right!" Once again, we are up against the issue which Lewis Carroll so sharply set forth in his Dialogue: you can't go on defending your patterns of reasoning forever. There comes a point where faith takes over.
 
 
A system of reasoning can be compared to an egg. An egg has a shell which protects its insides. If you want to ship an egg somewhere, though, you don't rely on the shell. You pack the egg in some sort of container, chosen according to how rough you expect the egg's voyage to be. To be extra careful, you may put the egg inside several nested boxes. However, no matter how many layers of boxes you pack your egg in, you can imagine some cataclysm which could break the egg. But that doesn't mean that you'll never risk transporting your egg. Similarly, one can never give an ultimate, absolute proof that a proof in some system is correct. Of course, one can give a proof of a proof, or a proof of a proof of a proof, but the validity of the outermost system always remains an unproven assumption, accepted on faith. One can always imagine that some unsuspected subtlety will invalidate every single level of proof down to the bottom, and the "proven" result will be seen not to be correct after all. But that doesn't mean that mathematicians and logicians are constantly worrying that the whole edifice of mathematics might be wrong. On the other hand, when unorthodox proofs are proposed, or extremely lengthy proofs, or proofs generated by computers, then people do stop to think a bit about what they really mean by that quasi-sacred word "proven".
 
 
An excellent exercise for you at this point would be to go back to the Carroll Dialogue, and code the various stages of the debate into our notation, beginning with the original bone of contention:

Achilles: If you have <<A∧B>⊃Z>, and you also have <A∧B>, then surely you have Z.
Tortoise: Oh! You mean <<<<A∧B>⊃Z>∧<A∧B>>⊃Z>, don't you?

(Hint: Whatever Achilles considers a rule of inference, the Tortoise immediately flattens into a mere string of the system. If you use only the letters A, B, and Z, you will get a recursive pattern of longer and longer strings.)
 
 
When carrying out derivations in the Propositional Calculus, one quickly invents various types of shortcut, which are not strictly part of the system. For instance, if the string <Q∨~Q> were needed at some point, and <P∨~P> had been derived earlier, many people would proceed as if <Q∨~Q> had been derived, since they know that its derivation is an exact parallel to that of <P∨~P>. The derived theorem is treated as a "theorem schema", a mold for other theorems. This turns out to be a perfectly valid procedure, in that it always leads you to new theorems, but it is not a rule of the Propositional Calculus as we presented it. It is, rather, a derived rule. It is part of the knowledge which we have about the system. That this rule keeps you within the space of theorems needs proof, of course, but such a proof is not like a derivation inside the system. It is a proof in the ordinary, intuitive sense -- a chain of reasoning carried out in the I-mode. The theory about the Propositional Calculus is a "metatheory", and results in it can be called "metatheorems": Theorems about theorems. (Incidentally, note the peculiar capitalization in the phrase "Theorems about theorems". It is a consequence of our convention: metatheorems are Theorems (proven results) concerning theorems (derivable strings).)
 
 
In the Propositional Calculus, one could discover many metatheorems, or derived rules of inference. For instance, there is a De Morgan's Rule:

<~x∨~y> and ~<x∧y> are interchangeable.

If this were a rule of the system, it could speed up many derivations considerably. But if we prove that it is correct, isn't that good enough? Can't we use it just like a rule of inference, from then on?
There is no reason to doubt the correctness of this particular derived rule. But once you start admitting derived rules as part of your procedure in the Propositional Calculus, you have lost the formality of the system, since derived rules are derived informally, outside the system. Now formal systems were proposed as a way to exhibit every step of a proof explicitly, within one single, rigid framework, so that any mathematician could check another's work mechanically. But if you are willing to step outside of that framework at the drop of a hat, you might as well never have created it at all.
 
 
On the other hand, there is an alternative way out. Why not formalize the metatheory, too? That way, derived rules (metatheorems) would be theorems of a larger formal system, and it would be legitimate to look for shortcuts and derive them as theorems, that is, theorems of the formalized metatheory, which could then be used to speed up the derivations of theorems of the Propositional Calculus. This is an interesting idea, but as soon as it is suggested, one jumps ahead to think of metametatheories, and so on. It is clear that no matter how many levels you formalize, someone will eventually want to make shortcuts in the top level.
 
 
It might even be suggested that a theory of reasoning could be identical to its own metatheory, if it were worked out carefully. Then, it might seem, all levels would collapse into one, and thinking about the system would be just one way of working in the system! But it is not that easy. Even if a system can "think about itself", it still is not outside itself. You, outside the system, perceive it differently from the way it perceives itself. So there still is a metatheory, a view from outside, even for a theory which can "think about itself" inside itself. We will find that there are theories which can "think about themselves". In fact, we will soon see a system in which this happens completely accidentally, without our even intending it! And we will see what kinds of effects this produces. But for our study of the Propositional Calculus, we will stick with the simplest ideas: no mixing of levels.
 
 
Fallacies can result if you fail to distinguish carefully between working in the system (the M-mode) and thinking about the system (the I-mode). For example, it might seem perfectly reasonable to assume that, since <P∨~P> (whose semi-interpretation is "either P or not P") is a theorem, either P or ~P must be a theorem. But this is dead wrong: neither one of the latter pair is a theorem. In general, it is a dangerous practice to assume that symbols can be slipped back and forth between different levels; here, the language of the formal system and its metalanguage (English).
 
 
You have now seen one example of a system with a purpose: to represent part of the architecture of logical thought. The concepts which it handles are very few in number, and they are very simple, precise concepts. But the simplicity and precision of the Propositional Calculus are the kinds of features which make it appealing to mathematicians. There are two reasons for this. (1) It can be studied for its own properties, exactly as geometry studies simple, rigid shapes. Variants can be made on it, employing different symbols, rules of inference, axioms or axiom schemata, and so on. (Incidentally, the version of the Propositional Calculus here presented is related to one invented by G. Gentzen in the early 1930's. There are other versions in which only one rule of inference is used, detachment usually, and in which there are several axioms, or axiom schemata.) The study of ways to carry out propositional reasoning in elegant formal systems is an appealing branch of pure mathematics. (2) The Propositional Calculus can easily be extended to include other fundamental aspects of reasoning. Some of this will be shown in the next Chapter, where the Propositional Calculus is incorporated lock, stock and barrel into a much larger and deeper system in which sophisticated number-theoretical reasoning can be done.
 
 
The Propositional Calculus is very much like reasoning in some ways, but one should not equate its rules with the rules of human thought. A proof is something informal, or in other words a product of normal thought, written in a human language, for human consumption. All sorts of complex features of thought may be used in proofs, and, though they may "feel right", one may wonder if they can be defended logically. That is really what formalization is for. A derivation is an artificial counterpart of a proof, and its purpose is to reach the same goal but via a logical structure whose methods are not only all explicit, but also very simple.
 
 
If -- and this is usually the case -- it happens that a formal derivation is extremely lengthy compared with the corresponding "natural" proof, that is just too bad. It is the price one pays for making each step so simple. What often happens is that a derivation and a proof are "simple" in complementary senses of the word. The proof is simple in that each step "sounds right", even though one may not know just why; the derivation is simple in that each of its myriad steps is considered so trivial that it is beyond reproach, and since the whole derivation consists just of such trivial steps, it is supposedly error-free. Each type of simplicity, however, brings along a characteristic type of complexity. In the case of proofs, it is the complexity of the underlying system on which they rest -- namely, human language -- and in the case of derivations, it is their astronomical size, which makes them almost impossible to grasp.
 
 
Thus the Propositional Calculus should be thought of as part of a general method for synthesizing artificial proof-like structures. It does not, however, have much flexibility or generality. It is intended only for use in connection with mathematical concepts, which are themselves quite rigid. As a rather interesting example of this, let us make a derivation in which a very peculiar string is taken as a premise in a fantasy: <P∧~P>. At least its semi-interpretation is peculiar. The Propositional Calculus, however, does not think about semi-interpretations; it just manipulates strings typographically, and typographically there is really nothing peculiar about this string. Here is a fantasy with this string as its premise:
(1) [ push   
(2) <P∧~P> premise   
(3) P separation   
(4) ~P separation   
(5) [ push   
(6) ~Q premise   
(7) P carry-over line 3   
(8) ~~P double-tilde   
(9) ] pop   
(10) <~Q⊃~~P> fantasy   
(11) <~P⊃Q> contrapositive   
(12) Q detachment (Lines 4,11)   
(13) ] pop   
(14) <<P∧~P>⊃Q>   fantasy

Now this theorem has a very strange semi-interpretation:

P and not P together imply Q

Since Q is interpretable by any statement, we can loosely take the theorem to say that "From a contradiction, anything follows"! Thus, in systems based on the Propositional Calculus, contradictions cannot be contained; they infect the whole system like an instantaneous disease.
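
The truth-table method confirms the theorem's universal truth, vacuous though it is: the antecedent <P∧~P> is false under every assignment, so the conditional never fails. A quick mechanical check of my own:

```python
# <<P∧~P>⊃Q> holds under all assignments because its antecedent
# P∧~P is never true, so the implication is vacuously satisfied.
def implies(a, b):
    return (not a) or b

for P in (True, False):
    for Q in (True, False):
        assert implies(P and not P, Q)
print("vacuously true under all four assignments")
```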
 
 
This does not sound much like human thought. If you found a contradiction in your own thoughts, it's very unlikely that your whole mentality would break down. Instead, you would probably begin to question the beliefs or modes of reasoning which you felt had led to the contradictory thoughts. In other words, to the extent you could, you would step out of the systems inside you which you felt were responsible for the contradiction, and try to repair them. One of the least likely things for you to do would be to throw up your arms and cry, "Well, I guess that shows that I believe everything now!" As a joke, yes, but not seriously.
 
 
Indeed, contradiction is a major source of clarification and progress in all domains of life, and mathematics is no exception. When in times past a contradiction in mathematics was found, mathematicians would immediately seek to pinpoint the system responsible for it, to jump out of it, to reason about it, and to amend it. Rather than weakening mathematics, the discovery and repair of a contradiction would strengthen it. This might take time and a number of false starts, but in the end it would yield fruit. For instance, in the Middle Ages, the value of the infinite series

1 - 1 + 1 - 1 + 1 - ...

was hotly disputed. It was "proven" to equal 0, 1, 1/2, and perhaps other values. Out of such controversial findings came a fuller, deeper theory about infinite series.
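
The dispute is easy to recreate numerically (a modern aside of mine, not in the original text): the partial sums bounce between 1 and 0 forever, so no ordinary sum exists, while averaging the partial sums (Cesàro's method, one later resolution) settles on 1/2:

```python
# Partial sums of 1 - 1 + 1 - 1 + ... oscillate between 1 and 0; the
# running average of the partial sums (Cesàro summation) tends to 1/2.
partial_sums, total = [], 0
for n in range(1000):
    total += (-1) ** n
    partial_sums.append(total)

print(sorted(set(partial_sums)))                 # [0, 1]: no ordinary limit
cesaro = sum(partial_sums) / len(partial_sums)   # average of the partial sums
print(cesaro)                                    # 0.5
```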
 
 
A more relevant example is the contradiction right now confronting us, namely the discrepancy between the way we really think and the way the Propositional Calculus imitates us. This has been a source of discomfort for many logicians, and much creative effort has gone into trying to patch up the Propositional Calculus so that it would not act so stupidly and inflexibly. One attempt, put forth in the book Entailment by A. R. Anderson and N. Belnap,3 involves "relevant implication", which tries to make the symbol for "if-then" reflect genuine causality, or at least connect meanings. Consider the following theorems of the Propositional Calculus:

<P⊃<Q⊃P>>
<P⊃<Q∨~Q>>
<<P∧~P>⊃Q>
<<P⊃Q>∨<Q⊃P>>

They, and many others like them, all show that there need be no relationship at all between the first and second clauses of an if-then statement for it to be provable within the Propositional Calculus. In protest, "relevant implication" puts certain restrictions on the contexts in which the rules of inference can be applied. Intuitively, it says that "something can only be derived from something else if they have to do with each other". For example, line 10 in the derivation given above would not be allowed in such a system, and that would block the derivation of the theorem <<P∧~P>⊃Q>.
 
 
More radical attempts abandon completely the quest for completeness or consistency, and try to mimic human reasoning with all its inconsistencies. Such research no longer has as its goal to provide a solid underpinning for mathematics, but purely to study human thought processes.
 
 
Despite its quirks, the Propositional Calculus has some features to recommend itself. If one embeds it into a larger system (as we will do in the next Chapter), and if one is sure that the larger system contains no contradictions (and we will be), then the Propositional Calculus does all that one could hope: it provides valid propositional inferences, all that can be made. So if ever an incompleteness or an inconsistency is uncovered, we can be sure that it will be the fault of the larger system, and not of its subsystem which is the Propositional Calculus.
 
CHAPTER VIII

Typographical Number Theory
 
THREE EXAMPLES OF indirect self-reference are found in the Crab Canon. Achilles and the Tortoise both describe artistic creations they know, and, quite accidentally, those creations happen to have the same structure as the Dialogue they're in. (Imagine my surprise, when I, the author, noticed this!) Also, the Crab describes a biological structure and that, too, has the same property. Of course, one could read the Dialogue and understand it and somehow fail to notice that it, too, has the form of a crab canon. This would be understanding it on one level, but not on another. To see the self-reference, one has to look at the form, as well as the content, of the Dialogue.
Gödel's construction depends on describing the form, as well as the content, of strings of the formal system we shall define in this Chapter: Typographical Number Theory (TNT). The unexpected twist is that, because of the subtle mapping which Gödel discovered, the form of strings can be described in the formal system itself. Let us acquaint ourselves with this strange system with the capacity for wrapping around.
 
 
We'll begin by citing some typical sentences belonging to number theory; then we will try to find a set of basic notions in terms of which all our sentences can be rephrased. Those notions will then be given individual symbols. Incidentally, it should be stated at the outset that the term "number theory" will refer only to properties of positive integers and zero (and sets of such integers). These numbers are called the natural numbers. Negative numbers play no role in this theory. Thus the word "number", when used, will mean exclusively a natural number. And it is important, indeed vital, for you to keep separate in your mind the formal system (TNT) and the rather ill-defined but comfortable old branch of mathematics that is number theory itself; this I shall call "N". Some typical sentences of N, number theory, are:

(1) 5 is prime.
(2) 2 is not a square.
(3) 1729 is a sum of two cubes.
(4) No sum of two positive cubes is itself a cube.
(5) There are infinitely many prime numbers.
(6) 6 is even.
 
 
Now it may seem that we will need a symbol for each notion such as "prime" or "cube" or "positive", but those notions are really not primitive. Primeness, for instance, has to do with the factors which a number has, which in turn has to do with multiplication. Cubeness as well is defined in terms of multiplication. Let us rephrase the sentences, then, in terms of what seem to be more elementary notions.

(1') There do not exist numbers a and b, both greater than 1, such that 5 equals a times b.
(2') There does not exist a number b, such that b times b equals 2.
(3') There exist numbers b and c such that b times b times b, plus c times c times c, equals 1729.
(4') For all numbers b and c, greater than 0, there is no number a such that a times a times a equals b times b times b plus c times c times c.
(5') For each number a, there exists a number b, greater than a, with the property that there do not exist numbers c and d, both greater than 1, such that b equals c times d.
(6') There exists a number e such that 2 times e equals 6.

This analysis has gotten us a long ways towards the basic elements of the language of number theory. It is clear that a few phrases reappear over and over:

for all numbers b
there exists a number b, such that
greater than
equals
times
plus
0, 1, 2, ...

Most of these will be granted individual symbols. An exception is "greater than", which can be further reduced. In fact, the sentence "a is greater than b" becomes:

there exists a number c, not equal to 0, such that a equals b plus c.
 
 
We will not have a distinct symbol for each natural number. Instead, we have a very simple, uniform way of giving a compound symbol to each natural number, very much as we did in the pq-system. Here is our notation for natural numbers:
 
 
zero:   0
one:    S0
two:    SS0
three:  SSS0
etc.

The symbol S has an interpretation: "the successor of". Hence, the interpretation of SS0 is literally "the successor of the successor of zero". Strings of this form are called numerals.
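
The numeral scheme is mechanical enough to code directly; here is a small sketch of mine for converting between natural numbers and TNT numerals:

```python
def numeral(n):
    # The TNT numeral for n is n copies of 'S' followed by '0'.
    return "S" * n + "0"

def value(s):
    # Recover the natural number from a numeral string.
    assert s.endswith("0") and set(s[:-1]) <= {"S"}
    return len(s) - 1

print(numeral(0))    # 0
print(numeral(3))    # SSS0
print(value("SS0"))  # 2
```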
 
 
Clearly, we need a way of referring to unspecified, or variable, numbers. For that, we will use the letters a, b, c, d, e. But five will not be enough. We need an unlimited supply of them, just as we had of atoms in the Propositional Calculus. We will use a similar method for making more variables: tacking on any number of primes. (Note: Of course the symbol ' (read "prime") is not to be confused with prime numbers!) For instance:

e
d'
c''
b'''
a''''

are all variables. In a way it is a luxury to use the first five letters of the alphabet when we could get away with just a and the prime. Later on, I will actually drop b, c, d, and e, which will result in a sort of "austere" version of TNT: austere in the sense that it is a little harder to decipher complex formulas. But for now we'll be luxurious.
 
 
Now what about addition and multiplication? Very simple: we will use the ordinary symbols '+' and '·'. However, we will also introduce a parenthesizing requirement (we are now slowly slipping into the rules which define well-formed strings of TNT). To write "b plus c" and "b times c", for instance, we use the strings

(b+c)
(b·c)

There is no laxness about such parentheses; to violate the convention is to produce a non-well-formed formula. ("Formula"? I use the term instead of "string" because it is conventional to do so. A formula is no more and no less than a string of TNT.)
 
 
Incidentally, addition and multiplication are always to be thought of as binary operations -- that is, they unite precisely two numbers, never three or more. Hence, if you wish to translate "1 plus 2 plus 3", you have to decide which of the following two expressions you mean:
 
 
(S0+(SS0+SSS0))
((S0+SS0)+SSS0)

The next notion we'll symbolize is equals. That is very simple: we use '='. The advantage of taking over the standard symbol used in N -- nonformal number theory -- is obvious: easy legibility. The disadvantage is very much like the disadvantage of using the words "point" and "line" in a formal treatment of geometry: unless one is very conscious and careful, one may blur the distinction between the familiar meaning and the strictly rule-governed behavior of the formal symbol. In discussing geometry, I distinguished between the everyday word and the formal term by capitalizing the formal term: thus, in elliptical geometry, a POINT was the union of two ordinary points. Here, there is no such distinction; hence mental effort is needed not to confuse a symbol with all of the associations it is laden with. As I said earlier, with reference to the pq-system: the string --- is not the number 3, but it acts isomorphically to 3, at least in the context of additions. Similar remarks go for the string SSS0.
 
 
All the symbols of the Propositional Calculus except the letters used in making atoms (P, Q, and R) will be used in TNT, and they retain their interpretations. The role of atoms will be played by strings which, when interpreted, are statements of equality, such as S0=SS0 or (S0·S0)=S0. Now, we have the equipment to do a fair amount of translation of simple sentences into the notation of TNT:

2 plus 3 equals 4:               (SS0+SSS0)=SSSS0
2 plus 2 is not equal to 3:      ~(SS0+SS0)=SSS0
If 1 equals 0, then 0 equals 1:  <S0=0⊃0=S0>

The first of these strings is an atom; the rest are compound formulas. (Warning: The 'and' in the phrase "1 and 1 make 2" is just another word for 'plus', and must be represented by '+' (and the requisite parentheses).)
 
 
All the well-formed formulas above have the property that their interpretations are sentences which are either true or false. There are, however, well-formed formulas which do not have that property, such as this one:

(b+S0)=SS0

Its interpretation is "b plus 1 equals 2". Since b is unspecified, there is no way to assign a truth value to the statement. It is like an out-of-context statement with a pronoun, such as "she is clumsy". It is neither true nor false; it is waiting for you to put it into a context. Because it is neither true nor false, such a formula is called open, and the variable b is called a free variable.
 
 
One way of changing an open formula into a closed formula, or sentence, is by prefixing it with a quantifier: either the phrase "there exists a number b such that ...", or the phrase "for all numbers b". In the first instance, you get the sentence

There exists a number b such that b plus 1 equals 2.

Clearly this is true. In the second instance, you get the sentence

For all numbers b, b plus 1 equals 2.

Clearly this is false. We now introduce symbols for both of these quantifiers. These sentences are translated into TNT-notation as follows:

∃b:(b+S0)=SS0    ('∃' stands for 'exists'.)
∀b:(b+S0)=SS0    ('∀' stands for 'all'.)
 
 
It is very important to note that these statements are no longer about unspecified numbers; the first one is an assertion of existence, and the second one is a universal assertion. They would mean the same thing, even if written with c instead of b:

∃c:(c+S0)=SS0
∀c:(c+S0)=SS0

A variable which is under the dominion of a quantifier is called a quantified variable. The following two formulas illustrate the difference between free variables and quantified variables:

(b·b)=SS0        (open)
~∃b:(b·b)=SS0    (closed; a sentence of TNT)
 
 
The first one expresses a property which might be possessed by some natural number. Of course, no natural number has that property. And that is precisely what is expressed by the second one. It is very important to understand this difference between a string with a free variable, which expresses a property, and a string where the variable is quantified, which expresses a truth or falsity. The English translation of a formula with at least one free variable -- an open formula -- is called a predicate. It is a sentence without a subject (or a sentence whose subject is an out-of-context pronoun). For instance,

"is a sentence without a subject"
"would be an anomaly"
"runs backwards and forwards simultaneously"
"improvised a six-part fugue on demand"

are nonarithmetical predicates. They express properties which specific entities might or might not possess. One could as well stick on a "dummy subject", such as "so-and-so". A string with free variables is like a predicate with "so-and-so" as its subject. For instance,

(S0+S0)=b

is like saying "1 plus 1 equals so-and-so". This is a predicate in the variable b. It expresses a property which the number b might have. If one were to substitute various numerals for b, one would get a succession of formulas, most of which would express falsehoods. Here is another example of the difference between open formulas and sentences:

∀b:∀c:(b+c)=(c+b)

The above formula is a sentence representing, of course, the commutativity of addition. On the other hand,

∀c:(b+c)=(c+b)

is an open formula, since b is free. It expresses a property which the unspecified number b might or might not have -- namely, of commuting with all numbers c.
 
 
This completes the vocabulary with which we will express all number-theoretical statements! It takes considerable practice to get the hang of expressing complicated statements of N in this notation, and conversely, of figuring out the meaning of well-formed formulas. For this reason we return to the six sample sentences given at the beginning, and work out their translations into TNT. By the way, don't think that the translations given below are unique -- far from it. There are many -- infinitely many -- ways to express each one.
 
 
Let us begin with the last one: "6 is even". This we rephrased in terms of more primitive notions as "There exists a number e such that 2 times e equals 6". This one is easy:

∃e:(SS0·e)=SSSSSS0

Note the necessity of the quantifier; it simply would not do to write

(SS0·e)=SSSSSS0

alone. This string's interpretation is of course neither true nor false; it just expresses a property which the number e might have. It is curious that, since we know multiplication is commutative, we might easily have written

∃e:(e·SS0)=SSSSSS0

instead. Or, knowing that equality is a symmetrical relation, we might have chosen to write the sides of the equation in the opposite order:

∃e:SSSSSS0=(SS0·e)

Now these three translations of "6 is even"
are quite different strings, and it is by no means obvious that theoremhood of any one of them is tied to theoremhood of any of the others. (Similarly, the fact that --p-q--- was a theorem had very little to do with the fact that its "equivalent" string -p--q--- was a theorem. The equivalence lies in our minds, since, as humans, we almost automatically think about interpretations, not structural properties of formulas.)

We can dispense with sentence 2: "2 is not a square", almost immediately:

~∃b:(b·b)=SS0

However, once again, we find an ambiguity. What if we had chosen to write it this way?

∀b:~(b·b)=SS0

The first way says, "It is not the case that there exists a number b with the property that b's square is 2", while the second way says, "For all numbers b, it is not the case that b's square is 2." Once again, to us, they are conceptually equivalent -- but to TNT, they are distinct strings.
 
 
Let us proceed to sentence 3: "1729 is a sum of two cubes." This one will involve two existential quantifiers, one after the other, as follows:

∃b:∃c:SSSSS........SSSSS0=(((b·b)·b)+((c·c)·c))
      (1729 S's in all)

There are alternatives galore. Reverse the order of the quantifiers; switch the sides of the equation; change the variables to d and e; reverse the addition; write the multiplications differently; etc., etc. However, I prefer the following two translations of the sentence:

∃b:∃c:(((SSSSSSSSSS0·SSSSSSSSSS0)·SSSSSSSSSS0)+((SSSSSSSSS0·SSSSSSSSS0)·SSSSSSSSS0))=(((b·b)·b)+((c·c)·c))

and

∃b:∃c:(((SSSSSSSSSSSS0·SSSSSSSSSSSS0)·SSSSSSSSSSSS0)+((S0·S0)·S0))=(((b·b)·b)+((c·c)·c))

Do you see why?
 
 
Now let us tackle the related sentence 4: "No sum of two positive cubes is itself a cube". Suppose that we wished merely to state that 7 is not a sum of two positive cubes. The easiest way to do this is by negating the formula which asserts that 7 is a sum of two positive cubes. This will be just like the preceding sentence involving 1729, except that we have to add in the proviso of the cubes being positive. We can do this with a trick: prefix the variables with the symbol S, as follows:

∃b:∃c:SSSSSSS0=(((Sb·Sb)·Sb)+((Sc·Sc)·Sc))

You see, we are cubing not b and c, but their successors, which must be positive, since the smallest value which either b or c can take on is zero. Hence the right-hand side represents a sum of two positive cubes. Incidentally, notice that the phrase "there exist numbers b and c such that ...", when translated, does not involve the symbol '∧', which stands for 'and'. That symbol is used for connecting entire well-formed strings, not for joining two quantifiers.
 
 
Now that we have translated "7 is a sum of two positive cubes", we wish to negate it. That simply involves prefixing the whole thing by a single tilde. (Note: you should not negate each quantifier, even though the desired phrase runs "There do not exist numbers b and c such that ...".) Thus we get:

~∃b:∃c:SSSSSSS0=(((Sb·Sb)·Sb)+((Sc·Sc)·Sc))

Now our original goal was to assert this property, not of the number 7, but of all cubes. Therefore, let us replace the numeral SSSSSSS0 by the string ((a·a)·a), which is the translation of "a cubed":

~∃b:∃c:((a·a)·a)=(((Sb·Sb)·Sb)+((Sc·Sc)·Sc))

At this stage, we are in possession of an open formula, since a is still free. This formula expresses a property which a number a might or might not have -- and it is our purpose to assert that all numbers do have that property. That is simple -- just prefix the whole thing with a universal quantifier:

∀a:~∃b:∃c:((a·a)·a)=(((Sb·Sb)·Sb)+((Sc·Sc)·Sc))

An equally good translation would be this:

~∃a:∃b:∃c:((a·a)·a)=(((Sb·Sb)·Sb)+((Sc·Sc)·Sc))

In austere TNT, we could use a' instead of b, and a'' instead of c, and the formula would become:

~∃a:∃a':∃a'':((a·a)·a)=(((Sa'·Sa')·Sa')+((Sa''·Sa'')·Sa''))
 
 
What about sentence 1: "5 is prime"? We had reworded it in this way: "There do not exist numbers a and b, both greater than 1, such that 5 equals a times b". We can slightly modify it, as follows: "There do not exist numbers b and c such that 5 equals b plus 2, times c plus 2". This is another trick -- since b and c are restricted to natural number values, this is an adequate way to say the same thing. Now "b plus 2" could be translated into (b+SS0), but there is a shorter way to write it -- namely, SSb. Likewise, "c plus 2" can be written SSc. Now, our translation is extremely concise:

~∃b:∃c:SSSSS0=(SSb·SSc)

Without the initial tilde, it would be an assertion that two natural numbers do exist, which, when augmented by 2, have a product equal to 5. With the tilde in front, that whole statement is denied, resulting in an assertion that 5 is prime.
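The plus-2 trick is mechanical enough to automate. Here is a small sketch of my own (not anything in TNT itself) that builds the primality string for any number, together with a bounded numeric check of the string's interpretation:

```python
def numeral(n):
    return "S" * n + "0"

def prime_formula(n):
    # ~∃b:∃c: n = (b+2)·(c+2), with "b plus 2" written SSb and
    # "c plus 2" written SSc, as in the text.
    return "~∃b:∃c:" + numeral(n) + "=(SSb·SSc)"

def holds(n, bound=100):
    # Interpretation of the formula, with the two quantifiers checked
    # only up to an arbitrary bound (a sketch, not a decision procedure).
    return not any(n == (b + 2) * (c + 2)
                   for b in range(bound) for c in range(bound))
```

For n = 5 this reproduces the string above. Note that, read literally, the formula also comes out true for 0 and 1, since no product of two numbers both at least 2 can be that small.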
 
 
If we wanted to assert that d plus e plus 1, rather than 5, is prime, the most economical way would be to replace the numeral for 5 by the string (d+Se):

~∃b:∃c:(d+Se)=(SSb·SSc)

Once again, we have an open formula, one whose interpretation is neither a true nor a false sentence, but just an assertion about two unspecified numbers, d and e. Notice that the number represented by the string (d+Se) is necessarily greater than d, since one has added to d an unspecified but definitely positive amount. Therefore, if we existentially quantify over the variable e, we will have a formula which asserts that:

There exists a number which is greater than d and which is prime.

∃e:~∃b:∃c:(d+Se)=(SSb·SSc)

Well, all we have left to do now is to assert that this property actually obtains, no matter what d is. The way to do that is to universally quantify over the variable d:

∀d:∃e:~∃b:∃c:(d+Se)=(SSb·SSc)

That's the translation of sentence 5!
 
 
This completes the exercise of translating all six typical number-theoretical sentences. However, it does not necessarily make you an expert in the notation of TNT. There are still some tricky issues to be mastered. The following six well-formed formulas will test your understanding of TNT notation. What do they mean? Which ones are true (under interpretation, of course), and which ones are false? (Hint: the way to tackle this exercise is to move leftwards. First, translate the atom; next, figure out what adding a single quantifier or a tilde does; then move leftwards, adding another quantifier or tilde; then move leftwards again, and do the same.)

~∀c:∃b:(SS0·b)=c
∀c:~∃b:(SS0·b)=c
∀c:∃b:~(SS0·b)=c
~∃b:∀c:(SS0·b)=c
∃b:~∀c:(SS0·b)=c
∃b:∀c:~(SS0·b)=c

(Second hint: Either four of them are true and two false, or four of them are false and two true.)
 
 
At this juncture, it is worthwhile pausing for breath and contemplating what it would mean to have a formal system that could sift out the true formulas from the false ones. This system would treat all these strings -- which look like statements -- as designs having form, but no content. And this system would be like a sieve through which could pass only designs with a special style -- the "style of truth". If you yourself have gone through the six formulas above, and have separated the true from the false by thinking about meaning, you will appreciate the subtlety that any system would have to have, that could do the same thing -- but typographically! The boundary separating the set of true statements from the set of false statements (as written in the TNT-notation) is anything but straight; it is a boundary with many treacherous curves (recall Fig. 18), a boundary of which mathematicians have delineated stretches, here and there, working over hundreds of years. Just think what a coup it would be to have a typographical method which was guaranteed to place any formula on the proper side of the border!
 
 
It is useful to have a table of Rules of Formation for well-formed formulas. This is provided below. There are some preliminary stages, defining numerals, variables, and terms. Those three classes of strings are ingredients of well-formed formulas, but are not in themselves well-formed. The smallest well-formed formulas are the atoms; then there are ways of compounding atoms. Many of these rules are recursive lengthening rules, in that they take as input an item of a given class and produce a longer item of the same class. In this table, I use 'x' and 'y' to stand for well-formed formulas, and 's', 't', and 'u' to stand for other kinds of TNT-strings. Needless to say, none of these five symbols is itself a symbol of TNT.

NUMERALS.
0 is a numeral.
A numeral preceded by S is also a numeral.
Examples: 0   S0   SS0   SSS0   SSSS0   SSSSS0
 
 
VARIABLES.
a is a variable. If we're not being austere, so are b, c, d, and e.
A variable followed by a prime is also a variable.
Examples: a   b'   c''   d'''   a''''

TERMS.
All numerals and variables are terms.
A term preceded by S is also a term.
If s and t are terms, then so are (s+t) and (s·t).
Examples: 0   b   SSa'   (S0·(SS0+c))   S(Sa·(Sb·Sc))

TERMS may be divided into two categories:
(1) DEFINITE terms. These contain no variables.
Examples: 0   (S0+S0)   SS((SS0·SS0)+(S0·S0))
(2) INDEFINITE terms. These contain variables.
Examples: b   Sa   (b+S0)   (((S0+S0)+S0)+e)

The above rules tell how to make parts of well-formed formulas; the remaining rules tell how to make complete well-formed formulas.

ATOMS.
If s and t are terms, then s=t is an atom.
Examples: S0=0   (SS0+SS0)=SSSS0   S(b+c)=((c·d)·e)
If an atom contains a variable u, then u is free in it. Thus there are four free variables in the last example.

NEGATIONS.
A well-formed formula preceded by a tilde is well-formed.
Examples: ~S0=0   ~∃b:(b+b)=S0   ~<0=0⊃S0=0>   ~b=S0
The quantification status of a variable (which says whether the variable is free or quantified) does not change under negation.

COMPOUNDS.
If x and y are well-formed formulas, and provided that no variable which is free in one is quantified in the other, then the following are all well-formed formulas: <x∧y>, <x√y>, <x⊃y>.
Examples: <0=0∧~0=0>   <b=b√~∃c:c=b>   <S0=0⊃∀c:~∃b:(b+b)=c>
The quantification status of a variable doesn't change here.

QUANTIFICATIONS.
If u is a variable, and x is a well-formed formula in which u is free, then the following strings are well-formed formulas: ∃u:x and ∀u:x.
Examples: ∀b:<b=b√~∃c:c=b>   ∀c:~∃b:(b+b)=c   ~∃c:Sc=d

OPEN FORMULAS contain at least one free variable.
Examples: ~c=c   b=b   <∀b:b=b∧~c=c>

CLOSED FORMULAS (SENTENCES) contain no free variables.
Examples: S0=0   ~∀d:d=0   ∃c:<∀b:b=b∧~c=c>
 
 
This completes the table of Rules of Formation for the well-formed formulas of TNT.
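Since the Rules of Formation are purely typographical, they can be checked mechanically. Here is a minimal recognizer of my own (a sketch, not part of the book) for the three preliminary classes -- numerals, variables, and terms:

```python
def is_numeral(s):
    # Any number of S's followed by 0.
    return s.endswith("0") and set(s[:-1]) <= {"S"}

def is_variable(s):
    # One of a..e followed by any number of primes.
    return len(s) >= 1 and s[0] in "abcde" and set(s[1:]) <= {"'"}

def is_term(s):
    if is_numeral(s) or is_variable(s):
        return True
    if s.startswith("S"):                      # a term preceded by S
        return is_term(s[1:])
    if s.startswith("(") and s.endswith(")"):  # (s+t) or (s·t)
        depth = 0
        for i, ch in enumerate(s[1:-1], 1):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch in "+·" and depth == 0:
                return is_term(s[1:i]) and is_term(s[i + 1:-1])
    return False
```

Note how each recursive call mirrors one of the lengthening rules in the table.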
 
 
And now, a few practice exercises for you, to test your understanding of the notation of TNT. Try to translate the first four of the following N-sentences into TNT-sentences, and the last one into an open well-formed formula.

All natural numbers are equal to 4.
There is no natural number which equals its own square.
Different natural numbers have different successors.
If 1 equals 0, then every number is odd.
b is a power of 2.

The last one you may find a little tricky. But it is nothing, compared to this one:

b is a power of 10.

Strangely, this one takes great cleverness to render in our notation. I would caution you to try it only if you are willing to spend hours and hours on it -- and if you know quite a bit of number theory!
 
 
This concludes the exposition of the notation of TNT; however, we are still left with the problem of making TNT into the ambitious system which we have described. Success would justify the interpretations which we have given to the various symbols. Until we have done that, however, those particular interpretations are no more justified than the "horse-apple-happy" interpretations were for the pq-system's symbols. Someone might suggest the following way of constructing TNT: (1) Do not have any rules of inference; they are unnecessary, because (2) we take as axioms all true statements of number theory (as written in TNT-notation). What a simple prescription! Unfortunately it is as empty as one's instantaneous reaction says it is. Part (2) is, of course, not a typographical description of strings. The whole purpose of TNT is to figure out if and how it is possible to characterize the true strings typographically.
 
 
Thus we will follow a more difficult route than the suggestion above; we will have axioms and rules of inference. Firstly, as was promised, all of the rules of the Propositional Calculus are taken over into TNT. Therefore, one theorem of TNT will be this one:

<S0=0√~S0=0>

which can be derived in the same way as <P√~P> was derived.
 
 
Before we give more rules, let us give the five axioms of TNT:

Axiom 1: ∀a:~Sa=0
Axiom 2: ∀a:(a+0)=a
Axiom 3: ∀a:∀b:(a+Sb)=S(a+b)
Axiom 4: ∀a:(a·0)=0
Axiom 5: ∀a:∀b:(a·Sb)=((a·b)+a)

(In the austere versions, use a' instead of b.) All of them are very simple to understand. Axiom 1 states a special fact about the number 0; Axioms 2 and 3 are concerned with the nature of addition; Axioms 4 and 5 are concerned with the nature of multiplication, and in particular its relation to addition.
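As a quick sanity check on the intended interpretations, one can verify numerically (over small numbers only, in a sketch of my own) that each axiom's interpretation holds, reading Sx as "x plus 1":

```python
def check_axioms(limit=10):
    """Check the interpretations of the five axioms for all a, b < limit."""
    for a in range(limit):
        assert (a + 1) != 0            # Axiom 1: ∀a:~Sa=0
        assert a + 0 == a              # Axiom 2: ∀a:(a+0)=a
        assert a * 0 == 0              # Axiom 4: ∀a:(a·0)=0
        for b in range(limit):
            assert a + (b + 1) == (a + b) + 1   # Axiom 3
            assert a * (b + 1) == (a * b) + a   # Axiom 5
    return True
```

This checks only finitely many instances, of course; it is the interpretations that are being tested, not the formal strings.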
 
 
By the way, the interpretation of Axiom 1 -- "Zero is not the successor of any natural number" -- is one of five famous properties of natural numbers first explicitly recognized by the mathematician and logician Giuseppe Peano, in 1889. In setting out his postulates, Peano was following the path of Euclid in this way: he made no attempt to formalize the principles of reasoning, but tried to give a small set of properties of natural numbers from which everything else could be derived by reasoning. Peano's attempt might thus be considered "semiformal". Peano's work had a significant influence, and thus it would be good to show Peano's five postulates. Since the notion of "natural number" is the one which Peano was attempting to define, we will not use the familiar term "natural number", which is laden with connotations. We will replace it with the undefined term djinn, a word which comes fresh and free of connotations to our mind. Then Peano's five postulates place five restrictions on djinns. There are two other undefined terms: Genie, and meta. I will let you figure out for yourself what usual concept each of them is supposed to represent. The five Peano postulates:

(1) Genie is a djinn.
(2) Every djinn has a meta (which is also a djinn).
(3) Genie is not the meta of any djinn.
(4) Different djinns have different metas.
(5) If Genie has X, and each djinn relays X to its meta, then all djinns get X.

In light of the lamps of the Little Harmonic Labyrinth, we should name the set of all djinns "GOD". This harks back to a celebrated statement by the German mathematician and logician Leopold Kronecker, archenemy of Georg Cantor: "God made the natural numbers; all the rest is the work of man."
 
 
You may recognize Peano's fifth postulate as the principle of mathematical induction -- another term for a hereditary argument. Peano hoped that his five restrictions on the concepts "Genie", "djinn", and "meta" were so strong that if two different people formed images in their minds of the concepts, the two images would have completely isomorphic structures. For example, everybody's image would include an infinite number of distinct djinns. And presumably everybody would agree that no djinn coincides with its own meta, or its meta's meta, etc.
 
 
Peano hoped to have pinned down the essence of natural numbers in his five postulates. Mathematicians generally grant that he succeeded, but that does not lessen the importance of the question, "How is a true statement about natural numbers to be distinguished from a false one?" And to answer this question, mathematicians turned to totally formal systems, such as TNT. However, you will see the influence of Peano in TNT, because all of his postulates are incorporated in TNT in one way or another.
 
 
Now we come to the new rules of TNT. Many of these rules will allow us to reach in and change the internal structure of the atoms of TNT. In that sense they deal with more "microscopic" properties of strings than the rules of the Propositional Calculus, which treat atoms as indivisible units. For example, it would be nice if we could extract the string ~S0=0 from the first axiom. To do this we would need a rule which permits us to drop a universal quantifier, and at the same time to change the internal structure of the string which remains, if we wish. Here is such a rule:

RULE OF SPECIFICATION: Suppose u is a variable which occurs inside the string x. If the string ∀u:x is a theorem, then so is x, and so are any strings made from x by replacing u, wherever it occurs, by one and the same term.
(Restriction: The term which replaces u must not contain any variable that is quantified in x.)

The rule of specification allows the desired string to be extracted from Axiom 1. It is a one-step derivation:

∀a:~Sa=0    axiom 1
~S0=0       specification

Notice that the rule of specification will allow some formulas which contain free variables (i.e., open formulas) to become theorems. For example, the following strings could also be derived from Axiom 1, by specification:
~Sa=0
~S(c+SS0)=0

There is another rule, the rule of generalization, which allows us to put back the universal quantifier on theorems which contain variables that became free as a result of usage of specification. Acting on the lower string, for example, generalization would give:

∀c:~S(c+SS0)=0

Generalization undoes the action of specification, and vice versa. Usually, generalization is applied after several intermediate steps have transformed the open formula in various ways. Here is the exact statement of the rule:

RULE OF GENERALIZATION: Suppose x is a theorem in which u, a variable, occurs free. Then ∀u:x is a theorem.
(Restriction: No generalization is allowed in a fantasy on any variable which appeared free in the fantasy's premise.)
 
 
The need for restrictions on these two rules will shortly be demonstrated explicitly. Incidentally, this generalization is the same generalization as was mentioned in Chapter II, in Euclid's proof of the infinitude of primes. Already we can see how the symbol-manipulating rules are starting to approximate the kind of reasoning which a mathematician uses.
 
 
These past two rules told how to take off universal quantifiers and put them back on; the next two rules tell how to handle existential quantifiers.

RULE OF INTERCHANGE: Suppose u is a variable. Then the strings ∀u:~ and ~∃u: are interchangeable anywhere inside any theorem.

For example, let us apply this rule to Axiom 1:

∀a:~Sa=0    axiom 1
~∃a:Sa=0    interchange

By the way, you might notice that both these strings are perfectly natural renditions, in TNT, of the sentence "Zero is not the successor of any natural number". Therefore it is good that they can be turned into each other with ease.
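Because interchange is a purely typographical move, it amounts to nothing more than string substitution. A sketch of my own (not anything from the book) of the rule acting on one occurrence inside a theorem:

```python
def interchange(theorem, u):
    """Swap one occurrence of ∀u:~ with ~∃u: (or vice versa) in a theorem."""
    forall_form = "∀" + u + ":~"
    exists_form = "~∃" + u + ":"
    if forall_form in theorem:
        return theorem.replace(forall_form, exists_form, 1)
    return theorem.replace(exists_form, forall_form, 1)
```

Applying interchange("∀a:~Sa=0", "a") yields "~∃a:Sa=0", and applying it again restores the axiom.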
 
 
The next rule is, if anything, even more intuitive. It corresponds to the very simple kind of inference we make when we go from "2 is prime" to "There exists a prime". The name of this rule is self-explanatory:

RULE OF EXISTENCE: Suppose a term (which may contain variables as long as they are free) appears once, or multiply, in a theorem. Then any (or several, or all) of the appearances of the term may be replaced by a variable which otherwise does not occur in the theorem, and the corresponding existential quantifier must be placed in front.

Let us apply the rule to -- as usual -- Axiom 1:

∀a:~Sa=0       axiom 1
∃b:∀a:~Sa=b    existence

You might now try to shunt symbols, according to the rules so far given, to produce the theorem ~∀b:∃a:Sa=b.
 
 
We have given rules for manipulating quantifiers, but so far none for the symbols '=' and 'S'. We rectify that situation now. In what follows, r, s, and t all stand for arbitrary terms.

RULES OF EQUALITY:
SYMMETRY: If r=s is a theorem, then so is s=r.
TRANSITIVITY: If r=s and s=t are theorems, then so is r=t.

RULES OF SUCCESSORSHIP:
ADD S: If r=t is a theorem, then Sr=St is a theorem.
DROP S: If Sr=St is a theorem, then r=t is a theorem.

Now we are equipped with rules that can give us a fantastic variety of theorems. For example, the following derivations yield theorems which are pretty fundamental:

(1) ∀a:∀b:(a+Sb)=S(a+b)    axiom 3
(2) ∀b:(S0+Sb)=S(S0+b)     specification (S0 for a)
(3) (S0+S0)=S(S0+0)        specification (0 for b)
(4) ∀a:(a+0)=a             axiom 2
(5) (S0+0)=S0              specification (S0 for a)
(6) S(S0+0)=SS0            add S
(7) (S0+S0)=SS0            transitivity (lines 3,6)

* * * * *

(1) ∀a:∀b:(a·Sb)=((a·b)+a)       axiom 5
(2) ∀b:(S0·Sb)=((S0·b)+S0)       specification (S0 for a)
(3) (S0·S0)=((S0·0)+S0)          specification (0 for b)
(4) ∀a:∀b:(a+Sb)=S(a+b)          axiom 3
(5) ∀b:((S0·0)+Sb)=S((S0·0)+b)   specification ((S0·0) for a)
(6) ((S0·0)+S0)=S((S0·0)+0)      specification (0 for b)
(7) ∀a:(a+0)=a                   axiom 2
(8) ((S0·0)+0)=(S0·0)            specification ((S0·0) for a)
(9) ∀a:(a·0)=0                   axiom 4
(10) (S0·0)=0                    specification (S0 for a)
(11) ((S0·0)+0)=0                transitivity (lines 8,10)
(12) S((S0·0)+0)=S0              add S
(13) ((S0·0)+S0)=S0              transitivity (lines 6,12)
(14) (S0·S0)=S0                  transitivity (lines 3,13)
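Derivations like these can be spot-checked against their passive meanings. Here is a small evaluator of my own devising (a sketch, not part of TNT) that computes the number denoted by any definite, variable-free term:

```python
def value(s):
    """Evaluate a definite TNT term to the natural number it denotes."""
    if s.endswith("0") and set(s[:-1]) <= {"S"}:   # numeral
        return len(s) - 1
    if s.startswith("S"):                          # successor of a term
        return 1 + value(s[1:])
    if s.startswith("(") and s.endswith(")"):      # (s+t) or (s·t)
        depth = 0
        for i, ch in enumerate(s[1:-1], 1):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch in "+·" and depth == 0:
                left, right = value(s[1:i]), value(s[i + 1:-1])
                return left + right if ch == "+" else left * right
    raise ValueError("not a definite term: " + s)
```

Thus value("(S0+S0)") is 2 and value("(S0·S0)") is 1, agreeing with the final lines of the two derivations above.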
 
 
Now here is an interesting question: "How can we make a derivation for the string 0=0?" It seems that the obvious route to go would be first to derive the string ∀a:a=a, and then to use specification. So, what about the following "derivation" of ∀a:a=a ... What is wrong with it? Can you fix it up?

(1) ∀a:(a+0)=a    axiom 2
(2) ∀a:a=(a+0)    symmetry
(3) ∀a:a=a        transitivity (lines 2,1)

I gave this mini-exercise to point out one simple fact: that one should not jump too fast in manipulating symbols (such as '=') which are familiar. One must follow the rules, and not one's knowledge of the passive meanings of the symbols. Of course, this latter type of knowledge is invaluable in guiding the route of a derivation.
 
 
Now let us see why restrictions are necessary on both specification and generalization. Here are two derivations. In each of them, one of the restrictions is violated. Look at the disastrous results they produce:

(1) [                    push
(2) a=0                  premise
(3) ∀a:a=0               generalization (Wrong!)
(4) Sa=0                 specification
(5) ]                    pop
(6) <a=0⊃Sa=0>           fantasy rule
(7) ∀a:<a=0⊃Sa=0>        generalization
(8) <0=0⊃S0=0>           specification
(9) 0=0                  previous theorem
(10) S0=0                detachment (lines 9,8)

This is the first disaster. The other one is via faulty specification.

(1) ∀a:a=a               previous theorem
(2) Sa=Sa                specification
(3) ∃b:b=Sa              existence
(4) ∀a:∃b:b=Sa           generalization
(5) ∃b:b=Sb              specification (Wrong!)

So now you can see why those restrictions are needed. Here is a simple puzzle: translate (if you have not already done so) Peano's fourth postulate into TNT-notation, and then derive that string as a theorem.
 
 
Now if you experiment around for a while with the rules and axioms of TNT so far presented, you will find that you can produce the following pyramidal family of theorems (a set of strings all cast from an identical mold, differing from one another only in that the numerals 0, S0, SS0, and so on have been stuffed in):

(0+0)=0
(0+S0)=S0
(0+SS0)=SS0
(0+SSS0)=SSS0
(0+SSSS0)=SSSS0
etc.

As a matter of fact, each of the theorems in this family can be derived from the one directly above it, in only a couple of lines. Thus it is a sort of "cascade" of theorems, each one triggering the next. (These theorems are very reminiscent of the pq-theorems, where the middle and right-hand groups of hyphens grew simultaneously.)
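The mold the family is cast from is easy to mechanize. A one-function sketch of my own, generating the n-th member of the family:

```python
def numeral(n):
    return "S" * n + "0"

def pyramid_member(n):
    """The n-th string of the family: (0+a)=a with the numeral for n for a."""
    return "(0+" + numeral(n) + ")=" + numeral(n)
```

For instance, pyramid_member(2) gives "(0+SS0)=SS0". Every member the function can print is a theorem; what the function cannot print is the single universally quantified string that sums them all up.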
 
 
Now there is one string which we can easily write down, and which summarizes the passive meaning of them all, taken together. That universally quantified summarizing string is this:

∀a:(0+a)=a

Yet with the rules so far given, this string eludes production. Try to produce it yourself if you don't believe me.

You may think that we should immediately remedy the situation with the following

(PROPOSED) RULE OF ALL: If all the strings in a pyramidal family are theorems, then so is the universally quantified string which summarizes them.

The problem with this rule is that it cannot be used in the M-mode. Only people who are thinking about the system can ever know that an infinite set of strings are all theorems. Thus this is not a rule that can be stuck inside any formal system.
 
229
 
So we find ourselves in a strange situation, in which we can typographically produce theorems about the addition of any specific numbers, but even such a simple string as the one above, which expresses a property of addition in general, is not a theorem. You might think that is not all that strange, since we were in precisely that situation with the pq-system. However, the pq-system had no pretensions about what it ought to be able to do; and in fact
 
230
 
there was no way to express general statements about addition in its symbolism, let alone   
prove them. The equipment simply was not there,   
and it did not even occur to us to think that the system was defective. Here, however,  
the expressive capability is far stronger, and we have correspondingly higher expectations of TNT than of the pq-system. If the string above is not a theorem, then we will have good reason to consider TNT to be defective. As a matter of fact, there is a name for systems with this kind of defect-they are called ω-incomplete. (The prefix 'ω'-'omega'-comes from the fact that the totality of natural numbers is sometimes denoted by 'ω'.) Here is the exact definition:
A system is ω-incomplete if all the strings in a pyramidal family are theorems, but   
the universally quantified summarizing string is not a theorem.
 
230
 
Incidentally, the negation of the above summarizing string-

~Va:(0+a)=a

-is also a nontheorem of TNT. This means that the original string is undecidable within the system. If one or the other were a theorem, then we would say that it was decidable.
Although it may sound like a mystical term, there is nothing mystical about   
undecidability within a given system. It is only a sign that the system could be extended.   
For example, within absolute geometry, Euclid's fifth postulate is undecidable. It has to   
be added as an extra postulate of geometry, to yield Euclidean geometry; or conversely,   
its negation can be added, to yield non-Euclidean geometry. If you think back to   
geometry, you will remember why this curious   
thing happens. It is because the four postulates of absolute geometry simply do not pin down the meanings of the terms   
"point" and "line", and there is room for different extensions   
of the notions. The points and lines of Euclidean geometry provide one k  
ind of extension of the notions of "point" and "line"; the POINTS and LINES of non-Euclidean geometry, another. However, using   
the pre-flavored words "point" and "line" tended, for two millennia, to make people   
believe that those words were necessarily univalent, capable of only one meaning.
 
230
 
We are now faced with a similar situation, involving TNT. We have adopted a notation which prejudices us in certain ways. For instance, usage of the symbol '+' tends to make us think that every theorem with a plus sign in it ought to say something known and familiar and "sensible" about the known and familiar operation we call "addition". Therefore it would run against the grain to propose adding the following "sixth axiom":

~Va:(0+a)=a
 
231
 
It doesn't jibe with what we believe about addition. But it is one possible extension of TNT, as we have so far formulated TNT. The system which uses this as its sixth axiom is a consistent system, in the sense of not having two theorems of the form x and ~x. However, when you juxtapose this "sixth axiom" with the pyramidal family of theorems shown above, you will probably be bothered by a seeming inconsistency between the family and the new axiom. But this kind of inconsistency is not so damaging as the other kind (where x and ~x are both theorems). In fact, it is not a true inconsistency, because there is a way of interpreting the symbols so that everything comes out all right.
 
231
 
This kind of inconsistency, created by the opposition of (1) a pyramidal family of theorems which collectively assert that all natural numbers have some property, and (2) a single theorem which seems to assert that not all numbers have it, is given the name of ω-inconsistency. An ω-inconsistent system is more like the at-the-outset-distasteful-but-in-the-end-acceptable non-Euclidean geometry. In order to form a mental model of what is going on, you have to imagine that there are some "extra", unsuspected numbers-let us not call them "natural", but supernatural numbers-which have no numerals. Therefore, facts about them cannot be represented in the pyramidal family. (This is a little bit like Achilles' conception of GOD-as a sort of "superdjinn", a being greater than any of the djinns. This was scoffed at by the Genie, but it is a reasonable image, and may help you to imagine supernatural numbers.)
 
231
 
What this tells us is that the axioms and rules of TNT, as so far presented, do not fully pin down the interpretations for the symbols of TNT. There is still room for variation in one's mental model of the notions they stand for. Each of the various possible extensions would pin down some of the notions further; but in different ways. Which symbols would begin to take on "distasteful" passive meanings, if we added the "sixth axiom" given above? Would all of the symbols become tainted, or would some of them still mean what we want them to mean? I will let you think about that. We will encounter a similar question in Chapter XIV, and discuss the matter then. In any case, we will not follow this extension now, but instead go on to try to repair the ω-incompleteness of TNT.
 
231
 
The Last Rule

The problem with the "Rule of All" was that it required knowing that all lines of an infinite pyramidal family are theorems -- too much for a finite being. But suppose that each line of the pyramid can be derived from its predecessor in a patterned way. Then there would be a finite reason accounting for the fact that all the strings in the pyramid are theorems. The trick, then, is to find the pattern that causes the cascade, and to show that that

232

pattern is a theorem in itself. That is like proving that each djinn passes a message to its meta, as in the children's game of "Telephone". The other thing left to show is that the Genie starts the cascading message-that is, to establish that the first line of the pyramid is a theorem. Then you know that GOD will get the message!
 
232
 
In the particular pyramid we were looking at, there is a pattern, captured by lines 4-9 of the derivation below.

(1) Va:Vb:(a+Sb)=S(a+b) axiom 3
(2) Vb:(0+Sb)=S(0+b) specification
(3) (0+Sb)=S(0+b) specification
(4) [ push
(5) (0+b)=b premise
(6) S(0+b)=Sb add S
(7) (0+Sb)=S(0+b) carry over line 3
(8) (0+Sb)=Sb transitivity
(9) ] pop

The premise is (0+b)=b; the outcome is (0+Sb)=Sb. The first line of the pyramid is also a theorem; it follows directly from Axiom 2. All we need now is a rule which lets us deduce that the string which summarizes the entire pyramid is itself a theorem. Such a rule will be a formalized statement of the fifth Peano postulate.
 
232
 
To express that rule, we need a little notation. Let us abbreviate a well-formed formula in which the variable a is free by the following notation:

X{a}

(There may be other free variables, too, but that is irrelevant.) Then the notation X{Sa/a} will stand for that string but with every occurrence of a replaced by Sa. Likewise, X{0/a} would stand for the same string, with each appearance of a replaced by 0. A specific example would be to let X{a} stand for the string in question: (0+a)=a. Then X{Sa/a} would represent the string (0+Sa)=Sa, and X{0/a} would represent (0+0)=0. (Warning: This notation is not part of TNT; it is for our convenience in talking about TNT.) With this new notation, we can state the last rule of TNT quite precisely:

RULE OF INDUCTION: Suppose u is a variable, and X{u} is a well-formed formula in which u occurs free. If both Vu:<X{u}⊃X{Su/u}> and X{0/u} are theorems, then Vu:X{u} is also a theorem.
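Since TNT variables in these examples are single letters that never occur inside other tokens, the substitution notation can be mimicked with ordinary string replacement. This sketch is my own convenience, not part of TNT or of Hofstadter's text:

```python
def substitute(formula, var, term):
    """Return the formula with every occurrence of the variable
    replaced by the given term -- the analogue of X{term/var}.
    Valid here because each variable is a single distinct letter."""
    return formula.replace(var, term)

X = "(0+a)=a"                      # the string X{a} in question
print(substitute(X, "a", "Sa"))    # X{Sa/a} -> (0+Sa)=Sa
print(substitute(X, "a", "0"))     # X{0/a}  -> (0+0)=0
```

A real implementation would need a proper parser to avoid capturing variables inside larger tokens; for these small examples plain replacement suffices.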
 
232
 
This is about as close as we can come to putting Peano's fifth postulate into TNT. Now let us use it to show that Va:(0+a)=a is indeed a theorem of TNT. Emerging from the fantasy in our derivation above, we can apply the fantasy rule, to give us

(10) <(0+b)=b⊃(0+Sb)=Sb> fantasy rule
(11) Vb:<(0+b)=b⊃(0+Sb)=Sb> generalization
 
233
 
This is the first of the two input theorems required by the rule of induction. The other requirement is the first line of the pyramid, which we have. Therefore, we can apply the rule of induction, to deduce what we wanted:

Vb:(0+b)=b

Specification and generalization will allow us to change the variable from b to a; thus Va:(0+a)=a is no longer an undecidable string of TNT.
 
233
 
Now I wish to present one longer derivation in TNT, so that you can see what one is like, and also because it proves a significant, if simple, fact of number theory.

(1) Va:Vb:(a+Sb)=S(a+b) axiom 3
(2) Vb:(d+Sb)=S(d+b) specification
(3) (d+SSc)=S(d+Sc) specification
(4) Vb:(Sd+Sb)=S(Sd+b) specification (line 1)
(5) (Sd+Sc)=S(Sd+c) specification
(6) S(Sd+c)=(Sd+Sc) symmetry
(7) [ push
(8) Vd:(d+Sc)=(Sd+c) premise
(9) (d+Sc)=(Sd+c) specification
(10) S(d+Sc)=S(Sd+c) add S
(11) (d+SSc)=S(d+Sc) carry over 3
(12) (d+SSc)=S(Sd+c) transitivity
(13) S(Sd+c)=(Sd+Sc) carry over 6
(14) (d+SSc)=(Sd+Sc) transitivity
(15) Vd:(d+SSc)=(Sd+Sc) generalization
(16) ] pop
(17) <Vd:(d+Sc)=(Sd+c)⊃Vd:(d+SSc)=(Sd+Sc)> fantasy rule
(18) Vc:<Vd:(d+Sc)=(Sd+c)⊃Vd:(d+SSc)=(Sd+Sc)> generalization

* * * * *

(19) (d+S0)=S(d+0) specification (line 2)
(20) Va:(a+0)=a axiom 1
(21) (d+0)=d specification
(22) S(d+0)=Sd add S
(23) (d+S0)=Sd transitivity (lines 19,22)
(24) (Sd+0)=Sd specification (line 20)
(25) Sd=(Sd+0) symmetry
 
234
 
(26) (d+S0)=(Sd+0) transitivity (lines 23,25)
(27) Vd:(d+S0)=(Sd+0) generalization

* * * * *

(28) Vc:Vd:(d+Sc)=(Sd+c) induction (lines 18,27)

[S can be slipped back and forth in an addition]

* * * * *

(29) Vb:(c+Sb)=S(c+b) specification (line 1)
(30) (c+Sd)=S(c+d) specification
(31) Vb:(d+Sb)=S(d+b) specification (line 1)
(32) (d+Sc)=S(d+c) specification
(33) S(d+c)=(d+Sc) symmetry
(34) Vd:(d+Sc)=(Sd+c) specification (line 28)
(35) (d+Sc)=(Sd+c) specification
(36) [ push
(37) Vc:(c+d)=(d+c) premise
(38) (c+d)=(d+c) specification
(39) S(c+d)=S(d+c) add S
(40) (c+Sd)=S(c+d) carry over 30
(41) (c+Sd)=S(d+c) transitivity
(42) S(d+c)=(d+Sc) carry over 33
(43) (c+Sd)=(d+Sc) transitivity
(44) (d+Sc)=(Sd+c) carry over 35
(45) (c+Sd)=(Sd+c) transitivity
(46) Vc:(c+Sd)=(Sd+c) generalization
(47) ] pop
(48) <Vc:(c+d)=(d+c)⊃Vc:(c+Sd)=(Sd+c)> fantasy rule
(49) Vd:<Vc:(c+d)=(d+c)⊃Vc:(c+Sd)=(Sd+c)> generalization

[If d commutes with every c, then Sd does too.]

* * * * *

(50) (c+0)=c specification (line 20)
(51) Va:(0+a)=a previous theorem
(52) (0+c)=c specification
(53) c=(0+c) symmetry

235

(54) (c+0)=(0+c) transitivity (lines 50,53)
(55) Vc:(c+0)=(0+c) generalization

[0 commutes with every c.]

(56) Vd:Vc:(c+d)=(d+c) induction (lines 49,55)

[Therefore, every d commutes with every c.]
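The derivation above is purely typographical, but the fact it establishes can be spot-checked numerically. The following sketch (my illustration, not a substitute for the proof) defines addition by exactly the two facts the derivation leans on -- axiom 1, (a+0)=a, and axiom 3, (a+Sb)=S(a+b) -- and tests commutativity on small numbers:

```python
def add(a, b):
    """Peano-style addition, mirroring the two TNT axioms used above:
    (a+0)=a  and  (a+Sb)=S(a+b)."""
    if b == 0:
        return a                 # axiom 1: adding zero changes nothing
    return add(a, b - 1) + 1     # axiom 3: peel one S off b, tack it outside

# Spot-check commutativity on a finite sample (no substitute for induction):
assert all(add(a, b) == add(b, a) for a in range(10) for b in range(10))
print("commutativity holds on the sample")
```

A finite check like this can never do what the rule of induction does -- cover all numbers at once -- which is precisely why TNT needed that rule.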
 
235
 
TNT has proven the commutativity of addition. Even if you do not follow this derivation in detail, it is important to realize that, like a piece of music, it has its own natural "rhythm". It is not just a random walk that happens to have landed on the desired last line. I have inserted "breathing marks" to show some of the "phrasing" of this derivation. Line 28 in particular is a turning point in the derivation, something like the halfway point in an AABB type of piece, where you resolve momentarily, even if not in the tonic key. Such important intermediate stages are often called "lemmas".
 
235
 
It is easy to imagine a reader starting at line 1 of this derivation, ignorant of where it is to end up, and getting a sense of where it is going as he sees each new line. This would set up an inner tension, very much like the tension in a piece of music caused by chord progressions that let you know what the tonality is, without resolving. Arrival at line 28 would confirm the reader's intuition and give him a momentary feeling of satisfaction while at the same time strengthening his drive to progress toward what he presumes is the true goal.
 
235
 
Now line 49 is a critically important tension-increaser, because of the "almost-there" feeling which it induces. It would be extremely unsatisfactory to leave off there! From there on, it is almost predictable how things must go. But you wouldn't want a piece of music to quit on you just when it had made the mode of resolution apparent. You don't want to imagine the ending-you want to hear the ending. Likewise here, we have to carry things through. Line 55 is inevitable, and sets up all the final tensions, which are resolved by line 56.
 
235
 
This is typical of the structure not only of formal derivations, but of informal proofs. The mathematician's sense of tension is intimately related to his sense of beauty, and is what makes mathematics worth doing. Notice, however, that in TNT itself, there seems to be no reflection of these tensions. In other words, TNT doesn't formalize the notions of tension and resolution, goal and subgoal, "naturalness" and "inevitability", any more than a piece of music is a book about harmony and rhythm. Could one devise a much fancier typographical system which is aware of the tensions and goals inside derivations?
 
236
 
I would have preferred to show how to derive Euclid's Theorem (the infinitude of   
primes) in TNT,  
but it would probably have doubled the length of the book. Now after this theorem, the natural direction to go would  
be to prove the associativity of addition, the commutativity and associativity of multiplication and the distributivity of   
multiplication over addition. These would give a powerful base to work from.
 
236
 
As it is now formulated, TNT has reached "critical mass" (perhaps a strange   
metaphor to apply to something called "TNT"). It is of the same strength as the system of   
Principia Mathematica; in TNT one can now prove every theorem which you would find   
in a standard treatise on number theory. Of course, no one would claim that deriving   
theorems in TNT is the best way to do number theory. Anybody who felt that way would   
fall in the same class of people as those who think that the best way to know what 1000 x 1000 is, is to draw a 1000 by 1000 grid, and count all the squares in it ... No; after total formalization, the only way to go is towards relaxation of the formal system. Otherwise,
it is so enormously unwieldy as to be, for a  
ll practical purposes, useless. Thus, it is important to embed TNT within a wider context, a context which enables new rules of   
inference to be derived, so that derivations can be speeded up. This would require formalization of the language in which rules
of inference are expressed-that is, the metalanguage. And one could go considerably further. However, none of these speeding-up tricks would make TNT any more powerful; they would simply make it more usable.  
The simple fact is that we have put into TNT every mode of thought that number   
theorists rely on. Embedding it in ever larger contexts will not enlarge the space of   
theorems; it will just make working in TNT-or in each "new, improved version"-look   
more like doing conventional number theory.
 
236
 
Suppose that you didn't have advance knowledge that TNT   
will turn out to be incomplete, but rather, expected that it is complete-that is, that every true statement   
expressible in the TNT-notation is a theorem. In that case, you could make a decision   
procedure for all of number theory. The method would be easy: if you want to know if N-statement X is true or false, code it into TNT-sentence x. Now if X is true, completeness says that x is a theorem; and conversely, if not-X is true, then completeness says that ~x is a theorem. So either x or ~x must be a theorem, since either X or not-X is true. Now begin systematically enumerating all the theorems of TNT, in the way we did for the MIU-system and pq-system. You must come to x or ~x after a while; and whichever one you hit tells you which of X and not-X is true. (Did you follow this argument? It crucially depends on your being able to hold separate in your mind the formal system TNT and its informal counterpart N.)
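The hypothetical decision procedure can be sketched as follows. Everything here is an assumption for illustration: it presupposes (counterfactually, as it turns out) a complete system, and `enumerate_theorems` stands in for a systematic theorem generator of the kind used for the MIU- and pq-systems:

```python
def decide(x, enumerate_theorems):
    """If completeness guarantees that either x or ~x is a theorem,
    systematic enumeration must eventually reach one of them,
    thereby deciding x."""
    for t in enumerate_theorems():
        if t == x:
            return True       # x is a theorem, so X is true
        if t == "~" + x:
            return False      # ~x is a theorem, so X is false

# Toy illustration with a made-up, finite theorem stream:
def toy_theorems():
    yield "(0+0)=0"
    yield "~S0=0"

print(decide("S0=0", toy_theorems))      # reaches ~S0=0, so: False
```

For an incomplete system the loop can run forever, reaching neither x nor ~x -- which is exactly the situation the next paragraph describes.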
 
237
 
For example, if TNT were complete, number theorists would be put out of business: any question in their field could be resolved, with sufficient time, in a purely mechanical way. As it turns out, this is impossible, which, depending on your point of view, is a cause either for rejoicing, or for mourning.
 
237
 
The final question which we will take up in this Chapter is whether we should have as much faith in the consistency of TNT as we did in the consistency of the Propositional Calculus; and, if we don't, whether it is possible to increase our faith in TNT, by proving it to be consistent. One could make the same opening statement on the "obviousness" of TNT's consistency as Imprudence did in regard to the Propositional Calculus-namely, that each rule embodies a reasoning principle which we believe in, and therefore to question the consistency of TNT is to question our own sanity. To some extent, this argument still carries weight-but not quite so much weight as before. There are just too many rules of inference, and some of them just might be slightly "off". Furthermore, how do we know that this mental model we have of some abstract entities called "natural numbers" is actually a coherent construct? Perhaps our own thought processes, those informal processes which we have tried to capture in the formal rules of the system, are themselves inconsistent! It is of course not the kind of thing we expect, but it gets more and more conceivable that our thoughts might lead us astray, the more complex the subject matter gets-and natural numbers are by no means a trivial subject matter. Prudence's cry for a proof of consistency has to be taken more seriously in this case. It's not that we seriously believe TNT could be inconsistent-but there is a little doubt, a flicker, a glimmer of a doubt in our minds, and a proof would help to dispel that doubt.
 
237
 
But what means of proof would we like to see used? Once again, we are faced with the recurrent question of circularity. If we use all the same equipment in a proof about our system as we have inserted into it, what will we have accomplished? If we could manage to convince ourselves of the consistency of TNT, but by using a weaker system of reasoning than TNT's, we will have beaten the circularity objection! Think of the way a heavy rope is passed between ships (or so I read when I was a kid): first a light arrow is fired across the gap, pulling behind it a thin rope. Once a connection has been established between the two ships this way, then the heavy rope is pulled across the gap. If we can use a "light" system to show that a "heavy" system is consistent, then we shall have really accomplished something.
 
237
 
Now on first sight one might think that there is a thin rope. Our goal is to prove that TNT has a certain typographical property (consistency): that no theorems of the form x and ~x ever occur. This is similar to trying to show that MU is not a theorem of the MIU-system. Both are statements about typographical properties of symbol-manipulation systems. The visions of a thin rope are based on the presumption that facts about number theory will not be

238

needed in proving that such a typographical property holds. In other words, if properties of integers are not used-or if only a few extremely simple ones are used-then we could achieve the goal of proving TNT consistent, by using means which are weaker than its own internal modes of reasoning.
 
238
 
This is the hope which was held by an important school of mathematicians and logicians in the early part of this century, led by David Hilbert. The goal was to prove the consistency of formalizations of number theory similar to TNT by employing a very restricted set of principles of reasoning called "finitistic" methods of reasoning. These would be the thin rope. Included among finitistic methods are all of propositional reasoning, as embodied in the Propositional Calculus, and additionally some kinds of numerical reasoning. But Gödel's work showed that any effort to pull the heavy rope of TNT's consistency across the gap by using the thin rope of finitistic methods is doomed to failure. Gödel showed that in order to pull the heavy rope across the gap, you can't use a lighter rope; there just isn't a strong enough one. Less metaphorically, we can say: Any system that is strong enough to prove TNT's consistency is at least as strong as TNT itself. And so circularity is inevitable.
 
241
 
Achilles: My favorite path to Zen is through the short, fascinating and weird Zen parables called "koans".
Tortoise: What is a koan?
Achilles: A koan is a story about Zen masters and their students. Sometimes it is like a riddle; other times like a fable; and other times like nothing you've ever heard before.
 
241
 
Tortoise: I would like to hear a koan or two.
Achilles: And I would like to tell you one-or a few. Perhaps I should begin with the most famous one of all. Many centuries ago, there was a Zen master named Joshu, who lived to be 119 years old.
Tortoise: A mere youngster!
Achilles: By your standards, yes. Now one day while Joshu and another monk were standing together in the monastery, a dog wandered by. The monk asked Joshu, "Does a dog have Buddha-nature, or not?"
Tortoise: Whatever that is. So tell me-what did Joshu reply?
Achilles: 'MU'.
Tortoise: 'MU'? What is this 'MU'? What about the dog? What about Buddha-nature? What's the answer?
Achilles: Oh, but 'MU' is Joshu's answer. By saying 'MU', Joshu let the other monk know that only by not asking such questions can one know the answer to them.
Tortoise: Joshu "unasked" the question.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
---
 
328
 
CHAPTER X  
Levels of Description,  
and Computer Systems
 
328
 
Levels of Description
 
328
 
Gödel's string G, and a Bach fugue: they both have the property that  
they can be understood on different levels. We are all familiar with this  
kind of thing; and yet in some cases it confuses us, while in others we  
handle it without any difficulty at all. For example, we all know that we  
human beings are composed of an enormous number of cells (around  
twenty-five trillion), and therefore that everything we do could in principle  
be described in terms of cells. Or it could even be described on the level of  
molecules. Most of us accept this in a rather matter-of-fact way; we go to  
the doctor, who looks at us on lower levels than we think of ourselves. We  
read about DNA and "genetic engineering" and sip our coffee. We seem to  
have reconciled these two inconceivably different pictures of ourselves  
simply by disconnecting them from each other. We have almost no way to  
relate a microscopic description of ourselves to that which we feel ourselves  
to be, and hence it is possible to store separate representations of ourselves  
in quite separate "compartments" of our minds. Seldom do we have to flip  
back and forth between these two concepts of ourselves, wondering "How  
can these two totally different things be the same me?"
 
328
 
Or take a sequence of images on a television screen which shows  
Shirley MacLaine laughing. When we watch that sequence, we know that  
we are actually looking not at a woman, but at sets of flickering dots on a flat  
surface. We know it, but it is the furthest thing from our mind. We have  
these two wildly opposing representations of what is on the screen, but that  
does not confuse us. We can just shut one out, and pay attention to the  
other—which is what all of us do. Which one is "more real"? It depends on  
whether you're a human, a dog, a computer, or a television set.
 
328
 
Chunking and Chess Skill
 
329
 
the level  
of computer chess did not have any sudden spurt, and surpass human  
experts. In fact, a human expert can quite soundly and confidently trounce  
the best chess programs of this day.
 
329
 
The reason for this had actually been in print for many years. In the  
1940's, the Dutch psychologist Adriaan de Groot made studies of how chess  
novices and chess masters perceive a chess situation. Put in their starkest  
terms, his results imply that chess masters perceive the distribution of  
pieces in chunks. There is a higher-level description of the board than the  
straightforward "white pawn on K5, black rook on Q6" type of description,  
and the master somehow produces such a mental image of the board. This  
was proven by the high speed with which a master could reproduce an  
actual position taken from a game, compared with the novice's plodding  
reconstruction of the position, after both of them had had five-second  
glances at the board. Highly revealing was the fact that masters' mistakes  
involved placing whole groups of pieces in the wrong place, which left the  
game strategically almost the same, but to a novice's eyes, not at all the  
same. The clincher was to do the same experiment but with pieces  
randomly assigned to the squares on the board, instead of copied from actual  
games. The masters were found to be simply no better than the novices in  
reconstructing such random boards.
 
329
 
The conclusion is that in normal chess play, certain types of situation  
recur—certain patterns—and it is to those high-level patterns that the  
master is sensitive. He thinks on a different level from the novice; his set of  
concepts is different. Nearly everyone is surprised to find out that in actual  
play, a master rarely looks ahead any further than a novice does—and  
moreover, a master usually examines only a handful of possible moves!  
The trick is that his mode of perceiving the board is like a filter: he literally  
does not see bad moves when he looks at a chess situation—no more than chess  
amateurs see illegal moves when they look at a chess situation. Anyone who  
has played even a little chess has organized his perception so that diagonal  
rook-moves, forward captures by pawns, and so forth, are never brought to  
mind. Similarly, master-level players have built up higher levels of  
organization in the way they see the board; consequently, to them, bad moves are  
as unlikely to come to mind as illegal moves are, to most people. This might  
be called implicit pruning of the giant branching tree of possibilities. By  
contrast, explicit pruning would involve thinking of a move, and after  
superficial examination, deciding not to pursue examining it any further.
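The contrast between the two kinds of pruning can be sketched in code. This is my own toy illustration, not anything resembling a chess engine: explicit pruning generates every candidate and then discards the bad ones, while implicit pruning uses a generator that never produces them at all.

```python
def explicit_pruning(candidates, looks_bad):
    """Think of every move, examine each superficially, discard the bad."""
    return [m for m in candidates if not looks_bad(m)]

def implicit_pruning(generate_plausible):
    """Bad moves are never brought to mind; the generator IS the filter."""
    return list(generate_plausible())

# Hypothetical move list; '?' marks a dubious move, chess-annotation style.
moves = ["Nf3", "Qh5??", "e4", "a4?!"]
bad = lambda m: "?" in m

print(explicit_pruning(moves, bad))                                  # ['Nf3', 'e4']
print(implicit_pruning(lambda: (m for m in moves if "?" not in m)))  # ['Nf3', 'e4']
```

The results are identical; the difference lies in where the filtering happens -- after generation, or built into perception itself, which is the master's trick.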
 
329
 
The distinction can apply just as well to other intellectual activities—  
for instance, doing mathematics. A gifted mathematician doesn't usually  
think up and try out all sorts of false pathways to the desired theorem, as  
less gifted people might do; rather, he just "smells" the promising paths,  
and takes them immediately.
 
329
 
Computer chess programs which rely on looking ahead have not been  
taught to think on a higher level; the strategy has just been to use brute  
force look-ahead, hoping to crush all types of opposition. But it has not  
worked. Perhaps someday, a look-ahead program with enough brute force  
will indeed overcome the best human players—but that will be a small  
intellectual gain, compared to the revelation that intelligence depends  
crucially on the ability to create high-level descriptions of complex arrays,  
such as chess boards, television screens, printed pages, or paintings.
 
330
 
Similar Levels
 
330
 
Usually, we are not required to hold more than one level of understanding  
of a situation in our minds at once. Moreover, the different descriptions of  
a single system are usually so conceptually distant from each other that, as  
was mentioned earlier, there is no problem in maintaining them both; they  
are just maintained in separate mental compartments. What is confusing,  
though, is when a single system admits of two or more descriptions on  
different levels which nevertheless resemble each other in some way. Then  
we find it hard to avoid mixing levels when we think about the system, and  
can easily get totally lost.
 
330
 
Undoubtedly this happens when we think about our own  
psychology—for instance, when we try to understand people's motivations  
for various actions. There are many levels in the human mental  
structure—certainly it is a system which we do not understand very well yet.  
But there are hundreds of rival theories which tell why people act the way  
they do, each theory based on some underlying assumptions about how far  
down in this set of levels various kinds of psychological "forces" are found.  
Since at this time we use pretty much the same kind of language for all  
mental levels, this makes for much level-mixing and most certainly for  
hundreds of wrong theories. For instance, we talk of "drives"—for sex, for  
power, for fame, for love, etc., etc.—without knowing where these drives  
come from in the human mental structure. Without belaboring the point, I  
simply wish to say that our confusion about who we are is certainly related  
to the fact that we consist of a large set of levels, and we use overlapping  
language to describe ourselves on
 
330
 
Computer Systems
 
330
 
There is another place where many levels of description coexist for a single  
system, and where all the levels are conceptually quite close to one another.  
I am referring to computer systems. When a computer program is  
running, it can be viewed on a number of levels. On each level, the description  
is given in the language of computer science, which makes all the  
descriptions similar in some ways to each other—yet there are extremely important  
differences between the views one gets on the different levels. At the lowest  
level, the description can be so complicated that it is like the dot-description  
of a television picture. For some purposes, however, this is by far the most  
important view. At the highest level, the description is greatly chunked, and  
takes on a completely different feel, despite the fact that many of the same  
concepts appear on the lowest and highest levels. The chunks on the  
high-level description are like the chess expert's chunks, and like the  
chunked description of the image on the screen: they summarize in capsule  
form a number of things which on lower levels are seen as separate. (See  
Fig. 57.) Now before things become too abstract, let us pass on to the
 
331
 
FIGURE 57. The idea of "chunking": a group of items is reperceived as a single "chunk".  
The chunk's boundary is a little like a cell membrane or a national border: it establishes a  
separate identity for the cluster within. According to context, one may wish to ignore the  
chunk's internal structure or to take it into account.
 
 
concrete facts about computers, beginning with a very quick skim of what a  
computer system is like on the lowest level. The lowest level? Well, not  
really, for I am not going to talk about elementary particles—but it is the  
lowest level which we wish to think about.
 
 
At the conceptual rock-bottom of a computer, we find a memory, a  
central processing unit (CPU), and some input-output (I/O) devices. Let us first  
describe the memory. It is divided up into distinct physical pieces, called  
words. For the sake of concreteness, let us say there are 65,536 words of  
memory (a typical number, being 2 to the 16th power). A word is further  
divided into what we shall consider the atoms of computer science—bits.  
The number of bits in a typical word might be around thirty-six. Physically,  
a bit is just a magnetic "switch" that can be in either of two positions.
 
 
You could call the two positions "up" and "down", or "x" and "o", or "1"  
and "0" . . . The third is the usual convention. It is perfectly fine, but it has  
the possibly misleading effect of making people think that a computer,  
deep down, is storing numbers. This is not true. A set of thirty-six bits does  
not have to be thought of as a number any more than two bits has to be  
thought of as the price of an ice cream cone. Just as money can do various  
things depending on how you use it, so a word in memory can serve many  
functions. Sometimes, to be sure, those thirty-six bits will indeed represent  
a number in binary notation. Other times, they may represent thirty-six  
dots on a television screen. And other times, they may represent a few  
letters of text. How a word in memory is to be thought of depends entirely  
on the role that this word plays in the program which uses it. It may, of  
course, play more than one role—like a note in a canon.
 
Instructions and Data
 
There is one interpretation of a word which I haven't yet mentioned, and  
that is as an instruction. The words of memory contain not only data to be  
acted on, but also the program to act on the data. There exists a limited  
repertoire of operations which can be carried out by the central processing  
unit—the CPU—and part of a word—usually its first several bits—is interpretable as the name of the instruction-type which is to be carried out.
What do the rest of the bits in a word-interpreted-as-instruction stand for?  
Most often, they tell which other words in memory are to be acted upon. In  
other words, the remaining bits constitute a pointer to some other word (or  
words) in memory. Every word in memory has a distinct location, like a  
house on a street; and its location is called its address. Memory may have one  
"street", or many "streets"—they are called "pages". So a given word is  
addressed by its page number (if memory is paged) together with its  
position within the page. Hence the "pointer" part of an instruction is the  
numerical address of some word(s) in memory. There are no restrictions  
on the pointer, so an instruction may even "point" at itself, so that when it is  
executed, it causes a change in itself to be made.
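The split into opcode bits and pointer bits can be sketched as follows; the field widths (four opcode bits, sixteen address bits, matching the 65,536 words above) are arbitrary choices for the illustration, not a real instruction format:

```python
WORD_BITS = 36
OPCODE_BITS = 4       # "its first several bits" -- four, for this sketch
ADDRESS_BITS = 16     # enough to address 65,536 = 2**16 words

def decode(word):
    """Read a word-as-instruction: instruction-type plus pointer."""
    opcode = word >> (WORD_BITS - OPCODE_BITS)
    address = word & ((1 << ADDRESS_BITS) - 1)
    return opcode, address

# Build an instruction whose opcode is 3 and whose pointer is word 100:
instr = (3 << (WORD_BITS - OPCODE_BITS)) | 100
print(decode(instr))  # -> (3, 100)
```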
 
 
How does the computer know what instruction to execute at any given  
time? This is kept track of in the CPU. The CPU has a special pointer which  
points at (i.e., stores the address of) the next word which is to be  
interpreted as an instruction. The CPU fetches that word from memory, and copies  
it electronically into a special word belonging to the CPU itself. (Words in  
the CPU are usually not called "words", but rather, registers.) Then the CPU  
executes that instruction. Now the instruction may call for any of a large  
number of types of operations to be carried out. Typical ones include:  
ADD the word pointed to in the instruction, to a register.  
(In this case, the word pointed to is obviously interpreted as a  
number.)  
PRINT the word pointed to in the instruction, as letters.
(In this case, the word is obviously interpreted not as a  
number, but as a string of letters.)  
JUMP to the word pointed to in the instruction.  
(In this case, the CPU is being told to interpret that particular  
word as its next instruction.)  
Unless the instruction explicitly dictates otherwise, the CPU will pick  
up the very next word and interpret it as an instruction. In other words, the  
CPU assumes that it should move down the "street" sequentially, like a  
mailman, interpreting word after word as an instruction. But this  
sequential order can be broken by such instructions as the JUMP instruction, and  
others.
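The fetch-and-execute cycle just described can be sketched as a toy CPU. For readability the instructions are written as (name, address) pairs rather than bit patterns, and a HALT instruction is added so that the sketch stops; everything else follows the text:

```python
def run(memory):
    register = 0
    pointer = 0                     # the CPU's special pointer
    printed = []
    while pointer < len(memory):
        op, addr = memory[pointer]  # fetch the next word as an instruction
        pointer += 1                # by default, move down the "street"
        if op == "ADD":             # word pointed to, read as a number
            register += memory[addr]
        elif op == "PRINT":         # word pointed to, read as letters
            printed.append(memory[addr])
        elif op == "JUMP":          # break the sequential order
            pointer = addr
        elif op == "HALT":
            break
    return register, printed

memory = [
    ("ADD", 4),    # word 0: add the number in word 4 to the register
    ("PRINT", 5),  # word 1: print the letters in word 5
    ("JUMP", 3),   # word 2: take word 3 as the next instruction
    ("HALT", 0),   # word 3
    7,             # word 4: a word playing the role of a number
    "HI",          # word 5: a word playing the role of letters
]
print(run(memory))  # -> (7, ['HI'])
```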
 
Machine Language vs. Assembly Language
 
This is a very brief sketch of machine language. In this language, the types of  
operations which exist constitute a finite repertoire which cannot be  
extended. Thus all programs, no matter how large and complex, must be  
made out of compounds of these types of instructions. Looking at a  
program written in machine language is vaguely comparable to looking at a  
DNA molecule atom by atom. If you glance back to Fig. 41, showing the  
nucleotide sequence of a DNA molecule—and then if you consider that  
each nucleotide contains two dozen atoms or so—and if you imagine trying  
to write the DNA, atom by atom, for a small virus (not to mention a human  
being!)—then you will get a feeling for what it is like to write a complex  
program in machine language, and what it is like to try to grasp what is  
going on in a program if you have access only to its machine language  
description.
 
 
It must be mentioned, however, that computer programming was  
originally done on an even lower level, if possible, than that of machine  
language—namely, connecting wires to each other, so that the proper  
operations were "hard-wired" in. This is so amazingly primitive by modern  
standards that it is painful even to imagine. Yet undoubtedly the people  
who first did it experienced as much exhilaration as the pioneers of  
modern computers ever do . . .
 
 
We now wish to move to a higher level of the hierarchy of levels of  
description of programs. This is the assembly language level. There is not a  
gigantic spread between assembly language and machine language; indeed,  
the step is rather gentle. In essence, there is a one-to-one correspondence  
between assembly language instructions and machine language  
instructions. The idea of assembly language is to "chunk" the individual machine  
language instructions, so that instead of writing the sequence of bits  
"010111000" when you want an instruction which adds one number to  
another, you simply write ADD, and then instead of giving the address in  
binary representation, you can refer to the word in memory by a name.
 
 
Therefore, a program in assembly language is very much like a machine  
language program made legible to humans. You might compare the  
machine language version of a program to a TNT-derivation done in the  
obscure Gödel-numbered notation, and the assembly language version to
the isomorphic TNT-derivation, done in the original TNT-notation, which  
is much easier to understand. Or, going back to the DNA image, we can  
liken the difference between machine language and assembly language to  
the difference between painfully specifying each nucleotide, atom by atom,  
and specifying a nucleotide by simply giving its name (i.e., 'A', 'G', 'C', or  
'T'). There is a tremendous saving of labor in this very simple "chunking"  
operation, although conceptually not much has been changed.
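That chunking operation is small enough to sketch outright. The opcode bit patterns, the 20-bit word format, and the symbol names below are all invented for the illustration:

```python
OPCODES = {"ADD": 0b0101, "PRINT": 0b0110, "JUMP": 0b0111}

def assemble(program, symbols):
    """Chunk mnemonics and names into machine words:
    a 4-bit opcode followed by a 16-bit address."""
    return [(OPCODES[mnemonic] << 16) | symbols[name]
            for mnemonic, name in program]

symbols = {"TOTAL": 100, "GREETING": 101}   # names for words in memory
program = [("ADD", "TOTAL"), ("PRINT", "GREETING")]
for word in assemble(program, symbols):
    print(format(word, "020b"))
```

Instead of writing 01010000000001100100, the programmer writes ADD TOTAL; the labor saved is great, though conceptually nothing has changed.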
 
Programs That Translate Programs
 
Perhaps the central point about assembly language is not its differences  
from machine language, which are not that enormous, but just the key idea  
that programs could be written on a different level at all! Just think about  
it: the hardware is built to "understand" machine language  
programs—sequences of bits—but not letters and decimal numbers. What happens when  
hardware is fed a program in assembly language? It is as if you tried to get a  
cell to accept a piece of paper with the nucleotide sequence written out in  
letters of the alphabet, instead of in chemicals. What can a cell do with a  
piece of paper? What can a computer do with an assembly language  
program?
 
 
And here is the vital point: someone can write, in machine language, a  
translation program. This program, called an assembler, accepts mnemonic  
instruction names, decimal numbers, and other convenient abbreviations  
which a programmer can remember easily, and carries out the conversion  
into the monotonous but critical bit-sequences. After the assembly  
language program has been assembled (i.e., translated), it is run—or rather, its  
machine language equivalent is run. But this is a matter of terminology.  
Which level program is running? You can never go wrong if you say that  
the machine language program is running, for hardware is always involved  
when any program runs—but it is also quite reasonable to think of the  
running program in terms of assembly language. For instance, you might  
very well say, "Right now, the CPU is executing a JUMP instruction",  
instead of saying, "Right now, the CPU is executing a '111010000'  
instruction". A pianist who plays the notes G-E-B E-G-B is also playing an  
arpeggio in the chord of E minor. There is no reason to be reluctant about  
describing things from a higher-level point of view. So one can think of the  
assembly language program running concurrently with the machine  
language program. We have two modes of describing what the CPU is doing.
 
Higher-Level Languages, Compilers, and Interpreters
 
The next level of the hierarchy carries much further the extremely  
powerful idea of using the computer itself to translate programs from a high level  
into lower levels. After people had programmed in assembly language for a  
number of years, in the early 1950's, they realized that there were a  
number of characteristic structures which kept reappearing in program  
after program. There seemed to be, just as in chess, certain fundamental  
patterns which cropped up naturally when human beings tried to  
formulate algorithms—exact descriptions of processes they wanted carried out. In  
other words, algorithms seemed to have certain higher-level components,  
in terms of which they could be much more easily and esthetically specified  
than in the very restricted machine language, or assembly language.  
Typically, a high-level algorithm component consists not of one or two machine  
language instructions, but of a whole collection of them, not necessarily all
contiguous in memory. Such a component could be represented in a  
higher-level language by a single item—a chunk.
 
 
Aside from standard chunks—the newly discovered components out  
of which all algorithms can be built—people realized that almost all  
programs contain even larger chunks—superchunks, so to speak. These  
superchunks differ from program to program, depending on the kinds of  
high-level tasks the program is supposed to carry out. We discussed superchunks in Chapter V, calling them by their usual names: "subroutines" and
"procedures". It was clear that a most powerful addition to any  
programming language would be the ability to define new higher-level entities in  
terms of previously known ones, and then to call them by name. This would  
build the chunking operation right into the language. Instead of there  
being a determinate repertoire of instructions out of which all programs  
had to be explicitly assembled, the programmer could construct his own  
modules, each with its own name, each usable anywhere inside the  
program, just as if it had been a built-in feature of the language. Of course,  
there is no getting away from the fact that down below, on a machine  
language level, everything would still be composed of the same old machine  
language instructions, but that would not be explicitly visible to the high-  
level programmer; it would be implicit.
 
 
The new languages based on these ideas were called compiler languages.  
One of the earliest and most elegant was called "Algol", for "Algorithmic  
Language". Unlike the case with assembly language, there is no  
straightforward one-to-one correspondence between statements in Algol  
and machine language instructions. To be sure, there is still a type of  
mapping from Algol into machine language, but it is far more "scrambled"  
than that between assembly language and machine language. Roughly  
speaking, an Algol program is to its machine language translation as a word  
problem in an elementary algebra text is to the equation it translates into.  
(Actually, getting from a word problem to an equation is far more complex,  
but it gives some inkling of the types of "unscrambling" that have to be  
carried out in translating from a high-level language to a lower-level language.) In the mid-1950's, successful programs called compilers were written
whose function was to carry out the translation from compiler languages to  
machine language.
 
 
Also, interpreters were invented. Like compilers, interpreters translate  
from high-level languages into machine language, but instead of  
translating all the statements first and then executing the machine code, they read  
one line and execute it immediately. This has the advantage that a user  
need not have written a complete program to use an interpreter. He may  
invent his program line by line, and test it out as he goes along. Thus, an  
interpreter is to a compiler as a simultaneous interpreter is to a translator  
of a written speech. One of the most important and fascinating of all  
computer languages is LISP (standing for "List Processing"), which was  
invented by John McCarthy around the time Algol was invented.  
Subsequently, LISP has enjoyed great popularity with workers in Artificial  
Intelligence.
 
 
There is one interesting difference between the way interpreters work  
and compilers work. A compiler takes input (a finished Algol program, for  
instance) and produces output (a long sequence of machine language  
instructions). At this point, the compiler has done its duty. The output is  
then given to the computer to run. By contrast, the interpreter is constantly  
running while the programmer types in one LISP statement after another,  
and each one gets executed then and there. But this doesn't mean that each  
statement gets first translated, then executed, for then an interpreter  
would be nothing but a line-by-line compiler. Instead, in an interpreter, the  
operations of reading a new line, "understanding" it, and executing it are  
intertwined: they occur simultaneously.
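The difference can be sketched with a toy one-statement language. The statement form "PRINT expression" and the reuse of Python's own compile and eval machinery are conveniences of the sketch, not features of any historical system:

```python
def compile_program(lines):
    """Translate every statement first; hand back something to run later."""
    code = [compile(line.split(None, 1)[1], "<prog>", "eval")
            for line in lines]
    return lambda: [eval(c) for c in code]

def interpret(lines):
    """Read one statement and execute it then and there."""
    results = []
    for line in lines:
        results.append(eval(line.split(None, 1)[1]))
    return results

program = ["PRINT 2+3", "PRINT 10*4"]
runnable = compile_program(program)  # translated -- nothing executed yet
print(runnable())                    # -> [5, 40]
print(interpret(program))            # -> [5, 40]
```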
 
 
Here is the idea, expanded a little more. Each time a new line of LISP  
is typed in, the interpreter tries to process it. This means that the  
interpreter jolts into action, and certain (machine language) instructions inside it get  
executed. Precisely which ones get executed depends on the LISP statement  
itself, of course. There are many JUMP instructions inside the interpreter,  
so that the new line of LISP may cause control to move around in a  
complex way—forwards, backwards, then forwards again, etc. Thus, each  
LISP statement gets converted into a "pathway" inside the interpreter, and  
the act of following that pathway achieves the desired effect.
 
 
Sometimes it is helpful to think of the LISP statements as mere pieces  
of data which are fed sequentially to a constantly running machine  
language program (the LISP interpreter). When you think of things this way,  
you get a different image of the relation between a program written in a  
higher-level language and the machine which is executing it.
 
Bootstrapping
 
Of course a compiler, being itself a program, has to be written in some  
language. The first compilers were written in assembly language, rather  
than machine language, thus taking full advantage of the already accomplished first step up from machine language. A summary of these
rather tricky concepts is presented in Figure 58.
 
 
FIGURE 58. Assemblers and compilers  
are both translators into machine language.  
This is indicated by the solid lines.  
Moreover, since they are themselves  
programs, they are originally written in a  
language also. The wavy lines indicate that a  
compiler can be written in assembly  
language, and an assembler in machine  
language.
 
 
Now as sophistication increased, people realized that a partially written  
compiler could be used to compile extensions of itself. In other words, once  
a certain minimal core of a compiler had been written, then that minimal  
compiler could translate bigger compilers into machine language—which  
in turn could translate yet bigger compilers, until the final, full-blown  
compiler had been compiled. This process is affectionately known as  
"bootstrapping"—for obvious reasons (at least if your native language is  
English it is obvious). It is not so different from the attainment by a child of  
a critical level of fluency in his native language, from which point on his  
vocabulary and fluency can grow by leaps and bounds, since he can use  
language to acquire new language.
 
Levels on Which to Describe Running Programs
 
Compiler languages typically do not reflect the structure of the machines  
which will run programs written in them. This is one of their chief  
advantages over the highly specialized assembly and machine languages. Of  
course, when a compiler language program is translated into machine  
language, the resulting program is machine-dependent. Therefore one can  
describe a program which is being executed in a machine-independent way  
or a machine-dependent way. It is like referring to a paragraph in a book  
by its subject matter (publisher-independent), or its page number and  
position on the page (publisher-dependent).
 
 
As long as a program is running correctly, it hardly matters how you  
describe it or think of its functioning. It is when something goes wrong that  
it is important to be able to think on different levels. If, for instance, the
machine is instructed to divide by zero at some stage, it will come to a halt  
and let the user know of this problem, by telling where in the program the  
questionable event occurred. However, the specification is often given on a  
lower level than that in which the programmer wrote the program. Here  
are three parallel descriptions of a program grinding to a halt:  
Machine Language Level:  
"Execution of the program stopped in location  
1110010101110111"  
Assembly Language Level:  
"Execution of the program stopped when the DIV (divide)  
instruction was hit"  
Compiler Language Level:  
"Execution of the program stopped during evaluation of the  
algebraic expression '(A + B)/Z'"
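A present-day high-level language behaves just this way; in the Python sketch below (an illustration, not one of the systems discussed), the division by zero is reported in terms of the source-level expression rather than a binary location:

```python
A, B, Z = 3, 4, 0
try:
    result = (A + B) / Z         # the machine is told to divide by zero
except ZeroDivisionError as err:
    message = ("Execution of the program stopped during evaluation "
               f"of the algebraic expression '(A + B)/Z': {err}")
print(message)
```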
 
 
One of the greatest problems for systems programmers (the people who  
write compilers, interpreters, assemblers, and other programs to be used by  
many people) is to figure out how to write error-detecting routines in such  
a way that the messages which they feed to the user whose program has a  
"bug" provide high-level, rather than low-level, descriptions of the  
problem. It is an interesting reversal that when something goes wrong in a  
genetic "program" (e.g., a mutation), the "bug" is manifest only to people  
on a high level—namely on the phenotype level, not the genotype level.  
Actually, modern biology uses mutations as one of its principal windows  
onto genetic processes, because of their multilevel traceability.
 
Microprogramming and Operating Systems
 
In modern computer systems, there are several other levels of the  
hierarchy. For instance, some systems—often the so-called "microcomputers"—  
come with machine language instructions which are even more  
rudimentary than the instruction to add a number in memory to a number in a  
register. It is up to the user to decide what kinds of ordinary machine-level  
instructions he would like to be able to program in; he "microprograms"  
these instructions in terms of the "micro-instructions" which are available.  
Then the "higher-level machine language" instructions which he has  
designed may be burned into the circuitry and become hard-wired, although  
they need not be. Thus microprogramming allows the user to step a little  
below the conventional machine language level. One of the consequences is  
that a computer of one manufacturer can be hard-wired (via  
microprogramming) so as to have the same machine language instruction set as a  
computer of the same, or even another, manufacturer. The  
microprogrammed computer is said to be "emulating" the other computer.
 
 
Then there is the level of the operating system, which fits between the  
machine language program and whatever higher level the user is
programming in. The operating system is itself a program which has the  
functions of shielding the bare machine from access by users (thus  
protecting the system), and also of insulating the programmer from the many  
extremely intricate and messy problems of reading the program, calling a  
translator, running the translated program, directing the output to the  
proper channels at the proper time, and passing control to the next user. If  
there are several users "talking" to the same CPU at once, then the  
operating system is the program that shifts attention from one to the other in  
some orderly fashion. The complexities of operating systems are  
formidable indeed, and I shall only hint at them by the following analogy.
 
 
Consider the first telephone system. Alexander Graham Bell could  
phone his assistant in the next room: electronic transmission of a voice!  
Now that is like a bare computer minus operating system: electronic  
computation! Consider now a modern telephone system.
 
 
It is dazzling when you
think of how much flexibility there is, particularly in comparison to the  
erstwhile miracle of a "bare" telephone. Now sophisticated operating  
systems carry out similar traffic-handling and level-switching operations with  
respect to users and their programs. It is virtually certain that there are  
somewhat parallel things which take place in the brain: handling of many  
stimuli at the same time; decisions of what should have priority over what  
and for how long; instantaneous "interrupts" caused by emergencies or  
other unexpected occurrences; and so on.
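The "orderly fashion" of shifting attention among users can be sketched as a round-robin scheduler; the user programs, reduced here to mere lists of named steps, are an invention of the sketch:

```python
from collections import deque

def time_share(programs, slice_size=1):
    """Give each user's program a fixed slice of attention in turn."""
    ready = deque(programs.items())
    trace = []
    while ready:
        user, steps = ready.popleft()
        for _ in range(slice_size):
            if steps:
                trace.append((user, steps.pop(0)))  # run one step
        if steps:                   # unfinished: back of the queue
            ready.append((user, steps))
    return trace

trace = time_share({"alice": ["a1", "a2"], "bob": ["b1"]})
print(trace)  # -> [('alice', 'a1'), ('bob', 'b1'), ('alice', 'a2')]
```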
 
Cushioning the User and Protecting the System
 
The many levels in a complex computer system have the combined effect of  
"cushioning" the user, preventing him from having to think about the  
many lower-level goings-on which are most likely totally irrelevant to him  
anyway. A passenger in an airplane does not usually want to be aware of the  
levels of fuel in the tanks, or the wind speeds, or how many chicken dinners  
are to be served, or the status of the rest of the air traffic around the  
destination—this is all left to employees on different levels of the airlines  
hierarchy, and the passenger simply gets from one place to another. Here  
again, it is when something goes wrong—such as his baggage not arriving—  
that the passenger is made aware of the confusing system of levels  
underneath him.
 
 
The idea that "you" know all about  
"yourself" is so familiar from interaction with people that it was natural to  
extend it to the computer—after all, it was intelligent enough that it could  
"talk" to them in English! Their question was not unlike asking a person,  
"Why are you making so few red blood cells today?" People do not know  
about that level—the "operating system level"—of their bodies.
 
 
The main cause of this level-confusion was that communication with all  
levels of the computer system was taking place on a single screen, on a  
single terminal. Although my friends' naïveté might seem rather extreme,
even experienced computer people often make similar errors when several  
levels of a complex system are all present at once on the same screen. They  
forget "who" they are talking to, and type something which makes no sense  
at that level, although it would have made perfect sense on another level. It  
might seem desirable, therefore, to have the system itself sort out the  
levels—to interpret commands according to what "makes sense".  
Unfortunately, such interpretation would require the system to have a lot of  
common sense, as well as perfect knowledge of the programmer's overall  
intent—both of which would require more artificial intelligence than exists  
at the present time.
 
The Border between Software and Hardware
 
One can also be confused by the flexibility of some levels and the rigidity of  
others. For instance, on some computers there are marvelous text-editing  
systems which allow pieces of text to be "poured" from one format into  
another, practically as liquids can be poured from one vessel into another.  
A thin page can turn into a wide page, or vice versa. With such power, you  
might expect that it would be equally trivial to change from one font to  
another—say from roman to italics. Yet there may be only a single font  
available on the screen, so that such changes are impossible. Or it may be  
feasible on the screen but not printable by the printer—or the other way  
around. After dealing with computers for a long time, one gets spoiled, and  
thinks that everything should be programmable: no printer should be so  
rigid as to have only one character set, or even a finite repertoire of  
them—typefaces should be user-specifiable! But once that degree of  
flexibility has been attained, then one may be annoyed that the printer cannot  
print in different colors of ink, or that it cannot accept paper of all shapes  
and sizes, or that it does not fix itself when it breaks . . .
 
Are Computers Super-Flexible or Super-Rigid?
 
One of the major goals of the drive to higher levels has always been to make  
as natural as possible the task of communicating to the computer what you  
want it to do. Certainly, the high-level constructs in compiler languages are  
closer to the concepts which humans naturally think in, than are lower-level  
constructs such as those in machine language. But in this drive towards ease  
of communication, one aspect of "naturalness" has been quite neglected.  
That is the fact that interhuman communication is far less rigidly  
constrained than human-machine communication. For instance, we often  
produce meaningless sentence fragments as we search for the best way to  
express something, we cough in the middle of sentences, we interrupt each  
other, we use ambiguous descriptions and "improper" syntax, we coin  
phrases and distort meanings—but our message still gets through, mostly.  
With programming languages, it has generally been the rule that there is a  
very strict syntax which has to be obeyed one hundred per cent of the time;  
there are no ambiguous words or constructions. Interestingly, the printed  
equivalent of coughing (i.e., a nonessential or irrelevant comment) is  
allowed, but only provided it is signaled in advance by a key word (e.g.,  
COMMENT), and then terminated by another key word (e.g., a semicolon).  
This small gesture towards flexibility has its own little pitfall, ironically: if a  
semicolon (or whatever key word is used for terminating a comment) is  
used inside a comment, the translating program will interpret that  
semicolon as signaling the end of the comment, and havoc will ensue.
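The pitfall is easy to reproduce with a deliberately naive comment-stripper for a hypothetical language using exactly the convention just described:

```python
def strip_comments(source):
    """Remove text from each key word COMMENT to the next semicolon."""
    while "COMMENT" in source:
        start = source.index("COMMENT")
        end = source.index(";", start)   # first semicolon ends the comment
        source = source[:start] + source[end + 1:]
    return source

# Fine: the whole comment disappears.
print(strip_comments("x = 1 COMMENT set x; y = 2"))

# Havoc: the semicolon inside the comment ends it early, and the tail
# of the "comment" leaks into the program text.
print(strip_comments("x = 1 COMMENT careful; really careful; y = 2"))
```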
 
 
If a procedure named INSIGHT has been defined and then called  
seventeen times in the program, and the eighteenth time it is misspelled as  
INSIHGT, woe to the programmer. The compiler will balk and print a  
rigidly unsympathetic error message, saying that it has never heard of  
INSIHGT. Often, when such an error is detected by a compiler, the compiler  
tries to continue, but because of its lack of insihgt, it has not understood  
what the programmer meant. In fact, it may very well suppose that  
something entirely different was meant, and proceed under that erroneous  
assumption. Then a long series of error messages will pepper the rest of the  
program, because the compiler—not the programmer—got confused.  
Imagine the chaos that would result if a simultaneous English-Russian  
interpreter, upon hearing one phrase of French in the English, began  
trying to interpret all the remaining English as French. Compilers often get  
lost in such pathetic ways. C'est la vie.
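Python programmers meet exactly this rigidity. In the sketch below the misspelled call produces an unsympathetic NameError; as a small aside, recent versions of CPython do append a "Did you mean" hint to such messages, a modest step toward the second-guessing taken up below:

```python
def INSIGHT():
    return 42

try:
    INSIHGT()                      # the eighteenth call, misspelled
except NameError as err:           # the translator has never heard of it
    outcome = f"error message: {err}"
print(outcome)
```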
 
 
Perhaps this sounds condemnatory of computers, but it is not meant to  
be. In some sense, things had to be that way. When you stop to think what  
most people use computers for, you realize that it is to carry out very  
definite and precise tasks, which are too complex for people to do. If the  
computer is to be reliable, then it is necessary that it should understand,  
without the slightest chance of ambiguity, what it is supposed to do. It is  
also necessary that it should do neither more nor less than it is explicitly  
instructed to do. If there is, in the cushion underneath the programmer, a  
program whose purpose is to "guess" what the programmer wants or  
means, then it is quite conceivable that the programmer could try to
communicate his task and be totally misunderstood. So it is important that  
the high-level program, while comfortable for the human, still should be  
unambiguous and precise.
 
Second-Guessing the Programmer
 
Now it is possible to devise a programming language—and a program  
which translates it into the lower levels—which allows some sorts of  
imprecision. One way of putting it would be to say that a translator for such a  
programming language tries to make sense of things which are done  
"outside of the rules of the language". But if a language allows certain  
"transgressions", then transgressions of that type are no longer true  
transgressions, because they have been included inside the rules! If a  
programmer is aware that he may make certain types of misspelling, then he may  
use this feature of the language deliberately, knowing that he is actually
operating within the rigid rules of the language, despite appearances. In  
other words, if the user is aware of all the flexibilities programmed into the  
translator for his convenience, then he knows the bounds which he cannot  
overstep, and therefore, to him, the translator still appears rigid and  
inflexible, although it may allow him much more freedom than early  
versions of the language, which did not incorporate "automatic  
compensation for human error".
 
 
The trouble is that somewhere, all this flexibility has to "bottom out",  
to use the phrase from Chapter V. There must be a hardware level which  
underlies it all, and which is inflexible. It may lie deeply hidden, and there  
may be so much flexibility on levels above it that few users feel the  
hardware limitations—but it is inevitably there.
 
 
What is this proverbial distinction between software and hardware? It is  
the distinction between programs and machines—between long  
complicated sequences of instructions, and the physical machines which carry  
them out. I like to think of software as "anything which you could send over
the telephone lines", and hardware as "anything else". A piano is hardware,  
but printed music is software. A telephone set is hardware, but a telephone  
number is software. The distinction is a useful one, but not always so  
clear-cut.
 
345
 
We humans also have "software" and "hardware" aspects, and the  
difference is second nature to us. We are used to the rigidity of our  
physiology, the fact that we cannot, at will, cure ourselves of diseases, or  
grow hair of any color—to mention just a couple of simple examples. We  
can, however, "reprogram" our minds so that we operate in new conceptual  
frameworks. The amazing flexibility of our minds seems nearly  
irreconcilable with the notion that our brains must be made out of fixed-rule  
hardware, which cannot be reprogrammed. We cannot make our neurons  
fire faster or slower, we cannot rewire our brains, we cannot redesign the  
interior of a neuron, we cannot make any choices about the hardware—and  
yet, we can control how we think.
 
345
 
But there are clearly aspects of thought which are beyond our control.  
We cannot make ourselves smarter by an act of will; we cannot learn a new  
language as fast as we want; we cannot make ourselves think faster than we  
do; we cannot make ourselves think about several things at once; and so on.  
This is a kind of primordial self-knowledge which is so obvious that it is  
hard to see it at all; it is like being conscious that the air is there. We never  
really bother to think about what might cause these "defects" of our minds:  
namely, the organization of our brains. To suggest ways of reconciling the  
software of mind with the hardware of brain is a main goal of this book.
 
341
 
With "rubbery" languages of that type, there would seem to be two  
alternatives: (1) the user is aware of the built-in flexibilities of the language  
and its translator; (2) the user is unaware of them. In the first case, the  
language is still usable for communicating programs precisely, because the  
programmer can predict how the computer will interpret the programs he  
writes in the language. In the second case, the "cushion" has hidden  
features which may do things that are unpredictable (from the vantage  
point of a user who doesn't know the inner workings of the translator).  
This may result in gross misinterpretations of programs, so such a  
language is unsuitable for purposes where computers are used mainly for  
their speed and reliability.
 
341
 
Now there is actually a third alternative: (3) the user is aware of the  
built-in flexibilities of the language and its translator, but there are so many  
of them and they interact with each other in such a complex way that he  
cannot tell how his programs will be interpreted. This may well apply to the  
person who wrote the translating program; he certainly knows its insides as  
well as anyone could—but he still may not be able to anticipate how it will  
react to a given type of unusual construction.
 
341
 
One of the major areas of research in Artificial Intelligence today is  
called automatic programming, which is concerned with the development of  
yet higher-level languages—languages whose translators are sophisticated,  
in that they can do at least some of the following impressive things:  
generalize from examples, correct some misprints or grammatical errors,  
try to make sense of ambiguous descriptions, try to second-guess the user
by having a primitive user model, ask questions when things are unclear,  
use English itself, etc. The hope is that one can walk the tightrope between  
reliability and flexibility.
 
342
 
AI Advances Are Language Advances
 
342
 
It is striking how tight the connection is between progress in computer
science (particularly Artificial Intelligence) and the development of new  
languages. A clear trend has emerged in the last decade: the trend to  
consolidate new types of discoveries in new languages. One key for the  
understanding and creation of intelligence lies in the constant development  
and refinement of the languages in terms of which processes for symbol  
manipulation are describable. Today, there are probably three or four  
dozen experimental languages which have been developed exclusively for  
Artificial Intelligence research. It is important to realize that any program  
which can be written in one of these languages is in principle  
programmable in lower-level languages, but it would require a supreme effort for a  
human; and the resulting program would be so long that it would exceed  
the grasp of humans. It is not that each higher level extends the potential of  
the computer; the full potential of the computer already exists in its  
machine language instruction set. It is that the new concepts in a high-level  
language suggest directions and perspectives by their very nature.
 
342
 
The "space" of all possible programs is so huge that no one can have a  
sense of what is possible. Each higher-level language is naturally suited for  
exploring certain regions of "program space"; thus the programmer, by  
using that language, is channeled into those areas of program space. He is  
not forced by the language into writing programs of any particular type, but
the language makes it easy for him to do certain kinds of things. Proximity  
to a concept, and a gentle shove, are often all that is needed for a major  
discovery—and that is the reason for the drive towards languages of ever  
higher levels.
 
342
 
Programming in different languages is like composing pieces in  
different keys, particularly if you work at the keyboard. If you have learned or  
written pieces in many keys, each key will have its own special emotional  
aura. Also, certain kinds of figurations "lie in the hand" in one key but are  
awkward in another. So you are channeled by your choice of key. In some  
ways, even enharmonic keys, such as C-sharp and D-flat, are quite distinct  
in feeling. This shows how a notational system can play a significant role in  
shaping the final product.
 
342
 
A "stratified" picture of AI is shown in Figure 59, with machine  
components such as transistors on the bottom, and "intelligent programs"  
on the top. The picture is taken from the book Artificial Intelligence by  
Patrick Henry Winston, and it represents a vision of AI shared by nearly all  
AI workers. Although I agree with the idea that AI must be stratified in  
some such way, I do not think that, with so few layers, intelligent programs
can be reached. Between the machine language level and the level where
true intelligence will be reached, I am convinced there will lie perhaps  
another dozen (or even several dozen!) layers, each new layer building on  
and extending the flexibilities of the layer below. What they will be like we  
can hardly dream of now . . .
 
343
 
FIGURE 59. To create intelligent  
programs, one needs to build up a series of  
levels of hardware and software, so that one  
is spared the agony of seeing everything only  
on the lowest level. Descriptions of a single  
process on different levels will sound very  
different from each other, only the top one  
being sufficiently chunked that it is  
comprehensible to us. [Adapted from P. H.
Winston, Artificial Intelligence (Reading,  
Mass.: Addison-Wesley, 1977).]
 
343
 
The Paranoid and the Operating System
 
345
 
Intermediate Levels and the Weather
 
345
 
We have seen that in computer systems, there are a number of rather  
sharply defined strata, in terms of any one of which the operation of a  
running program can be described. Thus there is not merely a single low  
level and a single high level—there are all degrees of lowness and highness.  
Is the existence of intermediate levels a general feature of systems which  
have low and high levels? Consider, for example, the system whose  
"hardware" is the earth's atmosphere (not very hard, but no matter), and  
whose "software" is the weather. Keeping track of the motions of all of the  
molecules simultaneously would be a very low-level way of "understanding"  
the weather, rather like looking at a huge, complicated program on the  
machine language level. Obviously it is way beyond human comprehension.  
But we still have our own peculiarly human ways of looking at, and  
describing, weather phenomena. Our chunked view of the weather is based  
on very high-level phenomena,
 
345
 
All of these phenomena  
involve astronomical numbers of molecules, somehow behaving in concert  
so that large-scale trends emerge. This is a little like looking at the weather  
in a compiler language.
 
346
 
Is there something analogous to looking at the weather in an
intermediate-level language, such as assembly language? For instance, are  
there very small local "mini-storms", something like the small whirlwinds  
which one occasionally sees, whipping up some dust in a swirling column a  
few feet wide, at most? Is a local gust of wind an intermediate-level chunk  
which plays a role in creating higher-level weather phenomena? Or is there  
just no practical way of combining knowledge of such kinds of phenomena  
to create a more comprehensive explanation of the weather?
 
 
346
 
Two other questions come to my mind. The first is: "Could it be that  
the weather phenomena which we perceive on our scale—a tornado, a  
drought—are just intermediate-level phenomena: parts of vaster, slower  
phenomena?" If so, then true high-level weather phenomena would be  
global, and their time scale would be geological. The Ice Age would be a  
high-level weather event. The second question is: "Are there intermediate-  
level weather phenomena which have so far escaped human perception,  
but which, if perceived, could give greater insight into why the weather is as  
it is?"
 
346
 
From Tornados to Quarks
 
346
 
This last suggestion may sound fanciful, but it is not all that far-fetched. We  
need only look to the hardest of the hard sciences—physics—to find  
peculiar examples of systems which are explained in terms of interacting "parts"  
which are themselves invisible. In physics, as in any other discipline, a system  
is a group of interacting parts. In most systems that we know, the parts  
retain their identities during the interaction, so that we still see the parts  
inside the system. For example, when a team of football players assembles,  
the individual players retain their separateness—they do not melt into  
some composite entity, in which their individuality is lost. Still—and this is  
important—some processes are going on in their brains which are evoked  
by the team-context, and which would not go on otherwise, so that in a  
minor way, the players change identity when they become part of the larger  
system, the team. This kind of system is called a nearly decomposable system  
(the term comes from H. A. Simon's article "The Architecture of  
Complexity"; see the Bibliography). Such a system consists of weakly interacting  
modules, each of which maintains its own private identity throughout the  
interaction but by becoming slightly different from how it is when outside  
of the system, contributes to the cohesive behavior of the whole system.  
The systems studied in physics are usually of this type. For instance, an  
atom is seen as made of a nucleus whose positive charge captures a number  
of electrons in "orbits", or bound states. The bound electrons are very  
much like free electrons, despite their being internal to a composite object.
 
346
 
Some systems studied in physics offer a contrast to the relatively  
straightforward atom. Such systems involve extremely strong interactions,  
as a result of which the parts are swallowed up into the larger system, and  
lose some or all of their individuality. An example of this is the nucleus of  
an atom, which is usually described as being "a collection of protons and  
neutrons". But the forces which pull the component particles together are
so strong that the component particles do not survive in anything like their  
"free" form (the form they have when outside a nucleus). And in fact a  
nucleus acts in many ways as a single particle, rather than as a collection of  
interacting particles. When a nucleus is split, protons and neutrons are  
often released, but also other particles, such as pi-mesons and gamma rays,  
are commonly produced. Are all those different particles physically present  
inside a nucleus before it is split, or are they just "sparks" which fly off  
when the nucleus is split? It is perhaps not meaningful to try to give an  
answer to such a question. On the level of particle physics, the difference  
between storing the potential to make "sparks" and storing actual subparticles is not so clear.
 
347
 
A nucleus is thus one system whose "parts", even though they are not  
visible while on the inside, can be pulled out and made visible. However,  
there are more pathological cases, such as the proton and neutron seen as  
systems themselves. Each of them has been hypothesized to be constituted  
from a trio of "quarks"—hypothetical particles which can be combined in  
twos or threes to make many known fundamental particles. However, the  
interaction between quarks is so strong that not only can they not be seen  
inside the proton and neutron, but they cannot even be pulled out at all!  
Thus, although quarks help to give a theoretical understanding of certain  
properties of protons and neutrons, their own existence may perhaps  
never be independently established. Here we have the antithesis of a  
"nearly decomposable system"—it is a system which, if anything, is "nearly  
indecomposable". Yet what is curious is that a quark-based theory of  
protons and neutrons (and other particles) has considerable explanatory  
power, in that many experimental results concerning the particles which  
quarks supposedly compose can be accounted for quite well, quantitatively,  
by using the "quark model".
 
347
 
Superconductivity: A "Paradox" of Renormalization
 
347
 
In Chapter V we discussed how renormalized particles emerge from their
bare cores, by recursively compounded interactions with virtual particles. A  
renormalized particle can be seen either as this complex mathematical  
construct, or as the single lump which it is, physically. One of the strangest  
and most dramatic consequences of this way of describing particles is the  
explanation it provides for the famous phenomenon of superconductivity:  
resistance-free flow of electrons in certain solids, at extremely low  
temperatures.
 
347
 
It turns out that electrons in solids are renormalized by their  
interactions with strange quanta of vibration called phonons (themselves  
renormalized as well!). These renormalized electrons are called polarons.
Calculation shows that at very low temperatures, two oppositely spinning polarons  
will begin to attract each other, and can actually become bound together in  
a certain way. Under the proper conditions, all the current-carrying polarons will get paired up, forming Cooper pairs. Ironically, this pairing comes
about precisely because electrons—the bare cores of the paired polarons—  
repel each other electrically. In contrast to the electrons, each Cooper pair  
feels neither attracted to nor repelled by any other Cooper pair, and  
consequently it can slip freely through a metal as if the metal were a
vacuum. If you convert the mathematical description of such a metal from  
one whose primitive units are polarons into one whose primitive units are  
Cooper pairs, you get a considerably simplified set of equations. This  
mathematical simplicity is the physicist's way of knowing that "chunking"  
into Cooper pairs is the natural way to look at superconductivity.
 
348
 
Here we have several levels of particle: the Cooper pair itself; the two  
oppositely spinning polarons which compose it; the electrons and phonons  
which make up the polarons; and then, within the electrons, the virtual  
photons and positrons, etc. etc. We can look at each level and perceive  
phenomena there, which are explained by an understanding of the levels  
below.
 
348
 
"Sealing-off"
 
348
 
Similarly, and fortunately, one does not have to know all about quarks to  
understand many things about the particles which they may compose.  
Thus, a nuclear physicist can proceed with theories of nuclei that are based  
on protons and neutrons, and ignore quark theories and their rivals. The  
nuclear physicist has a chunked picture of protons and neutrons—a  
description derived from lower-level theories but which does not require
understanding the lower-level theories. Likewise, an atomic physicist has a  
chunked picture of an atomic nucleus derived from nuclear theory. Then a  
chemist has a chunked picture of the electrons and their orbits, and builds  
theories of small molecules, theories which can be taken over in a chunked  
way by the molecular biologist, who has an intuition for how small  
molecules hang together, but whose technical expertise is in the field of  
extremely large molecules and how they interact. Then the cell biologist  
has a chunked picture of the units which the molecular biologist pores over,  
and tries to use them to account for the ways that cells interact. The point is  
clear. Each level is, in some sense, "sealed off" from the levels below it.  
This is another of Simon's vivid terms, recalling the way in which a  
submarine is built in compartments, so that if one part is damaged, and water  
begins pouring in, the trouble can be prevented from spreading, by closing  
the doors, thereby sealing off the damaged compartment from  
neighboring compartments.
 
348
 
Although there is always some "leakage" between the hierarchical  
levels of science, so that a chemist cannot afford to ignore lower-level  
physics totally, or a biologist to ignore chemistry totally, there is almost no  
leakage from one level to a distant level. That is why people can have  
intuitive understandings of other people without necessarily  
understanding the quark model, the structure of nuclei, the nature of electron orbits,  
the chemical bond, the structure of proteins, the organelles in a cell, the
methods of intercellular communication, the physiology of the various  
organs of the human body, or the complex interactions among organs. All  
that a person needs is a chunked model of how the highest level acts; and as  
we all know, such models are very realistic and successful.
 
349
 
The Trade-off between Chunking and Determinism
 
349
 
There is, however, perhaps one significant negative feature of a chunked  
model: it usually does not have exact predictive power. That is, we save  
ourselves from the impossible task of seeing people as collections of quarks  
(or whatever is at the lowest level) by using chunked models; but of course  
such models only give us probabilistic estimates of how other people feel,  
will react to what we say or do, and so on. In short, in using chunked  
high-level models, we sacrifice determinism for simplicity. Despite not  
being sure how people will react to a joke, we tell it with the expectation  
that they will do something such as laugh, or not laugh—rather than, say,  
climb the nearest flagpole. (Zen masters might well do the latter!) A  
chunked model defines a "space" within which behavior is expected to fall,  
and specifies probabilities of its falling in different parts of that space.
 
349
 
"Computers Can Only Do What You Tell Them to Do"
 
349
 
Now these ideas can be applied as well to computer programs as to  
composite physical systems. There is an old saw which says, "Computers can only  
do what you tell them to do." This is right in one sense, but it misses the  
point: you don't know in advance the consequences of what you tell a  
computer to do; therefore its behavior can be as baffling and surprising  
and unpredictable to you as that of a person. You generally know in  
advance the space in which the output will fall, but you don't know details of  
where it will fall. For instance, you might write a program to calculate the  
first million digits of π. Your program will spew forth digits of π much
faster than you can—but there is no paradox in the fact that the computer  
is outracing its programmer. You know in advance the space in which the  
output will lie—namely the space of digits between 0 and 9—which is to say,  
you have a chunked model of the program's behavior; but if you'd known  
the rest, you wouldn't have written the program.
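The point can be made concrete with a short sketch of my own (not from the text): a generator that streams the digits of π one at a time, using Gibbons's unbounded spigot algorithm. You know the space of the output in advance, namely the digits 0 through 9, but not which digit comes next.

```python
def pi_digits():
    """Stream the decimal digits of pi one at a time
    (Gibbons's unbounded spigot algorithm)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is now certain: emit it, then rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough information yet: absorb one more term of the series.
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l,
                                k + 1, (q * (7 * k + 2) + r * l) // (t * l), l + 2)
```

The name `pi_digits` is illustrative; any digit-streaming program would make the same point: the machine outraces its programmer, yet never leaves the chunked space you foresaw.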
 
349
 
There is another sense in which this old saw is rusty. This involves the  
fact that as you program in ever higher-level languages, you know less and  
less precisely what you've told the computer to do! Layers and layers of  
translation may separate the "front end" of a complex program from the  
actual machine language instructions. At the level you think and program,  
your statements may resemble declaratives and suggestions more than they  
resemble imperatives or commands. And all the internal rumbling  
provoked by the input of a high-level statement is invisible to you, generally,  
just as when you eat a sandwich, you are spared conscious awareness of the  
digestive processes that it triggers.
 
350
 
In any case, this notion that "computers can only do what they are told  
to do," first propounded by Lady Lovelace in her famous memoir, is so  
prevalent and so connected with the notion that "computers cannot think"  
that we shall return to it in later Chapters when our level of sophistication is  
greater.
 
350
 
Two Types of System
 
350
 
There is an important division between two types of system built up from  
many parts. There are those systems in which the behavior of some parts  
tends to cancel out the behavior of other parts, with the result that it does  
not matter too much what happens on the low level, because most anything  
will yield similar high-level behavior. An example of this kind of system is a  
container of gas, where all the molecules bump and bang against each other  
in very complex microscopic ways; but the total outcome, from a  
macroscopic point of view, is a very calm, stable system with a certain  
temperature, pressure, and volume. Then there are systems where the effect of a  
single low-level event may get magnified into an enormous high-level
consequence. Such a system is a pinball machine, where the exact angle with  
which a ball strikes each post is crucial in determining the rest of its  
descending pathway.
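A toy sketch (my own, under stated assumptions) can illustrate the contrast. The gas-like case averages a million random "molecular" kicks, which largely cancel; the pinball-like case uses the logistic map in its chaotic regime as a stand-in for a ball ricocheting off posts, so that a one-part-in-a-million difference in the starting "angle" is soon magnified.

```python
import random

random.seed(0)

# Type 1, gas-like: a million microscopic events largely cancel, so the
# macroscopic average is calm and predictable regardless of the details.
kicks = [random.choice((-1, 1)) for _ in range(1_000_000)]
mean_kick = sum(kicks) / len(kicks)   # very close to 0

# Type 2, pinball-like: the logistic map at r = 3.9 (chaotic regime) stands
# in for successive post collisions; we track how two trajectories with
# nearly identical starting points drift apart.
def divergence(x, y, steps=40, r=3.9):
    diffs = []
    for _ in range(steps):
        x, y = r * x * (1 - x), r * y * (1 - y)
        diffs.append(abs(x - y))
    return diffs

diffs = divergence(0.400000, 0.400001)  # initial gap: one part in a million
```

The design choice here is deliberate: both systems are driven by simple low-level rules, but only in the second does a single low-level difference survive, and grow, all the way up to the top level.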
 
350
 
A computer is an elaborate combination of these two types of system. It
contains subunits such as wires, which behave in a highly predictable  
fashion: they conduct electricity according to Ohm's law, a very precise,  
chunked law which resembles the laws governing gases in containers, since  
it depends on statistical effects in which billions of random effects cancel  
each other out, yielding a predictable overall behavior. A computer also  
contains macroscopic subunits, such as a printer, whose behavior is  
completely determined by delicate patterns of currents. What the printer prints  
is not by any means created by a myriad canceling microscopic effects. In  
fact, in the case of most computer programs, the value of every single bit in  
the program plays a critical role in the output that gets printed. If any bit  
were changed, the output would also change drastically.
 
350
 
Systems which are made up of "reliable" subsystems only—that is,  
subsystems whose behavior can be reliably predicted from chunked  
descriptions—play inestimably important roles in our daily lives, because  
they are pillars of stability. We can rely on walls not to fall down, on  
sidewalks to go where they went yesterday, on the sun to shine, on clocks to  
tell the time correctly, and so on. Chunked models of such systems are  
virtually entirely deterministic. Of course, the other kind of system which  
plays a very large role in our lives is a system that has variable behavior  
which depends on some internal microscopic parameters—often a very  
large number of them, moreover—which we cannot directly observe. Our  
chunked model of such a system is necessarily in terms of the "space" of  
operation, and involves probabilistic estimates of landing in different  
regions of that space.
 
350
 
A container of gas, which, as I already pointed out, is a reliable system  
because of many canceling effects, obeys precise, deterministic laws of
physics. Such laws are chunked laws, in that they deal with the gas as a whole,  
and ignore its constituents. Furthermore, the microscopic and macroscopic  
descriptions of a gas use entirely different terms. The former requires the  
specification of the position and velocity of every single component  
molecule; the latter requires only the specification of three new quantities:  
temperature, pressure, and volume, the first two of which do not even have  
microscopic counterparts. The simple mathematical relationship which  
relates these three parameters—pV = cT, where c is a constant—is a law
which depends on, yet is independent of, the lower-level phenomena. Less  
paradoxically, this law can be derived from the laws governing the  
molecular level; in that sense it depends on the lower level. On the other hand, it is  
a law which allows you to ignore the lower level completely, if you wish; in  
that sense it is independent of the lower level.
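For the record, the standard kinetic-theory derivation (not spelled out in the text) shows exactly this dependence: pressure comes from averaged momentum transfer by molecular impacts, and temperature from the average kinetic energy per molecule, so that

```latex
p = \frac{N m \langle v^{2} \rangle}{3V},
\qquad
\tfrac{1}{2}\, m \langle v^{2} \rangle = \tfrac{3}{2}\, k T
\quad\Longrightarrow\quad
pV = NkT = cT, \qquad c = Nk,
```

where N is the number of molecules, m their mass, and k Boltzmann's constant. The constant c depends on the lower level only through the single number N; everything else about the molecules has been averaged away.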
 
351
 
It is important to realize that the high-level law cannot be stated in the  
vocabulary of the low-level description. "Pressure" and "temperature" are  
new terms which experience with the low level alone cannot convey. We  
humans perceive temperature and pressure directly; that is how we are  
built, so that it is not amazing that we should have found this law. But  
creatures which knew gases only as theoretical mathematical constructs  
would have to have an ability to synthesize new concepts, if they were to  
discover this law.
 
351
 
Epiphenomena
 
351
 
In drawing this Chapter to a close, I would like to relate a story about a
complex system. I was talking one day with two systems programmers for  
the computer I was using. They mentioned that the operating system  
seemed to be able to handle up to about thirty-five users with great  
comfort, but at about thirty-five users or so, the response time all of a sudden  
shot up, getting so slow that you might as well log off and go home and wait  
until later. Jokingly I said, "Well, that's simple to fix—just find the place in  
the operating system where the number '35' is stored, and change it to  
'60'!" Everyone laughed. The point is, of course, that there is no such place.  
Where, then, does the critical number—35 users—come from? The answer  
is: It is a visible consequence of the overall system organization—an "epiphenomenon".
 
351
 
Similarly, you might ask about a sprinter, "Where is the '9.3' stored,  
that makes him be able to run 100 yards in 9.3 seconds?" Obviously, it is not  
stored anywhere. His time is a result of how he is built, what his reaction  
time is, a million factors all interacting when he runs. The time is quite  
reproducible, but it is not stored in his body anywhere. It is spread around  
among all the cells of his body and only manifests itself in the act of the  
sprint itself.
 
351
 
Epiphenomena abound. In the game of "Go", there is the feature that  
"two eyes live". It is not built into the rules, but it is a consequence of the  
rules. In the human brain, there is gullibility. How gullible are you? Is your
gullibility located in some "gullibility center" in your brain? Could a  
neurosurgeon reach in and perform some delicate operation to lower your  
gullibility, otherwise leaving you alone? If you believe this, you are pretty  
gullible, and should perhaps consider such an operation.
 
352
 
Mind vs. Brain
 
352
 
In coming Chapters, where we discuss the brain, we shall examine whether
the brain's top level—the mind—can be understood without understanding  
the lower levels on which it both depends and does not depend. Are there  
laws of thinking which are "sealed off" from the lower laws that govern the  
microscopic activity in the cells of the brain? Can mind be "skimmed" off of  
brain and transplanted into other systems? Or is it impossible to unravel  
thinking processes into neat and modular subsystems? Is the brain more  
like an atom, a renormalized electron, a nucleus, a neutron, or a quark? Is  
consciousness an epiphenomenon? To understand the mind, must one go  
all the way down to the level of nerve cells?
 
355
 
there are indeed three "holism"s, but each one of them is composed
out of smaller copies of the word "reductionism". And in  
complementary fashion, in the right-hand piece, there is indeed one  
"reductionism", but it is composed out of smaller copies of the word "holism".  
Now this is all fine and good, but in your silly squabble, the two of you  
have actually missed the forest for the trees. You see, what good is it to  
argue about whether "holism" or "reductionism" is right, when the  
proper way to understand the matter is to transcend the question, by  
answering "mu"?
 
355
 
Crab: Holism is the most natural thing in the world to grasp. It's simply the
belief that "the whole is greater than the sum of its parts". No one in  
his right mind could reject holism.  
Anteater: Reductionism is the most natural thing in the world to grasp. It's
simply the belief that "a whole can be understood completely if you  
understand its parts, and the nature of their 'sum'". No one in her  
left brain could reject reductionism.
 
355
 
Crab: I reject reductionism. I challenge you to tell me, for instance, how to  
understand a brain reductionistically. Any reductionistic explanation  
of a brain will inevitably fall far short of explaining where the  
consciousness experienced by a brain arises from.  
Anteater: I reject holism. I challenge you to tell me, for instance, how a  
holistic description of an ant colony sheds any more light on it than is  
shed by a description of the ants inside it, and their roles, and their  
interrelationships. Any holistic explanation of an ant colony will  
inevitably fall far short of explaining where the consciousness experienced  
by an ant colony arises from.
 
355
 
Achilles: Oh, no! The last thing which I wanted to do was to provoke  
another argument. Anyway, now that I understand the controversy, I  
believe that my explanation of "mu" will help greatly. You see, "mu" is  
an ancient Zen answer which, when given to a question, UNASKS the  
question. Here, the question seems to be, "Should the world be  
understood via holism, or via reductionism?" And the answer of "mu" here  
rejects the premises of the question, which are that one or the other  
must be chosen. By unasking the question, it reveals a wider truth: that  
there is a larger context into which both holistic and reductionistic  
explanations fit.
 
380
 
CHAPTER XI  
Brains and Thoughts
 
380
 
New Perspectives on Thought
 
380
 
It was only with the advent of computers that people actually tried to
create "thinking" machines, and witnessed bizarre variations on the theme  
of thought. Programs were devised whose "thinking" was to human  
thinking as a slinky flipping end over end down a staircase is to human  
locomotion. All of a sudden the idiosyncrasies, the weaknesses and powers, the  
vagaries and vicissitudes of human thought were hinted at by the  
newfound ability to experiment with alien, yet hand-tailored forms of  
thought—or approximations of thought. As a result, we have acquired, in  
the last twenty years or so, a new kind of perspective on what thought is,  
and what it is not. Meanwhile, brain researchers have found out much  
about the small-scale and large-scale hardware of the brain. This approach  
has not yet been able to shed much light on how the brain manipulates  
concepts, but it gives us some ideas about the biological mechanisms on  
which thought manipulation rests.  
In the coming two Chapters, then, we will try to unite some insights  
gleaned from attempts at computer intelligence with some of the facts  
learned from ingenious experiments on living animal brains, as well as with  
results from research on human thought processes done by cognitive  
psychologists. The stage has been set by the Prelude, Ant Fugue; now we  
develop the ideas more deeply.
 
Intensionality and Extensionality

Thought must depend on representing reality in the hardware of the brain. In  
the preceding Chapters, we have developed formal systems which  
represent domains of mathematical reality in their symbolisms. To what extent is  
it reasonable to use such formal systems as models for how the brain might  
manipulate ideas?
 
We saw, in the pq-system and then in other more complicated systems,  
how meaning, in a limited sense of the term, arose as a result of an  
isomorphism which maps typographical symbols onto numbers,  
operations, and relations; and strings of typographical symbols onto statements.  
Now in the brain we don't have typographical symbols, but we have  
something even better: active elements which can store information and  
transmit it and receive it from other active elements. Thus we have active  
symbols, rather than passive typographical symbols. In the brain, the rules  
are mixed right in with the symbols themselves, whereas on paper, the  
symbols are static entities, and the rules are in our heads.
 
It is important not to get the idea, from the rather strict nature of all  
the formal systems we have seen, that the isomorphism between symbols  
and real things is a rigid, one-to-one mapping, like the strings which link a  
marionette and the hand guiding it. In TNT, the notion "fifty" can be  
expressed in different symbolic ways; for example,  
((SSSSSSS0·SSSSSSS0)+(S0·S0))  
((SSSSS0·SSSSS0)+(SSSSS0·SSSSS0))  
That these both represent the same number is not a priori clear. You can  
manipulate each expression independently, and at some point stumble  
across a theorem which makes you exclaim, "Oh—it's that number!"
 
In your mind, you can also have different mental descriptions for a  
single person; for example,  
The person whose book I sent to a friend in Poland a while back.  
The stranger who started talking with me and my friends tonight  
in this coffee house.  
That they both represent the same person is not a priori clear. Both  
descriptions may sit in your mind, unconnected. At some point during the  
evening you may stumble across a topic of conversation which leads to the  
revelation that they designate the same person, making you exclaim,  
"Oh—you're that person!"
 
Not all descriptions of a person need be attached to some central  
symbol for that person, which stores the person's name. Descriptions can be  
manufactured and manipulated in themselves. We can invent nonexistent  
people by making descriptions of them; we can merge two descriptions  
when we find they represent a single entity; we can split one description  
into two when we find it represents two things, not one—and so on. This  
"calculus of descriptions" is at the heart of thinking. It is said to be  
intensional and not extensional, which means that descriptions can "float" without  
being anchored down to specific, known objects. The intensionality of  
thought is connected to its flexibility; it gives us the ability to imagine  
hypothetical worlds, to amalgamate different descriptions or chop one  
description into separate pieces, and so on.
 
Suppose a friend who has borrowed your car telephones you to say  
that your car skidded off a wet mountain road, careened against a bank,  
and overturned, and she narrowly escaped death. You conjure up a series  
of images in your mind, which get progressively more vivid as she adds  
details, and in the end you "see it all in your mind's eye". Then she tells you  
that it's all been an April Fool's joke, and both she and the car are fine! In  
many ways that is irrelevant. The story and the images lose nothing of their  
vividness, and the memory will stay with you for a long, long time. Later,  
you may even think of her as an unsafe driver because of the strength of  
338  
Brains and Thoughtsthe first impression, which should have been wiped out when you learned it  
was all untrue. Fantasy and fact intermingle very closely in our minds, and  
this is because thinking involves the manufacture and manipulation of  
complex descriptions, which need in no way be tied down to real events or  
things.  
A flexible, intensional representation of the world is what thinking is  
all about. Now how can a physiological system such as the brain support  
such a system?
 
The Brain's "Ants"

The most important cells in the brain are nerve cells, or neurons (see  
Fig. 65), of which there are about ten billion. (Curiously, outnumbering the  
neurons by about ten to one are the glial cells, or glia. Glia are believed to  
play more of a supporting role to the neurons' starring role, and therefore  
we will not discuss them.) Each neuron possesses a number of synapses  
("entry ports") and one axon ("output channel"). The input and output are  
electrochemical flows: that is, moving ions. In between the entry ports of a  
neuron and its output channel is its cell body, where "decisions" are made.
 
FIGURE 65. Schematic drawing of a  
neuron. [Adapted from D. Wooldridge, The  
Machinery of the Brain (New York:  
McGraw-Hill, 1963), p. 6.]
 
The type of decision which a neuron faces—and this can take place up to a  
thousand times per second—is this: whether or not to fire—that is, to  
release ions down its axon, which eventually will cross over into the entry  
ports of one or more other neurons, thus causing them to make the same  
sort of decision. The decision is made in a very simple manner: if the sum  
of all inputs exceeds a certain threshold, yes; otherwise, no. Some of the  
inputs can be negative inputs, which cancel out positive inputs coming from  
somewhere else. In any case, it is simple addition which rules the lowest  
level of the mind. To paraphrase Descartes' famous remark, "I think,  
therefore I sum" (from the Latin Cogito, ergo sum).
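
The summing rule just described is essentially the classic threshold model of a neuron, and it fits in a few lines of code. This is only a sketch of the idea in Python; the weights and the threshold below are invented for illustration, not taken from any real neuron:

```python
def neuron_fires(inputs, threshold):
    """A neuron's 'decision': it fires exactly when the sum of its
    inputs (negative ones are inhibitory) exceeds its threshold."""
    return sum(inputs) > threshold

# Invented numbers: three excitatory inputs and one inhibitory input.
print(neuron_fires([0.6, 0.9, 0.4, -0.5], threshold=1.0))  # fires
print(neuron_fires([0.6, 0.9, 0.4, -1.5], threshold=1.0))  # stays quiet
```

A real neuron may have up to 200,000 such summands, but the rule itself is no more complicated than this.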
 
Now although the manner of making the decision sounds very simple,  
there is one fact which complicates the issue: there may be as many as  
200,000 separate entry ports to a neuron, which means that up to 200,000  
separate summands may be involved in determining the neuron's next  
action. Once the decision has been made, a pulse of ions streaks down the  
axon towards its terminal end. Before the ions reach the end, however,  
they may encounter a bifurcation—or several. In such cases, the single  
output pulse splits up as it moves down the bifurcating axon, and by the  
time it has reached the end, "it" has become "they"—and they may reach  
their destinations at separate times, since the axon branches along which  
they travel may be of different lengths and have different resistivities. The  
important thing, though, is that they all began as one single pulse, moving  
away from the cell body. After a neuron fires, it needs a short recovery time  
before firing again; characteristically this is measured in milliseconds, so  
that a neuron may fire up to about a thousand times per second.
 
Larger Structures in the Brain

Now we have described the brain's "ants". What about "teams", or  
"signals"? What about "symbols"? We make the following observation: despite  
the complexity of its input, a single neuron can respond only in a very  
primitive way—by firing, or not firing. This is a very small amount of  
information. Certainly for large amounts of information to be carried or  
processed, many neurons must be involved. And therefore one might guess  
that larger structures, composed from many neurons, would exist, which  
handle concepts on a higher level. This is undoubtedly true, but the most  
naive assumption—that there is a fixed group of neurons for each different  
concept—is almost certainly false.
 
There are many anatomical portions of the brain which can be  
distinguished from each other, such as the cerebrum, the cerebellum, the  
hypothalamus (see Fig. 66). The cerebrum is the largest part of the human  
brain, and is divided into a left hemisphere and a right hemisphere. The  
outer few millimeters of each cerebral hemisphere are coated with a  
layered "bark", or cerebral cortex. The amount of cerebral cortex is the major  
distinguishing feature, in terms of anatomy, between human brains and  
brains of less intelligent species. We will not describe any of the brain's  
suborgans in detail because, as it turns out, only the roughest mapping  
can at this time be made between such large-scale suborgans and the activities,  
mental or physical, which they are responsible for. For instance, it is known  
that language is primarily handled in one of the two cerebral  
hemispheres—in fact, usually the left hemisphere. Also, the cerebellum is the  
place where trains of impulses are sent off to muscles to control motor  
activity. But how these areas carry out their functions is still largely a mystery.
 
Mappings between Brains

Now an extremely important question comes up here. If thinking does take  
place in the brain, then how are two brains different from each other? How  
is my brain different from yours? Certainly you do not think exactly as I do,  
nor as anyone else does. But we all have the same anatomical divisions in  
our brains. How far does this identity of brains extend? Does it go to the  
neural level? Yes, if you look at animals on a low enough level of the  
thinking-hierarchy—the lowly earthworm, for instance. The following  
quote is from the neurophysiologist, David Hubel, speaking at a conference  
on communication with extraterrestrial intelligence:  
The number of nerve cells in an animal like a worm would be measured, I  
suppose, in the thousands. One very interesting thing is that we may point to a  
particular individual cell in a particular earthworm, and then identify the  
same cell, the corresponding cell in another earthworm of the same species.1  
Earthworms have isomorphic brains! One could say, "There is only one  
earthworm."

But such one-to-one mappability between individuals' brains  
disappears very soon as you ascend in the thinking-hierarchy and the number of  
neurons increases—confirming one's suspicions that there is not just one  
human! Yet considerable physical similarity can be detected between  
different human brains when they are compared on a scale larger than a  
single neuron but smaller than the major suborgans of the brain. What  
does this imply about how individual mental differences are represented in  
the physical brain? If we looked at my neurons' interconnections, could we  
find various structures that could be identified as coding for specific things  
I know, specific beliefs I have, specific hopes, fears, likes and dislikes I  
harbor? If mental experiences can be attributed to the brain, can  
knowledge and other aspects of mental life likewise be traced to specific locations  
inside the brain, or to specific physical subsystems of the brain? This will be  
a central question to which we will often return in this Chapter and the  
next.
 
Localization of Brain Processes: An Enigma

In an attempt to answer this question, the neurologist Karl Lashley, in a  
long series of experiments beginning around 1920 and running for many  
years, tried to discover where in its brain a rat stores its knowledge about  
maze running. In his book The Conscious Brain, Steven Rose describes  
Lashley's trials and tribulations this way:  
Lashley was attempting to identify the locus of memory within the cortex,  
and, to do so, first trained rats to run mazes, and then removed various  
cortical regions. He allowed the animals to recover and tested the retention of  
the maze-running skills. To his surprise it was not possible to find a particular  
region corresponding to the ability to remember the way through a maze.  
Instead all the rats which had had cortex regions removed suffered some kind  
of impairment, and the extent of the impairment was roughly proportional to  
the amount of cortex taken off. Removing cortex damaged the motor and  
sensory capacities of the animals, and they would limp, hop, roll, or stagger,  
but somehow they always managed to traverse the maze. So far as memory  
was concerned, the cortex appeared to be equipotential, that is, with all  
regions of equal possible utility. Indeed, Lashley concluded rather gloomily in  
his last paper "In Search of the Engram", which appeared in 1950, that the  
only conclusion was that memory was not possible at all.2
 
Curiously, evidence for the opposite point of view was being developed  
in Canada at roughly the same time that Lashley was doing his last work, in  
the late 1940's. The neurosurgeon Wilder Penfield was examining the  
reactions of patients whose brains had been operated on, by inserting  
electrodes into various parts of their exposed brains, and then using small  
electrical pulses to stimulate the neuron or neurons to which the electrodes  
had been attached. These pulses were similar to the pulses which come  
from other neurons. What Penfield found was that stimulation of certain  
neurons would reliably create specific images or sensations in the patient.  
These artificially provoked impressions ranged from strange but  
indefinable fears to buzzes and colors, and, most impressively of all, to entire  
successions of events recalled from some earlier time of life, such as a  
childhood birthday party. The set of locations which could trigger such  
specific events was extremely small—basically centered upon a single  
neuron. Now these results of Penfield dramatically oppose the conclusions  
of Lashley, since they seem to imply that local areas are responsible for  
specific memories, after all.
 
What can one make of this? One possible explanation could be that  
memories are coded locally, but over and over again in different areas of  
the cortex—a strategy perhaps developed in evolution as security against  
possible loss of cortex in fights, or in experiments conducted by  
neurophysiologists. Another explanation would be that memories can be  
reconstructed from dynamic processes spread over the whole brain, but  
can be triggered from local spots. This theory is based on the notion of  
modern telephone networks, where the routing of a long-distance call is  
not predictable in advance, for it is selected at the time the call is placed,  
and depends on the situation all over the whole country. Destroying any  
local part of the network would not block calls; it would just cause them to  
be routed around the damaged area. In this sense any call is potentially  
nonlocalizable. Yet any call just connects up two specific points; in this sense  
any call is localizable.
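
The telephone-network analogy can be made concrete with a toy program: a route for a call is found at dialing time, and destroying an exchange merely forces a different route rather than blocking the call. The network, the node names, and the breadth-first search below are my own invention for illustration, not anything from telephone engineering:

```python
from collections import deque

def find_route(network, start, end, damaged=()):
    """Breadth-first search for a call route that avoids damaged
    exchanges. Returns a list of node names, or None if no route exists."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in network.get(path[-1], []):
            if nxt not in seen and nxt not in damaged:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# A hypothetical network with two routes between A and D.
net = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(find_route(net, "A", "D"))                   # a direct route
print(find_route(net, "A", "D", damaged={"B"}))    # routed around B
```

The call from A to D is nonlocalizable in the sense that no single intermediate exchange is essential to it; yet it is localizable in the sense that it always connects the same two endpoints.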
 
Specificity in Visual Processing

Some of the most interesting and significant work on localization of brain  
processes has been done in the last fifteen years by David Hubel and  
Torsten Wiesel, at Harvard. They have mapped out visual pathways in the  
brains of cats, starting with the neurons in the retina, following their  
connections towards the rear of the head, passing through the "relay  
station" of the lateral geniculate, and ending up in the visual cortex, at the  
very back of the brain. First of all, it is remarkable that there exist well-  
defined neural pathways, in light of Lashley's results. But more remarkable  
are the properties of the neurons located at different stages along the  
pathway.
 
It turns out that retinal neurons are primarily contrast sensors. More  
specifically, the way they act is this. Each retinal neuron is normally firing at  
a "cruising speed". When its portion of the retina is struck by light, it may  
either fire faster or slow down and even stop firing. However, it will do so  
only provided that the surrounding part of the retina is less illuminated. So  
this means that there are two types of neuron: "on-center" and "off-center".  
The on-center neurons are those whose firing rate increases  
whenever, in the small circular retinal area to which they are sensitive, the  
center is bright but the outskirts are dark; the off-center neurons are those  
which fire faster when there is darkness in the center and brightness in the  
outer ring. If an on-center pattern is shown to an off-center neuron, the  
neuron will slow down in firing (and vice versa). Uniform illumination will  
leave both types of retinal neuron unaffected; they will continue to fire at  
cruising speed.
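
The behavior of these two cell types can be caricatured in a few lines of code. The firing rates below are invented, and a real retinal neuron compares graded illumination over a small circular region rather than a single pair of numbers; this is only a cartoon of the contrast-sensing idea:

```python
CRUISING = 10  # baseline firing rate, in arbitrary units

def on_center_rate(center, surround):
    """Toy on-center cell: fires faster when the center is brighter
    than the surround, slower when it is darker, and stays at
    cruising speed under uniform illumination."""
    if center > surround:
        return CRUISING + 5
    if center < surround:
        return CRUISING - 5
    return CRUISING

def off_center_rate(center, surround):
    """Toy off-center cell: the exact mirror image."""
    return on_center_rate(surround, center)

# Uniform illumination leaves both types at cruising speed.
print(on_center_rate(1.0, 1.0), off_center_rate(1.0, 1.0))
# A bright center on a dark surround excites the on-center cell
# and slows the off-center cell.
print(on_center_rate(1.0, 0.2), off_center_rate(1.0, 0.2))
```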
 
From the retina, signals from these neurons proceed via the optic  
nerve to the lateral geniculate, located somewhere towards the middle of  
the brain. There, one can find a direct mapping of the retinal surface in the  
sense that there are lateral-geniculate neurons which are triggered only by  
specific stimuli falling on specific areas of the retina. In that sense, the  
lateral geniculate is disappointing; it seems to be only a "relay station", and  
not a further processor (although to give it its due, the contrast sensitivity  
seems to be enhanced in the lateral geniculate). The retinal image is coded  
in a straightforward way in the firing patterns of the neurons in the lateral  
geniculate, despite the fact that the neurons there are not arranged on a  
two-dimensional surface in the form of the retina, but in a three-  
dimensional block. So two dimensions get mapped onto three, yet the  
information is preserved: an isomorphism. There is probably some deep  
meaning to the change in the dimensionality of the representation, which is  
not yet fully appreciated. In any case, there are so many further  
unexplained stages of vision that we should not be disappointed but pleased  
by the fact that—to some extent—we have figured out this one stage!
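
The moral of the dimensionality change is that an isomorphism need only preserve information, not geometry: a two-dimensional array of firing rates can be re-packed into a three-dimensional block and recovered intact. A small sketch of this point, with arbitrary array sizes of my own choosing:

```python
def fold_2d_to_3d(grid, depth):
    """Re-pack a 2-D grid (a list of rows) into a block of `depth`
    layers, preserving element order -- an information-preserving
    change of dimensionality."""
    flat = [x for row in grid for x in row]
    assert len(flat) % depth == 0
    layer = len(flat) // depth
    return [flat[i * layer:(i + 1) * layer] for i in range(depth)]

def unfold_3d_to_2d(block, n_cols):
    """Invert the fold: flatten the layers and cut them back into rows."""
    flat = [x for layer in block for x in layer]
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]

grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # a 3 x 4 "retina"
block = fold_2d_to_3d(grid, depth=2)                  # two layers of six
print(unfold_3d_to_2d(block, n_cols=4) == grid)       # nothing is lost
```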
 
From the lateral geniculate, the signals proceed back to the visual  
cortex. Here, some new types of processing occur. The cells of the visual  
cortex are divided into three categories: simple, complex, and  
hypercomplex. Simple cells act very much like retinal cells or lateral geniculate  
cells: they respond to point-like light or dark spots with contrasting  
surrounds, in particular regions of the retina. Complex cells, by contrast,  
usually receive input from a hundred or more other cells, and they detect light  
or dark bars oriented at specific angles on the retina (see Fig. 67).  
Hypercomplex cells respond to corners, bars, or even "tongues" moving in specific  
directions (again see Fig. 67). These latter cells are so highly specialized  
that they are sometimes called "higher-order hypercomplex cells".
 
A "Grandmother Cell"?

Because of the discovery of cells in the visual cortex which can be triggered  
by stimuli of ever-increasing complexity, some people have wondered if  
things are not leading in the direction of "one cell, one concept"—for  
example, you would have a "grandmother cell" which would fire if, and  
only if, your grandmother came into view. This somewhat humorous  
example of a "superhypercomplex cell" is not taken very seriously.  
However, it is not obvious what alternative theory seems reasonable. One  
possibility is that larger neural networks are excited collectively by sufficiently  
complex visual stimuli. Of course, the triggering of these larger  
multineuron units would somehow have to come from integration of signals  
emanating from the many hypercomplex cells. How this might be done,  
nobody knows. Just when we seem to be approaching the threshold where  
"symbol" might emerge from "signal", the trail gets lost—a tantalizingly  
unfinished story. We will return to this story shortly, however, and try to fill  
in some of it.
 
Earlier I mentioned the coarse-grained isomorphism between all  
human brains which exists on a large anatomical scale, and the very  
fine-grained, neural-level isomorphism which exists between earthworm brains.  
It is quite interesting that there is also an isomorphism between the visual  
processing apparatus of cat, monkey, and human, the "grain" of which is  
somewhere between coarse and fine. Here is how that isomorphism works.  
First of all, all three species have "dedicated" areas of cortex at the back of  
their brains where visual processing is done: the visual cortex. Secondly, in  
each of them, the visual cortex breaks up into three subregions, called areas  
17, 18, and 19 of the cortex. These areas are still universal, in the sense that  
they can be located in the brain of any normal individual in any of the three  
species. Within each area you can go still further, reaching the "columnar"  
organization of the visual cortex. Perpendicular to the surface of the  
cortex, moving radially inwards towards the inner brain, visual neurons are  
arranged in "columns"—that is, almost all connections move along the  
radial, columnar direction, and not between columns. And each column  
maps onto a small, specific retinal region. The number of columns is not  
the same in each individual, so that one can't find "the same column".  
Finally, within a column, there are layers in which simple neurons tend to  
be found, and other layers in which complex neurons tend to be found.  
(The hypercomplex neurons tend to be found in areas 18 and 19  
predominantly, while the simple and complex ones are found mostly in area 17.)  
It appears that we run out of isomorphisms at this level of detail. From here  
on down to the individual neuron level, each individual cat, monkey, or  
human has a completely unique pattern—somewhat like a fingerprint or a  
signature.
 
One minor but perhaps telling difference between visual processing in  
cats' brains and monkeys' brains has to do with the stage at which  
information from the two eyes is integrated to yield a single combined higher-level  
signal. It turns out that it takes place slightly later in the monkey than in the  
cat, which gives each separate eye's signal a slightly longer time to get  
processed by itself. This is not too surprising, since one would expect that  
the higher a species lies in the intelligence hierarchy, the more complex will  
be the problems which its visual system will be called upon to handle; and  
therefore signals ought to pass through more and more early processing  
before receiving a final "label". This is quite dramatically confirmed by  
observations of the visual abilities of a newborn calf, which seems to be born  
with as much power of visual discrimination as it will ever have. It will shy  
away from people or dogs, but not from other cattle. Probably its entire  
visual system is "hard-wired" before birth, and involves relatively little  
cortical processing. On the other hand, a human's visual system, so deeply  
reliant on the cortex, takes several years to reach maturity.
 
Funneling into Neural Modules

A puzzling thing about the discoveries so far made about the organization  
of the brain is that few direct correspondences have been found between  
large-scale hardware and high-level software. The visual cortex, for  
instance, is a large-scale piece of hardware, which is entirely dedicated to a  
clear software purpose—the processing of visual information—yet all of  
the processing so far discovered is still quite low-level. Nothing  
approaching recognition of objects has been localized in the visual cortex. This means  
that no one knows where or how the output from complex and  
hypercomplex cells gets transformed into conscious recognition of shapes,  
rooms, pictures, faces, and so on. People have looked for evidence of the  
"funneling" of many low-level neural responses into fewer and fewer  
higher-level ones, culminating in something such as the proverbial  
grandmother cell, or some kind of multineuron network, as mentioned above. It  
is evident that this will not be found in some gross anatomical division of  
the brain, but rather in a more microscopic analysis.
 
One possible alternative to the grandmother cell might be a fixed  
set of neurons, say a few dozen, at the thin end of the "funnel", all of which  
fire when Granny comes into view. And for each different recognizable  
object, there would be a unique network and a funneling process that  
would focus down onto that network. There are more complicated  
alternatives along similar lines, involving networks which can be excited in  
different manners, instead of in a fixed manner. Such networks would be the  
"symbols" in our brains.
 
But is such funneling necessary? Perhaps an object being looked at is  
implicitly identified by its "signature" in the visual cortex—that is, the  
collected responses of simple, complex, and hypercomplex cells. Perhaps  
the brain does not need any further recognizer for a particular form. This  
theory, however, poses the following problem. Suppose you are looking at  
a scene. It registers its signature on your visual cortex; but then how do you  
get from that signature to a verbal description of the scene? For instance,  
the paintings of Edouard Vuillard, a French post-impressionist, often take  
a few seconds of scrutiny, and then suddenly a human figure will jump out  
at you. Presumably the signature gets imprinted on the visual cortex in the  
first fraction of a second—but the picture is only understood after a few  
seconds. This is but one example of what is actually a common  
phenomenon—a sensation of something "crystallizing" in your mind at the  
moment of recognition, which takes place not when the light rays hit your  
retina, but sometime later, after some part of your intelligence has had a  
chance to act on the retinal signals.
 
The crystallization metaphor yields a pretty image derived from  
statistical mechanics, of a myriad microscopic and uncorrelated activities in a  
medium, slowly producing local regions of coherence which spread and  
enlarge; in the end, the myriad small events will have performed a  
complete structural revamping of their medium from the bottom up, changing  
it from a chaotic assembly of independent elements into one large,  
coherent, fully linked structure. If one thinks of the early neural activities as  
independent, and of the end result of their many independent firings as  
the triggering of a well-defined large "module" of neurons, then the word  
"crystallization" seems quite apt.
 
Another argument for funneling is based on the fact that there are a  
myriad distinct scenes which can cause you to feel you have perceived the  
same object—for example, your grandmother, who may be smiling or  
frowning, wearing a hat or not, in a bright garden or a dark train station,  
seen from near or far, from side or front, and so on. All these scenes  
produce extremely different signatures on the visual cortex; yet all of them  
could prompt you to say "Hello, Granny." So a funneling process must take  
place at some point after the reception of the visual signature and before  
the words are uttered. One could claim that this funneling is not part of the  
perception of Granny, but just part of verbalization. But it seems quite  
unnatural to partition the process that way, for you could internally use the  
information that it is Granny without verbalizing it. It would be very  
unwieldy to handle all of the information in the entire visual cortex, when  
so much of it could be thrown away, since you don't care about where  
shadows fall or how many buttons there are on her blouse, etc.
 
Another difficulty with a non-funneling theory is to explain how there  
can be different interpretations for a single signature—for example, the  
Escher picture Convex and Concave (Fig. 23). Just as it seems obvious to us  
that we do not merely perceive dots on a television screen, but chunks,  
likewise it seems ridiculous to postulate that perception has taken place  
when a giant dot-like "signature" has been created on the visual cortex.  
There must be some funneling, whose end result is to trigger some specific  
modules of neurons, each of which is associated with the concepts—the  
chunks—in the scene.
 
Modules Which Mediate Thought Processes

Thus we are led to the conclusion that for each concept there is a fairly  
well-defined module which can be triggered—a module that consists of a  
small group of neurons—a "neural complex" of the type suggested earlier.  
A problem with this theory—at least if it is taken naively—is that it would  
suggest that one should be able to locate such modules somewhere within  
the brain. This has not yet been done, and some evidence, such as the  
experiments by Lashley, points against localization. However, it is still too  
early to tell. There may be many copies of each module spread around, or  
modules may overlap physically; both of these effects would tend to  
obscure any division of neurons into "packets". Perhaps the complexes are  
like very thin pancakes packed in layers which occasionally pass through  
each other; perhaps they are like long snakes which curl around each  
other, here and there flattening out, like cobras' heads; perhaps they are  
like spiderwebs; or perhaps they are circuits in which signals travel round  
and round in shapes stranger than the dash of a gnat-hungry swallow.  
There is no telling. It is even possible that these modules are software,  
rather than hardware, phenomena—but this is something which we will  
discuss later.
 
There are many questions that come to mind concerning these  
hypothesized neural complexes. For instance:  
Do they extend into the lower regions of the brain, such as the  
midbrain, the hypothalamus, etc.?  
Can a single neuron belong to more than one such complex?  
To how many such complexes can a single neuron belong?  
By how many neurons can such complexes overlap?  
Are these complexes pretty much the same for everybody?  
Are corresponding ones found in corresponding places in  
different people's brains?  
Do they overlap in the same way in everybody's brain?
 
Philosophically, the most important question of all is this: What would  
the existence of modules—for instance, a grandmother module—tell us?  
Would this give us any insight into the phenomenon of our own  
consciousness? Or would it still leave us as much in the dark about what  
consciousness is, as does knowledge that a brain is built out of neurons and glia? As  
you might guess from reading the Ant Fugue, my feeling is that it would go  
a long way towards giving us an understanding of the phenomenon of  
consciousness. The crucial step that needs to be taken is from a low-  
level—neuron-by-neuron—description of the state of a brain, to a high-  
level—module-by-module—description of the same state of the same brain.  
Or, to revert to the suggestive terminology of the Ant Fugue, we want to  
shift the description of the brain state from the signal level to the symbol  
level.
 
Active Symbols
 
Let us from now on refer to these hypothetical neural complexes, neural  
modules, neural packets, neural networks, multineuron units—call them  
what you will, whether they come in the form of pancakes, garden rakes,  
rattlesnakes, snowflakes, or even ripples on lakes—as symbols. A description  
of a brain state in terms of symbols was alluded to in the Dialogue. What  
would such a description be like? What kinds of concepts is it reasonable to  
think actually might be "symbolized"? What kinds of interrelations would  
symbols have? And what insights would this whole picture provide into  
consciousness?
 
 
The first thing to emphasize is that symbols can be either dormant, or  
awake (activated). An active symbol is one which has been triggered—that is,  
one in which a threshold number of neurons have been caused to fire by  
stimuli coming from outside. Since a symbol can be triggered in many  
different ways, it can act in many different ways when awakened. This  
suggests that we should think of a symbol not as a fixed entity, but as a  
variable entity. Therefore it would not suffice to describe a brain state by  
saying "Symbols A, B, . . ., N are all active"; rather, we would have to supply
in addition a set of parameters for each active symbol, characterizing some  
aspects of the symbol's internal workings. It is an interesting question  
whether in each symbol there are certain core neurons, which invariably  
fire when the symbol is activated. If such a core set of neurons exists, we  
might refer to it as the "invariant core" of the symbol. It is tempting to  
assume that each time you think of, say, a waterfall, some fixed neural  
process is repeated, no doubt embellished in different ways
depending on the context, but reliably occurring. However, it is not clear that this  
must be so.
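
To make this notion of a symbol as a variable entity a little more tangible, here is a toy sketch (entirely illustrative; the class, the threshold, and the "invariant core" set are my assumptions, not claims about real neurons). A symbol counts as active only when enough of its neurons fire and its hypothetical core is among them:

```python
# Toy model of a symbol as a variable entity: activation requires a
# threshold number of neurons to fire, and each activation carries
# parameters describing its mode.

class Symbol:
    def __init__(self, name, neurons, threshold, core):
        self.name = name
        self.neurons = set(neurons)   # all neurons in the complex
        self.core = set(core)         # hypothetical "invariant core"
        self.threshold = threshold
        self.firing = set()           # neurons currently firing
        self.params = {}              # parameters of this activation

    def stimulate(self, fired_neurons, **params):
        self.firing = self.neurons & set(fired_neurons)
        self.params = params

    @property
    def active(self):
        # active = threshold reached AND the invariant core is firing
        return (len(self.firing) >= self.threshold
                and self.core <= self.firing)

waterfall = Symbol("waterfall", neurons=range(10), threshold=4, core={0, 1})
waterfall.stimulate({0, 1, 5, 7}, context="Niagara")
print(waterfall.active)   # True: four neurons fired, core included
```

The point of the sketch is only that "Symbol A is active" is an incomplete description; the parameters of the activation matter too.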
 
 
Now what does a symbol do, when awakened? A low-level description  
would say, "Many of its neurons fire." But this no longer interests us. The  
high-level description should eliminate all reference to neurons, and  
concentrate exclusively on symbols. So a high-level description of what makes a  
symbol active, as distinguished from dormant, would be, "It sends out  
messages, or signals, whose purpose is to try to awaken, or trigger, other  
symbols." Of course these messages would be carried as streams of nerve  
impulses, by neurons—but to the extent that we can avoid such  
phraseology, we should, for it represents a low-level way of looking at things, and we  
hope that we can get along on purely a high level. In other words, we hope  
that thought processes can be thought of as being sealed off from neural  
events in the same way that the behavior of a clock is sealed off from the  
laws of quantum mechanics, or the biology of cells is sealed off from the  
laws of quarks.
 
 
But what is the advantage of this high-level picture? Why is it better to  
say, "Symbols A and B triggered symbol C" than to say, "Neurons 183  
through 612 excited neuron 75 and caused it to fire"? This question was  
answered in the Ant Fugue: It is better because symbols symbolize things, and  
neurons don't. Symbols are the hardware realizations of concepts. Whereas  
a group of neurons triggering another neuron corresponds to no outer  
event, the triggering of some symbol by other symbols bears a relation to  
events in the real world—or in an imaginary world. Symbols are related to  
each other by the messages which they can send back and forth, in such a  
way that their triggering patterns are very much like the large-scale events  
which do happen in our world, or could happen in a world similar to ours.  
In essence, meaning arises here for the same reason as it did in the  
pq-system—isomorphism; only here, the isomorphism is infinitely more  
complex, subtle, delicate, versatile, and intensional.
 
 
Incidentally, the requirement that symbols should be able to pass  
sophisticated messages to and fro is probably sufficient to exclude neurons  
themselves from playing the role of symbols. Since a neuron has only a  
single way of sending information out of itself, and has no way of selectively  
directing a signal now in one direction, now in another, it simply does not  
have the kind of selective triggering power which a symbol must have to act  
like an object in the real world. In his book The Insect Societies, E. O. Wilson  
makes a similar point about how messages propagate around inside ant  
colonies:  
[Mass communication] is defined as the transfer, among groups,  
of information that a single individual could not pass to another.3  
It is not such a bad image, the brain as an ant colony!
 
 
The next question—and an extremely important one it is, too—  
concerns the nature and "size" of the concepts which are represented in the  
brain by single symbols. About the nature of symbols there are questions  
like this: Would there be a symbol for the general notion of waterfalls, or  
would there be different symbols for various specific waterfalls? Or would  
both of these alternatives be realized? About the "size" of symbols, there are  
questions like this: Would there be a symbol for an entire story? Or for a  
melody? Or a joke? Or is it more likely that there would only be symbols for
concepts roughly the size of words, and that larger ideas, such as phrases or  
sentences, would be represented by concurrent or sequential activation of  
various symbols?
 
 
Let us consider the issue of the size of concepts represented by  
symbols. Most thoughts expressed in sentences are made up out of basic,  
quasi-atomic components which we do not usually analyze further. These  
are of word size, roughly—sometimes a little longer, sometimes a little  
shorter. For instance, the noun "waterfall", the proper noun "Niagara  
Falls", the past-tense suffix "-ed", the verb "to catch up with", and longer  
idiomatic phrases are all close to atomic. These are typical elementary  
brush strokes which we use in painting portraits of more complex concepts,  
such as the plot of a movie, the flavor of a city, the nature of consciousness,  
etc. Such complex ideas are not single brush strokes. It seems reasonable to  
think that the brush strokes of language are also brush strokes of thought,  
and therefore that symbols represent concepts of about this size. Thus a  
symbol would be roughly something for which you know a word or stock  
phrase, or with which you associate a proper name. And the representation  
in the brain of a more complex idea, such as a problem in a love affair,  
would be a very complicated sequence of activations of various symbols by  
other symbols.
 
Classes and Instances
 
There is a general distinction concerning thinking: that between categories  
and individuals, or classes and instances. (Two other terms sometimes used  
are "types" and "tokens".) It might seem at first sight that a given symbol  
would inherently be either a symbol for a class or a symbol for an  
instance—but that is an oversimplification. Actually, most symbols may play  
either role, depending on the context of their activation. For example, look  
at the list below:  
(1) a publication  
(2) a newspaper  
(3) The San Francisco Chronicle  
(4) the May 18 edition of the Chronicle  
(5) my copy of the May 18 edition of the Chronicle  
(6) my copy of the May 18 edition of the Chronicle as  
it was when I first picked it up (as contrasted with  
my copy as it was a few days later: in my fireplace,  
burning)  
Here, lines 2 to 5 all play both roles. Thus, line 4 is an instance of the
general class of line 3, and line 5 is an instance of line 4. Line 6 is a special  
kind of instance of a class: a manifestation. The successive stages of an object  
during its life history are its manifestations. It is interesting to wonder if the  
cows on a farm perceive the invariant individual underneath all the  
manifestations of the jolly farmer who feeds them hay.
 
The Prototype Principle
 
The list above seems to be a hierarchy of generality—the top being a very  
broad conceptual category, the bottom some very humble particular thing  
located in space and time. However, the idea that a "class" must always be  
enormously broad and abstract is far too limited. The reason is that our  
thought makes use of an ingenious principle, which might be called the  
prototype principle:  
The most specific event can serve as a general example  
of a class of events.  
Everyone knows that specific events have a vividness which imprints them  
so strongly on the memory that they can later be used as models for other  
events which are like them in some way. Thus in each specific event, there is  
the germ of a whole class of similar events. This idea that there is generality  
in the specific is of far-reaching importance.
 
 
Now it is natural to ask: Do the symbols in the brain represent classes,  
or instances? Are there certain symbols which represent only classes, while  
other symbols represent only instances? Or can a single symbol serve duty  
either as a class symbol or instance symbol, depending which parts of it are  
activated? The latter theory seems appealing; one might think that a "light"  
activation of a symbol might represent a class, and that a deeper, or more  
complex, activation would contain more detailed internal neural firing  
patterns, and hence would represent an instance. But on second thought,  
this is crazy: it would imply, for example, that by activating the symbol for  
"publication" in a sufficiently complex way, you would get the very complex  
symbol which represents a specific newspaper burning in my fireplace. And  
every other possible manifestation of every other piece of printed matter  
would be represented internally by some manner of activating the single  
symbol for "publication". That seems much too heavy a burden to place on  
the single symbol "publication". One must conclude, therefore, that  
instance symbols can exist side by side with class symbols, and are not just  
modes of activation of the latter.
 
The Splitting-off of Instances from Classes
 
On the other hand, instance symbols often inherit many of their properties
from the classes to which those instances belong. If I tell you I went to see a  
movie, you will begin "minting" a fresh new instance symbol for that  
particular movie; but in the absence of more information, the new instance  
symbol will have to lean rather heavily on your pre-existing class symbol for  
"movie". Unconsciously, you will rely on a host of presuppositions about  
that movie—for example, that it lasted between one and three hours, that it  
was shown in a local theater, that it told a story about some people, and so  
on. These are built into the class symbol as expected links to other symbols  
(i.e., potential triggering relations), and are called default options. In any  
freshly minted instance symbol, the default options can easily be
overridden, but unless this is explicitly done, they will remain in the instance  
symbol, inherited from its class symbol. Until they are overridden, they  
provide some preliminary basis for you to think about the new instance—  
for example, the movie I went to see—by using the reasonable guesses  
which are supplied by the "stereotype", or class symbol.
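
This inheritance of default options can be sketched very simply (a toy illustration under my own assumptions; the particular defaults listed are just the ones mentioned above). A fresh instance symbol starts as a copy of its class symbol's expected links, any of which explicit information can later override:

```python
# "Default options": a freshly minted instance symbol leans on its
# class symbol's stereotype until specific facts override the defaults.

class_symbol_movie = {
    "duration_hours": "between one and three",   # reasonable guesses
    "venue": "local theater",                    # supplied by the
    "content": "a story about some people",      # stereotype
}

def mint_instance(class_symbol, **overrides):
    """Create an instance symbol inheriting from its class symbol."""
    instance = dict(class_symbol)   # inherit all default options
    instance.update(overrides)      # explicit information overrides them
    return instance

# With no further information, the instance just parrots its class;
# later facts override the defaults one by one:
movie_i_saw = mint_instance(class_symbol_movie, venue="drive-in")
print(movie_i_saw["venue"])            # overridden
print(movie_i_saw["duration_hours"])   # still the inherited default
```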
 
 
A fresh and simple instance is like a child without its own ideas or  
experiences—it relies entirely on its parents' experiences and opinions and  
just parrots them. But gradually, as it interacts more and more with the rest  
of the world, the child acquires its own idiosyncratic experiences and  
inevitably begins to split away from the parents. Eventually, the child  
becomes a full-fledged adult. In the same way, a fresh instance can split off  
from its parent class over a period of time, and become a class, or  
prototype, in its own right.
 
 
For a graphic illustration of such a splitting-off process, suppose that  
some Saturday afternoon you turn on your car radio, and happen to tune  
in on a football game between two "random" teams. At first you do not  
know the names of the players on either team. All you register, when the  
announcer says, "Palindromi made the stop on the twenty-seven yard line,  
and that brings up fourth down and six to go," is that some player stopped  
some other player. Thus it is a case of activation of the class symbol  
"football player", with some sort of coordinated activation of the symbol for  
tackling. But then as Palindromi figures in a few more key plays, you begin  
building up a fresh instance symbol for him in particular, using his name,  
perhaps, as a focal point. This symbol is dependent, like a child, on the class  
symbol for "football player": most of your image of Palindromi is supplied  
by your stereotype of a football player as contained in the "football player"  
symbol. But gradually, as more information comes to you, the "Palindromi"  
symbol becomes more autonomous, and relies less and less on concurrent  
activation of its parent class symbol. This may happen in a few minutes, as  
Palindromi makes a few good plays and stands out. His teammates may still  
all be represented by activations of the class symbol, however. Eventually,  
perhaps after a few days, when you have read some articles in the sports  
section of your paper, the umbilical cord is broken, and Palindromi can  
stand on his own two feet. Now you know such things as his home town and  
his major in college; you recognize his face; and so on. At this point,  
Palindromi is no longer conceived of merely as a football player, but as a  
human being who happens also to be a football player. "Palindromi" is an  
instance symbol which can become active while its parent class symbol  
(football player) remains dormant.
 
 
Once, the Palindromi symbol was a satellite orbiting around its mother  
symbol, like an artificial satellite circling the Earth, which is so much bigger  
and more massive. Then there came an intermediate stage, where one  
symbol was more important than the other, but they could be seen as  
orbiting around each other—something like the Earth and the Moon.  
Finally, the new symbol becomes quite autonomous; now it might easily  
serve as a class symbol around which could start rotating new satellites—  
symbols for other people who are less familiar but who have something in
common with Palindromi, and for whom he can serve as a temporary  
stereotype, until you acquire more information, enabling the new symbols  
also to become autonomous.
 
The Difficulty of Disentangling Symbols from Each Other
 
These stages of growth and eventual detachment of an instance from a  
class will be distinguishable from each other by the way in which the  
symbols involved are linked. Sometimes it will no doubt be very difficult to  
tell just where one symbol leaves off and the other one begins. How "active"  
is the one symbol, compared to the other? If one can be activated  
independently of the other, then it would be quite sensible to call them
autonomous.
 
 
We have used an astronomy metaphor above, and it is interesting that  
the problem of the motion of planets is an extremely complex one—in fact  
the general problem of three gravitationally interacting bodies (such as the  
Earth, Moon, and Sun) is far from solved, even after several centuries of  
work. One situation in which it is possible to obtain good approximate  
solutions, however, is when one body is much more massive than the other  
two (here, the Sun); then it makes sense to consider that body as stationary,  
with the other two rotating about it; on top of this can finally be added the  
interaction between the two satellites. But this approximation depends on  
breaking up the system into the Sun, and a "cluster": the Earth-Moon  
system. This is an approximation, but it enables the system to be  
understood quite deeply. So to what extent is this cluster a part of reality, and to  
what extent is it a mental fabrication, a human imposition of structure on  
the universe? This problem of the "reality" of boundaries drawn between  
what are perceived to be autonomous or semi-autonomous clusters will  
create endless trouble when we relate it to symbols in the brain.
 
 
One greatly puzzling question is the simple issue of plurals. How do we  
visualize, say, three dogs in a teacup? Or several people in an elevator? Do  
we begin with the class symbol for "dog" and then rub three "copies" off of  
it? That is, do we manufacture three fresh instance symbols using the class  
symbol "dog" as template? Or do we jointly activate the symbols "three" and  
"dog"? By adding more or less detail to the scene being imagined, either  
theory becomes hard to maintain. For instance, we certainly do not have a  
separate instance symbol for each nose, mustache, grain of salt, etc., that we  
have ever seen. We let class symbols take care of such numerous items, and  
when we pass people on the street who have mustaches, we somehow just  
activate the "mustache" class symbol, without minting fresh instance  
symbols, unless we scrutinize them carefully.
 
 
On the other hand, once we begin to distinguish individuals, we cannot  
rely on a single class symbol (e.g., "person") to timeshare itself among all  
the different people. Clearly there must come into existence separate  
instance symbols for individual people. It would be ridiculous to imagine  
that this feat could be accomplished by "juggling"—that is, by the single
class symbol flitting back and forth between several different modes of  
activation (one for each person).
 
 
Between the extremes, there must be room for many sorts of  
intermediate cases. There may be a whole hierarchy of ways of creating the  
class-instance distinction in the brain, giving rise to symbols—and symbol-  
organizations—of varying degrees of specificity. The following different  
kinds of individual and joint activation of symbols might be responsible for  
mental images of various degrees of specificity:  
(1) various different modes or depths of activation of a single  
class symbol;  
(2) simultaneous activation of several class symbols in some  
coordinated manner;  
(3) activation of a single instance symbol;  
(4) activation of a single instance symbol in conjunction with  
activation of several class symbols;  
(5) simultaneous activation of several instance symbols and  
several class symbols in some coordinated manner.
 
 
This brings us right back to the question: "When is a symbol a  
distinguishable subsystem of the brain?" For instance, consider the second  
example—simultaneous activation of several class symbols in some  
coordinated manner. This could easily be what happens when "piano sonata" is  
the concept under consideration (the symbols for "piano" and "sonata"  
being at least two of the activated symbols). But if this pair of symbols gets  
activated in conjunction often enough, it is reasonable to assume that the  
link between them will become strong enough that they will act as a unit,  
when activated together in the proper way. So two or more symbols can act  
as one, under the proper conditions, which means that the problem of  
enumerating the number of symbols in the brain is trickier than one might  
guess.
 
 
Sometimes conditions can arise where two previously unlinked  
symbols get activated simultaneously and in a coordinated fashion. They may  
fit together so well that it seems like an inevitable union, and a single new  
symbol is formed by the tight interaction of the two old symbols. If this  
happens, would it be fair to say that the new symbol "always had been there  
but never had been activated"—or should one say that it has been  
"created"?
 
 
In case this sounds too abstract, let us take a concrete example: the  
Dialogue Crab Canon. In the invention of this Dialogue, two existing  
symbols—that for "musical crab canon", and that for "verbal dialogue"—  
had to be activated simultaneously and in some way forced to interact.  
Once this was done, the rest was quite inevitable: a new symbol—a class  
symbol—was born from the interaction of these two, and from then on it  
was able to be activated on its own. Now had it always been a dormant  
symbol in my brain? If so, then it must also have been a dormant symbol in  
the brain of every human who ever had its component symbols, even if it
never was awakened in them. This would mean that to enumerate the  
symbols in anyone's brain, one would have to count all dormant symbols—all  
possible combinations and permutations of all types of activations of all  
known symbols. This would even include those fantastic creatures of  
software that one's brain invents when one is asleep—the strange mixtures  
of ideas which wake up when their host goes to sleep . . . The existence of  
these "potential symbols" shows that it is really a huge oversimplification to  
imagine that the brain is a well-defined collection of symbols in well-defined  
states of activation. It is much harder than that to pin down a brain state on  
the symbol level.
 
Symbols—Software or Hardware?
 
With the enormous and ever-growing repertoire of symbols that exist in  
each brain, you might wonder whether there eventually comes a point  
when the brain is saturated—when there is just no more room for a new  
symbol. This would come about, presumably, if symbols never overlapped  
each other—if a given neuron never served a double function, so that  
symbols would be like people getting into an elevator. "Warning: This  
brain has a maximum capacity of 350,275 symbols!"
 
 
This is not a necessary feature of the symbol model of brain function,  
however. In fact, overlapping and completely tangled symbols are probably  
the rule, so that each neuron, far from being a member of a unique symbol,  
is probably a functioning part of hundreds of symbols. This gets a little  
disturbing, because if it is true, then might it not just as easily be the case  
that each neuron is part of every single symbol? If that were so, then there  
would be no localizability whatsoever of symbols—every symbol would be  
identified with the whole of the brain. This would account for results like  
Lashley's cortex removal in rats—but it would also mean abandonment of
our original idea of breaking the brain up into physically distinct
subsystems. Our earlier characterization of symbols as "hardware realizations of  
concepts" could at best be a great oversimplification. In fact, if every symbol  
were made up of the same component neurons as every other symbol, then  
what sense would it make to speak of distinct symbols at all? What would be  
the signature of a given symbol's activation—that is, how could the  
activation of symbol A be distinguished from the activation of symbol B?  
Wouldn't our whole theory go down the drain? And even if there is not a  
total overlap of symbols, is our theory not more and more difficult to  
maintain, the more that symbols do overlap? (One possible way of  
portraying overlapping symbols is shown in Figure 68.)
 
 
There is a way to keep a theory based on symbols even if physically,  
they overlap considerably or totally. Consider the surface of a pond, which  
can support many different types of waves or ripples. The hardware—  
namely the water itself—is the same in all cases, but it possesses different  
possible modes of excitation. Such software excitations of the same
hardware can all be distinguished from each other. By this analogy, I do
not mean to go so far as to suggest that all the different symbols are just  
different kinds of "waves" propagating through a uniform neural medium  
which admits of no meaningful division into physically distinct symbols.  
But it may be that in order to distinguish one symbol's activation from that  
of another symbol, a process must be carried out which involves not only  
locating the neurons which are firing, but also identifying very precise  
details of the timing of the firing of those neurons. That is, which neuron  
preceded which other neuron, and by how much? How many times a  
second was a particular neuron firing? Thus perhaps several symbols can  
coexist in the same set of neurons by having different characteristic neural  
firing patterns. The difference between a theory having physically distinct  
symbols, and a theory having overlapping symbols which are distinguished  
from each other by modes of excitation, is that the former gives hardware  
realizations of concepts, while the latter gives partly hardware, partly  
software realizations of concepts.
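
The idea that timing, rather than location, could serve as a symbol's signature can be caricatured in a few lines (a toy illustration of the distinction only; real neural firing patterns are vastly richer than an ordering of three named neurons):

```python
# Two symbols sharing the very same neurons, distinguished not by WHICH
# neurons fire but by the temporal order in which they fire.

NEURONS = {"n1", "n2", "n3"}   # one shared piece of "hardware"

# Each symbol = a characteristic firing pattern over the same neurons.
SIGNATURES = {
    "symbol_A": ("n1", "n2", "n3"),
    "symbol_B": ("n3", "n1", "n2"),
}

def identify(observed_firing_order):
    """Classify an activation by its timing signature, not its location."""
    for symbol, signature in SIGNATURES.items():
        if tuple(observed_firing_order) == signature:
            return symbol
    return None   # no known symbol has this signature

print(identify(["n3", "n1", "n2"]))   # symbol_B
```

Here locating the firing neurons alone tells you nothing: both symbols would light up the same set.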
 
Liftability of Intelligence
 
Thus we are left with two basic problems in the unraveling of thought  
processes, as they take place in the brain. One is to explain how the  
low-level traffic of neuron firings gives rise to the high-level traffic of  
symbol activations. The other is to explain the high-level traffic of symbol  
activation in its own terms—to make a theory which does not talk about the  
low-level neural events. If this latter is possible—and it is a key assumption  
at the basis of all present research into Artificial Intelligence—then  
intelligence can be realized in other types of hardware than brains. Then  
intelligence will have been shown to be a property that can be "lifted" right out of  
the hardware in which it resides—or in other words, intelligence will be a  
software property. This will mean that the phenomena of consciousness and  
intelligence are indeed high-level in the same sense as most other complex
phenomena of nature: they have their own high-level laws which depend
on, yet are "liftable" out of, the lower levels. If, on the other hand, there is  
absolutely no way to realize symbol-triggering patterns without having all  
the hardware of neurons (or simulated neurons), this will imply that  
intelligence is a brain-bound phenomenon, and much more difficult to unravel  
than one which owes its existence to a hierarchy of laws on several different  
levels.
 
 
Here we come back to the mysterious collective behavior of ant  
colonies, which can build huge and intricate nests, despite the fact that the  
roughly 100,000 neurons of an ant brain almost certainly do not carry any  
information about nest structure. How, then, does the nest get created?  
Where does the information reside? In particular, ponder where the  
information describing an arch such as is shown in Figure 69 can be found.  
Somehow, it must be spread about in the colony, in the caste distribution,  
the age distribution—and probably largely in the physical properties of the  
ant-body itself. That is, the interaction between ants is determined just as  
much by their six-leggedness and their size and so on, as by the information  
stored in their brain. Could there be an Artificial Ant Colony?
 
Can One Symbol Be Isolated?
 
Is it possible that one single symbol could be awakened in isolation from all  
others? Probably not. Just as objects in the world always exist in a context of  
other objects, so symbols are always connected to a constellation of other  
symbols. This does not necessarily mean that symbols can never be  
disentangled from each other. To make a rather simple analogy, males and  
females always arise in a species together: their roles are completely  
intertwined, and yet this does not mean that a male cannot be distinguished  
from a female. Each is reflected in the other, as the beads in Indra's net  
reflect each other. The recursive intertwining of the functions F(n) and  
M(n) in Chapter V does not prevent each function from having its own  
characteristics. The intertwining of F and M could be mirrored in a pair of  
RTN's which call each other. From this we can jump to a whole network of  
ATN's intertwined with each other—a heterarchy of interacting recursive  
procedures. Here, the meshing is so inherent that no one ATN could be  
activated in isolation; yet its activation may be completely distinctive, not  
confusable with that of any other of the ATN's. It is not such a bad image,  
the brain as an ATN-colony!
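
The intertwined functions F(n) and M(n) of Chapter V can be transcribed directly; each function is defined in terms of the other, yet each produces its own characteristic sequence:

```python
from functools import lru_cache

# The mutually recursive pair from Chapter V:
#   F(0) = 1,  M(0) = 0
#   F(n) = n - M(F(n-1))
#   M(n) = n - F(M(n-1))
# Neither function can be computed without the other, yet their
# outputs remain perfectly distinguishable.

@lru_cache(maxsize=None)
def F(n):
    return 1 if n == 0 else n - M(F(n - 1))

@lru_cache(maxsize=None)
def M(n):
    return 0 if n == 0 else n - F(M(n - 1))

print([F(n) for n in range(7)])   # [1, 1, 2, 2, 3, 3, 4]
print([M(n) for n in range(7)])   # [0, 0, 1, 2, 2, 3, 4]
```

The meshing is total, but the identities stay distinct, which is the point being made about intertwined symbols.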
 
 
Likewise, symbols, with all their multiple links to each other, are  
meshed together and yet ought to be able to be teased apart. This might  
involve identifying a neural network, a network plus a mode of  
excitation—or possibly something of a completely different kind. In any  
case, if symbols are part of reality, presumably there exists a natural way to  
chart them out in a real brain. However, if some symbols were finally  
identified in a brain, this would not mean that any one of them could be  
awakened in isolation.
 
 
The fact that a symbol cannot be awakened in isolation does not  
diminish the separate identity of the symbol; in fact, quite to the contrary: a  
symbol's identity lies precisely in its ways of being connected (via potential  
triggering links) to other symbols. The network by which symbols can  
potentially trigger each other constitutes the brain's working model of the  
real universe, as well as of the alternate universes which it considers (and  
which are every bit as important for the individual's survival in the real  
world as the real world is).
 
The Symbols of Insects
 
Our facility for making instances out of classes and classes out of instances  
lies at the basis of our intelligence, and it is one of the great differences  
between human thought and the thought processes of other animals. Not  
that I have ever belonged to another species and experienced at first hand  
how it feels to think their way—but from the outside it is apparent that no  
other species forms general concepts as we do, or imagines hypothetical  
worlds—variants on the world as it is, which aid in figuring out which  
future pathway to choose. For instance, consider the celebrated "language  
of the bees"—information-laden dances which are performed by worker  
bees returning to the hive, to inform other bees of the location of nectar.  
While there may be in each bee a set of rudimentary symbols which are  
activated by such a dance, there is no reason to believe that a bee has an  
expandable vocabulary of symbols. Bees and other insects do not seem to  
have the power to generalize—that is, to develop new class symbols from  
instances which we would perceive as nearly identical.
 
 
completely hard-wired behavior. Now in the wasp brain,  
there may be rudimentary symbols, capable of triggering each other; but  
there is nothing like the human capacity to see several instances as instances  
of an as-yet-unformed class, and then to make the class symbol; nor is there  
anything like the human ability to wonder, "What if I did this—what would  
ensue in that hypothetical world?" This type of thought process requires an  
ability to manufacture instances and to manipulate them as if they were  
symbols standing for objects in a real situation, although that situation may  
not be the case, and may never be the case.
 
404
 
Class Symbols and Imaginary Worlds
 
404
 
Let us reconsider the April Fool's joke about the borrowed car, and the  
images conjured up in your mind during the telephone call. To begin with,  
you need to activate symbols which represent a road, a car, a person in a  
car. Now the concept "road" is a very general one, with perhaps several  
stock samples which you can unconsciously pull out of dormant memory  
when the occasion arises. "Road" is a class, rather than an instance. As you  
listen to the tale, you quickly activate symbols which are instances with  
gradually increasing specificity. For instance, when you learn that the road  
was wet, this conjures up a more specific image, though you realize that it is  
most likely quite different from the actual road where the incident took  
place. But that is not important; what matters is whether your symbol is  
sufficiently well suited for the story—that is, whether the symbols which it  
can trigger are the right kind.
 
404
 
As the story progresses, you fill in more aspects of this road: there is a  
high bank against which a car could smash. Now does this mean that you  
are activating the symbol for "bank", or does it mean that you are setting  
some parameters in your symbol for "road"? Undoubtedly both. That is,  
the network of neurons which represents "road" has many different ways  
of firing, and you are selecting which subnetwork actually shall fire. At the  
same time, you are activating the symbol for "bank", and this is probably  
instrumental in the process of selecting the parameters for "road", in that  
its neurons may send signals to some of those in "road"—and vice versa. (In  
case this seems a little confusing, it is because I am somewhat straddling  
levels of description—I am trying to set up an image of the symbols, as well  
as of their component neurons.)
 
404
 
No less important than the nouns are the verbs, prepositions, etc.  
They, too, activate symbols, which send messages back and forth to each  
other. There are characteristic differences between the kinds of triggering  
patterns of symbols for verbs and symbols for nouns, of course, which  
means that they may be physically somewhat differently organized. For  
instance, nouns might have fairly localized symbols, while verbs and  
prepositions might have symbols with many "tentacles" reaching all around the  
cortex; or any number of other possibilities.
 
404
 
After the story is all over, you learn it was all untrue. The power of  
"rubbing off" instances from classes, in the way that one makes rubbings  
from brasses in churches, has enabled you to represent the situation, and  
has freed you from the need to remain faithful to the real world. The fact  
that symbols can act as templates for other symbols gives you some mental  
independence of reality: you can create artificial universes, in which there  
can happen nonreal events with any amount of detail that you care to  
imbue them with. But the class symbols themselves, from which all of this  
richness springs, are deeply grounded in reality.
 
405
 
Usually symbols play isomorphic roles to events which seem like they  
could happen, although sometimes symbols are activated which represent  
situations which could not happen—for example, watches sizzling, tubas  
laying eggs, etc. The borderline between what could and what could not  
happen is an extremely fuzzy one. As we imagine a hypothetical event, we  
bring certain symbols into active states—and depending on how well they  
interact (which is presumably reflected in our comfort in continuing the  
train of thought), we say the event "could" or "could not" happen. Thus the  
terms "could" and "could not" are extremely subjective. Actually, there is a  
good deal of agreement among people about which events could or could  
not happen. This reflects the great amount of mental structure which we all  
share—but there is a borderline area where the subjective aspect of what  
kinds of hypothetical worlds we are willing to entertain is apparent. A  
careful study of the kinds of imaginary events that people consider could  
and could not happen would yield much insight into the triggering  
patterns of the symbols by which people think.
 
405
 
Intuitive Laws of Physics
 
405
 
When the story has been completely told, you have built up quite an  
elaborate mental model of a scene, and in this model all the objects obey  
physical law. This means that physical law itself must be implicitly present  
in the triggering patterns of the symbols. Of course, the phrase "physical  
law" here does not mean "the laws of physics as expounded by a physicist",  
but rather the intuitive, chunked laws which all of us have to have in our  
minds in order to survive.
 
405
 
A curious sidelight is that one can voluntarily manufacture mental  
sequences of events which violate physical law, if one so desires. For  
instance, if I but suggest that you imagine a scene with two cars  
approaching each other and then passing right through each other, you won't have  
any trouble doing so. The intuitive physical laws can be overridden by  
imaginary laws of physics; but how this overriding is done, how such  
sequences of images are manufactured—indeed what any one visual image  
is—all of these are deeply cloaked mysteries—inaccessible pieces of  
knowledge.
 
405
 
Needless to say, we have in our brains chunked laws not only of how  
inanimate objects act, but also of how plants, animals, people and societies  
act—in other words, chunked laws of biology, psychology, sociology, and so  
on. All of the internal representations of such entities involve the inevitable  
feature of chunked models: determinism is sacrificed for simplicity. Our  
representation of reality ends up being able only to predict probabilities of  
ending up in certain parts of abstract spaces of behavior—not to predict  
anything with the precision of physics.
 
406
 
Procedural and Declarative Knowledge
 
406
 
A distinction which is made in Artificial Intelligence is that between  
procedural and declarative types of knowledge. A piece of knowledge is said to  
be declarative if it is stored explicitly, so that not only the programmer but  
also the program can "read" it as if it were in an encyclopedia or an  
almanac. This usually means that it is encoded locally, not spread around.  
By contrast, procedural knowledge is not encoded as facts—only as  
programs. A programmer may be able to peer in and say, "I see that because of  
these procedures here, the program 'knows' how to write English  
sentences"—but the program itself may have no explicit awareness of how it  
writes those sentences. For instance, its vocabulary may include none of the  
words "English", "sentence", and "write" at all! Thus procedural  
knowledge is usually spread around in pieces, and you can't retrieve it, or "key"  
on it. It is a global consequence of how the program works, not a local  
detail. In other words, a piece of purely procedural knowledge is an  
epiphenomenon.
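
The contrast can be made concrete in a small sketch (the facts and helper names here are illustrative inventions, not drawn from any actual AI program): a piece of knowledge stored explicitly as data that the program itself can "read", next to knowledge that exists only as a side effect of how a procedure runs.

```python
# Declarative knowledge: stored explicitly, retrievable like an almanac entry.
facts = {"octopus": {"tentacles": 8}}

def lookup(entity, attribute):
    """Fetch an explicitly stored fact."""
    return facts[entity][attribute]

# Procedural knowledge: the "rule" for pluralizing English nouns is nowhere
# stated as a fact; it is a global consequence of how this code executes.
def pluralize(noun):
    if noun.endswith(("s", "x", "ch", "sh")):
        return noun + "es"
    return noun + "s"

print(lookup("octopus", "tentacles"))  # 8
print(pluralize("fox"))                # foxes
```

Note that the program can answer "how many tentacles?" by retrieval, but it cannot answer "what is your pluralization rule?" at all—that knowledge is spread through the procedure.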
 
406
 
In most people there coexists, along with a powerful procedural  
representation of the grammar of their native language, a weaker declarative  
representation of it. The two may easily be in conflict, so that a native  
speaker will often instruct a foreigner to say things he himself would never  
say, but which agree with the declarative "book learning" he acquired in  
school sometime. The intuitive or chunked laws of physics and other  
disciplines mentioned earlier fall mainly on the procedural side; the  
knowledge that an octopus has eight tentacles falls mainly on the declarative side.
 
406
 
In between the declarative and procedural extremes, there are all  
possible shades. Consider the recall of a melody. Is the melody stored in  
your brain, note by note? Could a surgeon extract a winding neural  
filament from your brain, then stretch it straight, and finally proceed to  
pinpoint along it the successively stored notes, almost as if it were a piece of  
magnetic tape? If so, then melodies are stored declaratively. Or is the recall  
of a melody mediated by the interaction of a large number of symbols,  
some of which represent tonal relationships, others of which represent  
emotional qualities, others of which represent rhythmic devices, and so on?  
If so, then melodies are stored procedurally. In reality, there is probably a  
mixture of these extremes in the way a melody is stored and recalled.
 
406
 
It is interesting that, in pulling a melody out of memory, most people  
do not discriminate as to key, so that they are as likely to sing "Happy  
Birthday" in the key of F-sharp as in the key of C. This indicates that tone  
relationships, rather than absolute tones, are stored. But there is no reason  
that tone relationships could not be stored quite declaratively. On the other  
hand, some melodies are very easy to memorize, whereas others are  
extremely elusive. If it were just a matter of storing successive notes, any  
melody could be stored as easily as any other. The fact that some melodies  
are catchy and others are not seems to indicate that the brain has a certain  
repertoire of familiar patterns which are activated as the melody is heard.  
So, to "play back" the melody, those patterns would have to be activated in  
the same order. This returns us to the concept of symbols triggering one  
another, rather than a simple linear sequence of declaratively stored notes  
or tone relationships.
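
The key-invariance of recall is easy to demonstrate: if what is stored is the succession of tone relationships, then the same melody begun on any note yields the same stored representation. A minimal sketch (the MIDI-style note numbers are my assumption, merely a convenient encoding):

```python
def intervals(notes):
    """Reduce a melody to its successive tone relationships (semitone steps)."""
    return [b - a for a, b in zip(notes, notes[1:])]

# Opening of "Happy Birthday" in C, and the same tune transposed to F-sharp:
in_c = [60, 60, 62, 60, 65, 64]
in_f_sharp = [n + 6 for n in in_c]

# The absolute pitches all differ, but the relationships are identical—
# which is why a singer can begin on whatever note comes to mind.
print(intervals(in_c) == intervals(in_f_sharp))  # True
```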
 
407
 
How does the brain know whether a piece of knowledge is stored  
declaratively? For instance, suppose you are asked, "What is the population  
of Chicago?" Somehow the number five million springs to mind, without  
your wondering, "Gee, how would I go about counting them all?" Now  
suppose I ask you, "How many chairs are there in your living room?" Here,  
the opposite happens—instead of trying to dredge the answer out of a  
mental almanac, you immediately either go to the room and count the  
chairs, or you manufacture the room in your head and count the chairs in  
the image of the room. The questions were of a single type—"how  
many?"—yet one of them caused a piece of declarative knowledge to be  
fetched, while the other one caused a procedural method of finding the  
answer to be invoked. This is one example where it is clear that you have  
knowledge about how you classify your own knowledge; and what is more,  
some of that metaknowledge may itself be stored procedurally, so that it is  
used without your even being aware of how it is done.
 
407
 
Visual Imagery
 
407
 
One of the most remarkable and difficult-to-describe qualities of  
consciousness is visual imagery. How do we create a visual image of our living  
room? Of a roaring mountain brook? Of an orange? Even more  
mysterious, how do we manufacture images unconsciously, images which guide  
our thoughts, giving them power and color and depth? From what store  
are they fetched? What magic allows us to mesh two or three images, hardly  
giving a thought as to how we should do it? Knowledge of how to do this is  
among the most procedural of all, for we have almost no insight into what  
mental imagery is.
 
407
 
It may be that imagery is based on our ability to suppress motor  
activity. By this, I mean the following. If you imagine an orange, there may  
occur in your cortex a set of commands to pick it up, to smell it, to inspect it,  
and so on. Clearly these commands cannot be carried out, because the  
orange is not there. But they can be sent along the usual channels towards  
the cerebellum or other suborgans of the brain, until, at some critical point,  
a "mental faucet" is closed, preventing them from actually being carried  
out. Depending on how far down the line this "faucet" is situated, the  
images may be more or less vivid and real-seeming. Anger can cause us to  
imagine quite vividly picking up some object and throwing it, or kicking  
something; yet we don't actually do so. On the other hand, we feel so "near"  
to actually doing so. Probably the faucet catches the nerve impulses "at the  
last moment".
 
408
 
Here is another way in which visualization points out the distinction  
between accessible and inaccessible knowledge. Consider how you  
visualized the scene of the car skidding on the mountain road. Undoubtedly  
you imagined the mountain as being much larger than the car. Now did  
this happen because sometime long ago you had occasion to note that "cars  
are not as big as mountains"; then you committed this statement to rote  
memory; and in imagining the story, you retrieved this fact, and made use  
of it in constructing your image? A most unlikely theory. Or did it happen  
instead as a consequence of some introspectively inaccessible interactions of  
the symbols which were activated in your brain? Obviously the latter seems  
far more likely. This knowledge that cars are smaller than mountains is not  
a piece of rote memorization, but a piece of knowledge which can be  
created by deduction. Therefore, most likely it is not stored in any single  
symbol in your brain, but rather it can be produced as a result of the  
activation, followed by the mutual interaction, of many symbols—for  
example, those for "compare", "size", "car", "mountain", and probably  
others. This means that the knowledge is stored not explicitly, but  
implicitly, in a spread-about manner, rather than as a local "packet of  
information". Such simple facts as relative sizes of objects have to be assembled,  
rather than merely retrieved. Therefore, even in the case of a verbally  
accessible piece of knowledge, there are complex inaccessible processes  
which mediate its coming to the state of being ready to be said.
 
408
 
We shall continue our exploration of the entities called "symbols" in  
different Chapters. In Chapters XVIII and XIX, on Artificial Intelligence,  
we shall discuss some possible ways of implementing active symbols in  
programs. And in the next Chapter, we shall discuss some of the insights that  
our symbol-based model of brain activity gives into the comparison of brains.
 
412
 
CHAPTER XII  
Minds and Thoughts
 
412
 
Can Minds Be Mapped onto Each Other?
 
412
 
Now that we have hypothesized the existence of very high-level active  
subsystems of the brain (symbols), we may return to the matter of a possible  
isomorphism, or partial isomorphism, between two brains. Instead of  
asking about an isomorphism on the neural level (which surely does not exist),  
or on the macroscopic suborgan level (which surely does exist but does not  
tell us very much), we ask about the possibility of an isomorphism between  
brains on the symbol level: a correspondence which not only maps symbols  
in one brain onto symbols in another brain, but also maps triggering  
patterns onto triggering patterns. This means that corresponding symbols  
in the two brains are linked in corresponding ways. This would be a true  
functional isomorphism—the same type of isomorphism as we spoke of  
when trying to characterize what it is that is invariant about all butterflies.
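
What such a functional isomorphism would demand can be sketched as a simple check: a correspondence between symbols that carries triggering links onto triggering links. The two toy "brains" below are invented purely for illustration:

```python
def is_functional_isomorphism(brain_a, brain_b, mapping):
    """True if the mapping sends each symbol's triggers in brain_a onto
    exactly the triggers of the corresponding symbol in brain_b."""
    for symbol, triggers in brain_a.items():
        mapped = {mapping[t] for t in triggers}
        if mapped != set(brain_b[mapping[symbol]]):
            return False
    return True

# Each "brain" lists, for every symbol, which symbols it can trigger.
brain_a = {"dog": ["bark"], "bark": []}
brain_b = {"chien": ["aboyer"], "aboyer": []}

print(is_functional_isomorphism(
    brain_a, brain_b, {"dog": "chien", "bark": "aboyer"}))  # True
```

The check is stringent by design: a single extra or missing triggering link in either brain destroys the correspondence, which is exactly why, as the text argues, no such exact isomorphism exists between real people.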
 
412
 
It is clear from the outset that such an isomorphism does not exist  
between any pair of human beings. If it did, they would be completely  
indistinguishable in their thoughts; but in order for that to be true, they  
would have to have completely indistinguishable memories, which would  
mean they would have to have led one and the same life. Even identical  
twins do not approach, in the remotest degree, this ideal.
 
412
 
How about a single individual? When you look back over things which  
you yourself wrote a few years ago, you think "How awful!" and smile with  
amusement at the person you once were. What is worse is when you do the  
same thing with something you wrote or said five minutes ago. When this  
happens, it shows that you do not fully understand the person you were  
moments ago. The isomorphism from your brain now to your brain then  
is imperfect. What, then, of the isomorphisms to other people, other  
species . . . ?
 
412
 
The opposite side of the coin is shown by the power of the  
communication that arises between the unlikeliest partners. Think of the barriers  
spanned when you read lines of poetry penned in jail by François Villon,  
the French poet of the 1400's. Another human being, in another era,  
captive in jail, speaking another language . . . How can you ever hope to  
have a sense of the connotations behind the facade of his words, translated  
into English? Yet a wealth of meaning comes through.
 
412
 
Thus, on the one hand, we can drop all hopes of finding exactly  
isomorphic software in humans, but on the other, it is clear that some  
people think more alike than others do. It would seem an obvious conclusion  
that there is some sort of partial software isomorphism connecting the  
brains of people whose style of thinking is similar—in particular, a  
correspondence of (1) the repertoire of symbols, and (2) the triggering patterns  
of symbols.
 
414
 
Comparing Different Semantic Networks
 
414
 
But what is a partial isomorphism? This is a most difficult question to  
answer. It is made even more difficult by the fact that no one has found an  
adequate way to represent the network of symbols and their triggering  
patterns. Sometimes a picture of a small part of such a network of symbols  
is drawn, where each symbol is represented as a node into which, and out of  
which, lead some arcs. The lines represent triggering relationships—in  
some sense. Such figures attempt to capture something of the intuitively  
sensible notion of "conceptual nearness". However, there are many  
different kinds of nearness, and different ones are relevant in different contexts.  
A tiny portion of my own "semantic network" is shown in Figure 70. The  
problem is that representing a complex interdependency of many symbols  
cannot be carried out very easily with just a few lines joining vertices.
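
A fragment of such a network can at least be written down as a labeled graph (the particular nodes and link labels below are my own illustrations, not those of Figure 70), with the label on each arc standing for one kind of conceptual nearness:

```python
# Each symbol is a node; each directed link carries a label naming
# the kind of triggering relationship it represents.
semantic_net = {
    "road": [("car", "setting-for"), ("bank", "bounded-by")],
    "car": [("road", "travels-on"), ("vehicle", "instance-of")],
    "bank": [("road", "adjoins")],
    "vehicle": [],
}

def neighbors(symbol, link_type=None):
    """Symbols a given symbol can trigger, optionally filtered by link kind."""
    return [s for s, kind in semantic_net[symbol]
            if link_type is None or kind == link_type]

print(neighbors("road"))                # ['car', 'bank']
print(neighbors("car", "instance-of"))  # ['vehicle']
```

Even this tiny sketch shows the limitation the text complains of: a few labeled arcs cannot express that the *manner* of one symbol's activation changes which of its neighbors it will trigger.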
 
414
 
Another problem with such a diagram is that it is not accurate to think  
of a symbol as simply "on" or "off". While this is true of neurons, it does  
not carry upwards, to collections of them. In this respect, symbols are quite  
a bit more complicated than neurons—as you might expect, since they are  
made up of many neurons. The messages that are exchanged between  
symbols are more complex than the mere fact, "I am now activated". That  
is more like the neuron-level messages. Each symbol can be activated in  
many different ways, and the type of activation will be influential in  
determining which other symbols it tries to activate. How these intertwining  
triggering relationships can be represented in a pictorial manner—indeed,  
whether they can be at all—is not clear.
 
414
 
But for the moment, suppose that issue had been solved. Suppose we  
now agree that there are certain drawings of nodes, connected by links (let  
us say they come in various colors, so that various types of conceptual  
nearness can be distinguished from each other), which capture precisely  
the way in which symbols trigger other symbols. Then under what  
conditions would we feel that two such drawings were isomorphic, or nearly  
isomorphic? Since we are dealing with a visual representation of the  
network of symbols, let us consider an analogous visual problem. How would  
you try to determine whether two spiderwebs had been spun by spiders  
belonging to the same species? Would you try to identify individual vertices  
which correspond exactly, thereby setting up an exact map of one web onto  
the other, vertex by vertex, fiber by fiber, perhaps even angle by angle?  
This would be a futile effort. Two webs are never exactly the same; yet  
there is still some sort of "style", "form", what-have-you, that infallibly  
brands a given species' web.
 
414
 
In any network-like structure, such as a spiderweb, one can look at  
local properties and global properties. Local properties require only a very  
nearsighted observer—for example an observer who can only see one  
vertex at a time; and global properties require only a sweeping vision,  
without attention to detail. Thus, the overall shape of a spiderweb is a  
global property, whereas the average number of lines meeting at a vertex is  
a local property. Suppose we agree that the most reasonable criterion for  
calling two spiderwebs "isomorphic" is that they should have been spun by  
spiders of the same species. Then it is interesting to ask which kind of  
observation—local or global—tends to be a more reliable guide in  
determining whether two spiderwebs are isomorphic. Without answering the  
question for spiderwebs, let us now return to the question of the  
closeness—or isomorphicness, if you will—of two symbol networks.
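
The local/global contrast is concrete enough to compute for any web-like structure. A sketch, assuming the web is given as an adjacency list and its vertices as coordinates (both invented here for illustration):

```python
def average_degree(graph):
    """A local property: observable one vertex at a time."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)

def bounding_extent(positions):
    """A global property: the overall span of the web, ignoring detail."""
    xs = [x for x, _ in positions.values()]
    ys = [y for _, y in positions.values()]
    return (max(xs) - min(xs), max(ys) - min(ys))

web = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}
spots = {"a": (0.0, 0.0), "b": (3.0, 1.0), "c": (1.0, 4.0)}
print(average_degree(web))     # 5/3, about 1.67 lines per vertex
print(bounding_extent(spots))  # (3.0, 4.0)
```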
 
415
 
Translations of "Jabberwocky"
 
415
 
Imagine native speakers of English, French, and German, all of whom have  
excellent command of their respective native languages, and all of whom  
enjoy wordplay in their own language. Would their symbol networks be  
similar on a local level, or on a global level? Or is it meaningful to ask such a  
question? The question becomes concrete when you look at the preceding  
translations of Lewis Carroll's famous "Jabberwocky".
 
415
 
I chose this example because it demonstrates, perhaps better than an  
example in ordinary prose, the problem of trying to find "the same node"  
in two different networks which are, on some level of analysis, extremely  
nonisomorphic. In ordinary language, the task of translation is more  
straightforward, since to each word or phrase in the original language,  
there can usually be found a corresponding word or phrase in the new  
language. By contrast, in a poem of this type, many "words" do not carry  
ordinary meaning, but act purely as exciters of nearby symbols. However,  
what is nearby in one language may be remote in another.
 
415
 
Thus, in the brain of a native speaker of English, "slithy" probably  
activates such symbols as "slimy", "slither", "slippery", "lithe", and "sly", to  
varying extents. Does "lubricilleux" do the corresponding thing in the  
brain of a Frenchman? What indeed would be "the corresponding thing"?  
Would it be to activate symbols which are the ordinary translations of those  
words? What if there is no word, real or fabricated, which will accomplish  
that? Or what if a word does exist, but is very intellectual-sounding and  
Latinate ("lubricilleux"), rather than earthy and Anglo-Saxon ("slithy")?  
Perhaps "huilasse" would be better than "lubricilleux"? Or does the Latin  
origin of the word "lubricilleux" not make itself felt to a speaker of French  
in the way that it would if it were an English word ("lubricilious", perhaps)?
 
415
 
An interesting feature of the translation into French is the  
transposition into the present tense. To keep it in the past would make some  
unnatural turns of phrase necessary, and the present tense has a much  
fresher flavor in French than the past. The translator sensed that this  
would be "more appropriate"—in some ill-defined yet compelling sense—  
and made the switch. Who can say whether remaining faithful to the  
English tense would have been better?
 
416
 
In the German version, the droll phrase "er an-zu-denken-fing"  
occurs; it does not correspond to any English original. It is a playful reversal  
of words, whose flavor vaguely resembles that of the English phrase "he  
out-to-ponder set", if I may hazard a reverse translation. Most likely this  
funny turnabout of words was inspired by the similar playful reversal in the  
English of one line earlier: "So rested he by the Tumtum tree". It  
corresponds, yet doesn't correspond.  
Incidentally, why did the Tumtum tree get changed into an "arbre  
Té-té" in French? Figure it out for yourself.
 
416
 
The word "manxome" in the original, whose "x" imbues it with many  
rich overtones, is weakly rendered in German by "manchsam", which  
back-translates into English as "maniful". The French "manscant" also lacks  
the manifold overtones of "manxome". There is no end to the interest of  
this kind of translation task.  
When confronted with such an example, one realizes that it is utterly  
impossible to make an exact translation. Yet even in this pathologically  
difficult case of translation, there seems to be some rough equivalence  
obtainable. Why is this so, if there really is no isomorphism between the  
brains of people who will read the different versions? The answer is that  
there is a kind of rough isomorphism, partly global, partly local, between  
the brains of all the readers of these three poems.
 
416
 
ASU's
 
416
 
An amusing geographical fantasy will give some intuition for this kind of  
quasi-isomorphism. (Incidentally, this fantasy is somewhat similar to a  
geographical analogy devised by M. Minsky in his article on "frames",  
which can be found in P. H. Winston's book The Psychology of Computer  
Vision.) Imagine that you are given a strange atlas of the USA, with all  
natural geological features premarked—such as rivers, mountains, lakes,  
and so on—but with nary a printed word. Rivers are shown as blue lines,  
mountains by color, and so on. Now you are told to convert it into a road  
atlas for a trip which you will soon make. You must neatly fill in the names  
of all states, their boundaries, time zones, then all counties, cities, towns, all  
freeways and highways and toll routes, all county roads, all state and  
national parks, campgrounds, scenic areas, dams, airports, and so on . . .  
All of this must be carried out down to the level that would appear in a  
detailed road atlas. And it must be manufactured out of your own head.  
You are not allowed access to any information which would help you for  
the duration of your task.
 
416
 
You are told that it will pay off, in ways that will become clear at a later  
date, to make your map as true as you can. Of course, you will begin by  
filling in large cities and major roads, etc., which you know. And when you  
have exhausted your factual knowledge of an area, it will be to your  
advantage to use your imagination to help you reproduce at least the flavor  
of that area, if not its true geography, by making up fake town names, fake  
populations, fake roads, fake parks, and so on. This arduous task will take  
months. To make things a little easier, you have a cartographer on hand to  
print everything in neatly. The end product will be your personal map of  
the "Alternative Structure of the Union"—your own personal "ASU".
 
417
 
Your personal ASU will be very much like the USA in the area where  
you grew up. Furthermore, wherever your travels have chanced to lead  
you, or wherever you have perused maps with interest, your ASU will have  
spots of striking agreement with the USA: a few small towns in North  
Dakota or Montana, perhaps, or the whole of metropolitan New York,  
might be quite faithfully reproduced in your ASU.
 
417
 
A Surprise Reversal
 
417
 
When your ASU is done, a surprise takes place. Magically, the country you  
have designed comes into being, and you are transported there. A friendly  
committee presents you with your favorite kind of automobile, and  
explains that, "As a reward for your designing efforts, you may now enjoy an  
all-expense-paid trip, at a leisurely pace, around the good old A. S. of U.  
You may go wherever you want, do whatever you wish to do, taking as long  
as you wish—compliments of the Geographical Society of the ASU.  
And—to guide you around—here is a road atlas." To your surprise, you  
are given not the atlas which you designed, but a regular road atlas of the  
USA.
 
417
 
When you embark on your trip, all sorts of curious incidents will take  
place. A road atlas is being used to guide you through a country which it  
only partially fits. As long as you stick to major freeways, you will probably  
be able to cross the country without gross confusions. But the moment you  
wander off into the byways of New Mexico or rural Arkansas, there will be  
adventure in store for you. The locals will not recognize any of the towns  
you're looking for, nor will they know the roads you're asking about. They  
will only know the large cities you name, and even then the routes to those  
cities will not be the same as are indicated on your map. It will happen  
occasionally that some of the cities which are considered huge by the locals  
are nonexistent on your map of the USA; or perhaps they exist, but their  
population according to the atlas is wrong by an order of magnitude.
 
417
 
Centrality and Universality
 
417
 
What makes an ASU and the USA, which are so different in some ways,  
nevertheless so similar? It is that their most important cities and routes of  
communication can be mapped onto each other. The differences between  
them are found in the less frequently traveled routes, the cities of smaller  
size, and so on. Notice that this cannot be characterized either as a local or a  
global isomorphism. Some correspondences do extend down to the very  
local level—for instance, in both New Yorks, the main street may be Fifth  
Avenue, and there may be a Times Square in both as well—yet there may  
not be a single town that is found in both Montanas. So the local-global  
distinction is not relevant here. What is relevant is the centrality of the city,  
in terms of economics, communication, transportation, etc. The more vital  
the city is, in one of these ways, the more certain it will be to occur in both  
the ASU and the USA.
 
418
 
In this geographic analogy, one aspect is very crucial: that there are  
certain definite, absolute points of reference which will occur in nearly all  
ASU's: New York, San Francisco, Chicago, and so on. From these it is then  
possible to orient oneself. In other words, if we begin comparing my ASU  
with yours, I can use the known agreement on big cities to establish points  
of reference with which I can communicate the location of smaller cities in  
my ASU. And if I hypothesize a voyage from Kankakee to Fruto and you  
don't know where those towns are, I can refer to something we have in  
common, and thereby guide you. And if I talk about a voyage from Atlanta  
to Milwaukee, it may go along different freeways or smaller roads, but the  
voyage itself can still be carried out in both countries. And if you start  
describing a trip from Horsemilk to Janzo, I can plot out what seems to me  
to be an analogous trip in my ASU, despite not having towns by those  
names, as long as you constantly keep me oriented by describing your  
position with respect to nearby larger towns which are found in my ASU as  
well as in yours.
 
418
 
My roads will not be exactly the same as yours, but, with our separate  
maps, we can each get from a particular part of the country to another. We  
can do this, thanks to the external, predetermined geological facts—  
mountain chains, streams, etc.—facts which were available to us both  
as we worked on our maps. Without those external features, we would  
have no possibility of reference points in common. For instance, if you had  
been given only a map of France, and I had been given a map of Germany,  
and then we had both filled them in in great detail, there would be no way  
to try to find "the same place" in our fictitious lands. It is necessary to begin  
with identical external conditions—otherwise nothing will match.
 
 
Now that we have carried our geographical analogy quite far, we  
return to the question of isomorphisms between brains. You might well  
wonder why this whole question of brain isomorphisms has been stressed so  
much. What does it matter if two brains are isomorphic, or quasi-isomorphic,  
or not isomorphic at all? The answer is that we have an intuitive sense  
that, although other people differ from us in important ways, they are still  
"the same" as we are in some deep and important ways. It would be  
instructive to be able to pinpoint what this invariant core of human  
intelligence is, and then to be able to describe the kinds of "embellishments"  
which can be added to it, making each one of us a unique embodiment of  
this abstract and mysterious quality called "intelligence".
 
 
In our geographic analogy, cities and towns were the analogues of  
symbols, while roads and highways were analogous to potential triggering  
paths. The fact that all ASU's have some things in common, such as the East  
Coast, the West Coast, the Mississippi River, the Great Lakes, the Rockies,  
and many major cities and roads is analogous to the fact that we are all  
forced, by external realities, to construct certain class symbols and triggering  
paths in the same way. These core symbols are like the large cities, to  
which everyone can make reference without ambiguity. (Incidentally, the  
fact that cities are localized entities should in no way be taken as indicative  
that symbols in a brain are small, almost point-like entities. They are merely  
symbolized in that manner in a network.)
 
 
The fact is that a large proportion of every human's network of  
symbols is universal. We simply take what is common to all of us so much for  
granted that it is hard to see how much we have in common with other  
people. It takes the conscious effort of imagining how much—or how  
little—we have in common with other types of entities, such as stones, cars,  
restaurants, ants, and so forth, to make evident the large amount of overlap  
that we have with randomly chosen people. What we notice about another  
person immediately is not the standard overlap, because that is taken for  
granted as soon as we recognize the humanity of the other person; rather,  
we look beyond the standard overlap and generally find some major  
differences, as well as some unexpected, additional overlap.
 
 
Occasionally, you find that another person is missing some of what you  
thought was the standard, minimal core—as if Chicago were missing from  
their ASU, which is almost unimaginable. For instance, someone might not  
know what an elephant is, or who is President, or that the earth is round. In  
such cases, their symbolic network is likely to be so fundamentally different  
from your own that significant communication will be difficult. On the  
other hand, perhaps this same person will share some specialized kind of  
knowledge with you—such as expertise in the game of dominoes—so that  
you can communicate well in a limited domain. This would be like meeting  
someone who comes from the very same rural area of North Dakota as you  
do, so that your two ASU's coincide in great detail over a very small region,  
which allows you to describe how to get from one place to another very  
fluently.
 
 
How Much Do Language and Culture Channel Thought?
 
 
If we now go back to comparing our own symbol network with those of a  
Frenchman and a German, we can say that we expect them to have the  
standard core of class symbols, despite the fact of different native  
languages. We do not expect to share highly specialized networks with them,  
but we do not expect such sharing with a randomly chosen person who  
shares our native language, either. The triggering patterns of people with  
other languages will be somewhat different from our own, but still the  
major class symbols, and the major routes between them, will be universally  
available, so that more minor routes can be described with reference to  
them.
 
 
Now each of our three people may in addition have some command of  
the languages of the other two. What is it that marks the difference  
between true fluency, and a mere ability to communicate? First of all,  
someone fluent in English uses most words at roughly their regular  
frequencies. A non-native speaker will have picked up some words from  
dictionaries, novels, or classes—words which at some time may have been  
prevalent or preferable, but which are now far down in frequency—for  
example, "fetch" instead of "get", "quite" instead of "very", etc. Though the  
meaning usually comes through, there is an alien quality transmitted by the  
unusual choice of words.
 
 
But suppose that a foreigner learns to use all words at roughly the  
normal frequencies. Will that make his speech truly fluent? Probably not.  
Higher than the word level, there is an association level, which is attached  
to the culture as a whole—its history, geography, religion, children's  
stories, literature, technological level, and so on. For instance, to be able to  
speak modern Hebrew absolutely fluently, you need to know the Bible  
quite well in Hebrew, because the language draws on a stock of biblical  
phrases and their connotations. Such an association level permeates each  
language very deeply. Yet there is room for all sorts of variety inside  
fluency—otherwise the only truly fluent speakers would be people whose  
thoughts were the most stereotyped possible!
 
 
Although we should recognize the depth to which culture affects  
thought, we should not overstress the role of language in molding thoughts.  
For instance, what we might call two "chairs" might be perceived by a  
speaker of French as objects belonging to two distinct types: "chaise" and  
"fauteuil" ("chair" and "armchair"). People whose native language is  
French are more aware of that difference than we are—but then people  
who grow up in a rural area are more aware of, say, the difference between  
a pickup and a truck, than a city dweller is. A city dweller may call them  
both "trucks". It is not the difference in native language, but the difference  
in culture (or subculture), that gives rise to this perceptual difference.
 
 
The relationships between the symbols of people with different native  
languages have every reason to be quite similar, as far as the core is  
concerned, because everyone lives in the same world. When you come  
down to more detailed aspects of the triggering patterns, you will find that  
there is less in common. It would be like comparing rural areas in  
Wisconsin in ASU's which had been made up by people who had never lived in  
Wisconsin. This will be quite irrelevant, however, as long as there is  
sufficient agreement on the major cities and major routes, so that there are  
common points of reference all over the map.
 
 
Trips and Itineraries in ASU's
 
 
Without making it explicit, I have been using an image of what a "thought"  
is in the ASU-analogy—namely, I have been implying that a thought  
corresponds to a trip. The towns which are passed through represent the symbols  
which are excited. This is not a perfect analogy, but it is quite strong. One  
problem with it is that when a thought recurs in someone's mind  
sufficiently often, it can get chunked into a single concept. This would  
correspond to quite a strange event in an ASU: a commonly taken trip  
would become, in some strange fashion, a new town or city! If one is to  
continue to use the ASU-metaphor, then, it is important to remember that  
the cities represent not only the elementary symbols, such as those for  
"grass", "house", and "car", but also symbols which get created as a result of  
the chunking ability of a brain—symbols for such sophisticated concepts as  
"crab canon", "palindrome", or "ASU".
 
 
Now if it is granted that the notion of taking a trip is a fair counterpart  
to the notion of having a thought, then the following difficult issue comes  
up: virtually any route leading from one city to a second, then to a third,  
and so on, can be imagined, as long as one remembers that some  
intervening cities are also passed through. This would correspond to the activation  
of an arbitrary sequence of symbols, one after another, making allowance for  
some extra symbols—those which lie en route. Now if virtually any  
sequence of symbols can be activated in any desired order, it may seem that a  
brain is an indiscriminate system, which can absorb or produce any thought  
whatsoever. But we all know that that is not so. In fact, there are certain  
kinds of thoughts which we call knowledge, or beliefs, which play quite a  
different role from random fancies, or humorously entertained  
absurdities. How can we characterize the difference between dreams, passing  
thoughts, beliefs, and pieces of knowledge?
 
 
Possible, Potential, and Preposterous Pathways
 
 
There are some pathways—you can think of them as pathways either in an  
ASU or in a brain—which are taken routinely in going from one place to  
another. There are other pathways which can only be followed if one is led  
through them by the hand. These pathways are "potential pathways",  
which would be followed only if special external circumstances arose. The  
pathways which one relies on over and over again are pathways which  
incorporate knowledge—and here I mean not only knowledge of facts  
(declarative knowledge), but also knowledge of how-to's (procedural  
knowledge). These stable, reliable pathways are what constitute knowledge.  
Pieces of knowledge merge gradually with beliefs, which are also  
represented by reliable pathways, but perhaps ones which are more susceptible  
to replacement if, so to speak, a bridge goes out, or there is heavy fog. This  
leaves us with fancies, lies, falsities, absurdities, and other variants. These  
would correspond to peculiar routes such as: New York City to Newark via  
Bangor, Maine and Lubbock, Texas. They are indeed possible pathways,  
but ones which are not likely to be stock routes, used in everyday voyages.
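The distinction can be made concrete by treating the network as a weighted graph, where the weight on each direct stretch is its "resistance". Routine, knowledge-bearing routes then fall out as the low-resistance paths, while a preposterous itinerary like New York to Newark via Bangor and Lubbock remains possible but enormously costly. A minimal sketch, with invented resistance values:

```python
import heapq

# Resistance of each direct stretch (symmetric; values invented).
edges = {
    ("New York", "Newark"): 1,
    ("New York", "Bangor"): 8,
    ("Bangor", "Lubbock"): 40,
    ("Lubbock", "Newark"): 30,
}

graph = {}
for (a, b), w in edges.items():
    graph.setdefault(a, []).append((b, w))
    graph.setdefault(b, []).append((a, w))

def least_resistance(start, goal):
    """Dijkstra's algorithm: total resistance of the easiest pathway."""
    dist = {start: 0}
    queue = [(0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue
        for nbr, w in graph[node]:
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(queue, (nd, nbr))
    return float("inf")

def path_resistance(path):
    """Total resistance of an explicitly chosen route."""
    return sum(edges.get((a, b), edges.get((b, a)))
               for a, b in zip(path, path[1:]))

stock = least_resistance("New York", "Newark")                    # the reliable route
detour = path_resistance(["New York", "Bangor", "Lubbock", "Newark"])
```

No edge is forbidden, just as no pathway in a brain is forbidden; the graph only records how much resistance each stretch would offer.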
 
 
A curious, and amusing, implication of this model is that all of the  
"aberrant" kinds of thoughts listed above are composed, at rock bottom,  
completely out of beliefs or pieces of knowledge. That is, any weird and  
snaky indirect route breaks up into a number of non-weird, non-snaky  
direct stretches, and these short, straightforward symbol-connecting routes  
represent simple thoughts that one can rely on—beliefs and pieces of  
knowledge. On reflection, this is hardly surprising, however, since it is quite  
reasonable that we should only be able to imagine fictitious things that are  
somehow grounded in the realities we have experienced, no matter how  
wildly they deviate from them. Dreams are perhaps just such random  
meanderings about the ASU's of our minds. Locally, they make sense—but  
globally . . .
 
 
Different Styles of Translating Novels
 
 
A poem like "Jabberwocky" is like an unreal journey around an ASU,  
hopping from one state to another very quickly, following very curious  
routes. The translations convey this aspect of the poem, rather than the  
precise sequence of symbols which are triggered, although they do their  
best in that respect. In ordinary prose, such leaps and bounds are not so  
common. However, similar problems of translation do occur. Suppose you  
are translating a novel from Russian to English, and come across a sentence  
whose literal translation is, "She had a bowl of borscht." Now perhaps many  
of your readers will have no idea what borscht is. You could attempt to  
replace it by the "corresponding" item in their culture—thus, your  
translation might run, "She had a bowl of Campbell's soup." Now if you think this  
is a silly exaggeration, take a look at the first sentence of Dostoevsky's novel  
Crime and Punishment in Russian and then in a few different English  
translations. I happened to look at three different English paperback translations,  
and found the following curious situation.
 
 
The first sentence employs the street name "S. Pereulok" (as  
transliterated). What is the meaning of this? A careful reader of Dostoevsky's work  
who knows Leningrad (which used to be called "St. Petersburg"—or should  
I say "Petrograd"?) can discover by doing some careful checking of the rest  
of the geography in the book (which incidentally is also given only by its  
initials) that the street must be "Stoliarny Pereulok". Dostoevsky probably  
wished to tell his story in a realistic way, yet not so realistically that people  
would take literally the addresses at which crimes and other events were  
supposed to have occurred. In any case, we have a translation problem; or  
to be more precise, we have several translation problems, on several  
different levels.
 
 
First of all, should we keep the initial so as to reproduce the aura of  
semi-mystery which appears already in this first sentence of the book? We  
would get "S. Lane" ("lane" being the standard translation of "pereulok").  
None of the three translators took this tack. However, one chose to write  
"S. Place". The translation of Crime and Punishment which I read in high  
school took a similar option. I will never forget the disoriented feeling I  
experienced when I began reading the novel and encountered those streets  
with only letters for names. I had some sort of intangible malaise about the  
beginning of the book; I was sure that I was missing something essential,  
and yet I didn't know what it was ... I decided that all Russian novels were  
very weird.
 
 
Now we could be frank with the reader (who, it may be assumed,  
probably won't have the slightest idea whether the street is real or fictitious  
anyway!) and give him the advantage of our modern scholarship, writing  
"Stoliarny Lane" (or "Place"). This was the choice of translator number 2,  
who gave the translation as "Stoliarny Place".
 
 
What about number 3? This is the most interesting of all. This  
translation says "Carpenter's Lane". And why not, indeed? After all, "stoliar"  
means "carpenter" and "ny" is an adjectival ending. So now we might  
imagine ourselves in London, not Petrograd, and in the midst of a situation  
invented by Dickens, not Dostoevsky. Is that what we want? Perhaps we  
should just read a novel by Dickens instead, with the justification that it is  
"the corresponding work in English". When viewed on a sufficiently high  
level, it is a "translation" of the Dostoevsky novel—in fact, the best possible  
one! Who needs Dostoevsky?
 
 
We have come all the way from attempts at great literal fidelity to the  
author's style, to high-level translations of flavor. Now if this happens  
already in the first sentence, can you imagine how it must go on in the rest  
of the book? What about the point where a German landlady begins  
shouting in her German-style Russian? How do you translate broken  
Russian spoken with a German accent, into English?
 
 
Then one may also consider the problems of how to translate slang and  
colloquial modes of expression. Should one search for an "analogous"  
phrase, or should one settle for a word-by-word translation? If you search  
for an analogous phrase, then you run the risk of committing a "Campbell's  
soup" type of blunder; but if you translate every idiomatic phrase word by  
word, then the English will sound alien. Perhaps this is desirable, since the  
Russian culture is an alien one to speakers of English. But a speaker of  
English who reads such a translation will constantly be experiencing,  
thanks to the unusual turns of phrase, a sense—an artificial sense—of  
strangeness, which was not intended by the author, and which is not  
experienced by readers of the Russian original.
 
 
Problems such as these give one pause in considering such statements  
as this one, made by Warren Weaver, one of the first advocates of  
translation by computer, in the late 1940's: "When I look at an article in Russian, I  
say, 'This is really written in English, but it has been coded in some strange  
symbols. I will now proceed to decode.'"1 Weaver's remark simply cannot  
be taken literally; it must rather be considered a provocative way of saying  
that there is an objectively describable meaning hidden in the symbols, or at  
least something pretty close to objective; therefore, there would be no  
reason to suppose a computer could not ferret it out, if sufficiently well  
programmed.
 
 
High-Level Comparisons between Programs
 
 
Weaver's statement is about translations between different natural  
languages. Let's consider now the problem of translating between two  
computer languages. For instance, suppose two people have written programs  
which run on different computers, and we want to know if the two  
programs carry out the same task. How can we find out? We must compare the  
programs. But on what level should this be done? Perhaps one programmer  
wrote in a machine language, the other in a compiler language. Are  
two such programs comparable? Certainly. But how to compare them? One  
way might be to compile the compiler language program, producing a  
program in the machine language of its home computer.
 
 
Now we have two machine language programs. But there is another  
problem: there are two computers, hence two different machine  
languages—and they may be extremely different. One machine may have  
sixteen-bit words; the other thirty-six-bit words. One machine may have  
built-in stack-handling instructions (pushing and popping), while the other  
lacks them. The differences between the hardware of the two machines  
may make the two machine language programs seem incomparable—and  
yet we suspect they are performing the same task, and we would like to see  
that at a glance. We are obviously looking at the programs from much too  
close a distance.
 
 
What we need to do is to step back, away from machine language,  
towards a higher, more chunked view. From this vantage point, we hope we  
will be able to perceive chunks of program which make each program seem  
rationally planned out on a global, rather than a local, scale—that is, chunks  
which fit together in a way that allows one to perceive the goals of the  
programmer. Let us assume that both programs were originally written in  
high-level languages. Then some chunking has already been done for us.  
But we will run into other troubles. There is a proliferation of such  
languages: Fortran, Algol, LISP, APL, and many others. How can you  
compare a program written in APL with one written in Algol? Certainly not  
by matching them up line by line. You will again chunk these programs in  
your mind, looking for conceptual, functional units which correspond.  
Thus, you are not comparing hardware, you are not comparing  
software—you are comparing "etherware"—the pure concepts which lie  
back of the software. There is some sort of abstract "conceptual skeleton"  
which must be lifted out of low levels before you can carry out a meaningful  
comparison of two programs in different computer languages, of two  
animals, or of two sentences in different natural languages.
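The point about comparing "etherware" rather than text can be illustrated in miniature: two programs whose source lines match nowhere may still be "the same program" at the conceptual level. A sketch, comparing an iterative and a recursive factorial not line by line but by their behavior on sample inputs:

```python
def factorial_iterative(n):
    # One conceptual chunk: accumulate a running product.
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

def factorial_recursive(n):
    # Same conceptual skeleton, entirely different text.
    return 1 if n <= 1 else n * factorial_recursive(n - 1)

def behaviorally_equal(f, g, inputs):
    """Line-by-line matching is hopeless; compare outputs instead."""
    return all(f(x) == g(x) for x in inputs)

same_task = behaviorally_equal(factorial_iterative,
                               factorial_recursive, range(10))
```

Testing on sample inputs only approximates the conceptual comparison, of course: agreement on a finite sample is evidence of, not proof of, a shared conceptual skeleton.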
 
 
Now this brings us back to an earlier question which we asked about  
computers and brains: How can we make sense of a low-level description of  
a computer or a brain? Is there, in any reasonable sense, an objective way to  
pull a high-level description out of a low-level one, in such complicated  
systems? In the case of a computer, a full display of the contents of  
memory—a so-called memory dump—is easily available. Dumps were  
commonly printed out in the early days of computing, when something went  
wrong with a program. Then the programmer would have to go home and  
pore over the memory dump for hours, trying to understand what each  
minuscule piece of memory represented. In essence, the programmer  
would be doing the opposite of what a compiler does: he would be  
translating from machine language into a higher-level language, a conceptual  
language. In the end, the programmer would understand the goals of the  
program and could describe it in high-level terms—for example, "This  
program translates novels from Russian to English", or "This program  
composes an eight-voice fugue based on any theme which is fed in".
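That decoding chore — staring at raw memory until the bytes resolve into meaningful units — can be sketched in miniature. Here twelve bytes are unintelligible until we guess (the layout below is assumed purely for illustration) that they encode an integer, a floating-point reading, and some text:

```python
import struct

# A tiny "memory dump": 12 raw bytes, meaningless until chunked.
dump = bytes.fromhex("2a000000 0000c842 6869210a".replace(" ", ""))

# The programmer's hard-won insight: the bytes are really a little-endian
# int32, a float32, and four ASCII characters (a layout assumed here).
count, reading = struct.unpack("<if", dump[:8])
text = dump[8:].decode("ascii")

# The opposite of compiling: from bits up to a conceptual description.
high_level = f"count={count}, reading={reading}, text={text!r}"
```

The real task differs only in scale: a full dump contains millions of such fields, and the layout must be reconstructed rather than guessed in one step.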
 
 
High-Level Comparisons between Brains
 
 
Now our question must be investigated in the case of brains. In this case, we  
are asking, "Are people's brains also capable of being 'read', on a high  
level? Is there some objective description of the content of a brain?" In the  
Ant Fugue, the Anteater claimed to be able to tell what Aunt Hillary was  
thinking about, by looking at the scurryings of her component ants. Could  
some superbeing—a Neuroneater, perhaps—conceivably look down on our  
neurons, chunk what it sees, and come up with an analysis of our thoughts?
 
 
Certainly the answer must be yes, since we are all quite able to describe,  
in chunked (i.e., non-neural) terms, the activity of our minds at any given  
time. This means that we have a mechanism which allows us to chunk our  
own brain state to some rough degree, and to give a functional description  
of it. To be more precise, we do not chunk all of the brain state—we only  
chunk those portions of it which are active. However, if someone asks us  
about a subject which is coded in a currently inactive area of our brain, we  
can almost instantly gain access to the appropriate dormant area and come  
up with a chunked description of it—that is, some belief on that subject.  
Note that we come back with absolutely zero information on the neural  
level of that part of the brain: our description is so chunked that we don't  
even have any idea what part of our brain it is a description of. This can be  
contrasted with the programmer whose chunked description comes from  
conscious analysis of every part of the memory dump.
 
 
Now if a person can provide a chunked description of any part of his  
own brain, why shouldn't an outsider, too, given some nondestructive  
means of access to the same brain, be able not only to chunk limited  
portions of the brain, but also to give a complete chunked description  
of it—in other words, a complete documentation of the beliefs of the  
person whose brain is accessible? It is obvious that such a description would  
have an astronomical size, but that is not of concern here. We are interested  
in the question of whether, in principle, there exists a well-defined,  
high-level description of a brain, or whether, conversely, the neuron-level  
description—or something equally physiological and intuitively  
unenlightening—is the best description that in principle exists. Surely, to  
answer this question would be of the highest importance if we seek to know  
whether we can ever understand ourselves.
 
 
Potential Beliefs, Potential Symbols
 
 
It is my contention that a chunked description is possible, but when we get  
it, all will not suddenly be clear and light. The problem is that in order to  
pull a chunked description out of the brain state, we need a language to  
describe our findings. Now the most appropriate way to describe a brain, it  
would seem, would be to enumerate the kinds of thoughts it could  
entertain, and the kinds of thoughts it could not entertain—or, perhaps, to  
enumerate its beliefs and the things which it does not believe. If that is the  
382  
Minds and Thoughtskind of goal we will be striving for in a chunked description, then it is easy  
to see what kinds of troubles we will run up against.
 
 
Suppose you wanted to enumerate all possible voyages that could be  
taken in an ASU; there are infinitely many. How do you determine which  
ones are plausible, though? Well, what does "plausible" mean? We will have  
precisely this kind of difficulty in trying to establish what a "possible  
pathway" from symbol to symbol in a brain is. We can imagine an upside-  
down dog flying through the air with a cigar in its mouth—or a collision  
between two giant fried eggs on a freeway—or any number of other  
ridiculous images. The number of far-fetched pathways which can be  
followed in our brains is without bound, just as is the number of insane  
itineraries that could be planned on an ASU. But just what constitutes a  
"sane" itinerary, given an ASU? And just what constitutes a "reasonable"  
thought, given a brain state? The brain state itself does not forbid any  
pathway, because for any pathway there are always circumstances which  
could force the following of that pathway. The physical status of a brain, if  
read correctly, gives information telling not which pathways could be  
followed, but rather how much resistance would be offered along the way.
 
 
Now in an ASU, there are many trips which could be taken along two  
or more reasonable alternative routes. For example, the trip from San  
Francisco to New York could go along either a northern route or a  
southern route. Each of them is quite reasonable, but people tend to take them  
under different circumstances. Looking at a map at a given moment in time  
does not tell you anything about which route will be preferable at some  
remote time in the future—that depends on the external circumstances  
under which the trip is to be taken. Likewise, the "reading" of a brain state  
will reveal that several reasonable alternative pathways are often available,  
connecting a given set of symbols. However, the trip among these symbols  
need not be imminent; it may be simply one of billions of "potential" trips,  
all of which figure in the readout of the brain state. From this follows an  
important conclusion: there is no information in the brain state itself which  
tells which route will be chosen. The external circumstances will play a  
large determining role in choosing the route.
 
 
What does this imply? It implies that thoughts which clash totally may  
be produced by a single brain, depending on the circumstances. And any  
high-level readout of the brain state which is worth its salt must contain all  
such conflicting versions. Actually this is quite obvious—that we all are  
bundles of contradictions, and we manage to hang together by bringing out  
only one side of ourselves at a given time. The selection cannot be  
predicted in advance, because the conditions which will force the selection are  
not known in advance. What the brain state can provide, if properly read, is  
a conditional description of the selection of routes.
 
 
Consider, for instance, the Crab's plight, described in the Prelude. He  
can react in various ways to the playing of a piece of music. Sometimes he  
will be nearly immune to it, because he knows it so well. Other times, he will  
be quite excited by it, but this reaction requires the right kind of triggering  
from the outside—for instance, the presence of an enthusiastic listener, to  
whom the work is new. Presumably, a high-level reading of the Crab's brain  
state would reveal the potential thrill (and conditions which would induce  
it), as well as the potential numbness (and conditions which would induce  
it). The brain state itself would not tell which one would occur on the next  
hearing of the piece, however; it could only say, "If such-&-such conditions  
obtain, then a thrill will result; otherwise . . ."
 
 
Thus a chunked description of a brain state would give a catalogue of  
beliefs which would be evoked conditionally, dependent on circumstances.  
Since not all possible circumstances can be enumerated, one would have to  
settle for those which one thinks are "reasonable". Furthermore, one would  
have to settle for a chunked description of the circumstances themselves,  
since they obviously cannot—and should not—be specified down to the  
atomic level! Therefore, one will not be able to make an exact, deterministic  
prediction saying which beliefs will be pulled out of the brain state by a  
given chunked circumstance. In summary, then, a chunked description of a  
brain state will consist of a probabilistic catalogue, in which are listed those  
beliefs which are most likely to be induced (and those symbols which are  
most likely to be activated) by various sets of "reasonably likely"  
circumstances, themselves described on a chunked level. Trying to chunk  
someone's beliefs without referring to context is precisely as silly as trying  
to describe the range of a single person's "potential progeny" without  
referring to the mate.
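Such a readout would look less like a list of facts than like a conditional, probabilistic table: chunked circumstance in, likely beliefs out. A toy sketch, with the circumstances, beliefs, and probabilities all invented for illustration:

```python
# Conditional catalogue read out of one brain state:
# chunked circumstance -> [(belief, probability of evocation), ...]
catalogue = {
    "hears a familiar piece, alone": [("boredom", 0.7), ("thrill", 0.1)],
    "hears it with an eager new listener": [("thrill", 0.8), ("boredom", 0.1)],
}

def likely_beliefs(circumstance, threshold=0.5):
    """Beliefs the brain state makes likely under a given circumstance."""
    return [b for b, p in catalogue.get(circumstance, []) if p >= threshold]

# The same brain state yields clashing readouts under different circumstances:
r1 = likely_beliefs("hears a familiar piece, alone")
r2 = likely_beliefs("hears it with an eager new listener")
```

Note that nothing in the catalogue itself says which circumstance will arise — that is exactly the point: the selection comes from outside the brain state.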
 
 
The same sorts of problems arise in enumerating all the symbols in a  
given person's brain. There are potentially not only an infinite number of  
pathways in a brain, but also an infinite number of symbols. As was pointed  
out, new concepts can always be formed from old ones, and one could  
argue that the symbols which represent such new concepts are merely  
dormant symbols in each individual, waiting to be awakened. They may  
never get awakened in the person's lifetime, but it could be claimed that  
those symbols are nonetheless always there, just waiting for the right  
circumstances to trigger their synthesis. However, if the probability is very  
low, it would seem that "dormant" would be a very unrealistic term to apply  
in the situation. To make this clear, try to imagine all the "dormant  
dreams" which are sitting there inside your skull while you're awake. Is it  
conceivable that there exists a decision procedure which could tell  
"potentially dreamable themes" from "undreamable themes", given your brain  
state?
 
 
Where Is the Sense of Self?
 
 
Looking back on what we have discussed, you might think to yourself,  
"These speculations about brain and mind are all well and good, but what  
about the feelings involved in consciousness? These symbols may trigger  
each other all they want, but unless someone perceives the whole thing,  
there's no consciousness."

This makes sense to our intuition on some level, but it does not make  
much sense logically. For we would then be compelled to look for an  
explanation of the mechanism which does the perceiving of all the active  
symbols, if it is not covered by what we have described so far. Of course, a  
"soulist" would not have to look any further—he would merely assert that  
the perceiver of all this neural action is the soul, which cannot be described  
in physical terms, and that is that. However, we shall try to give a  
"non-soulist" explanation of where consciousness arises.
 
 
Our alternative to the soulist explanation—and a disconcerting one it  
is, too—is to stop at the symbol level and say, "This is it—this is what  
consciousness is. Consciousness is that property of a system that arises  
whenever there exist symbols in the system which obey triggering patterns  
somewhat like the ones described in the past several sections." Put so  
starkly, this may seem inadequate. How does it account for the sense of "I",  
the sense of self?
 
 
Subsystems
 
 
There is no reason to expect that "I", or "the self", should not be  
represented by a symbol. In fact, the symbol for the self is probably the most  
complex of all the symbols in the brain. For this reason, I choose to put it on  
a new level of the hierarchy and call it a subsystem, rather than a symbol. To  
be precise, by "subsystem", I mean a constellation of symbols, each of which  
can be separately activated under the control of the subsystem itself. The  
image I wish to convey of a subsystem is that it functions almost as an  
independent "subbrain", equipped with its own repertoire of symbols  
which can trigger each other internally. Of course, there is also much  
communication between the subsystem and the "outside" world—that is,  
the rest of the brain. "Subsystem" is just another name for an overgrown  
symbol, one which has gotten so complicated that it has many subsymbols  
which interact among themselves. Thus, there is no strict level distinction  
between symbols and subsystems.
 
428
 
Because of the extensive links between a subsystem and the rest of the  
brain (some of which will be described shortly), it would be very difficult to  
draw a sharp boundary between the subsystem and the outside; but even if  
the border is fuzzy, the subsystem is quite a real thing. The interesting  
thing about a subsystem is that, once activated and left to its own devices, it  
can work on its own. Thus, two or more subsystems of the brain of an  
individual may operate simultaneously. I have noticed this happening on  
occasion in my own brain: sometimes I become aware that two different  
melodies are running through my mind, competing for "my" attention.  
Somehow, each melody is being manufactured, or "played", in a separate  
compartment of my brain. Each of the systems responsible for drawing a  
melody out of my brain is presumably activating a number of symbols, one  
after another, completely oblivious to the other system doing the same  
thing. Then they both attempt to communicate with a third subsystem of  
my brain—my self-symbol—and it is at that point that the "I" inside my  
brain gets wind of what's going on; in other words, it starts picking up a  
chunked description of the activities of those two subsystems.
 
429
 
Typical subsystems might be those that represent the people we know  
intimately. They are represented in such a complex way in our brains that  
their symbols enlarge to the rank of subsystem, becoming able to act  
autonomously, making use of some resources in our brains for support. By  
this, I mean that a subsystem symbolizing a friend can activate many of the  
symbols in my brain just as I can. For instance, I can fire up my subsystem  
for a good friend and virtually feel myself in his shoes, running through  
thoughts which he might have, activating symbols in sequences which  
reflect his thinking patterns more accurately than my own. It could be said  
that my model of this friend, as embodied in a subsystem of my brain,  
constitutes my own chunked description of his brain.
 
429
 
Does this subsystem include, then, a symbol for every symbol which I  
think is in his brain? That would be redundant. Probably the subsystem  
makes extensive use of symbols already present in my brain. For instance,  
the symbol for "mountain" in my brain can be borrowed by the subsystem,  
when it is activated. The way in which that symbol is then used by the  
subsystem will not necessarily be identical to the way it is used by my full  
brain. In particular, if I am talking with my friend about the Tien Shan  
mountain range in Central Asia (neither of us having been there), and I  
know that a number of years ago he had a wonderful hiking experience in  
the Alps, then my interpretation of his remarks will be colored in part by  
my imported images of his earlier Alpine experience, since I will be trying  
to imagine how he visualizes the area.
 
429
 
In the vocabulary we have been building up in this Chapter, we could  
say that the activation of the "mountain" symbol in me is under control of  
my subsystem representing him. The effect of this is to open up a different
window onto my memories from the one which I normally use—namely,
my "default option" switches from the full range of my memories to the set  
of my memories of his memories. Needless to say, my representations of his  
memories are only approximations to his actual memories, which are  
complex modes of activation of the symbols in his brain, inaccessible to me.
 
429
 
My representations of his memories are also complex modes of  
activation of my own symbols—those for "primordial" concepts, such as grass,  
trees, snow, sky, clouds, and so on. These are concepts which I must assume  
are represented in him "identically" to the way they are in me. I must also  
assume a similar representation in him of even more primordial notions:  
the experiences of gravity, breathing, fatigue, color, and so forth. Less  
primordial but perhaps a nearly universal human quality is the enjoyment  
of reaching a summit and seeing a view. Therefore, the intricate processes  
in my brain which are responsible for this enjoyment can be taken over  
directly by the friend-subsystem without much loss of fidelity.
 
429
 
430
 
We should note, however, that computer systems are beginning to run  
into some of the same kinds of complexity, and therefore some of these  
notions have been given names. For instance, my "mountain" symbol is  
analogous to what in computer jargon is called shared (or reentrant) code—  
code which can be used by two or more separate timesharing programs  
running on a single computer. The fact that activation of one symbol can  
have different results when it is part of different subsystems can be  
explained by saying that its code is being processed by different interpreters.  
Thus, the triggering patterns in the "mountain" symbol are not absolute;  
they are relative to the system within which the symbol is activated.
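The shared-code idea can be sketched in a few lines of Python. This is my own loose illustration, not anything from the text: one function object (the "symbol") is used by two different contexts (the "interpreters"), and the same activation yields different results depending on who supplies the context.

```python
def mountain(context):
    """One shared 'symbol': what its activation produces depends on
    which context (which 'subsystem') interprets it."""
    return f"mountains as seen through {context['memories']}"

# Two "subsystems" borrow the same symbol but interpret it differently.
my_view = {"memories": "my own memories"}
friend_view = {"memories": "my memories of his Alpine hikes"}

print(mountain(my_view))
print(mountain(friend_view))
```

The function itself carries no state; each caller supplies its own, which is what makes the code safely shareable between the two contexts.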
 
430
 
The reality of such "subbrains" may seem doubtful to some. Perhaps  
the following quote from M. C. Escher, as he discusses how he creates his  
periodic plane-filling drawings, will help to make clear what kind of  
phenomenon I am referring to:  
While drawing I sometimes feel as if I were a spiritualist medium, controlled  
by the creatures which I am conjuring up. It is as if they themselves decide on  
the shape in which they choose to appear. They take little account of my  
critical opinion during their birth and I cannot exert much influence on the  
measure of their development. They are usually very difficult and obstinate  
creatures.2  
Here is a perfect example of the near-autonomy of certain subsystems of  
the brain, once they are activated. Escher's subsystems seemed to him  
almost to be able to override his esthetic judgment. Of course, this opinion  
must be taken with a grain of salt, since those powerful subsystems came  
into being as a result of his many years of training and submission to  
precisely the forces that molded his esthetic sensitivities. In short, it is  
wrong to divorce the subsystems in Escher's brain from Escher himself or  
from his esthetic judgment. They constitute a vital part of his esthetic
sense, where "he" is the complete being of the artist.
 
430
 
The Self-Symbol and Consciousness
 
430
 
A very important side effect of the self-subsystem is that it can play the role
of "soul", in the following sense: in communicating constantly with the rest  
of the subsystems and symbols in the brain, it keeps track of what symbols  
are active, and in what way. This means that it has to have symbols for  
mental activity—in other words, symbols for symbols, and symbols for the  
actions of symbols.
 
431
 
What kind of guarantee is there that a subsystem, such as I have here  
postulated, which represents the self, actually exists in our brains? Could a  
whole complex network of symbols such as has been described above evolve  
without a self-symbol evolving? How could these symbols and their  
activities play out "isomorphic" mental events to real events in the  
surrounding universe, if there were no symbol for the host organism? All the stimuli  
coming into the system are centered on one small mass in space. It would be  
quite a glaring hole in a brain's symbolic structure not to have a symbol for  
the physical object in which it is housed, and which plays a larger role in the  
events it mirrors than any other object. In fact, upon reflection, it seems  
that the only way one could make sense of the world surrounding a  
localized animate object is to understand the role of that object in relation  
to the other objects around it. This necessitates the existence of a self-  
symbol; and the step from symbol to subsystem is merely a reflection of the  
importance of the self-symbol, and is not a qualitative change.
 
431
 
Our First Encounter with Lucas
 
431
 
The Oxford philosopher J. R. Lucas (not connected with the Lucas  
numbers described earlier) wrote a remarkable article in 1961, entitled "Minds,  
Machines, and Godel". His views are quite opposite to mine, and yet he  
manages to mix many of the same ingredients together in coming up with  
his opinions. The following excerpt is quite relevant to what we have just  
been discussing:  
At one's first and simplest attempts to philosophize, one becomes entangled in  
questions of whether when one knows something one knows that one knows  
it, and what, when one is thinking of oneself, is being thought about, and what
is doing the thinking. After one has been puzzled and bruised by this problem  
for a long time, one learns not to press these questions: the concept of a  
conscious being is, implicitly, realized to be different from that of an  
unconscious object. In saying that a conscious being knows something, we are saying  
not only that he knows it, but that he knows that he knows it, and that he  
knows that he knows that he knows it, and so on, as long as we care to pose the
question: there is, we recognize, an infinity here, but it is not an infinite
regress in the bad sense, for it is the questions that peter out, as being  
pointless, rather than the answers. The questions are felt to be pointless  
because the concept contains within itself the idea of being able to go on  
answering such questions indefinitely. Although conscious beings have the  
power of going on, we do not wish to exhibit this simply as a succession of  
tasks they are able to perform, nor do we see the mind as an infinite sequence  
of selves and super-selves and super-super-selves. Rather, we insist that a  
conscious being is a unity, and though we talk about parts of the mind, we do  
so only as a metaphor, and will not allow it to be taken literally.
 
432
 
The paradoxes of consciousness arise because a conscious being can be  
aware of itself, as well as of other things, and yet cannot really be construed as  
being divisible into parts. It means that a conscious being can deal with  
Godelian questions in a way in which a machine cannot, because a conscious  
being can both consider itself and its performance and yet not be other than  
that which did the performance. A machine can be made in a manner of  
speaking to "consider" its performance, but it cannot take this "into account"  
without thereby becoming a different machine, namely the old machine with  
a "new part" added. But it is inherent in our idea of a conscious mind that it  
can reflect upon itself and criticize its own performances, and no extra part is  
required to do this: it is already complete, and has no Achilles' heel.
 
432
 
The thesis thus begins to become more of a matter of conceptual analysis  
than mathematical discovery. This is borne out by considering another  
argument put forward by Turing. So far, we have constructed only fairly simple  
and predictable artifacts. When we increase the complexity of our machines,  
there may, perhaps, be surprises in store for us. He draws a parallel with a  
fission pile. Below a certain "critical" size, nothing much happens: but above  
the critical size, the sparks begin to fly. So too, perhaps, with brains and  
machines. Most brains and all machines are, at present, "sub-critical"—they  
react to incoming stimuli in a stodgy and uninteresting way, have no ideas of  
their own, can produce only stock responses—but a few brains at present, and  
possibly some machines in the future, are super-critical, and scintillate on  
their own account. Turing is suggesting that it is only a matter of complexity,  
and that above a certain level of complexity a qualitative difference appears,  
so that "super-critical" machines will be quite unlike the simple ones hitherto  
envisaged.
 
432
 
This may be so. Complexity often does introduce qualitative differences.  
Although it sounds implausible, it might turn out that above a certain level of  
complexity, a machine ceased to be predictable, even in principle, and started  
doing things on its own account, or, to use a very revealing phrase, it might
begin to have a mind of its own. It
would begin to have a mind of its own when it was no longer entirely  
predictable and entirely docile, but was capable of doing things which we  
recognized as intelligent, and not just mistakes or random shots, but which we  
had not programmed into it. But then it would cease to be a machine, within  
the meaning of the act. What is at stake in the mechanist debate is not how  
minds are, or might be, brought into being, but how they operate. It is  
essential for the mechanist thesis that the mechanical model of the mind shall  
operate according to "mechanical principles," that is, that we can understand  
the operation of the whole in terms of the operations of its parts, and the  
operation of each part either shall be determined by its initial state and the  
construction of the machine, or shall be a random choice between a  
determinate number of determinate operations. If the mechanist produces a machine  
which is so complicated that this ceases to hold good of it, then it is no longer a
machine for the purposes of our discussion, no matter how it was constructed.
We should say, rather, that he had created a mind, in the same sort of sense as  
we procreate people at present. There would then be two ways of bringing  
new minds into the world, the traditional way, by begetting children born of  
women, and a new way by constructing very, very complicated systems of, say,  
valves and relays. When talking of the second way, we should take care to  
stress that although what was created looked like a machine, it was not one  
really, because it was not just the total of its parts. One could not tell what it  
was going to do merely by knowing the way in which it was built up and the  
initial state of its parts: one could not even tell the limits of what it could do,
for even when presented with a Godel-type question, it got the answer right.
In fact we should say briefly that any system which was not floored by the  
Godel question was eo ipso not a Turing machine, i.e. not a machine within the  
meaning of the act.3
 
433
 
In reading this passage, my mind constantly boggles at the rapid  
succession of topics, allusions, connotations, confusions, and conclusions.  
We jump from a Carrollian paradox to Godel to Turing to Artificial  
Intelligence to holism and reductionism, all in the span of two brief pages.  
About Lucas one can say that he is nothing if not stimulating. In the  
following Chapters, we shall come back to many of the topics touched on so  
tantalizingly and fleetingly in this odd passage.
 
449
 
CHAPTER XIII

BlooP and FlooP and GlooP
 
449
 
Self-Awareness and Chaos
 
449
 
BLooP, FLooP, AND GLooP are not trolls, talking ducks, or the sounds made by a sinking ship—they are three computer languages, each one with its own special purpose. These languages were invented specially for this Chapter. They will be of use in explaining some new senses of the word "recursive"—in particular, the notions of primitive recursivity and general recursivity. They will prove very helpful in clarifying the machinery of self-reference in TNT.
 
449
 
We seem to be making a rather abrupt transition from brains and minds to technicalities of mathematics and computer science. Though the transition is abrupt in some ways, it makes some sense. We just saw how a certain kind of self-awareness seems to be at the crux of consciousness. Now we are going to scrutinize "self-awareness" in more formal settings, such as TNT. The gulf between TNT and a mind is wide, but some of the ideas will be most illuminating, and perhaps metaphorically transportable back to our thoughts about consciousness.
 
449
 
One of the amazing things about TNT's self-awareness is that it is intimately connected to questions about order versus chaos among the natural numbers. In particular, we shall see that an orderly system of sufficient complexity that it can mirror itself cannot be totally orderly—it must contain some strange, chaotic features. For readers who have some Achilles in them, this will be hard to take. However, there is a "magical" compensation: there is a kind of order to the disorder, which is now its own field of study, called "recursive function theory". Unfortunately, we will not be able to do much more than hint at the fascination of this subject.
 
449
 
Representability and Refrigerators
 
449
 
Phrases such as "sufficiently complex", "sufficiently powerful" and the like have cropped up quite often earlier. Just what do they mean? Let us go back to the battle of the Crab and Tortoise, and ask, "What qualifies something as a record player?" The Crab might claim that his refrigerator is a "Perfect" record player. Then to prove it, he could set any record whatsoever atop it, and say, "You see—it's playing it!" The Tortoise, if he wanted to counter this Zen-like act, would have to reply, "No—your refrigerator is too low-fidelity to be counted as a phonograph: it cannot reproduce sounds at all (let alone its self-breaking sound)." The Tortoise
 
450
 
can only make a record called "I Cannot Be Played on Record Player X" provided that Record Player X is really a record player! The Tortoise's method is quite insidious, as it plays on the strength, rather than on the weakness, of the system. And therefore he requires "sufficiently hi-fi" record players.
 
450
 
Ditto for formal versions of number theory. The reason that TNT is a formalization of N is that its symbols act the right way: that is, its theorems are not silent like a refrigerator—they speak actual truths of N. Of course, so do the theorems of the pq-system. Does it, too, count as "a formalization of number theory", or is it more like a refrigerator? Well, it is a little better than a refrigerator, but it is still pretty weak. The pq-system does not include enough of the core truths of N to count as "a number theory".
 
450
 
What, then, are these "core truths" of N? They are the primitive recursive truths; that means they involve only predictably terminating calculations. These core truths serve for N as Euclid's first four postulates served for geometry: they allow you to throw out certain candidates before the game begins, on the grounds of "insufficient power". From here on out, the representability of all primitive recursive truths will be the criterion for calling a system "sufficiently powerful".
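What "predictably terminating" means can be made concrete with a small sketch (my own Python example, not part of the book's formalism): a primality test whose search bound is known before the search begins, so the calculation is guaranteed to finish within a number of steps computable in advance.

```python
def is_prime(n: int) -> bool:
    """Test primality with a search whose length is bounded in advance:
    at most n - 2 trial divisions, fixed before the loop is entered."""
    if n < 2:
        return False
    for d in range(2, n):      # bounded search: the ceiling n is known up front
        if n % d == 0:
            return False       # found a divisor, so n is composite
    return True
```

Because the bound depends only on the input, statements like "7 is prime" are decidable by a predictably terminating calculation, which is the flavor of truth the primitive recursive truths capture.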
 
450
 
Ganto's Ax in Metamathematics
 
450
 
The significance of the notion is shown by the following key fact: If you have a sufficiently powerful formalization of number theory, then Godel's method is applicable, and consequently your system is incomplete. If, on the other hand, your system is not sufficiently powerful (i.e., not all primitive recursive truths are theorems), then your system is, precisely by virtue of that lack, incomplete. Here we have a reformulation of "Ganto's Ax" in metamathematics: whatever the system does, Godel's Ax will chop its head off! Notice also how this completely parallels the high-fidelity-versus-low-fidelity battle in the Contracrostipunctus.
 
450
 
Actually, it turns out that much weaker systems are still vulnerable to the Godel method; the criterion that all primitive recursive truths need be represented as theorems is far too stringent. It is a little like a thief who will only rob "sufficiently rich" people, and whose criterion is that the potential victim should be carrying at least a million dollars in cash. In the case of TNT, luckily, we will be able to act in our capacity as thieves, for the million in cash is there—which is to say, TNT does indeed contain all primitive recursive truths as theorems.
 
450
 
Now before we plunge into a detailed discussion of primitive recursive functions and predicates, I would like to tie the themes of this Chapter to themes from earlier Chapters, so as to provide a bit better motivation.
 
450
 
Finding Order by Choosing the Right Filter
 
450
 
We saw at a very early stage that formal systems can be difficult and unruly beasts because they have lengthening and shortening rules, which can possibly lead to never-ending searches among strings. The discovery of Godel-numbering showed that any search for a string having a special typographical property has an arithmetical cousin: an isomorphic search for an integer with a corresponding special arithmetical property. Consequently, the quest for decision procedures for formal systems involves solving the mystery of unpredictably long searches—chaos—among the integers. Now in the Aria with Diverse Variations, I gave perhaps too much weight to apparent manifestations of chaos in problems about integers. As a matter of fact, people have tamed wilder examples of apparent chaos than the "wondrousness" problem, finding them to be quite gentle beasts after all. Achilles' powerful faith in the regularity and predictability of numbers should therefore be accorded quite a bit of respect—especially as it reflects the beliefs of nearly all mathematicians up till the 1930's. To show why order versus chaos is such a subtle and significant issue, and to tie it in with questions about the location and revelation of meaning, I would like to quote a beautiful and memorable passage from Are Quanta Real?—a Galilean Dialogue by the late J. M. Jauch.
 
451
 
SALVIATI Suppose I give you two sequences of numbers, such as

   7 8 5 3 9 8 1 6 3 3 9 7 4 4 8 3 0 9 6 1 5 6 6 0 8 4 ...

and

   1, -1/3, +1/5, -1/7, +1/9, -1/11, +1/13, -1/15, ...

If I asked you, Simplicio, what the next number of the first sequence is, what would you say?
SIMPLICIO I could not tell you. I think it is a random sequence and that there is no law in it.
SALVIATI And for the second sequence?
SIMPLICIO That would be easy. It must be +1/17.
SALVIATI Right. But what would you say if I told you that the first sequence is also constructed by a law and this law is in fact identical with the one you have just discovered for the second sequence?
SIMPLICIO This does not seem probable to me.
SALVIATI But it is indeed so, since the first sequence is simply the beginning of the decimal fraction [expansion] of the sum of the second. Its value is π/4.
SIMPLICIO You are full of such mathematical tricks, but I do not see what this has to do with abstraction and reality.
SALVIATI The relationship with abstraction is easy to see. The first sequence looks random unless one has developed through a process of abstraction a kind of filter which sees a simple structure behind the apparent randomness. It is exactly in this manner that laws of nature are discovered. Nature presents us with a host of phenomena which appear mostly as chaotic randomness until we select some significant events, and abstract from their particular, irrelevant circumstances so that they become idealized. Only then can they exhibit their true structure in full splendor.
SAGREDO This is a marvelous idea! It suggests that when we try to understand nature, we should look at the phenomena as if they were messages to be understood. Except that each message appears to be random until we establish a code to read it. This code takes the form of an abstraction, that is, we choose to ignore certain things as irrelevant and we thus partially select the content of the message by a free choice. These irrelevant signals form the "background noise," which will limit the accuracy of our message. But since the code is not absolute there may be several messages in the same raw material of the data, so changing the code will result in a message of equally deep significance in something that was merely noise before, and conversely: in a new code a former message may be devoid of meaning. Thus a code presupposes a free choice among different, complementary aspects, each of which has equal claim to reality, if I may use this dubious word. Some of these aspects may be completely unknown to us now but they may reveal themselves to an observer with a different system of abstractions. But tell me, Salviati, how can we then still claim that we discover something out there in the objective real world? Does this not mean that we are merely creating things according to our own images and that reality is only within ourselves?
SALVIATI I don't think that this is necessarily so, but it is a question which requires deeper reflection.
 
452
 
Jauch is here dealing with messages that come not from a "sentient being" but from nature itself. The questions that we raised in Chapter VI on the relation of meaning to messages can be raised equally well with messages from nature. Is nature chaotic, or is nature patterned? And what is the role of intelligence in determining the answer to this question?
 
452
 
To back off from the philosophy, however, we can consider the point about the deep regularity of an apparently random sequence. Might the function Q(n) from Chapter V have a simple, nonrecursive explanation, too? Can every problem, like an orchard, be seen from such an angle that its secret is revealed? Or are there some problems in number theory which, no matter what angle they are seen from, remain mysteries? With this prologue, I feel it is time to move ahead to define the precise meaning of the term "predictably long search". This will be accomplished in terms of the language BlooP.
 
452
 
Primordial Steps of the Language BlooP
 
452
 
Our topic will be searches for natural numbers which have various properties. In order to talk about the length of any search, we shall have to define some primordial steps, out of which all searches are built, so that length can be measured in terms of number of steps. Some steps which we might consider primordial are:

   adding any two natural numbers;
   multiplying any two natural numbers;
   determining if two numbers are equal;
   determining the larger (smaller) of two numbers.
 
453
 
Loops and Upper Bounds
 
453
 
If we try to formulate a test for, say, primality in terms of such steps, we shall soon see that we have to include a control structure—that is, descriptions of the order to do things in, when to branch back and try something again, when to skip over a set of steps, when to stop, and similar matters.
 
453
 
It is typical of any algorithm—that is, a specific delineation of how to carry out a task—that it includes a mixture of (1) specific operations to be performed, and (2) control statements. Therefore, as we develop our language for expressing predictably long calculations, we shall have to incorporate primordial control structures also. In fact, the hallmark of BlooP is its limited set of control structures. It does not allow you to branch to arbitrary steps, or to repeat groups of steps without limit; in BlooP, essentially the only control structure is the bounded loop: a set of instructions which can be executed over and over again, up to a predefined maximum number of times, called the upper bound, or ceiling, of the loop. If the ceiling were 300, then the loop might be executed 0, 7, or 300 times—but not 301.
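The bounded loop can be sketched in Python (my own rendering, not BlooP syntax): the ceiling is fixed before the loop is entered, and the body may signal an early exit, but it can never run past the ceiling.

```python
def bounded_loop(ceiling, body, state):
    """Run `body` at most `ceiling` times.
    `body` takes the current state and returns (new_state, done)."""
    for _ in range(ceiling):   # never more than `ceiling` iterations
        state, done = body(state)
        if done:               # the analogue of aborting the loop early
            break
    return state

# Example: search, with ceiling 100, for the first multiple of 7 above 30.
first = bounded_loop(100, lambda n: (n + 1, (n + 1) % 7 == 0 and n + 1 > 30), 0)
```

If the ceiling had been too low, the search would simply stop at the ceiling rather than loop forever; that guaranteed termination is the whole point of the construct.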
 
453
 
Now the exact values of all the upper bounds in a program need not be put in numerically by the programmer—indeed, they may not be known in advance. Instead, any upper bound may be determined by calculations carried out before its loop is entered. For instance, if you wanted to calculate the value of 2^(3^n), there would be two loops. First, you evaluate 3^n, which involves n multiplications. Then, you put 2 to that power, which involves 3^n multiplications. Thus, the upper bound for the second loop is the result of the calculation of the first loop.
 
453
 
Here is how you would express this in a BlooP program:

   DEFINE PROCEDURE "TWO-TO-THE-THREE-TO-THE" [N]:
   BLOCK 0: BEGIN
      CELL(0) ⇐ 1;
      LOOP N TIMES:
      BLOCK 1: BEGIN
         CELL(0) ⇐ 3 × CELL(0);
      BLOCK 1: END;
      CELL(1) ⇐ 1;
      LOOP CELL(0) TIMES:
      BLOCK 2: BEGIN
         CELL(1) ⇐ 2 × CELL(1);
      BLOCK 2: END;
      OUTPUT ⇐ CELL(1);
   BLOCK 0: END.
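For readers who find Python easier to scan than BlooP, the same procedure can be transcribed as follows (a sketch; the variable names mirror the book's CELL(0), CELL(1), and OUTPUT):

```python
def two_to_the_three_to_the(n: int) -> int:
    """Compute 2^(3^n) using only two bounded loops, as in the BlooP version."""
    cell0 = 1
    for _ in range(n):         # LOOP N TIMES: build up 3^n
        cell0 = 3 * cell0
    cell1 = 1
    for _ in range(cell0):     # LOOP CELL(0) TIMES: build up 2^(3^n)
        cell1 = 2 * cell1
    return cell1               # OUTPUT
```

Note how the bound of the second loop (`cell0`) is not known until the first loop has finished, exactly as described above.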
 
453
 
Conventions of BlooP
 
453
 
Now it is an acquired skill to be able to look at an algorithm written in a computer language, and figure out what it is doing. However, I hope that this algorithm is simple enough that it makes sense without too much scrutiny. A procedure is defined, having one input parameter, N; its output is the desired value.
 
454
 
This procedure definition has what is called block structure, which means that certain portions of it are to be considered as units, or blocks. All the statements in a block get executed as a unit. Each block has a number (the outermost being BLOCK 0), and is delimited by a BEGIN and an END. In our example, BLOCK 1 and BLOCK 2 contain just one statement each—but shortly you will see longer blocks. A LOOP statement always means to execute the block immediately under it repeatedly. As can be seen above, blocks can be nested.
 
454
 
The strategy of the above algorithm is as described earlier. You begin by taking an auxiliary variable, called CELL(0); you set it initially to 1, and then, in a loop, you multiply it repeatedly by 3 until you've done so exactly N times. Next, you do the analogous thing for CELL(1)—set it to 1, multiply by 2 exactly CELL(0) times, then quit. Finally, you set OUTPUT to the value of CELL(1). This is the value returned to the outside world—the only externally visible behavior of the procedure.
 
454
 
A number of points about the notation should be made here. First, the meaning of the left-arrow '⇐' is this:

   Evaluate the expression to its right, then take the result and set the CELL (or OUTPUT) on its left to that value.

So the meaning of a command such as CELL(1) ⇐ 3 × CELL(1) is to triple the value stored in CELL(1). You may think of each CELL as being a separate word in the memory of some computer. The only difference between a CELL and a true word is that the latter can only hold integers up to some finite limit, whereas we allow a CELL to hold any natural number, no matter how big.
 
454
 
Every procedure in BlooP, when called, yields a value—namely the value of the variable called OUTPUT. At the beginning of execution of any procedure, it is assumed as a default option that OUTPUT has the value 0. That way, even if the procedure never resets OUTPUT at all, OUTPUT has a well-defined value at all times.
 
454
 
IF-Statements and Branching
 
454
 
Now let us look at another procedure which will show us some other features of BlooP which give it more generality. How do you find out, knowing only how to add, what the value of M - N is? The trick is to add various numbers onto N until you find the one which yields M. However, what happens if M is smaller than N? What if we are trying to take 5 from 2? In the domain of natural numbers, there is no answer. But we would like our BlooP procedure to give an answer anyway—let's say 0. Here, then, is a BlooP procedure which does subtraction:
 
DEFINE PROCEDURE "MINUS" [M,N]:
BLOCK 0: BEGIN
        IF M < N, THEN:
                QUIT BLOCK 0;
        LOOP AT MOST M + 1 TIMES:
        BLOCK 1: BEGIN
                IF OUTPUT + N = M, THEN:
                        ABORT LOOP 1;
                OUTPUT ⇐ OUTPUT + 1;
        BLOCK 1: END;
BLOCK 0: END.
 
Here we are making use of the implicit feature that OUTPUT begins at 0. If M is less than N, then the subtraction is impossible, and we simply jump to the bottom of BLOCK 0 right away, and the answer is 0. That is what is meant by the line QUIT BLOCK 0. But if M is not less than N, then we skip over that QUIT-statement, and carry out the next command in sequence (here, a LOOP-statement). That is how IF-statements always work in BlooP.
 
So we enter LOOP 1, so called because the block which it tells us to repeat is BLOCK 1. We try adding 0 to N, then 1, 2, etc., until we find a number that gives M. At that point, we ABORT the loop we are in, meaning we jump to the statement immediately following the END which marks the bottom of the loop's block. In this case, that jump brings us just below BLOCK 1: END, which is to say, to the last statement of the algorithm, and we are done. OUTPUT now contains the correct answer.
 
Notice that there are two distinct instructions for jumping downwards: QUIT, and ABORT. The former pertains to blocks, the latter to loops. QUIT BLOCK n means to jump to the last line of BLOCK n, whereas ABORT LOOP n means to jump just below the last line of BLOCK n. This distinction only matters when you are inside a loop and want to continue looping but to quit the block this time around. Then you can say QUIT and the proper thing will happen. Also notice that the words AT MOST now precede the upper bound of the loop, which is a warning that the loop may be aborted before the upper bound is reached.
 
Automatic Chunking
 
Now there are two last features of BlooP to explain, both of them very important. The first is that, once a procedure has been defined, it may be called inside later procedure definitions. The effect of this is that once an operation has been defined in a procedure, it is considered as simple as a primordial step. Thus, BlooP features automatic chunking. You might compare it to the way a good ice skater acquires new motions: not by defining them as long sequences of primordial muscle-actions, but in terms of previously learned motions, which were themselves learned as compounds of earlier learned motions, etc.-and the nestedness, or chunkedness, can go back many layers until you hit primordial muscle-actions. And thus, the repertoire of BlooP programs, like the repertoire of a skater's tricks, grows, quite literally, by loops and bounds.
 
BlooP Tests
 
The other feature of BlooP is that certain procedures can have YES or NO as their output, instead of an integer value. Such procedures are tests, rather than functions. To indicate the difference, the name of a test must terminate in a question mark. Also, in a test, the default option for OUTPUT is not 0, of course, but NO.
 
Let us see an example of these last two features of BlooP in an algorithm which tests its argument for primality:

DEFINE PROCEDURE "PRIME?" [N]:
BLOCK 0: BEGIN
        IF N = 0, THEN:
                QUIT BLOCK 0;
        CELL(0) ⇐ 2;
        LOOP AT MOST MINUS [N,2] TIMES:
        BLOCK 1: BEGIN
                IF REMAINDER [N,CELL(0)] = 0, THEN:
                        QUIT BLOCK 0;
                CELL(0) ⇐ CELL(0) + 1;
        BLOCK 1: END;
        OUTPUT ⇐ YES;
BLOCK 0: END.
 
Notice that I have called two procedures inside this algorithm: MINUS and REMAINDER. (The latter is presumed to have been previously defined, and you may work out its definition yourself.) Now this test for primality works by trying out potential factors of N one by one, starting at 2 and increasing to a maximum of N - 1. In case any of them divides N exactly (i.e., gives remainder 0), then we jump down to the bottom, and since OUTPUT still has its default value at this stage, the answer is NO. Only if N has no exact divisors will it survive the entirety of LOOP 1; then we will emerge smoothly at the statement OUTPUT ⇐ YES, which will get executed, and then the procedure is over.
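Here is the same test transcribed into Python (a sketch; note that, read literally, the BlooP procedure above also answers YES for N = 1, since its loop then runs zero times, and this transcription preserves that behavior):

```python
def prime(n):
    # In a test, OUTPUT defaults to NO (False here).
    output = False
    if n == 0:
        return output
    cell_0 = 2
    # LOOP AT MOST MINUS [N,2] TIMES: try divisors 2, 3, ..., N - 1.
    for _ in range(max(n - 2, 0)):
        if n % cell_0 == 0:
            # An exact divisor: quit with OUTPUT still at its default, NO.
            return output
        cell_0 = cell_0 + 1
    output = True  # survived the whole loop: YES
    return output
```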
 
BlooP Programs Contain Chains of Procedures
 
We have seen how to define procedures in BlooP; however, a procedure definition is only a part of a program. A program consists of a chain of procedure definitions (each only calling previously defined procedures), optionally followed by one or more calls on the procedures defined. Thus, an example of a full BlooP program would be the definition of the procedure TWO-TO-THE-THREE-TO-THE, followed by the call

        TWO-TO-THE-THREE-TO-THE [2]

which would yield an answer of 512.
 
If you have only a chain of procedure definitions, then nothing ever gets executed; they are all just waiting for some call, with specific numerical values, to set them in motion. It is like a meat grinder waiting for some meat to grind-or rather, a chain of meat grinders all linked together, each of which is fed from earlier ones ... In the case of meat grinders, the image is perhaps not so savory; however, in the case of BlooP programs, such a construct is quite important, and we will call it a "call-less program". This notion is illustrated in Figure 72.
 
Now BlooP is our language for defining predictably terminating calculations. The standard name for functions which are BlooP-computable is primitive recursive functions; and the standard name for properties which can be detected by BlooP-tests is primitive recursive predicates. Thus, the function 2^(3^N) is a primitive recursive function; and the statement "n is a prime number" is a primitive recursive predicate.
 
It is clear intuitively that the Goldbach property is primitive recursive, and to make that quite explicit, here is a procedure definition in BlooP, showing how to test for its presence or absence:

DEFINE PROCEDURE "GOLDBACH?" [N]:
BLOCK 0: BEGIN
        CELL(0) ⇐ 2;
        LOOP AT MOST N TIMES:
        BLOCK 1: BEGIN
                IF {PRIME? [CELL(0)] AND PRIME? [MINUS [N,CELL(0)]]},
                THEN:
                BLOCK 2: BEGIN
                        OUTPUT ⇐ YES;
                        QUIT BLOCK 0;
                BLOCK 2: END;
                CELL(0) ⇐ CELL(0) + 1;
        BLOCK 1: END;
BLOCK 0: END.

As usual, we assume NO until proven YES, and we do a brute force search among pairs of numbers which sum up to N. If both are prime, we quit the outermost block; otherwise we just go back and try again, until all possibilities are exhausted. (Warning: The fact that the Goldbach property is primitive recursive does not make the question "Do all numbers have the Goldbach property?" a simple question-far from it!)
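In Python, a sketch of the same brute-force search (with a helper is_prime standing in for the PRIME? and MINUS calls):

```python
def is_prime(k):
    # Trial division; stands in for the BlooP test PRIME?.
    return k >= 2 and all(k % d != 0 for d in range(2, k))

def goldbach(n):
    output = False          # assume NO until proven YES
    cell_0 = 2
    for _ in range(n):      # LOOP AT MOST N TIMES
        if is_prime(cell_0) and is_prime(n - cell_0):
            output = True   # found a prime pair summing to N
            break           # QUIT BLOCK 0
        cell_0 = cell_0 + 1
    return output
```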
 
FIGURE 72. The structure of a call-less BlooP program. For this program to be self-contained, each procedure definition may only call procedures defined above it.
 
Suggested Exercises
 
Can you write a similar BlooP procedure which tests for the presence or absence of the Tortoise property (or the Achilles property)? If so, do it. If not, is it merely because you are ignorant about upper bounds, or could it be that there is a fundamental obstacle preventing the formulation of such an algorithm in BlooP? And what about the same questions, with respect to the property of wondrousness, defined in the Dialogue?
 
Below, I list some functions and properties, and you ought to take the time to determine whether you believe they are primitive recursive (BlooP-programmable) or not. This means that you must carefully consider what kinds of operations will be involved in the calculations which they require, and whether ceilings can be given for all the loops involved.

FACTORIAL [N] = N! (the factorial of N)
        (e.g., FACTORIAL [4] = 24)
REMAINDER [M,N] = the remainder upon dividing M by N
        (e.g., REMAINDER [24,7] = 3)
PI-DIGIT [N] = the Nth digit of pi, after the decimal point
        (e.g., PI-DIGIT [1] = 1,
        PI-DIGIT [2] = 4,
        PI-DIGIT [1000000] = 1)
 
FIBO [N] = the Nth Fibonacci number
        (e.g., FIBO [9] = 34)
PRIME-BEYOND [N] = the lowest prime beyond N
        (e.g., PRIME-BEYOND [33] = 37)
PERFECT [N] = the Nth "perfect" number (a number such as 28 whose divisors sum up to itself: 28 = 1 + 2 + 4 + 7 + 14)
        (e.g., PERFECT [2] = 28)
PRIME? [N] = YES if N is prime, otherwise NO.
PERFECT? [N] = YES if N is perfect, otherwise NO.
TRIVIAL? [A,B,C,N] = YES if A^N + B^N = C^N is correct; otherwise NO.
        (e.g., TRIVIAL? [3,4,5,2] = YES,
        TRIVIAL? [3,4,5,3] = NO)
PIERRE? [A,B,C] = YES if A^N + B^N = C^N is satisfiable for some value of N greater than 1, otherwise NO.
        (e.g., PIERRE? [3,4,5] = YES,
        PIERRE? [1,2,3] = NO)
FERMAT? [N] = YES if A^N + B^N = C^N is satisfied by some positive values of A, B, C; otherwise NO.
        (e.g., FERMAT? [2] = YES)
TORTOISE-PAIR? [M,N] = YES if both M and M + N are prime, otherwise NO.
        (e.g., TORTOISE-PAIR? [5,1742] = YES,
        TORTOISE-PAIR? [5,100] = NO)
TORTOISE? [N] = YES if N is the difference of two primes, otherwise NO.
        (e.g., TORTOISE? [1742] = YES,
        TORTOISE? [7] = NO)
MIU-WELL-FORMED? [N] = YES if N, when seen as a string of the MIU-system, is well-formed; otherwise NO.
        (e.g., MIU-WELL-FORMED? [310] = YES,
        MIU-WELL-FORMED? [415] = NO)
MIU-PROOF-PAIR? [M,N] = YES if M, as seen as a sequence of strings of the MIU-system, is a derivation of N, as seen as a string of the MIU-system; otherwise NO.
        (e.g., MIU-PROOF-PAIR? [3131131111301,301] = YES,
        MIU-PROOF-PAIR? [311130,30] = NO)
MIU-THEOREM? [N] = YES if N, seen as a MIU-system string, is a theorem; otherwise NO.
        (e.g., MIU-THEOREM? [311] = YES,
        MIU-THEOREM? [30] = NO,
        MIU-THEOREM? [701] = NO)
TNT-THEOREM? [N] = YES if N, seen as a TNT-string, is a theorem; otherwise NO.
        (e.g., TNT-THEOREM? [666111666] = YES,
        TNT-THEOREM? [123666111666] = NO,
        TNT-THEOREM? [7014] = NO)
 
FALSE? [N] = YES if N, seen as a TNT-string, is a false statement of number theory; otherwise NO.
        (e.g., FALSE? [666111666] = NO,
        FALSE? [223666111666] = YES,
        FALSE? [7014] = NO)

The last seven examples are particularly relevant to our future metamathematical explorations, so they highly merit your scrutiny.
 
Expressibility and Representability
 
Now before we go on to some interesting questions about BlooP and are led to its relative, FlooP, let us return to the reason for introducing BlooP in the first place, and connect it to TNT. Earlier, I stated that the critical mass for Gödel's method to be applicable to a formal system is attained when all primitive recursive notions are representable in that system. Exactly what does this mean? First of all, we must distinguish between the notions of representability and expressibility. Expressing a predicate is a mere matter of translation from English into a strict formalism. It has nothing to do with theoremhood. For a predicate to be represented, on the other hand, is a much stronger notion. It means that:

        (1) All true instances of the predicate are theorems;
        (2) All false instances are nontheorems.
 
By "instance", I mean the string produced when you replace all free variables by numerals. For example, the predicate m + n = k is represented in the pq-system, because each true instance of the predicate is a theorem, each false instance is a nontheorem. Thus any specific addition, whether true or false, translates into a decidable string of the pq-system. However, the pq-system is unable to express-let alone represent-any other properties of natural numbers. Therefore it would be a weak candidate indeed in a competition of systems which can do number theory.
 
Now TNT has the virtue of being able to express virtually any number-theoretical predicate; for example, it is easy to write a TNT-string which expresses the predicate "b has the Tortoise property". Thus, in terms of expressive power, TNT is all we want. However, the question "Which properties are represented in TNT?" is precisely the question "How powerful an axiomatic system is TNT?" Are all possible predicates represented in TNT? If so, then TNT can answer any question of number theory; it is complete.
 
Primitive Recursive Predicates Are Represented in TNT
 
Now although completeness will turn out to be a chimera, TNT is at least complete with respect to primitive recursive predicates. In other words, any statement of number theory whose truth or falsity can be decided by a computer within a predictable length of time is also decidable inside TNT. Or, one final restatement of the same thing:

        If a BlooP test can be written for some property of natural numbers, then that property is represented in TNT.
 
Are There Functions Which Are Not Primitive Recursive?
 
Now the kinds of properties which can be detected by BlooP tests are widely varied, including whether a number is prime or perfect, has the Goldbach property, is a power of 2, and so on and so forth. It would not be crazy to wonder whether every property of numbers can be detected by some suitable BlooP program. The fact that, as of the present moment, we have no way of testing whether a number is wondrous or not need not disturb us too much, for it might merely mean that we are ignorant about wondrousness, and that with more digging around, we could discover a universal formula for the upper bound to the loop involved. Then a BlooP test for wondrousness could be written on the spot. Similar remarks could be made about the Tortoise property.
 
So the question really is, "Can upper bounds always be given for the length of calculations-or, is there an inherent kind of jumbliness to the natural number system, which sometimes prevents calculation lengths from being predictable in advance?" The striking thing is that the latter is the case, and we are about to see why. It is the sort of thing that would have driven Pythagoras, who first proved that the square root of 2 is irrational, out of his mind. In our demonstration, we will use the celebrated diagonal method, discovered by Georg Cantor, the founder of set theory.
 
Pool B, Index Numbers, and Blue Programs
 
We shall begin by imagining a curious notion: the pool of all possible BlooP programs. Needless to say, this pool-"Pool B"-is an infinite one. We want to consider a subpool of Pool B, obtained by three successive filtering operations. The first filter will retain for us only call-less programs. From this subpool we then eliminate all tests, leaving only functions. (By the way, in call-less programs, the last procedure in the chain determines whether the program as a whole is considered a test, or a function.) The third filter will retain only functions which have exactly one input parameter. (Again referring to the final procedure in the chain.) What is left? A complete pool of all call-less BlooP programs which calculate functions of exactly one input parameter. Let us call these special BlooP programs Blue Programs.
 
What we would like to do now is to assign an unambiguous index number to each Blue Program. How can this be done? The easiest way-we shall use it-is to list them in order of length: the shortest possible Blue Program being #1, the second shortest being #2, etc. Of course, there will be many programs tied for each length. To break such ties, we use alphabetical order. Here, "alphabetical order" is taken in an extended sense, where the alphabet includes all the special characters of BlooP, in some arbitrary order, such as the following:

        A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
        + x 0 1 2 3 4 5 6 7 8 9 ⇐ = < > ( ) [ ] { } ' " ? : ; , -

-and at the end comes the lowly blank! Altogether, fifty-six characters. For convenience's sake, we can put all Blue Programs of length 1 in Volume 1, programs of 2 characters in Volume 2, etc. Needless to say, the first few volumes will be totally empty, while later volumes will have many, many entries (though each volume will only have a finite number). The very first Blue Program would be this one:

        DEFINE PROCEDURE "A" [B]:
        BLOCK 0: BEGIN
        BLOCK 0: END.

This rather silly meat grinder returns a value of 0 no matter what its input is. It occurs in Volume 56, since it has 56 characters (counting necessary blanks, including blanks separating successive lines).
 
Soon after Volume 56, the volumes will get extremely fat, because there are just so many millions of ways of combining symbols to make Blue BlooP programs. But no matter-we are not going to try to print out this infinite catalogue. All that we care about is that, in the abstract, it is well-defined, and that each Blue BlooP program therefore has a unique and definite index number. This is the crucial idea.
 
Let us designate the function calculated by the kth Blue Program this way:

        Blueprogram{#k} [N]

Here, k is the index number of the program, and N is the single input parameter. For instance, Blue Program #12 might return a value twice the size of its input:

        Blueprogram{#12} [N] = 2 x N

The meaning of the equation above is that the program named on the left-hand side returns the same value as a human would calculate from the ordinary algebraic expression on the right-hand side. As another example, perhaps the 5000th Blue Program calculates the cube of its input parameter:

        Blueprogram{#5000} [N] = N^3
 
The Diagonal Method
 
Very well-now we apply the "twist": Cantor's diagonal method. We shall take this catalogue of Blue Programs and use it to define a new function of one variable-Bluediag [N]-which will turn out not to be anywhere in the list (which is why its name is in italics). Yet Bluediag will clearly be a well-defined, calculable function of one variable, and so we will have to conclude that functions exist which simply are not programmable in BlooP. Here is the definition of Bluediag [N]:

        Equation (1) ... Bluediag [N] = 1 + Blueprogram{#N} [N]

The strategy is: feed each meat grinder with its own index number, then add 1 to the output. To illustrate, let us find Bluediag [12]. We saw that Blueprogram{#12} is the function 2N; therefore, Bluediag [12] must have the value 1 + 2 x 12, or 25. Likewise, Bluediag [5000] would have the value 125,000,000,001, since that is 1 more than the cube of 5000. Similarly, you can find Bluediag of any particular argument you wish.
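A toy Python illustration may make the definition concrete. The dictionary below stands in for the infinite catalogue; the entries for #12 and #5000 match the examples in the text, and all others are simply omitted:

```python
# Stand-ins for two entries of the infinite catalogue of Blue Programs.
blueprogram = {
    12:   lambda n: 2 * n,     # Blueprogram{#12}
    5000: lambda n: n ** 3,    # Blueprogram{#5000}
}

def bluediag(n):
    # Feed meat grinder #N its own index number, then add 1 to the output.
    return 1 + blueprogram[n](n)
```

Thus bluediag(12) gives 1 + 2 x 12 = 25, and bluediag(5000) gives 125,000,000,001, as in the text.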
 
The peculiar thing about Bluediag [N] is that it is not represented in the catalogue of Blue Programs. It cannot be. The reason is this. To be a Blue Program, it would have to have an index number-say it were Blue Program #X. This assumption is expressed by writing

        Equation (2) ... Bluediag [N] = Blueprogram{#X} [N]

But there is an inconsistency between the equations (1) and (2). It becomes apparent at the moment we try to calculate the value of Bluediag [X], for we can do so by letting N take the value of X in either of the two equations. If we substitute into equation (1), we get:

        Bluediag [X] = 1 + Blueprogram{#X} [X]

But if we substitute into equation (2) instead, we get:

        Bluediag [X] = Blueprogram{#X} [X]

Now Bluediag [X] cannot be equal to a number and also to the successor of that number. But that is what the two equations say. So we will have to go back and erase some assumption on which the inconsistency is based. The only possible candidate for erasure is the assumption expressed by Equation (2): that the function Bluediag [N] is able to be coded up as a Blue BlooP program. And that is the proof that Bluediag lies outside the realm of primitive recursive functions. Thus, we have achieved our aim of destroying Achilles' cherished but naive notion that every number-theoretical function must be calculable within a predictable number of steps.
 
There are some subtle things going on here. You might ponder this, for instance: the number of steps involved in the calculation of Bluediag [N], for each specific value of N, is predictable-but the different methods of prediction cannot all be united into a general recipe for predicting the length of calculation of Bluediag [N]. This is an "infinite conspiracy", related to the Tortoise's notion of "infinite coincidences", and also to ω-incompleteness. But we shall not trace out the relations in detail.
 
Cantor's Original Diagonal Argument
 
Why is this called a diagonal argument? The terminology comes from Cantor's original diagonal argument, upon which many other arguments (such as ours) have subsequently been based. To explain Cantor's original argument will take us a little off course, but it is worthwhile to do so. Cantor, too, was concerned with showing that some item is not in a certain list. Specifically, what Cantor wanted to show was that if a "directory" of real numbers were made, it would inevitably leave some real numbers out-so that actually, the notion of a complete directory of real numbers is a contradiction in terms.
 
It must be understood that this pertains not just to directories of finite size, but also to directories of infinite size. It is a much deeper result than the statement "the number of reals is infinite, so of course they cannot be listed in a finite directory". The essence of Cantor's result is that there are (at least) two distinct types of infinity: one kind of infinity describes how many entries there can be in an infinite directory or table, and another describes how many real numbers there are (i.e., how many points there are on a line, or line segment)-and this latter is "bigger", in the sense that the real numbers cannot be squeezed into a table whose length is described by the former kind of infinity. So let us see how Cantor's argument involves the notion of diagonal, in a literal sense.
 
Let us consider just real numbers between 0 and 1. Assume, for the sake of argument, that an infinite list could be given, in which each positive integer N is matched up with a real number r(N) between 0 and 1, and in which each real number between 0 and 1 occurs somewhere down the line. Since real numbers are given by infinite decimals, we can imagine that the beginning of the table might look as follows:

        r(1): .1 4 1 5 9 2 6 5 3 ...
        r(2): .3 3 3 3 3 3 3 3 3 ...
        r(3): .7 1 8 2 8 1 8 2 8 ...
        r(4): .4 1 4 2 1 3 5 6 2 ...
        r(5): .5 0 0 0 0 0 0 0 0 ...

The digits that run down the diagonal are in boldface: 1, 3, 8, 2, 0, ... Now those diagonal digits are going to be used in making a special real number d, which is between 0 and 1 but which, we will see, is not in the list. To make d, you take the diagonal digits in order, and change each one of them to some other digit. When you prefix this sequence of digits by a decimal point, you have d. There are of course many ways of changing a digit to some other digit, and correspondingly many different d's. Suppose, for example, that we subtract 1 from the diagonal digits (with the convention that 1 taken from 0 is 9). Then our number d will be:

        .0 2 7 1 9 ...

Now, because of the way we constructed it,

        d's 1st digit is not the same as the 1st digit of r(1);
        d's 2nd digit is not the same as the 2nd digit of r(2);
        d's 3rd digit is not the same as the 3rd digit of r(3);
        ... and so on.

Hence,

        d is different from r(1);
        d is different from r(2);
        d is different from r(3);
        ... and so on.

In other words, d is not in the list.
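The construction is mechanical enough to sketch in Python; the table below holds the digit strings from the example:

```python
def diagonal_d(table):
    # Take the k-th digit of the k-th real and subtract 1 (0 becomes 9).
    digits = []
    for k in range(len(table)):
        diag = int(table[k][k])
        digits.append(str((diag - 1) % 10))
    return "." + "".join(digits)

table = ["141592653",   # r(1)
         "333333333",   # r(2)
         "718281828",   # r(3)
         "414213562",   # r(4)
         "500000000"]   # r(5)
```

Here diagonal_d(table) yields ".02719", and by construction its k-th digit differs from the k-th digit of r(k), so d cannot coincide with any entry of the list.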
 
What Does a Diagonal Argument Prove?
 
Now comes the crucial difference between Cantor's proof and our proof-it is in the matter of what assumption to go back and undo. In Cantor's argument, the shaky assumption was that such a table could be drawn up. Therefore, the conclusion warranted by the construction of d is that no exhaustive table of reals can be drawn up after all-which amounts to saying that the set of integers is just not big enough to index the set of reals. On the other hand, in our proof, we know that the directory of Blue BlooP programs can be drawn up-the set of integers is big enough to index the set of Blue BlooP programs. So, we have to go back and retract some shakier idea which we used. And that idea is that Bluediag [N] is calculable by some program in BlooP. This is a subtle difference in the application of the diagonal method.
 
It may become clearer if we apply it to the alleged "List of All Great Mathematicians" in the Dialogue-a more concrete example. The diagonal itself is "Dboups". If we perform the desired diagonal-subtraction, we will get "Cantor". Now two conclusions are possible. If you have an unshakable belief that the list is complete, then you must conclude that Cantor is not a Great Mathematician, for his name differs from all those on the list. On the other hand, if you have an unshakable belief that Cantor is a Great Mathematician, then you must conclude that the List of All Great Mathematicians is incomplete, for Cantor's name is not on the list! (Woe to those who have unshakable beliefs on both sides!) The former case corresponds to our proof that Bluediag [N] is not primitive recursive; the latter case corresponds to Cantor's proof that the list of reals is incomplete.
 
Cantor's proof uses a diagonal in the literal sense of the word. Other "diagonal" proofs are based on a more general notion, which is abstracted from the geometric sense of the word. The essence of the diagonal method is the fact of using one integer in two different ways-or, one could say, using one integer on two different levels-thanks to which one can construct an item which is outside of some predetermined list. One time, the integer serves as a vertical index, the other time as a horizontal index. In Cantor's construction this is very clear. As for the function Bluediag [N], it involves using one integer on two different levels-first, as a Blue Program index number; and second, as an input parameter.
 
The Insidious Repeatability of the Diagonal Argument
 
At first, the Cantor argument may seem less than fully convincing. Isn't there some way to get around it? Perhaps by throwing in the diagonally constructed number d, one might obtain an exhaustive list. If you consider this idea, you will see it helps not a bit to throw in the number d, for as soon as you assign it a specific place in the table, the diagonal method becomes applicable to the new table, and a new missing number d' can be constructed, which is not in the new table. No matter how many times you repeat the operation of constructing a number by the diagonal method and then throwing it in to make a "more complete" table, you still are caught on the ineradicable hook of Cantor's method. You might even try to build a table of reals which tries to outwit the Cantor diagonal method by taking the whole trick, lock, stock, and barrel, including its insidious repeatability, into account somehow. It is an interesting exercise. But if you tackle it, you will see that no matter how you twist and turn trying to avoid the Cantor "hook", you are still caught on it. One might say that any self-proclaimed "table of all reals" is hoist by its own petard.
 
The repeatability of Cantor's diagonal method is similar to the repeatability of the Tortoise's diabolic method for breaking the Crab's phonographs, one by one, as they got more and more "hi-fi" and-at least so the Crab hoped-more "Perfect". This method involves constructing, for each phonograph, a particular song which that phonograph cannot reproduce. It is not a coincidence that Cantor's trick and the Tortoise's trick share this curious repeatability; indeed, the Contracrostipunctus might well have been named "Cantorcrostipunctus" instead. Moreover, as the Tortoise subtly hinted to the innocent Achilles, the events in the Contracrostipunctus are a paraphrase of the construction which Gödel used in proving his Incompleteness Theorem; it follows that the Gödel construction is also very much like a diagonal construction. This will become quite apparent in the next two Chapters.
 
From BlooP to FlooP
 
We have now defined the class of primitive recursive functions and primitive recursive properties of natural numbers by means of programs written in the language BlooP. We have also shown that BlooP doesn't capture all the functions of natural numbers which we can define in words. We even constructed an "unBlooPable" function, Bluediag [N], by Cantor's diagonal method. What is it about BlooP that makes Bluediag unrepresentable in it? How could BlooP be improved so that Bluediag became representable?
 
BlooP's defining feature was the boundedness of its loops. What if we drop that requirement on loops, and invent a second language, called "FlooP" ('F' for "free")? FlooP will be identical to BlooP except in one respect: we may have loops without ceilings, as well as loops with ceilings (although the only reason one would include a ceiling when writing a loop-statement in FlooP would be for the sake of elegance). These new loops will be called MU-LOOPS. This follows the convention of mathematical logic, in which "free" searches (searches without bounds) are usually indicated by a symbol called a "µ-operator" (mu-operator). Thus, loop-statements in FlooP may look like this:

        MU-LOOP:
        BLOCK n: BEGIN
        BLOCK n: END
 
This feature will allow us to write tests in FlooP for such properties as wondrousness and the Tortoise property-tests which we did not know how to program in BlooP because of the potential open-endedness of the searches involved. I shall leave it to interested readers to write a FlooP test for wondrousness which does the following things:

        (1) If its input, N, is wondrous, the program halts and gives the answer YES.
        (2) If N is unwondrous, but causes a closed cycle other than 1-4-2-1-4-2-1-..., the program halts and gives the answer NO.
        (3) If N is unwondrous, and causes an "endlessly rising progression", the program never halts. This is FlooP's way of answering by not answering. FlooP's nonanswer bears a strange resemblance to Joshu's nonanswer "MU".

The irony of case 3 is that OUTPUT always has the value NO, but it is always inaccessible, since the program is still grinding away. That troublesome third alternative is the price that we must pay for the right to write free loops. In all FlooP programs incorporating the MU-LOOP option, nontermination will always be one theoretical alternative. Of course there will be many FlooP programs which actually terminate for all possible input values. For instance, as I mentioned earlier, it is suspected by most people who have studied wondrousness that a FlooP program such as suggested above will always terminate, and moreover with the answer YES each time.
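The MU-LOOP of such a test corresponds to an unbounded while-loop in Python. The sketch below handles cases (1) and (3) only; detecting other closed cycles, case (2), would need extra bookkeeping of previously seen values, which I omit:

```python
def wondrous(n):
    # A free (MU-) loop: no ceiling can be stated in advance.
    while n != 1:
        if n % 2 == 0:
            n = n // 2      # halve an even number
        else:
            n = 3 * n + 1   # triple an odd number and add one
    # Reached 1: the input is wondrous. If n were to rise endlessly,
    # this loop would simply never halt.
    return True
```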
 
Terminating and Nonterminating FlooP Programs
 
468
 
It would seem extremely desirable to be able to separate FlooP procedures into two classes: terminators and nonterminators. A terminator will eventually halt no matter what its input, despite the "MU-ness" of its loops. A nonterminator will go on and on forever, for at least one choice of input. If we could always tell, by some kind of complicated inspection of a FlooP program, to which class it belonged, there would be some remarkable repercussions (as we shall shortly see). Needless to say, the operation of class-checking would itself have to be a terminating operation-otherwise one would gain nothing.
 
Turing's Trickery

The idea springs to mind that we might let a BlooP procedure do the inspection. But BlooP procedures only accept numerical input, not programs! However, we can get around that ... by coding programs into numbers! This sly trick is just Godel-numbering in another of its many manifestations. Let the fifty-six characters of the FlooP alphabet get the "codons" 901, 902, ... , 956, respectively. So each FlooP program now gets
a very long Godel number. For instance, the shortest BlooP function (which is also a terminating FlooP program)-

DEFINE PROCEDURE "A" [B]:
BLOCK 0: BEGIN
BLOCK 0: END.

-would get the Godel number partially shown below:

904,905,906,909,914,905, .........., 905,914,904,955
 D   E   F   I   N   E               E   N   D
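The scheme is simple enough to sketch in Python. The fifty-six-character FlooP alphabet is not spelled out in the text, so the character table below is my own stand-in; only the codons for D, E, F, I, N (904, 905, 906, 909, 914) are confirmed by the example above, and they fit an alphabetical assignment starting at A = 901:

```python
# Codon-style Godel-numbering: one three-digit codon per character.
# The exact membership and order of the 56-character alphabet beyond
# A-Z is an assumption, not taken from the text.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 \"[]:=<>().,+-*/'~{}"
assert len(ALPHABET) == 56
CODON = {ch: 901 + i for i, ch in enumerate(ALPHABET)}

def godel_number(program: str) -> int:
    """Concatenate the codons of the program's characters."""
    return int("".join(str(CODON[ch]) for ch in program))

# The word DEFINE begins with codons 904,905,906,909,914,905,
# as in the text's example.
print(godel_number("DEFINE"))  # -> 904905906909914905
```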
 
 
Now our scheme would be to write a BlooP test called TERMINATOR? which says YES if its input number codes for a terminating FlooP program, NO if not. This way we could hand the task over to a machine and, with luck, distinguish terminators from nonterminators. However, an ingenious argument given by Alan Turing shows that no BlooP program can make this distinction infallibly. The trick is actually much the same as Godel's trick, and therefore closely related to the Cantor diagonal trick. We shall not give it here-suffice it to say that the idea is to feed the termination tester its own Godel number. This is not so simple, however, for it is like trying to quote an entire sentence inside itself. You have to quote the quote, and so forth; it seems to lead to an infinite regress. However, Turing figured out a trick for feeding a program its own Godel number. A solution to the same problem in a different context will be presented next Chapter. In the present Chapter, we shall take a different route to the same goal, which is namely to prove that a termination tester is impossible. For readers who wish to see an elegant and simple presentation of the Turing approach, I recommend the article by Hoare and Allison, mentioned in the Bibliography.
 
A Termination Tester Would Be Magical

Before we destroy the notion, let us delineate just why having a termination tester would be a remarkable thing. In a sense, it would be like having a magical dowsing rod which could solve all problems of number theory in one swell FlooP. Suppose, for instance, that we wished to know if the Goldbach Variation is a true conjecture or not. That is, do all numbers have the Tortoise property? We would begin by writing a FlooP test called TORTOISE? which checks whether its input has the Tortoise property. Now the defect of this procedure-namely, that it doesn't terminate if the Tortoise property is absent-here turns into a virtue! For now we run the termination tester on the procedure TORTOISE?. If it says YES, that means that TORTOISE? terminates for all values of its input-in other words, all numbers have the Tortoise property. If it says NO, then we know there exists a number which has the Achilles property. The irony is that we never actually use the program TORTOISE? at all-we just inspect it.
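In Python, such a test might look like this. (The search criterion-some m with m and n + m both prime-is spelled out later in the Chapter; the helper names are mine.)

```python
def is_prime(k):
    """A predictably terminating (BlooP-style) primality test."""
    return k >= 2 and all(k % d != 0 for d in range(2, int(k ** 0.5) + 1))

def tortoise_test(n):
    """Searches, with no ceiling, for an m such that m and n + m are
    both prime. Halts with "YES" if n has the Tortoise property;
    loops forever if n has the Achilles property."""
    m = 2
    while True:                      # a MU-LOOP: a free search
        if is_prime(m) and is_prime(n + m):
            return "YES"
        m += 1
```

The defect is visible in the code: the only way out of the loop is success, which is exactly why running a termination tester over it would settle the Goldbach Variation.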
 
 
This idea of solving any problem in number theory by coding it into a program and then waving a termination tester over the program is not unlike the idea of testing a koan for genuineness by coding it into a folded string and then running a test for Buddha-nature on the string instead. As Achilles suggested, perhaps the desired information lies "closer to the surface" in one representation than in another.
 
Pool F, Index Numbers, and Green Programs

Well, enough daydreaming. How can we prove that the termination tester is impossible? Our argument for its impossibility will hinge on trying to apply the diagonal argument to FlooP, just as we did to BlooP. We shall see that there are some subtle and crucial differences between the two cases.
 
 
As we did for BlooP, imagine the pool of all FlooP programs. We shall call it "Pool F". Then perform the same three filtering operations on Pool F, so that you get, in the end:

A complete pool of all call-less FlooP programs which calculate functions of exactly one input parameter.

Let us call these special FlooP programs Green Programs (since they may go forever).
 
 
Now just as we assigned index numbers to all Blue Programs, we can assign index numbers to Green Programs, by ordering them in a catalogue, each volume of which contains all Green Programs of a fixed length, arranged in alphabetical order.
 
 
So far, the carry-over from BlooP to FlooP has been straightforward. Now let us see if we can also carry over the last part: the diagonal trick. What if we try to define a diagonal function?

Greendiag [N] = 1 + Greenprogram{#N} [N]

Suddenly, there is a snag: this function Greendiag [N] may not have a well-defined output value for all input values N. This is simply because we have not filtered out the nonterminator programs from Pool F, and therefore we have no guarantee that we can calculate Greendiag [N] for all values of N. Sometimes we may enter calculations which never terminate. And the diagonal argument cannot be carried through in such a case, for it depends on the diagonal function having a value for all possible inputs.
 
 
The Termination Tester Gives Us Red Programs

To remedy this, we would have to make use of a termination tester, if one existed. So let us deliberately introduce the shaky assumption that one exists, and let us use it as our fourth filter. We run down the list of Green Programs, eliminating one by one all nonterminators, so that in the end we are left with:
 
 
A complete pool of all call-less FlooP programs which calculate functions of exactly one input parameter, and which terminate for all values of their input.

Let us call these special FlooP programs Red Programs (since they all must stop). Now, the diagonal argument will go through. We define

Reddiag [N] = 1 + Redprogram{#N} [N]

and in an exact parallel to Bluediag, we are forced to conclude that Reddiag [N] is a well-defined, calculable function of one variable which is not in the catalogue of Red Programs, and is hence not even calculable in the powerful language FlooP. Perhaps it is time to move on to GlooP?

GlooP ...

Yes, but what is GlooP? If FlooP is BlooP unchained, then GlooP must be FlooP unchained. But how can you take the chains off twice? How do you make a language whose power transcends that of FlooP? In Reddiag, we have found a function whose values we humans know how to calculate-the method of doing so has been explicitly described in English-but which seemingly cannot be programmed in the language FlooP. This is a serious dilemma because no one has ever found any more powerful computer language than FlooP.
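The diagonal step itself is mechanical enough to mimic in a few lines of Python, using a toy, finite stand-in for the (infinite) catalogue of Red Programs; everything here except the formula 1 + Redprogram{#N}[N] is my own scaffolding:

```python
# A toy catalogue standing in for the Red Programs: each entry is a
# total function of one argument.
catalogue = [lambda n: 0, lambda n: n, lambda n: n * n]

def reddiag(n):
    """Reddiag[N] = 1 + Redprogram{#N}[N]."""
    return 1 + catalogue[n](n)

# reddiag differs from the i-th program at input i, so it cannot
# occur anywhere in the catalogue -- the heart of the diagonal trick.
assert all(reddiag(i) != catalogue[i](i) for i in range(len(catalogue)))
```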
 
 
Careful investigation into the power of computer languages has been carried out. We need not do it ourselves; let it just be reported that there is a vast class of computer languages all of which can be proven to have exactly the same expressive power as FlooP does, in this sense: any calculation which can be programmed in any one of the languages can be programmed in them all. The curious thing is that almost any sensible attempt at designing a computer language ends up by creating a member of this class-which is to say, a language of power equal to that of FlooP. It takes some doing to invent a reasonably interesting computer language which is weaker than those in this class. BlooP is, of course, an example of a weaker language, but it is the exception rather than the rule. The point is that there are some extremely natural ways to go about inventing algorithmic languages; and different people, following independent routes, usually wind up creating equivalent languages, with the only difference being style, rather than power.
 
... Is a Myth

In fact, it is widely believed that there cannot be any more powerful language for describing calculations than languages that are equivalent to FlooP. This hypothesis was formulated in the 1930's by two people, independently of each other: Alan Turing-about whom we will say more later-and Alonzo Church, one of the eminent logicians of this century. It
is called the Church-Turing Thesis. If we accept the CT-Thesis, we have to conclude that "GlooP" is a myth-there are no restrictions to remove in FlooP, no ways to increase its power by "unshackling" it, as we did BlooP.
 
 
This puts us in the uncomfortable position of asserting that people can calculate Reddiag [N] for any value of N, but there is no way to program a computer to do so. For, if it could be done at all, it could be done in FlooP-and by construction, it can't be done in FlooP. This conclusion is so peculiar that it should cause us to investigate very carefully the pillars on which it rests. And one of them, you will recall, was our shaky assumption that there is a decision procedure which can tell terminating from nonterminating FlooP programs. The idea of such a decision procedure already seemed suspect, when we saw that its existence would allow all problems of number theory to be solved in a uniform way. Now we have double the reason for believing that any termination test is a myth-that there is no way to put FlooP programs in a centrifuge and separate out the terminators from the nonterminators.
 
 
Skeptics might maintain that this is nothing like a rigorous proof that such a termination test doesn't exist. That is a valid objection; however, the Turing approach demonstrates more rigorously that no computer program can be written in a language of the FlooP class which can perform a termination test on all FlooP programs.
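For the curious, the flavor of that Turing-style self-application can be conveyed in a few lines of Python. This sketch is mine, not the text's; terminator stands for the alleged infallible tester:

```python
def trouble(terminator):
    """Does the opposite of whatever the alleged tester predicts
    about trouble itself."""
    if terminator(trouble):
        while True:       # tester said "trouble halts" -- so loop!
            pass
    return "halted"       # tester said "trouble loops" -- so halt!

# Whatever answer a supposed terminator gives about trouble, trouble
# contradicts it; e.g. a tester that always answers NO ("it loops"):
assert trouble(lambda program: False) == "halted"   # the tester was wrong
```

Since every candidate tester is wrong about this one cunningly built program, no infallible termination tester can exist.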
 
 
The Church-Turing Thesis

Let us come back briefly to the Church-Turing Thesis. We will talk about it-and variations on it-in considerable detail in Chapter XVII; for now it will suffice to state it in a couple of versions, and postpone discussion of its merits and meanings until then. Here, then, are three related ways to state the CT-Thesis:

(1) What is human-computable is machine-computable.
(2) What is machine-computable is FlooP-computable.
(3) What is human-computable is FlooP-computable (i.e., general or partial recursive).
 
 
Terminology: General and Partial Recursive

We have made a rather broad survey, in this Chapter, of some notions from number theory and their relations to the theory of computable functions. It is a very wide and flourishing field, an intriguing blend of computer science and modern mathematics. We should not conclude this Chapter without introducing the standard terminology for the notions we have been dealing with.
 
 
As has already been mentioned, "BlooP-computable" is synonymous with "primitive recursive". Now FlooP-computable functions can be divided into two realms: (1) those which are computable by terminating FlooP programs: these are said to be general recursive; and (2) those which are computable only by nonterminating FlooP programs: these are said to be partial recursive. (Similarly for predicates.) People often just say "recursive" when they mean "general recursive".
 
 
The Power of TNT

It is interesting that TNT is so powerful that not only are all primitive recursive predicates represented, but moreover all general recursive predicates are represented. We shall not prove either of these facts, because such proofs would be superfluous to our aim, which is to show that TNT is incomplete. If TNT could not represent some primitive or general recursive predicates, then it would be incomplete in an uninteresting way-so we might as well assume that it can, and then show that it is incomplete in an interesting way.
 
CHAPTER XIV

On Formally Undecidable
Propositions of TNT
and Related Systems

The Two Ideas of the "Oyster"

THIS CHAPTER'S TITLE is an adaptation of the title of Godel's famous 1931 paper-"TNT" having been substituted for "Principia Mathematica". Godel's paper was a technical one, concentrating on making his proof watertight and rigorous; this Chapter will be more intuitive, and in it I will stress the two key ideas which are at the core of the proof. The first key idea is the deep discovery that there are strings of TNT which can be interpreted as speaking about other strings of TNT; in short, that TNT, as a language, is capable of "introspection", or self-scrutiny. This is what comes from Godel-numbering. The second key idea is that the property of self-scrutiny can be entirely concentrated into a single string; thus that string's sole focus of attention is itself. This "focusing trick" is traceable, in essence, to the Cantor diagonal method.
 
 
In my opinion, if one is interested in understanding Godel's proof in a deep way, then one must recognize that the proof, in its essence, consists of a fusion of these two main ideas. Each of them alone is a master stroke; to put them together took an act of genius. If I were to choose, however, which of the two key ideas is deeper, I would unhesitatingly pick the first one-the idea of Godel-numbering, for that idea is related to the whole notion of what meaning and reference are, in symbol-manipulating systems. This is an idea which goes far beyond the confines of mathematical logic, whereas the Cantor trick, rich though it is in mathematical consequences, has little if any relation to issues in real life.
 
 
The First Idea: Proof-Pairs

Without further ado, then, let us proceed to the elaboration of the proof itself. We have already given a fairly careful notion of what the Godel isomorphism is about, in Chapter IX. We now shall describe a mathematical notion which allows us to translate a statement such as "The string 0=0 is a theorem of TNT" into a statement of number theory. This will involve the notion of proof-pairs. A proof-pair is a pair of natural numbers related in a particular way. Here is the idea:
 
 
Two natural numbers, m and n respectively, form a TNT-proof-pair if and only if m is the Godel number of a TNT-derivation whose bottom line is the string with Godel number n.

The analogous notion exists with respect to the MIU-system, and it is a little easier on the intuition to consider that case first. So, for a moment, let us back off from TNT-proof-pairs, and look at MIU-proof-pairs. Their definition is parallel:

Two natural numbers, m and n respectively, form a MIU-proof-pair if and only if m is the Godel number of a MIU-system derivation whose bottom line is the string with Godel number n.

Let us see a couple of examples involving MIU-proof-pairs. First, let m = 3131131111301, n = 301. These values of m and n do indeed form a MIU-proof-pair, because m is the Godel number of the MIU-derivation

MI
MII
MIIII
MUI

whose last line is MUI, having Godel number 301, which is n. By contrast, let m = 31311311130, and n = 30. Why do these two values not form a MIU-proof-pair? To see the answer, let us write out the alleged derivation which m codes for:

MI
MII
MIII
MU

There is an invalid step in this alleged derivation! It is the step from the second to the third line: from MII to MIII. There is no rule of inference in the MIU-system which permits such a typographical step. Correspondingly-and this is most crucial-there is no arithmetical rule of inference which carries you from 311 to 3111. This is perhaps a trivial observation, in light of our discussion in Chapter IX, yet it is at the heart of the Godel isomorphism. What we do in any formal system has its parallel in arithmetical manipulation.
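The MIU test is mechanical enough to write out in full. Here is a Python sketch (the splitting convention-each line of a derivation begins with the codon '3' for M-comes from Chapter IX; the function names are my own):

```python
def miu_proof_pair(m, n):
    """A predictably terminating test: does m code a valid
    MIU-derivation whose bottom line has Godel number n?
    Codes: M = 3, I = 1, U = 0."""
    s = str(m)
    lines, cur = [], ""
    for d in s:                      # split at each '3' (a new M)
        if d == "3" and cur:
            lines.append(cur)
            cur = ""
        cur += d
    lines.append(cur)
    strings = [ln.replace("3", "M").replace("1", "I").replace("0", "U")
               for ln in lines]
    if strings[0] != "MI":           # the one axiom
        return False
    for prev, new in zip(strings, strings[1:]):
        if not follows(prev, new):
            return False             # an invalid typographical step
    return lines[-1] == str(n)       # bottom line must code for n

def follows(x, y):
    """Does y follow from x by one of the four MIU rules?"""
    results = set()
    if x.endswith("I"):
        results.add(x + "U")                       # rule 1: xI -> xIU
    results.add("M" + x[1:] * 2)                   # rule 2: Mx -> Mxx
    for i in range(len(x) - 2):                    # rule 3: III -> U
        if x[i:i + 3] == "III":
            results.add(x[:i] + "U" + x[i + 3:])
    for i in range(len(x) - 1):                    # rule 4: drop UU
        if x[i:i + 2] == "UU":
            results.add(x[:i] + x[i + 2:])
    return y in results
```

Run on the two examples above, it says YES to (3131131111301, 301) and NO to (31311311130, 30), for exactly the reason given: MIII does not follow from MII.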
 
 
In any case, the values m = 31311311130, n = 30 certainly do not form a MIU-proof-pair. This in itself does not imply that 30 is not a MIU-number. There could be another value of m which forms a MIU-proof-pair with 30. (Actually, we know by earlier reasoning that MU is not a MIU-theorem, and therefore no number at all can form a MIU-proof-pair with 30.)
 
 
Now what about TNT-proof-pairs? Here are two parallel examples, one being merely an alleged TNT-proof-pair, the other being a valid TNT-proof-pair. Can you spot which is which? (Incidentally, here is where
the '611' codon comes in. Its purpose is to separate the Godel numbers of successive lines in a TNT-derivation. In that sense, '611' serves as a punctuation mark. In the MIU-system, the initial '3' of all lines is sufficient-no extra punctuation is needed.)

(1) m = 626,262,636,223,123,262,111,666,611,223,123,666,111,666
    n = 123,666,111,666

(2) m = 626,262,636,223,123,262,111,666,611,223,333,262,636,123,262,111,666
    n = 223,333,262,636,123,262,111,666

It is quite simple to tell which one is which, simply by translating back to the old notation, and making some routine examinations to see

(1) whether the alleged derivation coded for by m is actually a legitimate derivation;
(2) if so, whether the last line of the derivation coincides with the string which n codes for.

Step 2 is trivial; and step 1 is also utterly straightforward, in this sense: there are no open-ended searches involved, no hidden endless loops. Think of the examples above involving the MIU-system, and now just mentally substitute the rules of TNT for the MIU-system's rules, and the axioms of TNT for the MIU-system's one axiom. The algorithm in both cases is the same. Let me make that algorithm explicit:

Go down the lines in the derivation one by one.
Mark those which are axioms.
For each line which is not an axiom, check whether it follows by any of the rules of inference from earlier lines in the alleged derivation.
If all nonaxioms follow by rules of inference from earlier lines, then you have a legitimate derivation; otherwise it is a phony derivation.

At each stage, there is a clear set of tasks to perform, and the number of them is quite easily determinable in advance.
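The algorithm just stated translates almost word for word into Python. The two parameter functions are assumptions, to be filled in with the axioms and rules of whichever system is at hand:

```python
def legitimate_derivation(lines, is_axiom, follows_from):
    """Go down the lines one by one; mark the axioms; every other
    line must follow, by some rule of inference, from earlier lines.
    The number of checks is determinable in advance: no open-ended
    searches, no hidden endless loops."""
    for i, line in enumerate(lines):
        if is_axiom(line):
            continue
        if not follows_from(line, lines[:i]):
            return False              # a phony derivation
    return True
```

For the MIU-system, is_axiom would recognize exactly MI, and follows_from would try the four typographical rules; for TNT, the axioms and rules of TNT go in instead, but the algorithm is the same.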
 
Proof-Pair-ness Is Primitive Recursive

The reason I am stressing the boundedness of these loops is, as you may have sensed, that I am about to assert

FUNDAMENTAL FACT 1: The property of being a proof-pair is a primitive recursive number-theoretical property, and can therefore be tested for by a BlooP program.

There is a notable contrast to be made here with that other closely related number-theoretical property: that of being a theorem-number. To assert that n is a theorem-number is to assert that some value of m exists which forms a proof-pair with n. (Incidentally, these comments apply equally well to TNT and to the MIU-system; it may perhaps help to keep both in mind, the MIU-system serving as a prototype.) To check whether n is a theorem-number, you must embark on a search through all its potential proof-pair "partners" m-and here you may be getting into an endless chase. No one can say how far you will have to look to find a number which forms a proof-pair with n as its second element. That is the whole problem of having lengthening and shortening rules in the same system: they lead to a certain degree of unpredictability.
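The contrast can be put in code. Below, proof_pair stands for some predictably terminating (BlooP-style) test of the sort Fundamental Fact 1 guarantees, while the search wrapped around it is a free MU-LOOP (the names are my own):

```python
def is_theorem_number(n, proof_pair):
    """Searches all potential proof-pair partners m of n, with no
    ceiling. Halts (with True) just in case n really is a
    theorem-number; otherwise the chase goes on forever."""
    m = 0
    while True:
        if proof_pair(m, n):
            return True
        m += 1
```

With a toy stand-in for the bounded test, say proof_pair = lambda m, n: m == 2 * n, the search halts after reaching m = 6 for n = 3; with the real proof-pair test, nothing bounds how far m may have to climb.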
 
 
The example of the Goldbach Variation may prove helpful at this point. It is trivial to test whether a pair of numbers (m,n) form a Tortoise-pair: that is to say, both m and n + m should be prime. The test is easy because the property of primeness is primitive recursive: it admits of a predictably terminating test. But if we want to know whether n possesses the Tortoise property, then we are asking, "Does any number m form a Tortoise-pair with n as its second element?"-and this, once again, leads us out into the wild, MU-loopy unknown.
 
 
. . . And Is Therefore Represented in TNT

The key concept at this juncture, then, is Fundamental Fact 1 given above, for from it we can conclude

FUNDAMENTAL FACT 2: The property of forming a proof-pair is testable in BlooP, and consequently, it is represented in TNT by some formula having two free variables.

Once again, we are being casual about specifying which system these proof-pairs are relative to; it really doesn't matter, for both Fundamental Facts hold for any formal system. That is the nature of formal systems: it is always possible to tell, in a predictably terminating way, whether a given sequence of lines forms a proof, or not-and this carries over to the corresponding arithmetical notions.
 
 
The Power of Proof-Pairs

Suppose we assume we are dealing with the MIU-system, for the sake of concreteness. You probably recall the string we called "MUMON", whose interpretation on one level was the statement "MU is a theorem of the MIU-system". We can show how MUMON would be expressed in TNT, in terms of the formula which represents the notion of MIU-proof-pairs. Let us abbreviate that formula, whose existence we are assured of by Fundamental Fact 2, this way:

MIU-PROOF-PAIR{a,a'}
 
 
Since it is a property of two numbers, it is represented by a formula with two free variables. (Note: In this Chapter we shall always use austere TNT-so be careful to distinguish between the variables a, a', a''.) In order to assert "MU is a theorem of the MIU-system", we would have to make the isomorphic statement "30 is a theorem-number of the MIU-system", and then translate that into TNT-notation. With the aid of our abbreviation, this is easy (remember also from Chapter VIII that to indicate the replacement of every a' by a numeral, we write that numeral followed by "/a'"):

∃a:MIU-PROOF-PAIR{a,SSSSSSSSSSSSSSSSSSSSSSSSSSSSSS0/a'}

Count the S's: there are 30. Note that this is a closed sentence of TNT, because one free variable was quantified, the other replaced by a numeral. A clever thing has been done here, by the way. Fundamental Fact 2 gave us a way to talk about proof-pairs; we have figured out how to talk about theorem-numbers, as well: you just add an existential quantifier in front! A more literal translation of the string above would be, "There exists some number a that forms a MIU-proof-pair with 30 as its second element."
 
 
Suppose that we wanted to do something parallel with respect to TNT-say, to express the statement "0=0 is a theorem of TNT". We may abbreviate the formula which Fundamental Fact 2 assures us exists, in an analogous way (with two free variables, again):

TNT-PROOF-PAIR{a,a'}

(The interpretation of this abbreviated TNT-formula is: "Natural numbers a and a' form a TNT-proof-pair.") The next step is to transform our statement into number theory, following the MUMON-model above. The statement becomes "There exists some number a which forms a TNT-proof-pair with 666,111,666 as its second element". The TNT-formula which expresses this is:

∃a:TNT-PROOF-PAIR{a,SSSSS.........SSSSS0/a'}

(with many, many S's-in fact, 666,111,666 of them)-a closed sentence of TNT. (Let us call it "JOSHU", for reasons to appear momentarily.) So you see that there is a way to talk not only about the primitive recursive notion of TNT-proof-pairs, but also about the related but trickier notion of TNT-theorem-numbers.
 
 
To check your comprehension of these ideas, figure out how to translate into TNT the following statements of meta-TNT:

(1) 0=0 is not a theorem of TNT.
(2) ~0=0 is a theorem of TNT.
(3) ~0=0 is not a theorem of TNT.
 
 
How do the solutions differ from the example done above, and from each other? Here are a few more translation exercises.

(4) JOSHU is a theorem of TNT. (Call the TNT-string which expresses this "META-JOSHU".)
(5) META-JOSHU is a theorem of TNT. (Call the TNT-string which expresses this "META-META-JOSHU".)
(6) META-META-JOSHU is a theorem of TNT.
(7) META-META-META-JOSHU is a theorem of TNT.
(etc., etc.)

Example 5 shows that statements of meta-meta-TNT can be translated into TNT-notation; example 6 does the same for meta-meta-meta-TNT, etc.
 
 
It is important to keep in mind the difference between expressing a property, and representing it, at this point. The property of being a TNT-theorem-number, for instance, is expressed by the formula

∃a:TNT-PROOF-PAIR{a,a'}

Translation: "a' is a TNT-theorem-number". However, we have no guarantee that this formula represents the notion, for we have no guarantee that this property is primitive recursive-in fact, we have more than a sneaking suspicion that it isn't. (This suspicion is well warranted. The property of being a TNT-theorem-number is not primitive recursive, and no TNT-formula can represent the property!) By contrast, the property of being a proof-pair, by virtue of its primitive recursivity, is both expressible and representable, by the formula already introduced.
 
 
Substitution Leads to the Second Idea
 
 
The preceding discussion got us to the point where we saw how TNT can "introspect" on the notion of TNT-theoremhood. This is the essence of the first part of the proof. We now wish to press on to the second major idea of the proof, by developing a notion which allows the concentration of this introspection into a single formula. To do this, we need to look at what happens to the Godel number of a formula when you modify the formula structurally in a simple way. In fact, we shall consider this specific modification: replacement of all free variables by a specific numeral. Below are shown a couple of examples of this operation in the left-hand column, and in the right-hand column are exhibited the parallel changes in Godel numbers.
 
 
Formula                          Godel number

a=a                              262,111,262

We now replace all free variables by the numeral for 2:

SS0=SS0                          123,123,666,111,123,123,666

* * * * *

~∃a:∃a':a''=(SSa·SSa')           223,333,262,636,333,262,163,636,
                                 262,163,163,111,362,123,123,262,
                                 236,123,123,262,163,323

We now replace all free variables by the numeral for 4:

~∃a:∃a':SSSS0=(SSa·SSa')         223,333,262,636,333,262,163,636,
                                 123,123,123,123,666,111,362,123,
                                 123,262,236,123,123,262,163,323
 
 
An isomorphic arithmetical process is going on in the right-hand column, in which one huge number is turned into an even huger number. The function which makes the new number from the old one would not be too difficult to describe arithmetically, in terms of additions, multiplications, powers of 10 and so on-but we need not do so. The main point is this: that the relation among (1) the original Godel number, (2) the number whose numeral is inserted, and (3) the resulting Godel number, is a primitive recursive relation. That is to say, a BlooP test could be written which, when fed as input any three natural numbers, says YES if they are related in this way, and NO if they aren't. You may test yourself on your ability to perform such a test-and at the same time convince yourself that there are no hidden open-ended loops to the process-by checking the following two sets of three numbers:

(1) 362,262,112,262,163,323,111,123,123,123,123,666;
    2;
    362,123,123,666,112,123,123,666,323,111,123,123,123,123,666.

(2) 223,362,262,236,262,323,111,262,163;
    1;
    223,362,123,666,236,123,666,323,111,262,163.

As usual, one of the examples checks, the other does not. Now this relationship between three numbers will be called the substitution relationship. Because it is primitive recursive, it is represented by some formula of TNT having three free variables. Let us abbreviate that TNT-formula by the following notation:

SUB{a,a',a''}
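A simplified version of that test can be sketched in Python. To keep it short, formulas here are strings whose only variable is a single free 'a'; real TNT would need a', a'', care with bound occurrences, and a codon-by-codon parse rather than a raw digit-string replace, so this is an illustration of the scheme, not a full implementation:

```python
# Codons as used in the examples above.
CODON = {'(': '362', ')': '323', 'a': '262', '+': '112',
         '=': '111', 'S': '123', '0': '666'}

def godel(formula):
    return int(''.join(CODON[c] for c in formula))

def substitution(g_old, k, g_new):
    """YES iff g_new comes from g_old by putting the numeral for k
    (k S's followed by 0) in place of the free variable."""
    numeral = '123' * k + '666'                  # SS...S0, k S's
    return str(g_new) == str(g_old).replace('262', numeral)
```

On the first triple given above (collapsing both variables to plain a for this sketch), substitution(godel("(a+a)=SSSS0"), 2, godel("(SS0+SS0)=SSSS0")) says YES-and there is no hidden open-ended loop anywhere in the check.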
 
 
Because this formula represents the substitution relationship, the formula shown below must be a TNT-theorem:

SUB{SSSSS.....SSSSS0/a, SS0/a', SSSSS.....SSSSS0/a''}

(with 262,111,262 S's in the first numeral, and 123,123,666,111,123,123,666 S's in the third). (This is based on the first example of the substitution relation shown in the parallel columns earlier in this section.) And again because the SUB formula represents the substitution relation, the formula shown below certainly is not a TNT-theorem:

SUB{SSS0/a,SS0/a',S0/a''}
 
 
Arithmoquining

We now have reached the crucial point where we can combine all of our disassembled parts into one meaningful whole. We want to use the machinery of the TNT-PROOF-PAIR and SUB formulas in some way to construct a single sentence of TNT whose interpretation is: "This very string of TNT is not a TNT-theorem." How do we do it? Even at this point, with all the necessary machinery in front of us, the answer is not easy to find.
 
 
A curious and perhaps frivolous-seeming notion is that of substituting a formula's own Godel number into itself. This is quite parallel to that other curious, and perhaps frivolous-seeming, notion of "quining" in the Air on G's String. Yet quining turned out to have a funny kind of importance, in that it showed a new way of making a self-referential sentence. Self-reference of the Quine variety sneaks up on you from behind the first time you see it-but once you understand the principle, you appreciate that it is quite simple and lovely. The arithmetical version of quining-let's call it arithmoquining-will allow us to make a TNT-sentence which is "about itself".
 
 
Let us see an example of arithmoquining. We need a formula with at least one free variable. The following one will do:

a=S0

This formula's Godel number is 262,111,123,666, and we will stick this number into the formula itself-or rather, we will stick its numeral in. Here is the result:

SSSSS.....SSSSS0=S0
(with 262,111,123,666 S's)

This new formula asserts a silly falsity-that 262,111,123,666 equals 1. If we had begun with the string ~a=S0 and then arithmoquined, we would have come up with a true statement-as you can see for yourself.
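In Python, arithmoquining is a one-liner on top of the codon table (a fragment of the table suffices here; note that actually carrying out the replacement for a=S0 would build a string with 262,111,123,666 S's in it, so the sketch only ever computes the Godel number):

```python
CODON = {'a': '262', '=': '111', 'S': '123', '0': '666', '~': '223'}

def godel(formula):
    return ''.join(CODON[c] for c in formula)

def arithmoquine(formula):
    """Replace the free variable by the numeral for the formula's
    own Godel number: that many S's, then 0. (Do not actually call
    this on a=S0 unless you have room for 262,111,123,666 S's.)"""
    n = int(godel(formula))
    return formula.replace('a', 'S' * n + '0')

print(godel("a=S0"))     # -> 262111123666, as in the text
```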
 
 
When you arithmoquine, you are of course performing a special case
of the substitution operation we defined earlier. If we wanted to speak about arithmoquining inside TNT, we would use the formula

SUB{a'',a'',a'}

where the first two variables are the same. This comes from the fact that we are using a single number in two different ways (shades of the Cantor diagonal method!). The number a'' is both (1) the original Godel number, and (2) the insertion-number. Let us invent an abbreviation for the above formula:

ARITHMOQUINE{a'',a'}

What the above formula says, in English, is:

a' is the Godel number of the formula gotten by arithmoquining the formula with Godel number a''.

Now the preceding sentence is long and ugly. Let's introduce a concise and elegant term to summarize it. We'll say

a' is the arithmoquinification of a''

to mean the same thing. For instance, the arithmoquinification of 262,111,123,666 is this unutterably gigantic number:

123,123,123,.....,123,123,666,111,123,666
(262,111,123,666 copies of '123')

(This is just the Godel number of the formula we got when we arithmoquined a=S0.) We can speak quite easily about arithmoquining inside TNT.
 
The Last Straw
 
Now if you look back in the Air on G's String, you will see that the ultimate trick necessary for achieving self-reference in Quine's way is to quine a sentence which itself talks about the concept of quining. It's not enough just to quine-you must quine a quine-mentioning sentence! All right, then-the parallel trick in our case must be to arithmoquine some formula which itself is talking about the notion of arithmoquining!
 
Without further ado, we'll now write that formula down, and call it G's uncle:

~∃a:∃a':<TNT-PROOF-PAIR{a,a'}∧ARITHMOQUINE{a'',a'}>

You can see explicitly how arithmoquinification is thickly involved in the plot. Now this "uncle" has a Gödel number, of course, which we'll call 'u'.
 
The head and tail of u's decimal expansion, and even a teeny bit of its midsection, can be read off directly:

u = 223,333,262,636,333,262,163,636,212,...,161,...,213

For the rest, we'd have to know just how the formulas TNT-PROOF-PAIR and ARITHMOQUINE actually look when written out. That is too complex, and it is quite beside the point, in any case.
 
Now all we need to do is-arithmoquine this very uncle! What this entails is "booting out" all free variables-of which there is only one, namely a''-and putting in the numeral for u everywhere. This gives us:

~∃a:∃a':<TNT-PROOF-PAIR{a,a'}∧ARITHMOQUINE{SSS...SSS0/a'',a'}>
(u S's)

And this, believe it or not, is Gödel's string, which we can call 'G'. Now there are two questions we must answer without delay. They are:

(1) What is G's Gödel number?
(2) What is the interpretation of G?

Question 1 first. How did we make G? Well, we began with the uncle, and arithmoquined it. So, by the definition of arithmoquinification, G's Gödel number is:

the arithmoquinification of u.

Now question 2. We will translate G into English in stages, getting gradually more comprehensible as we go along. For our first rough try, we make a pretty literal translation:

"There do not exist numbers a and a' such that both (1) they form a TNT-proof-pair, and (2) a' is the arithmoquinification of u."

Now certainly there is a number a' which is the arithmoquinification of u-so the problem must lie with the other number, a. This observation allows us to rephrase the translation of G as follows:

"There is no number a that forms a TNT-proof-pair with the arithmoquinification of u."

(This step, which can be confusing, is explained below in more detail.) Do you see what is happening? G is saying this:

"The formula whose Gödel number is the arithmoquinification of u is not a theorem of TNT."

But-and this should come as no surprise by now-that formula is none other than G itself; whence we can make the ultimate translation of G as

"G is not a theorem of TNT."

-or if you prefer,

"I am not a theorem of TNT."

We have gradually pulled a high-level interpretation-a sentence of meta-TNT-out of what was originally a low-level interpretation-a sentence of number theory.
 
TNT Says "Uncle!"
 
The main consequence of this amazing construction has already been delineated in Chapter IX: it is the incompleteness of TNT. To reiterate the argument:

    Is G a TNT-theorem? If so, then it must assert a truth. But what in fact does G assert? Its own nontheoremhood. Thus from its theoremhood would follow its nontheoremhood: a contradiction. Now what about G being a nontheorem? This is acceptable, in that it doesn't lead to a contradiction. But G's nontheoremhood is what G asserts-hence G asserts a truth. And since G is not a theorem, there exists (at least) one truth which is not a theorem of TNT.
 
Now to explain that one tricky step again. I will use another similar example. Take this string:

~∃a:∃a':<TORTOISE-PAIR{a,a'}∧TENTH-POWER{SS0/a'',a'}>

where the two abbreviations are for strings of TNT which you can write down yourself. TENTH-POWER{a'',a'} represents the statement "a' is the tenth power of a''". The literal translation into English is then:

"There do not exist numbers a and a' such that both (1) they form a Tortoise-pair, and (2) a' is the tenth power of 2."

But clearly, there is a tenth power of 2-namely 1024. Therefore, what the string is really saying is that

"There is no number a that forms a Tortoise-pair with 1024"

which can be further boiled down to:

"1024 does not have the Tortoise property."

The point is that we have achieved a way of substituting a description of a number, rather than its numeral, into a predicate. It depends on using one extra quantified variable (a'). Here, it was the number 1024 that was described as "the tenth power of 2"; above, it was the number described as "the arithmoquinification of u".
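The "description instead of numeral" trick can be sketched directly. The Tortoise property is left unspecified in the text, so a hypothetical stand-in predicate is used here; the only claim being checked is that the two readings-numeral form and description form-coincide:

```python
# Sketch: substituting a description of a number, via an extra quantified
# variable, instead of its numeral. The predicate below is a hypothetical
# stand-in (the real Tortoise property is not specified in the text).
def tortoise_pair(a, b):
    return (a + b) % 7 == 0   # placeholder predicate, for illustration only

N = 2000  # finite search bound for the sketch

# Numeral form: "there is no a that forms a Tortoise-pair with 1024."
numeral_form = not any(tortoise_pair(a, 1024) for a in range(N))

# Description form: "there do not exist a, a' such that they form a
# Tortoise-pair and a' is the tenth power of 2" -- the extra variable a'
# carries the description 2**10 instead of the literal numeral.
description_form = not any(
    tortoise_pair(a, ap) for a in range(N) for ap in [2 ** 10]
)

assert 2 ** 10 == 1024
assert numeral_form == description_form  # the two readings agree
```

Whatever predicate is plugged in, the existentially quantified a' pins down the same number that the numeral would have named.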
 
"Yields Nontheoremhood When Arithmoquined"
 
Let us pause for breath for a moment, and review what has been done. The best way I know to give some perspective is to set out explicitly how it compares with the version of the Epimenides paradox due to Quine. Here is a map:

falsehood ⇐⇒ nontheoremhood

quotation of a phrase ⇐⇒ Gödel number of a string

preceding a predicate by a subject ⇐⇒ substituting a numeral (or definite term) into an open formula

preceding a predicate by a quoted phrase ⇐⇒ substituting the Gödel number of a string into an open formula

preceding a predicate by itself, in quotes ("quining") ⇐⇒ substituting the Gödel number of an open formula into the formula itself ("arithmoquining")

yields falsehood when quined (a predicate without a subject) ⇐⇒ the "uncle" of G (an open formula of TNT)

"yields falsehood when quined" (the above predicate, quoted) ⇐⇒ the number u (the Gödel number of the above open formula)

"yields falsehood when quined" yields falsehood when quined (complete sentence formed by quining the above predicate) ⇐⇒ G itself (sentence of TNT formed by substituting u into the uncle, i.e., arithmoquining the uncle)
 
Gödel's Second Theorem
 
Since G's interpretation is true, the interpretation of its negation ~G is false. And we know that no false statements are derivable in TNT. Thus neither G nor its negation ~G can be a theorem of TNT. We have found a "hole" in our system-an undecidable proposition. This has a number of ramifications. Here is one curious fact which follows from G's undecidability: although neither G nor ~G is a theorem, the formula <G∨~G> is a theorem, since the rules of the Propositional Calculus ensure that all well-formed formulas of the form <P∨~P> are theorems.
 
This is one simple example where an assertion inside the system and an assertion about the system seem at odds with each other. It makes one wonder if the system really reflects itself accurately. Does the "reflected metamathematics" which exists inside TNT correspond well to the metamathematics which we do? This was one of the questions which intrigued Gödel when he wrote his paper. In particular, he was interested in whether it was possible, in the "reflected metamathematics", to prove TNT's consistency. Recall that this was a great philosophical dilemma of the day: how to prove a system consistent. Gödel found a simple way to express the statement "TNT is consistent" in a TNT formula; and then he showed that this formula (and all others which express the same idea) are only theorems of TNT under one condition: that TNT is inconsistent. This perverse result was a severe blow to optimists who expected that one could find a rigorous proof that mathematics is contradiction-free.
 
How do you express the statement "TNT is consistent" inside TNT? It hinges on this simple fact: that inconsistency means that two formulas, x and ~x, one the negation of the other, are both theorems. But if both x and ~x are theorems, then according to the Propositional Calculus, all well-formed formulas are theorems. Thus, to show TNT's consistency, it would suffice to exhibit one single sentence of TNT which can be proven to be a nontheorem. Therefore, one way to express "TNT is consistent" is to say "The formula ~0=0 is not a theorem of TNT". This was already proposed as an exercise a few pages back. The translation is:

~∃a:TNT-PROOF-PAIR{a,SSSSS.....SSSSS0/a'}
(223,666,111,666 S's)

It can be shown, by lengthy but fairly straightforward reasoning, that-as long as TNT is consistent-this oath-of-consistency by TNT is not a theorem of TNT. So TNT's powers of introspection are great when it comes to expressing things, but fairly weak when it comes to proving them. This is quite a provocative result, if one applies it metaphorically to the human problem of self-knowledge.
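The count of S's quoted in the formula can be checked against the codons used earlier (adding ~→223, a codon implied by the first three digits of u's decimal expansion):

```python
# Check that the quoted count of S's, 223,666,111,666, is the Godel number
# of ~0=0 under the codons implied by the text (~ -> 223, 0 -> 666, = -> 111).
CODON = {'~': '223', '0': '666', '=': '111'}
g = int(''.join(CODON[c] for c in '~0=0'))
assert g == 223_666_111_666
```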
 
TNT Is ω-Incomplete
 
TNT's incompleteness is of the "omega" variety-defined in Chapter VIII. This means that there is some infinite pyramidal family of strings all of which are theorems, but whose associated "summarizing string" is a nontheorem. It is easy to exhibit the summarizing string which is a nontheorem:

∀a:~∃a':<TNT-PROOF-PAIR{a,a'}∧ARITHMOQUINE{SSS...SSS0/a'',a'}>
(u S's)

To understand why this string is a nontheorem, notice that it is extremely similar to G itself-in fact, G can be made from it in one step (viz., according to TNT's Rule of Interchange). Therefore, if it were a theorem, so would G be. But since G isn't a theorem, neither can this be.
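That one-step relation can be sketched at the level of symbol-strings (with ASCII stand-ins V, E, ^ for ∀, ∃, ∧; the rule lets a leading '∀a:~' trade places with '~∃a:'):

```python
# Sketch: TNT's Rule of Interchange, applied once to the summarizing string,
# yields G. ASCII stand-ins: V = for-all, E = there-exists, ^ = and.
summarizer = "Va:~Ea':<TNT-PROOF-PAIR{a,a'}^ARITHMOQUINE{SSS...SSS0/a'',a'}>"
G = summarizer.replace("Va:~", "~Ea:", 1)
assert G == "~Ea:Ea':<TNT-PROOF-PAIR{a,a'}^ARITHMOQUINE{SSS...SSS0/a'',a'}>"
```

So theoremhood of the summarizer would carry over to G in a single derivation step, which is exactly why the summarizer cannot be a theorem.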
 
Now we want to show that all of the strings in the related pyramidal family are theorems. We can write them down easily enough:
 
~∃a':<TNT-PROOF-PAIR{0/a,a'}∧ARITHMOQUINE{SSS...SSS0/a'',a'}>
~∃a':<TNT-PROOF-PAIR{S0/a,a'}∧ARITHMOQUINE{SSS...SSS0/a'',a'}>
~∃a':<TNT-PROOF-PAIR{SS0/a,a'}∧ARITHMOQUINE{SSS...SSS0/a'',a'}>
~∃a':<TNT-PROOF-PAIR{SSS0/a,a'}∧ARITHMOQUINE{SSS...SSS0/a'',a'}>
(in each, u S's)

What does each one assert? Their translations, one by one, are:

"0 and the arithmoquinification of u do not form a TNT-proof-pair."
"1 and the arithmoquinification of u do not form a TNT-proof-pair."
"2 and the arithmoquinification of u do not form a TNT-proof-pair."
"3 and the arithmoquinification of u do not form a TNT-proof-pair."

Now each of these assertions is about whether two specific integers form a proof-pair or not. (By contrast, G itself is about whether one specific integer is a theorem-number or not.) Now because G is a nontheorem, no integer forms a proof-pair with G's Gödel number. Therefore, each of the statements of the family is true. Now the crux of the matter is that the property of being a proof-pair is primitive recursive, hence represented, so that each of the statements in the list above, being true, must translate into a theorem of TNT-which means that everything in our infinite pyramidal family is a theorem. And that shows why TNT is ω-incomplete.
 
Two Different Ways to Plug Up the Hole
 
Since G's interpretation is true, the interpretation of its negation ~G is false. And, using the assumption that TNT is consistent, we know that no false statements are derivable in TNT. Thus neither G nor its negation ~G is a theorem of TNT. We have found a hole in our system-an undecidable proposition. Now this need be no source of alarm, if we are philosophically detached enough to recognize what this is a symptom of. It signifies that TNT can be extended, just as absolute geometry could be. In fact, TNT can be extended in two distinct directions, just as absolute geometry could be. It can be extended in a standard direction-which corresponds to extending absolute geometry in the Euclidean direction; or, it can be extended in a nonstandard direction-which corresponds, of course, to extending absolute geometry in the non-Euclidean direction. Now the standard type of extension would involve adding G as a new axiom.
 
This suggestion seems rather innocuous and perhaps even desirable, since, after all, G asserts something true about the natural number system. But what about the nonstandard type of extension? If it is at all parallel to the case of the parallel postulate, it must involve adding the negation of G as a new axiom. But how can we even contemplate doing such a repugnant, hideous thing? After all, to paraphrase the memorable words of Girolamo Saccheri, isn't what ~G says "repugnant to the nature of the natural numbers"?
 
Supernatural Numbers
 
I hope the irony of this quotation strikes you. The exact problem with Saccheri's approach to geometry was that he began with a fixed notion of what was true and what was not true, and he set out only to prove what he'd assessed as true to start with. Despite the cleverness of his approach-which involved denying the fifth postulate, and then proving many "repugnant" propositions of the ensuing geometry-Saccheri never entertained the possibility of other ways of thinking about points and lines. Now we should be wary of repeating this famous mistake. We must consider impartially, to the extent that we can, what it would mean to add ~G as an axiom to TNT. Just think what mathematics would be like today if people had never considered adding new axioms of the following sorts:

∃a:(a+a)=S0
∃a:Sa=0
∃a:(a·a)=SS0
∃a:S(a·a)=0

While each of them is "repugnant to the nature of previously known number systems", each of them also provides a deep and wonderful extension of the notion of whole numbers: rational numbers, negative numbers, irrational numbers, imaginary numbers. Such a possibility is what ~G is trying to get us to open our eyes to. Now in the past, each new extension of the notion of number was greeted with hoots and catcalls. You can hear this particularly loudly in the names attached to the unwelcome arrivals, such as "irrational numbers", "imaginary numbers". True to this tradition, we shall name the numbers which ~G is announcing to us the supernatural numbers, showing how we feel they violate all reasonable and commonsensical notions.
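The number each "repugnant" axiom conjures up can be checked numerically, a minimal sketch (recall that S is the successor, so Sa=0 means a+1=0, and S(a·a)=0 means a·a+1=0):

```python
# Sketch: the kind of number each axiom calls into existence.
from fractions import Fraction

a = Fraction(1, 2)            # Ea:(a+a)=S0   ->  a rational number
assert a + a == 1

a = -1                        # Ea:Sa=0       ->  a negative number
assert a + 1 == 0

a = 2 ** 0.5                  # Ea:(a.a)=SS0  ->  an irrational number
assert abs(a * a - 2) < 1e-9  # (floating point, so compare approximately)

a = 1j                        # Ea:S(a.a)=0   ->  an imaginary number
assert a * a + 1 == 0
```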
 
If we are going to throw ~G in as the sixth axiom of TNT, we had better understand how in the world it could coexist, in one system, with the infinite pyramidal family we just finished discussing. To put it bluntly, ~G says:

"There exists some number which forms a TNT-proof-pair with the arithmoquinification of u."
 
-but the various members of the pyramidal family successively assert:

"0 is not that number"
"1 is not that number"
"2 is not that number"

This is rather confusing, because it seems to be a complete contradiction (which is why it is called "ω-inconsistency"). At the root of our confusion-much as in the case of the splitting of geometry-is our stubborn resistance to adopt a modified interpretation for the symbols, despite the fact that we are quite aware that the system is a modified system. We want to get away without reinterpreting any symbols-and of course that will prove impossible.
 
The reconciliation comes when we reinterpret ∃ as "There exists a generalized natural number", rather than as "There exists a natural number". As we do this, we shall also reinterpret ∀ in the corresponding way. This means that we are opening the door to some extra numbers besides the natural numbers. These are the supernatural numbers. The naturals and supernaturals together make up the totality of generalized naturals.
 
The apparent contradiction vanishes into thin air, now, for the pyramidal family still says what it said before: "No natural number forms a TNT-proof-pair with the arithmoquinification of u." The family doesn't say anything about supernatural numbers, because there are no numerals for them. But now, ~G says, "There exists a generalized natural number which forms a TNT-proof-pair with the arithmoquinification of u." It is clear that taken together, the family and ~G tell us something: that there is a supernatural number which forms a TNT-proof-pair with the arithmoquinification of u. That is all-there is no contradiction any more. TNT+~G is a consistent system, under an interpretation which includes supernatural numbers.
 
Since we have now agreed to extend the interpretations of the two quantifiers, this means that any theorem which involves either of them has an extended meaning. For example, the commutativity theorem

∀a:∀a':(a+a')=(a'+a)

now tells us that addition is commutative for all generalized natural numbers-in other words, not only for natural numbers, but also for supernatural numbers. Likewise, the TNT-theorem which says "2 is not the square of a natural number"-

~∃a:(a·a)=SS0

-now tells us that 2 is not the square of a supernatural number, either. In fact, supernatural numbers share all the properties of natural numbers, as long as those properties are given to us in theorems of TNT. In other words, everything that can be formally proven about natural numbers is thereby established also for supernatural numbers. This means, in particular, that supernatural numbers are not anything already familiar to you, such as fractions, or negative numbers, or complex numbers, or whatever. The supernatural numbers are, instead, best visualized as integers which are greater than all natural numbers-as infinitely large integers. Here is the point: although theorems of TNT can rule out negative numbers, fractions, irrational numbers, and complex numbers, still there is no way to rule out infinitely large integers. The problem is, there is no way even to express the statement "There are no infinite quantities".
 
This sounds quite strange, at first. Just exactly how big is the number which makes a TNT-proof-pair with G's Gödel number? (Let's call it 'I', for no particular reason.) Unfortunately, we have not got any good vocabulary for describing the sizes of infinitely large integers, so I am afraid I cannot convey a sense of I's magnitude. But then just how big is i (the square root of -1)? Its size cannot be imagined in terms of the sizes of familiar natural numbers. You can't say, "Well, i is about half as big as 14, and 9/10 as big as 24." You have to say, "i squared is -1", and more or less leave it at that. A quote from Abraham Lincoln seems apropos here. When he was asked, "How long should a man's legs be?" he drawled, "Long enough to reach the ground." That is more or less how to answer the question about the size of I-it should be just the size of a number which specifies the structure of a proof of G-no bigger, no smaller.
 
Of course, any theorem of TNT has many different derivations, so you might complain that my characterization of I is nonunique. That is so. But the parallel with i-the square root of -1-still holds. Namely, recall that there is another number whose square is also minus one: -i. Now i and -i are not the same number. They just have a property in common. The only trouble is that it is the property which defines them! We have to choose one of them-it doesn't matter which one-and call it "i". In fact there's no way of telling them apart. So for all we know we could have been calling the wrong one "i" for all these centuries and it would have made no difference. Now, like i, I is also nonuniquely defined. So you just have to think of I as being some specific one of the many possible supernatural numbers which form TNT-proof-pairs with the arithmoquinification of u.
 
Supernatural Theorems Have Infinitely Long Derivations
 
We haven't yet faced head on what it means to throw ~G in as an axiom. We have said it but not stressed it. The point is that ~G asserts that G has a proof. How can a system survive, when one of its axioms asserts that its own negation has a proof? We must be in hot water now! Well, it is not so bad as you might think. As long as we only construct finite proofs, we will never prove G. Therefore, no calamitous collision between G and its negation ~G will ever take place. The supernatural number I won't cause any disaster.
 
However, we will have to get used to the idea that ~G is now the one which asserts a truth ("G has a proof"), while G asserts a falsity ("G has no proof"). In standard number theory it is the other way around-but then, in standard number theory there aren't any supernatural numbers. Notice that a supernatural theorem of TNT-namely G-may assert a falsity, but all natural theorems still assert truths.
 
Supernatural Addition and Multiplication
 
There is one extremely curious and unexpected fact about supernaturals which I would like to tell you, without proof. (I don't know the proof either.) This fact is reminiscent of the Heisenberg uncertainty principle in quantum mechanics. It turns out that you can "index" the supernaturals in a simple and natural way by associating with each supernatural number a trio of ordinary integers (including negative ones). Thus, our original supernatural number, I, might have the index set (9,-8,3), and its successor, I+1, might have the index set (9,-8,4). Now there is no unique way to index the supernaturals; different methods offer different advantages and disadvantages. Under some indexing schemes, it is very easy to calculate the index triplet for the sum of two supernaturals, given the indices of the two numbers to be added. Under other indexing schemes, it is very easy to calculate the index triplet for the product of two supernaturals, given the indices of the two numbers to be multiplied. But under no indexing scheme is it possible to calculate both. More precisely, if the sum's index can be calculated by a recursive function, then the product's index will not be a recursive function; and conversely, if the product's index is a recursive function, then the sum's index will not be. Therefore, supernatural schoolchildren who learn their supernatural plus-tables will have to be excused if they do not know their supernatural times-tables-and vice versa! You cannot know both at the same time.
 
Supernaturals Are Useful ...
 
supernatural fractions (ratios of two supernaturals), supernatural real numbers, and so on. In fact, the calculus can be put on a new footing, using the notion of supernatural real numbers. Infinitesimals such as dx and dy, those old bugaboos of mathematicians, can be completely justified, by considering them to be reciprocals of infinitely large real numbers! Some theorems in advanced analysis can be proven more intuitively with the aid of "nonstandard analysis".
 
But Are They Real?
 
Nonstandard number theory is a disorienting thing when you first meet up with it. But then, non-Euclidean geometry is also a disorienting subject. In both instances, one is powerfully driven to ask, "But which one of these two rival theories is correct? Which is the truth?" In a certain sense, there is no answer to such a question. (And yet, in another sense-to be discussed later-there is an answer.) The reason that there is no answer to the question is that the two rival theories, although they employ the same terms, do not talk about the same concepts. Therefore, they are only superficially rivals, just like Euclidean and non-Euclidean geometries. In geometry, the words "point", "line", and so on are undefined terms, and their meanings are determined by the axiomatic system within which they are used.
 
Likewise for number theory. When we decided to formalize TNT, we preselected the terms we would use as interpretation words-for instance, words such as "number", "plus", "times", and so on. By taking the step of formalization, we were committing ourselves to accepting whatever passive meanings these terms might take on. But-just like Saccheri-we didn't anticipate any surprises. We thought we knew what the true, the real, the only theory of natural numbers was. We didn't know that there would be some questions about numbers which TNT would leave open, and which could therefore be answered ad libitum by extensions of TNT heading off in different directions. Thus, there is no basis on which to say that number theory "really" is this way or that, just as one would be loath to say that the square root of -1 "really" exists, or "really" does not.
 
Bifurcations in Geometry, and Physicists
 
There is one argument which can be, and perhaps ought to be, raised against the preceding. Suppose experiments in the real, physical world can be explained more economically in terms of one particular version of geometry than in terms of any other. Then it might make sense to say that that geometry is "true". From the point of view of a physicist who wants to use the "correct" geometry, then it makes some sense to distinguish between the "true" geometry, and other geometries. But this cannot be taken too simplistically. Physicists are always dealing with approximations and idealizations of situations. For instance, my own Ph.D. work, mentioned in Chapter V, was based on an extreme idealization of the problem of a crystal in a magnetic field. The mathematics which emerged was of a high degree of beauty and symmetry. Despite-or rather, because of-the artificiality of the model, some fundamental features emerged conspicuously in the graph. These features then suggest some guesses about the kinds of things that might happen in more realistic situations. But without the simplifying assumptions which produced my graph, there could never be such insights. One can see this kind of thing over and over again in physics, where a physicist uses a "nonreal" situation to learn about deeply hidden features of reality. Therefore, one should be extremely cautious in saying that the brand of geometry which physicists might wish to use would represent "the true geometry", for in fact, physicists will always use a variety of different geometries, choosing in any given situation the one that seems simplest and most convenient.
 
Furthermore-and perhaps this is even more to the point-physicists do not study just the 3-D space we live in. There are whole families of "abstract spaces" within which physical calculations take place, spaces which have totally different geometrical properties from the physical space within which we live. Who is to say, then, that "the true geometry" is defined by the space in which Uranus and Neptune orbit around the sun? There is "Hilbert space", where quantum-mechanical wave functions undulate; there is "momentum space", where Fourier components dwell; there is "reciprocal space", where wave-vectors cavort; there is "phase space", where many-particle configurations swish; and so on. There is absolutely no reason that the geometries of all these spaces should be the same; in fact, they couldn't possibly be the same! So it is essential and vital for physicists that different and "rival" geometries should exist.
 
Bifurcations in Number Theory, and Bankers
 
So much for geometry. What about number theory? Is it also essential and vital that different number theories should coexist with each other? If you asked a bank officer, my guess is that you would get an expression of horror and disbelief. How could 2 and 2 add up to anything but 4? And moreover, if 2 and 2 did not make 4, wouldn't world economies collapse immediately under the unbearable uncertainty opened up by that fact? Not really. First of all, nonstandard number theory doesn't threaten the age-old idea that 2 plus 2 equals 4. It differs from ordinary number theory only in the way it deals with the concept of the infinite. After all, every theorem of TNT remains a theorem in any extension of TNT! So bankers need not despair of the chaos that will arrive when nonstandard number theory takes over.
 
And anyway, entertaining fears about old facts being changed betrays a misunderstanding of the relationship between mathematics and the real world. Mathematics only tells you answers to questions in the real world after you have taken the one vital step of choosing which kind of mathematics to apply. Even if there were a rival number theory which used the symbols '2', '3', and '+', and in which a theorem said "2 + 2 = 3", there would be little reason for bankers to choose to use that theory! For that theory does not fit the way money works. You fit your mathematics to the world, and not the other way around. For instance, we don't apply number theory to cloud systems, because the very concept of whole numbers hardly fits. There can be one cloud and another cloud, and they will come together and instead of there being two clouds, there will still only be one. This doesn't prove that 1 plus 1 equals 1; it just proves that our number-theoretical concept of "one" is not applicable in its full power to cloud-counting.
 
Bifurcations in Number Theory, and Metamathematicians
 
So bankers, cloud-counters, and most of the rest of us need not worry about the advent of supernatural numbers: they won't affect our everyday perception of the world in the slightest. The only people who might actually be a little worried are people whose endeavors depend in some crucial way on the nature of infinite entities. There aren't too many such people around-but mathematical logicians are members of this category. How can the existence of a bifurcation in number theory affect them? Well, number theory plays two roles in logic: (1) when axiomatized, it is an object of study; and (2) when used informally, it is an indispensable tool with which formal systems can be investigated. This is the use-mention distinction once again, in fact: in role (1), number theory is mentioned, in role (2) it is used.
 
Now mathematicians have judged that number theory is applicable to the study of formal systems even if not to cloud-counting, just as bankers have judged that the arithmetic of real numbers is applicable to their transactions. This is an extramathematical judgement, and shows that the thought processes involved in doing mathematics, just like those in other areas, involve "tangled hierarchies" in which thoughts on one level can affect thoughts on any other level. Levels are not cleanly separated, as the formalist version of what mathematics is would have one believe.
 
The formalist philosophy claims that mathematicians only deal with abstract symbols, and that they couldn't care less whether those symbols have any applications to or connections with reality. But that is quite a distorted picture. Nowhere is this clearer than in metamathematics. If the theory of numbers is itself used as an aid in gaining factual knowledge about formal systems, then mathematicians are tacitly showing that they believe these ethereal things called "natural numbers" are actually part of reality-not just figments of the imagination. This is why I parenthetically remarked earlier that, in a certain sense, there is an answer to the question of which version of number theory is "true". Here is the nub of the matter: mathematical logicians must choose which version of number theory to put their faith in. In particular, they cannot remain neutral on the question of the existence or nonexistence of supernatural numbers, for the two different theories may give different answers to questions in metamathematics.
 
501
 
For instance, take this question: "Is ~G finitely derivable in TNT?" No one actually knows the answer. Nevertheless, most mathematical logicians would answer no without hesitation. The intuition which motivates that answer is based on the fact that if ~G were a theorem, TNT would be ω-inconsistent, and this would force supernaturals down your throat if you wanted to interpret TNT meaningfully-a most unpalatable thought for most people. After all, we didn't intend or expect supernaturals to be part of TNT when we invented it. That is, we-or most of us-believe that it is possible to make a formalization of number theory which does not force you into believing that supernatural numbers are every bit as real as naturals. It is that intuition about reality which determines which "fork" of number theory mathematicians will put their faith in, when the chips are down. But this faith may be wrong. Perhaps every consistent formalization of number theory which humans invent will imply the existence of supernaturals, by being ω-inconsistent. This is a queer thought, but it is conceivable.
 
If this were the case-which I doubt, but there is no disproof available-then G would not have to be undecidable. In fact, there might be no undecidable formulas of TNT at all. There could simply be one unbifurcated theory of numbers-which necessarily includes supernaturals. This is not the kind of thing mathematical logicians expect, but it is something which ought not to be rejected outright. Generally, mathematical logicians believe that TNT-and systems similar to it-are ω-consistent, and that the Gödel string which can be constructed in any such system is undecidable within that system. That means that they can choose to add either it or its negation as an axiom.
 
Hilbert's Tenth Problem and the Tortoise
 
I would like to conclude this Chapter by mentioning one extension of Gödel's Theorem. (This material is more fully covered in the article "Hilbert's Tenth Problem" by Davis and Hersh, for which see the Bibliography.) For this, I must define what a Diophantine equation is. This is an equation in which a polynomial with fixed integral coefficients and exponents is set to 0. For instance,

a = 0

and

5x + 13y - 1 = 0

and

5p^5 + 17q^17 - 177 = 0

and

a^123,666,111,666 + b^123,666,111,666 - c^123,666,111,666 = 0

are Diophantine equations. It is in general a difficult matter to know whether a given Diophantine equation has any integer solutions or not. In fact, in a famous lecture at the beginning of the century, Hilbert asked mathematicians to look for a general algorithm by which one could determine in a finite number of steps if a given Diophantine equation has integer solutions or not. Little did he suspect that no such algorithm exists!
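Incidentally, it is easy to test whether any particular candidate tuple solves a given Diophantine equation; what cannot exist-by Matiyasevich's later resolution of Hilbert's tenth problem-is a procedure that decides solvability for every equation. A minimal sketch, in which the helper name and the search bound are my own choices:

```python
# Brute-force search for small integer solutions of a Diophantine equation.
# This is only an illustration: checking a candidate is trivial, but no
# algorithm can decide solvability in general (Matiyasevich, 1970).

from itertools import product

def search(poly, n_vars, bound):
    """Try every integer tuple with entries in [-bound, bound]."""
    for point in product(range(-bound, bound + 1), repeat=n_vars):
        if poly(*point) == 0:
            return point          # a witness: verifying it is easy
    return None                   # no small solution; proves nothing in general

# 5x + 13y - 1 = 0 has integer solutions:
print(search(lambda x, y: 5*x + 13*y - 1, 2, 10))   # -> (-5, 2)
```

The asymmetry on display here-easy verification, impossible decision-is exactly what makes the Diophantine form of G below so striking.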
 
Now for the simplification of G. It has been shown that whenever you have a sufficiently powerful formal number theory, and a Gödel-numbering for it, there is a Diophantine equation which is equivalent to G. The equivalence lies in the fact that this equation, when interpreted on a metamathematical level, asserts of itself that it has no solutions. Turn it around: if you found a solution to it, you could construct from it the Gödel number of a proof in the system that the equation has no solutions! This is what the Tortoise did in the Prelude, using Fermat's equation as his Diophantine equation. It is nice to know that when you do this, you can retrieve the sound of Old Bach from the molecules in the air!
 
CHAPTER XV

Jumping out of the System

A More Powerful Formal System
 
ONE OF THE things which a thoughtful critic of Gödel's proof might do would be to examine its generality. Such a critic might, for example, suspect that Gödel has just cleverly taken advantage of a hidden defect in one particular formal system, TNT. If this were the case, then perhaps a formal system superior to TNT could be developed which would not be subject to the Gödelian trick, and Gödel's Theorem would lose much of its sting. In this Chapter we will carefully scrutinize the properties of TNT which made it vulnerable to the arguments of the last Chapter.
 
A natural thought is this: If the basic trouble with TNT is that it contains a "hole"-in other words, a sentence which is undecidable, namely G-then why not simply plug up the hole? Why not just tack G onto TNT as a sixth axiom? Of course, by comparison to the other axioms, G is a ridiculously huge giant, and the resulting system-TNT+G-would have a rather comical aspect due to the disproportionateness of its axioms. Be that as it may, adding G is a reasonable suggestion. Let us consider it done. Now, it is to be hoped, the new system, TNT+G, is a superior formal system-one which is not only supernatural-free, but also complete. It is certain that TNT+G is superior to TNT in at least one respect: the string G is no longer undecidable in this new system, since it is a theorem.
 
What was the vulnerability of TNT due to? The essence of its vulnerability was that it was capable of expressing statements about itself-in particular, the statement

"I Cannot Be Proven in Formal System TNT"

or, expanded a bit,

"There does not exist a natural number which forms a TNT-proof-pair with the Gödel number of this string."
 
Is there any reason to expect or hope that TNT+G would be invulnerable to Gödel's proof? Not really. Our new system is just as expressive as TNT. Since Gödel's proof relies primarily on the expressive power of a formal system, we should not be surprised to see our new system succumb, too. The trick will be to find a string which expresses the statement

"I Cannot Be Proven in Formal System TNT+G."

Actually, it is not much of a trick, once you have seen it done for TNT. All the same principles are employed; only the context shifts slightly. (Figuratively speaking, we take a tune we know and simply sing it again, only in a higher key.) As before, the string which we are looking for-let us call it "G'"-is constructed by the intermediary of an "uncle". But instead of being based on the formula which represents TNT-proof-pairs, it is based on the similar but slightly more complicated notion of TNT+G-proof-pairs. This notion of TNT+G-proof-pairs is only a slight extension of the original notion of TNT-proof-pairs.
 
A similar extension could be envisaged for the MIU-system. We have seen the unadulterated form of MIU-proof-pairs. Were we now to add MU as a second axiom, we would be dealing with a new system-the MIU+MU system. A derivation in this extended system is presented:

MU     axiom
MUU    rule 2

There is a MIU+MU-proof-pair which corresponds-namely, m = 30300, n = 300. Of course, this pair of numbers does not form a MIU-proof-pair-only a MIU+MU-proof-pair. The addition of an extra axiom does not substantially complicate the arithmetical properties of proof-pairs. The significant fact about them-that being a proof-pair is primitive recursive-is preserved.
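The numbering behind that proof-pair can be sketched in a few lines, using the codes M → 3, I → 1, U → 0. (The decimal-concatenation trick below is a simplification of my own, not the arithmetic actually carried out in the earlier chapters.)

```python
# Toy Gödel-numbering for MIU-style derivations: a derivation's number m
# is its lines' codes written one after another; n is the final line's code.

CODE = {"M": "3", "I": "1", "U": "0"}

def godel_number(string):
    return int("".join(CODE[c] for c in string))

def derivation_pair(lines):
    m = int("".join(str(godel_number(s)) for s in lines))
    n = godel_number(lines[-1])
    return m, n

# The two-line derivation: MU (axiom), then MUU (rule 2: Mx -> Mxx, x = U)
print(derivation_pair(["MU", "MUU"]))   # -> (30300, 300)
```

Adding MU as an axiom changes which pairs (m, n) count as proof-pairs, but the test remains a simple, mechanical check-which is why primitive recursiveness survives the extension.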
 
The Gödel Method Reapplied
 
Now, returning to TNT+G, we will find a similar situation. TNT+G-proof-pairs, like their predecessors, are primitive recursive, so they are represented inside TNT+G by a formula which we abbreviate in an obvious manner:

(TNT+G)-PROOF-PAIR{a,a'}

Now we just do everything all over again. We make the counterpart of G by beginning with an "uncle", just as before:

~∃a:∃a':<(TNT+G)-PROOF-PAIR{a,a'} ∧ ARITHMOQUINE{a'',a'}>

Let us say its Gödel number is u'. Now we arithmoquine this very uncle. That will give us G':

~∃a:∃a':<(TNT+G)-PROOF-PAIR{a,a'} ∧ ARITHMOQUINE{SSS...SSS0/a'',a'}>

(where the numeral contains u' S's).
 
Its interpretation is

"There is no number a that forms a TNT+G-proof-pair with the arithmoquinification of u'."

More concisely,

"I Cannot Be Proven in Formal System TNT+G."

Multifurcation

Well (yawn), the details are quite boring from here on out. G' is precisely to TNT+G as G was to TNT itself. One finds that either G' or ~G' can be added to TNT+G, to yield a further splitting of number theory. And, lest you think this only happens to the "good guys", this very same dastardly trick can be played upon TNT+~G-that is, upon the nonstandard extension of TNT gotten by adding G's negation. So now we see (Fig. 75) that there are all sorts of bifurcations in number theory.
 
FIGURE 75. "Multifurcation" of TNT. Each extension of TNT has its very own Gödel sentence; that sentence, or its negation, can be added on, so that from each extension there sprouts a pair of further extensions, a process which goes on ad infinitum.
 
Of course, this is just the beginning. Let us imagine moving down the leftmost branch of this downwards-pointing tree, where we always toss in the Gödel sentences (rather than their negations). This is the best we can do by way of eliminating supernaturals. After adding G, we add G'. Then we add G'', and G''', and so on. Each time we make a new extension of TNT, its vulnerability to the Tortoise's method-pardon me, I mean Gödel's method-allows a new string to be devised, having the interpretation "I Cannot Be Proven in Formal System X".
 
Naturally, after a while, the whole process begins to seem utterly predictable and routine. Why, all the "holes" are made by one single technique! This means that, viewed as typographical objects, they are all cast from one single mold, which in turn means that one single axiom schema suffices to represent all of them! So if this is so, why not plug up all the holes at once and be done with this nasty business of incompleteness once and for all? This would be accomplished by adding an axiom schema to TNT, instead of just one axiom at a time. Specifically, this axiom schema would be the mold in which all of G, G', G'', G''', etc., are cast. By adding this axiom schema (let's call it "Gω"), we would be outsmarting the "Gödelization" method. Indeed, it seems quite clear that adding Gω to TNT would be the last step necessary for the complete axiomatization of all of number-theoretical truth.
 
It was at about this point in the Contracrostipunctus that the Tortoise related the Crab's invention of "Record Player Omega". However, readers were left dangling as to the fate of that device, since before completing his tale, the tuckered-out Tortoise decided that he had best go home to sleep (but not before tossing off a sly reference to Gödel's Incompleteness Theorem). Now, at last, we can get around to clearing up that dangling detail... Perhaps you already have an inkling, after reading the Birthday Cantatatata.
 
Essential Incompleteness
 
As you probably suspected, even this fantastic advance over TNT suffers the same fate. And what makes it quite weird is that it is still for, in essence, the same reason. The axiom schema is not powerful enough, and the Gödel construction can again be effected. Let me spell this out a little. (One can do it much more rigorously than I shall here.) If there is a way of capturing the various strings G, G', G'', G''', ... in a single typographical mold, then there is a way of describing their Gödel numbers in a single arithmetical mold. And this arithmetical portrayal of an infinite class of numbers can then be represented inside TNT+Gω by some formula

OMEGA-AXIOM{a}

whose interpretation is: "a is the Gödel number of one of the axioms coming from Gω". When a is replaced by any specific numeral, the formula which results will be a theorem of TNT+Gω if and only if the numeral stands for the Gödel number of an axiom coming from the schema.
 
With the aid of this new formula, it becomes possible to represent even such a complicated notion as TNT+Gω-proof-pairs inside TNT+Gω:

(TNT+Gω)-PROOF-PAIR{a,a'}

Using this formula, we can construct a new uncle, which we proceed to arithmoquine in the by now thoroughly familiar way, making yet another undecidable string, which will be called "Gω+1". At this point, you might well wonder, "Why isn't Gω+1 among the axioms created by the axiom schema Gω?" The answer is that Gω was not clever enough to foresee its own embeddability inside number theory.
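The uncle-then-arithmoquine recipe that keeps recurring here can be caricatured in a few lines; the string encoding and every name below are my own toy substitutes for TNT's actual machinery, meant only to show the shape of the move.

```python
# Toy analogue of arithmoquining: take a template with a free slot a'',
# and substitute the template's own (simplified) Gödel number into it.

def encode(s):
    """Toy Gödel-numbering: concatenate 3-digit character codes."""
    return int("".join(f"{ord(c):03d}" for c in s))

def arithmoquine(template):
    u = encode(template)                  # the number of the "uncle"
    return template.replace("a''", str(u)), u

uncle = "no pair (a, a') is a proof-pair whose a' is the arithmoquinification of a''"
g, u = arithmoquine(uncle)
# g now speaks about the number u -- that is, about the very template it came from
assert str(u) in g and "a''" not in g
```

The point of the sketch is that the operation is utterly mechanical-which is exactly why it can be repeated on any extension, Gω included.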
 
In the Contracrostipunctus, one of the essential steps in the Tortoise's making an "unplayable record" was to get ahold of a manufacturer's blueprint of the record player which he was out to destroy. This was necessary so that he could figure out to what kinds of vibrations it was vulnerable, and then incorporate into his record such grooves as would code for sounds which would induce those vibrations. It is a close analogue to the Gödel trick, in which the system's own properties are reflected inside the notion of proof-pairs, and then used against it. Any system, no matter how complex or tricky it is, can be Gödel-numbered, and then the notion of its proof-pairs can be defined-and this is the petard by which it is hoist. Once a system is well-defined, or "boxed", it becomes vulnerable.
 
This principle is excellently illustrated by the Cantor diagonal trick, which finds an omitted real number for each well-defined list of reals between 0 and 1. It is the act of giving an explicit list-a "box" of reals-which causes the downfall. Let us see how the Cantor trick can be repeated over and over again. Consider what happens if, starting with some list L, you do the following:

(1a) Take list L, and construct its diagonal number d.
(1b) Throw d somewhere into list L, making a new list L+d.
(2a) Take list L+d, and construct its diagonal number d'.
(2b) Throw d' somewhere into list L+d, making a new list L+d+d'.

Now this step-by-step process may seem a doltish way to patch up L, for we could have made the entire list d, d', d'', d''', ... at once, given L originally. But if you think that making such a list will enable you to complete your list of reals, you are very wrong. The problem comes at the moment you ask, "Where to incorporate the list of diagonal numbers inside L?" No matter how diabolically clever a scheme you devise for ensconcing the d-numbers inside L, once you have done it, then the new list is still vulnerable. As was said above, it is the act of giving an explicit list-a "box" of reals-that causes the downfall.
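The diagonal construction-and the futility of patching the list-fits in a tiny sketch. Reals are shortened here to four-digit decimal strings, and the function name is my own.

```python
# Cantor's diagonal trick on finite decimal prefixes: change digit k of
# real k, so the result disagrees with every listed real somewhere.

def diagonal(reals):
    """Build a digit-string differing from reals[k] at position k."""
    return "".join("5" if r[k] != "5" else "6" for k, r in enumerate(reals))

L = ["1415", "7182", "4142", "6180"]
d = diagonal(L)
assert all(d != r for r in L)        # d was omitted from the "box" L

# Inserting d back into the list just re-arms the trick:
patched = [d] + L[:3]
assert diagonal(patched) not in patched
```

However the d-numbers are ensconced, the patched list is again an explicit list, so the same one-line construction produces a fresh omission.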
 
Now in the case of formal systems, it is the act of giving an explicit recipe for what supposedly characterizes number-theoretical truth that causes the incompleteness. This is the crux of the problem with TNT+Gω. Once you insert all the G's in a well-defined way into TNT, there is seen to be some other G-some unforeseen G-which you didn't capture in your axiom schema. And in the case of the TC-battle inside the Contracrostipunctus, the instant a record player's "architecture" is determined, the record player becomes capable of being shaken to pieces.
 
So what is to be done? There is no end in sight. It appears that TNT, even when extended ad infinitum, cannot be made complete. TNT is therefore said to suffer from essential incompleteness because the incompleteness here is part and parcel of TNT; it is an essential part of the nature of TNT and cannot be eradicated in any way, whether simpleminded or ingenious. What's more, this problem will haunt any formal version of number theory, whether it is an extension of TNT, a modification of TNT, or an alternative to TNT. The fact of the matter is this: the possibility of constructing, in a given system, an undecidable string via Gödel's self-reference method, depends on three basic conditions:
 
(1) That the system should be rich enough so that all desired statements about numbers, whether true or false, can be expressed in it. (Failure on this count means that the system is from the very start too weak to be counted as a rival to TNT, because it can't even express number-theoretical notions that TNT can. In the metaphor of the Contracrostipunctus, it is as if one did not have a phonograph but a refrigerator or some other kind of object.)
 
(2) That all general recursive relations should be represented by formulas in the system. (Failure on this count means the system fails to capture in a theorem some general recursive truth, which can only be considered a pathetic bellyflop if it is attempting to produce all of number theory's truths. In the Contracrostipunctus metaphor, this is like having a record player, but one of low fidelity.)
 
(3) That the axioms and typographical patterns defined by its rules be recognizable by some terminating decision procedure. (Failure on this count means that there is no method to distinguish valid derivations in the system from invalid ones-thus that the "formal system" is not formal after all, and in fact is not even well-defined. In the Contracrostipunctus metaphor, it is a phonograph which is still on the drawing board, only partially designed.)
 
Satisfaction of these three conditions guarantees that any consistent system will be incomplete, because Gödel's construction is applicable.
 
The fascinating thing is that any such system digs its own hole; the system's own richness brings about its own downfall. The downfall occurs essentially because the system is powerful enough to have self-referential sentences. In physics, the notion exists of a "critical mass" of a fissionable substance, such as uranium. A solid lump of the substance will just sit there, if its mass is less than critical. But beyond the critical mass, such a lump will undergo a chain reaction, and blow up. It seems that with formal systems there is an analogous critical point. Below that point, a system is "harmless" and does not even approach defining arithmetical truth formally; but beyond the critical point, the system suddenly attains the capacity for self-reference, and thereby dooms itself to incompleteness. The threshold seems to be roughly when a system attains the three properties listed above.
 
Once this ability for self-reference is attained, the system has a hole which is tailor-made for itself; the hole takes the features of the system into account and uses them against the system.
 
The Passion According to Lucas
 
The baffling repeatability of the Gödel argument has been used by various people-notably J. R. Lucas-as ammunition in the battle to show that there is some elusive and ineffable quality to human intelligence, which makes it unattainable by "mechanical automata"-that is, computers. Lucas begins his article "Minds, Machines, and Gödel" with these words:

Gödel's theorem seems to me to prove that Mechanism is false, that is, that minds cannot be explained as machines.1

Then he proceeds to give an argument which, paraphrased, runs like this. For a computer to be considered as intelligent as a person is, it must be able to do every intellectual task which a person can do. Now Lucas claims that no computer can do "Gödelization" (one of his amusingly irreverent terms) in the manner that people can. Why not? Well, think of any particular formal system, such as TNT, or TNT+G, or even TNT+Gω. One can write a computer program rather easily which will systematically generate theorems of that system, and in such a manner that eventually, any preselected theorem will be printed out. That is, the theorem-generating program won't skip any portion of the "space" of all theorems. Such a program would be composed of two major parts: (1) a subroutine which stamps out axioms, given the "molds" of the axiom schemas (if there are any), and (2) a subroutine which takes known theorems (including axioms, of course) and applies rules of inference to produce new theorems. The program would alternate between running these two subroutines.
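Such a generator is easy to sketch for the little MIU-system of Chapter I. The breadth-first queue below is my own way of realizing the alternating-subroutine idea: the axiom is stamped out once, the rules are then applied to known theorems, and no theorem is ever skipped.

```python
# Systematic theorem generator for the MIU-system: breadth-first search
# guarantees that any preselected theorem eventually gets printed.

from collections import deque

def miu_successors(s):
    out = []
    if s.endswith("I"):                      # rule 1: xI  -> xIU
        out.append(s + "U")
    out.append(s + s[1:])                    # rule 2: Mx  -> Mxx
    for i in range(len(s) - 2):              # rule 3: III -> U
        if s[i:i+3] == "III":
            out.append(s[:i] + "U" + s[i+3:])
    for i in range(len(s) - 1):              # rule 4: UU  -> (deleted)
        if s[i:i+2] == "UU":
            out.append(s[:i] + s[i+2:])
    return out

def generate_theorems(limit):
    seen, queue, produced = {"MI"}, deque(["MI"]), []
    while queue and len(produced) < limit:
        t = queue.popleft()                  # part (1) seeded the axiom MI;
        produced.append(t)                   # part (2) extends known theorems
        for nxt in miu_successors(t):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return produced

print(generate_theorems(5))   # the first few MIU theorems, in search order
```

Everything this program will ever print is fixed in advance by the rules-which is precisely the property Lucas' argument leans on.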
 
We can anthropomorphically say that this program "knows" some facts of number theory-namely, it knows those facts which it prints out. If it fails to print out some true fact of number theory, then of course it doesn't "know" that fact. Therefore, a computer program will be inferior to human beings if it can be shown that humans know something which the program cannot know. Now here is where Lucas starts rolling. He says that we humans can always do the Gödel trick on any formal system as powerful as TNT-and hence no matter what the formal system, we know more than it does. Now this may only sound like an argument about formal systems, but it can also be slightly modified so that it becomes, seemingly, an invincible argument against the possibility of Artificial Intelligence ever reproducing the human level of intelligence. Here is the gist of it:

Rigid internal codes entirely rule computers and robots; ergo...
Computers are isomorphic to formal systems. Now...
Any computer which wants to be as smart as we are has got to be able to do number theory as well as we can, so...
Among other things, it has to be able to do primitive recursive arithmetic. But for this very reason...
It is vulnerable to the Gödelian "hook", which implies that...
We, with our human intelligence, can concoct a certain statement of number theory which is true, but the computer is blind to that statement's truth (i.e., will never print it out), precisely because of Gödel's boomeranging argument.
This implies that there is one thing which computers just cannot be programmed to do, but which we can do. So we are smarter.

Let us enjoy, with Lucas, a transient moment of anthropocentric glory:

However complicated a machine we construct, it will, if it is a machine, correspond to a formal system, which in turn will be liable to the Gödel procedure for finding a formula unprovable-in-that-system. This formula the machine will be unable to produce as being true, although a mind can see it is true. And so the machine will still not be an adequate model of the mind. We are trying to produce a model of the mind which is mechanical-which is essentially "dead"-but the mind, being in fact "alive," can always go one better than any formal, ossified, dead system can. Thanks to Gödel's theorem, the mind always has the last word.2
 
On first sight, and perhaps even on careful analysis, Lucas' argument appears compelling. It usually evokes rather polarized reactions. Some seize onto it as a nearly religious proof of the existence of souls, while others laugh it off as being unworthy of comment. I feel it is wrong, but fascinatingly so-and therefore quite worthwhile taking the time to rebut. In fact, it was one of the major early forces driving me to think over the matters in this book. I shall try to rebut it in one way in this Chapter, and in other ways in Chapter XVII.
 
We must try to understand more deeply why Lucas says the computer cannot be programmed to "know" as much as we do. Basically the idea is that we are always outside the system, and from out there we can always perform the "Gödelizing" operation, which yields something which the program, from within, can't see is true. But why can't the "Gödelizing operator", as Lucas calls it, be programmed and added to the program as a third major component? Lucas explains:

The procedure whereby the Gödelian formula is constructed is a standard procedure-only so could we be sure that a Gödelian formula can be constructed for every formal system. But if it is a standard procedure, then a machine should be able to be programmed to carry it out too.... This would correspond to having a system with an additional rule of inference which allowed one to add, as a theorem, the Gödelian formula of the rest of the formal system, and then the Gödelian formula of this new, strengthened, formal system, and so on. It would be tantamount to adding to the original formal system an infinite sequence of axioms, each the Gödelian formula of the system hitherto obtained.... We might expect a mind, faced with a machine that possessed a Gödelizing operator, to take this into account, and out-Gödel the new machine, Gödelizing operator and all. This has, in fact, proved to be the case. Even if we adjoin to a formal system the infinite set of axioms consisting of the successive Gödelian formulae, the resulting system is still incomplete, and contains a formula which cannot be proved-in-the-system, although a rational being can, standing outside the system, see that it is true. We had expected this, for even if an infinite set of axioms were added, they would have to be specified by some finite rule or specification, and this further rule or specification could then be taken into account by a mind considering the enlarged formal system. In a sense, just because the mind has the last word, it can always pick a hole in any formal system presented to it as a model of its own workings. The mechanical model must be, in some sense, finite and definite: and then the mind can always go one better.3
 
Jumping Up a Dimension
 
A visual image provided by M. C. Escher is extremely useful in aiding the intuition here: his drawing Dragon (Fig. 76). Its most salient feature is, of course, its subject matter-a dragon biting its tail, with all the Gödelian connotations which that carries. But there is a deeper theme to this picture. Escher himself wrote the following most interesting comments. The first comment is about a set of his drawings all of which are concerned with "the conflict between the flat and the spatial"; the second comment is about Dragon in particular.
 
I. Our three-dimensional space is the only true reality we know. The two-dimensional is every bit as fictitious as the four-dimensional, for nothing is flat, not even the most finely polished mirror. And yet we stick to the convention that a wall or a piece of paper is flat, and curiously enough, we still go on, as we have done since time immemorial, producing illusions of space on just such plane surfaces as these. Surely it is a bit absurd to draw a few lines and then claim: "This is a house". This odd situation is the theme of the next five pictures [including Dragon].4
 
II. However much this dragon tries to be spatial, he remains completely flat. Two incisions are made in the paper on which he is printed. Then it is folded in such a way as to leave two square openings. But this dragon is an obstinate beast, and in spite of his two dimensions he persists in assuming that he has three; so he sticks his head through one of the holes and his tail through the other.5
 
This second remark especially is a very telling remark. The message is that no matter how cleverly you try to simulate three dimensions in two, you are always missing some "essence of three-dimensionality". The dragon tries very hard to fight his two-dimensionality. He defies the two-dimensionality of the paper on which he thinks he is drawn, by sticking his head through it; and yet all the while, we outside the drawing can see the pathetic futility of it all, for the dragon and the holes and the folds are all merely two-dimensional simulations of those concepts, and not a one of them is real. But the dragon cannot step out of his two-dimensional space, and cannot know of it as we do. We could, in fact, carry the Escher picture any number of steps further. For instance, we could tear it out of the book, fold it, cut holes in it, pass it through itself, and photograph the whole mess, so that it again becomes two-dimensional. And to that photograph, we could once again do the same trick. Each time, at the instant that it becomes two-dimensional-no matter how cleverly we seem to have simulated three dimensions inside two-it becomes vulnerable to being cut and folded again.
 
Now with this wonderful Escherian metaphor, let us return to the program versus the human. We were talking about trying to encapsulate the "Gödelizing operator" inside the program itself. Well, even if we had written a program which carried the operation out, that program would not capture the essence of Gödel's method. For once again, we, outside the system, could still "zap" it in a way which it couldn't do. But then are we arguing with, or against, Lucas?
 
The Limits of Intelligent Systems
 
Against. For the very fact that we cannot write a program to do "Gödelizing" must make us somewhat suspicious that we ourselves could do it in every case. It is one thing to make the argument in the abstract that Gödelizing "can be done"; it is another thing to know how to do it in every particular case. In fact, as the formal systems (or programs) escalate in complexity, our own ability to "Gödelize" will eventually begin to waver. It must, since, as we have said above, we do not have any algorithmic way of describing how to perform it. If we can't tell explicitly what is involved in applying the Gödel method in all cases, then for each of us there will eventually come some case so complicated that we simply can't figure out how to apply it.
 
Of course, this borderline of one's abilities will be somewhat ill-defined, just as is the borderline of weights which one can pick up off the ground. While on some days you may not be able to pick up a 250-pound object, on other days maybe you can. Nevertheless, there are no days whatsoever on which you can pick up a 250-ton object. And in this sense, though everyone's Gödelization threshold is vague, for each person, there are systems which lie far beyond his ability to Gödelize.
 
This notion is illustrated in the Birthday Cantatatata. At first, it seems obvious that the Tortoise can proceed as far as he wishes in pestering Achilles. But then Achilles tries to sum up all the answers in a single swoop. This is a move of a different character than any that has gone before, and is given the new name 'ω'. The newness of the name is quite important. It is the first example where the old naming scheme-which only included names for all the natural numbers-had to be transcended. Then come some more extensions, some of whose names seem quite obvious, others of which are rather tricky. But eventually, we run out of names once again-at the point where the answer-schemas

ω, ω^ω, ω^ω^ω, ...

are all subsumed into one outrageously complex answer schema. The altogether new name 'ε0' is supplied for this one. And the reason a new name is needed is that some fundamentally new kind of step has been taken-a sort of irregularity has been encountered. Thus a new name must be supplied ad hoc.
 
There Is No Recursive Rule for Naming Ordinals
 
Now offhand you might think that these irregularities in the progression from ordinal to ordinal (as these names of infinity are called) could be handled by a computer program. That is, there would be a program to produce new names in a regular way, and when it ran out of gas, it would invoke the "irregularity handler", which would supply a new name, and pass control back to the simple one. But this will not work. It turns out that the irregularities themselves happen in irregular ways, and one would need also a second-order program-that is, a program which makes new programs which make new names. And even this is not enough. Eventually, a third-order program becomes necessary. And so on, and so on.
 
All of this perhaps ridiculous-seeming complexity stems from a deep theorem, due to Alonzo Church and Stephen C. Kleene, about the structure of these "infinite ordinals", which says:

    There is no recursively related notation-system which gives a name to every constructive ordinal.

What "recursively related notation-systems" are, and what "constructive ordinals" are, we must leave to the more technical sources, such as Hartley Rogers' book, to explain. But the intuitive idea has been presented. As the ordinals get bigger and bigger, there are irregularities, and irregularities in the irregularities, and irregularities in the irregularities in the irregularities, etc. No single scheme, no matter how complex, can name all the ordinals. And from this, it follows that no algorithmic method can tell how to apply the method of Gödel to all possible kinds of formal systems. And unless one is rather mystically inclined, therefore one must conclude that any human being simply will reach the limits of his own ability to Gödelize at some point. From there on out, formal systems of that complexity, though admittedly incomplete for the Gödel reason, will have as much power as that human being.
 
Other Refutations of Lucas
 
Now this is only one way to argue against Lucas' position. There are others, possibly more powerful, which we shall present later. But this counterargument has special interest because it brings up the fascinating concept of trying to create a computer program which can get outside of itself, see itself completely from the outside, and apply the Gödel zapping-trick to itself. Of course this is just as impossible as for a record player to be able to play records which would cause it to break.
 
But one should not consider TNT defective for that reason. If there is a defect anywhere, it is not in TNT, but in our expectations of what it should be able to do. Furthermore, it is helpful to realize that we are equally vulnerable to the word trick which Gödel transplanted into mathematical formalisms: the Epimenides paradox. This was quite cleverly pointed out by C. H. Whitely, when he proposed the sentence "Lucas cannot consistently assert this sentence." If you think about it, you will see that (1) it is true, and yet (2) Lucas cannot consistently assert it. So Lucas is also "incomplete" with respect to truths about the world. The way in which he mirrors the world in his brain structures prevents him from simultaneously being "consistent" and asserting that true sentence. But Lucas is no more vulnerable than any of us. He is just on a par with a sophisticated formal system.
 
An amusing way to see the incorrectness of Lucas' argument is to translate it into a battle between men and women ... In his wanderings, Loocus the Thinker one day comes across an unknown object-a woman. Such a thing he has never seen before, and at first he is wondrous thrilled at her likeness to himself; but then, slightly scared of her as well, he cries to all the men about him, "Behold! I can look upon her face, which is something she cannot do-therefore women can never be like me!" And thus he proves man's superiority over women, much to his relief, and that of his male companions. Incidentally, the same argument proves that Loocus is superior to all other males, as well-but he doesn't point that out to them. The woman argues back: "Yes, you can see my face, which is something I can't do-but I can see your face, which is something you can't do! We're even." However, Loocus comes up with an unexpected counter: "I'm sorry, you're deluded if you think you can see my face. What you women do is not the same as what we men do-it is, as I have already pointed out, of an inferior caliber, and does not deserve to be called by the same name. You may call it 'womanseeing'. Now the fact that you can 'womansee' my face is of no import, because the situation is not symmetric. You see?" "I womansee," womanreplies the woman, and womanwalks away ...
 
Well, this is the kind of "heads-in-the-sand" argument which you have to be willing to stomach if you are bent on seeing men and women running ahead of computers in these intellectual battles.
 
Self-Transcendence-A Modern Myth
 
It is still of great interest to ponder whether we humans ever can jump out of ourselves-or whether computer programs can jump out of themselves. Certainly it is possible for a program to modify itself-but such modifiability has to be inherent in the program to start with, so that cannot be counted as an example of "jumping out of the system". No matter how a program twists and turns to get out of itself, it is still following the rules inherent in itself. It is no more possible for it to escape than it is for a human being to decide voluntarily not to obey the laws of physics. Physics is an overriding system, from which there can be no escape. However, there is a lesser ambition which it is possible to achieve: that is, one can certainly jump from a subsystem of one's brain into a wider subsystem. One can step out of ruts on occasion. This is still due to the interaction of various subsystems of one's brain, but it can feel very much like stepping entirely out of oneself. Similarly, it is entirely conceivable that a partial ability to "step outside of itself" could be embodied in a computer program.
 
However, it is important to see the distinction between perceiving oneself, and transcending oneself. You can gain visions of yourself in all sorts of ways-in a mirror, in photos or movies, on tape, through the descriptions of others, by getting psychoanalyzed, and so on. But you cannot quite break out of your own skin and be on the outside of yourself (modern occult movements, pop psychology fads, etc. notwithstanding). TNT can talk about itself, but it cannot jump out of itself. A computer program can modify itself but it cannot violate its own instructions-it can at best change some parts of itself by obeying its own instructions. This is reminiscent of the humorous paradoxical question, "Can God make a stone so heavy that he can't lift it?"
 
Advertisement and Framing Devices
 
This drive to jump out of the system is a pervasive one, and lies behind all progress in art, music, and other human endeavors. It also lies behind such trivial undertakings as the making of radio and television commercials. This insidious trend has been beautifully perceived and described by Erving Goffman in his book Frame Analysis:

    For example, an obviously professional actor completes a commercial pitch and, with the camera still on him, turns in obvious relief from his task, now to take real pleasure in consuming the product he had been advertising.

    This is, of course, but one example of the way in which TV and radio commercials are coming to exploit framing devices to give an appearance of naturalness that (it is hoped) will override the reserve auditors have developed. Thus, use is currently being made of children's voices, presumably because these seem unschooled; street noises, and other effects to give the impression of interviews with unpaid respondents; false starts, filled pauses, byplays, and overlapping speech to simulate actual conversation; and, following Welles, the interception of a firm's jingle commercials to give news of its new product, alternating occasionally with interception by a public interest spot, this presumably keeping the faith of the auditor alive.

    The more that auditors withdraw to minor expressive details as a test of genuineness, the more that advertisers chase after them. What results is a sort of interaction pollution, a disorder that is also spread by the public relations consultants of political figures, and, more modestly, by micro-sociology.

Here we have yet another example of an escalating "TC-battle"-the antagonists this time being Truth and Commercialism.
 
 
Simplicio, Salviati, Sagredo: Why Three?
 
There is a fascinating connection between the problem of jumping out of the system and the quest for complete objectivity. When I read Jauch's four Dialogues in Are Quanta Real?, based on Galileo's four Dialogues Concerning Two New Sciences, I found myself wondering why there were three characters participating: Simplicio, Salviati, and Sagredo. Why wouldn't two have sufficed: Simplicio, the educated simpleton, and Salviati, the knowledgeable thinker? What function does Sagredo have? Well, he is supposed to be a sort of neutral third party, dispassionately weighing the two sides and coming out with a "fair" and "impartial" judgment. It sounds very balanced, and yet there is a problem: Sagredo is always agreeing with Salviati, not with Simplicio. How come Objectivity Personified is playing favorites? One answer, of course, is that Salviati is enunciating correct views, so Sagredo has no choice. But what, then, of fairness or "equal time"?
 
By adding Sagredo, Galileo (and Jauch) stacked the deck more against Simplicio, rather than less. Perhaps there should be added a yet higher-level Sagredo-someone who will be objective about this whole situation ... You can see where it is going. We are getting into a never-ending series of "escalations in objectivity", which have the curious property of never getting any more objective than at the first level: where Salviati is simply right, and Simplicio wrong. So the puzzle remains: why add Sagredo at all? And the answer is, it gives the illusion of stepping out of the system, in some intuitively appealing sense.
 
Zen and "Stepping Out"
 
In Zen, too, we can see this preoccupation with the concept of transcending the system. Consider, for instance, the koan in which Tozan tells his monks that "the higher Buddhism is not Buddha". Perhaps self-transcendence is even the central theme of Zen. A Zen person is always trying to understand more deeply what he is, by stepping more and more out of what he sees himself to be, by breaking every rule and convention which he perceives himself to be chained by-needless to say, including those of Zen itself. Somewhere along this elusive path may come enlightenment. In any case (as I see it), the hope is that by gradually deepening one's self-awareness, by gradually widening the scope of "the system", one will in the end come to a feeling of being at one with the entire universe.
 
CHAPTER XVI

Self-Ref and Self-Rep
 
IN THIS CHAPTER, we will look at some of the mechanisms which create self-reference in various contexts, and compare them to the mechanisms which allow some kinds of systems to reproduce themselves. Some remarkable and beautiful parallels between these mechanisms will come to light.
 
Implicitly and Explicitly Self-Referential Sentences
 
To begin with, let us look at sentences which, at first glance, may seem to provide the simplest examples of self-reference. Some such sentences are these:

    (1) This sentence contains five words.
    (2) This sentence is meaningless because it is self-referential.
    (3) This sentence no verb.
    (4) This sentence is false. (Epimenides paradox)
    (5) The sentence I am now writing is the sentence you are now reading.

All but the last one (which is an anomaly) involve the simple-seeming mechanism contained in the phrase "this sentence". But that mechanism is in reality far from simple. All of these sentences are "floating" in the context of the English language. They can be compared to icebergs, whose tips only are visible. The word sequences are the tips of the icebergs, and the processing which must be done to understand them is the hidden part. In this sense their meaning is implicit, not explicit. Of course, no sentence's meaning is completely explicit, but the more explicit the self-reference is, the more exposed will be the mechanisms underlying it. In this case, for the self-reference of the sentences above to be recognized, not only has one to be comfortable with a language such as English which can deal with linguistic subject matter, but also one has to be able to figure out the referent of the phrase "this sentence". It seems simple, but it depends on our very complex yet totally assimilated ability to handle English. What is especially important here is the ability to figure out the referent of a noun phrase with a demonstrative adjective in it. This ability is built up slowly, and should by no means be considered trivial. The difficulty is perhaps underlined when a sentence such as number 4 is presented to someone naive about paradoxes and linguistic tricks, such as a child. They may say, "What sentence is false?" and it may take a bit of persistence to get across the idea that the sentence is talking about itself. The whole idea is a little mind-boggling at first. A couple of pictures may help (Figs. 83, 84). Figure 83 is a picture which can be interpreted on two levels. On one level, it is a sentence pointing at itself; on the other level, it is a picture of Epimenides executing his own death sentence.
 
Figure 84, showing visible and invisible portions of the iceberg, suggests the relative proportion of sentence to processing required for the recognition of self-reference.
 
FIGURE 84.
 
It is amusing to try to create a self-referring sentence without using the trick of saying "this sentence". One could try to quote a sentence inside itself. Here is an attempt:

    The sentence "The sentence contains five words" contains five words.

But such an attempt must fail, for any sentence that could be quoted entirely inside itself would have to be shorter than itself. This is actually possible, but only if you are willing to entertain infinitely long sentences, such as:
 
    The sentence
    "The sentence
    "The sentence
    "The sentence
     . . .
    is infinitely long"
    is infinitely long"
    is infinitely long"
    is infinitely long.
 
But this cannot work for finite sentences. For the same reason, Gödel's string G could not contain the explicit numeral for its Gödel number: it would not fit. No string of TNT can contain the TNT-numeral for its own Gödel number, for that numeral always contains more symbols than the string itself does. But you can get around this by having G contain a description of its own Gödel number, by means of the notions of "sub" and "arithmoquining".
 
One way of achieving self-reference in an English sentence by means of description instead of by self-quoting or using the phrase "this sentence" is the Quine method, illustrated in the dialogue Air on G's String. The understanding of the Quine sentence requires less subtle mental processing than the four examples cited earlier. Although it may appear at first to be trickier, it is in some ways more explicit. The Quine construction is quite like the Gödel construction, in the way that it creates self-reference by describing another typographical entity which, as it turns out, is isomorphic to the Quine sentence itself. The description of the new typographical entity is carried out by two parts of the Quine sentence. One part is a set of instructions telling how to build a certain phrase, while the other part contains the construction materials to be used; that is, the other part is a template. This resembles a floating cake of soap more than it resembles an iceberg (see Fig. 85).
 
FIGURE 85.
 
The self-reference of this sentence is achieved in a more direct way than in the Epimenides paradox; less hidden processing is needed. By the way, it is interesting to point out that the phrase "this sentence" appears in the previous sentence; yet it is not there to cause self-reference; you probably understood that its referent was the Quine sentence, rather than the sentence in which it occurs. This just goes to show how pointer phrases such as "this sentence" are interpreted according to context, and helps to show that the processing of such phrases is indeed quite involved.
 
A Self-Reproducing Program
 
The notion of quining, and its usage in creating self-reference, have already been explained inside the Dialogue itself, so we need not dwell on such matters here. Let us instead show how a computer program can use precisely the same technique to reproduce itself. The following self-reproducing program is written in a BlooP-like language and is based on following a phrase by its own quotation (the opposite order from quining, so I reverse the name "quine" to make "eniuq"):

    DEFINE PROCEDURE "ENIUQ" [TEMPLATE]: PRINT [TEMPLATE, LEFT-BRACKET,
    QUOTE-MARK, TEMPLATE, QUOTE-MARK, RIGHT-BRACKET, PERIOD].
    ENIUQ
    ['DEFINE PROCEDURE "ENIUQ" [TEMPLATE]: PRINT [TEMPLATE, LEFT-BRACKET,
    QUOTE-MARK, TEMPLATE, QUOTE-MARK, RIGHT-BRACKET, PERIOD].
    ENIUQ'].

ENIUQ is a procedure defined in the first two lines, and its input is called "TEMPLATE". It is understood that when the procedure is called, TEMPLATE's value will be some string of typographical characters. The effect of ENIUQ is to carry out a printing operation, in which TEMPLATE gets printed twice: the first time just plain; the second time wrapped in (single) quotes and brackets, and garnished with a final period. Thus, if TEMPLATE's value were the string DOUBLE-BUBBLE, then performing ENIUQ on it would yield:
 
DOUBLE-BUBBLE ['DOUBLE-BUBBLE'].
 
Now in the last four lines of the program above, the procedure ENIUQ is called with a specific value of TEMPLATE-namely the long string inside the single quotes: DEFINE ... ENIUQ. That value has been carefully chosen; it consists of the definition of ENIUQ, followed by the word ENIUQ. This makes the program itself-or, if you prefer, a perfect copy of it-get printed out. It is very similar to Quine's version of the Epimenides sentence:

    "yields falsehood when preceded by its quotation"
    yields falsehood when preceded by its quotation.

It is very important to realize that the character string which appears in quotes in the last three lines of the program above-that is, the value of TEMPLATE-is never interpreted as a sequence of instructions. That it happens to be one is, in a sense, just an accident. As was pointed out above, it could just as well have been DOUBLE-BUBBLE or any other string of characters. The beauty of the scheme is that when the same string appears in the top two lines of this program, it is treated as a program (because it is not in quotes). Thus in this program, one string functions in two ways: first as program, and second as data. This is the secret of self-reproducing programs, and, as we shall see, of self-reproducing molecules. It is useful, incidentally, to call any kind of self-reproducing object or entity a self-rep; and likewise to call any self-referring object or entity a self-ref. I will use those terms occasionally from here on.
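The same one-string-two-roles trick can be carried out in a real programming language. Below is a minimal sketch in Python; this is the classic Python quine, not a program from this book, but it splits into exactly the two parts described above: the string is the construction material (template), and the final line is the set of instructions.

```python
# The classic Python quine - the same idea as ENIUQ: one string
# serves both as data (the template) and as program text.
source = 'source = %r\nprint(source %% source)'
# %r inserts a quoted copy of the template; %% collapses to a single %.
print(source % source)
```

Running it prints an exact copy of its own two code lines (comments aside), just as ENIUQ, fed its own definition as TEMPLATE, prints the whole ENIUQ program.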
 
The preceding program is an elegant example of a self-reproducing program written in a language which was not designed to make the writing of self-reps particularly easy. Thus, the task had to be carried out using those notions and operations which were assumed to be part of the language-such as the word QUOTE-MARK, and the command PRINT. But suppose a language were designed expressly for making self-reps easy to write. Then one could write much shorter self-reps. For example, suppose that the operation of eniuq-ing were a built-in feature of the language, needing no explicit definition (as we assumed PRINT was). Then a teeny self-rep would be this:
 
ENIUQ ['ENIUQ']
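We can fake such a built-in in Python (the name "eniuq" is from the text; the function body is my own guess at its behavior): once the language predefines eniuq, the entire "program" in that language is just the one-line call, and its output is that same line.

```python
# Pretend "eniuq" is a language built-in: it yields its argument
# followed by a bracketed, quoted copy of that same argument.
def eniuq(template):
    return template + " ['" + template + "']"

# In a language where eniuq is built in, this one call is a full self-rep:
print(eniuq('ENIUQ'))  # prints: ENIUQ ['ENIUQ']
```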
 
It is very similar to the Tortoise's version of Quine's version of the Epimenides self-ref, where the verb "to quine" is assumed to be known:

    "yields falsehood when quined" yields falsehood when quined.

But self-reps can be even shorter. For instance, in some computer language it might be a convention that any program whose first symbol is an asterisk is to be copied before being executed normally. Then the program consisting of merely one asterisk is a self-rep! You may complain that this is silly and depends on a totally arbitrary convention. In doing so, you are echoing my earlier point that it is almost cheating to use the phrase "this sentence" to achieve self-reference-it relies too much on the processor, and not enough on explicit directions for self-reference. Using an asterisk as an example of a self-rep is like using the word "I" as an example of a self-ref: both conceal all the interesting aspects of their respective problems.
 
This is reminiscent of another curious type of self-reproduction: via photocopy machine. It might be claimed that any written document is a self-rep, because it can cause a copy of itself to be printed when it is placed in a photocopy machine and the appropriate button is pushed. But somehow this violates our notion of self-reproduction; the piece of paper is not consulted at all, and is therefore not directing its own reproduction. Again, everything is in the processor. Before we call something a self-rep, we want to have the feeling that, to the maximum extent possible, it explicitly contains the directions for copying itself.
 
To be sure, explicitness is a matter of degree; nonetheless there is an intuitive borderline on one side of which we perceive true self-directed self-reproduction, and on the other side of which we merely see copying being carried out by an inflexible and autonomous copying machine.
 
What Is a Copy?
 
Now in any discussion of self-refs and self-reps, one must sooner or later come to grips with the essential issue: what is a copy? We already dealt with that question quite seriously in Chapters V and VI; and now we come back to it. To give the flavor of the issue, let us describe some highly fanciful, yet plausible, examples of self-reps.
 
A Self-Reproducing Song
 
Imagine that there is a nickelodeon in the local bar which, if you press buttons 11-U, will play a song whose lyrics go this way:

    Put another nickel in, in the nickelodeon,
    All I want is 11-U, and music, music, music.

We could make a little diagram of what happens one evening (Fig. 86).

FIGURE 86. A self-reproducing song.

Although the effect is that the song reproduces itself, it would feel strange to call the song a self-rep, because of the fact that when it passes through the 11-U stage, not all of the information is there. The information only gets put back by virtue of the fact that it is fully stored in the nickelodeon-that is, in one of the arrows in the diagram, not in one of the ovals. It is questionable whether this song contains a complete description of how to get itself played again, because the symbol pair "11-U" is only a trigger, not a copy.
 
A "Crab" Program
 
Consider next a computer program which prints itself out backwards. (Some readers might enjoy thinking about how to write such a program in the BlooP-like language above, using the given self-rep as a model.) Would this funny program count as a self-rep? Yes, in a way, because a trivial transformation performed on its output will restore the original program. It seems fair to say that the output contains the same information as the program itself, just recast in a simple way. Yet it is clear that someone might look at the output and not recognize it as a program printed backwards. To recall terminology from Chapter VI, we could say that the "inner messages" of the output and the program itself are the same, but they have different "outer messages"-that is, they must be read by using different decoding mechanisms. Now if one counts the outer message as part of the information-which seems quite reasonable-then the total information is not the same after all, so the program can't be counted as a self-rep.
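For readers who want something concrete to compare their answer against, here is one way to build such a "crab" program in Python rather than in the BlooP-like language (my own sketch, not the book's): the program reconstructs its own source text via the quining trick and then prints it reversed, so reversing the output restores the program.

```python
# A "crab" self-rep sketch: print the program's own source backwards.
crab = 'crab = %r\nprint((crab %% crab)[::-1])'
# (crab % crab) reconstructs the program's text; [::-1] reverses it.
print((crab % crab)[::-1])
```

Reading the output right-to-left gives back the two code lines exactly, which is the "trivial transformation" the paragraph mentions.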
 
However, this is a disquieting conclusion, because we are accustomed to considering something and its mirror image as containing the same information. But recall that in Chapter VI, we made the concept of "intrinsic meaning" dependent on a hypothesized universal notion of intelligence. The idea was that, in determining the intrinsic meaning of an object, we could disregard some types of outer message-those which would be universally understood. That is, if the decoding mechanism seems fundamental enough, in some still ill-defined sense, then the inner message which it lets be revealed is the only meaning that counts. In this example, it seems reasonably safe to guess that a "standard intelligence" would consider two mirror images to contain the same information as each other; that is, it would consider the isomorphism between the two to be so trivial as to be ignorable. And thus our intuition that the program is in some sense a fair self-rep is allowed to survive.
 
Epimenides Straddles the Channel
 
Consider now a program which prints itself out, but translated into a different computer language. One might liken this to the following curious version of the Quine version of the Epimenides self-ref:

    "est une expression qui, quand elle est précédée de sa traduction, mise entre guillemets, dans la langue provenant de l'autre côté de la Manche, crée une fausseté" is an expression which, when it is preceded by its translation, placed in quotation marks, into the language originating on the other side of the Channel, yields a falsehood.

You might try to write down the sentence which is described by this weird concoction. (Hint: It is not itself-or at least it is not if "itself" is taken in a naive sense.) If the notion of "self-rep by retrograde motion" (i.e., a program which writes itself out backwards) is reminiscent of a crab canon, the notion of "self-rep by translation" is no less reminiscent of a canon which involves a transposition of the theme into another key.
 
A Program That Prints Out Its Own Gödel Number
 
The idea of printing out a translation instead of an exact copy of the original program may seem pointless. However, if you wanted to write a self-rep program in BlooP or FlooP, you would have to resort to some such device, for in those languages, OUTPUT is always a number, rather than a typographical string. Therefore, you would have to make the program print out its own Gödel number: a very huge integer whose decimal expansion codes for the program, character by character, by using three-digit codons. The program is coming as close as it can to printing itself, within the means available to it: it prints out a copy of itself in another "space", and it is easy to switch back and forth between the space of integers and the space of strings. Thus, the value of OUTPUT is not a mere trigger, like "11-U". Instead, all the information of the original program lies "close to the surface" of the output.
 
Gödelian Self-Reference
 
This comes very close to describing the mechanism of Gödel's self-ref G. After all, that string of TNT contains a description not of itself, but of an integer (the arithmoquinification of u). It just so happens that that integer is an exact "image" of the string G, in the space of natural numbers. Thus, G refers to a translation of itself into another space. We still feel comfortable in calling G a self-referential string, because the isomorphism between the two spaces is so tight that we can consider them to be identical.
 
This isomorphism that mirrors TNT inside the abstract realm of natural numbers can be likened to the quasi-isomorphism that mirrors the real world inside our brains, by means of symbols. The symbols play quasi-isomorphic roles to the objects, and it is thanks to them that we can think. Likewise, the Gödel numbers play isomorphic roles to strings, and it is thanks to them that we can find metamathematical meanings in statements about natural numbers. The amazing, nearly magical, thing about G is that it manages to achieve self-reference despite the fact that the language in which it is written, TNT, seems to offer no hope of referring to its own structures, unlike English, in which it is the easiest thing in the world to discuss the English language.
 
So G is an outstanding example of a self-ref via translation-hardly the most straightforward case. One might also think back to some of the Dialogues, for some of them, too, are self-refs via translation. For instance, take the Sonata for Unaccompanied Achilles. In that Dialogue, several references are made to the Bach Sonatas for unaccompanied violin, and the Tortoise's suggestion of imagining harpsichord accompaniments is particularly interesting. After all, if one applies this idea to the Dialogue itself, one invents lines which the Tortoise is saying; but if one assumes that Achilles' part stands alone (as does the violin), then it is quite wrong to attribute any lines at all to the Tortoise. In any case, here again is a self-ref by means of a mapping which maps Dialogues onto pieces by Bach. And this mapping is left, of course, for the reader to notice. Yet even if the reader does not notice it, the mapping is still there, and the Dialogue is still a self-ref.
 
A Self-Rep by Augmentation
 
We have been likening self-reps to canons. What, then, would be a fair analogue to a canon by augmentation? Here is a possibility: consider a program which contains a dummy loop whose only purpose is to slow up the program. A parameter might tell how often to repeat the loop. A self-rep could be made which prints out a copy of itself, but with the parameter changed, so that when that copy is run, it will run at half the speed of its parent program; and its "daughter" will in turn run at half again the speed, and so on ... None of these programs prints itself out precisely; yet all clearly belong to a single "family".
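Such a family can be sketched in Python (this construction and its names are my own, not the book's): each member carries a delay parameter and a template, runs its dummy loop, and then prints a daughter identical to itself except that the daughter's delay is doubled.

```python
# Each member of the "family" differs only in its delay parameter.
template = ('delay = %d\n'
            'for _ in range(delay): pass  # dummy loop, only slows us down\n'
            'template = %r\n'
            'print(template %% (delay * 2, template))')
delay = 1
for _ in range(delay): pass  # dummy loop, only slows us down
# Print a runnable daughter whose delay is twice our own.
print(template % (delay * 2, template))
```

The daughter's output is in turn a granddaughter with the delay doubled again; no member prints itself exactly, but the template, and hence the "species", is preserved down the whole line.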
 
This is reminiscent of the self-reproduction of living organisms. Clearly, an individual is never identical to either of its parents; why, then, is the act of making young called "self-reproduction"? The answer is that there is a coarse-grained isomorphism between parent and child; it is an isomorphism which preserves the information about species. Thus, what is reproduced is the class, rather than the instance. This is also the case in the recursive picture Gplot, in Chapter V: that is, the mapping between "magnetic butterflies" of various sizes and shapes is coarse-grained; no two are identical, but they all belong to a single "species", and the mapping preserves precisely that fact. In terms of self-replicating programs, this would correspond to a family of programs, all written in "dialects" of a single computer language; each one can write itself out, but slightly modified, so that it comes out in a dialect of its original language.
 
A Kimian Self-Rep
 
Perhaps the sneakiest example of a self-rep is the following: instead of writing a legal expression in the compiler language, you type one of the compiler's own error messages. When the compiler looks at your "program", the first thing it does is get confused, because your "program" is ungrammatical; hence the compiler prints out an error message. All you need to do is arrange that the one it prints out will be the one you typed in. This kind of self-rep, suggested to me by Scott Kim, exploits a different level of the system from the one you would normally approach. Although it may seem frivolous, it may have counterparts in complex systems where self-reps vie against each other for survival, as we shall soon discuss.
 
What Is the Original?
 
Besides the question "What constitutes a copy?", there is another fundamental philosophical question concerning self-reps. That is the obverse side of the coin: "What is the original?" This can best be explained by referring to some examples:
 
 
(1) a program which, when interpreted by some interpreter running on some computer, prints itself out;
 
 
(2) a program which, when interpreted by some interpreter running on some computer, prints itself out along with a complete copy of the interpreter (which, after all, is also a program);
 
 
(3) a program which, when interpreted by some interpreter running on some computer, not only prints itself out along with a complete copy of the interpreter, but also directs a mechanical assembly process in which a second computer, identical to the one on which the interpreter and program are running, is put together.
 
 
It is clear that in (1), the program is the self-rep. But in (3), is it the program which is the self-rep, or the compound system of program plus interpreter, or the union of program, interpreter, and processor?
 
 
Clearly, a self-rep can involve more than just printing itself out. In fact, most of the rest of this Chapter is a discussion of self-reps in which data, program, interpreter, and processor are all extremely intertwined, and in which self-replication involves replicating all of them at once.
 
Typogenetics
 
We are now about to broach one of the most fascinating and profound topics of the twentieth century: the study of "the molecular logic of the living state", to borrow Albert Lehninger's richly evocative phrase. And logic it is, too-but of a sort more complex and beautiful than any a human mind ever imagined. We will come at it from a slightly novel angle: via an artificial solitaire game which I call Typogenetics-short for "Typographical Genetics". In Typogenetics I have tried to capture some ideas of molecular genetics in a typographical system which, on first sight, resembles very much the formal systems exemplified by the MIU-system. Of course, Typogenetics involves many simplifications, and therefore is useful primarily for didactic purposes.
 
 
I should explain immediately that the field of molecular biology is a field in which phenomena on several levels interact, and that Typogenetics is only trying to illustrate phenomena from one or two levels. In particular, purely chemical aspects have been completely avoided-they belong to a level lower than is here dealt with; similarly, all aspects of classical genetics (viz., nonmolecular genetics) have also been avoided-they belong to a level higher than is here dealt with. I have intended in Typogenetics only to give an intuition for those processes centered on the celebrated Central Dogma of Molecular Biology, enunciated by Francis Crick (one of the co-discoverers of the double-helix structure of DNA):

    DNA ⇒ RNA ⇒ proteins.

It is my hope that with this very skeletal model I have constructed, the reader will perceive some simple unifying principles of the field-principles which might otherwise be obscured by the enormously intricate interplay of phenomena at many different levels. What is sacrificed is, of course, strict accuracy; what is gained is, I hope, a little insight.
 
Strands, Bases, Enzymes
 
The game of Typogenetics involves typographical manipulation on sequences of letters. There are four letters involved:

    A C G T

Arbitrary sequences of them are called strands. Thus, some strands are:

    GGGG
    ATTACCA
    CATCATCATCAT

Incidentally, "STRAND" spelled backwards begins with "DNA". This is appropriate since strands, in Typogenetics, play the role of pieces of DNA (which, in real genetics, are often called "strands"). Not only this, but "STRAND" fully spelled out backwards is "DNA RTS", which may be taken as an acronym for "DNA Rapid Transit Service". This, too, is appropriate, for the function of "messenger RNA"-which in Typogenetics is represented by strands as well-is quite well characterized by the phrase "Rapid Transit Service" for DNA, as we shall see later.
 
 
I will sometimes refer to the letters A, C, G, T as bases, and to the positions which they occupy as units. Thus, in the middle strand, there are seven units, in the fourth of which is found the base A.
 
 
If you have a strand, you can operate on it and change it in various ways. You can also produce additional strands, either by copying, or by cutting a strand in two. Some operations lengthen strands, some shorten them, and some leave their length alone.
 
 
Operations come in packets-that is, several to be performed together, in order. Such a packet of operations is a little like a programmed machine which moves up and down the strand doing things to it. These mobile machines are called "typographical enzymes"-enzymes for short. Enzymes operate on strands one unit at a time, and are said to be "bound" to the unit they are operating on at any given moment.
 
 
I will show how some sample enzymes act on particular strands. The first thing to know is that each enzyme likes to start out bound to a particular letter. Thus, there are four kinds of enzyme-those which prefer A, those which prefer C, etc. Given the sequence of operations which an enzyme performs, you can figure out which letter it prefers, but for now I'll just give them without explanation. Here's a sample enzyme, consisting of three operations:

    (1) Delete the unit to which the enzyme is bound (and then bind to the next unit to the right).
    (2) Move one unit to the right.
    (3) Insert a T (to the immediate right of this unit).

This enzyme happens to like to bind to A initially. And here's a sample strand:

    ACA

What happens if our enzyme binds to the left A and begins acting? Step 1 deletes the A, so we are left with CA-and the enzyme is now bound to the C. Step 2 slides the enzyme rightwards, to the A, and Step 3 appends a T onto the end to form the strand CAT. And the enzyme has done its complete duty: it has transformed ACA into CAT.
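
The three steps just traced can be written out as a small function. The representation (a string for the strand, an index for the bound unit) is a convention chosen here for illustration, not an official notation:

```python
def apply_sample_enzyme(strand: str, pos: int) -> str:
    """Apply the three-operation sample enzyme, bound at unit `pos`."""
    # (1) delete the bound unit, then bind to the next unit to the right
    strand = strand[:pos] + strand[pos + 1:]
    if pos >= len(strand):        # moved off the end: the enzyme quits
        return strand
    # (2) move one unit to the right
    pos += 1
    if pos >= len(strand):        # again, falling off the end means quitting
        return strand
    # (3) insert a T to the immediate right of the bound unit
    return strand[:pos + 1] + "T" + strand[pos + 1:]
```

Binding to the left A of ACA yields CAT; binding to the right A just lops off one symbol, as described below.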
 
 
What if it had bound itself to the right A of ACA? It would have deleted that A and moved off the end of the strand. Whenever this happens, the enzyme quits (this is a general principle). So the entire effect would just be to lop off one symbol. Let's see some more examples. Here is another enzyme:

    (1) Search for the nearest pyrimidine to the right of this unit.
    (2) Go into Copy mode.
    (3) Search for the nearest purine to the right of this unit.
    (4) Cut the strand here (viz., to the right of the present unit).

Now this contains the terms "pyrimidine" and "purine". They are easy terms. A and G are called purines, and C and T are called pyrimidines. So searching for a pyrimidine merely means searching for the nearest C or T.
 
Copy Mode and Double Strands
 
The other new term is Copy mode. Any strand can be "copied" onto another strand, but in a funny way. Instead of copying A onto A, you copy it onto T, and vice versa. And instead of copying C onto C, you copy it onto G, and vice versa. Note that a purine copies onto a pyrimidine, and vice versa. This is called complementary base pairing. The complements are shown below:

    purines       pyrimidines
       A    <==>     T
       G    <==>     C
 
 
You can perhaps remember this molecular pairing scheme by recalling that Achilles is paired with the Tortoise, and the Crab with his Genes.
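
The pairing scheme can be captured in a two-line helper (a sketch, using a plain dictionary for the four complements):

```python
# Complementary base pairing in Typogenetics: A pairs with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """The strand that would be manufactured "above" the given one."""
    return "".join(COMPLEMENT[base] for base in strand)
```

Note that complementing twice gives back the original strand.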
 
 
When "copying" a strand, therefore, you don't actually copy it, but you manufacture its complementary strand. And this one will be written upside down above the original strand. Let's see this in concrete terms. Let the previous enzyme act on the following strand (and that enzyme also happens to like to start at A):

    CAAAGAGAATCCTCTTTGAT

There are many places it could start. Let's take the second A, for example. The enzyme binds to it, then executes step 1: Search for the nearest pyrimidine to the right. Well, this means a C or a T. The first one is a T somewhere near the middle of the strand, so that's where we go. Now step 2: Copy mode. Well, we just put an upside-down A above our T. But that's not all, for Copy mode remains in effect until it is shut off-or until the enzyme is done, whichever comes first. This means that every base which is passed through by the enzyme while Copy mode is on will get a complementary base put above it. Step 3 says to look for a purine to the right of our T. That is the G two symbols in from the right-hand end. Now as we move up to that G, we must "copy"-that is, create a complementary strand. Here's what that gives (the complementary bases, written upside down in the original, are shown right side up here, in place):

             AGGAGAAAC
    CAAAGAGAATCCTCTTTGAT

The last step is to cut the strand. This will yield two pieces:

             AGGAGAAAC
    CAAAGAGAATCCTCTTTG    and    AT.

And the instruction packet is done. We are left with a double strand, however. Whenever this happens, we separate the two complementary strands from each other (general principle); so in fact our end product is a set of three strands:

    AT, CAAAGAGGA, and CAAAGAGAATCCTCTTTG.

Notice that the upside-down strand has been turned right side up, and thereby right and left have been reversed.
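
A partial interpreter, just big enough for this example, can be sketched as follows. It handles only the rpy/rpu searches, Copy mode, and cut; delete, insert, and strand-switching are omitted, and the representation choices (a dict of complementary bases written "above" the strand, results returned as a sorted list) are illustrative:

```python
COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}
PURINES, PYRIMIDINES = "AG", "CT"

def run_enzyme(strand, pos, enzyme):
    """Apply a (partial) enzyme bound at `pos`; return resulting strands."""
    upper = {}            # position -> complementary base written above it
    copy = False
    pieces = []           # strands split off by "cut"
    for op in enzyme:
        if op == "cop":   # turn Copy mode on: complement the bound unit
            copy = True
            upper[pos] = COMP[strand[pos]]
        elif op == "off":
            copy = False
        elif op in ("rpy", "rpu"):
            targets = PYRIMIDINES if op == "rpy" else PURINES
            while True:
                pos += 1
                if pos >= len(strand):
                    break
                if copy:  # every base passed through gets a complement
                    upper[pos] = COMP[strand[pos]]
                if strand[pos] in targets:
                    break
            if pos >= len(strand):
                break     # enzyme moved off the end: it quits
        elif op == "cut": # cut to the right of the bound unit
            pieces.append(strand[pos + 1:])
            strand = strand[:pos + 1]
    results = [strand] + pieces
    if upper:             # separate the complementary strand; flipping it
        lo, hi = min(upper), max(upper)   # reverses left and right
        results.append("".join(upper[i] for i in range(hi, lo - 1, -1)))
    return sorted(results)
```

Calling run_enzyme("CAAAGAGAATCCTCTTTGAT", 2, ["rpy", "cop", "rpu", "cut"]) reproduces the three strands of the worked example.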
 
 
Now you have seen most of the typographical operations which can be carried out on strands. There are two other instructions which should be mentioned. One shuts off Copy mode; the other switches the enzyme from a strand to the upside-down strand above it. When this happens, if you keep the paper right side up, then you must switch "left" and "right" in all the instructions. Or better, you can keep the wording and just turn the paper around so the top strand becomes legible. If the "switch" command is given, but there is no complementary base where the enzyme is bound at that instant, then the enzyme just detaches itself from the strand, and its job is done.
 
 
It should be mentioned that when a "cut" instruction is encountered, this pertains to both strands (if there are two); however, "delete" pertains only to the strand on which the enzyme is working. If Copy mode is on, then the "insert" command pertains to both strands-the base itself into the strand the enzyme is working on, and its complement into the other strand. If Copy mode is off, then the "insert" command pertains only to the one strand, so a blank space must be inserted into the complementary strand.
 
 
And, whenever Copy mode is on, "move" and "search" commands require that one manufacture complementary bases to all bases which the sliding enzyme touches. Incidentally, Copy mode is always off when an enzyme starts to work. If Copy mode is off, and the command "Shut off copy mode" is encountered, nothing happens. Likewise, if Copy mode is already on, and the command "Turn copy mode on" is encountered, then nothing happens.
 
Amino Acids
 
There are fifteen types of command, listed below:

    cut    cut strand(s)
    del    delete a base from strand
    swi    switch enzyme to other strand
    mvr    move one unit to the right
    mvl    move one unit to the left
    cop    turn on Copy mode
    off    turn off Copy mode
    ina    insert A to the right of this unit
    inc    insert C to the right of this unit
    ing    insert G to the right of this unit
    int    insert T to the right of this unit
    rpy    search for the nearest pyrimidine to the right
    rpu    search for the nearest purine to the right
    lpy    search for the nearest pyrimidine to the left
    lpu    search for the nearest purine to the left

Each one has a three-letter abbreviation. We shall refer to the three-letter abbreviations of commands as amino acids. Thus, every enzyme is made up of a sequence of amino acids. Let us write down an arbitrary enzyme:

    rpu - inc - cop - mvr - mvl - swi - lpu - int

and an arbitrary strand:

    TAGATCCAGTCCATCGA
 
 
and see how the enzyme acts on the strand. It so happens that the enzyme binds to G only. Let us bind to the middle G and begin. Search rightwards for a purine (viz., A or G). We (the enzyme) skip over TCC and land on A. Insert a C. Now we have

    TAGATCCAGTCCACTCGA
                 ^

where the arrow points to the unit to which the enzyme is bound. Set Copy mode. This puts an upside-down G above the C. Move right, move left, then switch to the other strand. Here's what we have so far (the complementary bases, written upside down in the original, are shown right side up here, in place):

                 GA
    TAGATCCAGTCCACTCGA

Let's turn it upside down, so that the enzyme is attached to the lower strand. Now we search leftwards for a purine, and find A. Copy mode is on, but the complementary bases are already there, so nothing is added. Finally, we insert a T (in Copy mode), and quit. Our final product is thus two strands:

    ATG, and TAGATCCAGTCCACATCGA.

The old one is of course gone.
 
Translation and the Typogenetic Code
 
Now you might be wondering where the enzymes and strands come from, and how to tell the initial binding-preference of a given enzyme. One way might be just to throw some random strands and some random enzymes together, and see what happens when those enzymes act on those strands and their progeny. This has a similar flavor to the MU-puzzle, where there were some given rules of inference and an axiom, and you just began. The only difference is that here, every time a strand is acted on, its original form is gone forever. In the MU-puzzle, acting on MI to make MIU didn't destroy MI.
 
 
But in Typogenetics, as in real genetics, the scheme is quite a bit trickier. We do begin with some arbitrary strand, somewhat like an axiom in a formal system. But we have, initially, no "rules of inference"-that is, no enzymes. However, we can translate each strand into one or more enzymes! Thus, the strands themselves will dictate the operations which will be performed upon them, and those operations will in turn produce new strands which will dictate further enzymes, etc. etc.! This is mixing levels with a vengeance! Think, for the sake of comparison, how different the MU-puzzle would have been if each new theorem produced could have been turned into a new rule of inference by means of some code.
 
 
How is this "translation" done? It involves a Typogenetic Code by which adjacent pairs of bases-called "duplets"-in a single strand represent different amino acids. There are sixteen possible duplets: AA, AC, AG, AT, CA, CC, etc. And there are fifteen amino acids. The Typogenetic Code is shown in Figure 87.
 
 
FIGURE 87. The Typogenetic Code, by which each duplet in a strand codes for one of fifteen "amino acids" (or a punctuation mark).
 
 
According to the table, the translation of the duplet GC is "inc" ("insert a C"); that of AT is "swi" ("switch strands"); and so on. Therefore it becomes clear that a strand can dictate an enzyme very straightforwardly. For example, the strand

    TAGATCCAGTCCACATCGA

breaks up into duplets as follows:

    TA GA TC CA GT CC AC AT CG A

with the A left over at the end. Its translation into an enzyme is:

    rpy - ina - rpu - mvr - int - mvl - cut - swi - cop.

(Note that the leftover A contributes nothing.)
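
The translation procedure can be sketched directly in code. The duplet table below reconstructs Figure 87 (the figure itself is not reproduced in this text); it is consistent with every translation given here, including the AA punctuation mark introduced later:

```python
# Duplet -> amino acid, after Figure 87; AA is a punctuation mark.
TYPOGENETIC_CODE = {
    "AC": "cut", "AG": "del", "AT": "swi",
    "CA": "mvr", "CC": "mvl", "CG": "cop", "CT": "off",
    "GA": "ina", "GC": "inc", "GG": "ing", "GT": "int",
    "TA": "rpy", "TC": "rpu", "TG": "lpy", "TT": "lpu",
}

def translate(strand):
    """Translate a strand into one or more enzymes (lists of amino acids)."""
    enzymes, gene = [], []
    for i in range(0, len(strand) - 1, 2):   # a leftover odd base is ignored
        duplet = strand[i:i + 2]
        if duplet == "AA":                   # punctuation: end of this gene
            enzymes.append(gene)
            gene = []
        else:
            gene.append(TYPOGENETIC_CODE[duplet])
    enzymes.append(gene)
    return [e for e in enzymes if e]         # drop empty genes
```

For the strand above, translate returns the nine-acid enzyme just listed.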
 
Tertiary Structure of Enzymes
 
What about the little letters 's', 'l', and 'r' in the lower right-hand corner of each box? They are crucial in determining the enzyme's binding-preference, and in a peculiar way. In order to figure out what letter an enzyme likes to bind to, you have to figure out the enzyme's "tertiary structure", which is itself determined by the enzyme's "primary structure". By its primary structure is meant its amino acid sequence. By its tertiary structure is meant the way it likes to "fold up". The point is that enzymes don't like being in straight lines, as we have so far exhibited them. At each internal amino acid (all but the two ends), there is a possibility of a "kink", which is dictated by the letters in the corners. In particular, 'l' and 'r' stand for "left" and "right", and 's' stands for "straight". So let us take our most recent sample enzyme, and let it fold itself up to show its tertiary structure. We will start with the enzyme's primary structure, and move along it from left to right. At each amino acid whose corner-letter is 'l' we'll put a left turn; for those with 'r', we'll put a right turn; and at 's' we'll put no turn. In Figure 88 is shown the two-dimensional conformation for our enzyme.
 
 
FIGURE 88. The tertiary structure of a typoenzyme.
 
 
Note the left-kink at "rpu", the right-kink at "swi", and so on. Notice also that the first segment ("rpy → ina") and the last segment ("swi → cop") are perpendicular. This is the key to the binding-preference. In fact, the relative orientation of the first and last segments of an enzyme's tertiary structure determines the binding-preference of the enzyme. We can always orient the enzyme so that its first segment points to the right. If we do so, then the last segment determines the binding-preference, as shown in Figure 89.
 
 
FIGURE 89. Table of binding-preferences for typoenzymes.
 
 
So in our case, we have an enzyme which likes the letter C. If, in folding up, an enzyme happens to cross itself, that's okay-just think of it as going under or over itself. Notice that all of an enzyme's amino acids play a role in the determination of its tertiary structure.
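
The folding rule can be sketched in code. Both tables below are reconstructions: the turn letter for each amino acid comes from the corner letters of Figure 87, and the direction-to-letter table from Figure 89, neither of which is reproduced in this text. They are, however, consistent with all four binding-preferences the text states explicitly:

```python
# Corner letters from Figure 87 (reconstructed): s = straight, l/r = kink.
TURN = {"cut": "s", "del": "s", "swi": "r", "mvr": "s", "mvl": "s",
        "cop": "r", "off": "l", "ina": "s", "inc": "r", "ing": "r",
        "int": "l", "rpy": "r", "rpu": "l", "lpy": "l", "lpu": "l"}

# Headings as (dx, dy); a left turn is counterclockwise.
LEFT = {(1, 0): (0, 1), (0, 1): (-1, 0), (-1, 0): (0, -1), (0, -1): (1, 0)}
RIGHT = {new: old for old, new in LEFT.items()}

# Direction of the last segment -> preferred letter (after Figure 89).
BINDING = {(1, 0): "A", (0, 1): "C", (-1, 0): "G", (0, -1): "T"}

def binding_preference(enzyme):
    heading = (1, 0)              # orient the first segment to the right
    for acid in enzyme[1:-1]:     # kinks occur only at internal amino acids
        if TURN[acid] == "l":
            heading = LEFT[heading]
        elif TURN[acid] == "r":
            heading = RIGHT[heading]
    return BINDING[heading]       # last segment's direction decides
```

The four enzymes exhibited so far come out as A, A, G, and C, exactly as stated in the text.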
 
Punctuation, Genes, and Ribosomes
 
Now one thing remains to be explained. Why is there a blank in box AA of the Typogenetic Code? The answer is that the duplet AA acts as a punctuation mark inside a strand, and it signals the end of the code for an enzyme. That is to say, one strand may code for two or more enzymes if it has one or more duplets AA in it. For example, the strand

    CG GA TA CT AA AC CG A

codes for two enzymes:

    cop - ina - rpy - off    and    cut - cop

with the AA serving to divide the strand up into two "genes". The definition of gene is: that portion of a strand which codes for a single enzyme. Note that the mere presence of AA inside a strand does not mean that the strand codes for two enzymes. For instance, CAAG codes for "mvr - del". The AA begins on an even-numbered unit and therefore is not read as a duplet.
 
 
The mechanism which reads strands and produces the enzymes which are coded inside them is called a ribosome. (In Typogenetics, the player of the game does the work of the ribosomes.) Ribosomes are not in any way responsible for the tertiary structure of enzymes, for that is entirely determined once the primary structure is created. Incidentally, the process of translation always goes from strands to enzymes, and never in the reverse direction.
 
Puzzle: A Typogenetical Self-Rep
 
Now that the rules of Typogenetics have been fully set out, you may find it interesting to experiment with the game. In particular, it would be most interesting to devise a self-replicating strand. This would mean something along the following lines. A single strand is written down. A ribosome acts on it, to produce any or all of the enzymes which are coded for in the strand. Then those enzymes are brought into contact with the original strand, and allowed to work on it. This yields a set of "daughter strands". The daughter strands themselves pass through the ribosomes, to yield a second generation of enzymes, which act on the daughter strands; and the cycle goes on and on. This can go on for any number of stages; the hope is that eventually, among the strands which are present at some point, there will be found two copies of the original strand (one of the copies may be, in fact, the original strand).
 
The Central Dogma of Typogenetics
 
Typogenetical processes can be represented in skeletal form in a diagram (Fig. 90).
 
 
FIGURE 90. The "Central Dogma of Typogenetics": an example of a "Tangled Hierarchy".
 
 
This diagram illustrates the Central Dogma of Typogenetics. It shows how strands define enzymes (via the Typogenetic Code); and how in turn, enzymes act back on the strands which gave rise to them, yielding new strands. Therefore, the line on the left portrays how old information flows upwards, in the sense that an enzyme is a translation of a strand, and contains therefore the same information as the strand, only in a different form-in particular, in an active form. The line on the right, however, does not show information flowing downwards; instead, it shows how new information gets created: by the shunting of symbols in strands.
 
 
An enzyme in Typogenetics, like a rule of inference in a formal system, blindly shunts symbols in strands without regard to any "meaning" which may lurk in those symbols. So there is a curious mixture of levels here. On the one hand, strands are acted upon, and therefore play the role of data (as is indicated by the arrow on the right); on the other hand, they also dictate the actions which are to be performed on the data, and therefore they play the role of programs (as is indicated by the arrow on the left). It is the player of Typogenetics who acts as interpreter and processor, of course. The two-way street which links "upper" and "lower" levels of Typogenetics shows that, in fact, neither strands nor enzymes can be thought of as being on a higher level than the other. By contrast, a picture of the Central Dogma of the MIU-system looks this way.
 
 
In the MIU-system, there is a clear distinction of levels: rules of inference simply belong to a higher level than strings. Similarly for TNT, and all formal systems.
 
Strange Loops, TNT, and Real Genetics
 
However, we have seen that in TNT, levels are mixed, in another sense. In fact, the distinction between language and metalanguage breaks down: statements about the system get mirrored inside the system. It turns out that if we make a diagram showing the relationship between TNT and its metalanguage, we will produce something which resembles in a remarkable way the diagram which represents the Central Dogma of Molecular Biology. In fact, it is our goal to make this comparison in detail; but to do so, we need to indicate the places where Typogenetics and true genetics coincide, and where they differ. Of course, real genetics is far more complex than Typogenetics-but the "conceptual skeleton" which the reader has acquired in understanding Typogenetics will be very useful as a guide in the labyrinth of true genetics.
 
DNA and Nucleotides
 
We begin by discussing the relationship between "strands", and DNA. The initials "DNA" stand for "deoxyribonucleic acid". The DNA of most cells resides in the cell's nucleus, which is a small area protected by a membrane. Gunther Stent has characterized the nucleus as the "throne room" of the cell, with DNA acting as the ruler. DNA consists of long chains of relatively simple molecules called nucleotides. Each nucleotide is made up of three parts: (1) a phosphate group stripped of one special oxygen atom, whence the prefix "deoxy"; (2) a sugar called "ribose"; and (3) a base. It is the base alone which distinguishes one nucleotide from another; thus it suffices to specify its base to identify a nucleotide. The four types of bases which occur in DNA nucleotides are:

    A: adenine  }
    G: guanine  }  purines

    C: cytosine }
    T: thymine  }  pyrimidines

(Also see Fig. 91.) It is easy to remember which ones are pyrimidines because the first vowel in "cytosine", "thymine", and "pyrimidine" is 'y'. Later, when we talk about RNA, "uracil"-also a pyrimidine-will come in and wreck the pattern, unfortunately. (Note: Letters representing nucleotides in real genetics will not be in the Quadrata font, as they were in Typogenetics.)
 
 
A single strand of DNA thus consists of many nucleotides strung together like a chain of beads. The chemical bond which links a nucleotide to its two neighbors is very strong; such bonds are called covalent bonds, and the "chain of beads" is often called the covalent backbone of DNA.
 
 
Now DNA usually comes in double strands-that is, two single strands which are paired up, nucleotide by nucleotide (see Fig. 92).

FIGURE 91. The four constituent bases of DNA: Adenine, Guanine, Thymine, Cytosine. [From Hanawalt and Haynes, The Chemical Basis of Life (San Francisco: W. H. Freeman, 1973), p. 142.]

FIGURE 92. DNA structure resembles a ladder in which the side pieces consist of alternating units of deoxyribose and phosphate. The rungs are formed by the bases paired in a special way, A with T and G with C, and held together respectively by two and three hydrogen bonds. [From Hanawalt and Haynes, The Chemical Basis of Life, p. 142.]

It is the bases which are responsible for the peculiar kind of pairing which takes place between strands. Each base in one strand faces a complementary base in the other strand, and binds to it. The complements are as in Typogenetics: A pairs up with T, and C with G. Always one purine pairs up with a pyrimidine.
 
 
Compared to the strong covalent bonds along the backbone, the interstrand bonds are quite weak. They are not covalent bonds, but hydrogen bonds. A hydrogen bond arises when two molecular complexes are aligned in such a way that a hydrogen atom which originally belonged to one of them becomes "confused" about which one it belongs to, and it hovers between the two complexes, vacillating as to which one to join. Because the two halves of double-stranded DNA are held together only by hydrogen bonds, they may come apart or be put together relatively easily; and this fact is of great import for the workings of the cell.
 
 
When DNA forms double strands, the two strands curl around each other like twisting vines (Fig. 93). There are exactly ten nucleotide pairs per revolution; in other words, at each nucleotide, the "twist" is 36 degrees. Single-stranded DNA does not exhibit this kind of coiling, for it is a consequence of the base-pairing.
 
 
FIGURE 93. Molecular model of the DNA double helix. [From Vernon M. Ingram, Biosynthesis (Menlo Park, Calif.: W. A. Benjamin, 1972), p. 13.]
 
Messenger RNA and Ribosomes
 
As was mentioned above, in many cells, DNA, the ruler of the cell, dwells in its private "throne room": the nucleus of the cell. But most of the "living" in a cell goes on outside of the nucleus, namely in the cytoplasm-the "ground" to the nucleus' "figure". In particular, enzymes, which make practically every life process go, are manufactured by ribosomes in the cytoplasm, and they do most of their work in the cytoplasm. And just as in Typogenetics, the blueprints for all enzymes are stored inside the strands-that is, inside the DNA, which remains protected in its little nuclear home. So how does the information about enzyme structure get from the nucleus to the ribosomes?
 
 
Here is where messenger RNA-mRNA-comes in. Earlier, mRNA strands were humorously said to constitute a kind of DNA Rapid Transit Service; by this is meant not that mRNA physically carries DNA anywhere, but rather that it serves to carry the information, or message, stored in the DNA in its nuclear chambers, out to the ribosomes in the cytoplasm. How is this done? The idea is easy: a special kind of enzyme inside the nucleus faithfully copies long stretches of the DNA's base sequence onto a new strand-a strand of messenger RNA. This mRNA then departs from the nucleus and wanders out into the cytoplasm, where it runs into many ribosomes which begin doing their enzyme-creating work on it.
 
 
The process by which DNA gets copied onto mRNA inside the nucleus is called transcription; in it, the double-stranded DNA must be temporarily separated into two single strands, one of which serves as a template for the mRNA. Incidentally, "RNA" stands for "ribonucleic acid", and it is very much like DNA except that all of its nucleotides possess that special oxygen atom in the phosphate group which DNA's nucleotides lack. Therefore the "deoxy" prefix is dropped. Also, instead of thymine, RNA uses the base uracil, so the information in strands of RNA can be represented by arbitrary sequences of the four letters 'A', 'C', 'G', 'U'. Now when mRNA is transcribed off of DNA, the transcription process operates via the usual base-pairing (except with U instead of T), so that a DNA-template and its mRNA-mate might look something like this:

    DNA:  ........CGTAAATCAAGTCA........    (template)
    mRNA: ........GCAUUUAGUUCAGU........    ("copy")

RNA does not generally form long double strands with itself, although it can. Therefore it is prevalently found not in the helical form which so characterizes DNA, but rather in long, somewhat randomly curving strands.
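
The pairing used in transcription can be sketched just as the Typogenetic complement was, with uracil standing in for thymine (a minimal illustration, not a model of the actual enzymatic machinery):

```python
# Transcription sketch: each DNA template base pairs with its
# complementary RNA base, with U taking T's place.
DNA_TO_MRNA = {"C": "G", "G": "C", "T": "A", "A": "U"}

def transcribe(template: str) -> str:
    return "".join(DNA_TO_MRNA[base] for base in template)
```

Applied to the template above, it yields exactly the mRNA "copy" shown.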
 
 
Once a strand of mRNA has escaped the nucleus, it encounters those strange subcellular creatures called "ribosomes"-but before we go on to explain how a ribosome uses mRNA, I want to make some comments about enzymes and proteins. Enzymes belong to the general category of biomolecules called proteins, and the job of ribosomes is to make all proteins, not just enzymes. Proteins which are not enzymes are much more passive kinds of beings; many of them, for instance, are structural molecules, which means that they are like girders and beams and so forth in buildings: they hold the cell's parts together. There are other kinds of proteins, but for our purposes, the principal proteins are enzymes, and I will henceforth not make a sharp distinction.
 
Amino Acids
 
Proteins are composed of sequences of amino acids, which come in twenty primary varieties, each with a three-letter abbreviation:

    ala   alanine
    arg   arginine
    asn   asparagine
    asp   aspartic acid
    cys   cysteine
    gln   glutamine
    glu   glutamic acid
    gly   glycine
    his   histidine
    ile   isoleucine
    leu   leucine
    lys   lysine
    met   methionine
    phe   phenylalanine
    pro   proline
    ser   serine
    thr   threonine
    trp   tryptophan
    tyr   tyrosine
    val   valine

Notice the slight numerical discrepancy with Typogenetics, where we had only fifteen "amino acids" composing enzymes. An amino acid is a small molecule of roughly the same complexity as a nucleotide; hence the building blocks of proteins and of nucleic acids (DNA, RNA) are roughly of the same size. However, proteins are composed of much shorter sequences of components: typically, about three hundred amino acids make a complete protein, whereas a strand of DNA can consist of hundreds of thousands or millions of nucleotides.
 
561
 
Ribosomes and Tape Recorders
 
561
 
Now when a strand of mRNA, after its escape into the cytoplasm, encounters a ribosome, a very intricate and beautiful process called translation takes place. It could be said that this process of translation is at the very heart of all of life, and there are many mysteries connected with it. But in essence it is easy to describe. Let us first give a picturesque image, and then render it more precise. Imagine the mRNA to be like a long piece of magnetic recording tape, and the ribosome to be like a tape recorder. As the tape passes through the playing head of the recorder, it is "read" and converted into music, or other sounds. Thus magnetic markings are "translated" into notes. Similarly, when a "tape" of mRNA passes through the "playing head" of a ribosome, the "notes" which are produced are amino acids, and the "pieces of music" which they make up are proteins. This is what translation is all about; it is shown in Figure 96.
 
562
 
The Genetic Code
 
562
 
But how can a ribosome produce a chain of amino acids when it is reading a chain of nucleotides? This mystery was solved in the early 1960's by the efforts of a large number of people, and at the core of the answer lies the Genetic Code-a mapping from triplets of nucleotides into amino acids (see Fig. 94). This is in spirit extremely similar to the Typogenetic Code, except that here, three consecutive bases (or nucleotides) form a codon, whereas there, only two were needed. Thus there must be 4 x 4 x 4 (= 64) different entries in the table, instead of sixteen. A ribosome clicks down a strand of RNA three nucleotides at a time-which is to say, one codon at a time-and each time it does so, it appends a single new amino acid to the protein it is presently manufacturing. Thus, a protein comes out of the ribosome amino acid by amino acid.

[Marginal figure: a typical segment of mRNA, CUA GAU, read first as two triplets (above), and second as three duplets, CU AG AU (below): an example of hemiolia in biochemistry.]
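The ribosome's codon-at-a-time loop can be caricatured in code. This is a sketch under obvious simplifications: the dictionary below holds only a handful of the 64 Genetic Code entries (these particular codon-to-amino-acid pairs are standard, but the table is deliberately incomplete), and the function name `translate` is invented for illustration:

```python
# A ribosome's core loop, caricatured: read one codon (triplet) at a
# time and append the corresponding amino acid, stopping at a
# punctuation codon. Only a few of the 64 entries are included.
GENETIC_CODE = {
    'AUG': 'met', 'UUU': 'phe', 'CUA': 'leu', 'GAU': 'asp',
    'GGC': 'gly', 'UAA': 'STOP', 'UAG': 'STOP', 'UGA': 'STOP',
}

def translate(mrna: str) -> list[str]:
    protein = []
    for i in range(0, len(mrna) - 2, 3):   # one codon at a time
        residue = GENETIC_CODE[mrna[i:i+3]]
        if residue == 'STOP':              # punctuation, not an amino acid
            break
        protein.append(residue)
    return protein

print(translate("AUGUUUCUAGAUUAA"))  # -> ['met', 'phe', 'leu', 'asp']
```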
 
562
 
Tertiary Structure
 
562
 
However, as a protein emerges from a ribosome, it is not only getting longer and longer, but it is also continually folding itself up into an extraordinary three-dimensional shape, very much in the way that those funny little Fourth-of-July fireworks called "snakes" simultaneously grow longer and curl up, when they are lit. This fancy shape is called the protein's tertiary structure (Fig. 95), while the amino acid sequence per se is called the primary structure of the protein. The tertiary structure is implicit in the primary structure, just as in Typogenetics. However, the recipe for deriving the tertiary structure, if you know only the primary structure, is by far more complex than that given in Typogenetics. In fact, it is one of the outstanding problems of contemporary molecular biology to figure out some rules by which the tertiary structure of a protein can be predicted if only its primary structure is known.
 
563
 
FIGURE 94. The Genetic Code, by which each triplet in a strand of messenger RNA codes for one of twenty amino acids (or a punctuation mark).
 
563
 
Reductionistic Explanation of Protein Function
 
563
 
Another discrepancy between Typogenetics and true genetics-and this is probably the most serious one of all-is this: whereas in Typogenetics, each component amino acid of an enzyme is responsible for some specific "piece of the action", in real enzymes, individual amino acids cannot be assigned such clear roles. It is the tertiary structure as a whole which determines the mode in which an enzyme will function; there is no way one can say, "This amino acid's presence means that such-and-such an operation will get performed". In other words, in real genetics, an individual amino acid's contribution to the enzyme's overall function is not "context-free". However, this fact should not be construed in any way as ammunition for an antireductionist argument to the effect that "the whole [enzyme] cannot be explained as the sum of its parts". That would be wholly unjustified. What is justified is rejection of the simpler claim that "each amino acid contributes to the sum in a manner which is independent of the other amino acids present". In other words, the function of a protein cannot be considered to be built up from context-free functions of its parts; rather, one must consider how the parts interact. It is still possible in principle to write a computer program which takes as input the primary structure of a protein,
 
564
 
FIGURE 95. The structure of myoglobin, deduced from high-resolution X-ray data. The large-scale "twisted pipe" appearance is the tertiary structure; the finer helix inside-the "alpha helix"-is the secondary structure. [From A. Lehninger, Biochemistry.]
 
565
 
and firstly determines its tertiary structure, and secondly determines the function of the enzyme. This would be a completely reductionistic explanation of the workings of proteins, but the determination of the "sum" of the parts would require a highly complex algorithm. The elucidation of the function of an enzyme, given its primary, or even its tertiary, structure, is another great problem of contemporary molecular biology.
 
565
 
Perhaps, in the last analysis, the function of the whole enzyme can be considered to be built up from functions of parts in a context-free manner, but where the parts are now considered to be individual particles, such as electrons and protons, rather than "chunks", such as amino acids. This exemplifies the "Reductionist's Dilemma": In order to explain everything in terms of context-free sums, one has to go down to the level of physics; but then the number of particles is so huge as to make it only a theoretical "in-principle" kind of thing. So, one has to settle for a context-dependent sum, which has two disadvantages. The first is that the parts are much larger units, whose behavior is describable only on a high level, and therefore indeterminately. The second is that the word "sum" carries the connotation that each part can be assigned a simple function and that the function of the whole is just a context-free sum of those individual functions. This just cannot be done when one tries to explain a whole enzyme's function, given its amino acids as parts. But for better or for worse, this is a general phenomenon which arises in the explanations of complex systems. In order to acquire an intuitive and manageable understanding of how parts interact-in short, in order to proceed-one often has to sacrifice the exactness yielded by a microscopic, context-free picture, simply because of its unmanageability. But one does not sacrifice at that time the faith that such an explanation exists in principle.
 
565
 
Transfer RNA and Ribosomes
 
565
 
Returning, then, to ribosomes and RNA and proteins, we have stated that a protein is manufactured by a ribosome according to the blueprint carried from the DNA's "royal chambers" by its messenger, RNA. This seems to imply that the ribosome can translate from the language of codons into the language of amino acids, which amounts to saying that the ribosome "knows" the Genetic Code. However, that amount of information is simply not present in a ribosome. So how does it do it? Where is the Genetic Code stored? The curious fact is that the Genetic Code is stored-where else?-in the DNA itself. This certainly calls for some explanation.
 
565
 
Let us back off from a total explanation for a moment, and give a partial explanation. There are, floating about in the cytoplasm at any given moment, large numbers of four-leaf-clover-shaped molecules; loosely fastened (i.e., hydrogen-bonded) to one leaf is an amino acid, and on the opposite leaf there is a triplet of nucleotides called an anticodon. For our purposes, the other two leaves are irrelevant. Here is how these "clovers" are used by the ribosomes in their production of proteins. When a new codon of mRNA clicks its way into the "reading head" of a ribosome, the
 
566
 
FIGURE 96. A section of mRNA passing through a ribosome. Floating nearby are tRNA molecules, carrying amino acids which are stripped off by the ribosome and appended to the growing protein. The Genetic Code is contained in the tRNA molecules, collectively. Note how the base-pairing (A-U, C-G) is represented by interlocking letter-forms in the diagram. [Drawing by Scott E. Kim.]
 
566
 
ribosome reaches out into the cytoplasm and latches onto a clover whose anticodon is complementary to the mRNA codon. Then it pulls the clover into such a position that it can rip off the clover's amino acid, and stick it covalently onto the growing protein. (Incidentally, the bond between an amino acid and its neighbor in a protein is a very strong covalent bond, called a "peptide bond". For this reason, proteins are sometimes called "polypeptides".) Of course it is no accident that the "clovers" carry the proper amino acids, for they have all been manufactured according to precise instructions emanating from the "throne room".
 
567
 
The real name for such a clover is transfer RNA. A molecule of tRNA is quite small-about the size of a very small protein-and consists of a chain of about eighty nucleotides. Like mRNA, tRNA molecules are made by transcription off of the grand cellular template, DNA. However, tRNA's are tiny by comparison with the huge mRNA molecules, which may contain thousands of nucleotides in long, long chains. Also, tRNA's resemble proteins (and are unlike strands of mRNA) in this respect: they have fixed, well-defined tertiary structures-determined by their primary structure. A tRNA molecule's tertiary structure allows precisely one amino acid to bind to its amino-acid site; to be sure, it is that one dictated according to the Genetic Code by the anticodon on the opposite arm. A vivid image of the function of tRNA molecules is as flashcards floating in a cloud around a simultaneous interpreter, who snaps one out of the air-invariably the right one!-whenever he needs to translate a word. In this case, the interpreter is the ribosome, the words are codons, and their translations are amino acids.
 
567
 
In order for the inner message of DNA to get decoded by the ribosomes, the tRNA flashcards must be floating about in the cytoplasm. In some sense, the tRNA's contain the essence of the outer message of the DNA, since they are the keys to the process of translation. But they themselves came from the DNA. Thus, the outer message is trying to be part of the inner message, in a way reminiscent of the message-in-a-bottle which tells what language it is written in. Naturally, no such attempt can be totally successful: there is no way for the DNA to hoist itself by its own bootstraps. Some amount of knowledge of the Genetic Code must already be present in the cell beforehand, to allow the manufacture of those enzymes which transcribe tRNA's themselves off of the master copy of DNA. And this knowledge resides in previously manufactured tRNA molecules. This attempt to obviate the need for any outer message at all is like the Escher dragon, who tries as hard as he can, within the context of the two-dimensional world to which he is constrained, to be three-dimensional. He seems to go a long way-but of course he never makes it, despite the fine imitation he gives of three-dimensionality.
 
567
 
Punctuation and the Reading Frame
 
567
 
How does a ribosome know when a protein is done? Just as in Typogenetics, there is a signal inside the mRNA which indicates the termination or initiation of a protein. In fact, three special codons-UAA, UAG, UGA-act as punctuation marks instead of coding for amino acids. Whenever such a triplet clicks its way into the "reading head" of a ribosome, the ribosome releases the protein under construction and begins a new one.
 
567
 
Recently, the entire genome of the tiniest known virus, ϕX 174, has been laid bare. One most unexpected discovery was made en route: some of its nine genes overlap-that is, two distinct proteins are coded for by the same stretch of DNA! There is even one gene contained entirely inside another!
 
568
 
This is accomplished by having the reading frames of the two genes shifted relative to each other, by exactly one unit. The density of information-packing in such a scheme is incredible. This is, of course, the inspiration behind the strange "5/17 haiku" in Achilles' fortune cookie, in the Canon by Intervallic Augmentation.
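A one-unit shift of the reading frame makes the same nucleotide string decompose into entirely different codons, which is what lets two genes share one stretch of DNA. The sketch below is purely illustrative (the sequence and the helper `codons` are invented, not taken from ϕX 174):

```python
# One nucleotide string, two reading frames shifted by one unit:
# the same letters yield two completely different codon sequences.
sequence = "GAUGCUAGAUCCA"  # invented example sequence

def codons(seq: str, frame: int) -> list[str]:
    """Chop seq into triplets starting at the given frame offset."""
    return [seq[i:i+3] for i in range(frame, len(seq) - 2, 3)]

print(codons(sequence, 0))  # ['GAU', 'GCU', 'AGA', 'UCC']
print(codons(sequence, 1))  # ['AUG', 'CUA', 'GAU', 'CCA']
```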
 
568
 
Recap
 
568
 
In brief, then, this picture emerges: from its central throne, DNA sends off long strands of messenger RNA to the ribosomes in the cytoplasm; and the ribosomes, making use of the "flashcards" of tRNA hovering about them, efficiently construct proteins, amino acid by amino acid, according to the blueprint contained in the mRNA. Only the primary structure of the proteins is dictated by the DNA; but this is enough, for as they emerge from the ribosomes, the proteins "magically" fold up into complex conformations which then have the ability to act as powerful chemical machines.
 
568
 
Levels of Structure and Meaning in Proteins and Music
 
568
 
We have been using this image of ribosome as tape recorder, mRNA as tape, and protein as music. It may seem arbitrary, and yet there are some beautiful parallels. Music is not a mere linear sequence of notes. Our minds perceive pieces of music on a level far higher than that. We chunk notes into phrases, phrases into melodies, melodies into movements, and movements into full pieces. Similarly, proteins only make sense when they act as chunked units. Although a primary structure carries all the information for the tertiary structure to be created, it still "feels" like less, for its potential is only realized when the tertiary structure is actually physically created.
 
568
 
Incidentally, we have been referring only to primary and tertiary structures, and you may well wonder whatever happened to the secondary structure. Indeed, it exists, as does a quaternary structure, as well. The folding-up of a protein occurs at more than one level. Specifically, at some points along the chain of amino acids, there may be a tendency to form a kind of helix, called the alpha helix (not to be confused with the DNA double helix). This helical twisting of a protein is on a lower level than its tertiary structure. This level of structure is visible in Figure 95. Quaternary structure can be directly compared with the building of a musical piece out of independent movements, for it involves the assembly of several distinct polypeptides, already in their full-blown tertiary beauty, into a larger structure. The binding of these independent chains is usually accomplished by hydrogen bonds, rather than covalent bonds; this is of course just as with pieces of music composed of several movements, which are far less tightly bound to each other than they are internally, but which nevertheless form a tight "organic" whole.
 
568
 
The four levels of primary, secondary, tertiary, and quaternary structure can also be compared to the four levels of the MU-picture (Fig. 60) in
 
569
 
FIGURE 97. A polyribosome. A single strand of mRNA passes through one ribosome after another, like one tape passing through several tape recorders in a row. The result is a set of growing proteins in various stages of completion: the analogue to a musical canon produced by the staggered tape recorders. [From A. Lehninger, Biochemistry.]
 
570
 
the Prelude, Ant Fugue. The global structure-consisting of the letters 'M' and 'U'-is its quaternary structure; then each of those two parts has a tertiary structure, consisting of "HOLISM" or "REDUCTIONISM"; and then the opposite word exists on the secondary level, and at bottom, the primary structure is once again the word "MU", over and over again.
 
570
 
Polyribosomes and Two-Tiered Canons
 
570
 
Now we come to another lovely parallel between tape recorders translating tape into music and ribosomes translating mRNA into proteins. Imagine a collection of many tape recorders, arranged in a row, evenly spaced. We might call this array a "polyrecorder". Now imagine a single tape passing serially through the playing heads of all the component recorders. If the tape contains a single long melody, then the output will be a many-voiced canon, of course, with the delay determined by the time it takes the tape to get from one tape recorder to the next. In cells, such "molecular canons" do indeed exist, where many ribosomes, spaced out in long lines-forming what is called a polyribosome-all "play" the same strand of mRNA, producing identical proteins, staggered in time (see Fig. 97).
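The staggered voices of such a molecular canon can be pictured with a tiny snapshot simulation. Everything here is invented for illustration (the codon list, the `polyribosome` helper, and the fixed spacing between ribosomes are all assumptions):

```python
# The "molecular canon": one mRNA strand read by several ribosomes
# at staggered positions, each holding a partial protein.
mrna_codons = ['met', 'gly', 'ser', 'leu', 'ala', 'val']

def polyribosome(codon_seq, n_ribosomes, delay):
    """Snapshot of the partial protein each ribosome has made so far,
    with each ribosome trailing the previous one by `delay` codons."""
    head = len(codon_seq)  # position of the leading ribosome
    return [codon_seq[:max(head - k * delay, 0)]
            for k in range(n_ribosomes)]

for voice in polyribosome(mrna_codons, 3, 2):
    print(voice)
# ['met', 'gly', 'ser', 'leu', 'ala', 'val']
# ['met', 'gly', 'ser', 'leu']
# ['met', 'gly']
```

Each "voice" is the same melody, delayed: the canon structure in miniature.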
 
570
 
Not only this, but nature goes one better. Recall that mRNA is made by transcription off of DNA; the enzymes which are responsible for this process are called RNA polymerases ("-ase" is a general suffix for enzymes). It happens often that a series of RNA polymerases will be at work in parallel on a single strand of DNA, with the result that many separate (but identical) strands of mRNA are being produced, each delayed with respect to the other by the time required for the DNA to slide from one RNA polymerase to the next. At the same time, there can be several different ribosomes working on each of the parallel emerging mRNA's. Thus one arrives at a double-decker, or two-tiered, "molecular canon" (Fig. 98). The corresponding image in music is a rather fanciful but amusing scenario: several
 
570
 
FIGURE 98. Here, an even more complex scheme. Not just one but several strands of mRNA, all emerging by transcription from a single strand of DNA, are acted upon by polyribosomes. The result is a two-tiered molecular canon. [From Hanawalt and Haynes, The Chemical Basis of Life, p. 271.]
 
571
 
different copyists are all at work simultaneously, each one of them copying the same original manuscript from a clef which flutists cannot read into a clef which they can read. As each copyist finishes a page of the original manuscript, he passes it on to the next copyist, and starts transcribing a new page himself. Meanwhile, from each score emerging from the pens of the copyists, a set of flutists are reading and tooting the melody, each flutist delayed with respect to the others who are reading from the same sheet. This rather wild image gives, perhaps, an idea of some of the complexity of the processes which are going on in each and every cell of your body during every second of every day.
 
571
 
Which Came First-The Ribosome or the Protein?
 
571
 
We have been talking about these wonderful beasts called ribosomes; but what are they themselves composed of? How are they made? Ribosomes are composed of two types of things: (1) various kinds of proteins, and (2) another kind of RNA, called ribosomal RNA (rRNA). Thus, in order for a ribosome to be made, certain kinds of proteins must be present, and rRNA must be present. Of course, for proteins to be present, ribosomes must be there to make them. So how do you get around the vicious circle? Which comes first-the ribosome or the protein? Which makes which? Of course there is no answer because one always traces things back to previous members of the same class-just as with the chicken-and-the-egg question-until everything vanishes over the horizon of time. In any case, ribosomes are made of two pieces, a large and a small one, each of which contains some rRNA and some proteins. Ribosomes are about the size of large proteins; they are much much smaller than the strands of mRNA which they take as input, and along which they move.
 
571
 
Protein Function
 
571
 
We have spoken somewhat of the structure of proteins-specifically enzymes; but we have not really mentioned the kinds of tasks which they perform in the cell, nor how they do them. All enzymes are catalysts, which means that in a certain sense, they do no more than selectively accelerate various chemical processes in the cell, rather than make things happen which without them never could happen. An enzyme realizes certain pathways out of the myriad myriad potentialities. Therefore, in choosing which enzymes shall be present, you choose what shall happen and what shall not happen-despite the fact that, theoretically speaking, there is a nonzero probability for any cellular process to happen spontaneously, without the aid of catalysts.
 
571
 
Now how do enzymes act upon the molecules of the cell? As has been mentioned, enzymes are folded-up polypeptide chains. In every enzyme, there is a cleft or pocket or some other clearly-defined surface feature where the enzyme binds to some other kind of molecule. This location is called its active site, and any molecule which gets bound there is called a substrate. Enzymes may have more than one active site, and more than one substrate. Just as in Typogenetics, enzymes are indeed very choosy about what they will operate upon. The active site usually is quite specific, and allows just one kind of molecule to bind to it, although there are sometimes "decoys"-other molecules which can fit in the active site and clog it up, fooling the enzyme and in fact rendering it inactive.
 
572
 
Once an enzyme and its substrate are bound together, there is some disequilibrium of electric charge, and consequently charge-in the form of electrons and protons-flows around the bound molecules and readjusts itself. By the time equilibrium has been reached, some rather profound chemical changes may have occurred to the substrate. Some examples are these: there may have been a "welding", in which some standard small molecule got tacked onto a nucleotide, amino acid, or other common cellular molecule; a DNA strand may have been "nicked" at a particular location; some piece of a molecule may have gotten lopped off; and so forth. In fact, bio-enzymes do operations on molecules which are quite similar to the typographical operations which Typo-enzymes perform. However, most enzymes perform essentially only a single task, rather than a sequence of tasks. There is one other striking difference between Typo-enzymes and bio-enzymes, which is this: whereas Typo-enzymes operate only on strands, bio-enzymes can act on DNA, RNA, other proteins, ribosomes, cell membranes-in short, on anything and everything in the cell. In other words, enzymes are the universal mechanisms for getting things done in the cell. There are enzymes which stick things together and take them apart and modify them and activate them and deactivate them and copy them and repair them and destroy them.
 
572
 
Some of the most complex processes in the cell involve "cascades" in which a single molecule of some type triggers the production of a certain kind of enzyme; the manufacturing process begins and the enzymes which come off the "assembly line" open up a new chemical pathway which allows a second kind of enzyme to be produced. This kind of thing can go on for three or four levels, each newly produced type of enzyme triggering the production of another type. In the end a "shower" of copies of the final type of enzyme is produced, and all of the copies go off and do their specialized thing, which may be to chop up some "foreign" DNA, or to help make some amino acid for which the cell is very "thirsty", or whatever.
 
572
 
Need for a Sufficiently Strong Support System
 
572
 
Let us describe nature's solution to the puzzle posed for Typogenetics: "What kind of strand of DNA can direct its own replication?" Certainly not every strand of DNA is inherently a self-rep. The key point is this: any strand which wishes to direct its own copying must contain directions for assembling precisely those enzymes which can carry out the task. Now it is futile to hope that a strand of DNA in isolation could be a self-rep; for in order for those potential proteins to be pulled out of the DNA, there must not only be ribosomes, but also RNA polymerase, which makes the mRNA that gets transported to the ribosomes. And so we have to begin by assuming a kind of "minimal support system" just sufficiently strong that it allows transcription and translation to be carried out. This minimal support system will thus consist in (1) some proteins, such as RNA polymerase, which allow mRNA to be made from DNA, and (2) some ribosomes.
 
573
 
How DNA Self-Replicates
 
573
 
It is not by any means coincidental that the phrases "sufficiently strong support system" and "sufficiently powerful formal system" sound alike. One is the precondition for a self-rep to arise, the other for a self-ref to arise. In fact there is in essence only one phenomenon going on in two very different guises, and we shall explicitly map this out shortly. But before we do so, let us finish the description of how a strand of DNA can be a self-rep.
 
573
 
The DNA must contain the codes for a set of proteins which will copy it. Now there is a very efficient and elegant way to copy a double-stranded piece of DNA, whose two strands are complementary. This involves two steps:

(1) unravel the two strands from each other;
(2) "mate" a new strand to each of the two new single strands.

This process will create two new double strands of DNA, each identical to the original one. Now if our solution is to be based on this idea, it must involve a set of proteins, coded for in the DNA itself, which will carry out these two steps.
 
573
 
It is believed that in cells, these two steps are performed together in a coordinated way, and that they require three principal enzymes: DNA endonuclease, DNA polymerase, and DNA ligase. The first is an "unzipping enzyme": it peels the two original strands apart for a short distance, and then stops. Then the other two enzymes come into the picture. The DNA polymerase is basically a copy-and-move enzyme: it chugs down the short single strands of DNA, copying them complementarily in a fashion reminiscent of the Copy mode in Typogenetics. In order to copy, it draws on raw materials-specifically nucleotides-which are floating about in the cytoplasm. Because the action proceeds in fits and starts, with some unzipping and some copying each time, some short gaps are created, and the DNA ligase is what plugs them up. The process is repeated over and over again. This precision three-enzyme machine proceeds in careful fashion all the way down the length of the DNA molecule, until the whole thing has been peeled apart and simultaneously replicated, so that there are now two copies of it.
 
574
 
Comparison of DNA's Self-Rep Method with Quining
 
574
 
Note that in the enzymatic action on the DNA strands, the fact that information is stored in the DNA is just plain irrelevant; the enzymes are merely carrying out their symbol-shunting functions, just like rules of inference in the MIU-system. It is of no interest to the three enzymes that at some point they are actually copying the very genes which coded for them. The DNA, to them, is just a template without meaning or interest.
 
574
 
It is quite interesting to compare this with the Quine sentence's method of describing how to construct a copy of itself. There, too, one has a sort of "double strand"-two copies of the same information, where one copy acts as instructions, the other as template. In DNA, the process is vaguely parallel, since the three enzymes (DNA endonuclease, DNA polymerase, DNA ligase) are coded for in just one of the two strands, which therefore acts as program, while the other strand is merely a template. The parallel is not perfect, for when the copying is carried out, both strands are used as template, not just one. Nevertheless, the analogy is highly suggestive. There is a biochemical analogue to the use-mention dichotomy: when DNA is treated as a mere sequence of chemicals to be copied, it is like mention of typographical symbols; when DNA is dictating what operations shall be carried out, it is like use of typographical symbols.
 
574
 
Levels of Meaning of DNA
 
574
 
There are several levels of meaning which can be read from a strand of DNA, depending on how big the chunks are which you look at, and how powerful a decoder you use. On the lowest level, each DNA strand codes for an equivalent RNA strand-the process of decoding being transcription. If one chunks the DNA into triplets, then by using a "genetic decoder", one can read the DNA as a sequence of amino acids. This is translation (on top of transcription). On the next natural level of the hierarchy, DNA is readable as a code for a set of proteins. The physical pulling-out of proteins from genes is called gene expression. Currently, this is the highest level at which we understand what DNA means.
 
574
 
However, there are certain to be higher levels of DNA meaning which are harder to discern. For instance, there is every reason to believe that the DNA of, say, a human being codes for such features as nose shape, music talent, quickness of reflexes, and so on. Could one, in principle, learn to read off such pieces of information directly from a strand of DNA, without going through the actual physical process of epigenesis-the physical pulling-out of phenotype from genotype? Presumably, yes, since-in theory-one could have an incredibly powerful computer program simulating the entire process, including every cell, every protein, every tiny feature involved in the replication of DNA, of cells, to the bitter end. The output of such a pseudo-epigenesis program would be a high-level description of the phenotype.
 
575
 
There is another (extremely faint) possibility: that we could learn to read the phenotype off of the genotype without doing an isomorphic simulation of the physical process of epigenesis, but by finding some simpler sort of decoding mechanism. This could be called "shortcut pseudo-epigenesis". Whether shortcut or not, pseudo-epigenesis is, of course, totally beyond reach at the present time-with one notable exception: in the species Felis catus, deep probing has revealed that it is indeed possible to read the phenotype directly off of the genotype. The reader will perhaps better appreciate this remarkable fact after directly examining the following typical section of the DNA of Felis catus:

... CATCATCATCATCATCATCATCATCATCAT ...
 
575
 
Below is shown a summary of the levels of DNA-readability, together with the names of the different levels of decoding. DNA can be read as a sequence of:

(1) bases (nucleotides) .................. transcription
(2) amino acids .......................... translation
(3) proteins (primary structure)  }
(4) proteins (tertiary structure) } ...... gene expression
(5) protein clusters ..................... higher levels of gene expression
(6) ???   }
(N-1) ??? } .............................. unknown levels of DNA meaning
(N) physical, mental, and psychological traits ... pseudo-epigenesis
 
575
 
The Central Dogmap
 
With this background, we are now in a position to draw an elaborate comparison between F. Crick's "Central Dogma of Molecular Biology" (DOGMA I), upon which all cellular processes are based, and what I, with poetic license, call the "Central Dogma of Mathematical Logic" (DOGMA II), upon which Gödel's Theorem is based. The mapping from one onto the other is laid out in Figure 99 and the following chart, which together constitute the Central Dogmap.
 
FIGURE 99. The Central Dogmap. An analogy is established between two fundamental Tangled Hierarchies: that of molecular biology and that of mathematical logic.
 
Note the base-pairing of A and T (Arithmetization and Translation), as well as of G and C (Gödel and Crick). Mathematical logic gets the purine side, and molecular biology gets the pyrimidine side.
 
To complete the esthetic side of this mapping, I chose to model my Gödel-numbering scheme on the Genetic Code absolutely faithfully. In fact, under the following correspondence, the table of the Genetic Code becomes the table of the Gödel Code:

(odd)  1 <=> A (purine)
(even) 2 <=> C (pyrimidine)
(odd)  3 <=> G (purine)
(even) 6 <=> U (pyrimidine)

Each amino acid-of which there are twenty-corresponds to exactly one symbol of TNT-of which there are twenty. Thus, at last, my motive for concocting "austere TNT" comes out-so that there would be exactly twenty symbols! The Gödel Code is shown in Figure 100. Compare it with the Genetic Code (Fig. 94).
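
Since the correspondence above fully specifies one digit per base, it can be sketched as a tiny translation function; a minimal sketch assuming only the stated table (the helper's name is my own, not Hofstadter's):

```python
# A sketch of the base-digit correspondence stated above: 1<->A, 2<->C,
# 3<->G, 6<->U (odd digits for purines, even digits for pyrimidines).
# The function name is an illustrative invention.

BASE_TO_DIGIT = {"A": "1", "C": "2", "G": "3", "U": "6"}

def codon_to_godel_codon(codon: str) -> str:
    """Translate a genetic codon, e.g. 'GUA', into its Gödel Code codon, '361'."""
    return "".join(BASE_TO_DIGIT[base] for base in codon.upper())

print(codon_to_godel_codon("GUA"))  # -> 361
```

Under this scheme a Gödel codon is always a three-digit group, just as a genetic codon is a three-base group.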
 
There is something almost mystical in seeing the deep sharing of such an abstract structure by these two esoteric, yet fundamental, advances in knowledge achieved in our century. This Central Dogmap is by no means a rigorous proof of identity of the two theories; but it clearly shows a profound kinship, which is worth deeper exploration.
 
Strange Loops in the Central Dogmap
 
One of the more interesting similarities between the two sides of the map is the way in which "loops" of arbitrary complexity arise on the top level of both: on the left, proteins which act on proteins which act on proteins and so on, ad infinitum; and on the right, statements about statements about statements of meta-TNT and so on, ad infinitum. These are like heterarchies, which we discussed in Chapter V, where a sufficiently complex substratum allows high-level Strange Loops to occur and to cycle around, totally sealed off from lower levels. We will explore this idea in greater detail in Chapter XX.
 
Incidentally, you may be wondering about this question: "What, according to the Central Dogmap, is Gödel's Incompleteness Theorem itself mapped onto?" This is a good question to think about before reading ahead.
 
The Central Dogmap and the Contracrostipunctus
 
It turns out that the Central Dogmap is quite similar to the mapping that was laid out in Chapter IV between the Contracrostipunctus and Gödel's Theorem. One can therefore draw parallels between all three systems:
 
FIGURE 100. The Gödel Code. Under this Gödel-numbering scheme, each TNT symbol gets one or more codons. The small ovals show how this table subsumes the earlier Gödel-numbering table of Chapter IX.
 
(1) formal systems and strings;
(2) cells and strands of DNA;
(3) record players and records.

In the following chart, the mapping between systems 2 and 3 is explained carefully.
 
The analogue of Gödel's Theorem is seen to be a peculiar fact, probably of little use to molecular biologists (to whom it is likely quite obvious): It is always possible to design a strand of DNA which, if injected into a cell, would, upon being transcribed, cause such proteins to be manufactured as would destroy the cell (or the DNA), and thus result in the non-reproduction of that DNA. This conjures up a somewhat droll scenario, at least if taken in light of evolution: an invading species of virus enters a cell by some surreptitious
 
FIGURE 101. The T4 bacterial virus is an assembly of protein components (a). The "head" is a protein membrane, shaped like a kind of prolate icosahedron with thirty facets and filled with DNA. It is attached by a neck to a tail consisting of a hollow core surrounded by a contractile sheath and based on a spiked end plate to which six fibers are attached. The spikes and fibers affix the virus to a bacterial cell wall (b). The sheath contracts, driving the core through the wall, and viral DNA enters the cell. [From Hanawalt and Haynes, The Chemical Basis of Life, p. 230.]
 
means, and then carefully ensures the manufacture of proteins which will have the effect of destroying the virus itself! It is a sort of suicide-or Epimenides sentence, if you will-on the molecular level. Obviously it would not prove advantageous from the point of view of survival of the species. However, it demonstrates the spirit, if not the letter, of the mechanisms of protection and subversion which cells and their invaders have developed.
 
E. Coli vs. T4
 
Let us consider the biologists' favorite cell, that of the bacterium Escherichia coli (no relation to M. C. Escher), and one of their favorite invaders of that cell: the sinister and eerie T4 phage, pictures of which you can see in Figure 101. (Incidentally, the words "phage" and "virus" are synonymous and mean "attacker of bacterial cells".) The weird tidbit looks a little like a cross between a LEM (Lunar Excursion Module) and a mosquito-and it is much more sinister than the latter. It has a "head" wherein is stored all its "knowledge"-namely its DNA; and it has six "legs" wherewith to fasten itself to the cell it has chosen to invade; and it has a "stinging tube" (more properly called its "tail") like a mosquito. The major difference is that unlike a mosquito, which uses its stinger for sucking blood, the T4 phage uses its stinger for injecting its hereditary substance into the cell against the will of its victim. Thus the phage commits "rape" on a tiny scale.
 
FIGURE 102. Viral infection begins when viral DNA enters a bacterium. Bacterial DNA is disrupted and viral DNA replicated. Synthesis of viral structural proteins and their assembly into virus continues until the cell bursts, releasing particles. [From Hanawalt and Haynes, The Chemical Basis of Life, p. 230.]
 
A Molecular Trojan Horse
 
What actually happens when the viral DNA enters a cell? The virus "hopes", to speak anthropomorphically, that its DNA will get exactly the same treatment as the DNA of the host cell. This would mean getting transcribed and translated, thus allowing it to direct the synthesis of its own special proteins, alien to the host cell, which will then begin to do their thing. This amounts to secretly transporting alien proteins "in code" (viz., the Genetic Code) into the cell, and then "decoding" (viz., producing) them. In a way this resembles the story of the Trojan horse, according to which hundreds of soldiers were sneaked into Troy inside a harmless-seeming giant wooden horse; but once inside the city, they broke loose and captured it. The alien proteins, once they have been "decoded" (synthesized) from their carrier DNA, now jump into action. The sequence of actions directed by the T4 phage has been carefully studied, and is more or less as follows (see also Figs. 102 and 103):

Time elapsed    Action taking place

0 min.    Injection of viral DNA.

1 min.    Breakdown of host DNA. Cessation of production of native proteins and initiation of production of alien (T4) proteins. Among the earliest produced proteins are those which direct the replication of the alien (T4) DNA.

5 min.    Replication of viral DNA begins.

8 min.    Initiation of production of structural proteins which will form the "bodies" of new phages.
 
FIGURE 103. The morphogenetic pathway of the T4 virus has three principal branches leading independently to the formation of heads, tails, and tail fibers, which then combine to form complete virus particles. [From Hanawalt and Haynes, The Chemical Basis of Life, p. 237.]
 
First complete replica of T4 invader is produced.

Lysozyme (a protein) attacks host cell wall, breaking open the bacterium, and the "bicentuplets" emerge.

Thus, when a T4 phage invades an E. coli cell, after the brief span of about twenty-four or twenty-five minutes, the cell has been completely subverted, and breaks open. Out pop about two hundred exact copies of the original virus-"bicentuplets"-ready to go attack more bacterial cells, the original cell having been largely consumed in the process.
 
Although from a bacterium's point of view this kind of thing is a deadly serious menace, from our large-scale vantage point it can be looked upon as an amusing game between two players: the invader, or "T" player (named after the T-even class of phages, including the T2, T4, and others), and the "C" player (standing for "Cell"). The objective of the T player is to invade and take over the cell of the C player from within, for the purpose of reproducing itself. The objective of the C player is to protect itself and destroy the invader. When described this way, the molecular TC-game can be seen to be quite parallel to the macroscopic TC-game described in the preceding Dialogue. (The reader can doubtless figure out which player-T or C-corresponds to the Tortoise, and which to the Crab.)
 
Recognition, Disguises, Labeling
 
This "game" emphasizes the fact that recognition is one of the central themes of cellular and subcellular biology. How do molecules (or higher-level structures) recognize each other? It is essential for the functioning of enzymes that they should be able to latch onto special "binding sites" on their substrates; it is essential that a bacterium should be able to distinguish its own DNA from that of phages; it is essential that two cells should be able to recognize each other and interact in a controlled way. Such recognition problems may remind you of the original, key problem about formal systems: How can you tell if a string has, or does not have, some property such as theoremhood? Is there a decision procedure? This kind of question is not restricted to mathematical logic: it permeates computer science and, as we are seeing, molecular biology.
 
The labeling technique described in the Dialogue is in fact one of E. coli's tricks for outwitting its phage invaders. The idea is that strands of DNA can be chemically labeled by tacking on a small molecule-methyl-to various nucleotides. Now this labeling operation does not change the usual biological properties of the DNA; in other words, methylated (labeled) DNA can be transcribed just as well as unmethylated (unlabeled) DNA, and so it can direct the synthesis of proteins. But if the host cell has some special
 
mechanisms for examining whether DNA is labeled or not, then the label may make all the difference in the world. In particular, the host cell may have an enzyme system which looks for unlabeled DNA, and destroys any that it finds by unmercifully chopping it to pieces. In that case, woe to all unlabeled invaders.
 
The methyl labels on the nucleotides have been compared to serifs on letters. Thus, using this metaphor, we could say that the E. coli cell is looking for DNA written in its "home script", with its own particular typeface-and will chop up any strand of DNA written in an "alien" typeface. One counterstrategy, of course, is for phages to learn to label themselves, and thereby become able to fool the cells which they are invading into reproducing them.
 
This TC-battle can continue to arbitrary levels of complexity, but we shall not pursue it further. The essential fact is that it is a battle between a host which is trying to reject all invading DNA, and a phage which is trying to infiltrate its DNA into some host which will transcribe it into mRNA (after which its reproduction is guaranteed). Any phage DNA which succeeds in getting itself reproduced this way can be thought of as having this high-level interpretation: "I Can Be Reproduced in Cells of Type X". This is to be distinguished from the evolutionarily pointless kind of phage mentioned earlier, which codes for proteins that destroy it, and whose high-level interpretation is the self-defeating sentence: "I Cannot Be Reproduced in Cells of Type X".
 
Henkin Sentences and Viruses
 
Now both of these contrasting types of self-reference in molecular biology have their counterparts in mathematical logic. We have already discussed the analogue of the self-defeating phages-namely, strings of the Gödel type, which assert their own unproducibility within specific formal systems. But one can also make a counterpart sentence to a real phage: the phage asserts its own producibility in a specific cell, and the sentence asserts its own producibility in a specific formal system. Sentences of this type are called Henkin sentences, after the mathematical logician Leon Henkin. They can be constructed exactly along the lines of Gödel sentences, the only difference being the omission of a negation. One begins with an "uncle", of course:

∃a:∃a':<TNT-PROOF-PAIR{a,a'}∧ARITHMOQUINE{a'',a'}>

and then proceeds by the standard trick. Say the Gödel number of the above "uncle" is h. Now by arithmoquining this very uncle, you get a Henkin sentence:

∃a:∃a':<TNT-PROOF-PAIR{a,a'}∧ARITHMOQUINE{SSS...SSS0/a'',a'}>

where the numeral SSS...SSS0 contains h S's.
 
(By the way, can you spot how this sentence differs from ~G?) The reason I show it explicitly is to point out that a Henkin sentence does not give a full recipe for its own derivation; it just asserts that there exists one. You might well wonder whether its claim is justified. Do Henkin sentences indeed possess derivations? Are they, as they claim, theorems? It is useful to recall that one need not believe a politician who says, "I am honest"-he may be honest, and yet he may not be. Are Henkin sentences any more trustworthy than politicians? Or do Henkin sentences, like politicians, lie in cast-iron sinks? It turns out that these Henkin sentences are invariably truth tellers. Why this is so is not obvious; but we will accept this curious fact without proof.
 
Implicit vs. Explicit Henkin Sentences
 
I mentioned that a Henkin sentence tells nothing about its own derivation; it just asserts that one exists. Now it is possible to invent a variation on the theme of Henkin sentences-namely sentences which explicitly describe their own derivations. Such a sentence's high-level interpretation would not be "Some Sequence of Strings Exists Which Is a Derivation of Me", but rather, "The Herein-described Sequence of Strings ..... Is a Derivation of Me". Let us call the first type of sentence an implicit Henkin sentence. The new sentences will be called explicit Henkin sentences, since they explicitly describe their own derivations. Note that, unlike their implicit brethren, explicit Henkin sentences need not be theorems. In fact, it is quite easy to write a string which asserts that its own derivation consists of the single string 0=0-a false statement, since 0=0 is not a derivation of anything. However, it is also possible to write an explicit Henkin sentence which is a theorem-that is, a sentence which in fact gives a recipe for its own derivation.
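
By a loose analogy of my own, not anything in the text: an implicit Henkin sentence resembles a program that merely asserts it could be reprinted, while an explicit Henkin sentence resembles a quine, a program whose own text spells out the complete recipe for reproducing itself. A minimal Python quine:

```python
# A quine: its text is an explicit recipe for its own output, loosely
# analogous to an explicit Henkin sentence that spells out its own
# derivation. (The analogy is illustrative only.)

s = 's = %r\nprint(s %% s)'
print(s % s)  # prints the two-line program's own source text
```

The %r conversion reproduces the string with its quotes and escapes intact, which is what lets the output contain its own description.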
 
Henkin Sentences and Self-Assembly
 
 
The reason I bring up this distinction between explicit and implicit Henkin sentences is that it corresponds very nicely to a significant distinction between types of virus. There are certain viruses, such as the so-called "tobacco mosaic virus", which are called self-assembling viruses; and then there are others, such as our favorite T-evens, which are non-self-assembling. Now what is this distinction? It is a direct analogue to the distinction between implicit and explicit Henkin sentences.
 
The DNA of a self-assembling virus codes only for the parts of a new virus, but not for any enzymes. Once the parts are produced, the sneaky virus relies upon them to link up to each other without help from any enzymes. Such a process depends on chemical affinities which the parts have for each other, when swimming in the rich chemical brew of a cell. Not only viruses, but also some organelles-such as ribosomes-assemble
 
themselves. Sometimes, enzymes may be needed-but in such cases, they are recruited from the host cell, and enslaved. This is what is meant by self-assembly.
 
By contrast, the DNA of more complex viruses, such as the T-evens, codes not only for the parts, but in addition for various enzymes which play special roles in the assembly of the parts into wholes. Since the assembly process is not spontaneous but requires "machines", such viruses are not considered to be self-assembling. The essence of the distinction, then, between self-assembling units and non-self-assembling units is that the former get away with self-reproduction without telling the cell anything about their construction, while the latter need to give instructions as to how to assemble themselves.
 
Now the parallel to Henkin sentences, implicit and explicit, ought to be quite clear. Implicit Henkin sentences are self-proving but do not tell anything at all about their proofs-they are analogous to self-assembling viruses; explicit Henkin sentences direct the construction of their own proofs-they are analogous to more complex viruses which direct their host cells in putting copies of themselves together.
 
The concept of self-assembling biological structures as complex as viruses raises the possibility of complex self-assembling machines as well. Imagine a set of parts which, when placed in the proper supporting environment, spontaneously group themselves in such a way as to form a complex machine. It seems unlikely, yet this is quite an accurate way to describe the process of the tobacco mosaic virus' method of self-reproduction via self-assembly. The information for the total conformation of the organism (or machine) is spread about in its parts; it is not concentrated in some single place.
 
Now this concept can lead in some strange directions, as was shown in the Edifying Thoughts of a Tobacco Smoker. There, we saw how the Crab used the idea that information for self-assembly can be distributed around, instead of being concentrated in a single place. His hope was that this would prevent his new phonographs from succumbing to the Tortoise's phonograph-crashing method. Unfortunately, just as with the most sophisticated axiom schemata, once the system is all built and packaged into a box, its well-definedness renders it vulnerable to a sufficiently clever "Gödelizer"; and that was the sad tale related by the Crab. Despite its apparent absurdity, the fantastic scenario of that Dialogue is not so far from reality, in the strange, surreal world of the cell.
 
Two Outstanding Problems: Differentiation and Morphogenesis
 
Now self-assembly may be the trick whereby certain subunits of cells are constructed, and certain viruses-but what of the most complex macroscopic structures, such as the body of an elephant or a spider, or the shape of a Venus's-flytrap? How are homing instincts built into the brain of a
 
bird, or hunting instincts into the brain of a dog? In short, how is it that merely by dictating which proteins are to be produced in cells, DNA exercises such spectacularly precise control over the exact structure and function of macroscopic living objects? There are two major distinct problems here. One is that of cellular differentiation: how do different cells, sharing exactly the same DNA, perform different roles-such as a kidney cell, a bone marrow cell, and a brain cell? The other is that of morphogenesis ("birth of form"): how does intercellular communication on a local level give rise to large-scale, global structures and organizations-such as the various organs of the body, the shape of the face, the suborgans of the brain, and so on? Although both cellular differentiation and morphogenesis are poorly understood at present, the trick appears to reside in exquisitely fine-tuned feedback and "feedforward" mechanisms within cells and between cells, which tell a cell when to "turn on" and when to "turn off" production of various proteins.
 
Feedback and Feedforward
 
Feedback takes place when there is too much or too little of some desired substance in the cell; then the cell must somehow regulate the production line which is assembling that substance. Feedforward also involves the regulation of an assembly line, but not according to the amount of end product present; rather, according to the amount of some precursor of the end product of that assembly line. There are two major devices for achieving negative feedforward or feedback. One way is to prevent the relevant enzymes from being able to perform-that is, to "clog up" their active sites. This is called inhibition. The other way is to prevent the relevant enzymes from ever being manufactured! This is called repression. Conceptually, inhibition is simple: you just block up the active site of the first enzyme in the assembly line, and the whole process of synthesis gets stopped dead.
 
Repressors and Inducers
 
Repression is trickier. How does a cell stop a gene from being expressed? The answer is, it prevents it from ever getting transcribed. This means that it has to prevent RNA polymerase from doing its job. This can be accomplished by placing a huge obstacle in its path, along the DNA, precisely in front of that gene which the cell wants not to get transcribed. Such obstacles do exist, and are called repressors. They are themselves proteins, and they bind to special obstacle-holding sites on the DNA, called (I am not sure why) operators. An operator therefore is a site of control for the gene (or genes) which immediately follow it; those genes are called its operon. Because a series of enzymes often act in concert in carrying out a long chemical transformation, they are often coded for in sequence; and this is why operons often contain several genes, rather than just one. The effect of the successful repression of an operon is that a whole series of genes is
 
prevented from being transcribed, which means that a whole set of related enzymes remains unsynthesized.
 
What about positive feedback and feedforward? Here again, there are two options: (1) unclog the clogged enzymes, or (2) stop the repression of the relevant operon. (Notice how nature seems to love double-negations! Probably there is some very deep reason for this.) The mechanism by which repression is repressed involves a class of molecules called inducers. The role of an inducer is simple: it combines with a repressor protein before the latter has had a chance to bind to an operator on a DNA molecule; the resulting "repressor-inducer complex" is incapable of binding to an operator, and this leaves the door open for the associated operon to be transcribed into mRNA and subsequently translated into protein. Often the end product or some precursor of the end product can act as an inducer.
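
The repressor-operator-inducer logic just described can be captured in a toy model; a minimal sketch, with all class and parameter names my own illustrative inventions rather than biochemical terminology:

```python
# Toy model of repression and induction as described above.
# All names are illustrative; this is a logical sketch, not biochemistry.

class Operon:
    """Genes preceded by an operator site that a repressor can occupy."""
    def __init__(self, genes):
        self.genes = genes
        self.operator_occupied = False

    def expose_to_repressor(self, inducer_present):
        # An inducer ties up the repressor: the repressor-inducer complex
        # cannot bind the operator ("repression of repression").
        if not inducer_present:
            self.operator_occupied = True

    def transcribe(self):
        # RNA polymerase gets through only if the operator is clear.
        return list(self.genes) if not self.operator_occupied else []

repressed = Operon(["enzyme_1", "enzyme_2"])
repressed.expose_to_repressor(inducer_present=False)
print(repressed.transcribe())  # -> []: the whole operon is silenced

induced = Operon(["enzyme_1", "enzyme_2"])
induced.expose_to_repressor(inducer_present=True)
print(induced.transcribe())  # -> ['enzyme_1', 'enzyme_2']
```

Note that repression silences every gene in the cluster at once, mirroring the all-or-nothing character of operon repression described above.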
 
Feedback and Strange Loops Compared
 
Incidentally, this is a good time to distinguish between simple kinds of feedback, as in the processes of inhibition and repression, and the looping-back between different informational levels, shown in the Central Dogmap. Both are "feedback" in some sense; but the latter is much deeper than the former. When an amino acid, such as tryptophan or isoleucine, acts as feedback (in the form of an inducer) by binding to its repressor so that more of it gets made, it is not telling how to construct itself; it is just telling enzymes to make more of it. This could be compared to a radio's volume which, when fed through a listener's ears, may cause itself to be turned down or up. This is another thing entirely from the case in which the broadcast itself tells you explicitly to turn your radio on or off, or to tune to another wavelength-or even how to build another radio! The latter is much more like the looping-back between informational levels, for here, information inside the radio signal gets "decoded" and translated into mental structures. The radio signal is composed of symbolic constituents whose symbolic meaning matters-a case of use, rather than mention. On the other hand, when the sound is just too loud, the symbols are not conveying meaning; they are merely being perceived as loud sounds, and might as well be devoid of meaning-a case of mention, rather than use. This case more resembles the feedback loops by which proteins regulate their own rates of synthesis.
 
It has been theorized that the difference between two neighboring cells which share the exact same genotype and yet have different functions is that different segments of their genome have been repressed, and therefore they have different working sets of proteins. A hypothesis like this could account for the phenomenal differences between cells in different organs of the body of a human being.
 
Two Simple Examples of Differentiation
 
The process by which one initial cell replicates over and over, giving rise to a myriad of differentiated cells with specialized functions, can be likened to the spread of a chain letter from person to person, in which each new participant is asked to propagate the message faithfully, but also to add some extra personal touch. Eventually, there will be letters which are tremendously different from each other.
 
Another illustration of the ideas of differentiation is provided by this extremely simple computer analogue of a differentiating self-rep. Consider a very short program which is controlled by an up-down switch, and which has an internal parameter N-a natural number. This program can run in two modes-the up-mode and the down-mode. When it runs in the up-mode, it self-replicates into an adjacent part of the computer's memory-except it makes the internal parameter N of its "daughter" one greater than in itself. When it runs in the down-mode, it does not self-rep, but instead calculates the number

(-1)^N / (2N + 1)

and adds it to a running total.
 
Well, suppose that at the beginning, there is one copy of the program in memory, N = 0, and the mode is up. Then the program will copy itself next door in memory, with N = 1. Repeating the process, the new program will self-rep next door to itself, with a copy having N = 2. And over and over again ... What happens is that a very large program is growing inside memory. When memory is full, the process quits. Now all of memory can be looked upon as being filled with one big program, composed of many similar, but differentiated, modules-or "cells". Now suppose we switch the mode to down, and run this big program. What happens? The first "cell" runs, and calculates 1/1. The second "cell" runs, calculating -1/3, and adding it to the previous result. The third "cell" runs, calculating +1/5 and adding it on ... The end result is that the whole "organism"-the big program-calculates the sum

1 - 1/3 + 1/5 - 1/7 + 1/9 - 1/11 + 1/13 - 1/15 + ...

to a large number of terms (as many terms as "cells" can fit inside memory). And since this series converges (albeit slowly) to π/4, we have a "phenotype" whose function is to calculate the value of a famous mathematical constant.
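
The up-mode/down-mode program just described is easy to simulate directly; a minimal sketch, where the memory size and function names are my own choices:

```python
# Sketch of the differentiating self-rep described above: up-mode fills
# "memory" with cells whose parameter N increases by one per copy;
# down-mode has each cell add (-1)^N / (2N + 1) to a running total.
# (Memory size and function names are illustrative, not from the text.)

def up_mode(memory_size):
    """Self-replicate until memory is full; each daughter gets N + 1."""
    cells = [0]
    while len(cells) < memory_size:
        cells.append(cells[-1] + 1)
    return cells

def down_mode(cells):
    """Each differentiated cell contributes one term of the series."""
    total = 0.0
    for n in cells:
        total += (-1) ** n / (2 * n + 1)
    return total

cells = up_mode(100_000)
print(4 * down_mode(cells))  # slowly approaches pi as memory grows
```

Because the series alternates, the error is bounded by the first omitted term, so the more "cells" fit in memory, the more digits of π the "phenotype" produces.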
 
Level Mixing in the Cell
 
I hope that the descriptions of processes such as labeling, self-assembly, differentiation, morphogenesis, as well as transcription and translation, have helped to convey some notion of the immensely complex system which is a cell-an information-processing system with some strikingly
 
novel features. We have seen, in the Central Dogmap, that although we can try to draw a clear line between program and data, the distinction is somewhat arbitrary. Carrying this line of thought further, we find that not only are program and data intricately woven together, but also the interpreter of programs, the physical processor, and even the language are included in this intimate fusion. Therefore, although it is possible (to some extent) to draw boundaries and separate out the levels, it is just as important-and just as fascinating-to recognize the level-crossings and mixings. Illustrative of this is the amazing fact that in biological systems, all the various features necessary for self-rep (viz., language, program, data, interpreter, and processor) cooperate to such a degree that all of them are replicated simultaneously-which shows how much deeper is biological self-rep'ing than anything yet devised along those lines by humans. For instance, the self-rep program exhibited at the beginning of this Chapter takes for granted the pre-existence of three external aspects: a language, an interpreter, and a processor, and does not replicate them.
 
Let us try to summarize various ways in which the subunits of a cell can be classified in computer science terms. First, let us take DNA. Since DNA contains all the information for construction of proteins, which are the active agents of the cell, DNA can be viewed as a program written in a higher-level language, which is subsequently translated (or interpreted) into the "machine language" of the cell (proteins). On the other hand, DNA is itself a passive molecule which undergoes manipulation at the hands of various kinds of enzymes; in this sense, a DNA molecule is exactly like a long piece of data, as well. Thirdly, DNA contains the templates off of which the tRNA "flashcards" are rubbed, which means that DNA also contains the definition of its own higher-level language.
 
Let us move on to proteins. Proteins are active molecules, and carry out all the functions of the cell; therefore it is quite appropriate to think of them as programs in the "machine language" of the cell (the cell itself being the processor). On the other hand, since proteins are hardware and most programs are software, perhaps it is better to think of the proteins as processors. Thirdly, proteins are often acted upon by other proteins, which means that proteins are often data. Finally, one can view proteins as interpreters; this involves viewing DNA as a collection of high-level language programs, in which case enzymes are merely carrying out the programs written in the DNA code, which is to say, the proteins are acting as interpreters.
 
Then there are ribosomes and tRNA molecules. They mediate the translation from DNA to proteins, which can be compared to the translation of a program from a high-level language to a machine language; in other words, the ribosomes are functioning as interpreters and the tRNA molecules provide the definition of the higher-level language. But an alternative view of translation has it that the ribosomes are processors, while the tRNA's are interpreters.
 
We have barely scratched the surface in this analysis of interrelations between all these biomolecules. What we have seen is that nature feels quite
 
591
 
comfortable in mixing levels which we tend to see as quite distinct. Actually, in computer science there is already a visible tendency to mix all these seemingly distinct aspects of an information-processing system. This is particularly so in Artificial Intelligence research, which is usually at the forefront of computer language development.
 
591
 
The Origin of Life
 
591
 
A natural and fundamental question to ask, on learning of these incredibly intricately interlocking pieces of software and hardware, is: "How did they ever get started in the first place?" It is truly a baffling thing. One has to imagine some sort of a bootstrap process occurring, somewhat like that which is used in the development of new computer languages-but a bootstrap from simple molecules to entire cells is almost beyond one's power to imagine. There are various theories on the origin of life. They all run aground on this most central of all central questions: "How did the Genetic Code, along with the mechanisms for its translation (ribosomes and tRNA molecules), originate?" For the moment, we will have to content ourselves with a sense of wonder and awe, rather than with an answer. And perhaps experiencing that sense of wonder and awe is more satisfying than having an answer-at least for a while.
 
602
 
CHAPTER XVII
Church, Turing, Tarski,
and Others
 
602
 
Formal and Informal Systems
 
602
 
WE HAVE COME to the point where we can develop one of the main theses of this book: that every aspect of thinking can be viewed as a high-level description of a system which, on a low level, is governed by simple, even formal, rules. The "system", of course, is a brain-unless one is speaking of thought processes flowing in another medium, such as a computer's circuits. The image is that of a formal system underlying an "informal system"-a system which can, for instance, make puns, discover number patterns, forget names, make awful blunders in chess, and so forth. This is what one sees from the outside: its informal, overt, software level. By contrast, it has a formal, hidden, hardware level (or "substrate") which is a formidably complex mechanism that makes transitions from state to state according to definite rules physically embodied in it, and according to the input of signals which impinge on it.
 
602
 
A vision of the brain such as this has many philosophical and other consequences, needless to say. I shall try to spell some of them out in this Chapter. Among other things, this vision seems to imply that, at bottom, the brain is some sort of a "mathematical" object. Actually, that is at best a very awkward way to look at the brain. The reason is that, even if a brain is, in a technical and abstract sense, some sort of formal system, it remains true that mathematicians only work with simple and elegant systems, systems in which everything is extremely clearly defined-and the brain is a far cry from that, with its ten billion or more semi-independent neurons, quasi-randomly connected up to each other. So mathematicians would never study a real brain's networks. And if you define "mathematics" as what mathematicians enjoy doing, then the properties of brains are not mathematical.
 
602
 
The only way to understand such a complex system as a brain is by chunking it on higher and higher levels, and thereby losing some precision at each step. What emerges at the top level is the "informal system" which obeys so many rules of such complexity that we do not yet have the vocabulary to think about it. And that is what Artificial Intelligence research is hoping to find. It has quite a different flavor from mathematics research. Nevertheless, there is a loose connection to mathematics: AI people often come from a strong mathematics background, and
 
603
 
mathematicians sometimes are intrigued by the workings of their own brains. The following passage, quoted from Stanislaw Ulam's autobiographical Adventures of a Mathematician, illustrates this point:

It seems to me that more could be done to elicit ... the nature of associations, with computers providing the means for experimentation. Such a study would have to involve a gradation of notions, of symbols, of classes of symbols, of classes of classes, and so on, in the same way that the complexity of mathematical or physical structures is investigated. There must be a trick to the train of thought, a recursive formula. A group of neurons starts working automatically, sometimes without external impulse. It is a kind of iterative process with a growing pattern. It wanders about in the brain, and the way it happens must depend on the memory of similar patterns.1
 
603
 
Intuition and the Magnificent Crab
 
603
 
Artificial Intelligence is often referred to as "AI". Often, when I try to explain what is meant by the term, I say that the letters "AI" could just as well stand for "Artificial Intuition", or even "Artificial Imagery". The aim of AI is to get at what is happening when one's mind silently and invisibly chooses, from a myriad alternatives, which one makes most sense in a very complex situation. In many real-life situations, deductive reasoning is inappropriate, not because it would give wrong answers, but because there are too many correct but irrelevant statements which can be made; there are just too many things to take into account simultaneously for reasoning alone to be sufficient. Consider this mini-dialogue:

"The other day I read in the paper that the-"

"Oh-you were reading? It follows that you have eyes. Or at least one eye. Or rather, that you had at least one eye then."

A sense of judgment-"What is important here, and what is not?"-is called for. Tied up with this is a sense of simplicity, a sense of beauty. Where do these intuitions come from? How can they emerge from an underlying formal system?
 
603
 
In the Magnificrab, some unusual powers of the Crab's mind are revealed. His own version of his powers is merely that he listens to music and distinguishes the beautiful from the non-beautiful. (Apparently for him there is a sharp dividing line.) Now Achilles finds another way to describe the Crab's abilities: the Crab divides statements of number theory into the categories true and false. But the Crab maintains that, if he chances to do so, it is only by the purest accident, for he is, by his own admission, incompetent in mathematics. What makes the Crab's performance all the more mystifying to Achilles, however, is that it seems to be in direct violation of a celebrated result of metamathematics with which Achilles is familiar:

CHURCH'S THEOREM: There is no infallible method for telling theorems of TNT from nontheorems.
 
604
 
It was proven in 1936 by the American logician Alonzo Church. Closely related is what I call the

TARSKI-CHURCH-TURING THEOREM: There is no infallible method for telling true from false statements of number theory.
 
604
 
The Church-Turing Thesis
 
604
 
To understand Church's Theorem and the Tarski-Church-Turing Theorem better, we should first describe one of the ideas on which they are based; and that is the Church-Turing Thesis (often called "Church's Thesis"). For the Church-Turing Thesis is certainly one of the most important concepts in the philosophy of mathematics, brains, and thinking.
 
604
 
Actually, like tea, the Church-Turing Thesis can be given in a variety of different strengths. So I will present it in various versions, and we will consider what they imply. The first version sounds very innocent-in fact almost pointless:

CHURCH-TURING THESIS, TAUTOLOGICAL VERSION: Mathematics problems can be solved only by doing mathematics.

Of course, its meaning resides in the meaning of its constituent terms. By "mathematics problem" I mean the problem of deciding whether some number possesses or does not possess a given arithmetical property. It turns out that by means of Godel-numbering and related coding tricks, almost any problem in any branch of mathematics can be put into this form, so that "mathematics problem" retains its ordinary meaning. What about "doing mathematics"? When one tries to ascertain whether a number has a property, there seem to be only a small number of operations which one uses in combination over and over again-addition, multiplication, checking for equality or inequality. That is, loops composed of such operations seem to be the only tool we have that allows us to probe the world of numbers. Note the word "seem". This is the critical word which the Church-Turing Thesis is about. We can give a revision:

CHURCH-TURING THESIS, STANDARD VERSION: Suppose there is a method which a sentient being follows in order to sort numbers into two classes. Suppose further that this method always yields an answer within a finite amount of time, and that it always gives the same answer for a given number. Then: Some terminating FlooP program (i.e., some general recursive function) exists which gives exactly the same answers as the sentient being's method does.

The central hypothesis, to make it very clear, is that any mental process which divides numbers into two sorts can be described in the form of a FlooP program. The intuitive belief is that there are no other tools than those in FlooP, and that there are no ways to use those tools other than by
 
605
 
unlimited iterations (which FlooP allows). The Church-Turing Thesis is not a provable fact in the sense of a Theorem of mathematics-it is a hypothesis about the processes which human brains use.
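The Standard Version is easiest to grasp with a concrete instance of the kind of method it talks about. FlooP itself is this book's toy language; purely as an illustration, Python stands in for it below (an assumption of this sketch, not the book's notation). What matters is the shape of the thing: a procedure built from nothing but loops, addition, multiplication, and equality tests, which sorts every number into one of two classes and always halts with the same answer for the same number.

```python
# A sketch of what the Standard Version calls a "terminating FlooP
# program": a procedure built only from loops, addition, multiplication,
# and comparison, sorting every natural number into one of two classes,
# always halting, always giving the same answer for the same input.
# (Python stands in for FlooP here -- an assumption for illustration.)

def sorts_into_two_classes(n: int) -> bool:
    """Return True iff n is prime -- one example of a number property."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:          # a loop bounded by sqrt(n) passes
        if n % d == 0:
            return False
        d += 1
    return True

# The method is total and deterministic: same number in, same class out.
classes = [sorts_into_two_classes(n) for n in range(10)]
```

The Thesis then asserts that *any* such mental sorting method, however it feels from inside, has an equivalent program of this general kind.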
 
605
 
The Public-Processes Version
 
605
 
Now some people might feel that this version asserts too much. These people might put their objections as follows: "Someone such as the Crab might exist-someone with an almost mystical insight into mathematics, but who is just as much in the dark about his own peculiar abilities as anyone else-and perhaps that person's mental mechanisms carry out operations which have no counterpart in FlooP." The idea is that perhaps we have a subconscious potential for doing things which transcend the conscious processes-things which are somehow inexpressible in terms of the elementary FlooP operations. For these objectors, we shall give a weaker version of the Thesis, one which distinguishes between public and private mental processes:
 
605
 
CHURCH-TURING THESIS, PUBLIC-PROCESSES VERSION: Suppose there is a method which a sentient being follows in order to sort numbers into two classes. Suppose further that this method always yields an answer within a finite amount of time, and that it always gives the same answer for a given number. Proviso: Suppose also that this method can be communicated reliably from one sentient being to another by means of language. Then: Some terminating FlooP program (i.e., general recursive function) exists which gives exactly the same answers as the sentient beings' method does.

This says that public methods are subject to "FlooPification", but asserts nothing about private methods. It does not say that they are un-FlooP-able, but it at least leaves the door open.
 
605
 
Srinivasa Ramanujan
 
605
 
As evidence against any stronger version of the Church-Turing Thesis, let us consider the case of the famous Indian mathematician of the first quarter of the twentieth century, Srinivasa Ramanujan (1887-1920). Ramanujan (Fig. 105) came from Tamilnadu, the southernmost part of India, and studied mathematics a little in high school. One day, someone who recognized Ramanujan's talent for math presented him with a copy of a slightly out-of-date textbook on analysis, which Ramanujan devoured (figuratively speaking). He then began making his own forays into the world of analysis, and by the time he was twenty-three, he had made a number of discoveries which he considered worthwhile. He did not know to whom to turn, but somehow was told about a professor of mathematics in faraway England, named G. H. Hardy. Ramanujan compiled his best
 
606
 
FIGURE 105. Srinivasa Ramanujan and one of his strange Indian melodies.
 
606
 
results together in a packet of papers, and sent them all to the unforewarned Hardy with a covering letter which friends helped him express in English. Below are some excerpts taken from Hardy's description of his reaction upon receiving the bundle:

... It soon became obvious that Ramanujan must possess much more general theorems and was keeping a great deal up his sleeve. ... [Some formulae] defeated me completely; I had never seen anything in the least like them before. A single look at them is enough to show that they could only be written down by a mathematician of the highest class. They must be true because, if they were not true, no one would have had the imagination to invent them. Finally ... the writer must be completely honest, because great mathematicians are commoner than thieves or humbugs of such incredible skill.2

What resulted from this correspondence was that Ramanujan came to England in 1913, sponsored by Hardy; and then followed an intense collaboration which terminated in Ramanujan's early demise at age thirty-three from tuberculosis.
 
606
 
Ramanujan had several extraordinary characteristics which set him apart from the majority of mathematicians. One was his lack of rigor. Very often he would simply state a result which, he would insist, had just come to
 
607
 
him from a vague intuitive source, far out of the realm of conscious probing. In fact, he often said that the goddess Namagiri inspired him in his dreams. This happened time and again, and what made it all the more mystifying-perhaps even imbuing it with a certain mystical quality-was the fact that many of his "intuition-theorems" were wrong. Now there is a curious paradoxical effect where sometimes an event which you think could not help but make credulous people become a little more skeptical, actually has the reverse effect, hitting the credulous ones in some vulnerable spot of their minds, tantalizing them with the hint of some baffling irrational side of human nature. Such was the case with Ramanujan's blunders: many educated people with a yearning to believe in something of the sort considered Ramanujan's intuitive powers to be evidence of a mystical insight into Truth, and the fact of his fallibility seemed, if anything, to strengthen, rather than weaken, such beliefs.
 
607
 
Of course it didn't hurt that he was from one of the most backward parts of India, where fakirism and other eerie Indian rites had been practiced for millennia, and were still practiced with a frequency probably exceeding that of the teaching of higher mathematics. And his occasional wrong flashes of insight, instead of suggesting to people that he was merely human, paradoxically inspired the idea that Ramanujan's wrongness always had some sort of "deeper rightness" to it-an "Oriental" rightness, perhaps touching upon truths inaccessible to Western minds. What a delicious, almost irresistible thought! Even Hardy-who would have been the first to deny that Ramanujan had any mystical powers-once wrote about one of Ramanujan's failures, "And yet I am not sure that, in some ways, his failure was not more wonderful than any of his triumphs."
 
607
 
The other outstanding feature of Ramanujan's mathematical personality was his "friendship with the integers", as his colleague Littlewood put it. This is a characteristic that a fair number of mathematicians share to some degree or other, but which Ramanujan possessed to an extreme. There are a couple of anecdotes which illustrate this special power. The first one is related by Hardy:

I remember once going to see him when he was lying ill at Putney. I had ridden in taxi-cab No. 1729, and remarked that the number seemed to me rather a dull one, and that I hoped it was not an unfavourable omen. "No," he replied, "it is a very interesting number; it is the smallest number expressible as a sum of two cubes in two different ways." I asked him, naturally, whether he knew the answer to the corresponding problem for fourth powers; and he replied, after a moment's thought, that he could see no obvious example, and thought that the first such number must be very large.3

It turns out that the answer for fourth powers is:

635318657 = 134⁴ + 133⁴ = 158⁴ + 59⁴

The reader may find it interesting to tackle the analogous problem for squares, which is much easier.
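Both of these claims are easy to check by machine with a brute-force search. The sketch below (in Python, purely as an illustration; the search bounds are chosen by hand) finds the smallest number expressible as a sum of two positive like powers in two different ways.

```python
# Brute-force check of Hardy's taxi-cab story: the smallest number that
# is a sum of two positive like powers in two different ways.
from collections import defaultdict

def smallest_double_sum(power, limit):
    """Smallest n = a**power + b**power (1 <= a <= b <= limit)
    expressible in at least two distinct ways within the search bound."""
    ways = defaultdict(set)
    for a in range(1, limit + 1):
        for b in range(a, limit + 1):
            ways[a**power + b**power].add((a, b))
    return min(n for n, pairs in ways.items() if len(pairs) >= 2)

print(smallest_double_sum(3, 50))    # -> 1729, Ramanujan's taxi-cab number
print(smallest_double_sum(4, 160))   # -> 635318657, the fourth-power answer
print(smallest_double_sum(2, 50))    # the squares problem, left to the reader
```

Note the bound must be large enough that both representations of the true answer fit inside it; a too-small bound cannot produce a wrong answer, only a missing one, since shrinking the bound can only remove representations, never add them.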
 
607
 
It is actually quite interesting to ponder why it is that Hardy immediately

608

jumped to fourth powers. After all, there are several other reasonably natural generalizations of the equation

u³ + v³ = x³ + y³

along different dimensions. For instance, there is the question about representing a number in three distinct ways as a sum of two cubes:

r³ + s³ = u³ + v³ = x³ + y³

Or, one can use three different cubes:

u³ + v³ + w³ = x³ + y³ + z³

Or one can even make a Grand Generalization in all dimensions at once. There is a sense, however, in which Hardy's generalization is "the most mathematician-like". Could this sense of mathematical esthetics ever be programmed?
 
608
 
The other anecdote is taken from a biography of Ramanujan by his countryman S. R. Ranganathan, where it is called "Ramanujan's Flash". It is related by an Indian friend of Ramanujan's from his Cambridge days, Dr. P. C. Mahalanobis:

On another occasion, I went to his room to have lunch with him. The First World War had started some time earlier. I had in my hand a copy of the monthly "Strand Magazine" which at that time used to publish a number of puzzles to be solved by readers. Ramanujan was stirring something in a pan over the fire for our lunch. I was sitting near the table, turning over the pages of the Magazine. I got interested in a problem involving a relation between two numbers. I have forgotten the details; but I remember the type of the problem. Two British officers had been billeted in Paris in two different houses in a long street; the door numbers of these houses were related in a special way; the problem was to find out the two numbers. It was not at all difficult. I got the solution in a few minutes by trial and error.

MAHALANOBIS (in a joking way): Now here is a problem for you.
RAMANUJAN: What problem, tell me. (He went on stirring the pan.)
I read out the question from the "Strand Magazine".
RAMANUJAN: Please take down the solution. (He dictated a continued fraction.)
The first term was the solution which I had obtained. Each successive term represented successive solutions for the same type of relation between two numbers, as the number of houses in the street would increase indefinitely. I was amazed.
MAHALANOBIS: Did you get the solution in a flash?
RAMANUJAN: Immediately I heard the problem, it was clear that the solution was obviously a continued fraction; I then thought, "Which continued fraction?" and the answer came to my mind. It was just as simple as this.4
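The Strand puzzle itself is lost in the retelling, and no attempt is made to reconstruct it here. But the flavor of Ramanujan's answer, a single continued fraction whose successive convergents solve the same relation at ever larger sizes, can be illustrated with a standard example: the convergents p/q of the continued fraction [1; 2, 2, 2, ...] for √2 are exactly the pairs satisfying the Pell-type relation p² − 2q² = ±1.

```python
# Illustration only (not the forgotten Strand puzzle): one continued
# fraction, [1; 2, 2, 2, ...] for sqrt(2), whose successive convergents
# p/q are precisely the solution pairs of p*p - 2*q*q = +1 or -1.

def sqrt2_convergents(count):
    """Yield the first `count` convergents (p, q) of [1; 2, 2, 2, ...]."""
    p_prev, q_prev = 1, 0      # the formal convergent before the first
    p, q = 1, 1                # first convergent: the integer part, 1/1
    for _ in range(count):
        yield p, q
        # continued-fraction recurrence; every partial quotient here is 2
        p, p_prev = 2 * p + p_prev, p
        q, q_prev = 2 * q + q_prev, q

pairs = list(sqrt2_convergents(5))       # (1,1), (3,2), (7,5), (17,12), (41,29)
for p, q in pairs:
    assert p * p - 2 * q * q in (1, -1)  # each convergent solves the relation
```

One recurrence, dictated once, generates the whole infinite family of solutions; that is the kind of object Ramanujan handed Mahalanobis.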
Hardy, as Ramanujan's closest co-worker, was often asked after
 
609
 
Ramanujan's death if there had been any occult or otherwise exotically flavored elements to Ramanujan's thinking style. Here is one comment which he gave:

I have often been asked whether Ramanujan had any special secret; whether his methods differed in kind from those of other mathematicians; whether there was anything really abnormal in his mode of thought. I cannot answer these questions with any confidence or conviction; but I do not believe it. My belief is that all mathematicians think, at bottom, in the same kind of way, and that Ramanujan was no exception.5

Here Hardy states in essence his own version of the Church-Turing Thesis. I paraphrase:

CHURCH-TURING THESIS, HARDY'S VERSION: At bottom, all mathematicians are isomorphic.

This does not equate the mathematical potential of mathematicians with that of general recursive functions; for that, however, all you need is to show that some mathematician's mental capacity is no more general than recursive functions. Then, if you believe Hardy's Version, you know it for all mathematicians.
 
609
 
Then Hardy compares Ramanujan with calculating prodigies:

His memory, and his powers of calculation, were very unusual, but they could not reasonably be called "abnormal". If he had to multiply two large numbers, he multiplied them in the ordinary way; he could do it with unusual rapidity and accuracy, but not more rapidly and accurately than any mathematician who is naturally quick and has the habit of computation.6

Hardy describes what he perceived as Ramanujan's outstanding intellectual attributes:

With his memory, his patience, and his power of calculation, he combined a power of generalisation, a feeling for form, and a capacity for rapid modification of his hypotheses, that were often really startling, and made him, in his own field, without a rival in his day.7

The part of this passage which I have italicized seems to me to be an excellent characterization of some of the subtlest features of intelligence in general. Finally, Hardy concludes somewhat nostalgically:

[His work] has not the simplicity and inevitableness of the very greatest work; it would be greater if it were less strange. One gift it has which no one can deny-profound and invincible originality. He would probably have been a greater mathematician if he had been caught and tamed a little in his youth; he would have discovered more that was new, and that, no doubt, of greater importance. On the other hand he would have been less of a Ramanujan, and more of a European professor and the loss might have been greater than the gain.8

The esteem in which Hardy held Ramanujan is revealed by the romantic way in which he speaks of him.
 
610
 
"Idiots Savants"
 
610
 
There is another class of people whose mathematical abilities seem to defy rational explanation-the so-called "idiots savants", who can perform complex calculations at lightning speeds in their heads (or wherever they do it). Johann Martin Zacharias Dase, who lived from 1824 to 1861 and was employed by various European governments to perform computations, is an outstanding example. He not only could multiply two numbers each of 100 digits in his head; he also had an uncanny sense of quantity. That is, he could just "tell", without counting, how many sheep were in a field, or words in a sentence, and so forth, up to about 30-this in contrast to most of us, who have such a sense up to about 6, with reliability. Incidentally, Dase was not an idiot.
 
610
 
I shall not describe the many fascinating documented cases of "lightning calculators", for that is not my purpose here. But I do feel it is important to dispel the idea that they do it by some mysterious, unanalyzable method. Although it is often the case that such wizards' calculational abilities far exceed their abilities to explain their results, every once in a while, a person with other intellectual gifts comes along who also has this spectacular ability with numbers. From such people's introspection, as well as from extensive research by psychologists, it has been ascertained that nothing occult takes place during the performances of lightning calculators, but simply that their minds race through intermediate steps with the kind of self-confidence that a natural athlete has in executing a complicated motion quickly and gracefully. They do not reach their answers by some sort of instantaneous flash of enlightenment (though subjectively it may feel that way to some of them), but-like the rest of us-by sequential calculation, which is to say, by FlooP-ing (or BlooP-ing) along.
 
610
 
Incidentally, one of the most obvious clues that no "hot line to God" is involved is the mere fact that when the numbers involved get bigger, the answers are slower in coming. Presumably, if God or an "oracle" were supplying the answers, he wouldn't have to slow up when the numbers got bigger. One could probably make a nice plot showing how the time taken by a lightning calculator varies with the sizes of the numbers involved, and the operations involved, and from it deduce some features of the algorithms employed.
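One such deducible feature can be sketched. If (purely as an assumption for illustration) the calculator multiplies the ordinary schoolbook way, the number of one-digit multiplications is the product of the two digit-counts, so times for d-digit by d-digit products should climb roughly as d squared; a plot bending that way would point to a schoolbook-like algorithm.

```python
# A sketch of the kind of model such a plot could be fitted against:
# schoolbook multiplication, instrumented to count one-digit
# multiplications. (That any given calculator works this way is an
# assumption for illustration, not a claim about any particular prodigy.)

def schoolbook_multiply(x, y):
    """Multiply x and y digit by digit; return (product, step count)."""
    xd = [int(c) for c in str(x)][::-1]      # digits, least significant first
    yd = [int(c) for c in str(y)][::-1]
    result = [0] * (len(xd) + len(yd))
    steps = 0
    for i, a in enumerate(xd):
        for j, b in enumerate(yd):
            result[i + j] += a * b           # one elementary multiplication
            steps += 1
    for k in range(len(result) - 1):         # propagate carries
        result[k + 1] += result[k] // 10
        result[k] %= 10
    product = int("".join(map(str, reversed(result))))
    return product, steps

# Steps grow quadratically with digit count: 1, 4, 9, 16, 25, ...
growth = [schoolbook_multiply(10**d - 1, 10**d - 1)[1] for d in range(1, 6)]
```

A calculator whose timing curve grew more slowly than this would have to be using some shortcut beyond the schoolbook method.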
 
610
 
The Isomorphism Version of the Church-Turing Thesis
 
610
 
This finally brings us to a strengthened standard version of the Church-Turing Thesis:

CHURCH-TURING THESIS, ISOMORPHISM VERSION: Suppose there is a method which a sentient being follows in order to sort numbers into two classes. Suppose further that this method always yields an answer within a finite amount of time, and that it always gives the same answer for a given number. Then: Some terminating FlooP program (i.e., some
 
611
 
general recursive function) exists which gives exactly the same answers as the sentient being's method does. Moreover: The mental process and the FlooP program are isomorphic in the sense that on some level there is a correspondence between the steps being carried out in both computer and brain.

Notice that not only has the conclusion been strengthened, but also the proviso of communicability of the faint-hearted Public-Processes Version has been dropped. This bold version is the one which we now shall discuss.
 
611
 
In brief, this version asserts that when one computes something, one's mental activity can be mirrored isomorphically in some FlooP program. And let it be very clear that this does not mean that the brain is actually running a FlooP program, written in the FlooP language complete with BEGIN's, END's, ABORT's, and the rest-not at all. It is just that the steps are taken in the same order as they could be in a FlooP program, and the logical structure of the calculation can be mirrored in a FlooP program.
 
611
 
Now in order to make sense of this idea, we shall have to make some level distinctions in both computer and brain, for otherwise it could be misinterpreted as utter nonsense. Presumably the steps of the calculation going on inside a person's head are on the highest level, and are supported by lower levels, and eventually by hardware. So if we speak of an isomorphism, it means we've tacitly made the assumption that the highest level can be isolated, allowing us to discuss what goes on there independently of other levels, and then to map that top level into FlooP. To be more precise, the assumption is that there exist software entities which play the roles of various mathematical constructs, and which are activated in ways which can be mirrored exactly inside FlooP (see Fig. 106). What enables these software entities to come into existence is the entire infrastructure discussed in Chapters XI and XII, as well as in the Prelude, Ant Fugue. There is no assertion of isomorphic activity on the lower levels of brain and computer (e.g., neurons and bits).
 
611
 
The spirit of the Isomorphism Version, if not the letter, is gotten across by saying that what an idiot savant does in calculating, say, the logarithm of π, is isomorphic to what a pocket calculator does in calculating it-where the isomorphism holds on the arithmetic-step level, not on the lower levels of, in the one case, neurons, and in the other, integrated circuits. (Of course different routes can be followed in calculating anything-but presumably the pocket calculator, if not the human, could be instructed to calculate the answer in any specific manner.)
 
611
 
FIGURE 106. The behavior of natural numbers can be mirrored in a human brain or in the programs of a computer. These two different representations can then be mapped onto each other on an appropriately abstract level.
 
612
 
Representation of Knowledge about the Real World
 
612
 
Now this seems quite plausible when the domain referred to is number theory, for there the total universe in which things happen is very small and clean. Its boundaries and residents and rules are well-defined, as in a hard-edged maze. Such a world is far less complicated than the open-ended and ill-defined world which we inhabit. A number theory problem, once stated, is complete in and of itself. A real-world problem, on the other hand, never is sealed off from any part of the world with absolute certainty. For instance, the task of replacing a burnt-out light bulb may turn out to require moving a garbage bag; this may unexpectedly cause the spilling of a box of pills, which then forces the floor to be swept so that the pet dog won't eat any of the spilled pills, etc., etc. The pills and the garbage and the dog and the light bulb are all quite distantly related parts of the world-yet an intimate connection is created by some everyday happenings. And there is no telling what else could be brought in by some other small variations on the expected. By contrast, if you are given a number theory problem, you never wind up having to consider extraneous things such as pills or dogs or bags of garbage or brooms in order to solve your problem. (Of course, your intuitive knowledge of such objects may serve you in good stead as you go about unconsciously trying to manufacture mental images to help you in visualizing the problem in geometrical terms-but that is another matter.)
 
612
 
Because of the complexity of the world, it is hard to imagine a little pocket calculator that can answer questions put to it when you press a few buttons bearing labels such as "dog", "garbage", "light bulb", and so forth. In fact, so far it has proven to be extremely complicated to have a full-size high-speed computer answer questions about what appear to us to be rather simple subdomains of the real world. It seems that a large amount of knowledge has to be taken into account in a highly integrated way for "understanding" to take place. We can liken real-world thought processes to a tree whose visible part stands sturdily above ground but depends vitally on its invisible roots which extend way below ground, giving it stability and nourishment. In this case the roots symbolize complex processes which take place below the conscious level of the mind-processes whose effects permeate the way we think but of which we are unaware. These are the "triggering patterns of symbols" which were discussed in Chapters XI and XII.
 
612
 
Real-world thinking is quite different from what happens when we do a multiplication of two numbers, where everything is "above ground", so to speak, open to inspection. In arithmetic, the top level can be "skimmed off" and implemented equally well in many different sorts of hardware: mechanical adding machines, pocket calculators, large computers, people's brains, and so forth. This is what the Church-Turing Thesis is all about. But when it comes to real-world understanding, it seems that there is no simple way to skim off the top level, and program it alone. The triggering patterns of symbols are just too complex. There must be several levels through which thoughts may "percolate" and "bubble".
 
613
 
In particular-and this comes back to a major theme of Chapters XI and XII-the representation of the real world in the brain, although rooted in isomorphism to some extent, involves some elements which have no counterparts at all in the outer world. That is, there is much more to it than simple mental structures representing "dog", "broom", etc. All of these symbols exist, to be sure-but their internal structures are extremely complex and to a large degree are unavailable for conscious inspection. Moreover, one would hunt in vain to map each aspect of a symbol's internal structure onto some specific feature of the real world.
 
613
 
Processes That Are Not So Skimmable
 
613
 
For this reason, the brain begins to look like a very peculiar formal system, for on its bottom level-the neural level-where the "rules" operate and change the state, there may be no interpretation of the primitive elements (neural firings, or perhaps even lower-level events). Yet on the top level, there emerges a meaningful interpretation-a mapping from the large "clouds" of neural activity which we have been calling "symbols", onto the real world. There is some resemblance to the Godel construction, in that a high-level isomorphism allows a high level of meaning to be read into strings; but in the Godel construction, the higher-level meaning "rides" on the lower level-that is, it is derived from the lower level, once the notion of Godel-numbering has been introduced. But in the brain, the events on the neural level are not subject to real-world interpretation; they are simply not imitating anything. They are there purely as the substrate to support the higher level, much as transistors in a pocket calculator are there purely to support its number-mirroring activity. And the implication is that there is no way to skim off just the highest level and make an isomorphic copy in a program; if one is to mirror the brain processes which allow real-world understanding, then one must mirror some of the lower-level things which are taking place: the "languages of the brain". This doesn't necessarily mean that one must go all the way down to the level of the hardware, though that may turn out to be the case.
 
613
 
In the course of developing a program with the aim of achieving an "intelligent" (viz., human-like) internal representation of what is "out there", at some point one will probably be forced into using structures and processes which do not admit of any straightforward interpretations-that is, which cannot be directly mapped onto elements of reality. These lower layers of the program will be able to be understood only by virtue of their catalytic relation to layers above them, rather than because of some direct connection they have to the outer world. (A concrete image of this idea was suggested by the Anteater in the Ant Fugue: the "indescribably boring nightmare" of trying to understand a book on the letter level.)
 
613
 
Personally, I would guess that such multilevel architecture of concept-handling systems becomes necessary just when processes involving images and analogies become significant elements of the program, in contrast to processes which are supposed to carry out strictly deductive reasoning. Processes which carry out deductive reasoning can be programmed in essentially one single level, and are therefore skimmable, by definition. According to my hypothesis, then, imagery and analogical thought processes intrinsically require several layers of substrate and are therefore intrinsically non-skimmable. I believe furthermore that it is precisely at this same point that creativity starts to emerge-which would imply that creativity intrinsically depends upon certain kinds of "uninterpretable" lower-level events. The layers of underpinning of analogical thinking are, of course, of extreme interest, and some speculations on their nature will be offered in the next two Chapters.
 
614
 
Articles of Reductionistic Faith
 
614
 
One way to think about the relation between higher and lower levels in the brain is this. One could assemble a neural net which, on a local (neuron-to-neuron) level, performed in a manner indistinguishable from a neural net in a brain, but which had no higher-level meaning at all. The fact that the lower level is composed of interacting neurons does not necessarily force any higher level of meaning to appear-no more than the fact that alphabet soup contains letters forces meaningful sentences to be found, swimming about in the bowl. High-level meaning is an optional feature of a neural network-one which may emerge as a consequence of evolutionary environmental pressures.
 
614
 
Figure 107 is a diagram illustrating the fact that emergence of a higher level of meaning is optional. The upwards-pointing arrow indicates that a substrate can occur without a higher level of meaning, but not vice versa: the higher level must be derived from properties of a lower one.
 
614
 
FIGURE 107. Floating on neural activity, the symbol level of the brain mirrors the world. But neural activity per se, which can be simulated on a computer, does not create thought; that calls for higher levels of organization.
 
615
 
The diagram includes an indication of a computer simulation of a neural network. This is in principle feasible, no matter how complicated the network, provided that the behavior of individual neurons can be described in terms of computations which a computer can carry out. This is a subtle postulate which few people even think of questioning. Nevertheless it is a piece of "reductionistic faith"; it could be considered a "microscopic version" of the Church-Turing Thesis. Below we state it explicitly:

CHURCH-TURING THESIS, MICROSCOPIC VERSION: The behavior of the components of a living being can be simulated on a computer. That is, the behavior of any component (typically assumed to be a cell) can be calculated by a FlooP program (i.e., general recursive function) to any desired degree of accuracy, given a sufficiently precise description of the component's internal state and local environment.

This version of the Church-Turing Thesis says that brain processes do not possess any more mystique-even though they possess more levels of organization-than, say, stomach processes. It would be unthinkable in this day and age to suggest that people digest their food, not by ordinary chemical processes, but by a sort of mysterious and magic "assimilation". This version of the CT-Thesis simply extends this kind of commonsense reasoning to brain processes. In short, it amounts to faith that the brain operates in a way which is, in principle, understandable. It is a piece of reductionist faith.
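The phrase "to any desired degree of accuracy" can be illustrated with a minimal numerical sketch. Here a toy differential equation stands in for a component's internal dynamics (the equation, the Euler method, and the step counts are all arbitrary choices for illustration, not a model of any real cell): the more finely the computation is run, the closer it comes to the exact behavior.

```python
import math

def simulate_decay(v0, tau, t_end, steps):
    # Euler integration of dV/dt = -V/tau, a stand-in for calculating a
    # component's behavior from a description of its internal state:
    # accuracy improves as the number of steps grows.
    dt = t_end / steps
    v = v0
    for _ in range(steps):
        v += dt * (-v / tau)
    return v

exact = math.exp(-1.0)                                   # V(1) for V0=1, tau=1
coarse = abs(simulate_decay(1.0, 1.0, 1.0, 10) - exact)
fine = abs(simulate_decay(1.0, 1.0, 1.0, 10000) - exact)
assert fine < coarse    # finer steps bring the computation closer to the truth
```
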
 
615
 
A corollary to the Microscopic CT-Thesis is this rather terse new macroscopic version:

CHURCH-TURING THESIS, REDUCTIONIST'S VERSION: All brain processes are derived from a computable substrate.

This statement is about the strongest theoretical underpinning one could give in support of the eventual possibility of realizing Artificial Intelligence.
 
615
 
Of course, Artificial Intelligence research is not aimed at simulating neural networks, for it is based on another kind of faith: that probably there are significant features of intelligence which can be floated on top of entirely different sorts of substrates than those of organic brains. Figure 108 shows the presumed relations among Artificial Intelligence, natural intelligence, and the real world.
 
615
 
Parallel Progress in AI and Brain Simulation
 
615
 
The idea that, if AI is to be achieved, the actual hardware of the brain might one day have to be simulated or duplicated, is, for the present at least, quite an abhorrent thought to many AI workers. Still one wonders, "How finely will we need to copy the brain to achieve AI?" The real answer is probably that it all depends on how many of the features of human consciousness you want to simulate.
 
616
 
FIGURE 108. Crucial to the endeavor of Artificial Intelligence research is the notion that the symbolic levels of the mind can be "skimmed off" of their neural substrate and implemented in other media, such as the electronic substrate of computers. To what depth the copying of brain must go is at present completely unclear.
nclea
 
616
 
Is an ability to play checkers well a sufficient indicator of intelligence? If so, then AI already exists, since checker-playing programs are of world class. Or is intelligence an ability to integrate functions symbolically, as in a freshman calculus class? If so, then AI already exists, since symbolic integration routines outdo the best people in most cases. Or is intelligence the ability to play chess well? If so, then AI is well on its way, since chess-playing programs can defeat most good amateurs; and the level of artificial chess will probably continue to improve slowly.
 
616
 
Historically, people have been naive about what qualities, if mechanized, would undeniably constitute intelligence. Sometimes it seems as though each new step towards AI, rather than producing something which everyone agrees is real intelligence, merely reveals what real intelligence is not. If intelligence involves learning, creativity, emotional responses, a sense of beauty, a sense of self, then there is a long road ahead, and it may be that these will only be realized when we have totally duplicated a living brain.
 
616
 
Beauty, the Crab, and the Soul
 
616
 
Now what, if anything, does all this have to say about the Crab's virtuoso performance in front of Achilles? There are two issues clouded together here. They are:
 
617
 
(1) Could any brain process, under any circumstances, distinguish completely reliably between true and false statements of TNT without being in violation of the Church-Turing Thesis-or is such an act in principle impossible?

(2) Is perception of beauty a brain process?

First of all, in response to (1), if violations of the Church-Turing Thesis are allowed, then there seems to be no fundamental obstacle to the strange events in the Dialogue. So what we are interested in is whether a believer in the Church-Turing Thesis would have to disbelieve in the Crab's ability. Well, it all depends on which version of the CT-Thesis you believe. For example, if you only subscribe to the Public-Processes Version, then you could reconcile the Crab's behavior with it very easily by positing that the Crab's ability is not communicable. Contrariwise, if you believe the Reductionist's Version, you will have a very hard time believing in the Crab's ostensible ability (because of Church's Theorem-soon to be demonstrated). Believing in intermediate versions allows you a certain amount of wishy-washiness on the issue. Of course, switching your stand according to convenience allows you to waffle even more.
 
617
 
It seems appropriate to present a new version of the CT-Thesis, one which is tacitly held by vast numbers of people, and which has been publicly put forth by several authors, in various manners. Some of the more famous ones are: philosophers Hubert Dreyfus, S. Jaki, Mortimer Taube, and J. R. Lucas; the biologist and philosopher Michael Polanyi (a holist par excellence); the distinguished Australian neurophysiologist John Eccles. I am sure there are many other authors who have expressed similar ideas, and countless readers who are sympathetic. I have attempted below to summarize their joint position. I have probably not done full justice to it, but I have tried to convey the flavor as accurately as I can:

CHURCH-TURING THESIS, SOULISTS' VERSION: Some kinds of things which a brain can do can be vaguely approximated on a computer but not most, and certainly not the interesting ones. But anyway, even if they all could, that would still leave the soul to explain, and there is no way that computers have any bearing on that.

This version relates to the tale of the Magnificrab in two ways. In the first place, its adherents would probably consider the tale to be silly and implausible, but not forbidden in principle. In the second place, they would probably claim that appreciation of qualities such as beauty is one of those properties associated with the elusive soul, and is therefore inherently possible only for humans, not for mere machines.
 
617
 
We will come back to this second point in a moment; but first, while we are on the subject of "soulists", we ought to exhibit this latest version in an even more extreme form, since that is the form to which large numbers of well-educated people subscribe these days:

CHURCH-TURING THESIS, THEODORE ROSZAK VERSION: Computers are ridiculous. So is science in general.
 
618
 
This view is prevalent among certain people who see in anything smacking of numbers or exactitude a threat to human values. It is too bad that they do not appreciate the depth and complexity and beauty involved in exploring abstract structures such as the human mind, where, indeed, one comes in intimate contact with the ultimate questions of what it is to be human.
 
618
 
Getting back to beauty, we were about to consider whether the appreciation of beauty is a brain process, and if so, whether it is imitable by a computer. Those who believe that it is not accounted for by the brain are very unlikely to believe that a computer could possess it. Those who believe it is a brain process again divide up according to which version of the CT-Thesis they believe. A total reductionist would believe that any brain process can in principle be transformed into a computer program; others, however, might feel that beauty is too ill-defined a notion for a computer program ever to assimilate. Perhaps they feel that the appreciation of beauty requires an element of irrationality, and therefore is incompatible with the very fiber of computers.
 
618
 
Irrational and Rational Can Coexist on Different Levels
 
618
 
However, this notion that "irrationality is incompatible with computers" rests on a severe confusion of levels. The mistaken notion stems from the idea that since computers are faultlessly functioning machines, they are therefore bound to be "logical" on all levels. Yet it is perfectly obvious that a computer can be instructed to print out a sequence of illogical statements-or, for variety's sake, a batch of statements having random truth values. Yet in following such instructions, a computer would not be making any mistakes! On the contrary, it would only be a mistake if the computer printed out something other than the statements it had been instructed to print. This illustrates how faultless functioning on one level may underlie symbol manipulation on a higher level-and the goals of the higher level may be completely unrelated to the propagation of Truth.
 
618
 
Another way to gain perspective on this is to remember that a brain, too, is a collection of faultlessly functioning elements-neurons. Whenever a neuron's threshold is surpassed by the sum of the incoming signals, BANG!-it fires. It never happens that a neuron forgets its arithmetical knowledge-carelessly adding its inputs and getting a wrong answer. Even when a neuron dies, it continues to function correctly, in the sense that its components continue to obey the laws of mathematics and physics. Yet as we all know, neurons are perfectly capable of supporting high-level behavior that is wrong, on its own level, in the most amazing ways. Figure 109 is meant to illustrate such a clash of levels: an incorrect belief held in the software of a mind, supported by the hardware of a faultlessly functioning brain.
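The neuron's faultless bookkeeping can be caricatured in a toy threshold unit (an illustration only, not a model of any real neuron; the signal values and threshold are made up):

```python
def fires(signals, threshold=1.0):
    # The unit never errs in its arithmetic: it fires exactly when the
    # sum of its incoming signals surpasses its threshold.
    return sum(signals) > threshold

assert fires([0.6, 0.5])        # sum 1.1 > 1.0: BANG!-it fires
assert not fires([0.4, 0.5])    # sum 0.9 <= 1.0: it stays quiet
```

Nothing about this low-level correctness says anything about the correctness of what such units might collectively support, which is exactly the point of the clash of levels in Figure 109.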
 
618
 
The point-a point which has been made several times earlier in various contexts-is simply that meaning can exist on two or more different levels of a symbol-handling system, and along with meaning, rightness and wrongness can exist on all those levels. The presence of meaning on a given
 
619
 
FIGURE 109. The brain is rational; the mind may not be. [Drawing by the author.]
 
620
 
level is determined by whether or not reality is mirrored in an isomorphic (or looser) fashion on that level. So the fact that neurons always perform correct additions (in fact, much more complex calculations) has no bearing whatsoever on the correctness of the top-level conclusions supported by their machinery. Whether one's top level is engaged in proving koans of Boolean Buddhism or in meditating on theorems of Zen Algebra, one's neurons are functioning rationally. By the same token, the high-level symbolic processes which in a brain create the experience of appreciating beauty are perfectly rational on the bottom level, where the faultless functioning is taking place; any irrationality, if there is such, is on the higher level, and is an epiphenomenon-a consequence-of the events on the lower level.
 
620
 
To make the same point in a different way, let us say you are having a hard time making up your mind whether to order a cheeseburger or a pineappleburger. Does this imply that your neurons are also balking, having difficulty deciding whether or not to fire? Of course not. Your hamburger-confusion is a high-level state which fully depends on the efficient firing of thousands of neurons in very organized ways. This is a little ironic, yet it is perfectly obvious when you think about it. Nevertheless, it is probably fair to say that nearly all confusions about minds and computers have their origin in just such elementary level-confusions.
 
620
 
There is no reason to believe that a computer's faultlessly functioning hardware could not support high-level symbolic behavior which would represent such complex states as confusion, forgetting, or appreciation of beauty. It would require that there exist massive subsystems interacting with each other according to a complex "logic". The overt behavior could appear either rational or irrational; but underneath it would be the performance of reliable, logical hardware.
 
620
 
More Against Lucas
 
620
 
Incidentally, this kind of level distinction provides us with some new fuel in arguing against Lucas. The Lucas argument is based on the idea that Godel's Theorem is applicable, by definition, to machines. In fact, Lucas makes a most emphatic pronunciation:

    Godel's theorem must apply to cybernetical machines, because it is of the essence of being a machine, that it should be a concrete instantiation of a formal system.9

This is, as we have seen, true on the hardware level-but since there may be higher levels, it is not the last word on the subject. Now Lucas gives the impression that in the mind-imitating machines he discusses, there is only one level on which manipulation of symbols takes place. For instance, the Rule of Detachment (called "Modus Ponens" in his article) would be wired into the hardware and would be an unchangeable feature of such a machine. He goes further and intimates that if Modus Ponens were not an immutable pillar of the machine's system, but could be overridden on occasion, then:

    The system will have ceased to be a formal logical system, and the machine will barely qualify for the title of a model for the mind.10

Now many programs which are being developed in AI research have very little in common with programs for generating truths of number theory-programs with inflexible rules of inference and fixed sets of axioms. Yet they are certainly intended as "models for the mind". On their top level-the "informal" level-there may be manipulation of images, formulation of analogies, forgetting of ideas, confusing of concepts, blurring of distinctions, and so forth. But this does not contradict the fact that they rely on the correct functioning of their underlying hardware as much as brains rely on the correct functioning of their neurons. So AI programs are still "concrete instantiations of formal systems"-but they are not machines to which Lucas' transmogrification of Godel's proof can be applied. Lucas' argument applies merely to their bottom level, on which their intelligence-however great or small it may be-does not lie.
 
621
 
There is one other way in which Lucas betrays his oversimplified vision of how mental processes would have to be represented inside computer programs. In discussing the matter of consistency, he writes:

    If we really were inconsistent machines, we should remain content with our inconsistencies, and would happily affirm both halves of a contradiction. Moreover, we would be prepared to say absolutely anything-which we are not. It is easily shown that in an inconsistent formal system everything is provable.11

This last sentence shows that Lucas assumes that the Propositional Calculus must of necessity be built into any formal system which carries out reasoning. In particular, he is thinking of the theorem <<P∧~P>⊃Q> of the Propositional Calculus; evidently he has the erroneous belief that it is an inevitable feature of mechanized reasoning. However, it is perfectly plausible that logical thought processes, such as propositional reasoning, will emerge as consequences of the general intelligence of an AI program, rather than being preprogrammed. This is what happens in humans! And there is no particular reason to assume that the strict Propositional Calculus, with its rigid rules and the rather silly definition of consistency that they entail, would emerge from such a program.
 
621
 
An Underpinning of AI
 
621
 
We can summarize this excursion into level distinctions and come away with one final, strongest version of the Church-Turing Thesis:

CHURCH-TURING THESIS, AI VERSION: Mental processes of any sort can be simulated by a computer program whose underlying language is of power equal to that of FlooP-that is, in which all partial recursive functions can be programmed.

It should also be pointed out that in practice, many AI researchers rely on another article of faith which is closely related to the CT-Thesis, and which I call the AI Thesis. It runs something like this:

AI THESIS: As the intelligence of machines evolves, its underlying mechanisms will gradually converge to the mechanisms underlying human intelligence.

In other words, all intelligences are just variations on a single theme; to create true intelligence, AI workers will just have to keep pushing to ever lower levels, closer and closer to brain mechanisms, if they wish their machines to attain the capabilities which we have.
 
622
 
Church's Theorem
 
622
 
Now let us come back to the Crab and to the question of whether his decision procedure for theoremhood (which is presented in the guise of a filter for musical beauty) is compatible with reality. Actually, from the events which occur in the Dialogue, we have no way of deducing whether the Crab's gift is an ability to tell theorems from nontheorems, or alternatively, an ability to tell true statements from false ones. Of course in many cases this amounts to the same thing, but Godel's Theorem shows that it doesn't always. But no matter: both of these alternatives are impossible, if you believe the AI Version of the Church-Turing Thesis. The proposition that it is impossible to have a decision procedure for theoremhood in any formal system with the power of TNT is known as Church's Theorem. The proposition that it is impossible to have a decision procedure for number-theoretical truth-if such truth exists, which one can well doubt after meeting up with all the bifurcations of TNT-follows quickly from Tarski's Theorem (published in 1933, although the ideas were known to Tarski considerably earlier).
 
622
 
The proofs of these two highly important results of metamathematics are very similar. Both of them follow quite quickly from self-referential constructions. Let us first consider the question of a decision procedure for TNT-theoremhood. If there were a uniform way by which people could decide which of the classes "theorem" and "nontheorem" any given formula X fell into, then, by the CT-Thesis (Standard Version), there would exist a terminating FlooP program (a general recursive function) which could make the same decision, when given as input the Godel number of formula X. The crucial step is to recall that any property that can be tested for by a terminating FlooP program is represented in TNT. This means that the property of TNT-theoremhood would be represented (as distinguished from merely expressed) inside TNT. But as we shall see in a moment, this would put us in hot water, for if theoremhood is a representable attribute, then Godel's formula G becomes as vicious as the Epimenides paradox.
 
623
 
It all hinges on what G says: "G is not a theorem of TNT". Assume that G were a theorem. Then, since theoremhood is supposedly represented, the TNT-formula which asserts "G is a theorem" would be a theorem of TNT. But this formula is ~G, the negation of G, so that TNT is inconsistent. On the other hand, assume G were not a theorem. Then once again by the supposed representability of theoremhood, the formula which asserts "G is not a theorem" would be a theorem of TNT. But this formula is G, and once again we get into paradox. Unlike the situation before, there is no resolution of the paradox. The problem is created by the assumption that theoremhood is represented by some formula of TNT, and therefore we must backtrack and erase that assumption. This forces us also to conclude that no FlooP program can tell the Godel numbers of theorems from those of nontheorems. Finally, if we accept the AI Version of the CT-Thesis, then we must backtrack further, and conclude that no method whatsoever could exist by which humans could reliably tell theorems from nontheorems-and this includes determinations based on beauty. Those who subscribe only to the Public-Processes Version might still think the Crab's performance is possible; but of all the versions, that one is perhaps the hardest one to find any justification for.
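The shape of this argument is a standard diagonal construction, which can be miniaturized in Python. This toy is an illustration of the argument's form only, not of TNT itself: given any candidate "decider", we build a sentence-like object that asserts its own rejection by that decider, and the decider's verdict then necessarily disagrees with the sentence.

```python
def diagonalize(decider):
    # Build a sentence-like object g that asserts its own non-theoremhood:
    # g "holds" exactly when the decider classifies g as a non-theorem,
    # just as G says "G is not a theorem of TNT".
    def g():
        return not decider(g)
    return g

# Whatever verdict a candidate decider gives, it is wrong about the
# sentence diagonalized against it, so no decider handles every case:
candidates = (lambda s: True, lambda s: False, lambda s: len(repr(s)) % 2 == 0)
for decider in candidates:
    g = diagonalize(decider)
    assert decider(g) != g()
```
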
 
623
 
Tarski's Theorem
 
623
 
Now let us proceed to Tarski's result. Tarski asked whether there could be a way of expressing in TNT the concept of number-theoretical truth. That theoremhood is expressible (though not representable) we have seen; Tarski was interested in the analogous question regarding the notion of truth. More specifically, he wished to determine whether there is any TNT-formula with a single free variable a which can be translated thus:

    "The formula whose Godel number is a expresses a truth."

Let us suppose, with Tarski, that there is one-which we'll abbreviate as TRUE{a}. Now what we'll do is use the diagonalization method to produce a sentence which asserts about itself that it is untrue. We copy the Godel method exactly, beginning with an "uncle":

    ∃a:<~TRUE{a}∧ARITHMOQUINE{a'',a}>

Let us say the Godel number of the uncle is t. We arithmoquine this very uncle, and produce the Tarski formula T:

    ∃a:<~TRUE{a}∧ARITHMOQUINE{SSS...SSS0/a'',a}>
                              (t S's)
 
624
 
When interpreted, it says: "The arithmoquinification of t is the Godel number of a false statement." But since the arithmoquinification of t is T's own Godel number, Tarski's formula T reproduces the Epimenides paradox to a tee inside TNT, saying of itself, "I am a falsity". Of course, this leads to the conclusion that it must be simultaneously true and false (or simultaneously neither). There arises now an interesting matter: What is so bad about reproducing the Epimenides paradox? Is it of any consequence? After all, we already have it in English, and the English language has not gone up in smoke.
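The arithmoquining step can be mimicked at the level of strings, with quotation standing in for Godel-numbering (the names UNCLE and arithmoquine are illustrative, not anything from the formal system): substituting a template's own quotation into its free variable yields a sentence that denounces its own arithmoquinification, which is to say, itself.

```python
UNCLE = "the arithmoquinification of {a} is the Goedel number of a false statement"

def arithmoquine(formula):
    # Substitute the formula's own quotation for its free variable {a},
    # a string-level stand-in for plugging a formula's Goedel number
    # into itself.
    return formula.replace("{a}", repr(formula))

T = arithmoquine(UNCLE)
print(T)
# T *is* the arithmoquinification of the uncle, yet T asserts that very
# arithmoquinification expresses a falsity: Epimenides, reborn in strings.
assert repr(UNCLE) in T
assert T == arithmoquine(UNCLE)
```
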
 
624
 
The Impossibility of the Magnificrab
 
624
 
The answer lies in remembering that there are two levels of meaning involved here. One level is the level we have just been using; the other is as a statement of number theory. If the Tarski formula T actually existed, then it would be a statement about natural numbers that is both true and false at once! There is the rub. While we can always just sweep the English-language Epimenides paradox under the rug, saying that its subject matter (its own truth) is abstract, this is not so when it becomes a concrete statement about numbers! If we believe this is a ridiculous state of affairs, then we have to undo our assumption that the formula TRUE{a} exists. Thus, there is no way of expressing the notion of truth inside TNT. Notice that this makes truth a far more elusive property than theoremhood, for the latter is expressible. The same backtracking reasons as before (involving the Church-Turing Thesis, AI Version) lead us to the conclusion that

    The Crab's mind cannot be a truth-recognizer any more than it is a TNT-theorem-recognizer.

The former would violate the Tarski-Church-Turing Theorem ("There is no decision procedure for arithmetical truth"), while the latter would violate Church's Theorem.
 
624
 
Two Types of Form
 
624
 
It is extremely interesting, then, to think about the meaning of the word "form" as it applies to constructions of arbitrarily complex shapes. For instance, what is it that we respond to when we look at a painting and feel its beauty? Is it the "form" of the lines and dots on our retina? Evidently it must be, for that is how it gets passed along to the analyzing mechanisms in our heads-but the complexity of the processing makes us feel that we are not merely looking at a two-dimensional surface; we are responding to some sort of inner meaning inside the picture, a multidimensional aspect trapped somehow inside those two dimensions. It is the word "meaning" which is important here. Our minds contain interpreters which accept two-dimensional patterns and then "pull" from them high-dimensional notions which are so complex that we cannot consciously describe them. The same can be said about how we respond to music, incidentally.
 
625
 
It feels subjectively that the pulling-out mechanism of inner meaning is not at all akin to a decision procedure which checks for the presence or absence of some particular quality such as well-formedness in a string. Probably this is because inner meaning is something which reveals more of itself over a period of time. One can never be sure, as one can about well-formedness, that one has finished with the issue.
 
625
 
This suggests a distinction that could be drawn between two senses of "form" in patterns which we analyze. First, there are qualities such as well-formedness, which can be detected by predictably terminating tests, as in BlooP programs. These I propose to call syntactic qualities of form. One intuitively feels about the syntactic aspects of form that they lie close to the surface, and therefore they do not provoke the creation of multidimensional cognitive structures.
 
625
 
By contrast, the semantic aspects of form are those which cannot be tested for in predictable lengths of time: they require open-ended tests. Such an aspect is theoremhood of TNT-strings, as we have seen. You cannot just apply some standard test to a string and find out if it is a theorem. Somehow, the fact that its meaning is involved is crucially related to the difficulty of telling whether or not a string is a TNT-theorem. The act of pulling out a string's meaning involves, in essence, establishing all the implications of its connections to all other strings, and this leads, to be sure, down an open-ended trail. So "semantic" properties are connected to open-ended searches because, in an important sense, an object's meaning is not localized within the object itself. This is not to say that no understanding of any object's meaning is possible until the end of time, for as time passes, more and more of the meaning unfolds. However, there are always aspects of its meaning which will remain hidden arbitrarily long.
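By way of contrast, a "semantic" property like theoremhood can in general only be searched for, not tested for. The sketch below is my own illustration, using the little pq-system as a stand-in (its theoremhood is in fact decidable, unlike TNT's): it enumerates derivations breadth-first, halting if the target turns up, and otherwise running on without bound unless told when to give up.

```python
from itertools import count

def pq_rule(s: str) -> list[str]:
    # Sole rule of the pq-system: from x p y q z, derive x p y- q z-.
    x, rest = s.split("p")
    y, z = rest.split("q")
    return [x + "p" + y + "-q" + z + "-"]

def is_theorem(axioms, apply_rules, target, max_steps=None):
    """Semi-decision procedure: breadth-first enumeration of theorems.
    Returns True if target is derived; None if we give up after
    max_steps rounds.  Without a cutoff, a non-theorem makes the
    search run forever -- the open-ended character of such tests."""
    seen = set(axioms)
    frontier = list(axioms)
    for step in count():
        if target in seen:
            return True
        if max_steps is not None and step >= max_steps:
            return None
        frontier = [t for s in frontier for t in apply_rules(s) if t not in seen]
        seen.update(frontier)

print(is_theorem(["-p-q--"], pq_rule, "-p---q----"))           # True
print(is_theorem(["-p-q--"], pq_rule, "-p-q-", max_steps=50))  # None: gave up
```

For TNT no cutoff can be trusted: a `None` tells you nothing, since the sought derivation may simply lie further down the open-ended trail.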
 
 
Meaning Derives from Connections to Cognitive Structure
 
 
Let us switch from strings to pieces of music, just for variety. You may still substitute the term "string" for every reference to a piece of music, if you prefer. The discussion is meant to be general, but its flavor is better gotten across, I feel, by referring to music. There is a strange duality about the meaning of a piece of music: on the one hand, it seems to be spread around, by virtue of its relation to many other things in the world-and yet, on the other hand, the meaning of a piece of music is obviously derived from the music itself, so it must be localized somewhere inside the music.
 
 
The resolution of this dilemma comes from thinking about the interpreter-the mechanism which does the pulling-out of meaning. (By "interpreter" in this context, I mean not the performer of the piece, but the mental mechanism in the listener which derives meaning when the piece is played.) The interpreter may discover many important aspects of a piece's meaning while hearing it for the first time; this seems to confirm the notion that the meaning is housed in the piece itself, and is simply being read off. But that is only part of the story. The music interpreter works by setting up a multidimensional cognitive structure-a mental representation of the piece-which it tries to integrate with pre-existent information by finding links to other multidimensional mental structures which encode previous experiences. As this process takes place, the full meaning gradually unfolds. In fact, years may pass before someone comes to feel that he has penetrated to the core meaning of a piece. This seems to support the opposite view: that musical meaning is spread around, the interpreter's role being to assemble it gradually.
 
 
The truth undoubtedly lies somewhere in between: meanings-both musical and linguistic-are to some extent localizable, to some extent spread around. In the terminology of Chapter VI, we can say that musical pieces and pieces of text are partly triggers, and partly carriers of explicit meaning. A vivid illustration of this dualism of meaning is provided by the example of a tablet with an ancient inscription: the meaning is partially stored in the libraries and the brains of scholars around the world, and yet it is also obviously implicit in the tablet itself.
 
 
Thus, another way of characterizing the difference between "syntactic" and "semantic" properties (in the just-proposed sense) is that the syntactic ones reside unambiguously inside the object under consideration, whereas semantic properties depend on its relations with a potentially infinite class of other objects, and therefore are not completely localizable. There is nothing cryptic or hidden, in principle, in syntactic properties, whereas hiddenness is of the essence in semantic properties. That is the reason for my suggested distinction between "syntactic" and "semantic" aspects of visual form.
 
 
Beauty, Truth, and Form
 
 
What about beauty? It is certainly not a syntactic property, according to the ideas above. Is it even a semantic property? Is beauty a property which, for instance, a particular painting has? Let us immediately restrict our consideration to a single viewer. Everyone has had the experience of finding something beautiful at one time, dull another time-and probably intermediate at other times. So is beauty an attribute which varies in time? One could turn things around and say that it is the beholder who has varied in time. Given a particular beholder of a particular painting at a particular time, is it reasonable to assert that beauty is a quality that is definitely present or absent? Or is there still something ill-defined and intangible about it?
 
 
Different levels of interpreter probably could be invoked in every person, depending on the circumstances. These various interpreters pull out different meanings, establish different connections, and generally evaluate all deep aspects differently. So it seems that this notion of beauty is extremely hard to pin down. It is for this reason that I chose to link beauty, in the Magnificrab, with truth, which we have seen is also one of the most intangible notions in all of metamathematics.
 
 
The Neural Substrate of the Epimenides Paradox
 
 
I would like to conclude this Chapter with some ideas about that central problem of truth, the Epimenides paradox. I think the Tarski reproduction of the Epimenides paradox inside TNT points the way to a deeper understanding of the nature of the Epimenides paradox in English. What Tarski found was that his version of the paradox has two distinct levels to it. On one level, it is a sentence about itself which would be true if it were false, and false if it were true. On the other level-which I like to call the arithmetical substrate-it is a sentence about integers which is true if and only if false.
 
 
Now for some reason this latter bothers people a lot more than the former. Some people simply shrug off the former as "meaningless", because of its self-referentiality. But you can't shrug off paradoxical statements about integers. Statements about integers simply cannot be both true and false.
 
 
Now my feeling is that the Tarski transformation of the Epimenides paradox teaches us to look for a substrate in the English-language version. In the arithmetical version, the upper level of meaning is supported by the lower arithmetical level. Perhaps analogously, the self-referential sentence which we perceive ("This sentence is false") is only the top level of a dual-level entity. What would be the lower level, then? Well, what is the mechanism that language rides on? The brain. Therefore one ought to look for a neural substrate to the Epimenides paradox-a lower level of physical events which clash with each other. That is, two events which by their nature cannot occur simultaneously. If this physical substrate exists, then the reason we cannot make heads or tails of the Epimenides sentence is that our brains are trying to do an impossible task.
 
 
Now what would be the nature of the conflicting physical events? Presumably when you hear the Epimenides sentence, your brain sets up some "coding" of the sentence-an internal configuration of interacting symbols. Then it tries to classify the sentence as "true" or "false". This classifying act must involve an attempt to force several symbols to interact in a particular way. (Presumably this happens when any sentence is processed.) Now if it happens that the act of classification would physically disrupt the coding of the sentence-something which would ordinarily never happen-then one is in trouble, for it is tantamount to trying to force a record player to play its self-breaking record. We have described the conflict in physical terms, but not in neural terms. If this analysis is right so far, then presumably the rest of the discussion could be carried on when we know something about the constitution of the "symbols" in the brain out of neurons and their firings, as well as about the way that sentences become converted into "codings".
 
 
This sketch of the neural substrate of the Epimenides paradox suggests (to me, at least) that the resolution of the English version of the Epimenides paradox might be similar to that for the Tarski version. The resolution involves abandoning the notion that a brain could ever provide a fully accurate representation for the notion of truth. The novelty of this resolution lies in its suggestion that a total modeling of truth is impossible for quite physical reasons: namely, such a modeling would require physically incompatible events to occur in a brain.
 
 
CHAPTER XVIII

Artificial Intelligence: Retrospects
 
 
Turing
 
 
IN 1950, ALAN TURING wrote a most prophetic and provocative article on Artificial Intelligence. It was entitled "Computing Machinery and Intelligence" and appeared in the journal Mind.1 I will say some things about that article, but I would like to precede them with some remarks about Turing the man.
 
 
Alan Mathison Turing was born in London in 1912. He was a child full of curiosity and humor. Gifted in mathematics, he went to Cambridge where his interests in machinery and mathematical logic cross-fertilized and resulted in his famous paper on "computable numbers", in which he invented the theory of Turing machines and demonstrated the unsolvability of the halting problem; it was published in 1937. In the 1940's, his interests turned from the theory of computing machines to the actual building of real computers. He was a major figure in the development of computers in Britain, and a staunch defender of Artificial Intelligence when it first came under attack. One of his best friends was David Champernowne (who later worked on computer composition of music). Champernowne and Turing were both avid chess players and invented "round-the-house" chess: after your move, run around the house-if you get back before your opponent has moved, you're entitled to another move. More seriously, Turing and Champernowne invented the first chess-playing program, called "Turochamp". Turing died young, at 41-apparently of an accident with chemicals. Or some say suicide. His mother, Sara Turing, wrote his biography. From the people she quotes, one gets the sense that Turing was highly unconventional, even gauche in some ways, but so honest and decent that he was vulnerable to the world. He loved games, chess, children, and bike riding; he was a strong long-distance runner. As a student at Cambridge, he bought himself a second-hand violin and taught himself to play. Though not very musical, he derived a great deal of enjoyment from it. He was somewhat eccentric, given to great bursts of energy in the oddest directions. One area he explored was the problem of morphogenesis in biology. According to his mother, Turing "had a particular fondness for the Pickwick Papers", but "poetry, with the exception of Shakespeare's, meant nothing to him." Alan Turing was one of the true pioneers in the field of computer science.

FIGURE 113. Alan Turing, after a successful race (May, 1950). [From Sara Turing, Alan M. Turing (Cambridge, U.K.: W. Heffer & Sons, 1959).]
 
 
The Turing Test
 
 
Turing's article begins with the sentence: "I propose to consider the question 'Can machines think?'" Since, as he points out, these are loaded terms, it is obvious that we should search for an operational way to approach the question. This, he suggests, is contained in what he calls the "imitation game"; it is nowadays known as the Turing test. Turing introduces it as follows:

    It is played with three people: a man (A), a woman (B), and an interrogator (C) who may be of either sex. The interrogator stays in a room apart from the other two. The object of the game for the interrogator is to determine which of the other two is the man and which is the woman. He knows them by labels X and Y, and at the end of the game he says either "X is A and Y is B" or "X is B and Y is A". The interrogator is allowed to put questions to A and B thus:

    C: Will X please tell me the length of his or her hair?

    Now suppose X is actually A, then A must answer. It is A's object in the game to try to cause C to make the wrong identification. His answer might therefore be

    "My hair is shingled, and the longest strands are about nine inches long."

    In order that tones of voice may not help the interrogator the answers should be written, or better still, typewritten. The ideal arrangement is to have a teleprinter communicating between the two rooms. Alternatively the questions and answers can be repeated by an intermediary. The object of the game for the third player (B) is to help the interrogator. The best strategy for her is probably to give truthful answers. She can add such things as "I am the woman, don't listen to him!" to her answers, but it will avail nothing as the man can make similar remarks.
 
 
    We now ask the question, "What will happen when a machine takes the part of A in this game?" Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman? These questions replace our original, "Can machines think?"2

After having spelled out the nature of his test, Turing goes on to make some commentaries on it, which, given the year he was writing in, are quite sophisticated. To begin with, he gives a short hypothetical dialogue between interrogator and interrogatee:3

    Q: Please write me a sonnet on the subject of the Forth Bridge [a bridge over the Firth of Forth, in Scotland].
    A: Count me out on this one. I never could write poetry.
    Q: Add 34957 to 70764.
    A: (Pause about 30 seconds and then give as answer) 105621.
    Q: Do you play chess?
    A: Yes.
    Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?
    A: (After a pause of 15 seconds) R-R8 mate.

Few readers notice that in the arithmetic problem, not only is there an inordinately long delay, but moreover, the answer given is wrong! This would be easy to account for if the respondent were a human: a mere calculational error. But if the respondent were a machine, a variety of explanations are possible. Here are some:

    (1) a run-time error on the hardware level (i.e., an irreproducible fluke);
    (2) an unintentional hardware (or programming) error which (reproducibly) causes arithmetical mistakes;
    (3) a ploy deliberately inserted by the machine's programmer (or builder) to introduce occasional arithmetical mistakes, so as to trick interrogators;
    (4) an unanticipated epiphenomenon: the program has a hard time thinking abstractly, and simply made "an honest mistake", which it might not make the next time around;
    (5) a joke on the part of the machine itself, deliberately teasing its interrogator.

Reflection on what Turing might have meant by this subtle touch opens up just about all the major philosophical issues connected with Artificial Intelligence.
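Incidentally, the erroneous sum in the dialogue is easy to verify for yourself:

```python
# The dialogue's answer to "Add 34957 to 70764" was 105621; the true sum:
print(34957 + 70764)  # 105721
```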
 
 
Turing goes on to point out that:

    The new problem has the advantage of drawing a fairly sharp line between the physical and the intellectual capacities of a man. ... We do not wish to penalize the machine for its inability to shine in beauty competitions, nor to penalize a man for losing in a race against an airplane.4
One of the pleasures of the article is to see how far Turing traced out each line of thought, usually turning up a seeming contradiction at some stage and, by refining his concepts, resolving it at a deeper level of analysis. Because of this depth of penetration into the issues, the article still shines after nearly thirty years of tremendous progress in computer development and intensive work in AI. In the following short excerpt you can see some of this rich back-and-forth working of ideas:

    The game may perhaps be criticized on the ground that the odds are weighted too heavily against the machine. If the man were to try to pretend to be the machine he would clearly make a very poor showing. He would be given away at once by slowness and inaccuracy in arithmetic. May not machines carry out something which ought to be described as thinking but which is very different from what a man does? This objection is a very strong one, but at least we can say that if, nevertheless, a machine can be constructed to play the imitation game satisfactorily, we need not be troubled by this objection.

    It might be urged that when playing the "imitation game" the best strategy for the machine may possibly be something other than imitation of the behaviour of a man. This may be, but I think it is unlikely that there is any great effect of this kind. In any case there is no intention to investigate here the theory of the game, and it will be assumed that the best strategy is to try to provide answers that would naturally be given by a man.5
Once the test has been proposed and discussed, Turing remarks:

    The original question "Can machines think?" I believe to be too meaningless to deserve discussion. Nevertheless, I believe that at the end of the century the use of words and general educated opinion will have altered so much that one will be able to speak of machines thinking without expecting to be contradicted.6
 
 
Turing Anticipates Objections
 
 
Aware of the storm of opposition that would undoubtedly greet this opinion, he then proceeds to pick apart, concisely and with wry humor, a series of objections to the notion that machines could think. Below I list the nine types of objections he counters, using his own descriptions of them.7 Unfortunately there is not space to reproduce the humorous and ingenious responses he formulated. You may enjoy pondering the objections yourself, and figuring out your own responses.

    (1) The Theological Objection. Thinking is a function of man's immortal soul. God has given an immortal soul to every man and woman, but not to any other animal or to machines. Hence no animal or machine can think.

    (2) The "Heads in the Sand" Objection. The consequences of machines thinking would be too dreadful. Let us hope and believe that they cannot do so.

    (3) The Mathematical Objection. [This is essentially the Lucas argument.]

    (4) The Argument from Consciousness. "Not until a machine can write a sonnet or compose a concerto because of thoughts and emotions felt, and not by the chance fall of symbols, could we agree that machine equals brain-that is, not only write it but know that it had written it. No mechanism could feel (and not merely artificially signal, an easy contrivance) pleasure at its successes, grief when its valves fuse, be warmed by flattery, be made miserable by its mistakes, be charmed by sex, be angry or depressed when it cannot get what it wants." [A quote from a certain Professor Jefferson.]
son.] Turing is quite concerned that he should answer this serious objection i  
n full detail. Accordingly, he devotes quite a bit of space to his answer, and i  
n it he offers another short hypothetical d ialogue:8  
Interrogator: In the first line of your sonnet which reads "Shall I c  
ompare thee to a summer's day", would not "a spring day" do as well or bette  
r? Witness: It wouldn't s  
can. Interrogator: How about "a winter's day"? That would scan all r  
ight. Witness: Yes, but nobody wants to be compared to a winter's d  
ay. Interrogator: Would you say Mr. Pickwick reminded you of Chris  
tmas? Witness: In a w  
ay. Interrogator: Yet Christmas is a winter's day, and I do not think Mr. P  
ickwick would mind the c  
omparison. Witness: I don't think you're serious. By a winter's day one means a typ  
ical winter's day, rather than a special one like Chris  
tmas. After this dialogue, Turing asks, "What would Professor Jefferson say i  
f the sonnet-writing machine was able to answer like this in the viva voc  
e?" Further objectio  
ns: ( 5)Arguments from Various Disabilities. These arguments take the form, "  
Igrant you that you can make machines do all the things that you h  
avementioned but you will never be able to make one to do X." N  
umerousfeatures X are suggested in this connection. I offer a select  
ion:Be kind, resourceful, beautiful, friendly, have initiative, have a sense o  
fhumor, tell right from wrong, make mistakes, fall in love, enjoy strawberries and cream, make someone fall in love with it, learn from experience, use words properly, be the subject of its own thought, have as  
much diversity of behaviour as a man, do something really new.  
(6) Lady Lovelace's Objection. Our most detailed information of B  
abbage'sAnalytical Engine comes from a memoir by Lady Lovelace. In it s  
hestates, "The Analytical Engine has no pretensions to originate anything. I  
tcan do whatever we know how to order it to perform" (her italics)  
.( 7)Argument from Continuity in the Nervous System. The nervous system i  
scertainly not a discrete state machine. A small error in the i  
nformationabout the size of a nervous impulse impinging on a neuron may make a  
large difference to the size of the outgoing impulse. It may be a  
rguedthat, this being so, one cannot expect to be able to mimic the behavi  
ourof the nervous system with a discrete state s  
ystem.(8) The Argument from Informality of Behaviour. It seems to run something l  
ikethis. "If each man had a definite set of rules of conduct by which h  
eregulated his life he would be no better than a machine. But there are n  
osuch rules, so men cannot be machines.  
"( 9)The Argument from Extra-Sensory Perception. Let us play the i  
mitationgame, using as witnesses a man who is good as a telepathic receiver, and a  
digital computer. The interrogator can ask such questions as "What s  
ui
 
642
 
does the card in my right hand belong to?" The man by telepathy o  
r clairvoyance gives the right answer 130 times out of 400 cards. T  
he machine can only guess at random, and perhaps get 104 right, so t  
he interrogator makes the right identification.
 
 
As you can see, many of the objections overlap and are related to each other. I have attempted in this book to answer most of them in some way or other, usually far less concisely than Turing did.
 
 
Objection (9) I find remarkable. I have seen the Turing paper reprinted in a book-but with objection (9) omitted-which I find equally remarkable. My own point of view-contrary to Turing's-is that ESP does not exist. Turing was reluctant to accept the idea that ESP is real, but did so nonetheless, being compelled by his outstanding scientific integrity to accept the consequences of what he viewed as powerful statistical evidence in favor of ESP. I disagree, though I consider it an exceedingly complex and fascinating question.
 
 
"Parry Encounters the D  
octo
 
 
In the Dialogue preceding this Chapter, you have seen an authentic exchange between a computer program and a human. Now there have been many programs with less ability to "converse" than SHRDLU. Two rather famous ones are "Doctor", created by Joseph Weizenbaum, and "Parry", created by Kenneth Colby. The former is supposed to simulate a psychiatrist using "nondirective therapy", the latter to simulate the belief structure of a paranoid. An amusing head-on collision between the two programs was arranged over a computer network, and the following is a short excerpt from the full transcript:9

    Parry: I don't understand your motives.
    Doctor: Why don't you understand my motives?
    Parry: Shouldn't I be?
    Doctor: Why do you say that just now?
    Parry: That's just the way it is.
    Doctor: Suppose that's just the way it were not.
    Parry: So?
    Doctor: I am not sure I understand you fully.
    Parry: You harp on the same questions too much.
    Doctor: What does that similarity suggest to you?
    Parry: I don't get you.
    Doctor: Do you wish that you did get I?
    Parry: It's about time you showed some interest in my feelings.
    Doctor: Oh? I showed some interest in your feelings?
 
 
These programs, of course, were made to talk to humans, so they are shown up rather poorly here, pitted against each other. Mostly they rely on shrewd guesses as to the nature of the input (which they analyze quite shallowly) and spit back canned answers that have been carefully selected from a large repertoire. The answer may be only partially canned: for example, a template with blanks that can be filled in. It is assumed that their human partners will read much more into what they say than is actually underlying it. And in fact, according to Weizenbaum, in his book Computer Power and Human Reason, just that happens. He writes:

    ELIZA [the program from which Doctor was made] created the most remarkable illusion of having understood in the minds of the many people who conversed with it. ... They would often demand to be permitted to converse with the system in private, and would, after conversing with it for a time, insist, in spite of my explanations, that the machine really understood them.10

Given the above excerpt, you may find this incredible. Incredible, but true. Weizenbaum has an explanation:

    Most men don't understand computers to even the slightest degree. So, unless they are capable of very great skepticism (the kind we bring to bear while watching a stage magician), they can explain the computer's intellectual feats only by bringing to bear the single analogy available to them, that is, their model of their own capacity to think. No wonder, then, that they overshoot the mark; it is truly impossible to imagine a human who could imitate ELIZA, for example, but for whom ELIZA's language abilities were his limit.11

Which amounts to an admission that this kind of program is based on a shrewd mixture of bravado and bluffing, taking advantage of people's gullibility.
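To give a feel for what "partially canned" answers look like, here is a toy sketch in the ELIZA style (entirely my own invention; Weizenbaum's actual program was considerably more elaborate): shallow keyword matching, templates with a blank filled from the input, and a fully canned fallback.

```python
import re

# Toy ELIZA-style responder.  All patterns and templates here are
# invented for illustration; they are not Weizenbaum's rules.
RULES = [
    (re.compile(r"\bI am ([^.?!]+)", re.I), "How long have you been {0}?"),
    (re.compile(r"\bI feel ([^.?!]+)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bmy (\w+)", re.I), "Tell me more about your {0}."),
]

def respond(line: str) -> str:
    for pattern, template in RULES:
        m = pattern.search(line)
        if m:
            # Partially canned: the template's blank is filled in
            # from the shallow match on the input.
            return template.format(*m.groups())
    return "Please go on."  # fully canned fallback

print(respond("I feel lost"))          # Why do you feel lost?
print(respond("my mother hates me"))   # Tell me more about your mother.
print(respond("Hello there"))          # Please go on.
```

Even rules this shallow can sustain a surprisingly long exchange, which is precisely the projection of understanding Weizenbaum describes.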
 
 
In light of this weird "ELIZA-effect", some people have suggested that the Turing test needs revision, since people can apparently be fooled by simplistic gimmickry. It has been suggested that the interrogator should be a Nobel Prize-winning scientist. It might be more advisable to turn the Turing test on its head, and insist that the interrogator should be another computer. Or perhaps there should be two interrogators-a human and a computer-and one witness, and the two interrogators should try to figure out whether the witness is a human or a computer.
 
 
In a more serious vein, I personally feel that the Turing test, as originally proposed, is quite reasonable. As for the people who Weizenbaum claims were sucked in by ELIZA, they were not urged to be skeptical, or to use all their wits in trying to determine if the "person" typing to them were human or not. I think that Turing's insight into this issue was sound, and that the Turing test, essentially unmodified, will survive.
 
 
A Brief History of AI
 
 
I would like in the next few pages to present the story, perhaps from an unorthodox point of view, of some of the efforts at unraveling the algorithms behind intelligence; there have been failures and setbacks and there will continue to be. Nonetheless, we are learning a great deal, and it is an exciting period.
 
 
Ever since Pascal and Leibniz, people have dreamt of machines that could perform intellectual tasks. In the nineteenth century, Boole and De Morgan devised "laws of thought"-essentially the Propositional Calculus-and thus took the first step towards AI software; also Charles Babbage designed the first "calculating engine"-the precursor to the hardware of computers and hence of AI. One could define AI as coming into existence at the moment when mechanical devices took over any tasks previously performable only by human minds. It is hard to look back and imagine the feelings of those who first saw toothed wheels performing additions and multiplications of large numbers. Perhaps they experienced a sense of awe at seeing "thoughts" flow in their very physical hardware. In any case, we do know that nearly a century later, when the first electronic computers were constructed, their inventors did experience an awesome and mystical sense of being in the presence of another kind of "thinking being". To what extent real thought was taking place was a source of much puzzlement; and even now, several decades later, the question remains a great source of stimulation and vitriolics.
 
 
It is interesting that nowadays, practically no one feels that sense of awe any longer-even when computers perform operations that are incredibly more sophisticated than those which sent thrills down spines in the early days. The once-exciting phrase "Giant Electronic Brain" remains only as a sort of "camp" clichĂ©, a ridiculous vestige of the era of Flash Gordon and Buck Rogers. It is a bit sad that we become blasĂ© so quickly.
 
 
There is a related "Theorem" about progress in AI: once some mental function is programmed, people soon cease to consider it as an essential ingredient of "real thinking". The ineluctable core of intelligence is always in that next thing which hasn't yet been programmed. This "Theorem" was first proposed to me by Larry Tesler, so I call it Tesler's Theorem: "AI is whatever hasn't been done yet."
 
 
A selective overview of AI is furnished below. It shows several domains in which workers have concentrated their efforts, each one seeming in its own way to require the quintessence of intelligence. With some of the domains I have included a breakdown according to methods employed, or more specific areas of concentration.

mechanical translation
    direct (dictionary look-up with some word rearrangement)
    indirect (via some intermediary internal language)
game playing
    chess
        with brute force look-ahead
        with heuristically pruned look-ahead
        with no look-ahead
    checkers
    go
    kalah
    bridge (bidding; playing)
    poker
    variations on tic-tac-toe
    etc.
 
 
proving theorems in various parts of mathematics
    symbolic logic
    "resolution" theorem-proving
    elementary geometry
symbolic manipulation of mathematical expressions
    symbolic integration
    algebraic simplification
    summation of infinite series
vision
    printed matter:
        recognition of individual hand-printed characters drawn from a small class (e.g., numerals)
        reading text in variable fonts
        reading passages in handwriting
        reading Chinese or Japanese printed characters
        reading Chinese or Japanese handwritten characters
    pictorial:
        locating prespecified objects in photographs
        decomposition of a scene into separate objects
        identification of separate objects in a scene
        recognition of objects portrayed in sketches by people
        recognition of human faces
hearing
    understanding spoken words drawn from a limited vocabulary (e.g., names of the ten digits)
    understanding continuous speech in fixed domains
    finding boundaries between phonemes
    identifying phonemes
    finding boundaries between morphemes
    identifying morphemes
    putting together whole words and sentences
understanding natural languages
    answering questions in specific domains
    parsing complex sentences
    making paraphrases of longer pieces of text
    using knowledge of the real world in order to understand passages
    resolving ambiguous references
producing natural language
    abstract poetry (e.g., haiku)
    random sentences, paragraphs, or longer pieces of text
    producing output from internal representation of knowledge
 
creating original thoughts or works of art
    poetry writing (haiku)
    story writing
    computer art
    musical composition
        atonal
        tonal
analogical thinking
    geometrical shapes ("intelligence tests")
    constructing proofs in one domain of mathematics based on those in a related domain
learning
    adjustment of parameters
    concept formation
 
Mechanical Translation
 
Many of the preceding topics will not be touched upon in my selective discussion below, but the list would not be accurate without them. The first few topics are listed in historical order. In each of them, early efforts fell short of expectations. For example, the pitfalls in mechanical translation came as a great surprise to many who had thought it was a nearly straightforward task, whose perfection, to be sure, would be arduous, but whose basic implementation should be easy. As it turns out, translation is far more complex than mere dictionary look-up and word rearranging. Nor is the difficulty caused by a lack of knowledge of idiomatic phrases. The fact is that translation involves having a mental model of the world being discussed, and manipulating symbols in that model. A program which makes no use of a model of the world as it reads the passage will soon get hopelessly bogged down in ambiguities and multiple meanings. Even people-who have a huge advantage over computers, for they come fully equipped with an understanding of the world-when given a piece of text and a dictionary of a language they do not know, find it next to impossible to translate the text into their own language. Thus-and it is not surprising in retrospect-the first problem of AI led immediately to the issues at the heart of AI.
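The "direct" method from the list above-dictionary look-up with word rearrangement-can be caricatured in a few lines. The little lexicon below is an invented toy, not any real system's data; even so, it exposes the basic trouble: with no model of the world being discussed, the program cannot choose between senses of a word.

```python
# A toy "direct" mechanical translator: pure dictionary look-up,
# no world model. The English-to-Spanish lexicon is invented.
LEXICON = {
    "the": "la", "pen": "pluma",   # "pen" as writing instrument...
    "is": "esta", "in": "en",
    "box": "caja",
}
# ...but "pen" can also mean an enclosure ("corral"); nothing in the
# program could ever notice the difference.

def translate(sentence: str) -> str:
    """Word-by-word substitution, leaving unknown words untouched."""
    return " ".join(LEXICON.get(w, w) for w in sentence.lower().split())

print(translate("The pen is in the box"))
```

The classic illustration of the failure is that "the box is in the pen" would get exactly the same blind rendering for "pen", since every sense choice would require a mental model of boxes, pens, and sizes.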
 
 
Computer Chess
 
Computer chess, too, proved to be much more difficult than the early intuitive estimates had suggested. Here again it turns out that the way humans represent a chess situation in their minds is far more complex than just knowing which piece is on which square, coupled with knowledge of the rules of chess. It involves perceiving configurations of several related pieces, as well as knowledge of heuristics, or rules of thumb, which pertain to such higher-level chunks. Even though heuristic rules are not rigorous in the way that the official rules are, they provide shortcut insights into what is going on on the board, which knowledge of the official rules does not. This much was recognized from the start; it was simply underestimated how large a role the intuitive, chunked understanding of the chess world plays in human chess skill. It was predicted that a program having some basic heuristics, coupled with the blinding speed and accuracy of a computer to look ahead in the game and analyze each possible move, would easily beat top-flight human players-a prediction which, even after twenty-five years of intense work by various people, still is far from being realized.
 
People are nowadays tackling the chess problem from various angles. One of the most novel involves the hypothesis that looking ahead is a silly thing to do. One should instead merely look at what is on the board at present, and, using some heuristics, generate a plan, and then find a move which advances that particular plan. Of course, rules for the formulation of chess plans will necessarily involve heuristics which are, in some sense, "flattened" versions of looking ahead. That is, the equivalent of many games' experience of looking ahead is "squeezed" into another form which ostensibly doesn't involve looking ahead. In some sense this is a game of words. But if the "flattened" knowledge gives answers more efficiently than the actual look-ahead-even if it occasionally misleads-then something has been gained. Now this kind of distillation of knowledge into more highly usable forms is just what intelligence excels at-so look-ahead-less chess is probably a fruitful line of research to push. Particularly intriguing would be to devise a program which itself could convert knowledge gained from looking ahead into "flattened" rules-but that is an immense task.
 
Samuel's Checker Program
 
As a matter of fact, such a method was developed by Arthur Samuel in his admirable checker-playing program. Samuel's trick was to use both dynamic (look-ahead) and static (no-look-ahead) ways of evaluating any given board position. The static method involved a simple mathematical function of several quantities characterizing any board position, and thus could be calculated practically instantaneously, whereas the dynamic evaluation method involved creating a "tree" of possible future moves, responses to them, responses to the responses, and so forth (as was shown in Fig. 38). In the static evaluation function there were some parameters which could vary; the effect of varying them was to provide a set of different possible versions of the static evaluation function. Samuel's strategy was to select, in an evolutionary way, better and better values of those parameters.
 
Here's how this was done: each time the program evaluated a board position, it did so both statically and dynamically. The answer gotten by looking ahead-let us call it D-was used in determining the move to be made. The purpose of S, the static evaluation, was trickier: on each move, the variable parameters were readjusted slightly so that S approximated D as accurately as possible. The effect was to partially encode in the values of the static evaluation's parameters the knowledge gained by dynamically searching the tree. In short, the idea was to "flatten" the complex dynamic evaluation method into the much simpler and more efficient static evaluation function.
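The adjust-S-towards-D idea can be sketched in a few lines. This is a minimal caricature, not Samuel's actual program: the features, the starting weights, and the learning rate are all invented for illustration.

```python
# Sketch of Samuel-style "flattening": nudge the static evaluation's
# weights so that S(position) tracks D(position), the look-ahead value.
# Features, weights, and learning rate are invented for illustration.

def static_eval(weights, features):
    """S: a simple weighted sum of board-position features."""
    return sum(w * f for w, f in zip(weights, features))

def adjust(weights, features, dynamic_value, rate=0.01):
    """Move each weight slightly so that S approximates D better."""
    error = dynamic_value - static_eval(weights, features)
    return [w + rate * error * f for w, f in zip(weights, features)]

# Toy run: features of one position, and a look-ahead verdict D.
weights = [0.0, 0.0, 0.0]
features = [2.0, -1.0, 3.0]   # e.g., piece count, mobility, kings
D = 5.0
for _ in range(200):
    weights = adjust(weights, features, D)
print(abs(static_eval(weights, features) - D) < 0.01)  # S has crept toward D
```

The knowledge won by tree search thus ends up "partially encoded" in a handful of numbers-the weights-which can afterwards be consulted practically instantaneously.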
 
There is a rather nice recursive effect here. The point is that the dynamic evaluation of any single board position involves looking ahead a finite number of moves-say seven. Now each of the scads of board positions which might turn up seven turns down the road has to be itself evaluated somehow as well. But when the program evaluates these positions, it certainly cannot look another seven moves ahead, lest it have to look fourteen positions ahead, then twenty-one, etc., etc.-an infinite regress. Instead, it relies on static evaluations of positions seven moves ahead. Therefore, in Samuel's scheme, an intricate sort of feedback takes place, wherein the program is constantly trying to "flatten" look-ahead evaluation into a simpler static recipe; and this recipe in turn plays a key role in the dynamic look-ahead evaluation. Thus the two are intimately linked together, and each benefits from improvements in the other in a recursive way.
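The regress-stopping structure-dynamic evaluation that bottoms out in static evaluation at a fixed depth-is the familiar minimax recursion. A bare-bones sketch, with the game itself abstracted away and all names invented:

```python
# Dynamic evaluation D with a depth limit: below the horizon, the
# infinite regress is cut off by falling back on the static evaluation S.
# `moves` and `static_eval` stand in for a real game's rules and features.

def dynamic_eval(position, depth, moves, static_eval, maximizing=True):
    """Minimax look-ahead that terminates in static evaluations."""
    successors = moves(position)
    if depth == 0 or not successors:
        return static_eval(position)          # the regress stops here
    values = [dynamic_eval(p, depth - 1, moves, static_eval, not maximizing)
              for p in successors]
    return max(values) if maximizing else min(values)

# Toy game: a position is an integer; each move adds 1 or 2; S is the value.
moves = lambda n: [n + 1, n + 2] if n < 6 else []
print(dynamic_eval(0, 3, moves, lambda n: n))
```

Note how `static_eval` appears only at the leaves: looking another seven moves ahead from there would restart the whole recursion, which is exactly what the depth limit forbids.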
 
The level of play of the Samuel checkers program is extremely high: of the order of the top human players in the world. If this is so, why not apply the same techniques to chess? An international committee, convened in 1961 to study the feasibility of computer chess, including the Dutch International Grandmaster and mathematician Max Euwe, came to the bleak conclusion that the Samuel technique would be approximately one million times as difficult to implement in chess as in checkers, and that seems to close the book on that.
 
The extraordinarily great skill of the checkers program cannot be taken as saying "intelligence has been achieved"; yet it should not be minimized, either. It is a combination of insights into what checkers is, how to think about checkers, and how to program. Some people might feel that all it shows is Samuel's own checkers ability. But this is not true, for at least two reasons. One is that skillful game players choose their moves according to mental processes which they do not fully understand-they use their intuitions. Now there is no known way that anyone can bring to light all of his own intuitions; the best one can do via introspection is to use "feeling" or "meta-intuition"-an intuition about one's intuitions-as a guide, and try to describe what one thinks one's intuitions are all about. But this will only give a rough approximation to the true complexity of intuitive methods. Hence it is virtually certain that Samuel has not mirrored his own personal methods of play in his program. The other reason that Samuel's program's play should not be confused with Samuel's own play is that Samuel does not play checkers as well as his program-it beats him. This is not a paradox at all-no more than is the fact that a computer which has been programmed to calculate π can outrace its programmer in spewing forth digits of π.
 
When Is a Program Original?
 
This issue of a program outdoing its programmer is connected with the question of "originality" in AI. What if an AI program comes up with an idea, or a line of play in a game, which its programmer has never entertained-who should get the credit? There are various interesting instances of this having happened, some on a fairly trivial level, some on a rather deep level. One of the more famous involved a program to find proofs of theorems in elementary Euclidean geometry, written by E. Gelernter. One day the program came up with a sparklingly ingenious proof of one of the basic theorems of geometry-the so-called "pons asinorum", or "bridge of asses".
 
This theorem states that the base angles of an isosceles triangle are equal. Its standard proof requires constructing an altitude which divides the triangle into symmetrical halves. The elegant method found by the program (see Fig. 114) used no construction lines. Instead, it considered
 
FIGURE 114. Pons Asinorum Proof (found by Pappus [~300 A.D.] and Gelernter's program [~1960 A.D.]). Problem: To show that the base angles of an isosceles triangle are equal. Solution: As the triangle is isosceles, AP and AP' are of equal length. Therefore triangles PAP' and P'AP are congruent (side-side-side). This implies that corresponding angles are equal. In particular, the two base angles are equal.
 
the triangle and its mirror image as two different triangles. Then, having proved them congruent, it pointed out that the two base angles matched each other in this congruence-QED.
 
This gem of a proof delighted the program's creator and others; some saw evidence of genius in its performance. Not to take anything away from this feat, it happens that in A.D. 300 the geometer Pappus had actually found this proof, too. In any case, the question remains: "Who gets the credit?" Is this intelligent behavior? Or was the proof lying deeply hidden within the human (Gelernter), and did the computer merely bring it to the surface? This last question comes close to hitting the mark. We can turn it around: Was the proof lying deeply hidden in the program? Or was it close to the surface? That is, how easy is it to see why the program did what it did? Can the discovery be attributed to some simple mechanism, or simple combination of mechanisms, in the program? Or was there a complex interaction which, if one heard it explained, would not diminish one's awe at its having happened?
 
It seems reasonable to say that if one can ascribe the performance to certain operations which are easily traced in the program, then in some sense the program was just revealing ideas which were in essence hidden-though not too deeply-inside the programmer's own mind. Conversely, if following the program does not serve to enlighten one as to why this particular discovery popped out, then perhaps one should begin to separate the program's "mind" from that of its programmer. The human gets credit for having invented the program, but not for having had inside his own head the ideas produced by the program. In such cases, the human can be referred to as the "meta-author"-the author of the author of the result, and the program as the (just plain) author.
 
In the particular case of Gelernter and his geometry machine, while Gelernter probably would not have rediscovered Pappus' proof, still the mechanisms which generated that proof were sufficiently close to the surface of the program that one hesitates to call the program a geometer in its own right. If it had kept on astonishing people by coming up with ingenious new proofs over and over again, each of which seemed to be based on a fresh spark of genius rather than on some standard method, then one would have no qualms about calling the program a geometer-but this did not happen.
 
Who Composes Computer Music?
 
The distinction between author and meta-author is sharply pointed up in the case of computer composition of music. There are various levels of autonomy which a program may seem to have in the act of composition. One level is illustrated by a piece whose "meta-author" was Max Mathews of Bell Laboratories. He fed in the scores of the two marches "When Johnny Comes Marching Home" and "The British Grenadiers", and instructed the computer to make a new score-one which starts out as "Johnny", but slowly merges into "Grenadiers". Halfway through the piece, "Johnny" is totally gone, and one hears "Grenadiers" by itself ... Then the process is reversed, and the piece finishes with "Johnny", as it began. In Mathews' own words, this is

    ... a nauseating musical experience but one not without interest, particularly in the rhythmic conversions. "The Grenadiers" is written in 2/4 time in the key of F major. "Johnny" is written in 6/8 time in the key of E minor. The change from 2/4 to 6/8 time can be clearly appreciated, yet would be quite difficult for a human musician to play. The modulation from the key of F major to E minor, which involves a change of two notes in the scale, is jarring, and a smaller transition would undoubtedly have been a better choice.12

The resulting piece has a somewhat droll quality to it, though in spots it is turgid and confused.

    Is the computer composing? The question is best unasked, but it cannot be completely ignored. An answer is difficult to provide. The algorithms are deterministic, simple, and understandable. No complicated or hard-to-understand computations are involved; no "learning" programs are used; no random processes occur; the machine functions in a perfectly mechanical and straightforward manner. However, the result is sequences of sound that are unplanned in fine detail by the composer, even though the over-all structure of the section is completely and precisely specified. Thus the composer is often surprised, and pleasantly surprised, at the details of the realization of his ideas. To this extent only is the computer composing. We call the process algorithmic composition, but we immediately re-emphasize that the algorithms are transparently simple.13
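How "transparently simple" such an algorithm can be is easy to show. The sketch below is a caricature of the merging idea-a note-by-note crossfade between two pitch sequences-and nothing like the actual Bell Labs program; the pitch lists and the linear blend rule are invented.

```python
# Toy "score merging": crossfade two pitch sequences, note by note.
# Pitches (MIDI numbers) and the linear blend are invented illustrations,
# not Mathews' actual algorithm.

def merge_scores(score_a, score_b):
    """Start as A, end as B: note i is a blend weighted by position."""
    n = len(score_a)
    merged = []
    for i, (a, b) in enumerate(zip(score_a, score_b)):
        w = i / (n - 1)                # 0.0 at the start, 1.0 at the end
        merged.append(round((1 - w) * a + w * b))
    return merged

johnny = [64, 67, 69, 71, 72, 74]      # made-up opening pitches
grenadiers = [65, 65, 69, 70, 72, 77]  # likewise
print(merge_scores(johnny, grenadiers))
```

The first note comes purely from the first score, the last purely from the second, and the middle is a mechanical compromise-which is precisely why the result can be "unplanned in fine detail" while the over-all structure is completely specified.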
 
This is Mathews' answer to a question which he would rather "unask". Despite his disclaimer, however, many people find it easier to say simply that the piece was "composed by a computer". I believe this phrase misrepresents the situation totally. The program contained no structures analogous to the brain's "symbols", and could not be said in any sense to be "thinking" about what it was doing. To attribute the composition of such a piece of music to the computer would be like attributing the authorship of this book to the computerized, automatically (often incorrectly) hyphenating, phototypesetting machine with which it was set.
 
651
 
This brings up a quest10n which is a slight digression from AI, b  
ut actually not a huge one. It is this: When you see the word "I" or "me" in a  
text, what do you take it to be referring to? For instance, think of t  
he phrase "WASH ME" which appears occasionally on the back of dirty t  
rucks. Who is this "me"? Is this an outcry of some forlorn child who, in desperation to have a bath, scribbled the words on the nearest surface? Or is t  
he truck requesting a wash? Or, perhaps, does the sentence itself wish to b  
e given a shower? Or, is it that the filthy English language is asking to b  
e cleansed? One could go on and on in this game. In this case, the phrase is a  
joke, and one is supposed to pretend, on some level, that the truck i  
tself wrote the phrase and is requesting a wash. On another level, one clear  
ly recognizes the writing as that of a child, and enjoys the humor of t  
he misdirection. Here, in fact, is a game based on reading the "me" at t  
he wrong l
 
651
 
Precisely this kind of ambiguity has arisen in this book, first in t  
he C ontracrostipunctus, and later in the discussions of Godel's string G (and i  
ts relatives). The interpretation given for unplayable records was "I Canno  
t Be Played on Record Player X", and that for unprovable statements was, "  
I Cannot Be Proven in Formal System X". Let us take the latter sentence. O  
n what other occasions, if any, have you encountered a sentence c  
ontaining the pronoun "I" where you automatically understood that the r  
eference was not to the speaker of the sentence, but rather to the sentence itsel  
f? Very few, I would guess. The word "I", when it appears in a S  
hakespeare sonnet, is referring not to a fourteen-line form of poetry printed on a p  
age, but to a flesh-and-blood creature behind the scenes, somewhere off s  
tag
 
How far back do we ordinarily trace the "I" in a sentence? The answer, it seems to me, is that we look for a sentient being to attach the authorship to. But what is a sentient being? Something onto which we can map ourselves comfortably. In Weizenbaum's "Doctor" program, is there a personality? If so, whose is it? A small debate over this very question recently raged in the pages of Science magazine.
 
This brings us back to the issue of the "who" who composes computer music. In most circumstances, the driving force behind such pieces is a human intellect, and the computer has been employed, with more or less ingenuity, as a tool for realizing an idea devised by the human. The program which carries this out is not anything which we can identify with. It is a simple and single-minded piece of software with no flexibility, no perspective on what it is doing, and no sense of self. If and when, however, people develop programs which have those attributes, and pieces of music start issuing forth from them, then I suggest that will be the appropriate time to start splitting up one's admiration: some to the programmer for creating such an amazing program, and some to the program itself for its sense of music. And it seems to me that that will only take place when the internal structure of such a program is based on something similar to the "symbols" in our brains and their triggering patterns, which are responsible for the complex notion of meaning. The fact of having this kind of internal structure would endow the program with properties which would make us feel comfortable in identifying with it, to some extent. But until then, I will not feel comfortable in saying "this piece was composed by a computer".
 
Theorem Proving and Problem Reduction
 
Let us now return to the history of AI. One of the early things which people attempted to program was the intellectual activity of theorem proving. Conceptually, this is no different from programming a computer to look for a derivation of MU in the MIU-system, except that the formal systems involved were often more complicated than the MIU-system. They were versions of the Predicate Calculus, which is an extension of the Propositional Calculus involving quantifiers. Most of the rules of the Predicate Calculus are included in TNT, as a matter of fact. The trick in writing such a program is to instill a sense of direction, so that the program does not wander all over the map, but works only on "relevant" pathways-those which, by some reasonable criterion, seem to be leading towards the desired string.
 
In this book we have not dealt much with such issues. How indeed can you know when you are proceeding towards a theorem, and how can you tell if what you are doing is just empty fiddling? This was one thing which I hoped to illustrate with the MU-puzzle. Of course, there can be no definitive answer: that is the content of the limitative Theorems, since if you could always know which way to go, you could construct an algorithm for proving any desired theorem, and that would violate Church's Theorem. There is no such algorithm. (I will leave it to the reader to see exactly why this follows from Church's Theorem.) However, this doesn't mean that it is impossible to develop any intuition at all concerning what is and what is not a promising route; in fact, the best programs have very sophisticated heuristics, which enable them to make deductions in the Predicate Calculus at speeds which are comparable to those of capable humans.
 
The trick in theorem proving is to use the fact that you have an overall goal-namely the string you want to produce-in guiding you locally. One technique which was developed for converting global goals into local strategies for derivations is called problem reduction. It is based on the idea that whenever one has a long-range goal, there are usually subgoals whose attainment will aid in the attainment of the main goal. Therefore if one breaks up a given problem into a series of new subproblems, then breaks those in turn into subsubproblems, and so on, in a recursive fashion, one eventually comes down to very modest goals which can presumably be attained in a couple of steps. Or at least so it would seem ...
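The recursive breakup just described can be sketched directly: a goal is either primitive (attainable in one step) or is replaced by its subgoals, recursively. The goals and the decomposition table below are invented toys standing in for a real theorem prover's tactics.

```python
# Problem reduction in miniature: a goal is either primitive, or is
# broken into subgoals, then subsubgoals, and so on, recursively.
# The goal names and the decomposition table are invented illustrations.

DECOMPOSE = {
    "prove-theorem": ["prove-lemma-1", "prove-lemma-2"],
    "prove-lemma-1": ["rewrite-left-side", "apply-axiom"],
    "prove-lemma-2": ["apply-axiom"],
}

def solve(goal):
    """Recursively reduce a goal to primitives; return the steps taken."""
    subgoals = DECOMPOSE.get(goal)
    if subgoals is None:                 # a modest, directly attainable goal
        return [goal]
    steps = []
    for sub in subgoals:                 # attain each subgoal in turn
        steps += solve(sub)
    return steps

print(solve("prove-theorem"))
```

Note that the recursion only terminates because every chain of decompositions bottoms out in a primitive goal-which is exactly the assumption Zeno's method, below, manages to violate.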
 
Problem reduction got Zeno into hot water. Zeno's method, you recall, for getting from A to B (think of B as the goal), is to "reduce" the problem into two subproblems: first go halfway, then go the rest of the way. So now you have "pushed"-in the sense of Chapter V-two subgoals onto your "goal stack". Each of these, in turn, will be replaced by two subsubgoals-and so on ad infinitum. You wind up with an infinite goal-stack, instead of a single goal (Fig. 115). Popping an infinite number of goals off your stack will prove to be tricky-which is just Zeno's point, of course.
 
Another example of an infinite recursion in problem reduction occurred in the Dialogue Little Harmonic Labyrinth, when Achilles wanted to have a Typeless Wish granted. Its granting had to be deferred until permission was gotten from the Meta-Genie; but in order to get permission to give permission, she had to summon the Meta-Meta-Genie-and so on. Despite
 
FIGURE 115. Zeno's endless goal tree, for getting from A to B.
 
the infiniteness of the goal stack, Achilles got his wish. Problem reduction wins the day!
 
Despite my mockery, problem reduction is a powerful technique for converting global problems into local problems. It shines in certain situations, such as in the endgame of chess, where the look-ahead technique often performs miserably, even when it is carried to ridiculous lengths, such as fifteen or more plies. This is because the look-ahead technique is not based on planning; it simply has no goals and explores a huge number of pointless alternatives. Having a goal enables you to develop a strategy for the achievement of that goal, and this is a completely different philosophy from looking ahead mechanically. Of course, in the look-ahead technique, desirability or its absence is measured by the evaluation function for positions, and that incorporates indirectly a number of goals, principally that of not getting checkmated. But that is too indirect. Good chess players who play against look-ahead chess programs usually come away with the impression that their opponents are very weak in formulating plans or strategies.
 
Shandy and the Bone
 
There is no guarantee that the method of problem reduction will work. There are many situations where it flops. Consider this simple problem, for instance. You are a dog, and a human friend has just thrown your favorite bone over a wire fence into another yard. You can see your bone through the fence, just lying there in the grass-how luscious! There is an open gate in the fence about fifty feet away from the bone. What do you do? Some dogs will just run up to the fence, stand next to it, and bark; others will dash up to the open gate and double back to the lovely bone. Both dogs can be said to be exercising the problem reduction technique; however, they represent the problem in their minds in different ways, and this makes all the difference. The barking dog sees the subproblems as (1) running to the fence, (2) getting through it, and (3) running to the bone-but that second subproblem is a "toughie", whence the barking. The other dog sees the subproblems as (1) getting to the gate; (2) going through the gate; (3) running to the bone. Notice how everything depends on the way you represent the "problem space"-that is, on what you perceive as reducing the problem (forward motion towards the overall goal) and what you perceive as magnifying the problem (backward motion away from the goal).
 
Changing the Problem Space
 
Some dogs first try running directly towards the bone, and when they encounter the fence, something clicks inside their brain; soon they change course, and run over to the gate. These dogs realize that what on first glance seemed as if it would increase the distance between the initial situation and the desired situation-namely, running away from the bone but towards the open gate-actually would decrease it. At first, they confuse physical distance with problem distance. Any motion away from the bone seems, by definition, a Bad Thing. But then-somehow-they realize that they can shift their perception of what will bring them "closer" to the bone. In a properly chosen abstract space, moving towards the gate is a trajectory bringing the dog closer to the bone! At every moment, the dog is getting "closer"-in the new sense-to the bone. Thus, the usefulness of problem reduction depends on how you represent your problem mentally. What in one space looks like a retreat can in another space look like a revolutionary step forward.
 
In ordinary life, we constantly face and solve variations on the dog-and-bone problem. For instance, if one afternoon I decide to drive one hundred miles south, but am at my office and have ridden my bike to work, I have to make an extremely large number of moves in what are ostensibly "wrong" directions before I am actually on my way in my car, headed south. I have to leave my office, which means, say, heading east a few feet; then follow the hall in the building which heads north, then west. Then I ride my bike home, which involves excursions in all the directions of the compass; and I reach my home. A succession of short moves there eventually gets me into my car, and I am off. Not that I immediately drive due south, of course-I choose a route which may involve some excursions north, west, or east, with the aim of getting to the freeway as quickly as possible.
 
All of this doesn't feel paradoxical in the slightest; it is done without even any sense of amusement. The space in which physical backtracking is perceived as direct motion towards the goal is built so deeply into my mind that I don't even see any irony when I head north. The roads and hallways and so forth act as channels which I accept without much fight, so that part of the act of choosing how to perceive the situation involves just accepting what is imposed. But dogs in front of fences sometimes have a hard time doing that, especially when that bone is sitting there so close, staring them in the face, and looking so good. And when the problem space is just a shade more abstract than physical space, people are often just as lacking in insight about what to do as the barking dogs.
 
In some sense all problems are abstract versions of the dog-and-bone problem. Many problems are not in physical space but in some sort of conceptual space. When you realize that direct motion towards the goal in that space runs you into some sort of abstract "fence", you can do one of two things: (1) try moving away from the goal in some sort of random way, hoping that you may come upon a hidden "gate" through which you can pass and then reach your bone; or (2) try to find a new "space" in which you can represent the problem, and in which there is no abstract fence separating you from your goal-then you can proceed straight towards the goal in this new space. The first method may seem like the lazy way to go, and the second method may seem like a difficult and complicated way to go. And yet, solutions which involve restructuring the problem space more often than not come as sudden flashes of insight rather than as products of a series of slow, deliberate thought processes. Probably these intuitive flashes come from the extreme core of intelligence-and needless to say, their source is a closely protected secret of our jealous brains.
 
In any case, the trouble is not that problem reduction per se leads to failures; it is quite a sound technique. The problem is a deeper one: how do you choose a good internal representation for a problem? What kind of "space" do you see it in? What kinds of action reduce the "distance" between you and your goal in the space you have chosen? This can be expressed in mathematical language as the problem of hunting for an appropriate metric (distance function) between states. You want to find a metric in which the distance between you and your goal is very small.
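The dog-and-bone situation can be replayed under two metrics to make this concrete. The geometry below is an invented toy (the coordinates, the fence, and both distance functions are assumptions for illustration): under the naive metric a stride towards the gate looks like a retreat, while under the fence-aware metric it is progress.

```python
import math

# Two views of the dog-and-bone problem. The geometry is a toy:
# dog at (0, 0), bone at (0, 10), a fence along y = 5 with a gate at (50, 5).

BONE, GATE = (0.0, 10.0), (50.0, 5.0)

def physical_distance(p):
    """Naive metric: straight-line distance to the bone, fence ignored."""
    return math.dist(p, BONE)

def problem_distance(p):
    """Fence-aware metric: from the dog's side, any path goes via the gate."""
    return math.dist(p, GATE) + math.dist(GATE, BONE)

start = (0.0, 0.0)
step_toward_gate = (5.0, 0.5)   # one stride along the way to the gate

# Under the naive metric the stride looks like a retreat...
print(physical_distance(step_toward_gate) > physical_distance(start))
# ...but under the problem metric the dog has gotten closer to the bone.
print(problem_distance(step_toward_gate) < problem_distance(start))
```

Nothing about the yard changed between the two printouts; only the metric did-which is the whole point about choosing a representation.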
 
Now since this matter of choosing an internal representation is itself a type of problem-and a most tricky one, too-you might think of turning the technique of problem reduction back on it! To do so, you would have to have a way of representing a huge variety of abstract spaces, which is an exceedingly complex project. I am not aware of anyone's having tried anything along these lines. It may be just a theoretically appealing, amusing suggestion which is in fact wholly unrealistic. In any case, what AI sorely lacks is programs which can "step back" and take a look at what is going on, and with this perspective, reorient themselves to the task at hand. It is one thing to write a program which excels at a single task which, when done by a human being, seems to require intelligence-and it is another thing altogether to write an intelligent program! It is the difference between the Sphex wasp (see Chapter XI), whose wired-in routine gives the deceptive appearance of great intelligence, and a human being observing a Sphex wasp.
 
656
 
The I-Mode and the M-Mode Again
 
An intelligent program would presumably be one which is versatile enough to solve problems of many different sorts. It would learn to do each different one and would accumulate experience in doing so. It would be able to work within a set of rules and yet also, at appropriate moments, to step back and make a judgment about whether working within that set of rules is likely to be profitable in terms of some overall set of goals which it has. It would be able to choose to stop working within a given framework, if need be, and to create a new framework of rules within which to work for a while.
 
Much of this discussion may remind you of aspects of the MU-puzzle. For instance, moving away from the goal of a problem is reminiscent of moving away from MU by making longer and longer strings which you hope may in some indirect way enable you to make MU. If you are a naive "dog", you may feel you are moving away from your "MU-bone" whenever your string increases beyond two characters; if you are a more sophisticated dog, the use of such lengthening rules has an indirect justification, something like heading for the gate to get your MU-bone.
 
Another connection between the previous discussion and the MU-puzzle is the two modes of operation which led to insight about the nature of the MU-puzzle: the Mechanical mode, and the Intelligent mode. In the former, you are embedded within some fixed framework; in the latter, you can always step back and gain an overview of things. Having an overview is tantamount to choosing a representation within which to work; and working within the rules of the system is tantamount to trying the technique of problem reduction within that selected framework. Hardy's comment on Ramanujan's style-particularly his willingness to modify his own hypotheses-illustrates this interplay between the M-mode and the I-mode in creative thought.
 
The Sphex wasp operates excellently in the M-mode, but it has absolutely no ability to choose its framework or even to alter its M-mode in the slightest. It has no ability to notice when the same thing occurs over and over and over again in its system, for to notice such a thing would be to jump out of the system, even if only ever so slightly. It simply does not notice the sameness of the repetitions. This idea (of not noticing the identity of certain repetitive events) is interesting when we apply it to ourselves. Are there highly repetitious situations which occur in our lives time and time again, and which we handle in the identical stupid way each time, because we don't have enough of an overview to perceive their sameness? This leads back to that recurrent issue, "What is sameness?" It will soon come up as an AI theme, when we discuss pattern recognition.
 
Applying AI to Mathematics
 
Mathematics is in some ways an extremely interesting domain to study from the AI point of view. Every mathematician has the sense that there is a kind of metric between ideas in mathematics-that all of mathematics is a network of results between which there are enormously many connections. In that network, some ideas are very closely linked; others require more elaborate pathways to be joined. Sometimes two theorems in mathematics are close because one can be proven easily, given the other. Other times two ideas are close because they are analogous, or even isomorphic. These are two different senses of the word "close" in the domain of mathematics. There are probably a number of others. Whether there is an objectivity or a universality to our sense of mathematical closeness, or whether it is largely an accident of historical development is hard to say. Some theorems of different branches of mathematics appear to us hard to link, and we might say that they are unrelated-but something might turn up later which forces us to change our minds. If we could instill our highly developed sense of mathematical closeness-a "mathematician's mental metric", so to speak-into a program, we could perhaps produce a primitive "artificial mathematician". But that depends on being able to convey a sense of simplicity or "naturalness" as well, which is another major stumbling block.
 
These issues have been confronted in a number of AI projects. There is a collection of programs developed at MIT which go under the name "MACSYMA", whose purpose it is to aid mathematicians in symbolic manipulation of complex mathematical expressions. This program has in it some sense of "where to go"-a sort of "complexity gradient" which guides it from what we would generally consider complex expressions to simpler ones. Part of MACSYMA's repertoire is a program called "SIN", which does symbolic integration of functions; it is generally acknowledged to be superior to humans in some categories. It relies upon a number of different skills, as intelligence in general must: a vast body of knowledge, the technique of problem reduction, a large number of heuristics, and also some special tricks.
 
Another program, written by Douglas Lenat at Stanford, had as its aim to invent concepts and discover facts in very elementary mathematics. Beginning with the notion of sets, and a collection of notions of what is "interesting" which had been spoon-fed into it, it "invented" the idea of counting, then the idea of addition, then multiplication, then-among other things-the notion of prime numbers, and it went so far as to rediscover Goldbach's conjecture! Of course these "discoveries" were all hundreds-even thousands-of years old. Perhaps this may be explained in part by saying that the sense of "interesting" was conveyed by Lenat in a large number of rules which may have been influenced by his twentieth-century training; nonetheless it is impressive. The program seemed to run out of steam after this very respectable performance. An interesting thing about it was that it was unable to develop or improve upon its own sense of what is interesting. That seemed another level of difficulty up-or perhaps several levels up.
 
The Crux of AI: Representation of Knowledge
 
Many of the examples above have been cited in order to stress that the way a domain is represented has a huge bearing on how that domain is "understood". A program which merely printed out theorems of TNT in a preordained order would have no understanding of number theory; a program such as Lenat's with its extra layers of knowledge could be said to have a rudimentary sense of number theory; and one which embeds mathematical knowledge in a wide context of real-world experience would probably be the most able to "understand" in the sense that we think we do. It is this representation of knowledge that is at the crux of AI.
 
In the early days it was assumed that knowledge came in sentence-like "packets", and that the best way to implant knowledge into a program was to develop a simple way of translating facts into small passive packets of data. Then every fact would simply be a piece of data, accessible to the programs using it. This is exemplified by chess programs, where board positions are coded into matrices or lists of some sort and stored efficiently in memory where they can be retrieved and acted upon by subroutines.
 
The fact that human beings store facts in a more complicated way was known to psychologists for quite a while and has only recently been rediscovered by AI workers, who are now confronting the problems of "chunked" knowledge, and the difference between procedural and declarative types of knowledge, which is related, as we saw in Chapter XI, to the difference between knowledge which is accessible to introspection and knowledge which is inaccessible to introspection.
 
The naive assumption that all knowledge should be coded into passive pieces of data is actually contradicted by the most fundamental fact about computer design: that is, how to add, subtract, multiply, and so on is not coded into pieces of data and stored in memory; it is, in fact, represented nowhere in memory, but rather in the wiring patterns of the hardware. A pocket calculator does not store in its memory knowledge of how to add; that knowledge is encoded into its "guts". There is no memory location to point to if somebody demands, "Show me where the knowledge of how to add resides in this machine!"
 
A large amount of work in AI has nevertheless gone into systems in which the bulk of the knowledge is stored in specific places-that is, declaratively. It goes without saying that some knowledge has to be embodied in programs; otherwise one would not have a program at all, but merely an encyclopedia. The question is how to split up knowledge between program and data. Not that it is always easy to distinguish between program and data, by any means. I hope that was made clear enough in Chapter XVI. But in the development of a system, if the programmer intuitively conceives of some particular item as data (or as program), that may have significant repercussions on the system's structure, because as one programs one does tend to distinguish between data-like objects and program-like objects.
 
It is important to point out that in principle, any manner of coding information into data structures or procedures is as good as any other, in the sense that if you are not too concerned about efficiency, what you can do in one scheme, you can do in the other. However, reasons can be given which seem to indicate that one method is definitely superior to the other. For instance, consider the following argument in favor of using procedural representations only: "As soon as you try to encode features of sufficient complexity into data, you are forced into developing what amounts to a new language, or formalism. So in effect your data structures become program-like, with some piece of your program serving as their interpreter; you might as well represent the same information directly in procedural form to begin with, and obviate the extra level of interpretation."
 
DNA and Proteins Help Give Some Perspective
 
This argument sounds quite convincing, and yet, if interpreted a little loosely, it can be read as an argument for the abolishment of DNA and RNA. Why encode genetic information in DNA, when by representing it directly in proteins, you could eliminate not just one, but two levels of interpretation? The answer is: it turns out that it is extremely useful to have the same information in several different forms for different purposes. One advantage of storing genetic information in the modular and data-like form of DNA is that two individuals' genes can be easily recombined to form a new genotype. This would be very difficult if the information were only in proteins. A second reason for storing information in DNA is that it is easy to transcribe and translate it into proteins. When it is not needed, it does not take up much room; when it is needed, it serves as a template. There is no mechanism for copying one protein off of another; their folded tertiary structures would make copying highly unwieldy. Complementarily, it is almost imperative to be able to get genetic information into three-dimensional structures such as enzymes, because the recognition and manipulation of molecules is by its nature a three-dimensional operation. Thus the argument for purely procedural representations is seen to be quite fallacious in the context of cells. It suggests that there are advantages to being able to switch back and forth between procedural and declarative representations. This is probably true also in AI.
 
This issue was raised by Francis Crick in a conference on communication with extraterrestrial intelligence:

We see on Earth that there are two molecules, one of which is good for replication [DNA] and one of which is good for action [proteins]. Is it possible to devise a system in which one molecule does both jobs, or are there perhaps strong arguments, from systems analysis, which might suggest (if they exist) that to divide the job into two gives a great advantage? This is a question to which I do not know the answer.1
 
Modularity of Knowledge
 
Another question which comes up in the representation of knowledge is modularity. How easy is it to insert new knowledge? How easy is it to revise old knowledge? How modular are books? It all depends. If from a tightly structured book with many cross-references a single chapter is removed, the rest of the book may become virtually incomprehensible. It is like trying to pull a single strand out of a spider web-you ruin the whole in doing so. On the other hand, some books are quite modular, having independent chapters.
 
Consider a straightforward theorem-generating program which uses TNT's axioms and rules of inference. The "knowledge" of such a program has two aspects. It resides implicitly in the axioms and rules, and explicitly in the body of theorems which have so far been produced. Depending on which way you look at the knowledge, you will see it either as modular or as spread all around and completely nonmodular. For instance, suppose you had written such a program but had forgotten to include TNT's Axiom 1 in the list of axioms. After the program had done many thousands of derivations, you realized your oversight, and inserted the new axiom. The fact that you can do so in a trice shows that the system's implicit knowledge is modular; but the new axiom's contribution to the explicit knowledge of the system will only be reflected after a long time-after its effects have "diffused" outwards, as the odor of perfume slowly diffuses in a room when the bottle is broken. In that sense the new knowledge takes a long time to be incorporated. Furthermore, if you wanted to go back and replace Axiom 1 by its negation, you could not just do that by itself; you would have to delete all theorems which had involved Axiom 1 in their derivations. Clearly this system's explicit knowledge is not nearly so modular as its implicit knowledge.
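
The two aspects of such a program's knowledge are easy to see in miniature. Below is a hypothetical sketch using the pq-system of Chapter II (the function and variable names are my own): the implicit knowledge sits in `axioms` and `apply_rule`, where inserting a forgotten axiom is a one-line change, while the explicit knowledge, the `theorems` list, reflects that change only after rederivation.

```python
# The pq-system: implicit knowledge (axioms + rule of production)
# versus explicit knowledge (the theorems derived so far).
axioms = ["-p-q--", "--p-q---"]        # instances of the schema xp-qx-

def apply_rule(theorem):
    # pq-system rule of production: from xpyqz, infer xpy-qz-
    x, rest = theorem.split("p")
    y, z = rest.split("q")
    return x + "p" + y + "-q" + z + "-"

# Explicit knowledge accumulates slowly, by derivation.
theorems = list(axioms)
for _ in range(3):
    new = [apply_rule(t) for t in theorems]
    theorems += [t for t in new if t not in theorems]

# Inserting a forgotten axiom is a one-line, "modular" change...
axioms.append("---p-q----")
# ...but `theorems` will reflect it only after its effects have
# diffused outward through further derivation.
```

The asymmetry the paragraph describes is visible here: editing `axioms` is instantaneous, while purging a retracted axiom's consequences would require tracing every derivation that used it.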
 
It would be useful if we learned how to transplant knowledge modularly. Then to teach everyone French, we would just open up their heads and operate in a fixed way on their neural structures-then they would know how to speak French. Of course, this is only a hilarious pipe dream.
 
Another aspect of knowledge representation has to do with the way in which one wishes to use the knowledge. Are inferences supposed to be drawn as pieces of information arrive? Should analogies and comparisons constantly be being made between new information and old information? In a chess program, for instance, if you want to generate look-ahead trees, then a representation which encodes board positions with a minimum of redundancy will be preferable to one which repeats the information in several different ways. But if you want your program to "understand" a board position by looking for patterns and comparing them to known patterns, then representing the same information several times over in different forms will be more useful.
 
Representing Knowledge in a Logical Formalism
 
There are various schools of thought concerning the best way to represent and manipulate knowledge. One which has had great influence advocates representations using formal notations similar to those for TNT-using propositional connectives and quantifiers. The basic operations in such representations are, not surprisingly, formalizations of deductive reasoning. Logical deductions can be made using rules of inference analogous to some of those in TNT. Querying the system about some particular idea sets up a goal in the form of a string to be derived. For example: "Is MUMON a theorem?" Then the automatic reasoning mechanisms take over in a goal-oriented way, using various methods of problem reduction.
 
For example, suppose that the proposition "All formal arithmetics are incomplete" were known, and the program were queried, "Is Principia Mathematica incomplete?" In scanning the list of known facts-often called the data base-the system might notice that if it could establish that Principia Mathematica is a formal arithmetic, then it could answer the question. Therefore the proposition "Principia Mathematica is a formal arithmetic" would be set up as a subgoal, and then problem reduction would take over. If it could find further things which would help in establishing (or refuting) the goal or the subgoal, it would work on them-and so on, recursively. This process is given the name of backwards chaining, since it begins with the goal and works its way backwards, presumably towards things which may already be known. If one makes a graphic representation of the main goal, subsidiary goals, subsubgoals, etc., a tree-like structure will arise, since the main goal may involve several different subgoals, each of which in turn involves several subsubgoals, etc.
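
A backward chainer of the kind just described can be sketched in a few lines. The fact strings and the rule format below are hypothetical, purely for illustration: a rule says that to establish its head, every goal in its body must be established, and the search recurses from the query back toward known facts.

```python
# A minimal backward chainer (hypothetical fact/rule format, my own).
# A rule (head, body) means: to establish head, establish everything in body.
facts = {"Principia Mathematica is a formal arithmetic"}
rules = [
    ("all formal arithmetics are incomplete", []),   # a known proposition
    ("Principia Mathematica is incomplete",
     ["Principia Mathematica is a formal arithmetic",
      "all formal arithmetics are incomplete"]),
]

def establish(goal):
    # Work backwards from the goal toward things already known.
    # (This sketch has no protection against circular rules.)
    if goal in facts:
        return True
    for head, body in rules:
        if head == goal and all(establish(sub) for sub in body):
            return True
    return False

print(establish("Principia Mathematica is incomplete"))  # True
print(establish("Principia Mathematica is consistent"))  # False: "I do not know"
```

Drawing the recursive calls of `establish` as a diagram gives exactly the tree of goals, subgoals, and subsubgoals mentioned above.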
 
Notice that this method is not guaranteed to resolve the question, for there may be no way of establishing within the system that Principia Mathematica is a formal arithmetic. This does not imply, however, that either the goal or the subgoal is a false statement-merely that they cannot be derived with the knowledge currently available to the system. The system may print out, in such a circumstance, "I do not know" or words to that effect. The fact that some questions are left open is of course similar to the incompleteness from which certain well-known formal systems suffer.
 
Deductive vs. Analogical Awareness
 
This method affords a deductive awareness of the domain that is represented, in that correct logical conclusions can be drawn from known facts. However, it misses something of the human ability to spot similarities and to compare situations-it misses what might be called analogical awareness-a crucial side of human intelligence. This is not to say that analogical thought processes cannot be forced into such a mold, but they do not lend themselves naturally to being captured in that kind of formalism. These days, logic-oriented systems are not so much in vogue as other kinds, which allow complex forms of comparisons to be carried out rather naturally.
 
When you realize that knowledge representation is an altogether different ball game than mere storage of numbers, then the idea that "a computer has the memory of an elephant" is an easy myth to explode. What is stored in memory is not necessarily synonymous with what a program knows; for even if a given piece of knowledge is encoded somewhere inside a complex system, there may be no procedure, or rule, or other type of handler of data, which can get at it-it may be inaccessible. In such a case, you can say that the piece of knowledge has been "forgotten" because access to it has been temporarily or permanently lost. Thus a computer program may "forget" something on a high level which it "remembers" on a low level. This is another one of those ever-recurring level distinctions, from which we can probably learn much about our own selves. When a human forgets, it most likely means that a high-level pointer has been lost-not that any information has been deleted or destroyed. This highlights the extreme importance of keeping track of the ways in which you store incoming experiences, for you never know in advance under what circumstances, or from what angle, you will want to pull something out of storage.
 
From Computer Haiku to an RTN-Grammar
 
The complexity of the knowledge representation in human heads first hit home with me when I was working on a program to generate English sentences "out of the blue". I had come to this project in a rather interesting way. I had heard on the radio a few examples of so-called "Computer Haiku". Something about them struck me deeply. There was a large element of humor and simultaneously mystery to making a computer generate something which ordinarily would be considered an artistic creation. I was highly amused by the humorous aspect, and I was very motivated by the mystery-even contradiction-of programming creative acts. So I set out to write a program even more mysteriously contradictory and humorous than the haiku program.
 
At first I was concerned with making the grammar flexible and recursive, so that one would not have the sense that the program was merely filling in the blanks in some template. At about that time I ran across a Scientific American article by Victor Yngve in which he described a simple but flexible grammar which could produce a wide variety of sentences of the type found in some children's books. I modified some of the ideas I'd gleaned from that article and came up with a set of procedures which formed a Recursive Transition Network grammar, as described in Chapter V. In this grammar, the selection of words in a sentence was determined by a process which began by selecting-at random-the overall structure of the sentence; gradually the decision-making process trickled down through lower levels of structure until the word level and the letter level were reached. A lot had to be done below the word level, such as inflecting verbs and making plurals of nouns; also irregular verb and noun forms were first formed regularly, and then if they matched entries in a table, substitutions of the proper (irregular) forms were made. As each word reached its final form, it was printed out. The program was like the proverbial monkey at a typewriter, but operating on several levels of linguistic structure simultaneously-not just the letter level.
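
A grammar of this recursive sort can be sketched very compactly. The rules below are my own toy rules, not those of the actual program, and the sketch stops at the word level (no inflection or irregular-form table): expansion begins with the overall sentence structure and trickles down, at random, until only words remain.

```python
import random

# A toy recursive grammar in the spirit described above (my own rules).
# Nonterminal symbols expand via randomly chosen rules; anything not
# in the table is a terminal word.
grammar = {
    "SENTENCE":   [["NOUNPHRASE", "VERBPHRASE"]],
    "NOUNPHRASE": [["the", "NOUN"],
                   ["the", "NOUN", "who", "VERBPHRASE"]],   # recursion
    "VERBPHRASE": [["VERB"], ["VERB", "NOUNPHRASE"]],       # recursion
    "NOUN":       [["pencil"], ["program"], ["girl"], ["machine"]],
    "VERB":       [["quacks"], ["tumbles"], ["takes"], ["pastes"]],
}

def generate(symbol="SENTENCE"):
    if symbol not in grammar:            # terminal: an actual word
        return [symbol]
    expansion = random.choice(grammar[symbol])
    words = []
    for part in expansion:
        words.extend(generate(part))     # decision-making trickles down
    return words

print(" ".join(generate()))  # e.g. "the program who quacks takes the girl"
```

Because NOUNPHRASE and VERBPHRASE can invoke each other, the same dozen rules yield both very short and arbitrarily nested sentences, which is the whole point of making the grammar recursive rather than a fill-in-the-blanks template.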
 
In the early stages of developing the program, I used a totally silly vocabulary-deliberately, since I was aiming at humor. It produced a lot of nonsense sentences, some of which had very complicated structures, others of which were rather short. Some excerpts are shown below:

A male pencil who must laugh clumsily would quack.
Must program not always crunch girl at memory?
The decimal bug which spits clumsily might tumble.
Cake who does sure take an unexpected man within relationship might always dump card.
Program ought run cheerfully.
The worthy machine ought not always paste the astronomer.
Oh, program who ought really run off of the girl writes musician for theater.
The businesslike relationship quacks.
The lucky girl which can always quack will never sure quack.
The game quacks.
Professor will write pickle.
A bug tumbles.
Man takes the box who slips.
The effect is strongly surrealistic and at times a little reminiscent of haiku-for example, the final sample of four consecutive short sentences. At first it seemed very funny and had a certain charm, but soon it became rather stale. After reading a few pages of output one could sense the limits of the space in which the program was operating; and after that, seeing random points inside that space-even though each one was "new"-was nothing new. This is, it seems to me, a general principle: you get bored with something not when you have exhausted its repertoire of behavior, but when you have mapped out the limits of the space that contains its behavior. The behavior space of a person is just about complex enough that it can continually surprise other people; but that wasn't true of my program. I realized that my goal of producing truly humorous output would require that far more subtlety be programmed in. But what, in this case, was meant by "subtlety"? It was clear that absurd juxtapositions of words were just too unsubtle; I needed a way to ensure that words would be used in accordance with the realities of the world. This was where thoughts about representation of knowledge began to enter the picture.
 
From RTN's to ATN's
 
The idea I adopted was to classify each word-noun, verb, preposition, etc.-in several different "semantic dimensions". Thus, each word was a member of classes of various sorts; then there were also superclasses-classes of classes (reminiscent of the remark by Ulam). In principle, such aggregation could continue to any number of levels, but I stopped at two. At any given moment, the choice of words was now semantically restricted, because it was required that there should be agreement between the various parts of the phrase being constructed. The idea was, for instance, that certain kinds of acts could be performed only by animate objects; that only certain kinds of abstractions could influence events, and so on. The decisions about what categories were reasonable, and whether each category was better thought of as a class or a superclass, were quite complicated. All words were branded in several different dimensions. Common prepositions-"of", "in", etc.-had several distinct entries, corresponding to their distinct usages. Now, the output began to be much more comprehensible-and for that reason it was funny in a new way.
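
The effect of such semantic restriction can be illustrated with a deliberately tiny vocabulary (the words, classes, and class assignments below are my own inventions, not the program's): each noun and verb carries a set of semantic tags, and a verb may be chosen only if its allowed subject classes intersect the noun's.

```python
import random

# Hypothetical illustration of "semantic dimensions": every word is
# branded with feature classes, and the generator only pairs words
# whose classes agree.
nouns = {"girl": {"animate"}, "professor": {"animate"},
         "pickle": {"inanimate"}, "theory": {"abstract"}}
verbs = {"laughs": {"animate"},                  # only animate subjects laugh
         "tumbles": {"animate", "inanimate"},
         "influences": {"abstract", "animate"}}  # abstractions may influence

def sentence():
    noun = random.choice(list(nouns))
    # Restrict the verb to those whose allowed subject classes
    # intersect the noun's semantic classes.
    allowed = [v for v, classes in verbs.items() if classes & nouns[noun]]
    return f"The {noun} {random.choice(allowed)}."

print(sentence())  # e.g. "The professor laughs." -- never "The pickle laughs."
```

Even this crude agreement check rules out the most absurd juxtapositions while leaving the output well short of understanding, which matches the experience described in the text.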
 
A Little Turing Test
 
Below, I have reproduced nine selections, carefully culled from many pages of output from later versions of my program. Along with them are three (seriously intended) human-written sentences. Which?

(1) Blurting may be considered as the reciprocal substitution of semiotic material (dubbing) for a semiotic dialogical product in a dynamic reflexion.
(2) Rather think of a pathway of a 'sequence' of gedankenexperiment simpletons where heir-lines are a prima facie case of a paradiachronic transitivity.
(3) Think of that as a chain strength possibility of what, eventually, comes out as a product (epistemic conditions?) and the product is not a Frankfurt-ish packing-it-all-in.
(4) Despite the efforts, the reply, if you will, had been supported by the Orient; hence a fallacy will thereafter be suspended by the attitude which will be being held by the ambassador.
(5) Of course, until the upheavals, the ambassador was slightly gradually mollycoddling the rabble.
(6) Supposedly, refined liberty caused the attitudes insofar as peace is distilled by the consequences which will not eventually be caused by the command irrevocably insofar as peace of it is sometimes causing the intransigency infinitesimally surprisingly.
(7) According to the sophists, the campaigns in the city-states, in other words, have been accepted by the Orient cunningly. Of course, the Orient has been separated by the states particularly violently. The Orient supports the efforts which had been supported by mankind.
(8) Admittedly, the hierarchical origin of the fallacy, nevertheless, will be prophesied by the enemies of it. By the same token, the individualists will have testified that intransigency will not have suspended the campaigns.
(9) Needless to say, during the upheaval which will have warranted the secrecy, the replies do not separate the Orient. Of course, the countries, ipso facto, are always probing liberty.
(10) Although a Nobel Prize was being achieved by the humanists, yet in addition, it was being achieved by the serf.
(11) An attitude will often be held by the serfs of a strife-torn nation.
(12) Moreover, the Nobel Prizes will be achieved. By the same token, despite the consequence, the Nobel Prizes which will be achieved will sometimes be achieved by a woman.

The human-written sentences are numbers 1 to 3; they were drawn from the contemporary journal Art-Language15 and are-as far as I can tell-completely serious efforts among literate and sane people to communicate something to each other. That they appear here out of context is not too misleading, since their proper context sounds just the same as they do.
 
My program produced the rest. Numbers 10 to 12 were chosen to show that there were occasional bursts of total lucidity; numbers 7 to 9 are more typical of the output, floating in that curious and provocative netherworld between meaning and no-meaning; and then numbers 4 to 6 pretty much transcend meaning. In a generous mood, one could say that they stand on their own as pure "language objects", something like pieces of abstract sculpture carved out of words instead of stone; alternatively, one could say that they are pure pseudointellectual drivel.
 
My choice of vocabulary was still aimed at producing humorous effects. The flavor of the output is hard to characterize. Although much of it "makes sense", at least on a single-sentence level, one definitely gets the feeling that the output is coming from a source with no understanding of what it is saying and no reason to say it. In particular, one senses an utter lack of visual imagery behind the words. When I saw such sentences come pouring out of the line printer, I experienced complex emotions. I was very amused by the silliness of the output. I was also very proud of my achievement and tried to describe it to friends as similar to giving rules for building up meaningful stories in Arabic out of single strokes of the pen-an exaggeration, but it pleased me to think of it that way. And lastly I was deeply thrilled by the knowledge that this enormously complicated machine was shunting around long trains of symbols inside it according to rules, and that these long trains of symbols were something like thoughts in my own head ... something like thoughts.
 
Images of What Thought Is
 
 
Of course I didn't fool myself into thinking that there was a conscious being behind those sentences-far from it. Of all people, I was the most aware of the reasons that this program was terribly remote from real thought. Tesler's Theorem is quite apt here: as soon as this level of language-handling ability had been mechanized, it was clear that it did not constitute intelligence. But this strong experience left me with an image: a glimmering sense that real thought was composed of much longer, much more complicated trains of symbols in the brain-many trains moving simultaneously down many parallel and crisscrossing tracks, their cars being pushed and pulled, attached and detached, switched from track to track by a myriad neural shunting-engines.
 
It was an intangible image which I cannot convey in words, and it was only an image. But images and intuitions and motivations lie mingled close in the mind, and my utter fascination with this image was a constant spur to think more deeply about what thought really could be. I have tried in other parts of this book to communicate some of the daughter images of this original image-particularly in the Prelude, Ant Fugue.
 
What stands out in my mind now, as I look back at this program f  
rom the perspective of a dozen years, is how there is no sense of imagery b  
ehind what is being said. The program had no idea what a serf is, what a person i  
s, or what anything at all is. The words were empty formal symbols, as empty
 
668
 
as-perhaps emptier than-the p and q of the pq-system. My program t  
ook advantage of the fact that when people read text, they quite naturally t  
end to imbue each word with its full flavor-as if that were necessarily attac  
hed to the group of letters which form the word. My program could be l  
ooked at as a formal system, whose "theorems"-the output s  
entences-had ready-made interpretations (at least to speakers of English). But unlike t  
he pq-system, these "theorems" were not all true statements when interpre  
ted that way. Many were false, many were nonsen
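The spirit of such a sentence-producing formal system can be sketched in a few lines of Python (a modern stand-in; the original program was written in Algol, and this toy grammar and lexicon are invented for illustration):

```python
import random

# A toy recursive phrase-structure generator, loosely in the spirit of a
# sentence-producing formal system: its "theorems" are the output sentences,
# which parse correctly but need not be true, or even sensible.
GRAMMAR = {
    "S":   [["NP", "VP", "."]],
    "NP":  [["the", "N"], ["the", "ADJ", "N"]],
    "VP":  [["V", "NP"]],
    "N":   [["serf"], ["city-state"], ["monarch"]],
    "ADJ": [["lazy"], ["hungry"]],
    "V":   [["admires"], ["taxes"]],
}

def generate(symbol="S", rng=random):
    """Expand a symbol recursively; terminals are returned as-is."""
    if symbol not in GRAMMAR:            # terminal word
        return [symbol]
    expansion = rng.choice(GRAMMAR[symbol])
    words = []
    for part in expansion:
        words.extend(generate(part, rng))
    return words

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        words = generate("S", rng)
        print(" ".join(words[:-1]) + words[-1])
```

Every sentence it emits is grammatical by construction, yet nothing in the machinery connects "serf" or "monarch" to anything in the world-which is exactly the emptiness described above.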
 
In its humble way, the pq-system mirrored a tiny corner of the world. But when my program ran, there was no mirror inside it of how the world works, except for the small semantic constraints which it had to follow. To create such a mirror of understanding, I would have had to wrap each concept in layers and layers of knowledge about the world. To do this would have been another kind of effort from what I had intended to do. Not that I didn't often think of trying to do it-but I never got around to trying it out.
 
Higher-Level Grammars

In fact, I often pondered whether I could write an ATN-grammar (or some other kind of sentence-producing program) which would only produce true sentences about the world. Such a grammar would imbue the words with genuine meanings, in the way it happened in the pq-system and in TNT. This idea of a language in which false statements are ungrammatical is an old one, going back to Johann Amos Comenius, in 1633. It is very appealing because you have a crystal ball embodied in your grammar: just write down the statement you want to know about, and check to see if it is grammatical ... Actually, Comenius went even further, for in his language, false statements were not only ungrammatical-they were inexpressible!
 
Carrying this thought in another direction, you might imagine a high-level grammar which would produce random koans. Why not? Such a grammar would be equivalent to a formal system whose theorems are koans. And if you had such a program, could you not arrange it to produce only genuine koans? My friend Marsha Meredith was enthusiastic about this idea of "Artificial Ism", so she tackled the project of writing a koan-writing program. One of her early efforts produced this curious quasi-koan:

A SMALL YOUNG MASTER WANTED A SMALL WHITE GNARLED BOWL. "HOW CAN WE LEARN AND UNDERSTAND WITHOUT STUDY?" THE YOUNG MASTER ASKED A LARGE CONFUSED MASTER. THE CONFUSED MASTER WALKED FROM A BROWN HARD MOUNTAIN TO A WHITE SOFT MOUNTAIN WITH A SMALL RED STONY BOWL. THE CONFUSED MASTER SAW A RED SOFT HUT. THE CONFUSED MASTER WANTED THE HUT. "WHY DID BODHIDHARMA COME INTO CHINA?" THE CONFUSED MASTER
 
FIGURE 116. A meaningful story in Arabic. [From A. Khatibi and M. Sijelmassi, The Splendour of Islamic Calligraphy (New York: Rizzoli, 1976).]
 
ASKED A LARGE ENLIGHTENED STUDENT. "THE PEACHES ARE LARGE", THE STUDENT ANSWERED THE CONFUSED MASTER. "HOW CAN WE LEARN AND UNDERSTAND WITHOUT STUDY?" THE CONFUSED MASTER ASKED A LARGE OLD MASTER. THE OLD MASTER WALKED FROM A WHITE STONY G0025. THE OLD MASTER GOT L
 
Your personal decision procedure for koan genuineness probably reached a verdict without need of the Geometric Code or the Art of Zen Strings. If the lack of pronouns or the unsophisticated syntax didn't arouse your suspicions, that strange "G0025" towards the end must have. What is it? It is a strange fluke-a manifestation of a bug which caused the program to print out, in place of the English word for an object, the program's internal name for the "node" (a LISP atom, in fact) where all information concerning that particular object was stored. So here we have a "window" onto a lower level of the underlying Zen mind-a level that should have remained invisible. Unfortunately, we don't have such clear windows onto the lower levels of human Zen minds.
 
The sequence of actions, though a little arbitrary, comes from a recursive LISP procedure called "CASCADE", which creates chains of actions linked in a vaguely causal way to each other. Although the degree of comprehension of the world possessed by this koan generator is clearly not stupendous, work is in progress to make its output a little more genuine-seeming.
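The LISP source of CASCADE is not shown here, so the following Python sketch is only a guess at its flavor: a recursive routine that chains loosely connected actions, with all the vocabulary invented for illustration.

```python
import random

# A guess at the flavor of a CASCADE-like routine (the real LISP code is not
# reproduced in the text): each step of the recursion emits one action and
# then cascades into the next, yielding a vaguely causal chain of events.
ACTORS = ["THE CONFUSED MASTER", "THE YOUNG MASTER", "A LARGE OLD MASTER"]
ACTIONS = ["WALKED TO A {} MOUNTAIN", "WANTED A {} BOWL", "SAW A {} HUT"]
ADJS = ["SMALL RED", "WHITE SOFT", "BROWN HARD"]

def cascade(depth, rng):
    """Recursively produce a chain of loosely linked actions."""
    if depth == 0:
        return []
    actor = rng.choice(ACTORS)
    sentence = actor + " " + rng.choice(ACTIONS).format(rng.choice(ADJS)) + "."
    return [sentence] + cascade(depth - 1, rng)

if __name__ == "__main__":
    print(" ".join(cascade(3, random.Random(1))))
```

The recursion bottoms out after a fixed depth; a fancier version would let one action constrain the choice of the next, which is presumably where the "vaguely causal" linking comes in.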
 
Grammars for Music?
 
Then there is music. This is a domain which you might suppose, on first thought, would lend itself admirably to being codified in an ATN-grammar, or some such program. Whereas (to continue this naïve line of thought) language relies on connections with the outside world for meaning, music is purely formal. There is no reference to things "out there" in the sounds of music; there is just pure syntax-note following note, chord following chord, measure following measure, phrase following phrase.
 
But wait. Something is wrong in this analysis. Why is some music so much deeper and more beautiful than other music? It is because form, in music, is expressive-expressive to some strange subconscious regions of our minds. The sounds of music do not refer to serfs or city-states, but they do trigger clouds of emotion in our innermost selves; in that sense musical meaning is dependent on intangible links from the symbols to things in the world-those "things", in this case, being secret software structures in our minds. No, great music will not come out of such an easy formalism as an ATN-grammar. Pseudomusic, like pseudo-fairy tales, may well come out-and that will be a valuable exploration for people to make-but the secrets of meaning in music lie far, far deeper than pure syntax.
 
I should clarify one point here: in principle, ATN-grammars have all the power of any programming formalism, so if musical meaning is capturable in any way at all (which I believe it is), it is capturable in an ATN-grammar. True. But in that case, I maintain, the grammar will be defining not just musical structures, but the entire structures of the mind of a beholder. The "grammar" will be a full grammar of thought-not just a grammar of music.
 
Winograd's Program SHRDLU

What kind of program would it take to make human beings admit that it had some "understanding", even if begrudgingly? What would it take before you wouldn't feel intuitively that there is "nothing there"?
 
In the years 1968-70, Terry Winograd (alias Dr. Tony Earrwig) was a doctoral student at MIT, working on the joint problems of language and understanding. At that time at MIT, much AI research involved the so-called blocks world-a relatively simple domain in which problems concerning both vision and language-handling by computer could fit easily. The blocks world consists of a table with various kinds of toy-like blocks on it-square ones, oblong ones, triangular ones, etc., in various colors. (For a "blocks world" of another kind, see Figure 117: the painting Mental Arithmetic by Magritte. I find its title singularly appropriate in this context.) The vision problems in the MIT blocks world are very tricky: how can a computer figure out, from a TV-scan of a scene with many blocks in it, just what kinds of blocks are present, and what their relationships are? Some blocks may be perched on top of others, some may be in front of others, there may be shadows, and so on.
 
FIGURE 117. Mental Arithmetic, by René Magritte (193
 
Winograd's work was separate from the issues of vision, however. Beginning with the assumption that the blocks world was well represented inside the computer's memory, he confronted the many-faceted problem of how to get the computer to:

(1) understand questions in English about the situation;
(2) give answers in English to questions about the situation;
(3) understand requests in English to manipulate the blocks;
(4) break down each request into a sequence of operations it could do;
(5) understand what it had done, and for what reasons;
(6) describe its actions and their reasons, in English.
 
It might seem reasonable to break up the overall program into modular subprograms, with one module for each different part of the problem; then, after the modules have been developed separately, to integrate them smoothly. Winograd found that this strategy of developing independent modules posed fundamental difficulties. He developed a radical approach, which challenged the theory that intelligence can be compartmentalized into independent or semi-independent pieces. His program SHRDLU-named after the old code "ETAOIN SHRDLU", used by linotype operators to mark typos in a newspaper column-did not separate the problem into clean conceptual parts. The operations of parsing sentences, producing internal representations, reasoning about the world represented inside itself, answering questions, and so on, were all deeply and intricately meshed together in a procedural representation of knowledge. Some critics have charged that his program is so tangled that it does not represent any "theory" at all about language, nor does it contribute in any way to our insights about thought processes. Nothing could be more wrong than such claims, in my opinion. A tour de force such as SHRDLU may not be isomorphic to what we do-in fact, in no way should you think that in SHRDLU, the "symbol level" has been attained-but the act of creating it and thinking about it offers tremendous insight into the way intelligence works.
 
The Structure of SHRDLU
 
In fact, SHRDLU does consist of separate procedures, each of which contains some knowledge about the world; but the procedures have such a strong interdependency that they cannot be cleanly teased apart. The program is like a very tangled knot which resists untangling; but the fact that you cannot untangle it does not mean that you cannot understand it. There may be an elegant geometrical description of the entire knot even if it is physically messy. We could go back to a metaphor from the Mu Offering, and compare it to looking at an orchard from a "natural" angle.
 
Winograd has written lucidly about SHRDLU. I quote here from his article in Schank and Colby's book:
 
One of the basic viewpoints underlying the model is that all language use can be thought of as a way of activating procedures within the hearer. We can think of any utterance as a program-one that indirectly causes a set of operations to be carried out within the hearer's cognitive system. This "program writing" is indirect in the sense that we are dealing with an intelligent interpreter, who may take a set of actions which are quite different from those the speaker intended. The exact form is determined by his knowledge of the world, his expectations about the person talking to him, etc. In this program we have a simple version of this process of interpretation as it takes place in the robot. Each sentence interpreted by the robot is converted to a set of instructions in PLANNER. The program that is created is then executed to achieve the desired effect.1
 
PLANNER Facilitates Problem Reduction
 
The language PLANNER, referred to here, is an AI language whose principal feature is that some of the operations necessary for problem reduction are built in-namely, the recursive process of creating a tree of subgoals, subsubgoals, etc. What this means is that such processes, instead of having to be spelled out time and time again by the programmer, are automatically implied by so-called GOAL-statements. Someone who reads a PLANNER program will see no explicit reference to such operations; in jargon, they are user-transparent. If one path in the tree fails to achieve the desired goal, then the PLANNER program will "backtrack" and try another route. "Backtracking" is the magic word as far as PLANNER is concerned.
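The idea can be sketched in Python rather than PLANNER (the facts, rules, and goal names here are invented, and PLANNER's real mechanism is far richer): a goal either matches a known fact or is reduced to subgoals by some rule, and when one rule's subgoals fail, the next alternative is tried automatically.

```python
# Illustrative sketch of problem reduction with backtracking, in the spirit
# of PLANNER's GOAL-statements (hypothetical domain, not PLANNER syntax).
FACTS = {"have-wood", "have-nails"}
RULES = {
    "have-table": [["have-legs", "have-top"]],
    "have-legs":  [["have-metal"], ["have-wood"]],   # two alternative rules
    "have-top":   [["have-wood", "have-nails"]],
}

def achieve(goal):
    """Achieve a goal directly from facts, or reduce it to subgoals."""
    if goal in FACTS:
        return True
    for subgoals in RULES.get(goal, []):     # choice point: try each rule
        if all(achieve(g) for g in subgoals):
            return True                      # this reduction worked
        # else: back up and try the next alternative rule
    return False
```

Achieving "have-table" silently fails on the "have-metal" branch and backtracks to the "have-wood" alternative; the caller never spells any of this out, which is what "user-transparent" means here.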
 
Winograd's program made excellent use of these features of PLANNER-more exactly, of MICROPLANNER, a partial implementation of the plans for PLANNER. In the past few years, however, people with the goal of developing AI have concluded that automatic backtracking, as in PLANNER, has definite disadvantages, and that it will probably not lead to their goal; therefore they have backed off from it, preferring to try other routes to AI.
 
Let us listen to further comments from Winograd on SHRDLU:

The definition of every word is a program which is called at an appropriate point in the analysis, and which can do arbitrary computations involving the sentence and the present physical situation.17

Among the examples which Winograd cites is the following:

The different possibilities for the meaning of "the" are procedures which check various facts about the context, then prescribe actions such as "Look for a unique object in the data base which fits this description", or "Assert that the object being described is unique as far as the speaker is concerned." The program incorporates a variety of heuristics for deciding what part of the context is relevant.18

It is amazing how deep this problem with the word "the" is. It is probably safe to say that writing a program which can fully handle the top five words of English-"the", "of", "and", "a", and "to"-would be equivalent to solving the entire problem of AI, and hence tantamount to knowing what intelligence and consciousness are. A small digression: the five most common nouns in English are-according to the Word Frequency Book compiled by John B. Carroll et al.-"time", "people", "way", "water", and "words" (in that order). The amazing thing about this is that most people have no idea that we think in such abstract terms. Ask your friends, and 10 to 1 they'll guess such words as "man", "house", "car", "dog", and "money". And-while we're on the subject of frequencies-the top twelve letters in English, in order, according to Mergenthaler, are: "ETAOIN SHRDLU".
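A tally of this kind is easy to reproduce (an illustration only; the ranking of course varies with the text sample):

```python
from collections import Counter

# Count letter frequencies in a sample text and emit the letters in
# descending order of frequency - the kind of tally behind "ETAOIN SHRDLU".
def letter_ranking(text):
    counts = Counter(c for c in text.lower() if c.isalpha())
    return "".join(letter for letter, _ in counts.most_common())

print(letter_ranking("the quick brown fox jumps over the lazy dog"))
```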
 
One amusing feature of SHRDLU which runs totally against the stereotype of computers as "number crunchers" is this fact, pointed out by Winograd: "Our system does not accept numbers in numeric form, and has only been taught to count to ten."19 With all its mathematical underpinning, SHRDLU is a mathematical ignoramus! Just like Aunt Hillary, SHRDLU doesn't know anything about the lower levels which make it up. Its knowledge is largely procedural (see particularly the remark by "Dr. Tony Earrwig" in section 11 of the previous Dialogue).
 
It is interesting to contrast the procedural embedding of knowledge in SHRDLU with the knowledge in my sentence-generation program. All of the syntactical knowledge in my program was procedurally embedded in Augmented Transition Networks, written in the language Algol; but the semantic knowledge-the information about semantic class membership-was static: it was contained in a short list of numbers after each word. There were a few words, such as the auxiliary verbs "to be", "to have", and others, which were represented totally in procedures in Algol, but they were the exceptions. By contrast, in SHRDLU, all words were represented as programs. Here is a case which demonstrates that, despite the theoretical equivalence of data and programs, in practice the choice of one over the other has major consequences.
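The contrast can be caricatured in Python (hypothetical words and semantic classes, not the data of either actual program): the same fact about a word stored once as static class numbers to be interpreted, and once as a procedure that is simply run.

```python
# Static, data-style semantics: a short list of class numbers per word,
# which some other routine must interpret (here, class 1 means "animate").
SEMANTIC_CLASSES = {"serf": [1, 4], "pyramid": [2]}

def is_animate_data(word):
    return 1 in SEMANTIC_CLASSES.get(word, [])

# Procedural embedding, SHRDLU-style: the "definition" of each word is a
# piece of code which can consult arbitrary context when it runs.
WORD_PROCEDURES = {
    "serf":    lambda context: context.get("domain") != "geometry",
    "pyramid": lambda context: False,      # never animate
}

def is_animate_proc(word, context):
    return WORD_PROCEDURES[word](context)  # run the word's definition
```

The data version is easy to inspect but inert; the procedural version can react to context ("serf" as a geometric term, say), at the price of being as tangled as the code that implements it.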
 
Syntax and Semantics
 
And now, a few more words from Winograd:

Our program does not operate by first parsing a sentence, then doing semantic analysis, and finally by using deduction to produce a response. These three activities go on concurrently throughout the understanding of a sentence. As soon as a piece of syntactic structure begins to take shape, a semantic program is called to see whether it might make sense, and the resultant answer can direct the parsing. In deciding whether it makes sense, the semantic routine may call deductive processes and ask questions about the real world. As an example, in sentence 34 of the Dialogue ("Put the blue pyramid on the block in the box"), the parser first comes up with "the blue pyramid on the block" as a candidate for a noun group. At this point, semantic analysis is done, and since "the" is definite, a check is made in the data base for the object being referred to. When no such object is found, the parsing is redirected to find the noun group "the blue pyramid". It will then go on to find "on the block in the box" as a single phrase indicating a location ... Thus there is a continuing interplay between the different sorts of analysis, with the results of one affecting the others.2
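A toy rendition of this interplay, in Python (a deliberately tiny parser and world model, not SHRDLU's code): candidate noun groups are proposed longest first, and a semantic check against the database redirects the parse, much as in sentence 34.

```python
# Toy world model: objects represented as (color, kind) pairs.
WORLD = {("blue", "pyramid"), ("red", "block")}

def find_referent(words):
    """Semantic routine: does a unique object fit this description?"""
    adjs, noun = words[:-1], words[-1]
    hits = [o for o in WORLD
            if o[1] == noun.rstrip("s")          # naive plural-stripping
            and all(a in o for a in adjs)]
    return hits[0] if len(hits) == 1 else None

def parse_noun_group(tokens):
    """Propose the longest candidate first; back off when semantics says no."""
    for end in range(len(tokens), 0, -1):
        candidate = tokens[:end]
        if find_referent(candidate):
            return candidate                     # semantics accepted this parse
    return None
```

Given the tokens of "blue pyramid on the block", the long candidate finds no referent in the database, so the parse is redirected to the shorter noun group "blue pyramid"-syntax and semantics steering each other.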
 
It is extremely interesting that in natural language, syntax and semantics are so deeply intertwined. Last Chapter, in discussing the elusive concept of "form", we had broken the notion into two categories: syntactic form, which is detectable by a predictably terminating decision procedure, and semantic form, which is not. But here, Winograd is telling us that-at least when the usual senses of "syntax" and "semantics" are taken-they merge right into each other, in natural language. The external form of a sentence-that is, its composition in terms of elementary signs-does not divide up so neatly into syntactic and semantic aspects. This is a very significant point for linguistics.
 
Here are some final comments on SHRDLU by Winograd:

Let us look at what the system would do with a simple description like "a red cube which supports a pyramid". The description will use concepts like BLOCK, RED, PYRAMID, and EQUIDIMENSIONAL-all parts of the system ...
 
FIGURE 118. Procedural representation of "a red cube which supports a pyramid." [Adapted from Roger Schank and Kenneth Colby, Computer Models of Thought and Language (San Francisco: W. H. Freeman, 1973), p. 172.]
 
in a flow chart like that in Figure 118. Note that this is a program for finding an object fitting the description. It would then be incorporated into a command for doing something with the object, a question asking something about it, or, if it appeared in a statement, it would become part of the program which was generated to represent the meaning for later use. Note that this bit of program could also be used as a test to see whether an object fit the description, if the first FIND instruction were told in advance to look only at that particular object.

At first glance, it seems that there is too much structure in this program, as we don't like to think of the meaning of a simple phrase as explicitly containing loops, conditional tests, and other programming details. The solution is to provide an internal language that contains the appropriate looping and checking as its primitives, and in which the representation of the process is as simple as the description. The program described in Figure 118 would be written in PLANNER looking something like what is below:

(GOAL (IS ?X1 BLOCK))
(GOAL (COLOR-OF ?X1 RED))
(GOAL (EQUIDIMENSIONAL ?X1))
(GOAL (IS ?X2 PYRAMID))
(GOAL (SUPPORT ?X1 ?X2))

The loops of the flowchart are implicit in PLANNER's backtrack control structure. The description is evaluated by proceeding down the list until some goal fails, at which time the system backs up automatically to the last point where a decision was made, trying a different possibility. A decision can be made whenever a new object name or VARIABLE (indicated by the prefix "?") such as "?X1" or "?X2" appears. The variables are used by the pattern matcher. If they have already been assigned to a particular item, it checks to see whether the GOAL is true for that item. If not, it checks for all possible items which satisfy the GOAL, by choosing one, and then taking successive ones whenever backtracking occurs to that point. Thus, even the distinction between testing and choosing is implicit.2
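How such a list of GOALs might be evaluated can be sketched in Python (a toy database and matcher that merely imitate PLANNER's machinery): each variable is a choice point, and a failed goal sends control back to the most recent choice.

```python
# Toy database of facts about a tiny blocks world (invented for illustration).
DB = {
    ("IS", "b1", "BLOCK"), ("COLOR-OF", "b1", "RED"),
    ("EQUIDIMENSIONAL", "b1"),
    ("IS", "b2", "BLOCK"), ("COLOR-OF", "b2", "GREEN"),
    ("IS", "p1", "PYRAMID"), ("SUPPORT", "b1", "p1"),
}
GOALS = [("IS", "?X1", "BLOCK"), ("COLOR-OF", "?X1", "RED"),
         ("EQUIDIMENSIONAL", "?X1"),
         ("IS", "?X2", "PYRAMID"), ("SUPPORT", "?X1", "?X2")]

def match(goal, fact, env):
    """Match one goal pattern against one fact, extending the bindings."""
    if len(goal) != len(fact):
        return None
    env = dict(env)
    for g, f in zip(goal, fact):
        if g.startswith("?"):                    # a variable, PLANNER-style
            if env.setdefault(g, f) != f:        # already bound differently
                return None
        elif g != f:
            return None
    return env

def satisfy(goals, env=None):
    """Yield every binding environment satisfying all goals, by backtracking."""
    env = env or {}
    if not goals:
        yield env
        return
    for fact in DB:                              # choice point for goals[0]
        new = match(goals[0], fact, env)
        if new is not None:
            yield from satisfy(goals[1:], new)
```

Running `satisfy(GOALS)` binds ?X1 to the green block at some point, fails the COLOR-OF goal, backs up, and eventually settles on the one red equidimensional block that supports a pyramid-testing and choosing blur together, just as the quotation says.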
 
One significant strategy decision in devising this program was to not translate all the way from English into LISP, but only partway-into PLANNER. Thus (since the PLANNER interpreter is itself written in LISP), a new intermediate level-PLANNER-was inserted between the top-level language (English) and the bottom-level language (machine language). Once a PLANNER program had been made from an English sentence fragment, then it could be sent off to the PLANNER interpreter, and the higher levels of SHRDLU would be freed up, to work on new tasks.
 
This kind of decision constantly crops up: How many levels should a system have? How much and what kind of "intelligence" should be placed on which level? These are some of the hardest problems facing AI today. Since we know so little about natural intelligence, it is hard for us to figure out which level of an artificially intelligent system should carry out what part of a task.
 
This gives you a glimpse behind the scenes of the Dialogue preceding this Chapter. Next Chapter, we shall meet new and speculative ideas for AI.
 
CHAPTER XIX

Artificial Intelligence: Prospects

"Almost" Situations and Subjunctives
 
AFTER READING Contrafactus, a friend said to me, "My uncle was almost President of the U.S.!" "Really?" I said. "Sure," he replied, "he was skipper of the PT 108." (John F. Kennedy was skipper of the PT 109.)
 
That is what Contrafactus is all about. In everyday thought, we are constantly manufacturing mental variants on situations we face, ideas we have, or events that happen, and we let some features stay exactly the same while others "slip". What features do we let slip? What ones do we not even consider letting slip? What events are perceived on some deep intuitive level as being close relatives of ones which really happened? What do we think "almost" happened or "could have" happened, even though it unambiguously did not? What alternative versions of events pop without any conscious thought into our minds when we hear a story? Why do some counterfactuals strike us as "less counterfactual" than other counterfactuals? After all, it is obvious that anything that didn't happen didn't happen. There aren't degrees of "didn't-happen-ness". And the same goes for "almost" situations. There are times when one plaintively says, "It almost happened", and other times when one says the same thing, full of relief. But the "almost" lies in the mind, not in the external facts.
 
Driving down a country road, you run into a swarm of bees. You don't just duly take note of it; the whole situation is immediately placed in perspective by a swarm of "replays" that crowd into your mind. Typically, you think, "Sure am lucky my window wasn't open!"-or worse, the reverse: "Too bad my window wasn't closed!" "Lucky I wasn't on my bike!" "Too bad I didn't come along five seconds earlier." Strange but possible replays: "If that had been a deer, I could have been killed!" "I bet those bees would have rather had a collision with a rosebush." Even stranger replays: "Too bad those bees weren't dollar bills!" "Lucky those bees weren't made of cement!" "Too bad it wasn't just one bee instead of a swarm." "Lucky I wasn't the swarm instead of being me." What slips naturally and what doesn't-and why?
 
In a recent issue of The New Yorker magazine, the following excerpt from the "Philadelphia Welcomat" was reprinted:1

If Leonardo da Vinci had been born a female the ceiling of the Sistine Chapel might never have been painted.
 
The New Yorker commented:

And if Michelangelo had been Siamese twins, the work would have been completed in half the time.

The point of The New Yorker's comment is not that such counterfactuals are false; it is more that anyone who would entertain such an idea-anyone who would "slip" the sex or number of a given human being-would have to be a little loony. Ironically, though, in the same issue, the following sentence, concluding a book review, was printed without blushing:

I think he [Professor Philipp Frank] would have enjoyed both of these books enormously.2

Now poor Professor Frank is dead; and clearly it is nonsense to suggest that someone could read books written after his death. So why wasn't this serious sentence also scoffed at? Somehow, in some difficult-to-pin-down sense, the parameters slipped in this sentence do not violate our sense of "possibility" as much as in the earlier examples. Something allows us to imagine "all other things being equal" better in this one than in the others. But why? What is it about the way we classify events and people that makes us know deep down what is "sensible" to slip, and what is "silly"?
 
Consider how natural it feels to slip from the valueless declarative "I don't know Russian" to the more charged conditional "I would like to know Russian" to the emotional subjunctive "I wish I knew Russian" and finally to the rich counterfactual "If I knew Russian, I would read Chekhov and Lermontov in the original". How flat and dead would be a mind that saw nothing in a negation but an opaque barrier! A live mind can see a window onto a world of possibilities.
 
I believe that "almost" situations and unconsciously manufactured subjunctives represent some of the richest potential sources of insight into how human beings organize and categorize their perceptions of the world. An eloquent co-proponent of this view is the linguist and translator George Steiner, who, in his book After Babel, has written:

Hypotheticals, 'imaginaries', conditionals, the syntax of counter-factuality and contingency may well be the generative centres of human speech ... [They] do more than occasion philosophical and grammatical perplexity. No less than future tenses to which they are, one feels, related, and with which they ought probably to be classed in the larger set of 'suppositionals' or 'alternates', these 'if' propositions are fundamental to the dynamics of human feeling ... Ours is the ability, the need, to gainsay or 'un-say' the world, to image and speak it otherwise ... We need a word which will designate the power, the compulsion of language to posit 'otherness' ... Perhaps 'alternity' will do: to define the 'other than the case', the counter-factual propositions, images, shapes of will and evasion with which we charge our mental being and by means of which we build the changing, largely fictive milieu of our somatic and our social existence ...

Finally, Steiner sings a counterfactual hymn to counterfactuality:
 
It is unlikely that man, as we know him, would have survived without the fictive, counter-factual, anti-determinist means of language, without the semantic capacity, generated and stored in the 'superfluous' zones of the cortex, to conceive of, to articulate possibilities beyond the treadmill of organic decay and death.3
 
The manufacture of "subjunctive worlds" happens so casually, so naturally, that we hardly notice what we are doing. We select from our fantasy a world which is close, in some internal mental sense, to the real world. We compare what is real with what we perceive as almost real. In so doing, what we gain is some intangible kind of perspective on reality. The Sloth is a droll example of a variation on reality-a thinking being without the ability to slip into subjunctives (or at least, who claims to be without the ability-but you may have noticed that what he says is full of counterfactuals!). Think how immeasurably poorer our mental lives would be if we didn't have this creative capacity for slipping out of the midst of reality into soft "what if"'s! And from the point of view of studying human thought processes, this slippage is very interesting, for most of the time it happens completely without conscious direction, which means that observation of what kinds of things slip, versus what kinds don't, affords a good window on the unconscious mind.
 
One way to gain some perspective on the nature of this mental metric is to "fight fire with fire". This is done in the Dialogue, where our "subjunctive ability" is asked to imagine a world in which the very notion of subjunctive ability is slipped, compared to what we expect. In the Dialogue, the first subjunctive instant replay-that where Palindromi stays in bounds-is quite a normal thing to imagine. In fact, it was inspired by a completely ordinary, casual remark made to me by a person sitting next to me at a football game. For some reason it struck me and I wondered what made it seem so natural to slip that particular thing, but not, say, the number of the down, or the present score. From those thoughts, I went on to consider other, probably less slippable features, such as the weather (that's in the Dialogue), the kind of game (also in the Dialogue), and then even loonier variations (also in the Dialogue). I noticed, though, that what was completely ludicrous to slip in one situation could be quite slippable in another. For instance, sometimes you might spontaneously wonder how things would be if the ball had a different shape (e.g., if you are playing basketball with a half-inflated ball); other times that would never enter your mind (e.g., when watching a football game on TV).
 
 
Layers of Stability
 
It seemed to me then, and still does now, that the slippability of a feature of some event (or circumstance) depends on a set of nested contexts in which the event (or circumstance) is perceived to occur. The terms constant, parameter, and variable, borrowed from mathematics, seem useful here. Often mathematicians, physicists, and others will carry out a calculation, saying "c is a constant, p is a parameter, and v is a variable". What they mean is that any of them can vary (including the "constant"); however, there is a kind of hierarchy of variability. In the situation which is being represented by the symbols, c establishes a global condition; p establishes some less global condition which can vary while c is held fixed; and finally, v can run around while c and p are held fixed. It makes little sense to think of holding v fixed while c and p vary, for c and p establish the context in which v has meaning. For instance, think of a dentist who has a list of patients, and for each patient, a list of teeth. It makes perfect sense (and plenty of money) to hold the patient fixed and vary his teeth-but it makes no sense at all to hold one tooth fixed and vary the patient. (Although sometimes it makes good sense to vary the dentist ...)
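The dentist example renders naturally as nested iteration (the data here is hypothetical): the patient plays the role of parameter, held fixed while the tooth, the variable, runs inside it.

```python
# The hierarchy of variability as nested loops (invented data): the outer
# loop is the "parameter", the inner loop the "variable". Inverting the
# nesting - holding one tooth fixed while patients vary - makes no sense,
# because the patient establishes the context in which a tooth has meaning.
PATIENTS = {"Ms. A": ["incisor", "molar"], "Mr. B": ["canine"]}

def examinations():
    visits = []
    for patient, teeth in PATIENTS.items():   # parameter: held fixed...
        for tooth in teeth:                   # ...while the variable runs
            visits.append((patient, tooth))
    return visits
```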
 
We build up our mental representation of a situation layer by layer. The lowest layer establishes the deepest aspect of the context-sometimes being so low that it cannot vary at all. For instance, the three-dimensionality of our world is so ingrained that most of us never would imagine letting it slip mentally. It is a constant constant. Then there are layers which establish temporarily, though not permanently, fixed aspects of situations, which could be called background assumptions-things which, in the back of your mind, you know can vary, but which most of the time you unquestioningly accept as unchanging aspects. These could still be called "constants". For instance, when you go to a football game, the rules of the game are constants of that sort. Then there are "parameters": you think of them as more variable, but you temporarily hold them constant. At a football game, parameters might include the weather, the opposing team, and so forth. There could be-and probably are-several layers of parameters. Finally, we reach the "shakiest" aspects of your mental representation of the situation-the variables. These are things such as Palindromi's stepping out of bounds, which are mentally "loose" and which you don't mind letting slip away from their real values, for a short moment.
 
687
 
Frames and Nested Cont  
ext
 
687
 
The word frame is in vogue in AI currently, and it could be defined as a computational instantiation of a context. The term is due to Marvin Minsky, as are many ideas about frames, though the general concept has been floating around for a good number of years. In frame language, one could say that mental representations of situations involve frames nested within each other. Each of the various ingredients of a situation has its own frame. It is interesting to verbalize explicitly one of my mental images concerning nested frames. Imagine a large collection of chests of drawers. When you choose a chest, you have a frame, and the drawer holes are places where "subframes" can be attached. But subframes are themselves chests of drawers. How can you stick a whole chest of drawers into the slot for a single drawer in another chest of drawers? Easy: you shrink and distort the second chest, since, after all, this is all mental, not physical. Now in the outer frame, there may be several different drawer slots that need to be filled; then you may need to fill slots in some of the inner chests of drawers (or subframes). This can go on, recursively.
 
The vivid surrealistic image of squishing and bending a chest of drawers so that it can fit into a slot of arbitrary shape is probably quite important, because it hints that your concepts are squished and bent by the contexts you force them into. Thus, what does your concept of "person" become when the people you are thinking about are football players? It certainly is a distorted concept, one which is forced on you by the overall context. You have stuck the "person" frame into a slot in the "football game" frame. The theory of representing knowledge in frames relies on the idea that the world consists of quasi-closed subsystems, each of which can serve as a context for others without being too disrupted, or creating too much disruption, in the process.
 
One of the main ideas about frames is that each frame comes with its own set of expectations. The corresponding image is that each chest of drawers comes with a built-in, but loosely bound, drawer in each of its drawer slots, called a default. If I tell you, "Picture a river bank", you will invoke a visual image which has various features, most of which you could override if I added extra phrases such as "in a drought" or "in Brazil" or "without a merry-go-round". The existence of default values for slots allows the recursive process of filling slots to come to an end. In effect, you say, "I will fill in the slots myself as far as three layers down; beyond that I will take the default options." Together with its default expectations, a frame contains knowledge of its limits of applicability, and heuristics for switching to other frames in case it has been stretched beyond its limits of tolerance.
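The idea of default slots that can be overridden--the "loosely bound drawers"--is easy to sketch in code. What follows is a minimal, hypothetical illustration, not a description of any actual frame system; the class and slot names are invented.

```python
# A minimal sketch of a frame with default slots: lookups fall back
# to the built-in default unless the slot has been explicitly filled.
class Frame:
    def __init__(self, name, defaults):
        self.name = name
        self.defaults = dict(defaults)   # built-in drawer contents
        self.filled = {}                 # explicit overrides

    def fill(self, slot, value):
        # Override a default; the value could itself be a subframe.
        self.filled[slot] = value

    def get(self, slot):
        # The default option is taken whenever no override exists.
        return self.filled.get(slot, self.defaults.get(slot))

river_bank = Frame("river bank",
                   {"water": "flowing", "merry-go-round": "absent"})
river_bank.fill("water", "dried up")   # "in a drought" overrides the default
```

Because every slot has a default, the recursive process of filling slots terminates: unvisited slots simply keep their built-in drawers.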
 
The nested structure of a frame gives you a way of "zooming in" and looking at small details from as close up as you wish: you just zoom in on the proper subframe, and then on one of its subframes, etc., until you have the desired amount of detail. It is like having a road atlas of the USA which has a map of the whole country in the front, with individual state maps inside, and even maps of cities and some of the larger towns if you want still more detail. One can imagine an atlas with arbitrary amounts of detail, going down to single blocks, houses, rooms, etc. It is like looking through a telescope with lenses of different power; each lens has its own uses. It is important that one can make use of all the different scales; often detail is irrelevant and even distracting.
 
Because arbitrarily different frames can be stuck inside other frames' slots, there is great potential for conflict or "collision". The nice neat scheme of a single, global set of layers of "constants", "parameters", and "variables" is an oversimplification. In fact, each frame will have its own hierarchy of variability, and this is what makes analyzing how we perceive such a complex event as a football game, with its many subframes, subsubframes, etc., an incredibly messy operation. How do all these many frames interact with each other? If there is a conflict where one frame says, "This item is a constant" but another frame says, "No, it is a variable!", how does it get resolved? These are deep and difficult problems of frame theory to which I can give no answers. There has as yet been no complete agreement on what a frame really is, or on how to implement frames in AI programs. I make my own stab at discussing some of these questions in the following section, where I talk about some puzzles in visual pattern recognition, which I call "Bongard problems".
 
Bongard Problems
 
Bongard problems (BP's) are problems of the general type given by the Russian scientist M. Bongard in his book Pattern Recognition. A typical BP--number 51 in his collection of one hundred--is shown in Figure 119.

FIGURE 119. Bongard problem 51. [From M. Bongard, Pattern Recognition (Rochelle Park, N.J.: Hayden Book Co., Spartan Books, 1970).]
 
These fascinating problems are intended for pattern-recognizers, whether human or machine. (One might also throw in ETI's--extraterrestrial intelligences.) Each problem consists of twelve boxed figures (henceforth called boxes): six on the left, forming Class I, and six on the right, forming Class II. The boxes may be indexed this way:

I-A   I-B
I-C   I-D
I-E   I-F
II-A  II-B
II-C  II-D
II-E  II-F

The problem is "How do Class I boxes differ from Class II boxes?"
 
A Bongard problem-solving program would have several stages, in which raw data gradually get converted into descriptions. The early stages are relatively inflexible, and higher stages become gradually more flexible. The final stages have a property which I call tentativity, which means simply that the way a picture is represented is always tentative. Upon the drop of a hat, a high-level description can be restructured, using all the devices of the later stages. The ideas presented below also have a tentative quality to them. I will try to convey overall ideas first, glossing over significant difficulties. Then I will go back and try to explain subtleties and tricks and so forth. So your notion of how it all works may also undergo some revisions as you read. But that is in the spirit of the discussion.
 
Preprocessing Selects a Mini-vocabulary
 
Suppose, then, that we have some Bongard problem which we want to solve. The problem is presented to a TV camera and the raw data are read in. Then the raw data are preprocessed. This means that some salient features are detected. The names of these features constitute a "mini-vocabulary" for the problem; they are drawn from a general "salient-feature vocabulary". Some typical terms of the salient-feature vocabulary are:

line segment, curve, horizontal, vertical, black, white, big, small, pointy, round ...

In a second stage of preprocessing, some knowledge about elementary shapes is used; and if any are found, their names are also made available. Thus, terms such as

triangle, circle, square, indentation, protrusion, right angle, vertex, cusp, arrow ...

may be selected. This is roughly the point at which the conscious and the unconscious meet, in humans. This discussion is primarily concerned with describing what happens from here on out.
 
High-Level Descriptions
 
Now that the picture is "understood", to some extent, in terms of familiar concepts, some looking around is done. Tentative descriptions are made for one or a few of the twelve boxes. They will typically use simple descriptors such as

above, below, to the right of, to the left of, inside, outside of, close to, far from, parallel to, perpendicular to, in a row, scattered, evenly spaced, irregularly spaced, etc.

Also, definite and indefinite numerical descriptors can be used:

1, 2, 3, 4, 5, ... many, few, etc.

More complicated descriptors may be built up, such as

further to the right of, less close to, almost parallel to, etc.

FIGURE 120. Bongard problem 47. [From M. Bongard, Pattern Recognition.]
 
Thus, a typical box--say I-F of BP 47 (Fig. 120)--could be variously described as having:

three shapes
or three white shapes
or a circle on the right
or two triangles and a circle
or two upwards-pointing triangles
or one large shape and two small shapes
or one curved shape and two straight-edged shapes
or a circle with the same kind of shape on the inside and outside.

Each of these descriptions sees the box through a "filter". Out of context, any of them might be a useful description. As it turns out, though, all of them are "wrong", in the context of the particular Bongard problem they are part of. In other words, if you knew the distinction between Classes I and II in BP 47, and were given one of the preceding lines as a description of an unseen drawing, that information would not allow you to tell to which Class the drawing belonged. The essential feature of this box, in context, is that it includes

a circle containing a triangle.

Note that someone who heard such a description would not be able to reconstruct the original drawing, but would be able to recognize drawings
 
FIGURE 121. Bongard problem 91. [From M. Bongard, Pattern Recognition.]

which have this property. It is a little like musical style: you may be an infallible recognizer of Mozart, but at the same time unable to write anything which would fool anybody into thinking it was by Mozart.
 
Now consider box I-D of BP 91 (Fig. 121). An overloaded but "right" description in the context of BP 91 is

a circle with three rectangular intrusions.

Notice the sophistication of such a description, in which the word "with" functions as a disclaimer, implying that the "circle" is not really a circle: it is almost a circle, except that ... Furthermore, the intrusions are not full rectangles. There is a lot of "play" in the way we use language to describe things. Clearly, a lot of information has been thrown away, and even more could be thrown away. A priori, it is very hard to know what it would be smart to throw away and what to keep. So some sort of method for an intelligent compromise has to be encoded, via heuristics. Of course, there is always recourse to lower levels of description (i.e., less chunked descriptions) if discarded information has to be retrieved, just as people can constantly look at the puzzle for help in restructuring their ideas about it. The trick, then, is to devise explicit rules that say how to:

make tentative descriptions for each box;
compare them with tentative descriptions for other boxes of either Class;
restructure the descriptions, by
    (i) adding information,
    (ii) discarding information, or
    (iii) viewing the same information from another angle;
iterate this process until finding out what makes the two Classes differ.
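The describe-compare-restructure loop can be given a toy skeleton in code. The sketch below is purely illustrative and hypothetical--the real heuristics would be far subtler--and here "restructuring" is reduced to a single move: discarding descriptors that both Classes share.

```python
# Toy skeleton of the loop: describe each box, compare across the
# two Classes, discard non-distinguishing information, and look for
# a feature present in every Class I box and no Class II box.
def distinguish(class1, class2, describe):
    desc1 = [set(describe(box)) for box in class1]   # tentative descriptions
    desc2 = [set(describe(box)) for box in class2]
    common = set.union(*desc1) & set.union(*desc2)   # compare across Classes
    desc1 = [d - common for d in desc1]              # restructure by
    desc2 = [d - common for d in desc2]              # discarding information
    candidates = set.intersection(*desc1)            # shared by all of Class I
    return candidates if candidates else None

# Toy data: every Class I box is "pointy", every Class II box "smooth".
c1 = [["pointy", "white"], ["pointy", "big"]]
c2 = [["smooth", "white"], ["smooth", "big"]]
```

In a fuller version, a failure (returning None) would trigger the other restructuring moves: adding information, or re-viewing the same information from another angle.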
 
Templates and Sameness-Detectors
 
One good strategy would be to try to make descriptions structurally similar to each other, to the extent this is possible. Any structure they have in common will make comparing them that much easier. Two important elements of this theory deal with this strategy. One is the idea of "description-schemas", or templates; the other is the idea of Sam--a "sameness detector".
 
First Sam. Sam is a special agent present on all levels of the program. (Actually there may be different kinds of Sams on different levels.) Sam constantly runs around within individual descriptions and within different descriptions, looking for descriptors or other things which are repeated. When some sameness is found, various restructuring operations can be triggered, either on the single-description level or on the level of several descriptions at once.
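One simple thing a Sam-like agent might do is scan a set of descriptions for a descriptor that recurs in every one of them. This is only a hypothetical sketch of that single behavior, with descriptions represented as slot-to-value mappings:

```python
# A sketch of one of Sam's jobs: find descriptor values that recur
# across all descriptions (restructuring would then be triggered).
def sam(descriptions):
    """Return the descriptor values common to every description."""
    shared = None
    for desc in descriptions:
        values = set(desc.values())
        shared = values if shared is None else shared & values
    return shared or set()

boxes = [
    {"large closed curve": "circle",
     "o's in interior": "three", "o's in exterior": "three"},
    {"large closed curve": "cigar",
     "o's in interior": "three", "o's in exterior": "three"},
]
```

Running `sam(boxes)` here spots the recurrence of "three"--exactly the kind of sameness that, in the text's account, triggers a template-restructuring operation.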
 
Now templates. The first thing that happens after preprocessing is an attempt to manufacture a template, or description-schema--a uniform format for the descriptions of all the boxes in a problem. The idea is that a description can often be broken up in a natural way into subdescriptions, and those in turn into subsubdescriptions, if need be. The bottom is hit when you come to primitive concepts which belong to the level of the preprocessor. Now it is important to choose the way of breaking descriptions into parts so as to reflect commonality among all the boxes; otherwise you are introducing a superfluous and meaningless kind of "pseudo-order" into the world.
 
On the basis of what information is a template built? It is best to look at an example. Take BP 49 (Fig. 122). Preprocessing yields the information that each box consists of several little o's, and one large closed curve. This is a valuable observation, and deserves to be incorporated in the template. Thus a first stab at a template would be:

large closed curve: --
small o's: --

FIGURE 122. Bongard problem 49. [From M. Bongard, Pattern Recognition.]
 
It is very simple: the description-template has two explicit slots where subdescriptions are to be attached.
 
A Heterarchical Program
 
Now an interesting thing happens, triggered by the term "closed curve". One of the most important modules in the program is a kind of semantic net--the concept network--in which all the known nouns, adjectives, etc., are linked in ways which indicate their interrelations. For instance, "closed curve" is strongly linked with the terms "interior" and "exterior". The concept net is just brimming with information about relations between terms, such as what is the opposite of what, what is similar to what, what often occurs with what, and so on. A little portion of a concept network, to be explained shortly, is shown in Figure 123. But let us follow what happens now, in the solution of problem 49. The concepts "interior" and "exterior" are activated by their proximity in the net to "closed curve". This suggests to the template-builder that it might be a good idea to make distinct slots for the interior and exterior of the curve. Thus, in the spirit of tentativity, the template is tentatively restructured to be this:

large closed curve: --
little o's in interior: --
little o's in exterior: --

Now when subdescriptions are sought, the terms "interior" and "exterior" will cause procedures to inspect those specific regions of the box. What is found in BP 49, box I-A is this:

large closed curve: circle
little o's in interior: three
little o's in exterior: three

And a description of box II-A of the same BP might be:

large closed curve: cigar
little o's in interior: three
little o's in exterior: three

Now Sam, constantly active in parallel with other operations, spots the recurrence of the concept "three" in all the slots dealing with o's, and this is strong reason to undertake a second template-restructuring operation. Notice that the first was suggested by the concept net, the second by Sam. Now our template for problem 49 becomes:

large closed curve: --
three little o's in interior: --
three little o's in exterior: --
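The second restructuring--promoting a value that recurs in every box into the template itself--is mechanical enough to sketch. The following is a hypothetical illustration of just that one operation, using the slot names from the example above:

```python
# Promote a slot filler into the template when every box agrees on it,
# e.g. "little o's in interior" -> "three little o's in interior".
def promote_recurrences(template_slots, box_descriptions):
    new_slots = []
    for slot in template_slots:
        values = {desc[slot] for desc in box_descriptions}
        if len(values) == 1:                  # same filler in every box
            new_slots.append(f"{values.pop()} {slot}")
        else:                                 # still varies; leave the slot open
            new_slots.append(slot)
    return new_slots

slots = ["large closed curve",
         "little o's in interior", "little o's in exterior"]
descs = [
    {"large closed curve": "circle",
     "little o's in interior": "three", "little o's in exterior": "three"},
    {"large closed curve": "cigar",
     "little o's in interior": "three", "little o's in exterior": "three"},
]
```

The "large closed curve" slot stays open because circle and cigar disagree, while both o-slots absorb "three"--mirroring the template shown in the text.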
 
Now that "three" has risen one level of generality--namely, into the template--it becomes worthwhile to explore its neighbors in the concept network. One of them is "triangle", which suggests that triangles of o's may be important. As it happens, this leads down a blind alley--but how could you know in advance? It is a typical blind alley that a human would explore, so it is good if our program finds it too! For box II-E, a description such as the following might get generated:

large closed curve: circle
three little o's in interior: equilateral triangle
three little o's in exterior: equilateral triangle

Of course an enormous amount of information has been thrown away concerning the sizes, positions, and orientations of these triangles, and many other things as well. But that is the whole point of making descriptions instead of just using the raw data! It is the same idea as funneling, which we discussed in Chapter XI.
 
The Concept Network
 
We need not run through the entire solution of problem 49; this suffices to show the constant back-and-forth interaction of individual descriptions, templates, the sameness-detector Sam, and the concept network. We should now look a little more in detail at the concept network and its function. A simplified portion shown in the figure codes the following ideas:

"High" and "low" are opposites.
"Up" and "down" are opposites.
"High" and "up" are similar.
"Low" and "down" are similar.
"Right" and "left" are opposites.
The "right-left" distinction is similar to the "high-low" distinction.
"Opposite" and "similar" are opposites.

Note how everything in the net--both nodes and links--can be talked about. In that sense nothing in the net is on a higher level than anything else. Another portion of the net is shown; it codes for the ideas that:

A square is a polygon.
A triangle is a polygon.
A polygon is a closed curve.

FIGURE 123. A small portion of a concept network for a program to solve Bongard problems. "Nodes" are joined by "links", which in turn can be linked. By considering a link as a verb and the nodes it joins as subject and object, you can pull out some English sentences from this diagram.
 
The difference between a triangle and a square is that one has 3 sides and the other has 4.
4 is similar to 3.
A circle is a closed curve.
A closed curve has an interior and an exterior.
"Interior" and "exterior" are opposites.

The network of concepts is necessarily very vast. It seems to store knowledge only statically, or declaratively, but that is only half the story. Actually, its knowledge borders on being procedural as well, by the fact that the proximities in the net act as guides, or "programs", telling the main program how to develop its understanding of the drawings in the boxes.
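A concept network of this sort--where links can themselves be linked--can be sketched as a set of triples in which a triple may itself appear as a term of another triple. This is only an illustrative, hypothetical encoding of some of the relations just listed:

```python
# Relations stored as (term, link, term) triples; a triple can itself
# be a term, so links can be talked about just like nodes.
triples = [
    ("high", "opposite", "low"),
    ("up", "opposite", "down"),
    ("high", "similar", "up"),
    ("low", "similar", "down"),
    ("right", "opposite", "left"),
    (("right", "left"), "similar", ("high", "low")),  # a link about links
    ("opposite", "opposite", "similar"),   # the net describes itself
    ("closed curve", "has", "interior"),
    ("closed curve", "has", "exterior"),
]

def neighbors(term):
    """Concepts directly linked to a term -- its 'proximities' in the net."""
    out = set()
    for a, _, b in triples:
        if a == term:
            out.add(b)
        elif b == term:
            out.add(a)
    return out
```

A proximity query like `neighbors("closed curve")` is exactly the kind of lookup that, in the account above, activated "interior" and "exterior" and so guided the template-builder: static triples acting as a procedural guide.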
 
For instance, some early hunch may turn out to be wrong and yet have the germ of the right answer in it. In BP 33 (Fig. 124), one might at first

FIGURE 124. Bongard problem 33. [From M. Bongard, Pattern Recognition.]
 
jump to the idea that Class I boxes contain "pointy" shapes, Class II boxes contain "smooth" ones. But on closer inspection, this is wrong. Nevertheless, there is a worthwhile insight here, and one can try to push it further, by sliding around in the network of concepts beginning at "pointy". It is close to the concept "acute", which is precisely the distinguishing feature of Class I. Thus one of the main functions of the concept network is to allow early wrong ideas to be modified slightly to slip into variations which may be correct.
 
Slippage and Tentativity
 
Related to this notion of slipping between closely related terms is the notion of seeing a given object as a variation on another object. An excellent example has been mentioned already--that of the "circle with three indentations", where in fact there is no circle at all. One has to be able to bend concepts, when it is appropriate. Nothing should be absolutely rigid. On the other hand, things shouldn't be so wishy-washy that nothing has any meaning at all, either. The trick is to know when and how to slip one concept into another.
 
An extremely interesting set of examples where slipping from one description to another is the crux of the matter is given in Bongard problems 85-87 (Fig. 125). BP 85 is rather trivial. Let us assume that our program identifies "line segment" in its preprocessing stage. It is relatively simple for it then to count line segments and arrive at the difference

FIGURE 125. Bongard problems 85-87. [From M. Bongard, Pattern Recognition.]
 
between Class I and Class II in BP 85. Now it goes on to BP 86. A general heuristic which it uses is to try out recent ideas which have worked. Successful repetition of recent methods is very common in the real world, and Bongard does not try to outwit this kind of heuristic in his collection--in fact, he reinforces it, fortunately. So we plunge right into problem 86 with two ideas ("count" and "line segment") fused into one: "count line segments". But as it happens, the trick of BP 86 is to count line trains rather than line segments, where "line train" means an end-to-end concatenation of (one or more) line segments. One way the program might figure this out is if the concepts "line train" and "line segment" are both known, and are close in the concept network. Another way is if it can invent the concept of "line train"--a tricky proposition, to say the least.
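The segment/train distinction has a crisp computational core: segments that share an endpoint belong to one train, so counting trains means counting connected groups of segments. Here is a hypothetical sketch using union-find; the representation of segments as endpoint pairs is invented for the example.

```python
# Count "line trains": segments sharing an endpoint are merged into
# one train (connected group), tracked with a small union-find.
def count_trains(segments):
    """segments: list of (endpoint, endpoint) pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:          # path halving
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for a, b in segments:
        union(a, b)                    # a segment joins its two endpoints
    return len({find(p) for seg in segments for p in seg})

# Three segments laid end to end form one train;
# an unconnected segment elsewhere is a second train.
segs = [((0, 0), (1, 0)), ((1, 0), (2, 0)),
        ((2, 0), (2, 1)), ((5, 5), (6, 5))]
```

On `segs` the segment count is 4 but the train count is 2--exactly the gap that trips up a program carrying the fused idea "count line segments" from BP 85 into BP 86.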
 
Then comes BP 87, in which the notion of "line segment" is further played with. When is a line segment three line segments? (See box II-A.) The program must be sufficiently flexible that it can go back and forth between such different representations for a given part of a drawing. It is wise to store old representations, rather than forgetting them and perhaps having to reconstruct them, for there is no guarantee that a newer representation is better than an old one. Thus, along with each old representation should be stored some of the reasons for liking it and disliking it. (This begins to sound rather complex, doesn't it?)
 
Meta-Descriptions
 
Now we come to another vital part of the recognition process, and that has to do with levels of abstraction and meta-descriptions. For this let us consider BP 91 (Fig. 121) again. What kind of template could be constructed here? There is such an amount of variety that it is hard to know where to begin. But this is in itself a clue! The clue says, namely, that the class distinction very likely exists on a higher level of abstraction than that of geometrical description. This observation clues the program that it should construct descriptions of descriptions--that is, meta-descriptions. Perhaps on this second level some common feature will emerge; and if we are lucky, we will discover enough commonality to guide us towards the formulation of a template for the meta-descriptions! So we plunge ahead without a template, and manufacture descriptions for various boxes; then, once these descriptions have been made, we describe them. What kinds of slot will our template for meta-descriptions have? Perhaps these, among others:

concepts used: --
recurring concepts: --
names of slots: --
filters used: --

There are many other kinds of slots which might be needed in meta-descriptions, but this is a sample. Now suppose we have described box I-E of BP 91. Its (template-less) description might look like this:
 
horizontal line segment
vertical line segment mounted on the horizontal line segment
vertical line segment mounted on the horizontal line segment
vertical line segment mounted on the horizontal line segment

Of course much information has been thrown out: the fact that the three vertical lines are of the same length, are spaced equidistantly, etc. But it is plausible that the above description would be made. So the meta-description might look like this:

concepts used: vertical-horizontal, line segment, mounted on
repetitions in description: 3 copies of "vertical line segment mounted on the horizontal line segment"
names of slots: --
filters used: --

Not all slots of the meta-description need be filled in; information can be thrown away on this level as well as on the "just-plain-description" level.
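Filling the "repetitions in description" slot is itself a mechanical act: describe the description by counting its repeated lines. The sketch below is a hypothetical illustration, with descriptions represented simply as lists of phrase strings:

```python
# A sketch of meta-description: describe the description itself,
# recording which sub-descriptions recur and which words were used.
from collections import Counter

def meta_describe(description_lines):
    counts = Counter(description_lines)
    repetitions = [f'{n} copies of "{line}"'
                   for line, n in counts.items() if n > 1]
    concepts = sorted({w for line in description_lines
                       for w in line.split()})
    return {"concepts used": concepts,
            "repetitions in description": repetitions}

box_I_E = [
    "horizontal line segment",
    "vertical line segment mounted on the horizontal line segment",
    "vertical line segment mounted on the horizontal line segment",
    "vertical line segment mounted on the horizontal line segment",
]
meta = meta_describe(box_I_E)
```

Applied to every Class I box, such a routine would fill the repetitions slot with a "3 copies of ..." phrase each time--the three-ness that the sameness-detector then picks up on a higher level of abstraction.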
 
Now if we were to make a description for any of the other boxes of Class I, and then a meta-description of it, we would wind up filling the slot "repetitions in description" each time with the phrase "3 copies of ..." The sameness-detector would notice this, and pick up three-ness as a salient feature, on quite a high level of abstraction, of the boxes of Class I. Similarly, four-ness would be recognized, via the method of meta-descriptions, as the mark of Class II.
 
Flexibility Is Important
 
Now you might object that in this case, resorting to the method of meta-descriptions is like shooting a fly with an elephant gun, for the three-ness versus four-ness might as easily have shown up on the lower level if we had constructed our descriptions slightly differently. Yes, true--but it is important to have the possibility of solving these problems by different routes. There should be a large amount of flexibility in the program; it should not be doomed if, malaphorically speaking, it "barks up the wrong alley" for a while. (The amusing term "malaphor" was coined by the newspaper columnist Lawrence Harrison; it means a cross between a malapropism and a metaphor. It is a good example of "recombinant ideas".) In any case, I wanted to illustrate the general principle that says: When it is hard to build a template because the preprocessor finds too much diversity, that should serve as a clue that concepts on a higher level of abstraction are involved than the preprocessor knows about.
 
Focusing and Filtering
 
 
Now let us deal with another question: ways to throw information out. This involves two related notions, which I call "focusing" and "filtering". Focusing involves making a description whose focus is some part of the drawing in the box, to the exclusion of everything else. Filtering involves making a description which concentrates on some particular way of viewing the contents of the box, and deliberately ignores all other aspects. Thus they are complementary: focusing has to do with objects (roughly, nouns), and filtering has to do with concepts (roughly, adjectives). For an example of focusing, let's look at BP 55 (Fig. 126). Here, we focus on the indentation and the little circle next to it, to the exclusion of everything else in the box. BP 22 (Fig. 127) presents an example of filtering. Here, we must filter out every concept but that of size. A combination of focusing and filtering is required to solve BP 58 (Fig. 128).

FIGURE 126. Bongard problem 55. [From M. Bongard, Pattern Recognition.]

FIGURE 127. Bongard problem 22. [From M. Bongard, Pattern Recognition.]
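The noun/adjective complementarity of focusing and filtering can be made concrete with a toy representation in which a box maps object names to sets of descriptors. Everything below is hypothetical--the data, the size vocabulary, and the function names are invented for illustration:

```python
# Focusing keeps only certain objects (nouns); filtering keeps only
# descriptors from one conceptual dimension (adjectives).
SIZE_TERMS = {"big", "small", "large", "tiny"}

def focus(objects, wanted):
    """Describe only some objects in the box; ignore the rest."""
    return {name: feats for name, feats in objects.items()
            if name in wanted}

def filter_concepts(objects, vocabulary):
    """View every object through one conceptual filter, e.g. size."""
    return {name: feats & vocabulary for name, feats in objects.items()}

box = {"circle": {"small", "white"}, "blob": {"big", "black", "smooth"}}
```

Focusing throws away whole objects; filtering keeps every object but strips it down to a single dimension. A problem like BP 58 would need both operations composed.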
 
One of the most important ways to get ideas for focusing and filtering is by another sort of "focusing": namely, by inspection of a single particularly simple box--say one with as few objects in it as possible. It can be extremely helpful to compare the starkest boxes from the two Classes. But how can you tell which boxes are stark until you have descriptions for them? Well, one way of detecting starkness is to look for a box with a minimum of the features provided by the preprocessor. This can be done very early, for it does not require a pre-existing template; in fact, this can be one useful way of discovering features to build into a template. BP 61 (Fig. 129) is an example where that technique might quickly lead to a solution.

FIGURE 128. Bongard problem 58. [From M. Bongard, Pattern Recognition.]

FIGURE 129. Bongard problem 61. [From M. Bongard, Pattern Recognition.]
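Since starkness is defined here as having a minimum of preprocessor features, it needs no template at all--just a count. A hypothetical sketch (the box names and feature lists are invented):

```python
# The starkness heuristic: rank boxes by how few raw features the
# preprocessor reported, and inspect the starkest one first.
def starkest(boxes):
    """boxes: mapping from box name to its list of detected features."""
    return min(boxes, key=lambda name: len(boxes[name]))

class_I = {
    "I-A": ["circle", "circle", "line segment", "curve"],
    "I-B": ["circle"],                     # only one feature: starkest
    "I-C": ["triangle", "circle", "curve"],
}
```

Because this ranking uses only preprocessor output, it can run before any template exists--and whatever few features the starkest box does have are natural candidates for the template's first slots.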
 
Science and the World of Bongard Problems
 
One can think of the Bongard-problem world as a tiny place where "science" is done--that is, where the purpose is to discern patterns in the world. As patterns are sought, templates are made, unmade, and remade; slots are shifted from one level of generality to another; filtering and focusing are done; and so on. There are discoveries on all levels of complexity. The Kuhnian theory that certain rare events called "paradigm shifts" mark the distinction between "normal" science and "conceptual revolutions" does not seem to work, for we can see paradigm shifts happening all throughout the system, all the time. The fluidity of descriptions ensures that paradigm shifts will take place on all scales.

FIGURE 130. Bongard problems 70-71. [From M. Bongard, Pattern Recognition.]
 
Of course, some discoveries are more "revolutionary" than others, because they have wider effects. For instance, one can make the discovery that problems 70 and 71 (Fig. 130) are "the same problem", when looked at on a sufficiently abstract level. The key observation is that both involve depth-2 versus depth-1 nesting. This is a new level of discovery that can be made about Bongard problems. There is an even higher level, concerning the collection as a whole. If someone has never seen the collection, it can be a good puzzle just to figure out what it is. To figure it out is a revolutionary insight, but it must be pointed out that the mechanisms of thought which allow such a discovery to be made are no different from those which operate in the solution of a single Bongard problem.
 
By the same token, real science does not divide up into "normal" periods versus "conceptual revolutions"; rather, paradigm shifts pervade--there are just bigger and smaller ones, paradigm shifts on different levels. The recursive plots of INT and Gplot (Figs. 32 and 34) provide a geometric model for this idea: they have the same structure full of discontinuous jumps on every level, not just the top level--only the lower the level, the smaller the jumps.
 
Connections to Other Types of Thought
 
To set this entire program somewhat in context, let me suggest two ways in which it is related to other aspects of cognition. Not only does it depend on other aspects of cognition, but also they in turn depend on it. First let me comment on how it depends on other aspects of cognition. The intuition which is required for knowing when it makes sense to blur distinctions, to try redescriptions, to backtrack, to shift levels, and so forth, is something which probably comes only with much experience in thought in general. Thus it would be very hard to define heuristics for these crucial aspects of the program. Sometimes one's experience with real objects in the world has a subtle effect on how one describes or redescribes boxes. For instance, who can say how much one's familiarity with living trees helps one to solve BP 70? It is very doubtful that in humans, the subnetwork of concepts relevant to these puzzles can be easily separated out from the whole network. Rather, it is much more likely that one's intuitions gained from seeing and handling real objects--combs, trains, strings, blocks, letters, rubber bands, etc., etc.--play an invisible but significant guiding role in the solution of these problems.
 
Conversely, it is certain that understanding real-world situations heavily depends on visual imagery and spatial intuition, so that having a powerful and flexible way of representing patterns such as these Bongard patterns can only contribute to the general efficiency of thought processes.
 
It seems to me that Bongard's problems were worked out with great care, and that they have a quality of universality to them, in the sense that each one has a unique correct answer. Of course one could argue with this and say that what we consider "correct" depends in some deep way on our being human, and some creatures from some other star system might disagree entirely. Not having any concrete evidence either way, I still have a certain faith that Bongard problems depend on a sense of simplicity which is not just limited to earthbound human beings. My earlier comments about the probable importance of being acquainted with such surely earth-limited objects as combs, trains, rubber bands, and so on, are not in conflict with the idea that our notion of simplicity is universal, for what matters is not any of these individual objects, but the fact that taken together they span a wide space. And it seems likely that any other civilization would have as vast a repertoire of artifacts and natural objects and varieties of experience on which to draw as we do. So I believe that the skill of solving Bongard problems lies very close to the core of "pure" intelligence, if there is such a thing. Therefore it is a good place to begin if one wants to investigate the ability to discover "intrinsic meaning" in patterns or messages. Unfortunately, we have reproduced only a small selection of his stimulating collection. I hope that many readers will acquaint themselves with the entire collection, to be found in his book (see Bibliography).
 
Some of the problems of visual pattern recognition which we human beings seem to have completely "flattened" into our unconscious are quite amazing. They include:

recognition of faces (invariance of faces under age change, expression change, lighting change, distance change, angle change, etc.)

recognition of hiking trails in forests and mountains-somehow this has always impressed me as one of our most subtle acts of pattern recognition-and yet animals can do it, too

reading text without hesitation in hundreds if not thousands of different typefaces
 
Message-Passing Languages, Frames, and Symbols
 
One way that has been suggested for handling the complexities of pattern recognition and other challenges to AI programs is the so-called "actor" formalism of Carl Hewitt (similar to the language "Smalltalk", developed by Alan Kay and others), in which a program is written as a collection of interacting actors, which can pass elaborate messages back and forth among themselves. In a way, this resembles a heterarchical collection of procedures which can call each other. The major difference is that where procedures usually only pass a rather small number of arguments back and forth, the messages exchanged by actors can be arbitrarily long and complex.
 
Actors with the ability to exchange messages become somewhat autonomous agents-in fact, even like autonomous computers, with messages being somewhat like programs. Each actor can have its own idiosyncratic way of interpreting any given message; thus a message's meaning will depend on the actor it is intercepted by. This comes about by the actor having within it a piece of program which interprets messages; so there may be as many interpreters as there are actors. Of course, there may be many actors with identical interpreters; in fact, this could be a great advantage, just as it is extremely important in the cell to have a multitude of identical ribosomes floating throughout the cytoplasm, all of which will interpret a message-in this case, messenger RNA-in one and the same way.
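The two preceding paragraphs can be caricatured in a few lines of code. This is only an illustrative toy, not Hewitt's actual actor formalism or Smalltalk: each actor carries its own private interpreter, so one message means different things to different actors, while actors with identical interpreters (the "ribosomes") all read it the same way. All names in the sketch are invented.

```python
# Toy actors: each owns a private interpreter for incoming messages.
class Actor:
    def __init__(self, name, interpreter):
        self.name = name
        self.interpret = interpreter   # this actor's private "program"
        self.inbox = []

    def send(self, message):
        self.inbox.append(message)

    def step(self):
        """Interpret one pending message, if any."""
        if self.inbox:
            return self.interpret(self, self.inbox.pop(0))

# Two actors with identical interpreters -- like identical ribosomes,
# all of which read messenger RNA in one and the same way.
def ribosome_style(actor, msg):
    return f"{actor.name} translated {msg!r}"

r1 = Actor("ribosome-1", ribosome_style)
r2 = Actor("ribosome-2", ribosome_style)

# A third actor with its own idiosyncratic reading of the same message.
enzyme = Actor("enzyme", lambda a, msg: f"{a.name} ignored {msg!r}")

for a in (r1, r2, enzyme):
    a.send("AUG-GCC-UAA")

results = [a.step() for a in (r1, r2, enzyme)]
print(results)
```

The same message yields two identical interpretations and one idiosyncratic one; the meaning lives in the receiving actor, not in the message.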
 
 
It is interesting to think how one might merge the frame-notion with the actor-notion. Let us call a frame with the capability of generating and interpreting complex messages a symbol:

frame + actor = symbol
 
We now have reached the point where we are talking about ways of implementing those elusive active symbols of Chapters XI and XII; henceforth in this Chapter, "symbol" will have that meaning. By the way, don't feel dumb if you don't immediately see just how this synthesis is to be made. It is not clear, though it is certainly one of the most fascinating directions to go in AI. Furthermore, it is quite certain that even the best synthesis of these notions will turn out to have much less power than the actual symbols of human minds. In that sense, calling these frame-actor syntheses "symbols" is premature, but it is an optimistic way of looking at things.
 
Let us return to some issues connected with message passing. Should each message be directed specifically at a target symbol, or should it be thrown out into the grand void, much as mRNA is thrown out into the cytoplasm, to seek its ribosome? If messages have destinations, then each symbol must have an address, and messages for it should always be sent to that address. On the other hand, there could be one central receiving dock for messages, where a message would simply sit until it got picked up by some symbol that wanted it. This is a counterpart to General Delivery. Probably the best solution is to allow both types of message to exist; also to have provisions for different classes of urgency-special delivery, first class, second class, and so on. The whole postal system provides a rich source of ideas for message-passing languages, including such curios as self-addressed stamped envelopes (messages whose senders want answers quickly), parcel post (extremely long messages which can be sent some very slow way), and more. The telephone system will give you more inspiration when you run out of postal-system ideas.
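A hypothetical sketch of the two delivery schemes just described: addressed messages and a "General Delivery" pool, with urgency classes standing in for special delivery, first class, and second class. The class and method names are invented; a real message-passing language would of course be far richer.

```python
# Toy post office: addressed mailboxes plus a General Delivery heap
# ordered by urgency class (lower number = more urgent).
import heapq
import itertools

_counter = itertools.count()   # tie-breaker so equal priorities stay ordered

class PostOffice:
    SPECIAL, FIRST, SECOND = 0, 1, 2   # delivery classes

    def __init__(self):
        self.mailboxes = {}            # address -> list of messages
        self.general_delivery = []     # priority heap of unaddressed messages

    def send(self, message, to=None, priority=FIRST):
        if to is not None:
            self.mailboxes.setdefault(to, []).append(message)
        else:
            heapq.heappush(self.general_delivery,
                           (priority, next(_counter), message))

    def collect(self, address, wants=lambda m: True):
        """Addressed mail first; otherwise scan General Delivery by urgency."""
        if self.mailboxes.get(address):
            return self.mailboxes[address].pop(0)
        found, rejected = None, []
        while self.general_delivery:
            item = heapq.heappop(self.general_delivery)
            if found is None and wants(item[2]):
                found = item[2]
            else:
                rejected.append(item)      # leave unclaimed mail in the pool
        for item in rejected:
            heapq.heappush(self.general_delivery, item)
        return found

po = PostOffice()
po.send("activate pattern-search", to="symbol-A")
po.send("background news", priority=PostOffice.SECOND)
po.send("urgent: contradiction found", priority=PostOffice.SPECIAL)

print(po.collect("symbol-A"))   # addressed mail arrives first
print(po.collect("symbol-B"))   # most urgent General Delivery item
```

A symbol with no addressed mail browses General Delivery, and urgency decides which unclaimed message it sees first.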
 
 
Enzymes and AI
 
Another rich source of ideas for message passing-indeed, for information processing in general-is, of course, the cell. Some objects in the cell are quite comparable to actors-in particular, enzymes. Each enzyme's active site acts as a filter which only recognizes certain kinds of substrates (messages). Thus an enzyme has an "address", in effect. The enzyme is "programmed" (by virtue of its tertiary structure) to carry out certain operations upon that "message", and then to release it to the world again. Now in this way, when a message is passed from enzyme to enzyme along a chemical pathway, a lot can be accomplished. We have already described the elaborate kinds of feedback mechanisms which can take place in cells (either by inhibition or repression). These kinds of mechanisms show that complicated control of processes can arise through the kind of message passing that exists in the cell.
 
One of the most striking things about enzymes is how they sit around idly, waiting to be triggered by an incoming substrate. Then, when the substrate arrives, suddenly the enzyme springs into action, like a Venus's-flytrap. This kind of "hair-trigger" program has been used in AI, and goes by the name of demon. The important thing here is the idea of having many different "species" of triggerable subroutines just lying around waiting to be triggered. In cells, all the complex molecules and organelles are built up, simple step by simple step. Some of these new structures are often enzymes themselves, and they participate in the building of new enzymes, which in turn participate in the building of yet other types of enzyme, etc. Such recursive cascades of enzymes can have drastic effects on what a cell is doing. One would like to see the same kind of simple step-by-step assembly process imported into AI, in the construction of useful subprograms. For instance, repetition has a way of burning new circuits into our mental hardware, so that oft-repeated pieces of behavior become encoded below the conscious level. It would be extremely useful if there were an analogous way of synthesizing efficient pieces of code which can carry out the same sequence of operations as something which has been learned on a higher level of "consciousness". Enzyme cascades may suggest a model for how this could be done. (The program called "HACKER", written by Gerald Sussman, synthesizes and debugs small subroutines in a way not too much unlike that of enzyme cascades.)
 
The sameness-detectors in the Bongard problem-solver (Sams) could be implemented as enzyme-like subprograms. Like an enzyme, a Sam would meander about somewhat at random, bumping into small data structures here and there. Upon filling its two "active sites" with identical data structures, the Sam would emit a message to other parts (actors) of the program. As long as programs are serial, it would not make much sense to have several copies of a Sam, but in a truly parallel computer, regulating the number of copies of a subprogram would be a way of regulating the expected waiting-time before an operation gets done, just as regulating the number of copies of an enzyme in a cell regulates how fast that function gets performed. And if new Sams could be synthesized, that would be comparable to the seepage of pattern detection into lower levels of our minds.
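Here is one way a Sam might look as an enzyme-like demon, sketched under the assumptions in the text: it idles until its two "active sites" are filled, and fires a message only when the two structures turn out to be identical. Only the two-site mechanism comes from the text; everything else, including all names, is invented.

```python
# A Sam: an enzyme-like demon that lies idle until both of its
# "active sites" are filled with identical data structures.
import random

class Sam:
    def __init__(self, emit):
        self.sites = []      # the two "active sites"
        self.emit = emit     # callback: where a fired message goes

    def bump(self, structure):
        """A small data structure bumps into the Sam."""
        self.sites.append(structure)
        if len(self.sites) == 2:
            a, b = self.sites
            self.sites = []
            if a == b:                   # both sites hold identical structures
                self.emit(("SAME", a))   # spring into action: emit a message

messages = []
sam = Sam(messages.append)

shapes = ["circle", "triangle", "circle", "square", "circle"]
for _ in range(30):                      # meander at random among the data
    sam.bump(random.choice(shapes))

print(messages)   # only identical pairs ever produce a message
```

In a parallel machine one would simply run many such Sams at once; the number of copies tunes the expected waiting-time, just as enzyme concentration does in the cell.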
 
 
Fission and Fusion
 
Two interesting and complementary ideas concerning the interaction of symbols are "fission" and "fusion". Fission is the gradual divergence of a new symbol from its parent symbol (that is, from the symbol which served as a template off of which it was copied). Fusion is what happens when two (or more) originally unrelated symbols participate in a "joint activation", passing messages so tightly back and forth that they get bound together and the combination can thereafter be addressed as if it were a single symbol. Fission is a more or less inevitable process, since once a new symbol has been "rubbed off" of an old one, it becomes autonomous, and its interactions with the outside world get reflected in its private internal structure; so what started out as a perfect copy will soon become imperfect, and then slowly will become less and less like the symbol off of which it was "rubbed". Fusion is a subtler thing. When do two concepts really become one? Is there some precise instant when a fusion takes place?
 
This notion of joint activations opens up a Pandora's box of questions. For instance, how much do we hear "dough" and "nut" when we say "doughnut"? Does a German who thinks of gloves ("Handschuhe") hear "hand-shoes" or not? How about Chinese people, whose word "dong-xi" ("East-West") means "thing"? It is a matter of some political concern, too, since some people claim that words like "chairman" are heavily charged with undertones of the male gender. The degree to which the parts resonate inside the whole probably varies from person to person and according to circumstances.
 
The real problem with this notion of "fusion" of symbols is that it is very hard to imagine general algorithms which will create meaningful new symbols from colliding symbols. It is like two strands of DNA which come together. How do you take parts from each and recombine them into a meaningful and viable new strand of DNA which codes for an individual of the same species? Or a new kind of species? The chance is infinitesimal that a random combination of pieces of DNA will code for anything that will survive-something like the chance that a random combination of words from two books will make another book. The chance that recombinant DNA will make sense on any level but the lowest is tiny, precisely because there are so many levels of meaning in DNA. And the same goes for "recombinant symbols".
 
Epigenesis of the Crab Canon
 
I think of my Dialogue Crab Canon as a prototype example where two ideas collided in my mind, connected in a new way, and suddenly a new kind of verbal structure came alive in my mind. Of course I can still think about musical crab canons and verbal dialogues separately-they can still be activated independently of each other; but the fused symbol for crab-canonical dialogues has its own characteristic modes of activation, too. To illustrate this notion of fusion or "symbolic recombination" in some detail, then, I would like to use the development of my Crab Canon as a case study, because, of course, it is very familiar to me, and also because it is interesting, yet typical of how far a single idea can be pushed. I will recount it in stages named after those of meiosis, which is the name for cell division in which "crossing-over", or genetic recombination, takes place-the source of diversity in evolution.
 
PROPHASE: I began with a rather simple idea-that a piece of music, say a canon, could be imitated verbally. This came from the observation that, through a shared abstract form, a piece of text and a piece of music may be connected. The next step involved trying to realize some of the potential of this vague hunch; here, I hit upon the idea that "voices" in canons can be mapped onto "characters" in dialogues-still a rather obvious idea.
 
Then I focused down onto specific kinds of canons, and remembered that there was a crab canon in the Musical Offering. At that time, I had just begun writing Dialogues, and there were only two characters: Achilles and the Tortoise. Since the Bach crab canon has two voices, this mapped perfectly: Achilles should be one voice, the Tortoise the other, with the one doing forwards what the other does backwards. But here I was faced with a problem: on what level should the reversal take place? The letter level? The word level? The sentence level? After some thought, I concluded that the "dramatic line" level would be most appropriate.
 
Now that the "skeleton" of the Bach crab canon had been transplanted, at least in plan, into a verbal form, there was just one problem. When the two voices crossed in the middle, there would be a short period of extreme repetition: an ugly blemish. What to do about it? Here, a strange thing happened, a kind of level-crossing typical of creative acts: the word "crab" in "crab canon" flashed into my mind, undoubtedly because of some abstract shared quality with the notion of "tortoise"-and immediately I realized that at the dead center, I could block the repetitive effect, by inserting one special line, said by a new character: a Crab! This is how, in the "prophase" of the Crab Canon, the Crab was conceived: at the crossing-over of Achilles and the Tortoise. (See Fig. 131.)
 
FIGURE 131. A schematic diagram of the Dialogue Crab Canon.
 
METAPHASE: This was the skeleton of my Crab Canon. I then entered the second stage-the "metaphase"-in which I had to fill in the flesh, which was of course an arduous task. I made a lot of stabs at it, getting used to the way in which pairs of successive lines had to make sense when read from either direction, and experimenting around to see what kinds of dual meanings would help me in writing such a form (e.g., "Not at all"). There were two early versions both of which were interesting, but weak. I abandoned work on the book for over a year, and when I returned to the Crab Canon, I had a few new ideas. One of them was to mention a Bach canon inside it. At first my plan was to mention the "Canon per augmentationem, contrario motu", from the Musical Offering (Sloth Canon, as I call it). But that started to seem a little silly, so reluctantly I decided that inside my Crab Canon, I could talk about Bach's own Crab Canon instead. Actually, this was a crucial turning point, but I didn't know it then.
 
Now if one character was going to mention a Bach piece, wouldn't it be awkward for the other to say exactly the same thing in the corresponding place? Well, Escher was playing a similar role to Bach in my thoughts and my book, so wasn't there some way of just slightly modifying the line so that it would refer to Escher? After all, in the strict art of canons, note-perfect imitation is occasionally foregone for the sake of elegance or beauty. And no sooner did that idea occur to me than the picture Day and Night (Fig. 49) popped into my mind. "Of course!" I thought, "It is a sort of pictorial crab canon, with essentially two complementary voices carrying the same theme both leftwards and rightwards, and harmonizing with each other!" Here again was the notion of a single "conceptual skeleton" being instantiated in two different media-in this case, music and art. So I let the Tortoise talk about Bach, and Achilles talk about Escher, in parallel language; certainly this slight departure from strict imitation retained the spirit of crab canons.
 
At this point, I began realizing that something marvelous was happening: namely, the Dialogue was becoming self-referential, without my even having intended it! What's more, it was an indirect self-reference, in that the characters did not talk directly about the Dialogue they were in, but rather about structures which were isomorphic to it (on a certain plane of abstraction). To put it in the terms I have been using, my Dialogue now shared a "conceptual skeleton" with Godel's G, and could therefore be mapped onto G in somewhat the way that the Central Dogma was, to create in this case a "Central Crabmap". This was most exciting to me, since out of nowhere had come an esthetically pleasing unity of Godel, Escher, and Bach.
 
ANAPHASE: The next step was quite startling. I had had Caroline MacGillavry's monograph on Escher's tesselations for years, but one day, as I flipped through it, my eye was riveted to Plate 23 (Fig. 42), for I saw it in a way I had never seen it before: here was a genuine crab canon-crab-like in both form and content! Escher himself had given the picture no title, and since he had drawn similar tesselations using many other animal forms, it is probable that this coincidence of form and content was just something which I had noticed. But fortuitous or not, this untitled plate was a miniature version of one main idea of my book: to unite form and content. So with delight I christened it Crab Canon, substituted it for Day and Night, and modified Achilles' and the Tortoise's remarks accordingly.
 
Yet this was not all. Having become infatuated with molecular biology, one day I was perusing Watson's book in the bookstore, and in the index saw the word "palindrome". When I looked it up, I found a magical thing: crab-canonical structures in DNA. Soon the Crab's comments had been suitably modified to include a short remark to the effect that he owed his predilection for confusing retrograde and forward motion to his genes.
 
TELOPHASE: The last step came months later, when, as I was talking about the picture of the crab-canonical section of DNA (Fig. 43), I saw that the 'A', 'T', 'C' of Adenine, Thymine, Cytosine coincided-mirabile dictu-with the 'A', 'T', 'C' of Achilles, Tortoise, Crab; moreover, just as Adenine and Thymine are paired in DNA, so are Achilles and the Tortoise paired in the Dialogue. I thought for a moment and, in another of those level-crossings, saw that 'G', the letter paired with 'C' in DNA, could stand for "Gene". Once again, I jumped back to the Dialogue, did a little surgery on the Crab's speech to reflect this new discovery, and now I had a mapping between the DNA's structure, and the Dialogue's structure. In that sense, the DNA could be said to be a genotype coding for a phenotype: the structure of the Dialogue. This final touch dramatically heightened the self-reference, and gave the Dialogue a density of meaning which I had never anticipated.
 
Conceptual Skeletons and Conceptual Mapping
 
That more or less summarizes the epigenesis of the Crab Canon. The whole process can be seen as a succession of mappings of ideas onto each other, at varying levels of abstraction. This is what I call conceptual mapping, and the abstract structures which connect up two different ideas are conceptual skeletons. Thus, one conceptual skeleton is that of the abstract notion of a crab canon:

a structure having two parts which do the same thing, only moving in opposite directions.

This is a concrete geometrical image which can be manipulated by the mind almost as a Bongard pattern. In fact, when I think of the Crab Canon today, I visualize it as two strands which cross in the middle, where they are joined by a "knot" (the Crab's speech). This is such a vividly pictorial image that it instantaneously maps, in my mind, onto a picture of two homologous chromosomes joined by a centromere in their middle, which is an image drawn directly from meiosis, as shown in Figure 132.
 
FIGURE 132.
 
In fact, this very image is what inspired me to cast the description of the Crab Canon's evolution in terms of meiosis-which is itself, of course, yet another example of conceptual mapping.
 
Recombinant Ideas
 
There are a variety of techniques of fusion of two symbols. One involves lining the two ideas up next to each other (as if ideas were linear!), then judiciously choosing pieces from each one, and recombining them in a new symbol. This strongly recalls genetic recombination. Well, what do chromosomes exchange, and how do they do it? They exchange genes. What in a symbol is comparable to a gene? If symbols have frame-like slots, then slots, perhaps. But which slots to exchange, and why? Here is where the crab-canonical fusion may offer some ideas. Mapping the notion of "musical crab canon" onto that of "dialogue" involved several auxiliary mappings; in fact it induced them. That is, once it had been decided that these two notions were to be fused, it became a matter of looking at them on a level where analogous parts emerged into view, then going ahead and mapping the parts onto each other, and so on, recursively, to any level that was found desirable. Here, for instance, "voice" and "character" emerged as corresponding slots when "crab canon" and "dialogue" were viewed abstractly. Where did these abstract views come from, though? This is at the crux of the mapping-problem-where do abstract views come from? How do you make abstract views of specific notions?
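The slot-mapping idea can be made concrete with a toy sketch: two frames represented as nested dictionaries of slots, fused under a given correspondence, with the mapping extended recursively into equal-named sub-slots. The frames, slot names, and correspondence table are all invented for illustration; finding the correspondence is of course the hard, unsolved part.

```python
# Toy frames as nested dicts of slots; fusion pairs corresponding slots.
crab_canon = {
    "voices": {"count": 2, "relation": "one runs backwards"},
    "medium": "music",
}
dialogue = {
    "characters": {"count": 2, "relation": "one speaks lines in reverse order"},
    "medium": "text",
}

# The induced correspondence: which slot of one frame maps onto which
# slot of the other (here simply asserted, not discovered).
correspondence = {"voices": "characters", "medium": "medium"}

def fuse(frame_a, frame_b, corr):
    """Recursively pair corresponding slots into a single new symbol."""
    fused = {}
    for slot_a, slot_b in corr.items():
        va, vb = frame_a[slot_a], frame_b[slot_b]
        if isinstance(va, dict) and isinstance(vb, dict):
            shared = {k: k for k in va if k in vb}   # map equal-named sub-slots
            fused[f"{slot_a}~{slot_b}"] = fuse(va, vb, shared)
        else:
            fused[f"{slot_a}~{slot_b}"] = (va, vb)   # bound pair of fillers
    return fused

crab_canonical_dialogue = fuse(crab_canon, dialogue, correspondence)
print(crab_canonical_dialogue)
```

The top-level decision to fuse ("voices" go with "characters") induces the sub-slot pairings, echoing how the crab-canon-to-dialogue mapping induced its auxiliary mappings.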
 
 
Abstractions, Skeletons, Analogies
 
A view which has been abstracted from a concept along some dimension is what I call a conceptual skeleton. In effect, we have dealt with conceptual skeletons all along, without often using that name. For instance, many of the ideas concerning Bongard problems could be rephrased using this terminology. It is always of interest, and possibly of importance, when two or more ideas are discovered to share a conceptual skeleton. An example is the bizarre set of concepts mentioned at the beginning of the Contrafactus: a Bicyclops, a tandem unicycle, a teeter-teeter, the game of ping-ping, a one-way tie, a two-sided Mobius strip, the "Bach twins", a piano concerto for two left hands, a one-voice fugue, the act of clapping with one hand, a two-channel monaural phonograph, a pair of eighth-backs. All of these ideas are "isomorphic" because they share this conceptual skeleton:

a plural thing made singular and re-pluralized wrongly.

Two other ideas in this book which share that conceptual skeleton are (1) the Tortoise's solution to Achilles' puzzle, asking for a word beginning and ending in "HE" (the Tortoise's solution being the pronoun "HE", which collapses two occurrences into one), and (2) the Pappus-Gelernter proof of the Pons Asinorum Theorem, in which one triangle is reperceived as two. Incidentally, these droll concoctions might be dubbed "
 
A conceptual skeleton is like a set of constant features (as distinguished from parameters or variables)-features which should not be slipped in a subjunctive instant replay or mapping-operation. Having no parameters or variables of its own to vary, it can be the invariant core of several different ideas. Each instance of it, such as "tandem unicycle", does have layers of variability and so can be "slipped" in various ways.
 
Although the name "conceptual skeleton" sounds absolute and rigid, actually there is a lot of play in it. There can be conceptual skeletons on several different levels of abstraction. For instance, the "isomorphism" between Bongard problems 70 and 71, already pointed out, involves a higher-level conceptual skeleton than that needed to solve either problem in isolation.
 
Multiple Representations
 
Not only must conceptual skeletons exist on different levels of abstraction; also, they must exist along different conceptual dimensions. Let us take the following sentence as an example:

"The Vice President is the spare tire on the automobile of government."

How do we understand what it means (leaving aside its humor, which is of course a vital aspect)? If you were told, "See our government as an automobile", without any prior motivation, you might come up with any number of correspondences: steering wheel = president, etc. What are checks and balances? What are seat belts? Because the two things being mapped are so different, it is almost inevitable that the mapping will involve functional aspects. Therefore, you retrieve from your store of conceptual skeletons representing parts of automobiles, only those having to do with function, rather than, say, shape. Furthermore, it makes sense to work at a pretty high level of abstraction, where "function" isn't taken in too narrow a context. Thus, of the two following definitions of the function of a spare tire: (1) "replacement for a flat tire", and (2) "replacement for a certain disabled part of a car", certainly the latter would be preferable, in this case. This comes simply from the fact that an auto and a government are so different that they have to be mapped at a high level of abstraction.
 
Now when the particular sentence is examined, the mapping gets forced in one respect-but it is not an awkward way, by any means. In fact, you already have a conceptual skeleton for the Vice President, among many others, which says, "replacement for a certain disabled part of government". Therefore the forced mapping works comfortably. But suppose, for the sake of contrast, that you had retrieved another conceptual skeleton for "spare tire"-say, one describing its physical aspects. Among other things, it might say that a spare tire is "round and inflated". Clearly, this is not the right way to go. (Or is it? As a friend of mine pointed out, some Vice Presidents are rather portly, and most are quite inflated.)
 
Ports of Access
 
One of the major characteristics of each idiosyncratic style of thought is how new experiences get classified and stuffed into memory, for that defines the "handles" by which they will later be retrievable. And for events, objects, ideas, and so on-for everything that can be thought about-there is a wide variety of "handles". I am struck by this each time I reach down to turn on my car radio, and find, to my dismay, that it is already on! What has happened is that two independent representations are being used for the radio. One is "music producer", the other is "boredom reliever". I am aware that the music is on, but I am bored anyway, and before the two realizations have a chance to interact, my reflex to reach down has been triggered. The same reaching-down reflex one day occurred just after I'd left the radio at a repair shop and was driving away, wanting to hear some music. Odd. Many other representations for the same object exist, such as:

shiny silver-knob haver
overheating-problems haver
lying-on-my-back-over-hump-to-fix thing
buzz-maker
slipping-dials object
multidimensional representation example

All of them can act as ports of access. Though they all are attached to my symbol for my car radio, accessing that symbol through one does not open up all the others. Thus it is unlikely that I will be inspired to remember lying on my back to fix the radio when I reach down and turn it on. And conversely, when I'm lying on my back, unscrewing screws, I probably won't think about the time I heard the Art of the Fugue on it. There are "partitions" between these aspects of one symbol, partitions that prevent my thoughts from spilling over sloppily, in the manner of free associations. My mental partitions are important because they contain and channel the flow of my thoughts.
 
 
One place where these partitions are quite rigid is in sealing off words for the same thing in different languages. If the partitions were not strong, a bilingual person would constantly slip back and forth between languages, which would be very uncomfortable. Of course, adults learning two new languages at once often confuse words in them. The partitions between these languages are flimsier, and can break down. Interpreters are particularly interesting, since they can speak any of their languages as if their partitions were inviolable and yet, on command, they can negate those partitions to allow access to one language from the other, so they can translate. Steiner, who grew up trilingual, devotes several pages in After Babel to the intermingling of French, English, and German in the layers of his mind, and how his different languages afford different ports of access onto concepts.
 
Forced Matching
 
When two ideas are seen to share conceptual skeletons on some level of abstraction, different things can happen. Usually the first stage is that you zoom in on both ideas, and, using the higher-level match as a guide, you try to identify corresponding subideas. Sometimes the match can be extended recursively downwards several levels, revealing a profound isomorphism. Sometimes it stops earlier, revealing an analogy or similarity. And then there are times when the high-level similarity is so compelling that, even if there is no apparent lower-level continuation of the map, you just go ahead and make one: this is the forced match.
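The recursive downward extension of a match might be caricatured as follows: given two concept trees, pair up corresponding sub-ideas level by level and report how deep the map runs. The concept trees and all names are invented; a depth of zero means no shared skeleton, a large depth something closer to a profound isomorphism.

```python
# Toy skeleton-matching: concepts are nested dicts; sub-ideas correspond
# when their slot names coincide. Depth measures how far the map extends.
def match_depth(a, b):
    """Return how many levels of corresponding sub-ideas can be paired."""
    if not (isinstance(a, dict) and isinstance(b, dict)):
        return 0                      # reached non-decomposable ideas
    shared = set(a) & set(b)          # sub-ideas that correspond by name
    if not shared:
        return 0                      # the map stops here entirely
    return 1 + max(match_depth(a[k], b[k]) for k in shared)

crab_canon = {"parts": {"count": 2, "direction": "opposite"},
              "medium": "music"}
dna_palindrome = {"parts": {"count": 2, "direction": "opposite"},
                  "medium": "molecule"}
hiccups = {"rhythm": "periodic"}

print(match_depth(crab_canon, dna_palindrome))   # -> 2
print(match_depth(crab_canon, hiccups))          # -> 0
```

A forced match is then the case where this recursion returns zero below some level, yet one invents the lower-level correspondences anyway.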
 
 
Forced matches occur every day in the political cartoons of newspapers: a political figure is portrayed as an airplane, a boat, a fish, the Mona Lisa; a government is a human, a bird, an oil rig; a treaty is a briefcase, a sword, a can of worms; on and on and on. What is fascinating is how easily we can perform the suggested mapping, and to the exact depth intended. We don't carry the mapping out too deeply or too shallowly.
 
Another example of forcing one thing into the mold of another occurred when I chose to describe the development of my Crab Canon in terms of meiosis. This happened in stages. First, I noticed the common conceptual skeleton shared by the Crab Canon and the image of chromosomes joined by a centromere; this provided the inspiration for the forced match. Then I saw a high-level resemblance involving "growth", "stages", and "recombination". Then I simply pushed the analogy as hard as I could. Tentativity-as in the Bongard problem-solver-played a large role: I went forwards and backwards before finding a match which I found appealing.
 
A third example of conceptual mapping is provided by the Central Dogmap. I initially noticed a high-level similarity between the discoveries of mathematical logicians and those of molecular biologists, then pursued it on lower levels until I found a strong analogy. To strengthen it further, I chose a Godel-numbering which imitated the Genetic Code. This was the lone element of forced matching in the Central Dogmap.
 
Forced matches, analogies, and metaphors cannot easily be separated out. Sportscasters often use vivid imagery which is hard to pigeonhole. For instance, in a metaphor such as "The Rams [football team] are spinning their wheels", it is hard to say just what image you are supposed to conjure up. Do you attach wheels to the team as a whole? Or to each player? Probably neither one. More likely, the image of wheels spinning in mud or snow simply flashes before you for a brief instant, and then in some mysterious way, just the relevant parts get lifted out and transferred to the team's performance. How deeply are the football team and the car mapped onto each other in the split second that you do the mapping?
 
715
 
Recap
 
715
 
Let me try to tie things together a little. I have presented a number of related ideas connected with the creation, manipulation, and comparison of symbols. Most of them have to do with slippage in some fashion, the idea being that concepts are composed of some tight and some loose elements, coming from different levels of nested contexts (frames). The loose ones can be dislodged and replaced rather easily, which, depending on the circumstances, can create a "subjunctive instant replay", a forced match, or an analogy. A fusion of two symbols may result from a process in which parts of each symbol are dislodged and other parts remain.
 
716
 
Creativity and Randomness
 
716
 
It is obvious that we are talking about mechanization of creativity. But is this not a contradiction in terms? Almost, but not really. Creativity is the essence of that which is not mechanical. Yet every creative act is mechanical-it has its explanation no less than a case of the hiccups does. The mechanical substrate of creativity may be hidden from view, but it exists. Conversely, there is something unmechanical in flexible programs, even today. It may not constitute creativity, but when programs cease to be transparent to their creators, then the approach to creativity has begun.
 
716
 
It is a common notion that randomness is an indispensable ingredient of creative acts. This may be true, but it does not have any bearing on the mechanizability-or rather, programmability!-of creativity. The world is a giant heap of randomness; when you mirror some of it inside your head, your head's interior absorbs a little of that randomness. The triggering patterns of symbols, therefore, can lead you down the most random-seeming paths, simply because they came from your interactions with a crazy, random world. So it can be with a computer program, too. Randomness is an intrinsic feature of thought, not something which has to be "artificially inseminated", whether through dice, decaying nuclei, random number tables, or what-have-you. It is an insult to human creativity to imply that it relies on such arbitrary sources.
 
716
 
What we see as randomness is often simply an effect of looking at something symmetric through a "skew" filter. An elegant example was provided by Salviati's two ways of looking at the number π/4. Although the decimal expansion of π/4 is not literally random, it is as random as one would need for most purposes: it is "pseudorandom". Mathematics is full of pseudorandomness-plenty enough to supply all would-be creators for all time.
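The point about π/4 can be made concrete. The short Python sketch below is my illustration, not the author's: it computes the decimal expansion of π/4 (via Machin's formula, an arbitrary choice) and tallies how often each digit appears in the first 100 places. The expansion is completely determined, yet the digit counts come out roughly uniform, which is what "pseudorandom" means here.

```python
from decimal import Decimal, getcontext
from collections import Counter

getcontext().prec = 110  # work with ~110 significant digits

def arctan_inv(n):
    """arctan(1/n) via its Taylor series, in Decimal arithmetic."""
    x = Decimal(1) / n
    x2 = x * x
    total, power, k = Decimal(0), x, 0
    eps = Decimal(10) ** -getcontext().prec
    while power > eps:
        term = power / (2 * k + 1)
        total += -term if k % 2 else term  # alternating series
        power *= x2
        k += 1
    return total

# Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
quarter_pi = (16 * arctan_inv(5) - 4 * arctan_inv(239)) / 4

digits = str(quarter_pi)[2:102]   # first 100 decimal digits of 0.78539...
freq = Counter(digits)

print(digits[:20])        # deterministic, yet "random-looking"
print(sorted(freq.items()))
```

Nothing here is random in the coin-flipping sense; every run prints the same digits. It is the skew of our digit-by-digit view of a perfectly lawful number that makes the sequence look patternless.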
 
716
 
Just as science is permeated with "conceptual revolutions" on all levels at all times, so the thinking of individuals is shot through and through with creative acts. They are not just on the highest plane; they are everywhere. Most of them are small and have been made a million times before-but they are close cousins to the most highly creative and new acts. Computer programs today do not yet seem to produce many small creations. Most of what they do is quite "mechanical" still. That just testifies to the fact that they are not close to simulating the way we think-but they are getting closer.
 
716
 
Perhaps what differentiates highly creative ideas from ordinary ones is some combined sense of beauty, simplicity, and harmony. In fact, I have a favorite "meta-analogy", in which I liken analogies to chords. The idea is simple: superficially similar ideas are often not deeply related; and deeply related ideas are often superficially disparate. The analogy to chords is natural: physically close notes are harmonically distant (e.g., E-F-G); and harmonically close notes are physically distant (e.g., G-E-B). Ideas that share a conceptual skeleton resonate in a sort of conceptual analogue to harmony; these harmonious "idea-chords" are often widely separated, as measured on an imaginary "keyboard of concepts". Of course, it doesn't suffice to reach wide and plunk down any old way-you may hit a seventh or a ninth! Perhaps the present analogy is like a ninth-chord-wide but dissonant.
 
717
 
Picking up Patterns on All Levels
 
717
 
Bongard problems were chosen as a focus in this Chapter because when you study them, you realize that the elusive sense for patterns which we humans inherit from our genes involves all the mechanisms of representation of knowledge, including nested contexts, conceptual skeletons and conceptual mapping, slippability, descriptions and meta-descriptions and their interactions, fission and fusion of symbols, multiple representations (along different dimensions and different levels of abstraction), default expectations, and more.
 
717
 
These days, it is a safe bet that if some program can pick up patterns in one area, it will miss patterns in another area which, to us, are equally obvious. You may remember that I mentioned this back in Chapter I, saying that machines can be oblivious to repetition, whereas people cannot. For instance, consider SHRDLU. If Eta Oin typed the sentence "Pick up a big red block and put it down" over and over again, SHRDLU would cheerfully react in the same way over and over again, exactly as an adding machine will print out "4" over and over again, if a human being has the patience to type "2+2" over and over again. Humans aren't like that; if some pattern occurs over and over again, they will pick it up. SHRDLU wasn't built with the potential for forming new concepts or recognizing patterns: it had no sense of over and overview.
 
717
 
The Flexibility of Language
 
717
 
SHRDLU's language-handling capability is immensely flexible-within limits. SHRDLU can figure out sentences of great syntactical complexity, or sentences with semantic ambiguities as long as they can be resolved by inspecting the data base-but it cannot handle "hazy" language. For instance, consider the sentence "How many blocks go on top of each other to make a steeple?" We understand it immediately, yet it does not make sense if interpreted literally. Nor is it that some idiomatic phrase has been used. "To go on top of each other" is an imprecise phrase which nonetheless gets the desired image across quite well to a human. Few people would be misled into visualizing a paradoxical setup with two blocks each of which is on top of the other-or blocks which are "going" somewhere or other.
 
717
 
The amazing thing about language is how imprecisely we use it and still manage to get away with it. SHRDLU uses words in a "metallic" way, while people use them in a "spongy" or "rubbery" or even "Nutty-Puttyish" way. If words were nuts and bolts, people could make any bolt fit into any nut: they'd just squish the one into the other, as in some surrealistic painting where everything goes soft. Language, in human hands, becomes almost like a fluid, despite the coarse grain of its components.
 
718
 
Recently, AI research in natural language understanding has turned away somewhat from the understanding of single sentences in isolation, and more towards areas such as understanding simple children's stories. Here is a well-known children's joke which illustrates the open-endedness of real-life situations:

A man took a ride in an airplane.
Unfortunately, he fell out.
Fortunately, he had a parachute on.
Unfortunately, it didn't work.
Fortunately, there was a haystack below him.
Unfortunately, there was a pitchfork sticking out of it.
Fortunately, he missed the pitchfork.
Unfortunately, he missed the haystack.

It can be extended indefinitely. To represent this silly story in a frame-based system would be extremely complex, involving jointly activating frames for the concepts of man, airplane, exit, parachute, falling, etc., etc.
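To suggest what "jointly activating frames" might look like, here is a minimal sketch in Python. The `Frame` class and its slot names are my invention, not Minsky's notation or any actual AI system's API; the point is only that each frame carries default expectations, and each "Unfortunately..." line of the joke works precisely by overriding one of those defaults.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A concept frame: named slots with default expectations
    that can be overridden by what the story actually says."""
    name: str
    slots: dict = field(default_factory=dict)      # facts filled in by the story
    defaults: dict = field(default_factory=dict)   # background expectations

    def get(self, slot):
        # an explicitly filled slot wins; otherwise fall back on the default
        return self.slots.get(slot, self.defaults.get(slot))

# Frames jointly activated by the parachute joke
parachute = Frame("parachute", defaults={"works": True})
haystack  = Frame("haystack",  defaults={"soft_landing": True})

# Each "Unfortunately..." line overrides a default expectation
parachute.slots["works"] = False                 # "it didn't work"
haystack.slots["contains_pitchfork"] = True      # "a pitchfork sticking out"

print(parachute.get("works"))        # default overridden by the story
print(haystack.get("soft_landing"))  # default still stands
```

Even this toy version shows why the representation balloons: every new "Fortunately/Unfortunately" line activates yet another frame, or overturns yet another default that no one thought to state explicitly.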
 
718
 
Intelligence and Emotions
 
718
 
Or consider this tiny yet poignant story:

Margie was holding tightly to the string of her beautiful new balloon. Suddenly, a gust of wind caught it. The wind carried it into a tree. The balloon hit a branch and burst. Margie cried and cried.4

To understand this story, one needs to read many things between the lines. For instance: Margie is a little girl. This is a toy balloon with a string for a child to hold. It may not be beautiful to an adult, but in a child's eye, it is. She is outside. The "it" that the wind caught was the balloon. The wind did not pull Margie along with the balloon; Margie let go. Balloons can break on contact with any sharp point. Once they are broken, they are gone forever. Little children love balloons and can be bitterly disappointed when they break. Margie saw that her balloon was broken. Children cry when they are sad. "To cry and cry" is to cry very long and hard. Margie cried and cried because of her sadness at her balloon's breaking.
 
718
 
This is probably only a small fraction of what is lacking at the surface level. A program must have all this knowledge in order to get at what is going on. And you might object that, even if it "understands" in some intellectual sense what has been said, it will never really understand, until it, too, has cried and cried. And when will a computer do that? This is the kind of humanistic point which Joseph Weizenbaum is concerned with making in his book Computer Power and Human Reason, and I think it is an important issue; in fact, a very, very deep issue. Unfortunately, many AI workers at this time are unwilling, for various reasons, to take this sort of point seriously. But in some ways, those AI workers are right: it is a little premature to think about computers crying; we must first think about rules for computers to deal with language and other things; in time, we'll find ourselves face to face with the deeper issues.
 
719
 
AI Has Far to Go
719
 
Sometimes it seems that there is such a complete absence of rule-governed behavior that human beings just aren't rule-governed. But this is an illusion-a little like thinking that crystals and metals emerge from rigid underlying laws, but that fluids or flowers don't. We'll come back to this question in the next Chapter.

The process of logic itself working internally in the brain may be more analogous to a succession of operations with symbolic pictures, a sort of abstract analogue of the Chinese alphabet or some Mayan description of events-except that the elements are not merely words but more like sentences or whole stories with linkages between them forming a sort of meta- or super-logic with its own rules.5

It is hard for most specialists to express vividly-perhaps even to remember-what originally sparked them to enter their field. Conversely, someone on the outside may understand a field's special romance and may be able to articulate it precisely. I think that is why this quote from Ulam has appeal for me, because it poetically conveys the strangeness of the enterprise of AI, and yet shows faith in it. And one must run on faith at this point, for there is so far to go.
 
719
 
Ten Questions and Speculations
 
719
 
To conclude this Chapter, I would like to present ten "Questions and Speculations" about AI. I would not make so bold as to call them "Answers"-these are my personal opinions. They may well change in some ways, as I learn more and as AI develops more. (In what follows, the term "AI program" means a program which is far ahead of today's programs; it means an "Actually Intelligent" program. Also, the words "program" and "computer" probably carry overly mechanistic connotations, but let us stick with them anyway.)

Question: Will a computer program ever write beautiful music?

Speculation: Yes, but not soon. Music is a language of emotions, and until programs have emotions as complex as ours, there is no way a program will write anything beautiful. There can be "forgeries"-shallow imitations of the syntax of earlier music-but despite what one might think at first, there is much more to musical expression than can be captured in syntactical rules. There will be no new kinds of beauty turned up for a long time by computer music-composing programs. Let me carry this thought a little further. To think-and I have heard this suggested-that we might soon be able to command a preprogrammed, mass-produced, mail-order twenty-dollar desk-model "music box" to bring forth from its sterile circuitry pieces which Chopin or Bach might have written had they lived longer is a grotesque and shameful misestimation of the depth of the human spirit. A "program" which could produce music as they did would have to wander around the world on its own, fighting its way through the maze of life and feeling every moment of it. It would have to understand the joy and loneliness of a chilly night wind, the longing for a cherished hand, the inaccessibility of a distant town, the heartbreak and regeneration after a human death. It would have to have known resignation and world-weariness, grief and despair, determination and victory, piety and awe. In it would have had to commingle such opposites as hope and fear, anguish and jubilation, serenity and suspense. Part and parcel of it would have to be a sense of grace, humor, rhythm, a sense of the unexpected-and of course an exquisite awareness of the magic of fresh creation. Therein, and therein only, lie the sources of meaning in music.

Question: Will emotions be explicitly programmed into a machine?
 
720
 
Speculation: No. That is ridiculous. Any direct simulation of emotions-PARRY, for example-cannot approach the complexity of human emotions, which arise indirectly from the organization of our minds. Programs or machines will acquire emotions in the same way: as by-products of their structure, of the way in which they are organized-not by direct programming. Thus, for example, nobody will write a "falling-in-love" subroutine, any more than they would write a "mistake-making" subroutine. "Falling in love" is a description which we attach to a complex process of a complex system; there need be no single module inside the system which is solely responsible for it, however.
 
720
 
Question: Will a thinking computer be able to add fast?

Speculation: Perhaps not. We ourselves are composed of hardware which does fancy calculations, but that doesn't mean that our symbol level, where "we" are, knows how to carry out the same fancy calculations. Let me put it this way: there's no way that you can load numbers into your own neurons to add up your grocery bill. Luckily for you, your symbol level (i.e., you) can't gain access to the neurons which are doing your thinking-otherwise you'd get addle-brained. To paraphrase Descartes again:

"I think; therefore I have no access to the level where I sum."

Why should it not be the same for an intelligent program? It mustn't be allowed to gain access to the circuits which are doing its thinking-otherwise it'll get addle-CPU'd. Quite seriously, a machine that can pass the Turing test may well add as slowly as you or I do, and for similar reasons. It will represent the number 2 not just by the two bits "10", but as a full-fledged concept the way we do, replete with associations such as its homonyms "too" and "to", the words "couple" and "deuce", a host of mental images such as dots on dominos, the shape of the numeral '2', the notions of alternation, evenness, oddness, and on and on ... With all this "extra baggage" to carry around, an intelligent program will become quite slothful in its adding. Of course, we could give it a "pocket calculator", so to speak (or build one in). Then it could answer very fast, but its performance would be just like that of a person with a pocket calculator. There would be two separate parts to the machine: a reliable but mindless part and an intelligent but fallible part. You couldn't rely on the composite system to be reliable, any more than a composite of person and machine is necessarily reliable. So if it's right answers you're after, better stick to the pocket calculator alone-don't throw in the intelligence!
 
721
 
Question: Will there be chess programs that can beat anyone?

Speculation: No. There may be programs which can beat anyone at chess, but they will not be exclusively chess players. They will be programs of general intelligence, and they will be just as temperamental as people. "Do you want to play chess?" "No, I'm bored with chess. Let's talk about poetry." That may be the kind of dialogue you could have with a program that could beat everyone. That is because real intelligence inevitably depends on a total overview capacity-that is, a programmed ability to "jump out of the system", so to speak-at least roughly to the extent that we have that ability. Once that is present, you can't contain the program; it's gone beyond that certain critical point, and you just have to face the facts of what you've wrought.
 
721
 
Question: Will there be special locations in memory which store parameters governing the behavior of the program, such that if you reached in and changed them, you would be able to make the program smarter or stupider or more creative or more interested in baseball? In short, would you be able to "tune" the program by fiddling with it on a relatively low level?
 
721
 
Speculation: No. It would be quite oblivious to changes of any particular elements in memory, just as we stay almost exactly the same though thousands of our neurons die every day(!). If you fuss around too heavily, though, you'll damage it, just as if you irresponsibly did neurosurgery on a human being. There will be no "magic" location in memory where, for instance, the "IQ" of the program sits. Again, that will be a feature which emerges as a consequence of lower-level behavior, and nowhere will it sit explicitly. The same goes for such things as "the number of items it can hold in short-term memory", "the amount it likes physics", etc., etc.
 
721
 
Question: Could you "tune" an AI program to act like me, or like you-or halfway between us?
 
722
 
Speculation: No. An intelligent program will not be chameleon-like, any more than people are. It will rely on the constancy of its memories, and will not be able to flit between personalities. The idea of changing internal parameters to "tune to a new personality" reveals a ridiculous underestimation of the complexity of personality.
 
722
 
Question: Will there be a "heart" to an AI program, or will it simply consist of "senseless loops and sequences of trivial operations" (in the words of Marvin Minsky6)?

Speculation: If we could see all the way to the bottom, as we can a shallow pond, we would surely see only "senseless loops and sequences of trivial operations"-and we would surely not see any "heart". Now there are two kinds of extremist views on AI: one says that the human mind is, for fundamental and mysterious reasons, unprogrammable. The other says that you merely need to assemble the appropriate "heuristic devices-multiple optimizers, pattern-recognition tricks, planning algebras, recursive administration procedures, and the like",7 and you will have intelligence. I find myself somewhere in between, believing that the "pond" of an AI program will turn out to be so deep and murky that we won't be able to peer all the way to the bottom. If we look from the top, the loops will be invisible, just as nowadays the current-carrying electrons are invisible to most programmers. When we create a program that passes the Turing test, we will see a "heart" even though we know it's not there.
 
722
 
Question: Will AI programs ever become "superintelligent"?

Speculation: I don't know. It is not clear that we would be able to understand or relate to a "superintelligence", or that the concept even makes sense. For instance, our own intelligence is tied in with our speed of thought. If our reflexes had been ten times faster or slower, we might have developed an entirely different set of concepts with which to describe the world. A creature with a radically different view of the world may simply not have many points of contact with us. I have often wondered if there could be, for instance, pieces of music which are to Bach as Bach is to folk tunes: "Bach squared", so to speak. And would I be able to understand them? Maybe there is such music around me already, and I just don't recognize it, just as dogs don't understand language. The idea of superintelligence is very strange. In any case, I don't think of it as the aim of AI research, although if we ever do reach the level of human intelligence, superintelligence will undoubtedly be the next goal-not only for us, but for our AI-program colleagues, too, who will be equally curious about AI and superintelligence. It seems quite likely that AI programs will be extremely curious about AI in general-understandably.
 
722
 
Question: You seem to be saying that AI programs will be virtually identical to people, then. Won't there be any differences?
 
723
 
Speculation: Probably the differences between AI programs and people will be larger than the differences between most people. It is almost impossible to imagine that the "body" in which an AI program is housed would not affect it deeply. So unless it had an amazingly faithful replica of a human body-and why should it?-it would probably have enormously different perspectives on what is important, what is interesting, etc. Wittgenstein once made the amusing comment, "If a lion could speak, we would not understand him." It makes me think of Rousseau's painting of the gentle lion and the sleeping gypsy on the moonlit desert. But how does Wittgenstein know? My guess is that any AI program would, if comprehensible to us, seem pretty alien. For that reason, we will have a very hard time deciding when and if we really are dealing with an AI program, or just a "weird" program.
 
723
 
Question: Will we understand what intelligence and consciousness and free will and "I" are when we have made an intelligent program?

Speculation: Sort of-it all depends on what you mean by "understand". On a gut level, each of us probably has about as good an understanding as is possible of those things, to start with. It is like listening to music. Do you really understand Bach because you have taken him apart? Or did you understand it that time you felt the exhilaration in every nerve in your body? Do we understand how the speed of light is constant in every inertial reference frame? We can do the math, but no one in the world has a truly relativistic intuition. And probably no one will ever understand the mysteries of intelligence and consciousness in an intuitive way. Each of us can understand people, and that is probably about as close as you can come.
 
727
 
CHAPTER XX

Strange Loops, Or Tangled Hierarchies
 
727
 
Can Machines Possess Originality?
 
727
 
IN THE CHAPTER before last, I described Arthur Samuel's very successful checkers program-the one which can beat its designer. In light of that, it is interesting to hear how Samuel himself feels about the issue of computers and originality. The following extracts are taken from a rebuttal by Samuel, written in 1960, to an article by Norbert Wiener.

It is my conviction that machines cannot possess originality in the sense implied by Wiener in his thesis that "machines can and do transcend some of the limitations of their designers, and that in doing so they may be both effective and dangerous." ... A machine is not a genie, it does not work by magic, it does not possess a will, and, Wiener to the contrary, nothing comes out which has not been put in, barring, of course, an infrequent case of malfunctioning. ... The "intentions" which the machine seems to manifest are the intentions of the human programmer, as specified in advance, or they are subsidiary intentions derived from these, following rules specified by the programmer. We can even anticipate higher levels of abstraction, just as Wiener does, in which the program will not only modify the subsidiary intentions but will also modify the rules which are used in their derivation, or in which it will modify the ways in which it modifies the rules, and so on, or even in which one machine will design and construct a second machine with enhanced capabilities. However, and this is important, the machine will not and cannot [italics are his] do any of these things until it has been instructed as to how to proceed. There is and logically there must always remain a complete hiatus between (i) any ultimate extension and elaboration in this process of carrying out man's wishes and (ii) the development within the machine of a will of its own. To believe otherwise is either to believe in magic or to believe that the existence of man's will is an illusion and that man's actions are as mechanical as the machine's. Perhaps Wiener's article and my rebuttal have both been mechanically determined, but this I refuse to believe.1
 
727
 
This reminds me of the Lewis Carroll Dialogue (the Two-Part Invention); I'll try to explain why. Samuel bases his argument against machine consciousness (or will) on the notion that any mechanical instantiation of will would require an infinite regress. Similarly, Carroll's Tortoise argues that no step of reasoning, no matter how simple, can be done without invoking some rule on a higher level to justify the step in question. But that being also a step of reasoning, one must resort to a yet higher-level rule, and so on. Conclusion: Reasoning involves an infinite regress.
 
728
 
Of course something is wrong with the Tortoise's argument, and I believe something analogous is wrong with Samuel's argument. To show how the fallacies are analogous, I now shall "help the Devil", by arguing momentarily as Devil's advocate. (Since, as is well known, God helps those who help themselves, presumably the Devil helps all those, and only those, who don't help themselves. Does the Devil help himself?) Here are my devilish conclusions drawn from the Carroll Dialogue:

The conclusion "reasoning is impossible" does not apply to people, because as is plain to anyone, we do manage to carry out many steps of reasoning, all the higher levels notwithstanding. That shows that we humans operate without need of rules: we are "informal systems". On the other hand, as an argument against the possibility of any mechanical instantiation of reasoning, it is valid, for any mechanical reasoning-system would have to depend on rules explicitly, and so it couldn't get off the ground unless it had metarules telling it when to apply its rules, metametarules telling it when to apply its metarules, and so on. We may conclude that the ability to reason can never be mechanized. It is a uniquely human capability.
 
728
 
What is wrong with this Devil's advocate point of view? It is obviously the assumption that a machine cannot do anything without having a rule telling it to do so. In fact, machines get around the Tortoise's silly objections as easily as people do, and moreover for exactly the same reason: both machines and people are made of hardware which runs all by itself, according to the laws of physics. There is no need to rely on "rules that permit you to apply the rules", because the lowest-level rules-those without any "meta"'s in front-are embedded in the hardware, and they run without permission. Moral: The Carroll Dialogue doesn't say anything about the differences between people and machines, after all. (And indeed, reasoning is mechanizable.)
 
728
 
So much for the Carroll Dialogue. On to Samuel's argument. Samuel's point, if I may caricature it, is this:

No computer ever "wants" to do anything, because it was programmed by someone else. Only if it could program itself from zero on up-an absurdity-would it have its own sense of desire.

In his argument, Samuel reconstructs the Tortoise's position, replacing "to reason" by "to want". He implies that behind any mechanization of desire, there has to be either an infinite regress or worse, a closed loop. If this is why computers have no will of their own, what about people? The same criterion would imply that
 
729
 
Unless a person designed himself and chose his own wants (as well as choosing to choose his own wants, etc.), he cannot be said to have a will of his own.
 
729
 
It makes you pause to think where your sense of having a will comes from. Unless you are a soulist, you'll probably say that it comes from your brain-a piece of hardware which you did not design or choose. And yet that doesn't diminish your sense that you want certain things, and not others. You aren't a "self-programmed object" (whatever that would be), but you still do have a sense of desires, and it springs from the physical substrate of your mentality. Likewise, machines may someday have wills despite the fact that no magic program spontaneously appears in memory from out of nowhere (a "self-programmed program"). They will have wills for much the same reason as you do-by reason of organization and structure on many levels of hardware and software. Moral: The Samuel argument doesn't say anything about the differences between people and machines, after all. (And indeed, will will be mechanized.)
 
729
 
Below Every Tangled Hierarchy Lies An Inviolate Level
 
729
 
Right after the Two-Part Invention, I wrote that a central issue of this book would be: "Do words and thoughts follow formal rules?" One major thrust of the book has been to point out the many-leveledness of the mind/brain, and I have tried to show why the ultimate answer to the question is, "Yes-provided that you go down to the lowest level-the hardware-to find the rules."
 
729
 
Now Samuel's statement brought up a concept which I want to pursue. It is this: When we humans think, we certainly do change our own mental rules, and we change the rules that change the rules, and on and on-but these are, so to speak, "software rules". However, the rules at bottom do not change. Neurons run in the same simple way the whole time. You can't "think" your neurons into running some nonneural way, although you can make your mind change style or subject of thought. Like Achilles in the Prelude, Ant Fugue, you have access to your thoughts but not to your neurons. Software rules on various levels can change; hardware rules cannot-in fact, to their rigidity is due the software's flexibility! Not a paradox at all, but a fundamental, simple fact about the mechanisms of intelligence.
 
729
 
This distinction between self-modifiable software and invi  
olate hardware is what I wish to pursue in this final Chapter, developing it into a  
set of variations on a theme. Some of the variations may seem to be q  
uite far-fetched, but I hope that by the time I close the loop by returning t  
o brains, minds, and the sensation of consciousness, you will have found a  
n invariant core in all the vari
 
My main aim in this Chapter is to communicate some of the images which help me to visualize how consciousness rises out of the jungle of neurons; to communicate a set of intangible intuitions, in the hope that these intuitions are valuable and may perhaps help others a little to come to clearer formulations of their own images of what makes minds run. I could not hope for more than that my own mind's blurry images of minds and images should catalyze the formation of sharper images of minds and images in other minds.
 
A Self-Modifying Game
 
A first variation, then, concerns games in which on your turn, you may modify the rules. Think of chess. Clearly the rules stay the same, just the board position changes on each move. But let's invent a variation in which, on your turn, you can either make a move or change the rules. But how? At liberty? Can you turn it into checkers? Clearly such anarchy would be pointless. There must be some constraints. For instance, one version might allow you to redefine the knight's move. Instead of being 1-and-then-2, it could be m-and-then-n where m and n are arbitrary natural numbers; and on your turn you could change either m or n by plus or minus 1. So it could go from 1-2 to 1-3 to 0-3 to 0-4 to 0-5 to 1-5 to 2-5 ... Then there could be rules about redefining the bishop's moves, and the other pieces' moves as well. There could be rules about adding new squares, or deleting old squares ...
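The mutable-knight version can be made concrete in a few lines. The sketch below is my own illustration in Python (the function names are invented, not from the book): an ordinary move uses the current (m, n) rule, while a "meta-move" nudges m or n by one, exactly as in the 1-2 to 1-3 to 0-3 sequence above.

```python
# A minimal sketch (not from the book) of the mutable-knight variant:
# the knight's move is (m, n), and a "rule move" changes m or n by 1.

def knight_targets(pos, m, n):
    """Squares an (m, n)-knight at pos can reach on an 8x8 board."""
    r, c = pos
    deltas = {(dr * m, dc * n) for dr in (-1, 1) for dc in (-1, 1)} | \
             {(dr * n, dc * m) for dr in (-1, 1) for dc in (-1, 1)}
    return sorted((r + dr, c + dc) for dr, dc in deltas
                  if 0 <= r + dr < 8 and 0 <= c + dc < 8)

def change_rule(m, n, which, delta):
    """A 'meta-move': change m or n by +1 or -1 (never below 0)."""
    assert delta in (-1, 1)
    if which == "m":
        m = max(0, m + delta)
    else:
        n = max(0, n + delta)
    return m, n

# The sequence 1-2 -> 1-3 -> 0-3 from the text:
m, n = 1, 2
m, n = change_rule(m, n, "n", +1)   # now a 1-3 knight
m, n = change_rule(m, n, "m", -1)   # now a 0-3 knight
print(m, n)                          # 0 3
print(knight_targets((4, 4), m, n))
```

Note that the degenerate 0-3 knight moves like a three-square rook, which is what makes the variant so disorienting: a piece's identity drifts as the rules drift.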
 
Now we have two layers of rules: those which tell how to move pieces, and those which tell how to change the rules. So we have rules and metarules. The next step is obvious: introduce metametarules by which we can change the metarules. It is not so obvious how to do this. The reason it is easy to formulate rules for moving pieces is that pieces move in a formalized space: the checkerboard. If you can devise a simple formal notation for expressing rules and metarules, then to manipulate them will be like manipulating strings formally, or even like manipulating chess pieces. To carry things to their logical extreme, you could even express rules and metarules as positions on auxiliary chess boards. Then an arbitrary chess position could be read as a game, or as a set of rules, or as a set of metarules, etc., depending on which interpretation you place on it. Of course, both players would have to agree on conventions for interpreting the notation.
 
Now we can have any number of adjacent chess boards: one for the game, one for rules, one for metarules, one for metametarules, and so on, as far as you care to carry it. On your turn, you may make a move on any one of the chess boards except the top-level one, using the rules which apply (they come from the next chess board up in the hierarchy). Undoubtedly both players would get quite disoriented by the fact that almost anything-though not everything!-can change. By definition, the top-level chess board can't be changed, because you don't have rules telling how to change it. It is inviolate. There is more that is inviolate: the conventions by which the different boards are interpreted, the agreement to take turns, the agreement that each person may change one chess board each turn-and you will find more if you examine the idea closely.
 
Now it is possible to go considerably further in removing the pillars by which orientation is achieved. One step at a time ... We begin by collapsing the whole array of boards into a single board. What is meant by this? There will be two ways of interpreting the board: (1) as pieces to be moved; (2) as rules for moving the pieces. On your turn, you move pieces-and perforce, you change rules! Thus, the rules constantly change themselves. Shades of Typogenetics-or for that matter, of real genetics. The distinction between game, rules, metarules, metametarules, has been lost. What was once a nice clean hierarchical setup has become a Strange Loop, or Tangled Hierarchy. The moves change the rules, the rules determine the moves, round and round the mulberry bush ... There are still different levels, but the distinction between "lower" and "higher" has been wiped out.
 
Now, part of what was inviolate has been made changeable. But there is still plenty that is inviolate. Just as before, there are conventions between you and your opponent by which you interpret the board as a collection of rules. There is the agreement to take turns-and probably other implicit conventions, as well. Notice, therefore, that the notion of different levels has survived, in an unexpected way. There is an Inviolate level-let's call it the I-level-on which the interpretation conventions reside; there is also a Tangled level-the T-level-on which the Tangled Hierarchy resides. So these two levels are still hierarchical: the I-level governs what happens on the T-level, but the T-level does not and cannot affect the I-level. No matter that the T-level itself is a Tangled Hierarchy-it is still governed by a set of conventions outside of itself. And that is the important point.
 
As you have no doubt imagined, there is nothing to stop us from doing the "impossible"-namely, tangling the I-level and the T-level by making the interpretation conventions themselves subject to revision, according to the position on the chess board. But in order to carry out such a "supertangling", you'd have to agree with your opponent on some further conventions connecting the two levels-and the act of doing so would create a new level, a new sort of inviolate level on top of the "supertangled" level (or underneath it, if you prefer). And this could continue going on and on. In fact, the "jumps" which are being made are very similar to those charted in the Birthday Cantatatata, and in the repeated Gödelization applied to various improvements on TNT. Each time you think you have reached the end, there is some new variation on the theme of jumping out of the system which requires a kind of creativity to spot.
 
The Authorship Triangle Again
 
But I am not interested in pursuing the strange topic of the ever more abstruse tanglings which can arise in self-modifying chess. The point of this has been to show, in a somewhat graphic way, how in any system there is always some "protected" level which is unassailable by the rules on other levels, no matter how tangled their interaction may be among themselves. An amusing riddle from Chapter IV illustrates this same idea in a slightly different context. Perhaps it will catch you off guard:
 
 
FIGURE 134. An "authorship triangle"
 
There are three authors-Z, T, and E. Now it happens that Z exists only in a novel by T. Likewise, T exists only in a novel by E. And strangely, E, too, exists only in a novel-by Z, of course. Now, is such an "authorship triangle" really possible? (See Fig. 134.)
 
Of course it's possible. But there's a trick ... All three authors-Z, T, E-are themselves characters in another novel-by H! You can think of the Z-T-E triangle as a Strange Loop, or Tangled Hierarchy; but author H is outside of the space in which that tangle takes place-author H is in an inviolate space. Although Z, T, and E all have access-direct or indirect-to each other, and can do dastardly things to each other in their various novels, none of them can touch H's life! They can't even imagine him-no more than you can imagine the author of the book you're a character in. If I were to draw author H, I would represent him somewhere off the page. Of course that would present a problem, since drawing a thing necessarily puts it onto the page ... Anyway, H is really outside of the world of Z, T, and E, and should be represented as being so.
 
Escher's Drawing Hands
 
Another classic variation on our theme is the Escher picture of Drawing Hands (Fig. 135). Here, a left hand (LH) draws a right hand (RH), while at the same time, RH draws LH. Once again, levels which ordinarily are seen as hierarchical-that which draws, and that which is drawn-turn back on each other, creating a Tangled Hierarchy. But the theme of the Chapter is borne out, of course, since behind it all lurks the undrawn but drawing hand of M. C. Escher, creator of both LH and RH. Escher is outside of the two-hand space, and in my schematic version of his picture (Fig. 136), you can see that explicitly. In this schematized representation of the Escher picture, you see the Strange Loop, or Tangled Hierarchy at the top; also, you see the Inviolate Level below it, enabling it to come into being. One could further Escherize the Escher picture, by taking a photograph of a hand drawing it. And so on.
 
FIGURE 135. Drawing Hands, by M. C. Escher (lithograph, 1948).
 
FIGURE 136. Abstract diagram of M. C. Escher's Drawing Hands. On top, a seeming paradox. Below, its resolution.
 
Brain and Mind: A Neural Tangle Supporting a Symbol Tangle
 
Now we can relate this to the brain, as well as to AI programs. In our thoughts, symbols activate other symbols, and all interact heterarchically. Furthermore, the symbols may cause each other to change internally, in the fashion of programs acting on other programs. The illusion is created, because of the Tangled Hierarchy of symbols, that there is no inviolate level. One thinks there is no such level because that level is shielded from our view.
 
If it were possible to schematize this whole image, there would be a gigantic forest of symbols linked to each other by tangly lines like vines in a tropical jungle-this would be the top level, the Tangled Hierarchy where thoughts really flow back and forth. This is the elusive level of mind: the analogue to LH and RH. Far below in the schematic picture, analogous to the invisible "prime mover" Escher, there would be a representation of the myriad neurons-the "inviolate substrate" which lets the tangle above it come into being. Interestingly, this other level is itself a tangle in a literal sense-billions of cells and hundreds of billions of axons, joining them all together.
 
This is an interesting case where a software tangle, that of the symbols, is supported by a hardware tangle, that of the neurons. But only the symbol tangle is a Tangled Hierarchy. The neural tangle is just a "simple" tangle. This distinction is pretty much the same as that between Strange Loops and feedback, which I mentioned in Chapter XVI. A Tangled Hierarchy occurs when what you presume are clean hierarchical levels take you by surprise and fold back in a hierarchy-violating way. The surprise element is important; it is the reason I call Strange Loops "strange". A simple tangle, like feedback, doesn't involve violations of presumed level distinctions. An example is when you're in the shower and you wash your left arm with your right, and then vice versa. There is no strangeness to the image. Escher didn't choose to draw hands drawing hands for nothing!
 
Events such as two arms washing each other happen all the time in the world, and we don't notice them particularly. I say something to you, then you say something back to me. Paradox? No; our perceptions of each other didn't involve a hierarchy to begin with, so there is no sense of strangeness.
 
On the other hand, where language does create strange loops is when it talks about itself, whether directly or indirectly. Here, something in the system jumps out and acts on the system, as if it were outside the system. What bothers us is perhaps an ill-defined sense of topological wrongness: the inside-outside distinction is being blurred, as in the famous shape called a "Klein bottle". Even though the system is an abstraction, our minds use spatial imagery with a sort of mental topology.
 
Getting back to the symbol tangle, if we look only at it, and forget the neural tangle, then we seem to see a self-programmed object-in just the same way as we seem to see a self-drawn picture if we look at Drawing Hands and somehow fall for the illusion, by forgetting the existence of Escher. For the picture, this is unlikely-but for humans and the way they look at their minds, this is usually what happens. We feel self-programmed. Indeed, we couldn't feel any other way, for we are shielded from the lower levels, the neural tangle. Our thoughts seem to run about in their own space, creating new thoughts and modifying old ones, and we never notice any neurons helping us out! But that is to be expected. We can't.
 
An analogous double-entendre can happen with LISP programs that are designed to reach in and change their own structure. If you look at them on the LISP level, you will say that they change themselves; but if you shift levels, and think of LISP programs as data to the LISP interpreter (see Chapter X), then in fact the sole program that is running is the interpreter, and the changes being made are merely changes in pieces of data. The LISP interpreter itself is shielded from changes.
 
How you describe a tangled situation of this sort depends how far back you step before describing. If you step far enough back, you can often see the clue that allows you to untangle things.
 
Strange Loops in Government
 
A fascinating area where hierarchies tangle is government-particularly in the courts. Ordinarily, you think of two disputants arguing their cases in court, and the court adjudicating the matter. The court is on a different level from the disputants. But strange things can start to happen when the courts themselves get entangled in legal cases. Usually there is a higher court which is outside the dispute. Even if two lower courts get involved in some sort of strange fight, with each one claiming jurisdiction over the other, some higher court is outside, and in some sense it is analogous to the inviolate interpretation conventions which we discussed in the warped version of chess.
 
But what happens when there is no higher court, and the Supreme Court itself gets all tangled up in legal troubles? This sort of snarl nearly happened in the Watergate era. The President threatened to obey only a "definitive ruling" of the Supreme Court-then claimed he had the right to decide what is "definitive". Now that threat never was made good; but if it had been, it would have touched off a monumental confrontation between two levels of government, each of which, in some ways, can validly claim to be "above" the other-and to whom is there recourse to decide which one is right? To say "Congress" is not to settle the matter, for Congress might command the President to obey the Supreme Court, yet the President might still refuse, claiming that he has the legal right to disobey the Supreme Court (and Congress!) under certain circumstances. This would create a new court case, and would throw the whole system into disarray, because it would be so unexpected, so Tangled-so Strange!
 
The irony is that once you hit your head against the ceiling like this, where you are prevented from jumping out of the system to a yet higher authority, the only recourse is to forces which seem less well defined by rules, but which are the only source of higher-level rules anyway: the lower-level rules, which in this case means the general reaction of society. It is well to remember that in a society like ours, the legal system is, in a sense, a polite gesture granted collectively by millions of people-and it can be overridden just as easily as a river can overflow its banks. Then a seeming anarchy takes over; but anarchy has its own kinds of rules, no less than does civilized society: it is just that they operate from the bottom up, not from the top down. A student of anarchy could try to discover rules according to which anarchic situations develop in time, and very likely there are some such rules.
 
An analogy from physics is useful here. As was mentioned earlier in the book, gases in equilibrium obey simple laws connecting their temperature, pressure, and volume. However, a gas can violate those laws (as a President can violate laws)-provided it is not in a state of equilibrium. In nonequilibrium situations, to describe what happens, a physicist has recourse only to statistical mechanics-that is, to a level of description which is not macroscopic, for the ultimate explanation of a gas's behavior always lies on the molecular level, just as the ultimate explanation of a society's political behavior always lies at the "grass roots level". The field of nonequilibrium thermodynamics attempts to find macroscopic laws to describe the behavior of gases (and other systems) which are out of equilibrium. It is the analogue to the branch of political science which would search for laws governing anarchical societies.
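For reference, the "simple laws" in question are, for an ideal gas, summarized by the familiar equation of state (a standard physics fact, supplied here for concreteness rather than stated in the text):

```latex
% Ideal-gas equation of state (holds only at equilibrium):
PV = nRT
```

Here P is pressure, V volume, T absolute temperature, n the amount of gas, and R the gas constant. The point of the analogy is precisely that the relation holds only at equilibrium; away from it, one must descend to the molecular level.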
 
Other curious tangles which arise in government include the FBI investigating its own wrongdoings, a sheriff going to jail while in office, the self-application of the parliamentary rules of procedure, and so on. One of the most curious legal cases I ever heard of involved a person who claimed to have psychic powers. In fact, he claimed to be able to use his psychic powers to detect personality traits, and thereby to aid lawyers in picking juries. Now what if this "psychic" has to stand trial himself one day? What effect might this have on a jury member who believes staunchly in ESP? How much will he feel affected by the psychic (whether or not the psychic is genuine)? The territory is ripe for exploitation-a great area for self-fulfilling prophecies.
 
Tangles Involving Science and the Occult
 
Speaking of psychics and ESP, another sphere of life where strange loops abound is fringe science. What fringe science does is to call into question many of the standard procedures or beliefs of orthodox science, and thereby challenge the objectivity of science. New ways of interpreting evidence that rival the established ones are presented. But how do you evaluate a way of interpreting evidence? Isn't this precisely the problem of objectivity all over again, just on a higher plane? Of course. Lewis Carroll's infinite-regress paradox appears in a new guise. The Tortoise would argue that if you want to show that A is a fact, you need evidence: B. But what makes you sure that B is evidence of A? To show that, you need meta-evidence: C. And for the validity of that meta-evidence, you need meta-meta-evidence-and so on, ad nauseam. Despite this argument, people have an intuitive sense of evidence. This is because-to repeat an old refrain-people have built-in hardware in their brains that includes some rudimentary ways of interpreting evidence. We can build on this, and accumulate new ways of interpreting evidence; we even learn how and when to override our most basic mechanisms of evidence interpretation, as one must, for example, in trying to figure out magic tricks.
 
Concrete examples of evidence dilemmas crop up in regard to many phenomena of fringe science. For instance, ESP often seems to manifest itself outside of the laboratory, but when brought into the laboratory, it vanishes mysteriously. The standard scientific explanation for this is that ESP is a nonreal phenomenon which cannot stand up to rigorous scrutiny. Some (by no means all) believers in ESP have a peculiar way of fighting back, however. They say, "No, ESP is real; it simply goes away when one tries to observe it scientifically-it is contrary to the nature of a scientific worldview." This is an amazingly brazen technique, which we might call "kicking the problem upstairs". What that means is, instead of questioning the matter at hand, you call into doubt theories belonging to a higher level of credibility. The believers in ESP insinuate that what is wrong is not their ideas, but the belief system of science. This is a pretty grandiose claim, and unless there is overwhelming evidence for it, one should be skeptical of it. But then here we are again, talking about "overwhelming evidence" as if everyone agreed on what that means.
 
The Nature of Evidence
 
The Sagredo-Simplicio-Salviati tangle, mentioned in Chapters XIII and XV, gives another example of the complexities of evaluation of evidence. Sagredo tries to find some objective compromise, if possible, between the opposing views of Simplicio and Salviati. But compromise may not always be possible. How can one compromise "fairly" between right and wrong? Between fair and unfair? Between compromise and no compromise? These questions come up over and over again in disguised form in arguments about ordinary things.
 
Is it possible to define what evidence is? Is it possible to lay down laws as to how to make sense out of situations? Probably not, for any rigid rules would undoubtedly have exceptions, and nonrigid rules are not rules. Having an intelligent AI program would not solve the problem either, for as an evidence processor, it would not be any less fallible than humans are. So, if evidence is such an intangible thing after all, why am I warning against new ways of interpreting evidence? Am I being inconsistent? In this case, I don't think so. My feeling is that there are guidelines which one can give, and out of them an organic synthesis can be made. But inevitably some amount of judgment and intuition must enter the picture-things which are different in different people. They will also be different in different AI programs. Ultimately, there are complicated criteria for deciding if a method of evaluation of evidence is good. One involves the "usefulness" of ideas which are arrived at by that kind of reasoning. Modes of thought which lead to useful new things in life are deemed "valid" in some sense. But this word "useful" is extremely subjective.
 
My feeling is that the process by which we decide what is valid or what is true is an art; and that it relies as deeply on a sense of beauty and simplicity as it does on rock-solid principles of logic or reasoning or anything else which can be objectively formalized. I am not saying either (1) truth is a chimera, or (2) human intelligence is in principle not programmable. I am saying (1) truth is too elusive for any human or any collection of humans ever to attain fully; and (2) Artificial Intelligence, when it reaches the level of human intelligence-or even if it surpasses it-will still be plagued by the problems of art, beauty, and simplicity, and will run up against these things constantly in its own search for knowledge and understanding.
 
"What is evidence?" is not just a philosophical question, for it intrudes into life in all sorts of places. You are faced with an extraordinary number of choices as to how to interpret evidence at every moment. You can hardly go into a bookstore (or these days, even a grocery store!) without seeing books on clairvoyance, ESP, UFO's, the Bermuda triangle, astrology, dowsing, evolution versus creation, black holes, psi fields, biofeedback, transcendental meditation, new theories of psychology ... In science, there are fierce debates about catastrophe theory, elementary particle theory, black holes, truth and existence in mathematics, free will, Artificial Intelligence, reductionism versus holism ... On the more pragmatic side of life, there are debates over the efficacy of vitamin C or of laetrile, over the real size of oil reserves (either underground or stored), over what causes inflation and unemployment-and on and on. There is Buckminster Fullerism, Zen Buddhism, Zeno's paradoxes, psychoanalysis, etc., etc. From issues as trivial as where books ought to be shelved in a store, to issues as vital as what ideas are to be taught to children in schools, ways of interpreting evidence play an inestimable role.
 
Seeing Oneself
 
One of the most severe of all problems of evidence interpretation is that of trying to interpret all the confusing signals from the outside as to who one is. In this case, the potential for intralevel and interlevel conflict is tremendous. The psychic mechanisms have to deal simultaneously with the individual's internal need for self-esteem and the constant flow of evidence from the outside affecting the self-image. The result is that information flows in a complex swirl between different levels of the personality; as it goes round and round, parts of it get magnified, reduced, negated, or otherwise distorted, and then those parts in turn get further subjected to the same sort of swirl, over and over again-all of this in an attempt to reconcile what is, with what we wish were (see Fig. 81).
 
The upshot is that the total picture of "who I am" is integrated in some enormously complex way inside the entire mental structure, and contains in each one of us a large number of unresolved, possibly unresolvable, inconsistencies. These undoubtedly provide much of the dynamic tension which is so much a part of being human. Out of this tension between the inside and outside notions of who we are come the drives towards various goals that make each of us unique. Thus, ironically, something which we all have in common-the fact of being self-reflecting conscious beings-leads to the rich diversity in the ways we have of internalizing evidence about all sorts of things, and in the end winds up being one of the major forces in creating distinct individuals.
 
Gödel's Theorem and Other Disciplines
 
It is natural to try to draw parallels between people and sufficiently complicated formal systems which, like people, have "self-images" of a sort. Gödel's Theorem shows that there are fundamental limitations to consistent formal systems with self-images. But is it more general? Is there a "Gödel's Theorem of psychology", for instance?
 
If one uses Gödel's Theorem as a metaphor, as a source of inspiration, rather than trying to translate it literally into the language of psychology or of any other discipline, then perhaps it can suggest new truths in psychology or other areas. But it is quite unjustifiable to translate it directly into a statement of another discipline and take that as equally valid. It would be a large mistake to think that what has been worked out with the utmost delicacy in mathematical logic should hold without modification in a completely different area.
 
Introspection and Insanity: A Gödelian Problem
 
I think it can have suggestive value to translate Gödel's Theorem into other domains, provided one specifies in advance that the translations are metaphorical and are not intended to be taken literally. That having been said, I see two major ways of using analogies to connect Gödel's Theorem and human thoughts. One involves the problem of wondering about one's sanity. How can you figure out if you are sane? This is a Strange Loop indeed. Once you begin to question your own sanity, you can get trapped in an ever-tighter vortex of self-fulfilling prophecies, though the process is by no means inevitable. Everyone knows that the insane interpret the world via their own peculiarly consistent logic; how can you tell if your own logic is "peculiar" or not, given that you have only your own logic to judge itself? I don't see any answer. I am just reminded of Gödel's second Theorem, which implies that the only versions of formal number theory which assert their own consistency are inconsistent.
 
Can We Understand Our Own Minds or Brains?
 
The other metaphorical analogue to Gödel's Theorem which I find provocative suggests that ultimately, we cannot understand our own minds/brains. This is such a loaded, many-leveled idea that one must be extremely cautious in proposing it. What does "understanding our own minds/brains" mean? It could mean having a general sense of how they work, as mechanics have a sense of how cars work. It could mean having a complete explanation for why people do any and all things they do. It could mean having a complete understanding of the physical structure of one's own brain on all levels. It could mean having a complete wiring diagram of a brain in a book (or library or computer). It could mean knowing, at every instant, precisely what is happening in one's own brain on the neural level-each firing, each synaptic alteration, and so on. It could mean having written a program which passes the Turing test. It could mean knowing oneself so perfectly that such notions as the subconscious and the intuition make no sense, because everything is out in the open. It could mean any number of other things.
 
Which of these types of self-mirroring, if any, does the self-mirroring in Gödel's Theorem most resemble? I would hesitate to say. Some of them are quite silly. For instance, the idea of being able to monitor your own brain state in all its detail is a pipe dream, an absurd and uninteresting proposition to start with; and if Gödel's Theorem suggests that it is impossible, that is hardly a revelation. On the other hand, the age-old goal of knowing yourself in some profound way-let us call it "understanding your own psychic structure"-has a ring of plausibility to it. But might there not be some vaguely Gödelian loop which limits the depth to which any individual can penetrate into his own psyche? Just as we cannot see our faces with our own eyes, is it not reasonable to expect that we cannot mirror our complete mental structures in the symbols which carry them out?
 
All the limitative Theorems of metamathematics and the theory of computation suggest that once the ability to represent your own structure has reached a certain critical point, that is the kiss of death: it guarantees that you can never represent yourself totally. Gödel's Incompleteness Theorem, Church's Undecidability Theorem, Turing's Halting Theorem, Tarski's Truth Theorem-all have the flavor of some ancient fairy tale which warns you that "To seek self-knowledge is to embark on a journey which ... will always be incomplete, cannot be charted on any map, will never halt, cannot be described."
 
But do the limitative Theorems have any bearing on people? Here is one way of arguing the case. Either I am consistent or I am inconsistent. (The latter is much more likely, but for completeness' sake, I consider both possibilities.) If I am consistent, then there are two cases. (1) The "low-fidelity" case: my self-understanding is below a certain critical point. In this case, I am incomplete by hypothesis. (2) The "high-fidelity" case: my self-understanding has reached the critical point where a metaphorical analogue of the limitative Theorems does apply, so my self-understanding undermines itself in a Gödelian way, and I am incomplete for that reason. Cases (1) and (2) are predicated on my being 100 per cent consistent-a very unlikely state of affairs. More likely is that I am inconsistent-but that's worse, for then inside me there are contradictions, and how can I ever understand that?
 
741
 
Consistent or inconsistent, no one is exempt from the mystery of the self. Probably we are all inconsistent. The world is just too complicated for a person to be able to afford the luxury of reconciling all of his beliefs with each other. Tension and confusion are important in a world where many decisions must be made quickly. Miguel de Unamuno once said, "If a person never contradicts himself, it must be that he says nothing." I would say that we all are in the same boat as the Zen master who, after contradicting himself several times in a row, said to the confused Doko, "I cannot understand myself."
 
Godel's Theorem and Personal Nonexistence
 
Perhaps the greatest contradiction in our lives, the hardest to handle, is the knowledge "There was a time when I was not alive, and there will come a time when I am not alive." On one level, when you "step out of yourself" and see yourself as "just another human being", it makes complete sense. But on another level, perhaps a deeper level, personal nonexistence makes no sense at all. All that we know is embedded inside our minds, and for all that to be absent from the universe is not comprehensible. This is a basic undeniable problem of life; perhaps it is the best metaphorical analogue of Godel's Theorem. When you try to imagine your own nonexistence, you have to try to jump out of yourself, by mapping yourself onto someone else. You fool yourself into believing that you can import an outsider's view of yourself into you, much as TNT "believes" it mirrors its own metatheory inside itself. But TNT only contains its own metatheory up to a certain extent-not fully. And as for you, though you may imagine that you have jumped out of yourself, you never can actually do so-no more than Escher's dragon can jump out of its native two-dimensional plane into three dimensions. In any case, this contradiction is so great that most of our lives we just sweep the whole mess under the rug, because trying to deal with it just leads nowhere.
 
Zen minds, on the other hand, revel in this irreconcilability. Over and over again, they face the conflict between the Eastern belief: "The world and I are one, so the notion of my ceasing to exist is a contradiction in terms" (my verbalization is undoubtedly too Westernized-apologies to Zenists), and the Western belief: "I am just part of the world, and I will die, but the world will go on without me."
 
Science and Dualism
 
Science is often criticized as being too "Western" or "dualistic"-that is, being permeated by the dichotomy between subject and object, or observer and observed. While it is true that up until this century, science was exclusively concerned with things which can be readily distinguished from their human observers-such as oxygen and carbon, light and heat, stars and planets, accelerations and orbits, and so on-this phase of science was a necessary prelude to the more modern phase, in which life itself has come under investigation. Step by step, inexorably, "Western" science has moved towards investigation of the human mind-which is to say, of the observer. Artificial Intelligence research is the furthest step so far along that route. Before AI came along, there were two major previews of the strange consequences of the mixing of subject and object in science. One was the revolution of quantum mechanics, with its epistemological problems involving the interference of the observer with the observed. The other was the mixing of subject and object in metamathematics, beginning with Godel's Theorem and moving through all the other limitative Theorems we have discussed. Perhaps the next step after AI will be the self-application of science: science studying itself as an object. This is a different manner of mixing subject and object-perhaps an even more tangled one than that of humans studying their own minds.
 
By the way, in passing, it is interesting to note that all results essentially dependent on the fusion of subject and object have been limitative results. In addition to the limitative Theorems, there is Heisenberg's uncertainty principle, which says that measuring one quantity renders impossible the simultaneous measurement of a related quantity. I don't know why all these results are limitative. Make of it what you will.
 
Symbol vs. Object in Modern Music and Art
 
Closely linked with the subject-object dichotomy is the symbol-object dichotomy, which was explored in depth by Ludwig Wittgenstein in the early part of this century. Later the words "use" and "mention" were adopted to make the same distinction. Quine and others have written at length about the connection between signs and what they stand for. But not only philosophers have devoted much thought to this deep and abstract matter. In our century both music and art have gone through crises which reflect a profound concern with this problem. Whereas music and painting, for instance, have traditionally expressed ideas or emotions through a vocabulary of "symbols" (i.e. visual images, chords, rhythms, or whatever), now there is a tendency to explore the capacity of music and art to not express anything-just to be. This means to exist as pure globs of paint, or pure sounds, but in either case drained of all symbolic value.
 
In music, in particular, John Cage has been very influential in bringing a Zen-like approach to sound. Many of his pieces convey a disdain for "use" of sounds-that is, using sounds to convey emotional states-and an exultation in "mentioning" sounds-that is, concocting arbitrary juxtapositions of sounds without regard to any previously formulated code by which a listener could decode them into a message. A typical example is "Imaginary Landscape no. 4", the polyradio piece described in Chapter VI. I may not be doing Cage justice, but to me it seems that much of his work has been directed at bringing meaninglessness into music, and in some sense, at making that meaninglessness have meaning. Aleatoric music is a typical exploration in that direction. (Incidentally, chance music is a close cousin to the much later notion of "happenings" or "be-ins".) There are many other contemporary composers who are following Cage's lead, but few with as much originality. A piece by Anna Lockwood, called "Piano Burning", involves just that-with the strings stretched to maximum tightness, to make them snap as loudly as possible; in a piece by LaMonte Young, the noises are provided by shoving the piano all around the stage and through obstacles, like a battering ram.
 
Art in this century has gone through many convulsions of this general type. At first there was the abandonment of representation, which was genuinely revolutionary: the beginnings of abstract art. A gradual swoop from pure representation to the most highly abstract patterns is revealed in the work of Piet Mondrian. After the world was used to nonrepresentational art, then surrealism came along. It was a bizarre about-face, something like neoclassicism in music, in which extremely representational art was "subverted" and used for altogether new reasons: to shock, confuse, and amaze. This school was founded by Andre Breton, and was located primarily in France; some of its more influential members were Dali, Magritte, de Chirico, Tanguy.
 
Magritte's Semantic Illusions
 
Of all these artists, Magritte was the most conscious of the symbol-object mystery (which I see as a deep extension of the use-mention distinction). He uses it to evoke powerful responses in viewers, even if the viewers do not verbalize the distinction this way. For example, consider his very strange variation on the theme of still life, called Common Sense (Fig. 137).
 
FIGURE 137. Common Sense, by Rene Magritte (194
 
 
FIGURE 138. The Two Mysteries, by Rene Magritte (19
 
 
Here, a dish filled with fruit, ordinarily the kind of thing represented inside a still life, is shown sitting on top of a blank canvas. The conflict between the symbol and the real is great. But that is not the full irony, for of course the whole thing is itself just a painting-in fact, a still life with nonstandard subject matter.
 
Magritte's series of pipe paintings is fascinating and perplexing. Consider The Two Mysteries (Fig. 138). Focusing on the inner painting, you get the message that symbols and pipes are different. Then your glance moves upward to the "real" pipe floating in the air-you perceive that it is real, while the other one is just a symbol. But that is of course totally wrong: both of them are on the same flat surface before your eyes. The idea that one pipe is in a twice-nested painting, and therefore somehow "less real" than the other pipe, is a complete fallacy. Once you are willing to "enter the room", you have already been tricked: you've fallen for image as reality. To be consistent in your gullibility, you should happily go one level further down, and confuse image-within-image with reality. The only way not to be sucked in is to see both pipes merely as colored smudges on a surface a few inches in front of your nose. Then, and only then, do you appreciate the full meaning of the written message "Ceci n'est pas une pipe"-but ironically, at the very instant everything turns to smudges, the writing too turns to smudges, thereby losing its meaning! In other words, at that instant, the verbal message of the painting self-destructs in a most Godelian way.
 
FIGURE 139. Smoke Signal. [Drawing by the author.]
 
The Air and the Song (Fig. 82), taken from a series by Magritte, accomplishes all that The Two Mysteries does, but in one level instead of two. My drawings Smoke Signal and Pipe Dream (Figs. 139 and 140) constitute "Variations on a Theme of Magritte". Try staring at Smoke Signal for a while. Before long, you should be able to make out a hidden message saying, "Ceci n'est pas un message". Thus, if you find the message, it denies itself-yet if you don't, you miss the point entirely. Because of their indirect self-snuffing, my two pipe pictures can be loosely mapped onto Godel's G-thus giving rise to a "Central Pipemap", in the same spirit as the other "Central Xmaps": Dog, Crab, Sloth.
 
A classic example of use-mention confusion in paintings is the occurrence of a palette in a painting. Whereas the palette is an illusion created by the representational skill of the painter, the paints on the painted palette are literal daubs of paint from the artist's palette. The paint plays itself-it does not symbolize anything else. In Don Giovanni, Mozart exploited a related trick: he wrote into the score explicitly the sound of an orchestra tuning up. Similarly, if I want the letter 'I' to play itself (and not symbolize me), I put 'I' directly into my text; then I enclose 'I' between quotes. What results is "I" (not 'I', nor '"I"'). Got that?
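The quoting mechanics just described have a direct analogue in programming languages, where a bare identifier is used (it stands for its value) while a quoted string merely mentions a name. Here is a minimal sketch in Python; the variable name `pipe` and its value are my own hypothetical example, not from the text:

```python
# Use vs. mention: a bare identifier denotes its value (use),
# while a quoted string is just a sequence of characters (mention).
pipe = "an object that holds tobacco"

used = pipe          # use: 'pipe' stands for the thing it names
mentioned = "pipe"   # mention: the four-character name itself

assert used == "an object that holds tobacco"
assert mentioned == "pipe"
assert len(mentioned) == 4   # a property of the name, not of the object
```

The quotation marks play exactly the role of the quotes around 'I' above: they turn a symbol that would otherwise be used into an object that is merely exhibited.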
 
FIGURE 140. Pipe Dream. [Drawing by the author.]
 
The "Code" of Modern Art
 
A large number of influences, which no one could hope to pin down completely, led to further explorations of the symbol-object dualism in art. There is no doubt that John Cage, with his interest in Zen, had a profound influence on art as well as on music. His friends Jasper Johns and Robert Rauschenberg both explored the distinction between objects and symbols by using objects as symbols for themselves-or, to flip the coin, by using symbols as objects in themselves. All of this was perhaps intended to break down the notion that art is one step removed from reality-that art speaks in "code", for which the viewer must act as interpreter. The idea was to eliminate the step of interpretation and let the naked object simply be, period. ("Period"-a curious case of use-mention blur.) However, if this was the intention, it was a monumental flop, and perhaps had to be.
 
Any time an object is exhibited in a gallery or dubbed a "work", it acquires an aura of deep inner significance-no matter how much the viewer has been warned not to look for meaning. In fact, there is a backfiring effect whereby the more that viewers are told to look at these objects without mystification, the more mystified the viewers get. After all, if a wooden crate on a museum floor is just a wooden crate on a museum floor, then why doesn't the janitor haul it out back and throw it in the garbage? Why is the name of an artist attached to it? Why did the artist want to demystify art? Why isn't that dirt clod out front labeled with an artist's name? Is this a hoax? Am I crazy, or are artists crazy? More and more questions flood into the viewer's mind; he can't help it. This is the "frame effect" which art-Art-automatically creates. There is no way to suppress the wanderings in the minds of the curious.
 
Of course, if the purpose is to instill a Zen-like sense of the world as devoid of categories and meanings, then perhaps such art is merely intended to serve-as does intellectualizing about Zen-as a catalyst to inspire the viewer to go out and become acquainted with the philosophy which rejects "inner meanings" and embraces the world as a whole. In this case, the art is self-defeating in the short run, since the viewers do ponder about its meaning, but it achieves its aim with a few people in the long run, by introducing them to its sources. But in either case, it is not true that there is no code by which ideas are conveyed to the viewer. Actually, the code is a much more complex thing, involving statements about the absence of codes and so forth-that is, it is part code, part metacode, and so on. There is a Tangled Hierarchy of messages being transmitted by the most Zen-like art objects, which is perhaps why so many find modern art so inscrutable.
 
Ism Once Again
 
Cage has led a movement to break the boundaries between art and nature. In music, the theme is that all sounds are equal-a sort of acoustical democracy. Thus silence is just as important as sound, and random sound is just as important as organized sound. Leonard B. Meyer, in his book Music, the Arts, and Ideas, has called this movement in music "transcendentalism", and states:

If the distinction between art and nature is mistaken, aesthetic valuation is irrelevant. One should no more judge the value of a piano sonata than one should judge the value of a stone, a thunderstorm, or a starfish. "Categorical statements, such as right and wrong, beautiful or ugly, typical of the rationalistic thinking of tonal aesthetics," writes Luciano Berio [a contemporary composer], "are no longer useful in understanding why and how a composer today works on audible forms and musical actions."

Later, Meyer continues in describing the philosophical position of transcendentalism:

... all things in all of time and space are inextricably connected with one another. Any divisions, classifications, or organizations discovered in the universe are arbitrary. The world is a complex, continuous, single event.2 [Shades of Zeno!]

I find "transcendentalism" too bulky a name for this movement. In its place, I use "ism". Being a suffix without a prefix, it suggests an ideology without ideas-which, however you interpret it, is probably the case. And since "ism" embraces whatever is, its name is quite fitting. In "ism" the word "is" is half mentioned, half used; what could be more appropriate? Ism is the spirit of Zen in art. And just as the central problem of Zen is to unmask the self, the central problem of art in this century seems to be to figure out what art is. All these thrashings-about are part of its identity crisis.

FIGURE 141. The Human Condition I, by Rene Magritte (193
 
We have seen that the use-mention dichotomy, when pushed, turns into the philosophical problem of symbol-object dualism, which links it to the mystery of mind. Magritte wrote about his painting The Human Condition I (Fig. 141):

I placed in front of a window, seen from a room, a painting representing exactly that part of the landscape which was hidden from view by the painting. Therefore, the tree represented in the painting hid from view the tree situated behind it, outside the room. It existed for the spectator, as it were, simultaneously in his mind, as both inside the room in the painting, and outside in the real landscape. Which is how we see the world: we see it as being outside ourselves even though it is only a mental representation of it that we experience inside ourselves.3
 
Understanding the Mind
 
 
First through the pregnant images of his painting, and then in direct words, Magritte expresses the link between the two questions "How do symbols work?" and "How do our minds work?" And so he leads us back to the question posed earlier: "Can we ever hope to understand our minds?"
 
Or does some marvelous diabolical Godelian proposition preclude our ever unraveling our minds? Provided you do not adopt a totally unreasonable definition of "understanding", I see no Godelian obstacle in the way of the eventual understanding of our minds. For instance, it seems to me quite reasonable to desire to understand the working principles of brains in general, much the same way as we understand the working principles of car engines in general. It is quite different from trying to understand any single brain in every last detail-let alone trying to do this for one's own brain! I don't see how Godel's Theorem, even if construed in the sloppiest way, has anything to say about the feasibility of this prospect. I see no reason that Godel's Theorem imposes any limitations on our ability to formulate and verify the general mechanisms by which thought processes take place in the medium of nerve cells. I see no barrier imposed by Godel's Theorem to the implementation on computers (or their successors) of types of symbol manipulation that achieve roughly the same results as brains do. It is entirely another question to try and duplicate in a program some particular human's mind-but to produce an intelligent program at all is a more limited goal. Godel's Theorem doesn't ban our reproducing our own level of intelligence via programs any more than it bans our reproducing our own level of intelligence via transmission of hereditary information in DNA, followed by education. Indeed, we have seen, in Chapter XVI, how a remarkable Godelian mechanism-the Strange Loop of proteins and DNA-is precisely what allows transmission of intelligence!
 
Does Godel's Theorem, then, have absolutely nothing to offer us in thinking about our own minds? I think it does, although not in the mystical and limitative way which some people think it ought to. I think that the process of coming to understand Godel's proof, with its construction involving arbitrary codes, complex isomorphisms, high and low levels of interpretation, and the capacity for self-mirroring, may inject some rich undercurrents and flavors into one's set of images about symbols and symbol processing, which may deepen one's intuition for the relationship between mental structures on different levels.
 
Accidental Inexplicability of Intelligence
 
Before suggesting a philosophically intriguing "application" of Godel's proof, I would like to bring up the idea of "accidental inexplicability" of intelligence. Here is what that involves. It could be that our brains, unlike car engines, are stubborn and intractable systems which we cannot neatly decompose in any way. At present, we have no idea whether our brains will yield to repeated attempts to cleave them into clean layers, each of which can be explained in terms of lower layers-or whether our brains will foil all our attempts at decomposition.
 
But even if we do fail to understand ourselves, there need not be any Godelian "twist" behind it; it could be simply an accident of fate that our brains are too weak to understand themselves. Think of the lowly giraffe, for instance, whose brain is obviously far below the level required for self-understanding-yet it is remarkably similar to our own brain. In fact, the brains of giraffes, elephants, baboons-even the brains of tortoises or unknown beings who are far smarter than we are-probably all operate on basically the same set of principles. Giraffes may lie far below the threshold of intelligence necessary to understand how those principles fit together to produce the qualities of mind; humans may lie closer to that threshold-perhaps just barely below it, perhaps even above it. The point is that there may be no fundamental (i.e., Godelian) reason why those qualities are incomprehensible; they may be completely clear to more intelligent beings.
 
Undecidability Is Inseparable from a High-Level Viewpoint
 
Barring this pessimistic notion of the accidental inexplicability of the brain, what insights might Godel's proof offer us about explanations of our minds/brains? Godel's proof offers the notion that a high-level view of a system may contain explanatory power which simply is absent on the lower levels. By this I mean the following. Suppose someone gave you G, Godel's undecidable string, as a string of TNT. Also suppose you knew nothing of Godel-numbering. The question you are supposed to answer is: "Why isn't this string a theorem of TNT?" Now you are used to such questions; for instance, if you had been asked that question about S0=0, you would have a ready explanation: "Its negation, ~S0=0, is a theorem." This, together with your knowledge that TNT is consistent, provides an explanation of why the given string is a nontheorem. This is what I call an explanation "on the TNT-level". Notice how different it is from the explanation of why MU is not a theorem of the MIU-system: the former comes from the M-mode, the latter only from the I-mode.
 
Now what about G? The TNT-level explanation which worked for S0=0 does not work for G, because ~G is not a theorem. The person who has no overview of TNT will be baffled as to why he can't make G according to the rules, because as an arithmetical proposition, it apparently has nothing wrong with it. In fact, when G is turned into a universally quantified string, every instance gotten from G by substituting numerals for the variables can be derived. The only way to explain G's nontheoremhood is to discover the notion of Godel-numbering and view TNT on an entirely different level. It is not that it is just difficult and complicated to write out the explanation on the TNT-level; it is impossible. Such an explanation simply does not exist. There is, on the high level, a kind of explanatory power which simply is lacking, in principle, on the TNT-level. G's nontheoremhood is, so to speak, an intrinsically high-level fact. It is my suspicion that this is the case for all undecidable propositions; that is to say: every undecidable proposition is actually a Godel sentence, asserting its own nontheoremhood in some system via some code.
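The decisive move here, Godel-numbering, can be made concrete with a toy sketch. The following is an invented mini-encoding for illustration only, not the book's actual numbering scheme for TNT: assign each symbol of a small alphabet a nonzero digit and read whole strings as integers in a suitable base, so that statements about strings become statements about numbers.

```python
# Toy Godel numbering: each symbol gets a nonzero digit; a string is
# read as an integer in base BASE. Nonzero digits make decoding unambiguous.
ALPHABET = {"0": 1, "S": 2, "=": 3, "~": 4}   # hypothetical mini-alphabet
BASE = len(ALPHABET) + 1
REVERSE = {v: k for k, v in ALPHABET.items()}

def godel_number(s: str) -> int:
    """Encode a string as an integer, one base-BASE digit per symbol."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET[ch]
    return n

def decode(n: int) -> str:
    """Invert the encoding, recovering the original string."""
    symbols = []
    while n:
        n, digit = divmod(n, BASE)
        symbols.append(REVERSE[digit])
    return "".join(reversed(symbols))

assert decode(godel_number("~S0=0")) == "~S0=0"
```

Under such a mapping, "string X is a theorem" becomes an arithmetical property of the number godel_number(X)-precisely the level-crossing that the explanation of G's nontheoremhood requires.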
 
Consciousness as an Intrinsically High-Level Phenomenon
 
Looked at this way, Godel's proof suggests-though by no means does it prove!-that there could be some high-level way of viewing the mind/brain, involving concepts which do not appear on lower levels, and that this level might have explanatory power that does not exist-not even in principle-on lower levels. It would mean that some facts could be explained on the high level quite easily, but not on lower levels at all. No matter how long and cumbersome a low-level statement were made, it would not explain the phenomena in question. It is the analogue to the fact that, if you make derivation after derivation in TNT, no matter how long and cumbersome you make them, you will never come up with one for G-despite the fact that on a higher level, you can see that G is true.
 
What might such high-level concepts be? It has been proposed for eons, by various holistically or "soulistically" inclined scientists and humanists, that consciousness is a phenomenon that escapes explanation in terms of brain-components; so here is a candidate, at least. There is also the ever-puzzling notion of free will. So perhaps these qualities could be "emergent" in the sense of requiring explanations which cannot be furnished by the physiology alone. But it is important to realize that if we are being guided by Godel's proof in making such bold hypotheses, we must carry the analogy through thoroughly. In particular, it is vital to recall that G's nontheoremhood does have an explanation-it is not a total mystery! The explanation hinges on understanding not just one level at a time, but the way in which one level mirrors its metalevel, and the consequences of this mirroring. If our analogy is to hold, then, "emergent" phenomena would become explicable in terms of a relationship between different levels in mental systems.
 
Strange Loops as the Crux of Consciousness
 
 
My belief is that the explanations of "emergent" phenomena in our brains-for instance, ideas, hopes, images, analogies, and finally consciousness and free will-are based on a kind of Strange Loop, an interaction between levels in which the top level reaches back down towards the bottom level and influences it, while at the same time being itself determined by the bottom level. In other words, a self-reinforcing "resonance" between different levels-quite like the Henkin sentence which, by merely asserting its own provability, actually becomes provable. The self comes into being at the moment it has the power to reflect itself.
 
This should not be taken as an antireductionist position. It just implies that a reductionistic explanation of a mind, in order to be comprehensible, must bring in "soft" concepts such as levels, mappings, and meanings. In principle, I have no doubt that a totally reductionistic but incomprehensible explanation of the brain exists; the problem is how to translate it into a language we ourselves can fathom. Surely we don't want a description in terms of positions and momenta of particles; we want a description which relates neural activity to "signals" (intermediate-level phenomena)-and which relates signals, in turn, to "symbols" and "subsystems", including the presumed-to-exist "self-symbol". This act of translation from low-level physical hardware to high-level psychological software is analogous to the translation of number-theoretical statements into metamathematical statements. Recall that the level-crossing which takes place at this exact translation point is what creates Godel's incompleteness and the self-proving character of Henkin's sentence. I postulate that a similar level-crossing is what creates our nearly unanalyzable feelings of self.
 
In order to deal with the full richness of the brain/mind system, we will have to be able to slip between levels comfortably. Moreover, we will have to admit various types of "causality": ways in which an event at one level of description can "cause" events at other levels to happen. Sometimes event A will be said to "cause" event B simply for the reason that the one is a translation, on another level of description, of the other. Sometimes "cause" will have its usual meaning: physical causality. Both types of causality-and perhaps some more-will have to be admitted in any explanation of mind, for we will have to admit causes that propagate both upwards and downwards in the Tangled Hierarchy of mentality, just as in the Central Dogmap.
 
At the crux, then, of our understanding ourselves will come an understanding of the Tangled Hierarchy of levels inside our minds. My position is rather similar to the viewpoint put forth by the neuroscientist Roger Sperry in his excellent article "Mind, Brain, and Humanist Values", from which I quote a little here:

In my own hypothetical brain model, conscious awareness does get representation as a very real causal agent and rates an important place in the causal sequence and chain of control in brain events, in which it appears as an active, operational force. ... To put it very simply, it comes down to the issue of who pushes whom around in the population of causal forces that occupy the cranium. It is a matter, in other words, of straightening out the peck-order hierarchy among intracranial control agents. There exists within the cranium a whole world of diverse causal forces; what is more, there are forces within forces within forces, as in no other cubic half-foot of universe that we know. ... To make a long story short, if one keeps climbing upward in the chain of command within the brain, one finds at the very top those over-all organizational forces and dynamic properties of the large patterns of cerebral excitation that are correlated with mental states or psychic activity. ... Near the apex of this command system in the brain ... we find ideas. Man over the chimpanzee has ideas and ideals. In the brain model proposed here, the causal potency of an idea, or an ideal, becomes just as real as that of a molecule, a cell, or a nerve impulse. Ideas cause ideas and help evolve new ideas. They interact with each other and with other mental forces in the same brain, in neighboring brains, and, thanks to global communication, in far distant, foreign brains. And they also interact with the external surroundings to produce in toto a burstwise advance in evolution that is far beyond anything to hit the evolutionary scene yet, including the emergence of the living cell.4
 
There is a famous breach between two languages of discourse: the subjective language and the objective language. For instance, the "subjective" sensation of redness, and the "objective" wavelength of red light. To many people, these seem to be forever irreconcilable. I don't think so. No more than the two views of Escher's Drawing Hands are irreconcilable-from "in the system", where the hands draw each other, and from outside, where Escher draws it all. The subjective feeling of redness comes from the vortex of self-perception in the brain; the objective wavelength is how you see things when you step back, outside of the system. Though no one of us will ever be able to step back far enough to see the "big picture", we shouldn't forget that it exists. We should remember that physical law is what makes it all happen-way, way down in neural nooks and crannies which are too remote for us to reach with our high-level introspective probes.
 
The Self-Symbol and Free Will

In Chapter XII, it was suggested that what we call free will is a result of the interaction between the self-symbol (or subsystem), and the other symbols in the brain. If we take the idea that symbols are the high-level entities to which meanings should be attached, then we can make a stab at explaining the relationship between symbols, the self-symbol, and free will.

One way to gain some perspective on the free-will question is to replace it by what I believe is an equivalent question, but one which involves less loaded terms. Instead of asking, "Does system X have free will?" we ask, "Does system X make choices?" By carefully groping for what we really mean when we choose to describe a system-mechanical or biological-as being capable of making "choices", I think we can shed much light on free will. It will be helpful to go over a few different systems which, under various circumstances, we might feel tempted to describe as "making choices". From these examples we can gain some perspective on what we really mean by the phrase.

Let us take the following systems as paradigms: a marble rolling down a bumpy hill; a pocket calculator finding successive digits in the decimal expansion of the square root of 2; a sophisticated program which plays a mean game of chess; a robot in a T-maze (a maze with but a single fork, on one side of which there is a reward); and a human being confronting a complex dilemma.

First, what about that marble rolling down a hill? Does it make choices? I think we would unanimously say that it doesn't, even though none of us could predict its path for even a very short distance. We feel that it couldn't have gone any other way than it did, and that it was just being shoved along by the relentless laws of nature. In our chunked mental physics, of course, we can visualize many different "possible" pathways for the marble, and we see it following only one of them in the real world. On some level of our minds, therefore, we can't help feeling the marble has "chosen" a single pathway out of those myriad mental ones; but on some other level of our minds, we have an instinctive understanding that the mental physics is only an aid in our internal modeling of the world, and that the mechanisms which make the real physical sequences of events happen do not require nature to go through an analogous process of first manufacturing variants in some hypothetical universe (the "brain of God") and then choosing between them. So we shall not bestow the designation "choice" upon this process-although we recognize that it is often pragmatically useful to use the word in cases like this, because of its evocative power.

Now what about the calculator programmed to find the digits of the square root of 2? What about the chess program? Here, we might say that we are just dealing with "fancy marbles", rolling down "fancy hills". In fact, the arguments for no choice-making here are, if anything, stronger than in the case of a marble. For if you attempt to repeat the marble experiment, you will undoubtedly witness a totally different pathway being traced down the hill, whereas if you rerun the square-root-of-2 program, you will get the same results time after time. The marble seems to "choose" a different path each time, no matter how accurately you try to reproduce the conditions of its original descent, whereas the program runs down precisely the same channels each time.

Now in the case of fancy chess programs, there are various possibilities.

If you play a game against certain programs, and then start a second game with the same moves as you made the first time, these programs will just move exactly as they did before, without any appearance of having learned anything or having any desire for variety. There are other programs which have randomizing devices that will give some variety but not out of any deep desire. Such programs could be reset with the internal random number generator as it was the first time, and once again, the same game would ensue. Then there are other programs which do learn from their mistakes, and change their strategy depending on the outcome of a game. Such programs would not play the same game twice in a row. Of course, you could also turn the clock back by wiping out all the changes in the memory which represent learning, just as you could reset the random number generator, but that hardly seems like a friendly thing to do. Besides, is there any reason to suspect that you would be able to change any of your own past decisions if every last detail-and that includes your brain, of course-were reset to the way it was the first time around?
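The point about resetting the random number generator can be made concrete in a few lines of Python. This is a toy sketch of my own (the function name and move list are invented for illustration, not taken from any real chess program): a "randomizing device" given back its original state replays the identical game.

```python
import random

def play_randomized_game(seed, moves=5):
    """A toy 'chess program' whose variety comes only from a seeded
    random number generator, not from any deep desire for variety."""
    rng = random.Random(seed)  # the program's internal randomizing device
    return [rng.choice(['e4', 'd4', 'Nf3', 'c4']) for _ in range(moves)]

# Resetting the generator to the state it had the first time
# makes the same game ensue, move for move.
first = play_randomized_game(seed=42)
rerun = play_randomized_game(seed=42)
print(first == rerun)  # True
```

The apparent variety between games with different seeds vanishes entirely once the seed is fixed: determinism was there all along, merely hidden inside the generator's state.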
 
 
But let us return to the question of whether "choice" is an applicable term here. If programs are just "fancy marbles rolling down fancy hills", do they make choices, or not? Of course the answer must be a subjective one, but I would say that pretty much the same considerations apply here as to the marble. However, I would have to add that the appeal of using the word "choice", even if it is only a convenient and evocative shorthand, becomes quite strong. The fact that a chess program looks ahead down the various possible bifurcating paths, quite unlike a rolling marble, makes it seem much more like an animate being than a square-root-of-2 program. However, there is still no deep self-awareness here-and no sense of free will.

Now let us go on to imagine a robot which has a repertoire of symbols. This robot is placed in a T-maze. However, instead of going for the reward, it is preprogrammed to go left whenever the next digit of the square root of 2 is even, and to go right whenever it is odd. Now this robot is capable of modeling the situation in its symbols, so it can watch itself making choices. Each time the T is approached, if you were to address to the robot the question, "Do you know which way you're going to turn this time?" it would have to answer, "No". Then in order to progress, it would activate its "decider" subroutine, which calculates the next digit of the square root of 2, and the decision is taken. However, the internal mechanism of the decider is unknown to the robot-it is represented in the robot's symbols merely as a black box which puts out "left"'s and "right"'s by some mysterious and seemingly random rule. Unless the robot's symbols are capable of picking up the hidden heartbeat of the square root of 2, beating in the L's and R's, it will stay baffled by the "choices" which it is making. Now does this robot make choices? Put yourself in that position. If you were trapped inside a marble rolling down a hill and were powerless to affect its path, yet could observe it with all your human intellect, would you feel that the marble's path involved choices? Of course not. Unless your mind is affecting the outcome, it makes no difference that the symbols are present.
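The robot's "decider" subroutine is simple to realize. Here is a minimal Python sketch under my own conventions (the names sqrt2_digit and decider are inventions for illustration): integer arithmetic extracts successive decimal digits of the square root of 2, and each digit is mapped to a turn.

```python
from math import isqrt

def sqrt2_digit(k):
    """k-th decimal digit of sqrt(2), counting the leading 1 as digit 0."""
    # isqrt(2 * 10**(2k)) is sqrt(2) scaled up by 10**k and truncated,
    # so its last digit is the k-th digit of the expansion.
    return isqrt(2 * 10 ** (2 * k)) % 10

def decider(k):
    """The robot's black box: 'L' for an even digit, 'R' for an odd one."""
    return 'L' if sqrt2_digit(k) % 2 == 0 else 'R'

# sqrt(2) = 1.4142135...  ->  turns R L R L L R R R ...
print(''.join(decider(k) for k in range(8)))  # prints RLRLLRRR
```

To an observer who cannot see inside the black box, this stream of L's and R's looks mysterious and seemingly random; the hidden heartbeat is there only for a mind that can pick it up.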
 
 
So now we make a modification in our robot: we allow its symbols-including its self-symbol-to affect the decision that is taken. Now here is an example of a program running fully under physical law, which seems to get much more deeply at the essence of choice than the previous examples did. When the robot's own chunked concept of itself enters the scene, we begin to identify with the robot, for it sounds like the kind of thing we do. It is no longer like the calculation of the square root of 2, where no symbols seem to be monitoring the decisions taken. To be sure, if we were to look at the robot's program on a very local level, it would look quite like the square-root program. Step after step is executed, and in the end "left" or "right" is the output. But on a high level we can see the fact that symbols are being used to model the situation and to affect the decision. That radically affects our way of thinking about the program. At this stage, meaning has entered this picture-the same kind of meaning as we manipulate with our own minds.

A Godel Vortex Where All Levels Cross

Now if some outside agent suggests 'L' as the next choice to the robot, the suggestion will be picked up and channeled into the swirling mass of interacting symbols. There, it will be sucked inexorably into interaction with the self-symbol, like a rowboat being pulled into a whirlpool. That is the vortex of the system, where all levels cross. Here, the 'L' encounters a Tangled Hierarchy of symbols and is passed up and down the levels. The self-symbol is incapable of monitoring all its internal processes, and so when the actual decision emerges-'L' or 'R' or something outside the system-the system will not be able to say where it came from. Unlike a standard chess program, which does not monitor itself and consequently has no ideas about where its moves come from, this program does monitor itself and does have ideas about its ideas-but it cannot monitor its own processes in complete detail, and therefore has a sort of intuitive sense of its workings, without full understanding. From this balance between self-knowledge and self-ignorance comes the feeling of free will.

Think, for instance, of a writer who is trying to convey certain ideas which to him are contained in mental images. He isn't quite sure how those images fit together in his mind, and he experiments around, expressing things first one way and then another, and finally settles on some version. But does he know where it all came from? Only in a vague sense. Much of the source, like an iceberg, is deep underwater, unseen-and he knows that. Or think of a music composition program, something we discussed earlier, asking when we would feel comfortable in calling it the composer rather than the tool of a human composer. Probably we would feel comfortable when self-knowledge in terms of symbols exists inside the program, and when the program has this delicate balance between self-knowledge and self-ignorance. It is irrelevant whether the system is running deterministically; what makes us call it a "choice maker" is whether we can identify with a high-level description of the process which takes place when the

FIGURE 142. Print Gallery, by M. C. Escher (lithograph, 1956).

program runs. On a low (machine language) level, the program looks like any other program; on a high (chunked) level, qualities such as "will", "intuition", "creativity", and "consciousness" can emerge.

The important idea is that this "vortex" of self is responsible for the tangledness, for the Godelian-ness, of the mental processes. People have said to me on occasion, "This stuff with self-reference and so on is very amusing and enjoyable, but do you really think there is anything serious to it?" I certainly do. I think it will eventually turn out to be at the core of AI, and the focus of all attempts to understand how human minds work. And that is why Godel is so deeply woven into the fabric of my book.

An Escher Vortex Where All Levels Cross

A strikingly beautiful, and yet at the same time disturbingly grotesque, illustration of the cyclonic "eye" of a Tangled Hierarchy is given to us by Escher in his Print Gallery (Fig. 142). What we see is a picture gallery where a young man is standing, looking at a picture of a ship in the harbor of a small town, perhaps a Maltese town, to guess from the architecture, with its little turrets, occasional cupolas, and flat stone roofs, upon one of which sits a boy, relaxing in the heat, while two floors below him a woman-perhaps his mother-gazes out of the window from her apartment which sits directly above a picture gallery where a young man is standing, looking at a picture of a ship in the harbor of a small town, perhaps a Maltese town- What!? We are back on the same level as we began, though all logic dictates that we cannot be. Let us draw a diagram of what we see (Fig. 143).

FIGURE 143. Abstract diagram of M. C. Escher's Print Gallery.

What this diagram shows is three kinds of "in-ness". The gallery is physically in the town ("inclusion"); the town is artistically in the picture ("depiction"); the picture is mentally in the person ("representation"). Now while this diagram may seem satisfying, in fact it is arbitrary, for the number of levels shown is quite arbitrary. Look below at another way of representing the top half alone (Fig. 144).

FIGURE 144. A collapsed version of the previous figure.

We have eliminated the "town" level: conceptually it was useful, but can just as well be done without. Figure 144 looks just like the diagram for Drawing Hands: a Strange Loop of two steps. The division markers are arbitrary, even if they seem natural to our minds. This can be further accentuated by showing even more "collapsed" schematic diagrams of Print Gallery, such as that in Figure 145.

FIGURE 145. Further collapse of Figure 143.

This exhibits the paradox of the picture in the starkest terms. Now-if the picture is "inside itself", then is the young man also inside himself? This question is answered in Figure 146.

FIGURE 146. Another way of collapsing Figure 143.

Thus, we see the young man "inside himself", in a funny sense which is made up of compounding three distinct senses of "in".

This diagram reminds us of the Epimenides paradox with its one-step self-reference, while the two-step diagram resembles the sentence pair each of which refers to the other. We cannot make the loop any tighter, but we can open it wider, by choosing to insert any number of intermediate levels, such as "picture frame", "arcade", and "building". If we do so, we will have many-step Strange Loops, whose diagrams are isomorphic to those of Waterfall (Fig. 5) or Ascending and Descending (Fig. 6). The number of levels is determined by what we feel is "natural", which may vary according to context, purpose, or frame of mind. The Central Xmaps-Dog, Crab, Sloth, and Pipe-can all be seen as involving three-step Strange Loops; alternatively, they can all be collapsed into two- or one-step loops; then again, they can be expanded out into multistage loops. Where one perceives the levels is a matter of intuition and esthetic preference.

Now are we, the observers of Print Gallery, also sucked into ourselves by virtue of looking at it? Not really. We manage to escape that particular vortex by being outside of the system. And when we look at the picture, we see things which the young man can certainly not see, such as Escher's

signature, "MCE", in the central "blemish". Though the blemish seems like a defect, perhaps the defect lies in our expectations, for in fact Escher could not have completed that portion of the picture without being inconsistent with the rules by which he was drawing the picture. That center of the whorl is-and must be-incomplete. Escher could have made it arbitrarily small, but he could not have gotten rid of it. Thus we, on the outside, can know that Print Gallery is essentially incomplete-a fact which the young man, on the inside, can never know. Escher has thus given a pictorial parable for Godel's Incompleteness Theorem. And that is why the strands of Godel and Escher are so deeply interwoven in my book.

A Bach Vortex Where All Levels Cross

One cannot help being reminded, when one looks at the diagrams of Strange Loops, of the Endlessly Rising Canon from the Musical Offering. A diagram of it would consist of six steps, as is shown in Figure 147. It is too

FIGURE 147. The hexagonal modulation scheme of Bach's Endlessly Rising Canon forms a true closed loop when Shepard tones are used.

bad that when it returns to C, it is an octave higher rather than at the exact original pitch. Astonishingly enough, it is possible to arrange for it to return exactly to the starting pitch, by using what are called Shepard tones, after the psychologist Roger Shepard, who discovered the idea. The principle of a Shepard-tone scale is shown in Figure 148. In words, it is this: you play parallel scales in several different octave ranges. Each note is weighted independently, and as the notes rise, the weights shift. You make the top

FIGURE 148. Two complete cycles of a Shepard tone scale, notated for piano. The loudness of each note is proportional to its area; thus, just as the top voice fades out, a new bottom voice feebly enters. [Printed by Donald Byrd's program "SMUT".]

octave gradually fade out, while at the same time you are gradually bringing in the bottom octave. Just at the moment you would ordinarily be one octave higher, the weights have shifted precisely so as to reproduce the starting pitch ... Thus you can go "up and up forever", never getting any higher! You can try it at your piano. It works even better if the pitches can be synthesized accurately under computer control. Then the illusion is bewilderingly strong.
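The weighting scheme just described can be sketched numerically. This is a toy model of my own, assuming a bell-shaped loudness envelope over log-frequency and a base frequency of 55 Hz (Shepard's actual envelope differs in detail):

```python
import math

def shepard_components(step, steps_per_octave=12, n_octaves=6, base_freq=55.0):
    """Frequencies and loudness weights for one Shepard tone.
    Octave-spaced components are weighted by a bell curve over
    log-frequency, so as the scale rises the top voice fades out
    while a new bottom voice feebly enters."""
    center = math.log2(base_freq) + n_octaves / 2  # fixed spectral center
    comps = []
    for octave in range(n_octaves):
        f = base_freq * 2 ** (octave + step / steps_per_octave)
        w = math.exp(-((math.log2(f) - center) ** 2) / 2.0)
        comps.append((f, w))
    return comps
```

After one full cycle of steps_per_octave steps, every component has risen exactly one octave, yet the bell-shaped weights make the mixture sound just as it did at the start: the scale can go "up and up forever" without ever getting any higher.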
 
 
This wonderful musical discovery allows the Endlessly Rising Canon to be played in such a way that it joins back onto itself after going "up" an octave. This idea, which Scott Kim and I conceived jointly, has been realized on tape, using a computer music system. The effect is very subtle-but very real. It is quite interesting that Bach himself was apparently aware, in some sense, of such scales, for in his music one can occasionally find passages which roughly exploit the general principle of Shepard tones-for instance, about halfway through the Fantasia from the Fantasia and Fugue in G Minor, for organ.

In his book J. S. Bach's Musical Offering, Hans Theodore David writes:

Throughout the Musical Offering, the reader, performer, or listener is to search for the Royal theme in all its forms. The entire work, therefore, is a ricercar in the original, literal sense of the word.5

I think this is true; one cannot look deeply enough into the Musical Offering. There is always more after one thinks one knows everything. For instance, towards the very end of the Six-Part Ricercar, the one he declined to improvise, Bach slyly hid his own name, split between two of the upper voices. Things are going on on many levels in the Musical Offering. There are tricks with notes and letters; there are ingenious variations on the King's Theme; there are original kinds of canons; there are extraordinarily complex fugues; there is beauty and extreme depth of emotion; even an exultation in the many-leveledness of the work comes through. The Musical Offering is a fugue of fugues, a Tangled Hierarchy like those of Escher and Godel, an intellectual construction which reminds me, in ways I cannot express, of the beautiful many-voiced fugue of the human mind. And that is why in my book the three strands of Godel, Escher, and Bach are woven into an Eternal Golden Braid.