Mutatis Mutandis

Part I : The room we were locked in
Either mathematics is too big for the human mind, or the human mind is more than a machine.

- Kurt Gödel
Have you ever tried to think of your environment without using any words? I remember doing that once in high school, thinking about a huge tree in front of me without invoking any words in my mind like leaf, green, branches, etc. When you try to think without labels, you are basically forced to absorb the scene in front of you in infinite resolution. Each leaf has a different shade of green - well you are not even allowed to think of 'leaf' or 'shade' or 'green'. Every distinct shape that is closed in space becomes indescribably unique. It can literally give you a headache.

This experiment was my first personal reason to believe that we think in language, simply because (at least for me) thinking was impossible to handle without formalizing it. From this perspective, language serves as an internal tool for thinking, but we all know that it also serves as an external tool at the same time, a tool for communication. Although still hotly debated in the field like a chicken-and-egg problem, the conventional belief is that language primarily evolved for communicative purposes rather than as a tool for cognitive tasks.

The procedure is pretty straightforward if you are interested in language : you read Chomsky. In contrast to the traditional ideas in the field, he argues that internal language is independent of its externalization (the spoken language), and that it evolved independently of the process of externalization (as an example, kids can come up with sign language, which is virtually identical to spoken language in its basic properties and acquisition, with no linguistic input at all [1]). Externalization is a peripheral aspect of internal language, whereas its core function is to provide humans with the language of thought.

If the core function of language is to permit the process of thinking, then its externalization can be used as a proxy for thinking. This is at least my interpretation of the basic idea behind the Turing Test. If a machine can understand what you mean, and makes sense in its response, then you can argue that some thinking process should be going on internally.

From this point of view, it makes sense why Turing came up with his famous imitation game 1 - an intelligence test based on communication. So many things regarding human understanding could be expressed in natural language - you could describe anything in words. In his famous 1950 paper [2], he writes,

[Regarding the Imitation Game] Some other advantages of the proposed criterion may be shown up by specimen questions and answers. Thus:

Q: Please write me a sonnet on the subject of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764.
A: (Pause about 30 s and then give as answer) 105621.
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?
A: (After a pause of 15 s) R-R8 mate.

The question and answer method seems to be suitable for introducing almost any one of the fields of human endeavour that we wish to include.


This is a small demonstration where he shows that you can check generic human capabilities such as a random conversation, arithmetic, and chess, by means of communication 2.

Turing believed that thinking is computational - and by constructing his test on communication, he automatically attributed the role of coding language to natural language (at least for the output). This created a lot of debate on several different levels. Many mathematicians and philosophers objected to the fundamental idea of the human mind being a formal system, and of the process of thinking being analogous to the process of symbolic manipulation. Although these objections attack the computational theory of mind at its core, the reasoning behind them differs and is worth distinguishing carefully.

The idea of using a language - not natural language per se but any finite set of symbolic expressions - is fundamental to computation. The language of mathematics is the best example of that : it is simply the universal coding language, the language of all languages. But in math, we know exactly what every symbol means. You cannot argue that + means multiplication to you and addition to me and division to someone else. It is the operation of addition, period. There is no room for ambiguity.

Natural language, on the other hand, is full of ambiguity. In fact, the Tractatus was Wittgenstein's attempt to build a programming language for thought : basically to reverse engineer the language of his own thinking, making it deterministic and formal [4]. He was trying to agree with the idea that the process of thinking has a formal language. His failure in that attempt made him believe otherwise. He realized that there is no finite set of symbolic expressions whose meanings are defined independently of their practical context, and therefore there was no guarantee of their correct application. In other words, where there was content, there was also context. The use of natural language was not a rule-following system where words map to meanings one-to-one. The use of words was local, not universal. After admitting defeat, he wrote "Philosophical Investigations", where he said [5],
For a large class of cases — though not for all — in which we employ the word ‘meaning’ it can be defined thus: the meaning of a word is its use in the language.
Or in other words,
To understand a phrase, we might say, is to understand its use.
Turing was Wittgenstein's pupil, and they surely influenced each other a lot (their conversations during the lectures in Cambridge are actually quite interesting to read). But obviously one was a computational reductionist about the mind, whereas the other was an anti-reductionist.

The other argument against reducing the human mind to a formal system was initially proposed by Kurt Gödel, called the Mathematical Objection, and later followed by Roger Penrose. Gödel argued that there is an essential limitation to the ability of any formal language to model itself completely and consistently. This argument is based on Gödel's Incompleteness Theorems and the Halting Problem, where the latter was actually proposed and addressed by Turing himself, although it was used to formalise the Mathematical Objection against Turing's own reductionist approach.

The basic idea behind this objection (and the Halting Problem) is that you cannot build a digital machine in mathematics that runs mathematics without crashing [4]. Gödel's theorem shows that there are statements in arithmetic that are true, and that we know to be true, but whose truth cannot be computed. Therefore what you can prove to be true will always depend on your set of axioms, and any set of axioms you could posit as a possible foundation for math will inevitably be incomplete.

As an example, take Goldbach's conjecture : "Every even integer greater than 2 is the sum of two prime integers." If this statement is true, you should be able to prove it. If it is false, you should eventually be able to find an even integer that is not the sum of two primes, because a statement cannot be true and false at the same time. That would lead to inconsistency. So if falsifying it takes infinitely many steps of computation, you cannot show that it is false, so it must be.... true? But to prove something is true, you should be able to reduce that statement to the axioms. That is how truth is defined in mathematics : via proof. So this goes back to what Joscha Bach meant in his talk by the word crashing : if you build a digital machine to run mathematics, it will be either incomplete or inconsistent. This means that if mathematics is consistent, then there is no complete set of axioms from which all truths can be proven. There is a gap between truth and proof [6].
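To make the "infinitely many steps" point concrete, here is a minimal sketch (my own, in Python, not anything from the sources) of the brute-force search implied above : the loop halts and returns a counterexample if the conjecture is false, and runs forever if it is true - and no finite amount of waiting tells you which case you are in.

```python
def is_prime(n):
    """Trial-division primality check - fine for a sketch."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_goldbach(n):
    """Is the even number n a sum of two primes?"""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

def search_counterexample():
    """Halts only if Goldbach's conjecture is false."""
    n = 4
    while True:
        if not is_goldbach(n):
            return n  # a counterexample: the conjecture is refuted
        n += 2        # otherwise, keep searching - possibly forever
```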

Such 'truths' are, according to Gödel’s informal argument, accessible to the human mind in a way that essentially transcends the powers of any formal system. Gödel thought that his result demonstrated the superiority of the human mind to grasp mathematical truths compared to any formal system [7].

I would like to mention another objection proposed by Lady Lovelace here, because I think they are kind of connected. Lovelace argues that a digital computer cannot think, because it cannot originate anything on its own : "The Analytical Engine has no pretensions to originate anything. It can do whatever we know how to order it to perform." So for her, thinking requires the capacity to produce new ideas.

What I see as a common denominator in both Gödel's and Lovelace's objections is that the digital computer is not capable of expanding its own set of axioms (or its own set of beliefs, in the sense of 'true statements'). So it is doomed to be in a loop, a loop defined by the engineer who coded its 'axioms', because it is constrained from proving anything new outside the propositions expressible by the axioms of its own system 3.

I will discuss Turing's response to Ada Lovelace (and why I think Turing misinterpreted it) later, but first, let's focus on Turing's response to the Mathematical Objection.

To understand Turing's response, we first have to draw the line between classical and constructive mathematics. We might not be able to build a machine that runs classical mathematics without crashing, but we can build a computational machine that can run all of computation. Computation is constructive mathematics; it is the part of mathematics that can be implemented.

I love the following example Joscha Bach always gives in his talks to clarify the distinction between mathematics and computation. He says [8],
In this sense, π is not a value, it's a function. You can run this function until it gives you many digits until your local sun burns out and this is it. But you can never have a computation that depends on knowing the last digit of π.
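To illustrate "π is a function, not a value" in code, here is a minimal sketch of an unbounded digit generator (my adaptation of Gibbons' spigot algorithm, purely illustrative) : you can keep pulling digits until your local sun burns out, but no computation can ever depend on the last one.

```python
def pi_digits():
    """Stream the decimal digits of pi one by one, forever (Gibbons' spigot algorithm)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n  # the next digit is now certain, emit it
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

gen = pi_digits()
print([next(gen) for _ in range(10)])  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
```

The generator never terminates with "the value of π"; π exists for the machine only as a procedure you can keep running.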
Although Gödel and Turing came to different conclusions regarding the human mind, the Halting Problem proposed by Turing and the Mathematical Objection proposed by Gödel argue the same thing in essence. Gödel assumed that there were no mathematical truths that human beings could not grasp in principle, and concluded that humans could not be machines. Turing turned the argument the other way around, and started with the assumption that the human mind is not a mathematical machine, but a computational one. Hence humans can be inconsistent and fallible, there are truths we are incapable of proving, and there are questions we are incapable of answering. As a result, there is no reason to think that a computational machine cannot pass the Turing Test. What makes us think that our intellect is not also constrained by something specific to us, say, our biology?

The objections mentioned above directly attack the computational theory of mind, without discussing the methodology of the Turing Test in detail. So the direction of the argument so far has been : 'the human mind is not a computational machine, therefore computational machines cannot think, and cannot possess human-like intelligence'.

Another famous objection, the 'Chinese Room Argument' proposed by John Searle, turns this proposition the other way around : 'machines which appear to think, and can therefore pass the Turing Test, can do so without any real understanding of natural language; therefore the Turing Test is inadequate.'

Searle is also very much against the idea that computational machines can think, but unlike Gödel, Penrose, and Lovelace, he chooses to show this by arguing that the Turing Test can be cheated.

The thought experiment Searle proposed is as follows : imagine a person who only knows English and doesn't know any Chinese. He is in a room with boxes of Chinese symbols, and a rulebook in English that tells him how to match certain Chinese symbols with others. Now imagine that there is a Chinese person outside the room, writing down questions in Chinese and passing them into the room, expecting answers back. The person inside the room can take these strings of Chinese characters, use his rulebook, and produce coherent answers, without understanding a single word of Chinese.

Illustration of the Chinese Room, John Searle depicted as the 'digital computer'.

According to Searle, digital computers will never have understanding, because they are stuck in the Chinese room. They are only syntax manipulating machines, and syntax alone is never enough for semantics. You can answer a million questions in Chinese, without knowing that the strings you receive are actually questions. But at the end of the day, form alone is not sufficient for meaning 4.

In one of his interviews [9], Chomsky makes the following comment on Google Translate, which I found really amusing. (I mean listening to Chomsky is great in general anyway, so do it whenever you find the time :D)
I use Google Parser. So, on engineering grounds, it is kind of like having a bulldozer. Does it tell you anything about human language? Zero, nothing. And in fact, it's very striking. From the very beginning it's just totally remote from science. So, what is a Google parser doing? It's taking an enormous text, let's say The Wall Street Journal corpus, and asking, how close can we come to getting the right description of every sentence in the corpus? Well, every sentence in the corpus is essentially an experiment. Each sentence that you produce is the output of an experiment, which asks "Am I a grammatical sentence?" Now, the answer is usually yes, so most of the stuff in the corpus are grammatical sentences. But now, ask yourself: is there any science which takes random experiments, which are carried out for no reason whatsoever, and tries to find out something from them? If you're a chemistry PhD student, and you want to write a thesis, can you say, "Well I'm just gonna mix a lot of things together, no purpose, and maybe I'll find something." You'd be laughed out of the department.
In conclusion, what Searle's thought experiment shows is that the Turing Test only tests for weak AI (a program that merely simulates thinking by manipulating symbols), not for strong AI (a machine that genuinely understands, indistinguishable from the human mind).

Ironically, a deeper interpretation of the Chinese Room Argument actually supports the validity of the Turing Test for determining whether machines can think. And I think that is also why, as of today, no AI has come close to engaging in a meaningful conversation with a human for a considerable duration.

Searle's challenge has a flaw : cheating might be logically possible, but it is nomically impossible. Even if you consider a hypothetical language with a small alphabet, the number of possible combinations of letters grows astronomically 5 to form words, let alone sentences. If the thought experiment is practically impossible and cannot be realized, then it actually shows that a digital computer that possesses no intelligence, and has no understanding of what it is talking about, cannot cheat the Turing Test. Lovely, no? :)

Rejecting the doctrine of computationalism inevitably forces you to introspect : what distinguishes our mental processes from pure data processing? What is special about human cognition that assigns meanings to forms? When did we make the leap from syntax to semantics? How did we get out of the Chinese Room?

Most researchers believe that this new way of thinking and communicating started about 70,000 years ago, with what Yuval Noah Harari calls the 'Cognitive Revolution' [10]. This is also when we had the evolutionary jump in our Language Capacity. The distinction is kind of important here. Each language, such as English, German, Turkish, etc., is an instantiation of the Language Capacity, which consists of an inner generative procedure (the language of thought) and an externalization procedure (mostly sounds) [11]. So what we had the evolutionary jump for was not a particular language, it was the language capacity of Sapiens as a species.

I use the term evolutionary jump because the evidence suggests that the Language Capacity is a true species property : it is universal, it is unique to humans, and it is invariant among human groups. This means that after the Language Capacity emerged, there has been little or no evolution of it at all since our ancestors left Africa [11].

Daniel Kahneman says [12], "There are a lot of abilities like the capacity of manipulating ideas, imagining alternative futures, imagining counterfactual things that haven't happened, and to do conditional thinking, that would have been impossible without language and without the very large brain that we have compared to others." Evidently, this jump in Language Capacity combined with the emergence of other cognitive abilities had a profound impact on our ability to think beyond the raw data presented to our sensory system. Not only did we gain the ability to play around with facts, we also gained the ability to counter the facts, asking questions like "What if something had happened differently?". The capacity to think in counterfactuals permits us to comprehend cause-effect relationships, reflect on our past actions, and envision alternative scenarios : ingredients of the human way of thinking [13], a cheatsheet to pass the Turing Test, the key to the door of the Chinese Room.

Part II : Duplicating the keys
Mutatis Mutandis : with things changed that should be changed.

- A Latin phrase
If we ever want to have Strong AI, the first step is to find a way to put counterfactual thinking in a machine.

Although it sounds easy for us humans, countering the facts is not an easy task for an AI. In the end, all an AI has are the facts. Even deep neural networks operate almost entirely in an associational mode, based on correlations deduced from observed data. To counter the facts, you need to do thought experiments. Actual experiments are not enough; they are only interventions, and they will only generate more data to expand your set of facts. What you need is to selectively break the rules of the system in question, contradict the raw data you have been fed, and create alternative realities. That is why counterfactual thinking is easy for humans : we have an understanding of the rules, and thus we are capable of breaking them. Our brains work in a cause-effect mode instead of mere correlations.

 Picture adapted from Lex Fridman Podcast #56
"Judea Pearl: Causal Reasoning, Counterfactuals, and the Path to AGI"

How do we put counterfactual thinking in a machine then?

Luckily, Judea Pearl and Dana Mackenzie wrote a whole book about the algorithmization of counterfactuals, called 'The Book of Why: The New Science of Cause and Effect'. They propose that the language to do this is causal models, which provide the machine with a symbolic representation of its environment, and moreover, the capacity to imagine a hypothetical perturbation of that environment [13], just like we humans do.

I would strongly recommend going over the whole book if you are interested in the topic; it is really a pleasure to read. But to clarify Pearl's point on why counterfactual thinking is necessary for strong AI, and why a causal model is fundamental to it, I will directly quote an example from the book below.
When my house robot turns on the vacuum cleaner while I am still asleep and I tell it, “You shouldn’t have woken me up,“ I want it to understand that the vacuuming was at fault, but I don’t want it to interpret the complaint as an instruction never to vacuum the upstairs again. It should understand what you and I perfectly understand: vacuum cleaners make noise, noise wakes people up, and that makes some people unhappy. In other words, our robot will have to understand cause-and-effect relations—in fact, counterfactual relations, such as those encoded in the phrase “You shouldn’t have.“

Indeed, observe the rich content of this short sentence of instructions. We should not need to tell the robot that the same applies to vacuum cleaning downstairs or anywhere else in the house, but not when I am awake or not at home, when the vacuum cleaner is equipped with a silencer, and so forth. Can a deep-learning program understand the richness of this instruction?
I love the part where he says "It should understand what you and I perfectly understand", because we all really do understand! We all have similar judgements in practical matters; in other words, we all have commonsense. Not only are we all able to build causal diagrams, but since we live on the same planet, we build very similar diagrams.

Going back to causality, note how the variable 'noise' plays a mediator role in this causal chain of events : vacuum cleaning causes noise, and noise causes waking up. If the robot works in a pure correlation mode, the act of vacuum cleaning and the noise will both be correlated with waking up, even though it is the noise that does the waking : anything noisy would have the same effect.

Also note the efficiency in terms of representation when you have a causal model. By hypothetically playing around with the relationships between variables, you can ask a vast number of "what if?" questions about the system. If you were to represent each of those questions with a binary code, you would need a huge table due to the combinatorial nature of the possible interventions. Our brains do not only manage to have a causal representation, they manage to have a compact one.
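As a toy sketch of what such a model buys you (my own encoding, not Pearl's notation), here is the vacuum example as a tiny structural causal model in Python, with the structural equations written out so that a "what if?" question becomes a one-line override rather than a retraining run.

```python
# A tiny structural causal model for: vacuum -> noise -> awake.
# Illustrative sketch only; variable names and mechanisms are invented for this example.

def noise(vacuum_on, has_silencer=False):
    """Structural equation: the vacuum makes noise unless it is silenced."""
    return vacuum_on and not has_silencer

def awake(noise_level, person_home=True, person_asleep=True):
    """Structural equation: noise wakes a person who is home and asleep."""
    return person_home and person_asleep and noise_level

def simulate(vacuum_on, **interventions):
    """Run the model, optionally overriding ('do'-ing) some variables."""
    n = interventions.get("noise", noise(vacuum_on, interventions.get("has_silencer", False)))
    return awake(n,
                 person_home=interventions.get("person_home", True),
                 person_asleep=interventions.get("person_asleep", True))

# Factual world: the robot vacuumed while I slept -> I woke up.
assert simulate(vacuum_on=True) is True
# Counterfactual: same action, but with a silencer -> nobody wakes up.
assert simulate(vacuum_on=True, has_silencer=True) is False
# Counterfactual: vacuuming while I am not home is also fine.
assert simulate(vacuum_on=True, person_home=False) is False
```

Answering the silencer question in a purely correlational system would require having observed silenced vacuum cleaners; here it only requires overriding one equation.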

Speaking of compact representation : there is a challenge called the Hutter Prize, based on the argument that the ability to compress information well is a proxy for intelligence. It is an interesting one, and probably worth exploring, but in my opinion 'mere data compression' and 'compact representation' point to different things in the context of intelligence. We also keep a lot of redundant information - including information in our biological construct, such as genes we do not express - just in case. Each package or representation might be compact in itself, but we also keep redundant packages (hope I make sense).

To test the ability to do causal reasoning, Pearl suggests a simpler version of the Turing Test, which he calls the mini-Turing test. The idea is to take a simple story, encode it on a machine, and then test whether the machine can correctly answer causal questions that a human can answer [13]. In this setup, the causal diagram of events is provided by the engineer of the system. All the AI needs to do is to selectively violate the rules of the diagram, given the "What if?" questions it is asked to answer.

You may think that an engineer providing the causal diagram is kind of cheating, and you are right. That is one of the reasons why the test is considered mini 6. But funnily enough, that is actually where we stand with current technology and also with our own understanding. We ourselves don't know exactly how we establish these causal relationships. We don't know what happened during the Cognitive Revolution. All we can do is to code the output of the process of our understanding, not the process itself. This way of cheating, i.e., making our understanding of certain processes explicit to the AI, is actually a more common technique than you might think.

Now the million dollar question : How many causal diagrams do you need to put in a machine to achieve strong AI? I mean think about all the basic commonsense knowledge you have on how the world works. It is gigantic!

Granted, it is never either nature or nurture; it is the combination. So as humans, we also need experience to build our commonsense knowledge. We need to observe and manipulate our environment. As an infant, you see your mum dropping your nasty diapers in the trash can. You drop your toys around and wait for someone to pick them up and give them back to you. You open your mouth and the pacifier falls down. By the end of the first 3 months of your life, you have already learned that objects fall when released in midair [14].

At first glance, the solution seems obvious : we should make the physical world around us accessible to robots as well, so that they can also interact with the same environment, our environment.

The problem is, physical knowledge is only one of many dimensions of commonsense knowledge. As humans, we also do temporal, spatial, causal, qualitative, taxonomic, and psychological reasoning [15, 16]. In fact, physical knowledge is a relatively accessible type compared to all the others. You can make a robot drop an object until it figures out the correlation that, if it lets go of things, they will fall.

What if we wanted robots to figure out why we choose to lie under certain circumstances? I am sure you have played Secret Hitler before. It is a game of pure deception and betrayal, and it can also be played online so that the interpretation of body language is off the table. Think about the complexity of the causal and psychological reasoning going on in your mind during that game : "He openly acts like a fascist because he wants to trick us into thinking that he is in fact a liberal in disguise, but he is doing this because he is in reality Hitler himself."

Gary Marcus suggests that this is indeed what we should be testing for. He calls this the Comprehension Challenge : we should be evaluating the full breadth and depth of human comprehension, not just commonsense knowledge [17].

Think about the following image : There is a mosquito on the wall, I am standing next to the wall, following the movements of the mosquito very carefully, with my one hand tense and open in the air. Every human would understand why my hand is in the air : I am about to (or at least aiming to) kill the mosquito.

Or another example in Gary Marcus's own words [18],
Consider the example when a whole bunch of people say, "I am Spartacus!", you know, this famous scene? You know the viewers understand that everybody minus one has to be lying. They can’t all be Spartacus. We have enough commonsense knowledge to know they could not have the same name. We know that they’re lying and we can infer why they’re lying right there : lying to protect someone, to protect things they believe in. If you get a machine that can do that, that can say “This is why these guys all got up and said I am Spartacus!", I will sit down and say “AI has really achieved a lot, thank you.“
I agree, I would be truly impressed as well! What is bugging me is, how can you train a machine to learn that? How would you even represent this piece of information in a machine-readable way to generate training data? Gary Marcus suggests crowdsourcing the generation of questions and answers for arbitrary stories [17], like in the Spartacus example. I really love the idea, but the fact that even we don't yet understand how we achieve such deep comprehension makes me think this will be really hard to achieve.

 Picture adapted from Lex Fridman Podcast #43
"Gary Marcus: Toward a Hybrid of Deep Learning and Symbolic AI."

If you think this is way too ambitious, there is also a mini version of this challenge : the Winograd Schema Challenge. It is also a test for commonsense reasoning, and has the flavour of a Turing test, but in a more measurable way. A question in this test looks like the following [19],

The trophy would not fit into the brown suitcase because it was too big/small.
What was too big/small?
Answer 0: the trophy
Answer 1: the suitcase

So the challenge here is to resolve the ambiguity. But the questions are designed in a very smart way, so that the machine cannot trick you into believing it has commonsense just because it has a huge training dataset based on correlations. In the example above, contexts where 'big' can appear are statistically quite similar to those where 'small' can appear, and yet the answer must change [17]. So you gotta be better than a coin toss in choosing one of the options. Quite clever, right?
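To see why word statistics alone cannot break the tie, here is a small hand-rolled sketch (the encoding and the single "containers must be bigger than their contents" rule are my own illustration, not part of the challenge) of what resolving this particular schema with explicit commonsense looks like.

```python
# One Winograd schema instance, encoded by hand for illustration.
schema = {
    "sentence": "The trophy would not fit into the brown suitcase because it was too {adj}.",
    "candidates": ["the trophy", "the suitcase"],
    "roles": {"content": "the trophy", "container": "the suitcase"},
}

def resolve(adj):
    """Commonsense rule: if the content doesn't fit, either the content is too big
    or the container is too small. Co-occurrence statistics cannot make this distinction."""
    if adj == "big":
        return schema["roles"]["content"]
    if adj == "small":
        return schema["roles"]["container"]
    raise ValueError("unexpected adjective")

assert resolve("big") == "the trophy"
assert resolve("small") == "the suitcase"
```

The catch, of course, is that hand-coding one rule per schema does not scale; the point of the challenge is that the machine should bring that knowledge itself.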

When I read about this, it made me think that commonsense and language capacity are connected even deeper than I initially thought. What these questions are testing in their essence is whether the machine can handle the phenomenon of structure-dependency in language.

Chomsky always gives this famous example to explain structure-dependency [1],
Consider the sentence birds that fly instinctively swim. It is ambiguous: the adverb 'instinctively' can be associated with the preceding verb (fly instinctively) or the following one (instinctively swim). Suppose now that we extract the adverb from the sentence, forming instinctively, birds that fly swim. Now the ambiguity is resolved: the adverb is construed only with the linearly more remote but structurally closer verb swim, not the linearly closer but structurally more remote verb fly. The only possible interpretation – birds swim – is the unnatural one, but that doesn’t matter: the rules apply rigidly, independent of meaning and fact. What is puzzling, again, is that the rules ignore the simple computation of linear distance and keep to the far more complex computation of structural distance.
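A crude way to make the two computations in the quote explicit (my own illustrative encoding, not Chomsky's notation) : in the word string, 'fly' is linearly closer to the adverb, but in the constituent structure the adverb attaches to the clause whose verb is 'swim'.

```python
# "Instinctively, birds that fly swim."
# Linear order: 'fly' is closer to the adverb (3 words away) than 'swim' (4 words away).
words = ["instinctively", "birds", "that", "fly", "swim"]

# Rough constituent structure: the adverb attaches at the clause level,
# where the main verb is 'swim'; 'fly' is buried inside the relative clause.
tree = ("S",
        ("Adv", "instinctively"),
        ("NP", "birds",
               ("RelClause", "that", ("V", "fly"))),
        ("VP", ("V", "swim")))

# Linear distance picks 'fly'; structural distance (depth in the tree) picks 'swim'.
# Human grammar rigidly follows the structural computation, as the quote describes.
```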
This structure-dependence - or 'structure over linear distance' - principle is a true linguistic property; it is universal, and it is deeply rooted in natural language design. There is a very interesting study showing that invented languages keeping the principle of structure-dependence activate the usual language areas of the brain, whereas much simpler systems violating this principle and using linear order instead diffuse this activation, meaning that the experimental subjects treat these languages as a puzzle, not a language [20].

The more you read on language, the clearer the connections with the different types of commonsense knowledge become. Taxonomic reasoning also poses a challenge for AI, and I think it is quite related to another aspect of the language of thought : anti-representationalism. But first let's define clearly what representationalism is in linguistics. As the name suggests, the representational doctrine of language argues that the elements of language (the lexical items) are associated with extra-mental entities. In other words, one re-presents the external world in the internal structure of the language.

Although it sounds very simple, this doctrine has huge implications for how we classify the outside world in our mental structure, and whether we can (even) expect an AI to do the same, at least in its current form.

I thought of a shorter example to illustrate this concept, but again, I realized that Chomsky's analogy [1] is (of course) a better way to explain the problem, so here it goes...
The conventional view is that these [lexical items] are cultural products, and that the basic ones – those used for referring to the world – are associated with extra-mental entities. This representationalist doctrine has been almost universally adopted in the modern period. The doctrine does appear to hold for animal communication: a monkey’s calls, for example, are associated with specific physical events. But the doctrine is radically false for human language, as was recognized as far back as classical Greece.

To illustrate, let’s take the first case that was discussed in pre-Socratic philosophy, the problem posed by Heraclitus: how can we cross the same river twice? To put it differently, why are two appearances understood to be two stages of the same river? Contemporary philosophers have suggested that the problem is solved by taking a river to be a four-dimensional object, but that simply restates the problem: why this object and not some different one, or none at all?

When we look into the question, puzzles abound. Suppose that the flow of the river has been reversed. It is still the same river. Suppose that what is flowing becomes 95% arsenic because of discharges from an upstream plant. It is still the same river. The same is true of other quite radical changes in the physical object. On the other hand, with very slight changes it will no longer be a river at all. If its sides are lined with fixed barriers and it is used for oil tankers, it is a canal, not a river. If its surface undergoes a slight phase change and is hardened, a line is painted down the middle, and it is used to commute to town, then it is a highway, no longer a river. Exploring the matter further, we discover that what counts as a river depends on mental acts and constructions. The same is true quite generally of even the most elementary concepts: tree, water, house, person, London, or in fact any of the basic words of human language. Unlike animals, the items of human language and thought uniformly violate the representationalist doctrine.

For such reasons, incidentally, deep-learning approaches to object-recognition, whatever their interest, cannot in principle discover the meanings of words.
This goes back to my initial argument about thinking of a tree without words. What counts as a leaf? If the shape of the leaves changes, or if it's autumn and they are dry, yellow, and on the ground rather than fresh, green, and in the air, it doesn't really matter - I still think of them as 'leaves'. This gives me the power of abstracting the idea of a leaf in a Platonic way, and generalizing it over different circumstances.

Stop for a second and think how it would be different if you lacked the ability to generalize : not being able to transport your knowledge about how the world works from one environment to another. This ability is mostly overlooked because it comes very naturally to us. You don't need to learn how to open the door of a truck if you know how to open the door of your toilet, and yet a person with learning deficits or cognitive dysfunctions might struggle to connect the two [21]. In short, without the ability to generalize, you would not be able to adapt to a new situation you have never seen before. You would have to train your parameters from scratch every time.

The ability to generalize is especially important in terms of safety if we want AI to be a part of our physical world. A very relevant example for this would be self-driving cars. Consider the following example Pearl gives [13],
If, for example, the programmers of a driverless car want it to react differently to new situations, they have to add those new reactions explicitly. The machine will not figure out for itself that a pedestrian with a bottle of whiskey in hand is likely to respond differently to a honking horn. This lack of flexibility and adaptability is inevitable in any system that works solely based on correlations 7.
Well, if you are not concerned about the ethical outcomes of the whole situation (but I hope you are), the car technically can learn this after being trained on a large number of examples (as current deep learning algorithms require), i.e., after driving over a million drunk pedestrians. Even if this way of learning were somehow acceptable, there is another problem : it is very unlikely that the car will encounter a million drunk pedestrians during the time it is in use. Drunk pedestrians are the edge cases. They are possible, but not probable. It is important to distinguish the problems which can hypothetically be solved with more data before it is too late from the problems that can only be solved by imagination, before they have ever occurred. Humanity didn't learn to be careful around kids and alcoholics after collectively murdering a million of them. Somewhere, some time ago, the carriage slowed down when the very first drunk pedestrian in history was staggering back home.

Lack of generalization is still hiding behind the curtain for tasks which are considered solved already. Take the game of Go. AlphaGo lost only one game to a human in 2016 (which was 5 years ago), and that's it. It is indeed amazing, but keep in mind that both chess and Go are closed-ended problems. The combinations of moves grow exponentially, but fundamentally the number of squares on the board is finite, and the rules haven't changed for thousands of years. Gary Marcus points this out with a hypothetical conversation below [18] (which I find very amusing),
You say, "Here is a system that can play Go. It’s been trained on five million games", and then I say "Can it play on a rectangular board rather than a square board?", and you say "Well, if I retrain it from scratch and another five million games again".

That’s really really narrow. And that’s where we are. We don’t have even a system that could play Go, and then without further retraining play on a rectangular board, which any good human could do with very little problem.
Of course we don't realize that this is the case. Why would we? Why would anyone ask AlphaGo to play on a rectangular board? Well... just for the fun of it? What if after a few rounds of beers you come up with the fun idea of playing Secret Hitler with two Hitlers who do not know each other, and to finish the game you have to identify them both? Isn't it amazing that we can change the rules of a game so casually and it is no problem for us? If we had a robot-friend at the table, we would like it to continue playing too, right?

It is also possible to hardcode the rules of the brand new game you just invented into your robot-friend. A few lines of if-else statements... But does that really count in the context of Strong AI?

This is actually a very important distinction if you want a proper definition of what intelligence is. I was amazed by how beautifully this distinction was drawn when I first read the article "On the Measure of Intelligence" [22] by François Chollet. He argues that intelligence is the acquisition of new skills for unknown tasks. A skill is defined as the output of the process of intelligence. Coding a chess engine gives the skill of playing chess to that engine, but the intelligence is the process of developing the program that plays chess. In this case the program itself is not intelligent; the engineer who coded the program is intelligent.

Why does it feel like the program is intelligent, though? When IBM's Deep Blue defeated the chess grandmaster Garry Kasparov in 1997, most of us probably felt like the human race had finally created the long-awaited super-intelligent machine - and felt intimidated by it. Although it was a breathtaking achievement at the time, today we know very well that an unbeatable engine that is only capable of playing chess does not possess anything like human-level intelligence. This was not the conventional perspective back in the 1970s though [22]. As Allen Newell points out [23],
We know already from existing work [psychological studies on humans] that the task [chess] involves forms of reasoning and search and complex perceptual and memorial processes. For more general considerations we know that it also involves planning, evaluation, means-ends analysis and redefinition of the situation, as well as several varieties of learning – short-term, post-hoc analysis, preparatory analysis, study from books, etc.
Back in the 1970s, the assumption was that playing chess would require implementing general abilities like reasoning and planning, and in fact chess does require these abilities - in humans. For a computer program though, chess does not require any of them, and can be solved by taking radical shortcuts that run orthogonal to human cognition [22].

We felt intimidated simply because we unconsciously thought of how intelligent the human equivalent of Deep Blue would be.

World Chess Champion Garry Kasparov (Left) makes a move during his fourth game against the IBM Deep Blue chess computer. Credit: Stan Honda Getty Images. 

As demonstrated by the chess example, if you are solely measuring skill at a specific task, your metric is masked by the combination of rules that the engineers have hardcoded and the millions of training data points you have used. You should be measuring skill-acquisition efficiency instead. How efficient are you at generalizing far outside the distribution of things you have already seen?

By the word 'efficiency', Chollet doesn't refer to compute, but rather to the size of the training dataset and the a priori hardcoded knowledge required to perform novel tasks - tasks never seen by the system before. If you require a very extensive training dataset covering most possible situations that can occur in the practice of that skill, then the system is not intelligent (AlphaZero was trained on 44 million games, roughly 2,000 to 5,000 years of play for a human). Likewise, if you need a human engineer to write down a bunch of rules covering most of the possible situations, the system is just the output of a process that happens in the minds of the engineers. In both cases, your system is mostly just a lookup table. Solving any given task with beyond-human-level performance by leveraging either unlimited hardcoded rules or unlimited data does not bring us any closer to strong AI, whether the task is chess, football, or any e-sport [22]. We should aim for systems which can turn their experience and a priori knowledge into new skills to adapt to novel and uncertain situations, without further human intervention.
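Loosely paraphrasing this as a back-of-the-envelope formula (my own shorthand; Chollet's actual definition is stated formally in algorithmic-information-theoretic terms) :

```latex
\text{skill-acquisition efficiency} \;\approx\;
\frac{\text{skill attained on never-seen tasks}}
     {\text{hardcoded priors} \;+\; \text{experience (training data) consumed}}
```

Buying performance with more hardcoded rules or more data inflates the denominator, so by this measure the system does not get any more intelligent.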

Note that for such a goal, novelty is a fundamental requirement.

I think this is what Lady Lovelace really meant when she argued “The Analytical Engine has no pretensions to originate anything. It can do whatever we know how to order it to perform“, as an objection to the Turing Test. Turing turned this argument around, and dismissed it by claiming that computers can also take us by surprise by the complexity of their output, although we are the ones who program them to begin with. I think Lovelace's objection was targeting a bigger, more fundamental question of originality; it wasn't just a simple argument about the element of surprise. I am constantly surprised by how the output of a simple deterministic differential equation system can contradict my intuition about the system dynamics. Maybe I am too stupid to interpret my own code correctly, or maybe my system dynamics do not represent the world in which I gained my intuition. Honestly, I don't know. But as far as Turing is concerned, why I am surprised is irrelevant. Nevertheless, I tend to agree with Lovelace's objection here. If the question is "Can machines think?", as Turing posed it in his 1950 paper, then novelty seems central to me.

Although I agree with Lovelace (and many others) on the requirement of novelty, comparing humans to machines in that domain is obviously unfair. We have been through the Cognitive Revolution, remember? It distinguished us - Sapiens - from the other human species by equipping us with the tools necessary for language, counterfactual thinking, abstraction, generalization, and imagination, and finally gave us the ability to outcompete all the other human species on planet Earth. That is also why Turing's analogy of an infant brain to an infant machine was flawed. "Our hope is that there is so little mechanism in the child-brain that something like it can be easily programmed" [2], he said, heavily underestimating all the innate machinery for complex cognitive tasks that we have at birth. Children are capable of learning about a word per hour at peak periods of language acquisition, on only one presentation [1], whereas Google Translate still doesn't - and with its current architecture will not be able to - understand what it is talking about after going through millions of examples.

GPT-3 is a very good example of how an advanced language model can produce amazing human-like text on demand, and yet give embarrassingly stupid answers to very simple questions at the same time. It is the largest language model built so far, with 175 billion parameters (!) in its neural network to optimize. It can memorize the internet, and that is useful when you want to generate a collage of plausible text. But plausibility is its only constraint. It still cannot generalize, do commonsense reasoning, or be self-consistent, simply because it has no understanding of the content of the text.

Another limitation is that it depends on the text generated so far, not the text to be generated. So when you ask a question that is nonsense - and thus has no reason to appear in the corpus of the web - you get the Q&A pair below, simply because sampling can prove the presence of knowledge, but not its absence [24].

Q: How many eyes does my foot have?
A: Your foot has two eyes 8. 

This might not be a huge problem when it comes to generating text on demand, such as writing stories or news articles, but it substantially limits the depth of any conversation it can have with a human being, because eventually nothing original will come out of that conversation.

Amazon created a challenge regarding this issue, called The Alexa Prize Socialbot Grand Challenge. It is a challenge to advance human-computer interaction. The goal is to create chatbots that can keep a conversation going - basically keep the human interested in the conversation - for at least twenty minutes, and in the meantime get a good overall rating on the quality of the conversation. Unfortunately, we are still far away from this goal as of today 9.

Lex Fridman had a very spot-on comment regarding why the minimum duration (twenty minutes) requirement of this challenge is a good metric : because a deep, meaningful conversation is enjoyable, it is the one you don't want to leave [25]. So true! (From this perspective, I unfortunately know a few (!) people who cannot complete this challenge though. SAD!)

Part III : The pursuit of the locksmith
There can be no doubt that the difference between the mind of the lowest man and that of the highest animal is immense. [...] Nevertheless the difference in mind between man and the higher animals, great as it is, certainly is one of degree and not of kind.

- Charles Darwin
The GPT-3 example is important, because it tells you that the main difference between a randomly initialized neural network and the human mind is more likely to be a 'difference in kind' rather than a 'difference in degree', as opposed to what Darwin claimed [29]. More compute and more data are always useful, but as dictated by the law of diminishing returns, they have their limits. If (almost) the whole corpus of the web and 175 billion parameters are not enough to mimic a human discussion properly, then maybe we are not born as blank as John Locke argued four centuries ago.

How can we determine what kind of innate tools we are born with, if any? Obviously, by studying infants! This is how the cognitive psychologist Elizabeth Spelke started her research on the species-specific processes underlying the development of uniquely human aspects of cognition, research which took a different turn after her discoveries on our similarities with other animals' cognitive abilities. In one of her interviews, she says,
My assumption was whatever is special about us is going to reveal itself most clearly in a young human infant, because here we have human nature not yet contaminated by all the social influences and education and so forth that's going to come to affect us. And that's where we should see the essential differences between us and other animals. And I think pretty much across the board, wherever I've applied that assumption, I've learned it's wrong. And I've been surprised again and again to see that on the one hand, infants do have extremely important and useful cognitive capacities that we build on in all of our later learning. But when we compare the capacities of infants to the capacities of other animals, we see striking similarities across them. If the capacities we find in infants and animals are the seeds of our uniquely human abilities, then we can use studies of human cognitive development to ask where that uniqueness comes in and what it amounts to. And that's been a very exciting, unanticipated direction for my research.

I think we may be the only creature on Earth that can entertain truly abstract concepts, concepts that pick out entities that are very real to us, but that could never in principle be seen or acted on things like a dimensionless point in Euclidean geometry or the infinite sequence of natural numbers. These are things that come to be known by young children and that I don't think are ever known by any other animal. I think those abilities are unique to us, and I think they come from this unique propensity that we have to take everything that we know in these separate domains like number and space and objects and people, and productively combine that knowledge together to create new knowledge.
The examples she gives are worth considering in detail. The concepts of 'infinite sequence of natural numbers' or 'dimensionless point' do not exist in the physical world around us, but can be extrapolated from the first-order perceptual information coming from it. Add one banana to a banana, add one banana to two bananas, add one banana to n bananas... : there is always a bigger natural number because you can always add one, therefore they are infinite! In reality, there is an end to the bananas though (unfortunately). A non-human primate cannot do this recursive and symbolic operation of adding 1 to n in its mind; all it can do is roughly estimate the ratio or the sum of the bananas in two different trees 10. This is also what infants are capable of doing, until they learn language [26].
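The recursive operation in question is tiny once written down; here is the whole of it as a sketch (a Peano-style successor rule in Python, my own illustration) - a couple of lines of rule that take you past any bound a perceptual estimate could reach.

```python
def successor(n):
    """The single rule behind 'there is always a bigger natural number'."""
    return n + 1

def naturals():
    """Generate 0, 1, 2, ... by applying the successor rule over and over."""
    n = 0
    while True:
        yield n
        n = successor(n)

# Any perceptual estimate of 'how many bananas' is bounded;
# the rule itself is not - you can always apply successor once more.
```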

We are born with a set of innate, ancient, and developmentally invariant cognitive systems: systems of Core Knowledge, which are largely shared across many species, in particular non-human primates [27]. So the Core Knowledge Theory on its own does not provide a solution to the 'human uniqueness' problem, simply because none of these core systems is unique to humans. What is unique to humans is the productive combination of these different core systems to generate new knowledge, and according to Spelke, this depends on the acquisition and use of a natural language. Because both the core systems and the language faculty (our species-specific language capacity, as Chomsky puts it) are universal across humans, so are concepts such as the natural numbers, although they are not innate but learned as we master our mother tongue [28] (ever wondered why you still read numbers and dates in your mother tongue when reading a book in another language, although you are equally comfortable in both?).

The role of language in connecting different domains of information is still an open debate in cognitive psychology. The relationship between language and the emergence of a given cognitive ability could be causal (language is required for the skill to emerge), facultative (language speeds up a process that could occur without it), or merely correlational [26]. Nevertheless, most scientists agree that although we share a lot of basic processes with other animals, our ability to reason about higher-order relationships is what makes us unique. Animals learn about events, categorize, and act almost entirely in terms of first-order or perceptual relationships, whereas we commonly conceptualize the world and reason in terms of abstractions from first-order relationships : a process called relational reinterpretation. That is also why we do not only observe the apple falling towards the earth, but come up with unseen physical forces such as gravity [29].

I am not trying to praise the human race here, but rather to point out that even if we discard all the knowledge we acquire, there are still fundamental differences between our initial conditions and the machines'. Ignoring these differences and expecting human-like performance from an electronic tabula rasa which solely learns from experience (i.e., data) might be too much to ask for, and more importantly, it might deflect us from potential directions of research. If you draw the line between a human and a machine based on these innate tools, they become a confounder to be corrected for a fair game.

This is the idea behind the Abstraction and Reasoning Corpus (ARC) proposed by François Chollet [22]. The idea is to separate the innate and acquired aspects of our cognition, and to equip the AI with similar innate properties - called the priors - in order to fairly evaluate general intelligence between humans and machines. These priors are based on the systems of Core Knowledge from cognitive psychology, and include abstract number representations for small numbers, notions of distance and orientation, parsing the environment into distinct objects, etc. Very, very basic stuff. Keep in mind that these priors - like the core knowledge systems - are not unique to humans. Language capacity is not included in these priors, and natural language itself is considered acquired knowledge.

I think it is important at this point to differentiate what different scientists think the role of language in human cognition is, to better understand why natural language constitutes the basis of intelligence evaluation in certain challenges, whereas it is excluded on purpose in others.

Both the core knowledge systems and the language capacity are innate, whereas only the core knowledge systems are shared with other species (check the table below for a quick summary). So at this point one may ask (as I asked myself) - if the idea behind ARC is to equip the AI with innate human properties for a fair game, why is language capacity not included in Chollet's core knowledge priors? Because for Chollet - similar to Steven Pinker - language is a layer on top of cognition. It makes cognition useful, similar to how an operating system makes a computer useful - but the computer (the human mind) exists before the operating system (language) [30]. Therefore for Chollet the game is still fair even though language capacity is excluded, and the head start provided by the core knowledge priors should be sufficient to turn experience into novel skills efficiently.

                      | Innate                          | Learned (examples)
Species-specific      | Language capacity               | Natural language, concepts of natural numbers
Shared across species | Core-knowledge systems (priors) | Tool use

Summary of innate vs. acquired, shared vs. species-specific properties. 

Tasks to test this hypothesis require only the core knowledge priors (no acquired knowledge that could advantage the human over the AI) and a limited amount of training data (very limited compared to what neural networks usually require). An example task (which mostly requires basic geometry, topology, and objectness priors) is illustrated below [22].

A task where the implicit goal is to complete a symmetrical pattern. The nature of the task is specified by three input/output examples. The test-taker must generate the output grid corresponding to the input grid of the test input (bottom right) [22]. 

Looks familiar, right? Pretty much like a question from an IQ test! In fact, these tasks are inspired by Raven's Progressive Matrices [22], the format of the non-verbal questions in an IQ test, aimed at measuring general human intelligence and abstract reasoning 11.

If you wanna challenge your AI with ARC, there is a training set including 400 tasks and an evaluation set including 600. The trick is, all tasks are unique, each task consists of a very small number of demonstration examples (the task above had 3; for the whole corpus the average is 3.3), and the set of test tasks and the set of training tasks are disjoint [22]. So there is no way to artificially buy performance by sampling millions of training data points or by hardcoding additional rules based on the training set.
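To get a feel for how little "experience" that is, here is a sketch of loading a single ARC task; the layout (JSON files with 'train' and 'test' lists of input/output grids) follows the public ARC repository, and the file name below is only a placeholder.

```python
import json

# Each ARC task is a small JSON file: a handful of demonstration pairs plus test pairs.
# The path below is a placeholder for a file from the public ARC repository.
with open("data/training/some_task.json") as f:
    task = json.load(f)

demos = task["train"]   # typically ~3 input/output grid pairs
tests = task["test"]    # grids for which the solver must produce the output

for pair in demos:
    grid_in, grid_out = pair["input"], pair["output"]  # lists of lists of color indices (0-9)
    print(len(grid_in), "x", len(grid_in[0]), "->", len(grid_out), "x", len(grid_out[0]))

# That is the entire 'training set' for this task: no millions of examples to average over.
```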

 Picture adapted from Lex Fridman Podcast #120
"François Chollet: Measures of Intelligence."


And, what is the result? You can check the leaderboard on Kaggle, where the score indicates the error rate. The top score so far is 0.794 - in other words, a success rate of about 20% - whereas the tasks can be fully solved by humans. It seems that ARC is not approachable by any existing machine learning technique (including Deep Learning), due to its focus on broad generalization and few-shot learning (the practice of feeding a learning model a very small amount of training data), as well as the fact that the evaluation set only features tasks that do not appear in the training set [22].

I think this whole challenge is a great way to gain more insight into how we learn, and how our minds work. From that perspective, it is really exciting to me that we can merge concepts from cognitive psychology and AI research into an alternative, quantifiable definition of intelligence. It would be interesting to investigate further why the success rate is so low even though the core knowledge priors are explicitly described. Are we missing some other core knowledge systems necessary for completing such tasks? Or is it an AI-specific language capacity that is necessary?

Currently, the language of AI (and of any kind of computer program) involves manipulating amodal symbols [31] - meaning that the symbolic representations of cognitive states are nonperceptual, and their internal structures have no correspondence to the perceptual states that produced them [32]. They represent the substance of the cognitive state rather than its form. So an amodal system converts a perceptual state into a representational structure such as a feature list, a semantic network, etc. - similar to dimensionality reduction in a sense. Check the figure below for a better explanation.




Modal representations, as opposed to amodal representations (adapted from [31] and [32]).
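To make the contrast concrete, here is a toy sketch - entirely my own illustration, not taken from [31] or [32] - of an amodal feature list versus a modal representation that keeps some of the structure of the perceptual state it came from:

```python
import numpy as np

# A toy perceptual state: a tiny grayscale "image" of a chair-like shape.
percept = np.array([
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 0, 1],
], dtype=float)

# Amodal representation: arbitrary symbols in a feature list.
# Nothing about these tokens resembles the perceptual state above.
amodal_chair = {"category": "CHAIR", "has_back": True, "n_legs": 2}

# Modal (perceptual) representation: a schematic re-description that
# preserves structure of the percept itself, e.g. a downsampled copy.
modal_chair = percept.reshape(2, 2, 2, 2).mean(axis=(1, 3))

print(amodal_chair)
print(modal_chair)  # still structurally related to the original percept
```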

I would like to quote directly from Lawrence W. Barsalou's paper titled 'Perceptual symbol systems' [32], because honestly it is explained so nicely that I cannot put it better myself.
Because the symbols in these symbol systems are amodal, they are linked arbitrarily to the perceptual states that produce them. Similarly to how words typically have arbitrary relations to entities in the world, amodal symbols have arbitrary relations to perceptual states. Just as the word 'chair' has no systematic similarity to physical chairs, the amodal symbol for chair has no systematic similarity to perceived chairs. As a consequence, similarities between amodal symbols are not related systematically to similarities between their perceptual states, which is again analogous to how similarities between words are not related systematically to similarities between their referents. [...] Because the processing of amodal symbols is usually assumed to be entirely syntactic (based on form and not meaning), how could such a system have any sense of what its computations are about?
This is exactly what Chomsky was talking about when he said "Items of human language and thought violate the representationalist doctrine." On a similar note, this is also where the syntax vs. semantics problem - and the Chinese Room argument - arises.

Joscha Bach mentions this problem in his conference paper 'Seven Principles of Synthetic Intelligence' [31] (and also in his dissertation, which makes me feel horribly uncreative when I consider my own). He argues that AI systems will probably have to be perceptual symbol systems instead of amodal symbol systems, meaning that the components of their representations will have to be spelled out in a language that captures the richness, fluidity, heterogeneity and affordance orientation of perceptual and imaginary content [31].

The requirements discussed so far (such as abstraction, reasoning, etc.) are crucial for a strong AI to achieve a given goal. Bach goes a step further in this article and argues that the integration of goal-setting itself is also crucial if we aim for strong AI [31].
General intelligence is not only the ability to reach a given goal (and usually, there is some very specialized, but non-intelligent way to reach a singular fixed goal, such as winning a game of chess), but includes the setting of novel goals, and most important of all, about exploration.
For Bach, we are not only goal-directed systems, but we are also goal-finding systems [8].

Picture adapted from Lex Fridman Podcast #101, "Joscha Bach: Artificial Consciousness and the Nature of Reality."

This idea goes in the direction of meta-learning, or the third level of AI as Bach puts it. The first level was classical (or symbolic) AI, which worked by identifying a problem and writing an algorithm that implements a possible solution, like the initial chess engines. We have now reached the second level of AI (Deep Learning), where we write an algorithm that automates the search for an algorithm that implements the solution, like playing Go. The third level would be meta-learning, where we look for algorithms that automate the search for learning algorithms [33] - so the leap from level 2 to level 3 would be learning how to learn. According to Bach, our brain is not a learning system but a meta-learning system, where each neuron is an individual reinforcement learning agent, motivated to get fed [34].


Three Levels of AI as described by Joscha Bach (adapted from [4]).
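A crude way to picture the three levels in code - purely my own illustration of the idea, not Bach's formulation: at level 1 a human writes the solution, at level 2 a human writes a learner that searches for the solution, and at level 3 the system searches over the learning procedure itself.

```python
import random

# Level 1 - classical / symbolic AI: a human hard-codes the solution.
def hand_coded(x):
    return 2 * x + 1

# Level 2 - learning: a human writes an algorithm that searches for the
# solution from data (here, a toy hill-climbing "learner" for y = a*x + b).
def learner(data, step, iters=2000):
    (a, b), best_err = (0.0, 0.0), float("inf")
    for _ in range(iters):
        a2, b2 = a + random.gauss(0, step), b + random.gauss(0, step)
        err = sum((a2 * x + b2 - y) ** 2 for x, y in data)
        if err < best_err:
            (a, b), best_err = (a2, b2), err
    return (a, b), best_err

# Level 3 - meta-learning: search over the learning procedure itself,
# here over the learner's step size - "learning how to learn".
def meta_learner(data, step_candidates=(1.0, 0.1, 0.01)):
    return min(step_candidates, key=lambda s: learner(data, s)[1])

data = [(x, 2 * x + 1) for x in range(10)]
best_step = meta_learner(data)
print("hand-coded answer:", hand_coded(3))
print("step size chosen by the meta-learner:", best_step)
print("parameters found with it:", learner(data, best_step)[0])
```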

In the context of modeling human cognition, the idea of meta-learning presumes an agent that actively learns how to adapt to a certain environment. From this perspective, cognition cannot be investigated in an isolated way : a general theory of thinking and problem solving must incorporate the influences of motives and emotions. More importantly, all these mechanisms are connected - the modular functions we observe as parts of our cognition are all the result of one general learning system [33]. In other words, our minds are not a collection of classifiers; they represent one single function [4].

Technically speaking, our architectures should shift from modularized models - i.e., sets of combined functions, such as the compositional function approximations of Deep Learning - to a broad, general function approximation that is driven by a set of rewards [33].

To summarize Bach's ideas so far : the intelligent agent should regulate its behavior based on a set of rewards generated by its needs (like a neuron needing food and being rewarded by being fed), and should learn how to regulate its behavior based on these rewards (like a neuron learning when to fire to get fed). So the architecture should be the combination of a mind emerging over a set of regulation problems and a learning paradigm.

This is where it gets more interesting. Obviously, as Chomsky mentioned, you don't do random experiments in the lab to get your PhD. Your experiments are directed towards a hypothesis - and if you get unexpected results and observe inconsistencies with that hypothesis in your experimental outcomes, you focus in particular on the parts that constitute these inconsistencies in order to explore them further. In other words, your learning paradigm is not random but attention-based : it pays attention to the parts that need to be fixed. In terms of machine learning, attention-based learning narrowly targets a single variable instead of several layers of control - and, according to Bach, converges much faster than stochastic gradient descent.
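One crude way to picture "targeting a single variable" - my own toy analogue, not Bach's actual algorithm, and with no performance claim attached - is to compare a plain gradient step, which nudges every parameter, with an update that only touches the parameter carrying the largest error signal:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y = X @ true_w

def grad(w):
    # Gradient of the mean squared error of a linear model.
    return 2 * X.T @ (X @ w - y) / len(y)

# Plain gradient descent: adjust every parameter a little at each step.
w_gd = np.zeros(5)
for _ in range(300):
    w_gd -= 0.05 * grad(w_gd)

# "Attention"-flavoured variant: at each step, update only the single
# parameter with the largest error signal (greedy coordinate descent).
w_att = np.zeros(5)
for _ in range(300):
    g = grad(w_att)
    i = int(np.argmax(np.abs(g)))  # attend to the most "inconsistent" variable
    w_att[i] -= 0.05 * g[i]

print(np.round(w_gd, 2))
print(np.round(w_att, 2))
```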

The 'speed of learning' concerned Turing as well back in 1950. He wrote [2],
We cannot expect to find a good child-machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications
Structure of the child machine = Hereditary material
Changes of the child machine = Mutations
Natural selection = Judgment of the experimenter
One may hope, however, that this process will be more expeditious than evolution. The survival of the fittest is a slow method for measuring advantages. The experimenter, by the exercise of intelligence, should be able to speed it up. Equally important is the fact that he is not restricted to random mutations. If he can trace a cause for some weakness he can probably think of the kind of mutation which will improve it. [...] It must be given some tuition.
Interestingly enough, Turing envisioned a child machine that can become intelligent, but not intelligent enough to selectively learn - that process was still attributed to the experimenter, who can speed up the evolutionary process. It is analogous to how we get tutored when we are kids, so in a way, it makes sense - but as far as I understand, Turing never mentions an 'adult' machine that learns how and what to learn, and can speed this process up by itself.

If attention-based learning is the learning paradigm we are looking for, what is the motivation - or the need - to pursue it?

This is where Bach's way of constructing his AI architecture - called MicroPsi - connects to Dietrich Dörner's Psi Theory [35], a psychological theory of human action regulation (basically, how humans regulate their behavior through the interaction of cognitive processes, emotions, and motivations). This theory links together concepts like needs, goals, motives, and rewards in a unified way, and provides a suitable psychological basis for what Bach aimed at when building MicroPsi.

According to Psi Theory, we have two cognitive needs related to information and knowledge : the need for certainty, related to the unpredictability of the environment, and the need for competence, related to inefficacy or incapability in coping with problems [35]. So if the world doesn't behave the way we expect it to, the need for certainty arises in Psi. If uncertainty reduction becomes the actual motive, Psi starts to explore the inconsistency in order to reduce the uncertainty [35]. Similarly, the need for competence is satisfied when Psi changes either an aspect of itself or of its environment - such as learning how to regulate your heartbeat, something you normally pay no attention to unless you have arrhythmia [8]. As a result, we continuously extend and modify our model of the universe by selectively fixing the sources of our lack of certainty and lack of competence.
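As a very rough sketch of this flavour of need-driven regulation - my own toy illustration, not MicroPsi's or Dörner's actual implementation - an agent can track how badly its predictions fail (certainty) and how often its actions succeed (competence), and let whichever need is least satisfied become the active motive:

```python
# Toy sketch of Psi-flavoured need-driven motive selection
# (illustrative only; not the actual MicroPsi architecture).
class ToyPsiAgent:
    def __init__(self):
        self.certainty = 1.0   # 1.0 = the world is fully predictable
        self.competence = 1.0  # 1.0 = fully able to cope with problems

    def observe(self, prediction_error, action_succeeded):
        # Surprising observations lower certainty; failures lower competence.
        self.certainty = max(0.0, self.certainty - prediction_error)
        self.competence += 0.1 if action_succeeded else -0.1
        self.competence = min(1.0, max(0.0, self.competence))

    def active_motive(self):
        # The least satisfied need becomes the current motive.
        needs = {"reduce_uncertainty": 1 - self.certainty,
                 "gain_competence": 1 - self.competence}
        return max(needs, key=needs.get)

agent = ToyPsiAgent()
agent.observe(prediction_error=0.6, action_succeeded=True)
print(agent.active_motive())  # -> "reduce_uncertainty": explore the anomaly
```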

MicroPsi, like the other cognitive models mentioned in this post, is a work in progress. But I really appreciate the fact that it provides a unified framework for cognition, combined with other dimensions of the human condition. Since so many other approaches take a more modular view of the mind, it will be really interesting to watch how MicroPsi develops in comparison.

Part VI : Reflections
I'm sorry, Dave. I'm afraid I can't do that.

- HAL 9000
I like being confused. It is an anxious state that you want to get out of, and to do so you have to find a narrative that convinces you of a coherent resolution. In fact, you can see how confused I am from the network I made here, where I tried to connect the concepts, challenges, and the 'thinkers' mentioned in this post. It is also weirdly comforting, because you know that if you are that confused, it probably cannot get worse - it can only get better. All you need is to push yourself harder to figure it out. An annoying but fun antidote to boredom :)

Since the Turing Test was proposed in 1950, the definition and evaluation of intelligence have always had human-like components, if not been fully human-centric. This is not a surprise, since we would probably be incapable of comprehending what a non-human intelligence would be like, let alone evaluating it.

But still, there are components of human intelligence that we fail to reproduce fully in AI : deducing causal relationships, abstraction, generalization, commonsense reasoning, etc. I am using these terms without loss of generality - commonsense reasoning doesn't have to be the understanding of human reality per se, but how the world and the community you belong to operates in general - or generalization doesn't have to apply to tasks that humans are capable of, but a set of tasks that are likely to occur in all possible environments you will encounter as an entity. In that sense, we are also limited at generalizing if you think about it - because we are limited in knowing what else can come up at another corner of the universe. Still, we manage to generalize pretty well to the situations arising within our own bubble.

There is no such thing as Artificial General Intelligence. There may be such a thing as human-level AI. But human intelligence is nowhere near general.

- Yann LeCun
No matter how you look at it, all these concepts are somewhat connected to our species-specific language capacity. But I think the expressions we produce using natural language - basically the output we have to have for (somewhat) effective communication - are still a compressed version of whatever kind of language is used for the process of thinking. Think of it like doing all the calculations with 10-dimensional matrices but presenting the output as a 1-dimensional array. This compression causes information loss along the way, and therefore we are never fully understood by the person we are communicating with (if only telepathy were possible ...). In my opinion, Wittgenstein meant the same thing when he argued that the meaning of words is in their use, which makes 'meaning' circumstantial. The idea of using a richer, higher-dimensional symbol system (like the perceptual symbol systems proposed by Barsalou and Bach) might be the 'spark' that philosophers and scientists find missing when they argue that the mind is more than mere computation. In the end, we also invent new words for thoughts that didn't correspond to anything in natural language before - results of higher-dimensional operations that correspond to a new value when projected onto the lower-dimensional space of natural language? This analogy might also resolve Lovelace's argument on novelty and imagination - it might be possible to do more operations with a richer language (or in a higher-dimensional space) that can provide an abstraction of sorts, leading to the birth of original ideas.

One thought experiment I want to propose in this regard is whether an AI in its current form could come up with the concept of infinity. Recursion is something the simplest computer program is capable of, and there is no need for a richer language to add one more "one" to a number. But how exactly did the idea of "there is no end to this" originate? I doubt it originated in the context of mathematics, to be honest; it more likely has a theological basis. I was not able to find a good source on which predates the other, and I guess even if an AI could come up with the concept of infinity, it would be out of the need to solve complex problems in theoretical physics or the like (as with our need to explain the universe through God and its infinite existence). Of course, another problem is to have a proper way of communication to check that it came up with a concept that matches the concept of infinity (not the word) we have, and so on. In any case, this inter-contextual mapping of ideas seems essential to novelty, and is pretty interesting to think about.
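To make the recursion point concrete: the mechanical part of "one more" is trivially expressible in code, as in the sketch below; what the code does not contain, by itself, is any notion that the process has no end.

```python
def successor(n):
    # The mechanics of "one more": trivially expressible in any language.
    return n + 1

n = 0
while n < 1_000_000:   # the program happily keeps counting...
    n = successor(n)
print(n)               # ...but nothing here "knows" the process never has to stop
```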

Is the human mind a computational machine? Maybe not in the way computation is done today - with digital computers, first-order binary logic, and amodal symbol systems. But to me, the fact that we fail to emulate thinking fully with our current modeling approaches doesn't mean that thinking is not computational. We started with a 3-dimensional model of the universe, until we added time as the 4th dimension to explain further phenomena (and we keep adding and adding, ending up with 10 dimensions for string theory). My point is : models work until they don't. But if we are aware that a hypothesis can be modelled in various ways, the failure of one candidate model is not sufficient - at least in my opinion - to reject that hypothesis altogether.

Another useful paradigm shift could be putting technological constraints back into the picture - at least imposing them hypothetically. With current technology it is possible - and easy - to train an engine on millions of training examples. So why not try more, and more, and more? No wonder the wishful thinking points towards more compute and more data as the solution to the intelligence problem. But we know that learning should be possible without exhausting such resources, because we, at least, can do it. There is a very nice excerpt I would like to share here from Rollo May's book, 'The Courage to Create' [36]. He writes about a panel discussion he attended in New York on the human prospect.
The audience was made up of seven or eight hundred eager individuals, expectantly set for an interesting discussion at the very least. In his opening remarks, the chairman emphasized the theme that "the possibilities of the human being are unlimited."

But strange to say, there seemed, as the meeting went on, to be no problems to discuss. The vast vacuum filling the room was felt by both the panel and the audience alike. All the exciting issues that the participants on the panel had approached so eagerly had mysteriously vanished. As the discussion limped along to the end of an almost fruitless evening, the common question seemed to be: What had gone wrong?

I propose that the statement, "human possibilities are unlimited" is de-energizing. If you take it at face value, there is no real problem anymore. You can only stand up and sing hallelujah and then go home. Every problem will sooner or later be overcome by these unlimited possibilities; there remain only temporary difficulties that will go away of their own accord when the time comes. Contrary to the chairman’s intention, statements like his actually terrorize the listener: it is like putting someone into a canoe and pushing him out into the Atlantic toward England with the cheery comment, “The sky’s the limit.” The canoer is only too aware of the fact that an inescapably real limit is also the bottom of the ocean.

In these notes I shall explore the hypothesis that limits are not only unavoidable in human life, they are also valuable. I shall also discuss the phenomenon that creativity itself requires limits, for the creative act arises out of the struggle of human beings with and against that which limits them.
There is real value in constraints when it comes to problem solving. That is also what I love about the ARC tasks and the concept of attention-based learning. Both arise from the idea that efficiency is essential to intelligence - we have neither the time nor the energy to try every possible way of moving in 3-dimensional space when we are learning how to walk.

The complexity of human cognition (and the fact that I am in love with Turing) aside, I think the Turing Test - and other challenges that are more human and natural-language centric - are still useful research directions to pursue. In the end, we want AI to solve the theory of everything, but we also want it to play Secret Hitler with us. Siri and Alexa should be more than devices that can set up meetings or entertain us when we ask them nonsense questions. But anything that will bring AI to the level of comprehension we have necessitates human interaction. You could not interpret the scenes from Spartacus if you were born into a society with different moral values. As Lex Fridman proposes, we should extend what Turing proposed into an Ex Machina Turing Test, where the test itself is not a one-time conversation but the whole course of the evolution of the human-AI relationship. I am really looking forward to seeing such revolutionary steps in the field of AI, and I really, really hope that some of it happens within my lifetime.

Footnotes

1 We all think of the imitation game as a machine imitating a human, but the original test Turing proposed in his 1950 paper was actually sex-biased and, in my opinion, much better designed. In the original test, there is one man and one computer, both trying to trick the judge into believing that they are a woman - something that neither the man nor the machine is. So the original test is actually a controlled experiment - it compares the respective imitation capabilities of a machine and a human at tricking another human into believing that they are something they are not.[back to text]


2 The idea that the ability to use language is the hallmark of a thinking being has a long history. Descartes famously declared that it is "not conceivable that... a machine should produce different arrangements of words so as to give an appropriately meaningful answer to whatever is said in its presence, as the dullest of men can do." [2] On the other hand, language of thought is potentially misleading, since it suggests a non-existent restriction to higher-level mental activity [3].[back to text]


3 For those who are interested, this is called the Lucas-Penrose Constraint.[back to text]


4 Note the definition of 'digital', though. Searle doesn't argue that we can never build 'a machine' with understanding. He says digital computers will never have understanding because they are only syntax-manipulating machines, and syntax alone is never enough for semantics. This is called the 'Many Mansions Reply' : the problems raised by the Chinese Room argument only exist because of the present state of technology; someday we'll be able to build devices that reproduce the causal processes involved in intentionality, and at that time we will be able to explain it. For more details on the replies to the Chinese Room argument, check the following link. [back to text]


5 Chomsky calls this phenomenon of astronomical growth 'The Galilean Challenge', referring to the Galileo's quote on the nature of human language : “from 25 or 30 letters an infinite variety of expressions, which although not having any resemblance in themselves to that which passes through our minds, nevertheless do not fail to reveal all of the secrets of the mind, and to make intelligible to others who cannot penetrate into the mind all that we conceive and all of the diverse movements of our souls.“ [1][back to text]


6 This test is called "mini" also because it only tests for the ability of causal reasoning, excluding other cognitive abilities of humans.[back to text]


7 "solely based on correlations" is used instead of "the Ladder of Causation" here, since I haven't mentioned the latter concept.[back to text]


8 Researchers claim that this can be partially fixed with the "yo be real" prompt, but it is still far from being perfect.[back to text]


9 There is no end to talking about what GPT-3 can and can't do, so if you want to look further into this, here are some great blog posts about conversations with GPT-3 : LINK#1, LINK#2, LINK#3.[back to text]


10 For those who are interested, this is called the Approximate Number System (ANS).[back to text]


11 I used the term 'general intelligence' pretty recklessly here, but there is a huge subfield of psychology, called psychometrics, that tries to understand the structure of human intelligence. The term 'general intelligence', or its measure 'the g-factor', is a construct summarizing the positive correlations among different cognitive tasks, reflecting the fact that an individual's performance on one type of cognitive task tends to be comparable to their performance on other kinds of cognitive tasks. This assumption is quite relevant to testing an AI's ability to generalize rather than its performance on single tasks. It is also the reason why we attribute human-like intelligence to a chess engine, as mentioned in the Deep Blue example.[back to text]


References

[1] Chomsky, Noam. "The galilean challenge." Inference: International Review of Science 1 (2017): 1-13.
[2] Turing, Alan M. "Computing machinery and intelligence." In Parsing the Turing Test, pp. 23-65. Springer, Dordrecht, 2009.
[3] Rescorla, Michael, "The Computational Theory of Mind", The Stanford Encyclopedia of Philosophy (Fall 2020 Edition), Edward N. Zalta (ed.), URL.
[4] Bach, Joscha, "Lecture: The Ghost in the Machine", 35th Chaos Communication Congress, URL.
[5] Wittgenstein, Ludwig. Philosophical investigations. John Wiley & Sons, 2009.
[6] Numberphile, Gödel's Incompleteness Theorem URL.
[7] Livingston, Paul M. "Wittgenstein, Turing, and the “Finitude“ of Language." Linguistic and Philosophical Investigations 9 (2010): 215-247.
[8] Simulation #409, "Dr. Joscha Bach - Conscious Machines", URL.
[9] Lex Fridman Podcast #53, "Noam Chomsky: Language, Cognition, and Deep Learning", URL.
[10] Harari, Yuval Noah. Sapiens: A brief history of humankind. Random House, 2014.
[11] Chomsky, Noam. "The language capacity: architecture and evolution." Psychonomic bulletin & review 24, no. 1 (2017): 200-203.
[12] Lex Fridman Podcast #65, "Daniel Kahneman: Thinking Fast and Slow, Deep Learning, and AI", URL.
[13] Pearl, Judea, and Dana Mackenzie. The book of why: the new science of cause and effect. Basic Books, 2018.
[14] Baillargeon, Renee. "Physical reasoning in infancy." The cognitive neurosciences (1995): 181-204.
[15] Davis, Ernest, and Gary Marcus. "Commonsense reasoning and commonsense knowledge in artificial intelligence." Communications of the ACM 58, no. 9 (2015): 92-103.
[16] Marcus, Gary. "The next decade in AI: four steps towards robust artificial intelligence." arXiv preprint arXiv:2002.06177 (2020).
[17] Paritosh, Praveen, and Gary Marcus. "Toward a comprehension challenge, using crowdsourcing as a tool." AI Magazine 37, no. 1 (2016): 23-30.
[18] Lex Fridman Podcast #43, "Gary Marcus: Toward a Hybrid of Deep Learning and Symbolic AI", URL.
[19] Sakaguchi, Keisuke, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. "Winogrande: An adversarial winograd schema challenge at scale." In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 05, pp. 8732-8740. 2020.
[20] Musso, Mariacristina, Andrea Moro, Volkmar Glauche, Michel Rijntjes, Jürgen Reichenbach, Christian Büchel, and Cornelius Weiller. "Broca's area and the language instinct." Nature neuroscience 6, no. 7 (2003): 774-781.
[21] Harris, Teka J., "Expert Columns: Generalization", URL.
[22] Chollet, François. "On the measure of intelligence." arXiv preprint arXiv:1911.01547 (2019).
[23] Newell, Allen. "You can't play 20 questions with nature and win: Projective comments on the papers of this symposium.", 1973.
[24] GPT-3 Creative Fiction, URL.
[25] Lex Fridman Podcast, "Turing Test: Can Machines Think?", URL.
[26] Rosati, Alexandra G., Victoria Wobber, Kelly Hughes, and Laurie R. Santos. "Comparative developmental psychology: How is human cognitive development unique?." Evolutionary Psychology 12, no. 2 (2014): 147470491401200211.
[27] Spelke, Elizabeth S., and Katherine D. Kinzler. "Core knowledge." Developmental science 10, no. 1 (2007): 89-96.
[28] Spelke, Elizabeth S. "Core knowledge, language, and number." Language Learning and Development 13, no. 2 (2017): 147-170.
[29] Shettleworth, Sara J. "Modularity, comparative cognition and human uniqueness." Philosophical Transactions of the Royal Society B: Biological Sciences 367, no. 1603 (2012): 2794-2802.
[30] Lex Fridman Podcast #120, "François Chollet: Measures of Intelligence", URL.
[31] Bach, Joscha. "Seven Principles of Synthetic Intelligence." In Artificial General Intelligence 2008: Proceedings of the First AGI Conference, edited by Pei Wang, Ben Goertzel, and Stan Franklin, pp. 63-74. 2008.
[32] Barsalou, Lawrence W. "Perceptual symbol systems." Behavioral and brain sciences 22, no. 4 (1999): 577-660.
[33] Bach, Joscha. "Attention Based Learning as a Foundation for Conscious Agents."
[34] Lex Fridman Podcast #101, "Joscha Bach: Artificial Consciousness and the Nature of Reality", URL.
[35] Dörner, Dietrich, and C. Dominik Güss. "PSI: A computational architecture of cognition, motivation, and emotion." Review of General Psychology 17, no. 3 (2013): 297-317.
[36] May, Rollo. The courage to create. WW Norton & Company, 1994.