I come at this from a somewhat knowledgeable perspective, since I spent a number of years working in "Artificial Intelligence," or "AI." In the early-to-mid '80s, industry spawned a tiny AI bubble (by today's standards): the age of the "expert system." Through some interesting happenstance I found myself in the middle of it, at a company called Intellicorp, one of the "big 3" of the burgeoning AI industry. Our goal was to take software to the next step, building systems that facilitated and augmented expert reasoning about specific kinds of problems in manufacturing, finance, power plant operation, and even molecular biology.
AI has been a part of computer science (CS) for many years - one might even say from the beginning. A machine that (who?) thinks has been the holy grail of CS. Even in the '50s it was posited that a machine showing the intelligence and problem-solving ability of a human was just around the corner - perhaps within a decade. But two interesting things happened. Scientists began to really try to tackle common problem solving computationally, and they also began to make progress in understanding how the brain works. In both of these gigantic disciplines one thing became clear - the problems were a whole lot more complex than anyone thought.
The early strategy of AI was to build systems that exhibited the ability to reason. But what is "reasoning" in the context of a computer system? When most people thought of reasoning they thought about rules of good judgement, or "heuristics." Heuristics were generally grounded in formal logic systems that could be proven mathematically to generate sound conclusions from a set of facts (represented in a well-defined way) and an "inference engine" that embodied the formal system. Feed a system enough facts, plus rules to operate on them, and you will get good answers about those facts and relations out the other end. And this stuff worked, at least to a degree. How did it differ from regular computer programs? The most significant difference was that the facts and rules were kept separate from the engine that manipulated them. That meant it was possible to add to a system's knowledge base without touching the underlying code. There were other differences, of course, but in the end everything depended on building these "knowledge bases." When a knowledge base was built to describe a particular domain or part of the world it was called an "ontology," and some elaborate ontologies were built.
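To make that separation concrete, here is a toy sketch of a forward-chaining rule engine (in modern Python, not anything we actually used back then). The facts and rules are plain data, invented for illustration; the engine that applies them knows nothing about the domain they describe.

    # A minimal forward-chaining rule engine: facts and rules are data,
    # and the engine below knows nothing about the domain they describe.

    facts = {("socrates", "is_a", "human")}

    # Each rule: if the premise pattern matches a fact, assert the conclusion.
    rules = [
        {"if": ("?x", "is_a", "human"), "then": ("?x", "is", "mortal")},
    ]

    def match(pattern, fact):
        """Return variable bindings if the fact matches the pattern, else None."""
        bindings = {}
        for p, f in zip(pattern, fact):
            if p.startswith("?"):
                bindings[p] = f
            elif p != f:
                return None
        return bindings

    def substitute(pattern, bindings):
        """Fill a pattern's variables using the given bindings."""
        return tuple(bindings.get(p, p) for p in pattern)

    def run(facts, rules):
        """Apply rules repeatedly until no new facts can be derived."""
        changed = True
        while changed:
            changed = False
            for rule in rules:
                for fact in list(facts):
                    bindings = match(rule["if"], fact)
                    if bindings is not None:
                        new_fact = substitute(rule["then"], bindings)
                        if new_fact not in facts:
                            facts.add(new_fact)
                            changed = True
        return facts

    print(run(facts, rules))
    # Adding knowledge means adding facts or rules; the engine is untouched.

Adding to the system's knowledge means appending to the facts and rules; the inference code itself never changes, which is the whole point of keeping them separate.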
What we did at Intellicorp was to follow this path of AI. We built a software "platform" that can be likened to a computational workshop filled with interesting and powerful software tools designed to facilitate capturing and encoding human knowledge in a way that allowed the computer to manipulate that knowledge logically. Because of its flexibility we were able to build systems that, even if they didn't replace expert judgement, assisted experts in doing their jobs. Working with a large bank in New York, for example, we built a system that let arbitrageurs see news feeds with settable alarms that watched for events relevant to their trading. All this may seem ho-hum now, but remember this was pre-WWW (1985 or so).
It seemed that reasoning systems were ready to take over the world. But the more we built these systems, the more we realized their limitations. They were termed "brittle" because they fell apart quickly when presented with problems that fell outside the system's knowledge base. They simply had to be taught everything, and that is not only laborious but simply impossible. The problem even had a name: the knowledge acquisition problem. A few people held onto the hope that the brittleness of these systems could be overcome by tackling the problem of commonsense reasoning - the kind of reasoning you and I do all the time. Some thought it was possible to encode this commonsense knowledge as logical statements that could be called into play when needed - statements like "when a person is dead they tend to stay dead" or "a person's date of birth must precede their date of death." There are obviously a lot of such statements, and not many people saw much hope of formally capturing and using them. Symbolic AI - that is, systems grounded in formal logic - began to languish and fall from favor. The AI we knew had been over-hyped by its proponents, and an AI winter fell.
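To give a flavor of what encoding commonsense as a logical statement looked like, here is one of those statements written as a trivial check in Python. The point is not that any single statement is hard to write - it isn't - it's that a system would need millions of them.

    from datetime import date

    def birth_precedes_death(person):
        """Commonsense constraint: a person's date of birth must precede
        their date of death (if they have one)."""
        born, died = person.get("born"), person.get("died")
        return died is None or born < died

    person = {"name": "Vincent van Gogh",
              "born": date(1853, 3, 30), "died": date(1890, 7, 29)}
    assert birth_precedes_death(person)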
Though the bubble burst and the major AI players either shut down or became something else, the software engineering world learned a lot from these early experiments, and embedded "expert systems," not unlike those we built, are part of many online systems today. Still, there was this nagging hope among researchers that we could build an intelligent system - one that could answer many different kinds of questions in a knowledgeable way. Moreover, that we could build a real artificial intelligence - one that knew what it didn't know as well as what it knew. How?
During the AI winter, researchers began looking at the problem of reasoning a bit differently. Instead of trying to represent knowledge formally using logic, they began to apply statistical and other mathematical approaches to look for patterns in large bodies of data. Machine learning algorithms were being intensively studied and developed. They were thought to be a solution to the "knowledge acquisition" problem: the algorithms could learn by example. What does that mean?
Think of the problem of visual reasoning: I see a picture of a face and recognize it. How can we encode, in logical form, the knowledge of what makes a particular face look the way it does (even under different lighting and perspective!)? It was a tough problem until machine learning algorithms (and their implementation through neural nets, hidden Markov models, etc.) came on the scene. Now I could "train" an algorithm with lots of examples of faces and then, after enough training, test the system with an image it had never seen before and ask it to find similar faces. And it worked - often really, really well.
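Here is a toy sketch of that train-then-query idea in Python: a nearest-neighbor lookup over made-up feature vectors, standing in for whatever features a real face recognizer would extract.

    import math

    def distance(a, b):
        """Euclidean distance between two feature vectors."""
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # "Training" here is just storing labeled example vectors; a real system
    # would learn the features themselves from many images.
    training_examples = [
        ("alice", [0.9, 0.1, 0.4]),
        ("bob",   [0.2, 0.8, 0.7]),
        ("carol", [0.5, 0.5, 0.1]),
    ]

    def most_similar(query, examples):
        """Return the label of the stored example closest to the query."""
        return min(examples, key=lambda ex: distance(query, ex[1]))[0]

    unseen = [0.85, 0.15, 0.35]                    # an image the system never saw
    print(most_similar(unseen, training_examples))  # -> 'alice'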
Now the question was how this approach would generalize to different cases. Again, success. It generalizes very well, and the techniques of reasoning through machine learning algorithms have been employed in just about every discipline. With Jeopardy, these algorithms were going to change the game again. But Jeopardy is a nearly wide-open world of knowledge. This would be a real challenge.
I don't know which parts of Watson employed machine learning, but I suspect it was the underlying approach for all of Watson's reasoning. The beauty of these algorithms is that they can get better with time all by themselves, without significant intervention. (Of course it isn't as simple as that. As with most things, the devil is in the details.) Watson can get smarter on its own - simply by playing Jeopardy and noticing when it is wrong and what the right answer is. Again, it has the capacity to generalize.
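To illustrate what "noticing when it is wrong" might look like mechanically - and this is my own toy sketch, not Watson's actual mechanism - imagine keeping a confidence weight for each evidence source and nudging it up or down based on feedback:

    # Toy feedback loop: nudge the weight of each evidence source up or down
    # depending on whether the answer it supported turned out to be correct.
    weights = {"keyword_match": 0.5, "date_lookup": 0.5, "category_hint": 0.5}
    LEARNING_RATE = 0.1

    def update(weights, sources_used, was_correct):
        """Reward or penalize the sources that contributed to an answer."""
        delta = LEARNING_RATE if was_correct else -LEARNING_RATE
        for source in sources_used:
            weights[source] = min(1.0, max(0.0, weights[source] + delta))

    # After each clue, feed back whether the produced answer was right.
    update(weights, ["keyword_match", "category_hint"], was_correct=True)
    update(weights, ["date_lookup"], was_correct=False)
    print(weights)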
All of this makes me ask questions about how we work. Is there any similarity to how Watson works? I really hesitate to draw analogies between silicon/software and neurons, although there are interesting parallels. But one extraordinary ability we have is to explain our reasoning about things. And I believe humans have invented a system (formal logic) that works for explaining things as if that is how we originally conceived of the ideas. But I don't think it is - at least not entirely.
So, if you asked Watson how it arrived at a particular answer, I'd bet all it could say is that the sparse n-dimensional matrix representing the question came close, by some similarity measure, to the representation of some set of words in a large search space. A human might invent an explanation: "because the question implied that the answer would be a 19th-century European painter, and it used the words 'insanity' and 'sunflower.' Van Gogh was an insane 19th-century European painter who painted a field of sunflowers." Even so, they probably just "saw" Van Gogh in their mind's eye. Logic and facts are great for explanation, but we probably don't reason with those tools. We have evolved to "see patterns."
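My guess at the flavor of that "came close" computation is something like cosine similarity between sparse word-count vectors. A toy version in Python (the vectors and candidates are invented; this is not Watson's actual algorithm):

    import math

    def cosine_similarity(a, b):
        """Similarity between two sparse word-count vectors (dicts)."""
        dot = sum(a[w] * b.get(w, 0) for w in a)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b)

    clue = {"insanity": 1, "sunflower": 1, "painter": 1, "19th": 1, "century": 1}
    candidates = {
        "Van Gogh": {"painter": 3, "sunflower": 2, "insanity": 1, "19th": 1, "century": 1},
        "Monet":    {"painter": 3, "garden": 2, "19th": 1, "century": 1},
    }

    # Pick the candidate whose word profile lies closest to the clue's.
    best = max(candidates, key=lambda name: cosine_similarity(clue, candidates[name]))
    print(best)   # -> 'Van Gogh'

The "explanation" such a system can give is just the score: Van Gogh's profile happened to point in nearly the same direction as the clue's.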
But is intelligence all pattern recognition? Planning is something that does not fall under the umbrella of pattern recognition. For example, I would like to design an artificial intelligence that performed the tasks of a travel agent. I would like to be able to tell it to construct an itinerary for a two-week trip somewhere in Europe for me and my wife. Could you build such an agent using machine learning? I don't think so. How would an intelligent system determine when to go? That would require knowing something about my future commitments. Determining the "when to go" part also reflects what I want to do there, which itself implies knowing something about my interests. The point is that intelligence often requires knowing how to choreograph a set of actions to satisfy a goal, and how to re-plan if things go awry along the way. In AI that is called "planning." Machine learning does not do planning, but people do.
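Here is a toy sketch of what I mean by planning, as distinct from pattern matching: search for a sequence of actions whose preconditions and effects chain together to satisfy a goal. The actions and state names are invented for illustration.

    from collections import deque

    # Each action: preconditions that must hold, and effects it adds to the state.
    ACTIONS = {
        "pick_dates":   {"pre": set(),                              "adds": {"dates_chosen"}},
        "book_flights": {"pre": {"dates_chosen"},                   "adds": {"flights_booked"}},
        "book_hotels":  {"pre": {"dates_chosen"},                   "adds": {"hotels_booked"}},
        "plan_visits":  {"pre": {"flights_booked", "hotels_booked"}, "adds": {"itinerary_ready"}},
    }

    def plan(start, goal):
        """Breadth-first search for a sequence of actions reaching the goal."""
        frontier = deque([(frozenset(start), [])])
        seen = {frozenset(start)}
        while frontier:
            state, steps = frontier.popleft()
            if goal <= state:
                return steps
            for name, action in ACTIONS.items():
                if action["pre"] <= state:
                    new_state = frozenset(state | action["adds"])
                    if new_state not in seen:
                        seen.add(new_state)
                        frontier.append((new_state, steps + [name]))
        return None

    print(plan(set(), {"itinerary_ready"}))
    # -> ['pick_dates', 'book_flights', 'book_hotels', 'plan_visits']

Re-planning, in this picture, is just running the search again from whatever state the world is actually in when something goes awry.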
February 22, 2011