Source: Nautilus, Sep 2016
Having built a new intelligence from scratch, scientists are now taking it apart, applying to these virtual organisms the digital equivalents of a microscope and scalpel.
To generate those pictures, Yosinski’s procedure relies on a statistical constraint (called a natural image prior) that confines it to producing images that match the sorts of structure that one finds in pictures of real-world objects. When he removes those rules, the toolkit still settles on an image that it labels with maximum confidence, but that image is pure static. In fact, Yosinski has shown that in many cases, the majority of images that AlexNet neurons prefer appear to humans as static. He readily admits that “it’s pretty easy to figure out how to make the networks say something extreme.”
In 1996, Adrian Thompson of Sussex University used software to design a circuit by applying techniques similar to those that train deep networks today. The circuit was to perform a straightforward task: discriminate between two audio tones. After thousands of iterations, shuffling and rearranging circuit components, the software found a configuration that performed the task nearly perfectly.
Thompson was surprised, however, to discover that the circuit used fewer components than any human engineer would have used—including several that were not physically connected to the rest, and yet were somehow still necessary for the circuit to work properly.
He took to dissecting the circuit. After several experiments, he learned that its success exploited subtle electromagnetic interference between adjacent components. The disconnected elements influenced the circuit by causing small fluctuations in local electrical fields. Human engineers usually guard against these interactions, because they are unpredictable. Sure enough, when Thompson copied the same circuit layout to another batch of components—or even changed the ambient temperature—it failed completely.
The circuit exhibited a hallmark feature of trained machines: They are as compact and simplified as they can be, exquisitely well suited to their environment—and ill-adapted to any other. They pick up on patterns invisible to their engineers; but can’t know which of those patterns exist nowhere else. Machine learning researchers go to great lengths to avoid this phenomenon, called “overfitting,” but as these algorithms are used in more and more dynamic situations, their brittleness will inevitably be exposed.
Arora points to two problems that could represent hard limits on the capabilities of machines in the absence of interpretability. One is “composability”—when the task at hand involves many different decisions (as with Go, or self-driving cars), networks can’t efficiently learn which are responsible for a failure. “Usually when we design things, we understand the different components and then we put them together,” he says. This allows humans to adjust components that aren’t appropriate for a given environment.
The other problem with leaving interpretability unsolved is what Arora calls “domain adaptability”—the ability to flexibly apply knowledge learned in one setting to another. This is a task human learners do very well, but machines can fail at in surprising ways. Arora describes how programs can be catastrophically incapable of adjusting to even subtle contextual shifts of the sort that humans handle with ease. For instance, a network trained to parse human language by reading formal documents, like Wikipedia, can fail completely in more vernacular settings, like Twitter.
By this view, interpretability seems essential. But do we understand what we mean by the word? Pioneering computer scientist Marvin Minsky coined the phrase “suitcase word” to describe many of the terms—such as “consciousness” or “emotion”—we use when we talk about our own intelligence.9 These words, he proposed, reflect the workings of many different underlying processes, which are locked inside the “suitcase.”
As long as we keep investigating these words as stand-ins for the more fundamental concepts, the argument went, our insight will be limited by our language. In the study of intelligence, could interpretability itself be such a suitcase word?
Massimo Pigliucci, a professor of philosophy at City University of New York, cautions that “understanding” in the natural sciences—and, by extension, in artificial intelligence—might be what Ludwig Wittgenstein, anticipating Minsky, called a “cluster concept,” one which could admit many, partly distinct, definitions. If “understanding” in this field does come, he says, it could be of the sort found not in physics, but evolutionary biology. Rather than Principia, he says, we might expect Origin of the Species.