Source: HBR, Jan 2017
… how does deep learning work?
Deep learning systems are modeled after the neural networks in the neocortex of the human brain, where higher-level cognition occurs. In the brain, a neuron is a cell that transmits electrical or chemical information. When connected with other neurons, it forms a neural network. In machines, the neurons are virtual—basically bits of code running statistical regressions. String enough of these virtual neurons together and you get a virtual neural network. Think of every neuron in the network below as a simple statistical model: it takes in some inputs, and it passes along some output.
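As a rough illustration of the “simple statistical model” idea, a single virtual neuron can be sketched in Python as a weighted sum of its inputs passed through a squashing function. The sigmoid activation, the function name `neuron`, and the sample numbers are assumptions chosen for the example, not details from the article.

```python
import math

def neuron(inputs, weights, bias):
    """One virtual neuron: a weighted sum of the inputs squashed into (0, 1)."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))  # sigmoid (an assumed choice)

# Hypothetical inputs and weights, purely for illustration.
output = neuron(inputs=[0.5, -1.0, 2.0], weights=[0.8, 0.2, -0.5], bias=0.1)
print(round(output, 3))
```

Stringing many such functions together, with the outputs of some serving as the inputs of others, gives the virtual neural network described above.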
For a neural network to be useful, though, it requires training. To train a neural network, a set of virtual neurons is mapped out and each is assigned a random numerical “weight,” which determines how the neurons respond to new data (digitized objects or sounds). As in any statistical or machine learning method, the machine initially gets to see the correct answers, too. So if the network doesn’t accurately identify the input (doesn’t see a face in an image, for example), the system adjusts the weights, i.e., how much attention each neuron paid to the data, in order to produce the right answer. Eventually, after sufficient training, the neural network will consistently recognize the correct patterns in speech or images.
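The adjust-the-weights loop can be sketched with the classic perceptron update rule, which is one of the simplest ways to do it; modern deep networks typically rely on gradient-based methods instead. The task (logical AND), the learning rate, and all function names below are illustrative assumptions.

```python
import random

def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train(examples, learning_rate=0.1, max_epochs=1000, seed=0):
    """Start from random weights and nudge them after every wrong answer."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in examples[0][0]]
    bias = rng.uniform(-1, 1)
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)  # -1, 0, or +1
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:  # a full pass with every example answered correctly
            break
    return weights, bias

# Labeled examples: the machine "gets to see the correct answers" (logical AND).
examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(examples)
predictions = [predict(weights, bias, inputs) for inputs, _ in examples]
print(predictions)
```

Because the AND examples can be separated by a straight line, this rule is guaranteed to settle on weights that answer all four correctly; harder patterns need more neurons and more layers.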
The idea of artificial neurons has been around for at least 60 years: in the 1950s, Frank Rosenblatt built a “perceptron” made of motors, dials, and light detectors, which he successfully trained to tell the difference between basic shapes. But early neural networks were extremely limited in the number of neurons they could simulate, which meant they couldn’t recognize complex patterns. Three developments in the last decade made deep learning viable.
First, Geoffrey Hinton and other researchers at the University of Toronto developed a breakthrough method for software neurons to teach themselves by layering their training. (Hinton now splits his time between the University of Toronto and Google.) A first layer of neurons will learn how to distinguish basic features, say, an edge or a contour, by being blasted with millions of data points. Once the layer learns how to recognize these things accurately, its output gets fed to the next layer, which trains itself to identify more complex features, say, a nose or an ear. Then that layer’s output gets fed to another layer, which trains itself to recognize still greater levels of abstraction, and so on, layer after layer (hence the “deep” in deep learning), until the system can reliably recognize very complex phenomena, like a human face.
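The stacking idea, in which each layer consumes the previous layer’s output, can be sketched as a plain forward pass. This sketch shows only how layers compose; it does not implement the layer-by-layer training procedure described above, and the weights, layer sizes, and sigmoid activation are made-up assumptions.

```python
import math

def layer(inputs, weight_rows, biases):
    """One layer: every neuron sees all the inputs and emits one activation."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(weight_rows, biases)
    ]

def forward(inputs, layers):
    """Feed each layer's output into the next layer, front to back."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations

# Hypothetical two-layer network with hand-picked (not trained) weights.
network = [
    ([[0.5, -0.5, 0.1], [0.3, 0.8, -0.2]], [0.0, 0.1]),  # 3 inputs -> 2 features
    ([[1.0, -1.0]], [0.0]),                               # 2 features -> 1 output
]
result = forward([1.0, 0.0, 1.0], network)
print(result)
```

Adding more entries to `network` makes the stack deeper, which is the sense of “deep” the paragraph above refers to.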
The second development responsible for recent advancements in AI is the sheer amount of data that is now available. Rapid digitization has resulted in the production of large-scale data, and that data is oxygen for training deep learning systems. Children can pick something up after being shown how to do it just a few times. AI-powered machines, however, need to be exposed to countless examples.
Deep learning is essentially a brute-force process for teaching machines how a thing is done or what a thing is. Show a deep learning neural network 19 million pictures of cats and probabilities emerge, inclinations are ruled out, and the software neurons eventually figure out what statistically significant factors equate to feline. It learns how to spot a cat. That’s why Big Data is so important—without it, deep learning just doesn’t work.
Finally, a team at Stanford led by Andrew Ng (now at Baidu) made a breakthrough when they realized that graphics processing unit chips, or GPUs, which were invented for the visual processing demands of video games, could be repurposed for deep learning. Until recently, typical computer chips could process only one event at a time, but GPUs were designed for parallel computing. Using these chips to run neural networks, with their millions of connections, in parallel accelerated the training of deep learning systems, and with it their capabilities, by several orders of magnitude. It made it possible for a machine to learn in a day something that had previously taken many weeks.
The most advanced deep learning networks today are made up of millions of simulated neurons, with billions of connections between them, and can be trained through unsupervised learning. Deep learning is the most effective practical application of artificial intelligence yet devised. For some tasks, the best deep learning systems are pattern recognizers on par with people. And the technology is moving aggressively from the research lab into industry.
Deep Learning OS 1.0
As impressive as the gains from deep learning have been already, these are early days. If I analogize it to the personal computer, deep learning is in the green-and-black-DOS-screen stage of its evolution. A great deal of time and effort, at present, is being spent doing for deep learning—cleaning, labelling, and interpreting data, for example—rather than doing with deep learning. But in the next couple of years, start-ups and established companies will begin releasing commercial solutions for building production-ready deep learning applications. Making use of open-source frameworks such as TensorFlow, these solutions will dramatically reduce the effort, time, and costs of creating complex deep learning systems. Together they will constitute the building blocks of a deep learning operating system.
A deep learning operating system will permit the widespread adoption of practical AI. In the same way that Windows and Mac OS allowed regular consumers to use computers and SaaS gave them access to the cloud, tech companies in the next few years will democratize deep learning. Eventually, a deep learning OS will allow people who aren’t computer scientists or natural language processing researchers to use deep learning to solve real-life problems, like detecting diseases instead of identifying cats.
The first new companies making up the deep learning operating system will be working on solutions in data, software, and hardware.
Data. Getting good-quality, large-scale data is the biggest barrier to adopting deep learning. But both service shops and software platforms will arise to deal with the data problem. Companies are already creating internal intelligent platforms that help humans label data quickly. Future data labeling platforms will be embedded in the design of the application, so that the data created by using a product will be captured for training purposes. And there will be new service-based companies that will outsource labeling to low-cost countries, as well as create labeled data through synthetic means.
Software. There are two main areas here where I see innovation happening:
1) The design and programming of neural networks. Different deep learning architectures, such as CNNs and RNNs, support different types of applications (image, text, etc.). Some use a combination of neural network architectures. As for training, many applications will use a combination of machine learning algorithms, deep learning, reinforcement learning, or unsupervised learning for solving different sub-parts of the application. I predict that someone will build a machine learning design engine solution, which will examine an application, training data set, infrastructure resources, and so on, and recommend the right architecture and algorithms to be used.
2) A marketplace of reusable neural network modules. As described above, different layers in a neural network learn different concepts and then build on each other. This architecture naturally creates an opportunity to share and reuse trained neural networks. A layer of virtual neurons that’s been trained to identify an edge, on its way up to recognizing the face of a cat, could also be repurposed as the base layer for recognizing the face of a person. Already, TensorFlow, the most popular deep learning framework, supports reusing an entire subgraph component. Soon, the community of machine learning experts contributing open source modules will create the potential for deep learning versions of GitHub and Stack Overflow.
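The reuse idea can be sketched in plain Python, without any framework API: a base layer is kept frozen and only a new output neuron is trained on top of it for a different task. The hand-set `BASE_LAYER` weights merely stand in for a layer trained elsewhere, and the exclusive-or task, step activation, and all names are assumptions made for this illustration.

```python
def step(value):
    """Threshold activation: 1 if the value is positive, else 0."""
    return 1 if value > 0 else 0

# Stand-in for a layer trained on an earlier task; weights are hand-set here.
BASE_LAYER = (
    ((1.0, 1.0), -0.5),  # fires when at least one input is on
    ((1.0, 1.0), -1.5),  # fires only when both inputs are on
)

def base_features(inputs):
    """Reused layer: its weights are frozen and never updated below."""
    return [step(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in BASE_LAYER]

def train_head(examples, max_epochs=100):
    """Train only a new output neuron on top of the reused features."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            features = base_features(inputs)
            output = step(sum(w * f for w, f in zip(weights, features)) + bias)
            error = target - output
            if error:
                mistakes += 1
                weights = [w + error * f for w, f in zip(weights, features)]
                bias += error
        if mistakes == 0:  # the new head answers every example correctly
            break
    return weights, bias

# New task: exclusive-or, which a single neuron cannot learn from raw inputs.
new_task = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
head_weights, head_bias = train_head(new_task)
predictions = [
    step(sum(w * f for w, f in zip(head_weights, base_features(x))) + head_bias)
    for x, _ in new_task
]
print(predictions)
```

Only the two head weights and one bias are learned here; the work already captured in the base layer is borrowed as-is, which is what makes a shared module worth publishing.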
Hardware. Finding the optimal mix of GPUs, CPUs, and cloud resources; determining the level of parallelization; and performing cost analyses are complex decisions for developers. This creates an opportunity for platform and service-based companies to recommend the right infrastructure for training tasks. Additionally, there will be companies that provide infrastructure services (such as orchestration, scale-out, management, and load balancing) on specialized hardware for deep learning. Moreover, I expect incumbents as well as start-ups to launch their own deep learning-optimized chips.