Neuro-Evolution with Novelty Search

Source: Quanta, Nov 2019

the steppingstone principle — and, with it, a way of designing algorithms that more fully embraces the endlessly creative potential of biological evolution.

The steppingstone principle goes beyond traditional evolutionary approaches. Instead of optimizing for a specific goal, it embraces creative exploration of all possible solutions. By doing so, it has paid off with groundbreaking results.

Earlier this year, one system based on the steppingstone principle mastered two video games that had stumped popular machine learning methods. And in a paper published last week in Nature, DeepMind — the artificial intelligence company that pioneered the use of deep learning for problems such as the game of Go — reported success in combining deep learning with the evolution of a diverse population of solutions.

Biological evolution is also the only system to produce human intelligence, which is the ultimate dream of many AI researchers.

Because of biology’s track record, Stanley and others have come to believe that if we want algorithms that can navigate the physical and social world as easily as we can — or better! — we need to imitate nature’s tactics. Instead of hard-coding the rules of reasoning, or having computers learn to score highly on specific performance metrics, they argue, we must let a population of solutions blossom. Make them prioritize novelty or interestingness instead of the ability to walk or talk. They may discover an indirect path, a set of steppingstones, and wind up walking and talking better than if they’d sought those skills directly.

After Picbreeder, Stanley set out to demonstrate that neuroevolution could overcome the most obvious argument against it: “If I run an algorithm that’s creative to such an extent that I’m not sure what it will produce,” he said, “it’s very interesting from a research perspective, but it’s a harder sell commercially.”

He hoped to show that by simply following ideas in interesting directions, algorithms could not only produce a diversity of results, but solve problems. More audaciously, he aimed to show that completely ignoring an objective can get you there faster than pursuing it. He did this through an approach called novelty search.

To test the steppingstone principle, Stanley and his student Joel Lehman tweaked the selection process. Instead of selecting the networks that performed best on a task, novelty search selected them for how different they were from the ones with behaviors most similar to theirs. (In Picbreeder, people rewarded interestingness. Here, as a proxy for interestingness, novelty search rewarded novelty.)

“Novelty search is important because it turned everything on its head,” said Julian Togelius, a computer scientist at New York University, “and basically asked what happens when we don’t have an objective.”

A key element of these algorithms is that they foster steppingstones. Instead of constantly prioritizing one overall best solution, they maintain a diverse set of vibrant niches, any one of which could contribute a winner. And the best solution might descend from a lineage that has hopped between niches.

Now even DeepMind, that powerhouse of reinforcement learning, has revealed its growing interest in neuroevolution. In January, the team showed off AlphaStar, software that can beat top professionals at the complex video game StarCraft II, in which two opponents control armies and build colonies to dominate a digital landscape. AlphaStar evolved a population of players that competed against and learned from each other.

In last week’s Nature paper, DeepMind researchers announced that an updated version of AlphaStar has been ranked among the top 0.2% of active StarCraft II players on a popular gaming platform, becoming the first AI to reach the top tier of a popular esport without any restrictions.

As in the children’s game rock-paper-scissors, there is no single best game strategy in StarCraft II. So DeepMind encouraged its population of agents to evolve a diversity of strategies — not as steppingstones but as an end in itself. When AlphaStar beat two pros each five games to none, it combined the strategies from five different agents in its population. The five agents had been chosen so that not all of them would be vulnerable to any one opponent strategy. Their strength was in their diversity.

AlphaStar demonstrates one of the main uses of evolutionary algorithms: maintaining a population of different solutions.

To mirror this open-ended conversation between problems and solutions, earlier this year Stanley, Clune, Lehman and another Uber colleague, Rui Wang, released an algorithm called POET, for Paired Open-Ended Trailblazer.

For example, one bot learned to cross flat terrain while dragging its knee. It was then randomly switched to a landscape with short stumps, where it had to learn to walk upright. When it was switched back to its first obstacle course, it completed it much faster. An indirect path allowed it to improve by taking skills learned from one puzzle and applying them to another.

POET could potentially design new forms of art or make scientific discoveries by inventing new challenges for itself and then solving them. It could even go much further, depending on its world-building ability. Stanley said he hopes to build algorithms that could still be doing something interesting after a billion years.

In a recent paper, Clune argues that open-ended discovery is likely the fastest path toward artificial general intelligence — machines with nearly all the capabilities of humans.

Clune thinks more attention should be paid to AI that designs AI. Algorithms will design or evolve both the neural networks and the environments in which they learn, using an approach like POET’s.

Such open-ended exploration might lead to human-level intelligence via paths we never would have anticipated — or a variety of alien intelligences that could teach us a lot about intelligence in general. “Decades of research have taught us that these algorithms constantly surprise us and outwit us,” he said. “So it’s completely hubristic to think that we will know the outcome of these processes, especially as they become more powerful and open-ended.”

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.