Category Archives: AI

An ICO for AI: SingularityNET

Source: Wired, Oct 2017

… fostering the emergence of human-level artificial intelligence on a decentralised, open-source platform

“SingularityNET’s idea is to create a distributed AI platform on the [Ethereum] blockchain, with each blockchain node backing up an AI algorithm,” Goertzel explains. AI researchers or developers would be able to make their AI products available to SingularityNET users, which would pay for services with network-specific crypto-tokens.

Initially, the plan is to have a system that provides visibility — and payment — to independent developers of AI programmes. 

“We want to create a system that learns on its own how to cobble together modules to carry out different functions. You’ll see a sort of federation of AIs emerge from the spontaneous interaction among the nodes, without human guidance,” he explains. “It’ll have AI inside each and every node, and between them, and they’ll learn how to combine their skills.”

The expected endgame is that these swarms of smart nodes would become as intertwined as clusters of neurons, eventually evolving into human-level AI. Goertzel admits that it might take decades for that to happen, but he insists that the primary purpose of the SingularityNET project is to bring about “beneficial Artificial General Intelligence” (that is: human-level AI).

SingularityNET will sell 50 percent of its whole token trove, distributing the other half to its staff and to a foundation that will reinvest it in charitable AI projects. Goertzel is optimistic about the sale, which he thinks could appeal even to technology heavyweights.

“I have been working with Cisco, Huawei, and Intel, and I think we can pull in a lot of major customers who want to buy a lot of tokens to do AI analysis for their own purposes,” he says. “In general, though, this ICO will allow us to start with a bang. We’ll be competing with Google and Facebook…so having a war chest would allow us to take them on more easily.”


Is Deep Learning Sufficient?

Source: MIT Technology Review, Sep 2017

… the peculiar thing about deep learning is just how old its key ideas are. Hinton’s breakthrough paper, with colleagues David Rumelhart and Ronald Williams, was published in 1986.

The paper elaborated on a technique called backpropagation, or backprop for short. Backprop, in the words of Jon Cohen, a computational psychologist at Princeton, is “what all of deep learning is based on—literally everything.”

Hinton’s breakthrough, in 1986, was to show that backpropagation could train a deep neural net, meaning one with more than two or three layers. But it took another 26 years before increasing computational power made good on the discovery. A 2012 paper by Hinton and two of his Toronto students showed that deep neural nets, trained using backpropagation, beat state-of-the-art systems in image recognition. “Deep learning” took off. To the outside world, AI seemed to wake up overnight. For Hinton, it was a payoff long overdue.

Backprop is remarkably simple, though it works best with huge amounts of data. 

The goal of backprop is to change the network’s connection weights so that they make the network work: so that when you pass in an image of a hot dog to the lowest layer, the topmost layer’s “hot dog” neuron ends up getting excited.

Backprop is a procedure for rejiggering the strength of every connection in the network so as to fix the error for a given training example.  The technique is called “backpropagation” because you are “propagating” errors back (or down) through the network, starting from the output.
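The rejiggering described above can be sketched in a few lines. This is a minimal illustration only, with a made-up two-layer sigmoid network, invented toy data, and a hand-picked learning rate; it is not the hot-dog classifier from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 4 inputs of 3 "pixels" each, with arbitrary binary labels.
X = rng.random((4, 3))
y = np.array([[0.0], [1.0], [1.0], [0.0]])

W1 = rng.normal(scale=0.5, size=(3, 5))   # input -> hidden connection weights
W2 = rng.normal(scale=0.5, size=(5, 1))   # hidden -> output connection weights
lr = 1.0

losses = []
for step in range(500):
    # Forward pass: propagate activations up through the layers.
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    losses.append(float(np.mean((out - y) ** 2)))

    # Backward pass: propagate the error back down, starting from the output.
    d_out = (out - y) * out * (1 - out)    # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)     # error pushed back to the hidden layer

    # Adjust every connection strength in proportion to its share of the blame.
    W2 -= lr * h.T @ d_out
    W1 -= lr * X.T @ d_h

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Running the loop, the squared error on the toy examples falls step by step: exactly the "fix the error for a given training example" behavior the excerpt describes.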

Neural nets can be thought of as trying to take things—images, words, recordings of someone talking, medical data—and put them into what mathematicians call a high-dimensional vector space, where the closeness or distance of the things reflects some important feature of the actual world. 
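The "closeness reflects similarity" idea can be shown with a toy. The four-dimensional "embedding" vectors below are hand-written for illustration; in a real network these coordinates would be learned:

```python
import numpy as np

# Made-up embedding vectors (illustrative only; real embeddings are learned
# and have hundreds of dimensions).
embedding = {
    "cat": np.array([0.9, 0.8, 0.1, 0.0]),
    "dog": np.array([0.8, 0.9, 0.2, 0.1]),
    "car": np.array([0.1, 0.0, 0.9, 0.8]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means pointing the same way, 0.0 means unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_cat_dog = cosine(embedding["cat"], embedding["dog"])
sim_cat_car = cosine(embedding["cat"], embedding["car"])
print(sim_cat_dog > sim_cat_car)  # "cat" sits nearer "dog" than "car"
```

Distance in the vector space stands in for a real-world feature (here, animal versus vehicle), which is all the excerpt's claim amounts to.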

“… Inside his head there’s some big pattern of neural activity.” Big patterns of neural activity, if you’re a mathematician, can be captured in a vector space, with each neuron’s activity corresponding to a number, and each number to a coordinate of a really big vector. In Hinton’s view, that’s what thought is: a dance of vectors.

Neural nets are just thoughtless fuzzy pattern recognizers, and as useful as fuzzy pattern recognizers can be—hence the rush to integrate them into just about every kind of software—they represent, at best, a limited brand of intelligence, one that is easily fooled. A deep neural net that recognizes images can be totally stymied when you change a single pixel, or add visual noise that’s imperceptible to a human. 

Deep learning in some ways mimics what goes on in the human brain, but only in a shallow way—which perhaps explains why its intelligence can sometimes seem so shallow. Indeed, backprop wasn’t discovered by probing deep into the brain, decoding thought itself; it grew out of models of how animals learn by trial and error in old classical-conditioning experiments. And most of the big leaps that came about as it developed didn’t involve some new insight about neuroscience; they were technical improvements, reached by years of mathematics and engineering. What we know about intelligence is nothing against the vastness of what we still don’t know.

Hinton himself says, “Most conferences consist of making minor variations … as opposed to thinking hard and saying, ‘What is it about what we’re doing now that’s really deficient? What does it have difficulty with? Let’s focus on that.’”

It’s worth asking whether we’ve wrung nearly all we can out of backprop. If so, that might mean a plateau for progress in artificial intelligence.

If you want to see the next big thing, something that could form the basis of machines with a much more flexible intelligence, you should probably check out research that resembles what you would’ve found had you encountered backprop in the ’80s: smart people plugging away on ideas that don’t really work yet.

We make sense of new phenomena in terms of things we already understand. 

A real intelligence doesn’t break when you slightly change the requirements of the problem it’s trying to solve. And the key part of Eyal’s thesis was his demonstration, in principle, of how you might get a computer to work that way: to fluidly apply what it already knows to new tasks, to quickly bootstrap its way from knowing almost nothing about a new domain to being an expert.

Essentially, it is a procedure he calls the “exploration–compression” algorithm. It gets a computer to function somewhat like a programmer who builds up a library of reusable, modular components on the way to building more and more complex programs. Without being told anything about a new domain, the computer tries to structure knowledge about it just by playing around, consolidating what it’s found, and playing around some more, the way a human child does.
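A heavily simplified sketch of that loop, with everything invented for illustration: here "programs" are strings of primitive ops, "exploration" is random generation, and "compression" promotes the most common pair of components into a reusable library module (Dechter's actual algorithm searches real program spaces, not strings):

```python
import random
from collections import Counter

random.seed(0)
primitives = ["a", "b", "c"]
library = list(primitives)        # reusable components discovered so far

def explore(n_programs=200, length=4):
    # Play around: generate random programs from the current library,
    # so later rounds build on earlier discoveries.
    return ["".join(random.choice(library) for _ in range(length))
            for _ in range(n_programs)]

def compress(programs):
    # Consolidate: find the most frequent adjacent pair of ops and
    # add it to the library as a new reusable module.
    pairs = Counter()
    for p in programs:
        for i in range(len(p) - 1):
            pairs[p[i:i + 2]] += 1
    best, _ = pairs.most_common(1)[0]
    if best not in library:
        library.append(best)

for _ in range(3):                # alternate exploration and compression
    compress(explore())

print(library)                    # primitives plus discovered modules
```

Even in this toy, the library grows beyond its primitives, and later exploration composes the new modules, which is the "playing around, consolidating, and playing around some more" pattern in miniature.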

As for Hinton, he is convinced that overcoming AI’s limitations involves building “a bridge between computer science and biology.” Backprop was, in this view, a triumph of biologically inspired computation; the idea initially came not from engineering but from psychology. So now Hinton is trying to pull off a similar trick.

Neural networks today are made of big flat layers, but in the human neocortex real neurons are arranged not just horizontally into layers but vertically into columns. Hinton thinks he knows what the columns are for—in vision, for instance, they’re crucial for our ability to recognize objects even as our viewpoint changes. So he’s building an artificial version—he calls them “capsules”—to test the theory. So far, it hasn’t panned out; the capsules haven’t dramatically improved his nets’ performance. But this was the same situation he’d been in with backprop for nearly 30 years.

“This thing just has to be right,” he says about the capsule theory, laughing at his own boldness. “And the fact that it doesn’t work is just a temporary annoyance.”

Mindfire Foundation: Mission-1 in Davos (May 12-20 2018)

Source: Mindfire website, 2017
(all expenses covered)

From May 12th through May 20th, 2018

We are starting our quest for true-AI with a new approach, “Artificial Organisms”, which will define our inaugural mission, and all our future missions. The 100 selected talents will form teams according to their skill sets and the given challenges.

Each team will be assigned a professional coach or a subject matter expert. No reporting and no hierarchies, just a shared sense of pursuit. Every day, the talents will have the opportunity to meet top AI researchers who will be there to support them, as mentors.


The imperative to advance true AI is now!

For that reason Mindfire is looking for the best talent out there. Your travel, accommodation and all planned recreational activities are fully funded by us. Mindfire Mission-1 will allow you to:

  • Work alongside 99 other bright minds and 15 eminent researchers.
  • Build a prototype to showcase your progress in helping to solve true-AI.
  • Secure further funding and sponsorship for the best projects to continue the research.
  • Become a member of the exclusive Mindfire community with access to expert know-how.
  • Be rewarded with Mindfire tokens and profit from the Intellectual Property proceeds.

Eligibility criteria

You can only apply as a private individual, i.e. not affiliated with any organization or enterprise. You are currently an undergraduate, master’s or PhD student in science or engineering. You can also apply if you are an entrepreneur using AI within your business.

Be part of the movement and join us from May 12th through 20th, 2018 in Davos.

Generative Adversarial Networks

Source: O’Reilly, Sep 2017

Through a handful of generative techniques, it’s possible to feed a lot of images into a neural network and then ask for a brand-new image that resembles the ones it’s been shown. Generative AI has turned out to be remarkably good at imitating human creativity at superficial levels.

A generative adversarial network consists of two neural networks: a generator that learns to produce some kind of data (such as images) and a discriminator that learns to distinguish “fake” data created by the generator from “real” data samples (such as photos taken in the real world). The generator and the discriminator have opposing training objectives: the discriminator’s goal is to accurately classify real and fake data; the generator’s goal is to produce fake data the discriminator can’t distinguish from real data.
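The two opposing objectives can be written down concretely. The following is a toy numpy sketch of the loss functions only, with a made-up linear "generator" and logistic "discriminator" and hand-picked parameters; real GANs use deep networks for both and update them by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z, theta):
    # Toy generator (illustrative only): shifts and scales input noise.
    shift, scale = theta
    return shift + scale * z

def discriminator(x, w, b):
    # Toy logistic discriminator: outputs probability the sample is "real".
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

real = rng.normal(loc=4.0, scale=1.0, size=64)           # "real" data samples
fake = generator(rng.normal(size=64), theta=(0.0, 1.0))  # generated samples

w, b = 1.0, -2.0                  # discriminator parameters (hand-picked)
d_real = discriminator(real, w, b)
d_fake = discriminator(fake, w, b)

# Discriminator objective: classify real samples as 1 and fakes as 0.
d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
# Generator objective: make the discriminator call fakes real.
g_loss = -np.mean(np.log(d_fake))

print(f"discriminator loss {d_loss:.3f}, generator loss {g_loss:.3f}")
```

The opposition is visible in the losses: anything that drives `d_fake` up shrinks the generator's loss while growing the discriminator's, so training is a tug-of-war rather than a shared objective.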

Generative neural networks are convincing at reconstructing information thanks to their ability to understand information at multiple levels. It’s hard to overstate how remarkable these GAN-generated images of bedrooms are; not only do the sheets, carpets, and windows look convincing, but the high-level structures of the rooms are correct: the sheets are on the beds, the beds are on the floors, and the windows are on the walls.

Instead of detecting patterns and matching them to features in an image, the generator uses transpose convolution to identify fundamental image building-blocks and learns to assemble and blend these building-blocks into convincing images. For instance, the article shows a remarkably convincing GAN-generated image of the numeral 9 (not reproduced here).
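Transpose convolution runs a convolution kernel in reverse: each input value stamps a scaled copy of the kernel into a larger output, so small feature maps are upsampled into bigger images. A minimal 1-D sketch, with a made-up feature map and kernel:

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=1):
    # Each input element stamps a scaled copy of the kernel into the output;
    # overlapping stamps are summed. A GAN generator does this in 2-D to
    # grow small feature maps into full images.
    out = np.zeros(stride * (len(x) - 1) + len(kernel))
    for i, v in enumerate(x):
        out[i * stride : i * stride + len(kernel)] += v * kernel
    return out

x = np.array([1.0, 2.0])   # a tiny "feature map" (made up)
k = np.array([1.0, 0.5])   # a learned building-block (made up)

print(transposed_conv1d(x, k, stride=1))  # overlapping stamps: [1.0, 2.5, 1.0]
print(transposed_conv1d(x, k, stride=2))  # spaced stamps: [1.0, 0.5, 2.0, 1.0]
```

With stride 1 the stamped kernels overlap and blend; with stride 2 they tile, which is how the operation expands two inputs into four outputs.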

Relating Physics and AI via Mathematics

Source: Quanta, Dec 2014

The new work, completed by Pankaj Mehta of Boston University and David Schwab of Northwestern University, demonstrates that a statistical technique called “renormalization,” which allows physicists to accurately describe systems without knowing the exact state of all their component parts, also enables artificial neural networks to categorize data as, say, “a cat” regardless of its color, size or posture in a given video.

“They actually wrote down on paper, with exact proofs, something that people only dreamed existed,” said Ilya Nemenman, a biophysicist at Emory University. “Extracting relevant features in the context of statistical physics and extracting relevant features in the context of deep learning are not just similar words, they are one and the same.”

how to map the mathematics of one procedure onto the other, proving that the two mechanisms for summarizing features of the world work essentially the same way.

“But we still know that there is a coarse-grained description because our own brain can operate in the real world. It wouldn’t be able to if the real world were not summarizable.”

Tishby sees it as a hint that renormalization, deep learning and biological learning fall under the umbrella of a single idea in information theory. All the techniques aim to reduce redundancy in data. Step by step, they compress information to its essence, a final representation in which no bit is correlated with any other. Cats convey their presence in many ways, for example, but deep neural networks pool the different correlations and compress them into the form of a single neuron. “What the network is doing is squeezing information,” Tishby said. “It’s a bottleneck.”

By laying bare the mathematical steps by which information is stripped down to its minimal form, he said, “this paper really opens up a door to something very exciting.”

Information Bottleneck to AI

Source: Quanta, Sep 2017

a deep neural network has layers of neurons — artificial ones that are figments of computer memory. When a neuron fires, it sends signals to connected neurons in the layer above. During deep learning, connections in the network are strengthened or weakened as needed to make the system better at sending signals from input data — the pixels of a photo of a dog, for instance — up through the layers to neurons associated with the right high-level concepts, such as “dog.” After a deep neural network has “learned” from thousands of sample dog photos, it can identify dogs in new photos as accurately as people can. The magic leap from special cases to general concepts during learning gives deep neural networks their power, just as it underlies human reasoning, creativity and the other faculties collectively termed “intelligence.” Experts wonder what it is about deep learning that enables generalization — and to what extent brains apprehend reality in the same way.

Tishby argues that deep neural networks learn according to a procedure called the “information bottleneck,” which he and two collaborators first described in purely theoretical terms in 1999. The idea is that a network rids noisy input data of extraneous details as if by squeezing the information through a bottleneck, retaining only the features most relevant to general concepts.

Geoffrey Hinton, a pioneer of deep learning who works at Google and the University of Toronto, emailed Tishby after watching his Berlin talk. “It’s extremely interesting,” Hinton wrote. “I have to listen to it another 10,000 times to really understand it, but it’s very rare nowadays to hear a talk with a really original idea in it that may be the answer to a really major puzzle.”

Tishby realized that the crux of the issue was the question of relevance: What are the most relevant features of a spoken word, and how do we tease these out from the variables that accompany them, such as accents, mumbling and intonation? In general, when we face the sea of data that is reality, which signals do we keep?

“For many years people thought information theory wasn’t the right way to think about relevance, starting with misconceptions that go all the way to Shannon himself.”

Claude Shannon, the founder of information theory, in a sense liberated the study of information starting in the 1940s by allowing it to be considered in the abstract — as 1s and 0s with purely mathematical meaning. Shannon took the view that, as Tishby put it, “information is not about semantics.” But, Tishby argued, this isn’t true. Using information theory, he realized, “you can define ‘relevant’ in a precise sense.”

Imagine X is a complex data set, like the pixels of a dog photo, and Y is a simpler variable represented by those data, like the word “dog.” You can capture all the “relevant” information in X about Y by compressing X as much as you can without losing the ability to predict Y. In their 1999 paper, Tishby and co-authors Fernando Pereira, now at Google, and William Bialek, now at Princeton University, formulated this as a mathematical optimization problem. It was a fundamental idea with no killer application.
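The 1999 formulation can be stated compactly. Writing T for the compressed representation of X, the information bottleneck seeks the encoding p(t | x) that solves

```latex
\min_{p(t \mid x)} \; I(X;T) - \beta\, I(T;Y)
```

where I(·;·) is mutual information: the first term pushes T to compress X as much as possible, while the second, weighted by the trade-off parameter β, pushes T to remain predictive of Y.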

Tishby recognized deep neural networks’ potential connection to the information bottleneck principle in 2014, after reading a surprising paper by the physicists David Schwab and Pankaj Mehta.

The duo discovered that a deep-learning algorithm invented by Hinton called the “deep belief net” works, in a particular case, exactly like renormalization, a technique used in physics to zoom out on a physical system by coarse-graining over its details and calculating its overall state.
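Coarse-graining can be made concrete with the classic block-spin picture: replace each block of spins with a single summary spin, discarding detail while keeping the large-scale state. The toy majority rule below is invented for illustration; the Schwab–Mehta result concerns the variational renormalization group and deep belief nets, not this simple rule:

```python
import numpy as np

def coarse_grain(spins, block=3):
    # Zoom out: replace each block of +/-1 spins with its majority sign,
    # discarding fine detail while keeping the overall state.
    trimmed = spins[: len(spins) - len(spins) % block]
    return np.sign(trimmed.reshape(-1, block).sum(axis=1)).astype(int)

chain = np.array([1, 1, -1,  -1, -1, -1,  1, -1, 1])
print(coarse_grain(chain))                # 9 spins summarized as 3
print(coarse_grain(coarse_grain(chain)))  # zoom out again: a single spin
```

Applying the rule repeatedly zooms further and further out, each pass throwing away microscopic detail while preserving the overall state: the same keep-the-relevant, drop-the-rest move the bottleneck picture attributes to deep learning.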

… a stunning indication that, as the biophysicist Ilya Nemenman said at the time, “extracting relevant features in the context of statistical physics and extracting relevant features in the context of deep learning are not just similar words, they are one and the same.”

In 2015, he and his student Noga Zaslavsky hypothesized that deep learning is an information bottleneck procedure that compresses noisy data as much as possible while preserving information about what the data represent. 

Tishby and Shwartz-Ziv also made the intriguing discovery that deep learning proceeds in two phases: a short “fitting” phase, during which the network learns to label its training data, and a much longer “compression” phase, during which it becomes good at generalization, as measured by its performance at labeling new test data.

It’s this forgetting of specifics, Tishby and Shwartz-Ziv argue, that enables the system to form general concepts. 

No Free Lunch

Source: Nautilus, Aug 2017

Will machines ever learn so well on their own that external guidance becomes a quaint relic? In theory, you could imagine an ideal Universal Learner—one that can decide everything for itself, and always prefers the best pattern for the task at hand.

But in 1996, computer scientist David Wolpert proved that no such learner exists. In his famous “No Free Lunch” theorems, he showed that for every pattern a learner is good at learning, there’s another pattern that same learner would be terrible at picking up. The reason brings us back to my aunt’s puzzle—to the infinite patterns that can match any finite amount of data. Choosing a learning algorithm just means choosing which patterns a machine will be bad at. Maybe all tasks of, say, visual pattern recognition will eventually fall to a single all-encompassing algorithm. But no learning algorithm can be good at learning everything.