Category Archives: AI

Deriving Structure from Textual Descriptions

Source: Nature.com, Jul 2019

The overwhelming majority of scientific knowledge is published as text, which is difficult to analyse by either traditional statistical analysis or modern machine learning methods.

By contrast, the main source of machine-interpretable data for the materials research community has come from structured property databases1,2, which encompass only a small fraction of the knowledge present in the research literature.

Beyond property values, publications contain valuable knowledge regarding the connections and relationships between data items as interpreted by the authors.

To improve the identification and use of this knowledge, several studies have focused on the retrieval of information from scientific literature using supervised natural language processing3,4,5,6,7,8,9,10, which requires large hand-labelled datasets for training.

Here we show that materials science knowledge present in the published literature can be efficiently encoded as information-dense word embeddings11,12,13 (vector representations of words) without human labelling or supervision.

Without any explicit insertion of chemical knowledge, these embeddings capture complex materials science concepts such as the underlying structure of the periodic table and structure–property relationships in materials. Furthermore, we demonstrate that an unsupervised method can recommend materials for functional applications several years before their discovery.

This suggests that latent knowledge regarding future discoveries is to a large extent embedded in past publications. Our findings highlight the possibility of extracting knowledge and relationships from the massive body of scientific literature in a collective manner, and point towards a generalized approach to the mining of scientific literature.

Related Resource: ZeroHedge, Jul 2019

The algorithm produced predictions for potential thermoelectric materials, which convert heat into electricity and are used in various heating and cooling applications.

“It can read any paper on material science, so can make connections that no scientists could,” said researcher Anubhav Jain. “Sometimes it does what a researcher would do; other times it makes these cross-discipline associations.”

The algorithm was designed to assess the language in 3.3 million materials science abstracts, and was able to build a vocabulary of around half a million words. Word2Vec used machine learning to analyze relationships between words.

“The way that this Word2vec algorithm works is that you train a neural network model to remove each word and predict what the words next to it will be,” said Jain, adding that “by training a neural network on a word, you get representations of words that can actually confer knowledge.”

Using just the words found in scientific abstracts, the algorithm was able to understand concepts such as the periodic table and the chemical structure of molecules. The algorithm linked words that were found close together, creating vectors of related words that helped define concepts. In some cases, words were linked to thermoelectric concepts but had never been written about as thermoelectric in any abstract they surveyed. This gap in knowledge is hard to catch with a human eye, but easy for an algorithm to spot.
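As a rough illustration of the skip-gram training Jain describes, the sketch below trains word embeddings on a corpus of abstracts with the gensim library and then queries for neighbours of a term. The file name, hyperparameters, and the query word are placeholders for illustration, not details from the study.

    # Minimal sketch of skip-gram word embeddings trained on a corpus of
    # abstracts with gensim. "abstracts.txt" and the hyperparameters are
    # placeholders, not the setup used in the study.
    from gensim.models import Word2Vec

    # One abstract per line; tokenise into lowercase words.
    with open("abstracts.txt", encoding="utf-8") as f:
        sentences = [line.lower().split() for line in f if line.strip()]

    # sg=1 selects the skip-gram objective: for each word, predict its neighbours.
    model = Word2Vec(sentences, vector_size=200, window=8, min_count=5, sg=1, workers=4)

    # Nearest neighbours in the learned vector space group related concepts,
    # e.g. terms that appear in contexts similar to "thermoelectric".
    print(model.wv.most_similar("thermoelectric", topn=10))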

Using AI to deter Cat Predatory Behaviour

Source: The Verge, Jun 2019

Machine learning can be an incredible addition to any tinkerer’s toolbox, helping to fix that little problem in life that no commercial gadget can handle. For Amazon engineer Ben Hamm, that problem was stopping his “sweet, murderous cat” Metric from bringing home dead and half-dead prey in the middle of the night and waking him up.

Hamm gave an entertaining presentation on this subject at Ignite Seattle, and a video of his talk is available online. In short, in order to stop Metric from following his instincts, Hamm hooked up the cat flap in his door to an AI-enabled camera (Amazon’s own DeepLens) and an Arduino-powered locking system.

AI & Machine Learning & Deep Learning

Source: Medium, Sep 2018

What is artificial intelligence?

Artificial intelligence can be loosely interpreted to mean incorporating human intelligence into machines.

What is machine learning?

As the name suggests, machine learning can be loosely interpreted to mean empowering computer systems with the ability to “learn”.

The intention of ML is to enable machines to learn by themselves using the provided data and make accurate predictions.

Training in machine learning entails giving the algorithm a large amount of data and allowing it to learn from the information it processes.
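To make the idea of training concrete, here is a minimal scikit-learn sketch on a toy dataset; the dataset and classifier are arbitrary choices for illustration.

    # Minimal sketch of "training": hand the algorithm labelled data, let it fit
    # its parameters, then ask it to predict on data it has not seen.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)                     # features and labels
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)                           # the "learning" step
    print("test accuracy:", model.score(X_test, y_test))  # predictions on unseen data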

What is deep learning?

Deep learning is a subset of ML; in fact, it’s simply a technique for realizing machine learning. In other words, DL is the next evolution of machine learning.

DL algorithms are roughly inspired by the information processing patterns found in the human brain.

While DL can automatically discover the features to be used for classification, ML requires these features to be provided manually.

Furthermore, in contrast to ML, DL needs high-end machines and considerably larger amounts of training data to deliver accurate results.
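The manual-features half of that contrast can be illustrated with a short sketch: below, hand-crafted HOG descriptors are computed for each image and fed to a classical classifier, whereas a deep network would take the raw pixels and learn its own features in its convolutional layers. The dataset and feature choice are illustrative assumptions.

    # Classical ML as described above: features are engineered by hand (here,
    # HOG descriptors) before a classifier ever sees the data. A deep network
    # would instead consume the raw pixels and learn features itself.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from skimage.feature import hog

    digits = load_digits()
    # The manual feature-engineering step: turn each 8x8 image into a HOG vector.
    features = [hog(img, pixels_per_cell=(4, 4), cells_per_block=(1, 1))
                for img in digits.images]

    X_train, X_test, y_train, y_test = train_test_split(features, digits.target,
                                                        random_state=0)
    clf = SVC().fit(X_train, y_train)
    print("accuracy with hand-crafted features:", clf.score(X_test, y_test))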


AI Fills in Visual Details of Sketches

Source: Nvidia blog, Mar 2019

A deep learning model developed by NVIDIA Research turns rough doodles into photorealistic masterpieces with breathtaking ease.

The tool leverages generative adversarial networks, or GANs, to convert segmentation maps into lifelike images.
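As a purely illustrative sketch of that setup (not NVIDIA’s actual model, which relies on spatially-adaptive normalisation and far deeper networks), the generator below maps a class-per-channel segmentation map to an RGB image while a discriminator judges whether an image looks real; all layer sizes are assumptions.

    # Illustrative-only sketch of the adversarial setup described above. This is
    # not NVIDIA's GauGAN; it only shows the idea: the generator turns a
    # segmentation map (one channel per semantic class) into an RGB image, and
    # the discriminator learns to tell generated images from real photographs.
    from tensorflow import keras
    from tensorflow.keras import layers

    NUM_CLASSES = 20  # e.g. sky, water, tree, rock, ... one input channel each

    generator = keras.Sequential([
        layers.Input(shape=(256, 256, NUM_CLASSES)),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(3, 3, padding="same", activation="tanh"),  # RGB output
    ])

    discriminator = keras.Sequential([
        layers.Input(shape=(256, 256, 3)),
        layers.Conv2D(64, 3, strides=2, activation="relu"),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),  # real vs generated
    ])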

Machine learning ‘causing science crisis’

Source: BBC, Feb 2019

A growing amount of scientific research involves using machine learning software to analyse data that has already been collected. This happens across many subject areas ranging from biomedical research to astronomy. The data sets are very large and expensive.

‘Reproducibility crisis’

But, according to Dr Allen, the answers they come up with are likely to be inaccurate or wrong because the software is identifying patterns that exist only in that data set and not the real world.

“Often these studies are not found out to be inaccurate until there’s another real big dataset that someone applies these techniques to and says ‘oh my goodness, the results of these two studies don’t overlap’,” she said.

“There is general recognition of a reproducibility crisis in science right now. I would venture to argue that a huge part of that does come from the use of machine learning techniques in science.”

The “reproducibility crisis” in science refers to the alarming number of research results that are not repeated when another group of scientists tries the same experiment. It means that the initial results were wrong. One analysis suggested that up to 85% of all biomedical research carried out in the world is wasted effort.

Flawed patterns

Machine learning systems and the use of big data sets have accelerated the crisis, according to Dr Allen. That is because machine learning algorithms have been developed specifically to find interesting things in datasets, so when they search through huge amounts of data they will inevitably find a pattern.

“The challenge is can we really trust those findings?” she told BBC News.

“Are those really true discoveries that really represent science? Are they reproducible? If we had an additional dataset would we see the same scientific discovery or principle on the same dataset? And unfortunately the answer is often probably not.”
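A toy sketch (not from the BBC piece) of the failure mode Dr Allen describes: run a selection procedure over pure noise twice and each run will confidently “find” predictive features, yet the two sets of discoveries barely overlap.

    # Toy illustration of the failure mode described above: a selection
    # procedure run over pure noise still "finds" features, but they do not
    # hold up on a second, independent dataset.
    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif

    rng = np.random.default_rng(0)

    def noise_dataset(n_samples=100, n_features=2000):
        X = rng.normal(size=(n_samples, n_features))   # no real signal at all
        y = rng.integers(0, 2, size=n_samples)
        return X, y

    X1, y1 = noise_dataset()
    X2, y2 = noise_dataset()

    top1 = set(SelectKBest(f_classif, k=20).fit(X1, y1).get_support(indices=True))
    top2 = set(SelectKBest(f_classif, k=20).fit(X2, y2).get_support(indices=True))

    # Each run picks 20 "predictive" features out of 2000 noise columns, yet
    # the two selections barely overlap.
    print("overlap between the two 'discoveries':", len(top1 & top2))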

Data > AI Framework/Architecture

Source: Medium, Feb 2019

At its essence, machine learning works by extracting information from a dataset and transferring it to the model weights. A better model is more efficient at this process (in terms of time and/or overall quality), but assuming some baseline of adequacy (that is, the model is actually learning something), better data will trump a better architecture.

To illustrate this point, let’s run a quick and dirty test. I created two simple convolutional networks, a “better” one and a “worse” one. The final dense layer of the better model had 128 neurons, while the worse one had to make do with only 64. I trained them on subsets of the MNIST dataset of increasing size, and plotted the models’ accuracy on the test set against the number of samples they were trained on.
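A rough reconstruction of that experiment is sketched below; the post does not give the exact architectures or training settings, so everything beyond the 128- versus 64-unit dense layer and the growing training subsets is an assumption.

    # Rough sketch of the experiment described above; the conv layers, optimiser
    # and epoch count are assumptions. Only the 128- vs 64-unit dense layer and
    # the increasing training subsets come from the text.
    from tensorflow import keras
    from tensorflow.keras import layers

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0

    def make_model(dense_units):
        model = keras.Sequential([
            layers.Input(shape=(28, 28, 1)),
            layers.Conv2D(32, 3, activation="relu"),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dense(dense_units, activation="relu"),   # 128 = "better", 64 = "worse"
            layers.Dense(10, activation="softmax"),
        ])
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    for n in (10_000, 20_000, 30_000, 40_000, 50_000):
        for units in (128, 64):
            model = make_model(units)
            model.fit(x_train[:n], y_train[:n], epochs=3, batch_size=128, verbose=0)
            _, acc = model.evaluate(x_test, y_test, verbose=0)
            print(f"{units}-unit model, {n} samples: test accuracy {acc:.4f}")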

The positive effect of training dataset size is obvious (at least until the models start to overfit and accuracy plateaus). My “better” model clearly outperforms the “worse” model. However, what I want to point out is that the accuracy of the “worse” model trained on 40 thousand samples is better than that of the “better” model trained on 30 thousand samples!

Good engineering is always important, but if you are doing AI, the data is what creates the competitive advantage. The billion-dollar question, however, is whether you are going to be able to maintain that advantage.

Creating a dataset, however, is a different sort of problem. Usually, it requires a lot of manual human labor — and you can easily scale it by hiring more people. Or it could be that someone has the data — then all you have to do is pay for a license. In any case — the money makes it go a lot faster.

A much better option is to treat AI as a lever. You can take an existing, working business model and supercharge it with AI. For example, if you have a process which depends on human cognitive labor, automating it away will do wonders for your gross margins. Some examples I can think of are ECG analysis, industrial inspection, and satellite image analysis. What is also exciting here is that, because AI stays in the backend, you have some non-AI options to build and maintain your competitive advantage.

Building AI models may be very interesting, but what really matters is having better data than the competition. Maintaining a competitive advantage is hard, especially if you encounter a competitor who is richer than you, which is very likely to happen if your AI idea takes off. You should aim to create a scalable data collection process which is hard for your competition to reproduce. AI is well suited to disrupting industries which rely on the cognitive work of low-qualified humans, as it allows that work to be automated.