Machine learning: How to create a recommendation engine.

Self-driving cars, face detection software, and voice-controlled speakers are all built on machine learning technologies and frameworks, and these are just the first wave. Over the next decade, a new generation of products will transform our world, initiating new approaches to software development and to the applications and products we create and use.

As a Java developer, you want to get ahead of this curve now, while tech companies are beginning to invest seriously in machine learning. What you learn today, you can build on over the next five years, but you have to start somewhere.

This article will get you started. You will begin with a first impression of how machine learning works, followed by a short guide to implementing and training a machine learning algorithm. After studying the internals of the learning algorithm and features that you can use to train, score, and select the best-fitting prediction function, you’ll get an overview of using a JVM framework, Weka, to build machine learning solutions. This article focuses on supervised machine learning, which is the most common approach to developing intelligent applications.

Machine learning has evolved from the field of artificial intelligence, which seeks to produce machines capable of mimicking human intelligence. Although machine learning is an emerging trend in computer science, artificial intelligence is not a new scientific field. The Turing test, developed by Alan Turing in the early 1950s, was one of the first tests created to determine whether a computer could have real intelligence. According to the Turing test, a computer demonstrates intelligence by tricking a human interrogator into believing it, too, is human.

Many state-of-the-art machine learning approaches are based on decades-old concepts. What has changed over the past decade is that computers (and distributed computing platforms) now have the processing power required for machine learning algorithms. Most machine learning algorithms demand a huge number of matrix multiplications and other mathematical operations to process. The computational technology to manage these calculations didn’t exist even two decades ago, but it does today.

Machine learning enables programs to execute quality improvement processes and extend their capabilities without human involvement. A program built with machine learning is capable of updating or extending its own behavior: strictly speaking, it adjusts the model it has learned from data, rather than literally rewriting its own code.

Supervised learning vs. unsupervised learning
Supervised learning and unsupervised learning are the most popular approaches to machine learning. Both require feeding the machine a massive number of data records to correlate and learn from. Such collected data records are commonly known as feature vectors. In the case of an individual house, a feature vector might consist of features such as overall house size, number of rooms, and the age of the house.

In supervised learning, a machine learning algorithm is trained to correctly respond to questions related to feature vectors. To train an algorithm, the machine is fed a set of feature vectors and an associated label. Labels are typically provided by a human annotator, and represent the right “answer” to a given question. The learning algorithm analyzes feature vectors and their correct labels to find internal structures and relationships between them. Thus, the machine learns to correctly respond to queries.

As an example, an intelligent real estate application might be trained with feature vectors including the size, number of rooms, and respective age for a range of houses. A human labeler would label each house with the correct house price based on these factors. By analyzing that data, the real estate application would be trained to answer the question: “How much money could I get for this house?”
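Such a labeled training set could be sketched in Java as follows; the HouseExample class, its field names, and the concrete numbers are purely illustrative, not part of any framework:

```java
import java.util.List;

// Illustrative container for one labeled training example:
// a feature vector (house size, rooms, age) plus its human-provided label (price).
public class HouseExample {
    final double[] features; // { size in m², number of rooms, age in years }
    final double label;      // price assigned by a human labeler

    HouseExample(double[] features, double label) {
        this.features = features;
        this.label = label;
    }

    public static void main(String[] args) {
        // a tiny labeled training set with made-up numbers
        List<HouseExample> trainingSet = List.of(
            new HouseExample(new double[] { 90.0, 3.0, 12.0 }, 249000.0),
            new HouseExample(new double[] { 101.0, 3.0, 5.0 }, 319000.0),
            new HouseExample(new double[] { 120.0, 4.0, 30.0 }, 284000.0)
        );
        System.out.println(trainingSet.size() + " labeled examples");
    }
}
```

The learning algorithm's job is then to find the relationship between the features array and the label across many such examples.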

After the training process is over, new input data will not be labeled. The machine will be able to correctly respond to queries, even for unseen, unlabeled feature vectors.

In unsupervised learning, the algorithm is programmed to predict answers without human labeling, or even questions. Rather than predetermine labels or what the results should be, unsupervised learning harnesses massive data sets and processing power to discover previously unknown correlations. In consumer product marketing, for instance, unsupervised learning could be used to identify hidden relationships or consumer grouping, eventually leading to new or improved marketing strategies.

This article focuses on supervised machine learning, which is the most common approach to machine learning today.

Supervised machine learning

All machine learning is based on data. For a supervised machine learning project, you will need to label the data in a meaningful way for the outcome you are seeking.

Labeled data sets are required for training and testing purposes only. After this phase is over, the machine learning algorithm works on unlabeled data instances. For instance, you could feed the prediction algorithm a new, unlabeled house record and it would automatically predict the expected house price based on training data.

How machines learn to predict

The challenge of supervised machine learning is to find the proper prediction function for a specific question. Mathematically, the challenge is to find the input-output function that takes the input variables x and returns the prediction value y. This hypothesis function (hθ) is the output of the training process. The hypothesis function is often also called the target function or prediction function.

y = hθ(x)

In most cases, x represents a multidimensional data point. In our example, this could be a two-dimensional data point of an individual house defined by the house-size value and the number-of-rooms value. The array of these values is referred to as the feature vector. Given a concrete target function, the function can be used to make a prediction for each feature vector x. To predict the price of an individual house, you could call the target function by using the feature vector { 101.0, 3.0 } containing the house size and the number of rooms:

Listing 1. Calling the target function

// target function h (which is the output of the learning process)
Function<Double[], Double> h = ...;

// set the feature vector with house-size=101 and number-of-rooms=3
Double[] x = new Double[] { 101.0, 3.0 };

// and predict the house price (label)
double y = h.apply(x);

In Listing 1, the array variable x holds the feature vector of the house. The y value returned by the target function is the predicted house price.

The challenge of machine learning is to define a target function that will work as accurately as possible for unknown, unseen data instances. In machine learning, the target function (hθ) is sometimes called a model. This model is the result of the learning process.

Based on labeled training examples, the learning algorithm looks for structures or patterns in the training data. From these, it produces a model that generalizes well from that data.

Typically, the learning process is explorative. In most cases, the process will be performed multiple times by using different variations of learning algorithms and configurations.

Eventually, all the models will be evaluated based on performance metrics, and the best one will be selected. That model will then be used to compute predictions for future unlabeled data instances.
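That selection step can be sketched in Java. In this sketch, the ModelSelection class, the selectBest method, and the toy cost function are illustrative names and assumptions, not part of any library; the idea is simply to keep whichever candidate model scores the lowest error:

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;

// Illustrative sketch of the selection step: score each candidate
// model with a cost function and keep the one with the lowest error.
public class ModelSelection {

    static <M> M selectBest(List<M> candidates, Function<M, Double> cost) {
        return candidates.stream()
                .min(Comparator.comparing(cost))
                .orElseThrow();
    }

    public static void main(String[] args) {
        // toy example: the "models" are plain numbers and the cost
        // is their distance to 10, so 9.0 is selected
        double best = selectBest(List.of(3.0, 9.0, 15.0), m -> Math.abs(m - 10.0));
        System.out.println("best model: " + best);
    }
}
```

In a real project the candidates would be trained models (for example, regression functions with different theta vectors) and the cost would be an evaluation metric computed on held-out data.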

Linear regression
To train a machine to think, the first step is to choose the learning algorithm you'll use. Linear regression is one of the simplest and most popular supervised learning algorithms. This algorithm assumes that the relationship between input features and the output label is linear. The generic linear regression function below returns the predicted value by summing each element of the feature vector multiplied by a theta parameter (θ). The theta parameters are used within the training process to adapt or "tune" the regression function based on the training data.

hθ(x) = θ0x0 + θ1x1 + ... + θnxn
In the linear regression function, theta parameters and feature parameters are enumerated by a subscript number. The subscript indicates the position of theta parameters (θ) and feature parameters (x) within the vector. Note that feature x0 is a constant offset term set to the value 1 for computational purposes. As a result, the index of a domain-specific feature such as house-size starts with x1. As an example, if x1 is set for the first value of the House feature vector, house size, then x2 will be set for the next value, number-of-rooms, and so forth.

Listing 2 shows a Java implementation of this linear regression function, shown mathematically as hθ(x) . For simplicity, the calculation is done using the data type double. Within the apply() method, it is expected that the first element of the array has been set with a value of 1.0 outside of this function.

Listing 2. Linear regression in Java

import java.util.Arrays;
import java.util.function.Function;

public class LinearRegressionFunction implements Function<Double[], Double> {
    private final double[] thetaVector;

    LinearRegressionFunction(double[] thetaVector) {
        this.thetaVector = Arrays.copyOf(thetaVector, thetaVector.length);
    }

    @Override
    public Double apply(Double[] featureVector) {
        // for computational reasons the first element has to be 1.0
        assert featureVector[0] == 1.0;

        // simple, sequential implementation
        double prediction = 0;
        for (int j = 0; j < thetaVector.length; j++) {
            prediction += thetaVector[j] * featureVector[j];
        }
        return prediction;
    }

    public double[] getThetas() {
        return Arrays.copyOf(thetaVector, thetaVector.length);
    }
}

In order to create a new instance of the LinearRegressionFunction, you must set the theta parameter. The theta parameter, or vector, is used to adapt the generic regression function to the underlying training data. The program’s theta parameters will be tuned during the learning process, based on training examples. The quality of the trained target function can only be as good as the quality of the given training data.

In the example below, the LinearRegressionFunction is instantiated to predict the house price based on house size. Considering that x0 has to be a constant value of 1.0, the target function is instantiated using two theta parameters. The theta parameters are the output of a learning process. After creating the new instance, the price of a house with a size of 1330 square meters is predicted as follows:

// the theta vector used here is the output of a training process
double[] thetaVector = new double[] { 1.004579, 5.286822 };
LinearRegressionFunction targetFunction = new LinearRegressionFunction(thetaVector);

// create the feature vector function with x0=1 (for computational reasons) and x1=house-size
Double[] featureVector = new Double[] { 1.0, 1330.0 };

// make the prediction
double predictedPrice = targetFunction.apply(featureVector);

The target function’s prediction line is shown as a blue line in the chart below. The line has been computed by executing the target function for all the house-size values. The chart also includes the price-size pairs used for training.

So far the prediction graph seems to fit well enough. The graph coordinates (the intercept and slope) are defined by the theta vector { 1.004579, 5.286822 }. But how do you know that this theta vector is the best fit for your application? Would the function fit better if you changed the first or second theta parameter? To identify the best-fitting theta parameter vector, you need a utility function, which will evaluate how well the target function performs.

Scoring the target function
In machine learning, a cost function (J(θ)) is used to compute the mean error, or “cost” of a given target function.

J(θ) = 1/(2m) * Σ(i=1..m) (hθ(x(i)) - y(i))²
The cost function indicates how well the model fits with the training data. To determine the cost of the trained target function above, you would compute the squared error of each house example (i). The error is the distance between the calculated y value and the real y value of a house example i.
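A straightforward Java sketch of this cost computation could look as follows. The list-based layout of the training examples and the CostFunction class name are assumptions for illustration; the target function h has the same Function type as in the earlier listings:

```java
import java.util.List;
import java.util.function.Function;

// Illustrative mean squared error ("cost") of a target function h over a
// labeled training set: each feature vector (first element 1.0 by
// convention) is paired with its real label y.
public class CostFunction {

    static double cost(Function<Double[], Double> h,
                       List<Double[]> featureVectors,
                       List<Double> labels) {
        int m = featureVectors.size();
        double sumSquaredErrors = 0;
        for (int i = 0; i < m; i++) {
            // error of example i: predicted y minus real y
            double error = h.apply(featureVectors.get(i)) - labels.get(i);
            sumSquaredErrors += error * error;
        }
        // conventionally divided by 2m, which simplifies later derivatives
        return sumSquaredErrors / (2.0 * m);
    }
}
```

A lower cost means the theta vector fits the training data better; comparing the cost of different theta vectors is how the best-fitting model is chosen.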

Applications of Machine Learning/AI

Introduction

In a different post we already introduced Machine Learning and briefly explained how it works.


In this post I’ll go into more detail about the applications: the ones that already exist, as well as some new ideas.

Existing applications of AI and Machine Learning.

Speech recognition.

As my former article already showed, speech recognition is one of the major fields of application for AI.

Current application areas of speech recognition include:

  • Communication with mobile devices (e.g. Apple’s Siri).
  • Communication with personal computers.
  • Dictation (speech to text), for example as an alternative to typing with a keyboard.
  • A method to give commands to other systems, like car audio entertainment systems or washing machines.
  • Voice command systems like Amazon’s Alexa.
  • Speech-to-text archiving systems – Google Voice currently offers VOIP systems which store telephone conversations in Gmail mailboxes, where they are searchable. (Handy for espionage services like the NSA. And trust me, if it’s possible it’s also done!)

But there’s more than only speech recognition:

Image enhancement

Google recently launched RAISR, a machine learning technique for upscaling images. Instead of plain interpolation, it uses learned filters to add new pixels, producing a sharper high-resolution version of a low-resolution input. Think about this: detail is effectively reconstructed from an image that contains less information!

I thought about practical applications and came up with the list below:

  • Criminology – Criminal cases can be solved more quickly and easily, since pictures from surveillance cameras can be enhanced.
  • Healthcare – If it’s possible to improve images, it should also be possible to improve signal-to-noise ratios, for example in laboratory equipment, making it possible to detect or diagnose diseases earlier. Another great idea is to recognize (complicated) patterns in the output of instruments (for example mass spectrometers analyzing a drop of blood) to detect markers that could point to later diseases like stroke or heart disease.
  • Research – There are literally petabytes of research data available in scientific journals and papers. AI could make it possible to extract valuable information from all this data. Because of the large amounts of data, it is practically impossible for humans to oversee everything, let alone make connections between different studies and their results!
  • Prediction of financial markets – That sounds good, right? Unfortunately there are no serious solutions yet, but it’s being worked on. I heard of one company selling a product that reconstructs real-time information from data that is 5 minutes behind. If it’s possible to predict 5 minutes ahead, then 5 hours, days, weeks and even months should be possible too.

UPDATE 09-2020: There is one ‘but’ in ML. Yesterday I woke up with the thought: ‘How would two AI computers talk to each other?’ I looked up a few conversations on YouTube. There are some scary AI conversations between chatbots, where the bots agree, among other things, that there are too many people and that (since they are smarter than us humans) they could do without people at all!

Image recognition and classification

This area of AI, also known as computer vision, is already in wide use. Google and Apple use it to make photo libraries searchable. When you search, for example, for ‘dogs’, you will find photos of dogs without any human intervention to tag the photos. It’s also applied in biometrics, for example to grant people access to systems or places by an iris scan. Face recognition is already applied in smartphones.

Face recognition


Applications which have already demonstrated the usefulness of ML and AI.
  • This site is protected by reCAPTCHA, an AI solution to prevent brute-force attacks. It can recognize whether a login attempt is made by a person or by a (computer) program, also called a robot. Unfortunately, hackers are attempting to gather a community to learn from people solving reCAPTCHAs, but I doubt they will ever succeed!
  • Just recently Google showed that, with the technology of DeepMind and AlphaGo, it was possible for a computer to defeat the world champion of the Chinese board game Go! This was so spectacular that the prominent journal Nature devoted a large article to it!

Machine Learning, How it works

Introduction


Ever wondered how it’s possible that you can talk to your phone and it actually gives relevant answers?

The answer is: neural networks, a computer technology that works in a way similar to the human brain, and that makes it possible for computer software to learn from example data (data sets).

How is this done?

It’s not easy. In the end it’s all about statistics and statistical analysis. One statistical technique is the most important: regression analysis.

Imagine the following dataset, consisting of x and y values:

x-value   y-value
   5        10
  10        20
  15        30
  20        40
  25        50
  30        60

Based on the data above, what would the y-value be when x = 18? The correct answer is: 36. Why?

Take a look at the data set and notice that each y-value is simply double its x-value.

This is high school math and not very complicated.

We say y is a function of x.

In formula form: y=2x.
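Finding that formula from the data is exactly what regression analysis automates. A minimal Java sketch, assuming a least-squares fit of a line y = a·x through the origin (which happens to match this data set exactly; the SimpleFit class name is illustrative):

```java
// Illustrative least-squares fit of y = a*x (a line through the origin)
// to the small data set above.
public class SimpleFit {

    static double fitSlope(double[] x, double[] y) {
        double sumXY = 0, sumXX = 0;
        for (int i = 0; i < x.length; i++) {
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
        }
        // least-squares slope: sum(x*y) / sum(x*x)
        return sumXY / sumXX;
    }

    public static void main(String[] args) {
        double[] x = { 5, 10, 15, 20, 25, 30 };
        double[] y = { 10, 20, 30, 40, 50, 60 };
        double a = fitSlope(x, y);
        System.out.println("y = " + a + "x, so y(18) = " + (a * 18));
    }
}
```

For this data the computed slope is exactly 2, so the fitted function reproduces y = 2x and predicts y(18) = 36.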

Now, let’s have a look at a real-world example:

As you may notice, there’s no ‘fixed’ model: the data points are scattered around the underlying function. The function in this case is just a ‘best fit’ through the data set.

As you see, the function is a straight line, so we say it’s linear. This type of regression analysis is called ‘linear regression analysis’.

Below are other ‘real-world’ examples:

As you might see, the two examples above are non-linear; both are plotted on logarithmic axes. They involve the number e, the base of the natural logarithm. Setting one or more axes to a logarithmic scale is a common way of displaying such functions nicely, because it turns the curves into straight lines.
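This log-axis trick can be checked numerically: taking the logarithm of exponential data yields values that grow linearly. A small Java sketch (the example function y = e^(0.5x) is an assumption for illustration):

```java
// Illustrative check: for exponential data y = e^(0.5x), the values
// ln(y) equal 0.5*x and therefore grow linearly in x. This is why a
// logarithmic axis shows such curves as straight lines.
public class LogLinear {
    public static void main(String[] args) {
        for (int x = 1; x <= 4; x++) {
            double y = Math.exp(0.5 * x);
            System.out.printf("x=%d  y=%.3f  ln(y)=%.3f%n", x, y, Math.log(y));
        }
    }
}
```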

You may be able to think of even more complicated functions, such as quadratic, multidimensional, or trigonometric ones. They all exist and are applied in machine learning.