What is machine learning?

We generate more data every day through the technologies we use (smartphones, computers, tablets, connected devices, etc.). The average person generated around 1.7 MB of data per second in 2020. All of this data is stored in digital databases and represents a considerable source of information: this is what is known as big data. Without proper processing and an effective analytics strategy, this mass of data would just be a jumbled, growing heap of bytes. This is where machine learning comes in, enabling users to leverage the value of this data.

What is machine learning?

The first machine learning algorithms were developed in the 1950s. Machine learning is both a technology and a science (data science) that allows a computer to carry out a learning process without having been explicitly programmed to do so. This technique, which is linked to artificial intelligence (AI), is designed to highlight patterns of statistical repetition and derive predictions from them. Data mining, which involves extracting information from a high volume of data, provides the raw material that machine learning uses to highlight these patterns. This is why big data (all of the data generated and stored) is an integral part of machine learning. When a larger data set is processed to identify trends, the predictions generally become more accurate.
More specifically, the learning algorithm applied enables the computer to refine its analysis and responses, based on empirical data from the associated database. Machine learning is a powerful learning model for businesses, because it allows them to harness the data generated by their customers and their activity. Artificial intelligence has therefore become a major stake in a company's success.

There are several types of learning, classified according to the data available during the learning phase. If the expected response to the defined task is already known, the data is referred to as ‘labelled’; this is what is known as supervised learning. Depending on whether the value to predict is discrete or continuous, classification or regression is used. If the learning takes place step by step, with a reward given for each task performed correctly, it is known as reinforcement learning. Finally, unsupervised learning works without labels: it aims to uncover structure or groupings in the data, without using known answers beforehand. The sketch below contrasts the supervised and unsupervised approaches.
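As an illustration of this distinction, here is a minimal sketch in Python, assuming the scikit-learn library (not mentioned by the source, used here purely as an example): the same data set is handled once with its labels by a classifier, and once without labels by a clustering algorithm.

```python
# A minimal contrast between supervised and unsupervised learning.
# scikit-learn is an assumption here; any machine learning library would do.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)  # X: characteristics, y: known labels

# Supervised learning: the labels y are known, so a classifier can be trained.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
classifier = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("classification accuracy:", classifier.score(X_test, y_test))
# If the value to predict were continuous (a price, a temperature),
# a regression model would be used instead of a classifier.

# Unsupervised learning: the labels are ignored; the algorithm looks for
# structure on its own, here by grouping the observations into 3 clusters.
clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)
print("first cluster assignments:", clusters[:10])
```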

What is machine learning used for?

The power and advantage of machine learning lies in its ability to process a volume of data that the human brain could never handle. Industries that gather a high volume of data need a solution for processing it and extracting information that can be used for decision-making. Predictive analysis of this data enables the computer to anticipate specific situations. This is what machine learning is all about. Let us consider the financial services sector, for example. Machine learning is used to detect fraud, illegal conduct and other irregularities that financial institutions need to identify in order to operate properly.
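As a purely illustrative example, fraud detection is often framed as anomaly detection: transactions that deviate strongly from the usual pattern are flagged for review. The sketch below assumes synthetic data and scikit-learn's IsolationForest; it is not the method used by any particular institution.

```python
# Illustrative sketch: flagging unusual transactions with an anomaly detector.
# The data is synthetic and IsolationForest is only one possible choice.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
normal = rng.normal(loc=50, scale=15, size=(1000, 1))       # typical amounts
suspicious = rng.normal(loc=5000, scale=500, size=(5, 1))   # a few huge ones
amounts = np.vstack([normal, suspicious])

detector = IsolationForest(contamination=0.01, random_state=0).fit(amounts)
flags = detector.predict(amounts)  # -1 = anomaly, 1 = normal
print("transactions flagged as suspicious:", int((flags == -1).sum()))
```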

The growing volume of transactional data we generate is also used by companies to target their customers based on their purchasing behaviour, by identifying repeated patterns. The websites and pages we visit also generate data that machine learning can use to infer our preferences. This data processing technique, which requires no human intervention, is clearly a major asset for companies wishing to make use of the mass of data available to them. A human being could not realistically exploit this data alone, simply because the volume to process is so high. Let us consider major companies like Amazon or Google, for example. The implementation of AI and machine learning in their processes has become a necessity, due to the enormous volumes of actionable data streams they generate.

With data being generated in ever-increasing volumes, a growing number of companies will also need to integrate this technology into their structure in order to make use of the information available to them. Connected devices, for example, are becoming increasingly present in our daily lives. In 2019, more than 8 billion connected devices were part of our daily lives, enabling more data to be collected on our lifestyles and consumer habits, notably through speech recognition. This figure was projected to increase fivefold in the following years. All of this represents a huge mass of critical data for companies, and machine learning helps identify the elements that are relevant and useful. Without a doubt, there is a lot at stake here. Big data plays a vital part in the development of many technologies for modern society, such as facial recognition, self-driving cars, robotics and smart home technology. But to create this technology, companies must learn how to implement this asset in a suitable way. This technology is not only aimed at experienced developers in the field of AI: many companies are embarking on the adventure of machine learning by choosing turnkey solutions that are adapted to fit their objectives.

How machine learning works

Machine learning works based on “experience”. The computer retrieves a high volume of data, and uses it to analyse and predict situations. The goal of the process is for the machine to independently build an “internal model”, which it can then use to identify the elements the user wants to target. It needs to work through many examples and tests in order to progress. This is why we talk about learning.
To train itself and learn, the computer needs learning data. Data mining is the basis for how machine learning works, and the data used is called a training data set. The computer also needs analytical software and algorithms, as well as a deployment environment — usually a server that is adapted to meet the user’s computing needs. There are different types of learning that may vary depending on the knowledge of the response sought, the type of data analysed, the data environment considered and the type of analysis performed (statistics, comparisons, image recognition, etc.). The learning algorithms differ depending on the task at hand, and the computing power they require will also be affected.
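To make this idea of learning from experience concrete, here is a small sketch (assuming Python and scikit-learn, which the source does not prescribe) showing how a model's accuracy on unseen data typically improves as the training data set grows:

```python
# Sketch: the model's "experience" grows with its training data set.
# More labelled examples generally mean better predictions on unseen data.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)  # small handwritten-digit data set
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in (50, 200, 800):  # growing training data set
    model = LogisticRegression(max_iter=2000).fit(X_train[:n], y_train[:n])
    print(f"trained on {n} examples -> accuracy {model.score(X_test, y_test):.2f}")
```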

Machine learning usually involves two steps. The first is the development of the model from a set of observation data, the training data set. This step involves defining the task that the user wants to handle (detecting the presence of an element in a photo, detecting a statistical recurrence, responding to a sensor's signal, etc.); this is the testing or "training" phase. The second step involves putting the model into production, where it can be optimised with new data. Some systems continue learning during the production phase, but the user then needs feedback on the results produced in order to optimise the model and manage the machine. Others can continue their learning alone and develop independently.
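These two phases could look like the following hedged sketch, assuming Python and scikit-learn's SGDClassifier (chosen only because it supports incremental updates via partial_fit); the data is synthetic:

```python
# Sketch of the two phases: (1) train a model on observation data,
# (2) keep refining it in production as new labelled feedback arrives.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Phase 1: training on the initial observation data set.
X_train = rng.normal(size=(500, 4))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
model = SGDClassifier(random_state=0)
model.partial_fit(X_train, y_train, classes=np.array([0, 1]))

# Phase 2: in production, new batches with feedback refine the same model.
X_new = rng.normal(size=(50, 4))
y_new = (X_new[:, 0] + X_new[:, 1] > 0).astype(int)
model.partial_fit(X_new, y_new)  # incremental learning step
print("prediction for a new observation:", model.predict(X_new[:1]))
```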

The quality of the learning is dependent on several factors:

  • The number of relevant examples that the computer can consider. The more examples there are, the more accurate the data analysis will be.
  • The number of characteristics describing the examples. The simpler and more precise they are (size, weight, quantity, speed, etc.), the quicker and more accurate the analysis will be.
  • The quality of the database used. If too much data is missing, this will impact the analysis. False or exaggerated data can also distort results.

The prediction algorithm will be more accurate, and the analysis will be more relevant if these elements are taken into account. Once the machine learning project is defined and the databases are ready, you can start the machine learning process.
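Before training, these points can be checked with a quick audit of the data set. The sketch below assumes Python with pandas, and the column names are purely hypothetical:

```python
# Quick data-quality audit before training (the columns are hypothetical).
import pandas as pd

df = pd.DataFrame({
    "size_cm":   [180, 175, None, 190, 168],
    "weight_kg": [80, 72, 65, 300, 70],   # 300 kg looks like an aberrant value
    "speed_kmh": [12, 10, 11, 9, None],
})

print("number of examples:", len(df))
print("number of characteristics:", df.shape[1])
print("missing values per column:")
print(df.isna().sum())

# Aberrant values distort the analysis as much as missing ones.
print("suspect rows:")
print(df[df["weight_kg"] > 200])
```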

Make your machine learning project a success with OVHcloud:

We have always been committed to bringing technology to all business sectors. Given the potential that AI represents, we believe it should not be reserved solely for IT giants or major companies. We want to help and support you as much as possible in launching ambitious AI and machine learning projects. Artificial intelligence boosts efficiency for businesses and facilitates decision-making. OVHcloud offers tools to help you address business challenges, such as predictive analysis of data sets, and makes them easy to use for all user profiles. We support our customers in developing their artificial intelligence systems.

With OVHcloud, you can collect and prepare your data using our Data Analytics solutions. You can model your machine learning project step by step, and deploy your model in just a few clicks. You can choose from a range of tools, frameworks and model formats, such as TensorFlow, PMML or ONNX.
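For example, a trained model might be exported to the ONNX format before being deployed to a serving platform. The sketch below assumes Python with scikit-learn and the skl2onnx converter, neither of which is prescribed by the source:

```python
# Hedged sketch: train a small model and export it to ONNX for deployment.
# skl2onnx is one possible conversion tool, used here purely as an example.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Describe the model input (4 floating-point features), then convert to ONNX.
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, 4]))]
)
with open("iris_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())
```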


OVHcloud solutions offer a number of advantages when it comes to developing your machine learning project:

  • Data privacy: We are committed to keeping your personal data confidential. Data sovereignty is a vital aspect of our company philosophy, so you can recover your data whenever you need to.
  • Computing power: By automating our deployments and infrastructures, we can offer you unrivalled computing power at competitive prices.
  • Open source: In the world of data, open-source solutions are now the most mature and high-performance products on the market. OVHcloud values the importance of basing its solutions on open-source software, like the Apache Hadoop and Apache Spark suites.