An easy way to get to know Machine Learning
Throughout history, the goods people need have often been produced far from where they live. Thanks to logistics, territories that could produce only a single product were able to take part in world trade. This made it possible to build a highly effective system of international trade and to offer people across the planet a wide selection of goods. It also became possible to deliver perishable food quickly, which significantly reduces its price: the consumer no longer overpays to cover the share of goods that spoil on the way.
From the need to transport goods to machine learning
Logistics connects places where consumers are concentrated with production facilities while keeping losses of money and time to a minimum. Its widespread use in the economy began in the 1960s and 1970s, driven largely by the technological advances and business needs of that period. As technology developed, it became possible to monitor, end to end, every stage in the movement of raw materials, semi-finished goods, and finished products from the primary source of raw materials to the final consumer. The term “logistics” came to be used for situations that involve precise planning of an agreed sequence of actions. To manage material flows in the economy effectively, methods from applied mathematics were used, including graph theory and the transportation problem. With their help, businesses earn income by skillfully optimizing transportation.
In the modern world, a number of global trends are driving the rapid development of logistics systems. Planning, organizing, and managing deliveries, including personal and non-commercial ones, is speeding up. Improving the process requires control over all transport operations, combined with modern information technology. Algorithms that supply relevant information to business owners affect the speed and quality of decision-making. Increasingly, logistics companies are introducing artificial intelligence to keep up with these trends and to stay ahead of the competition.
Machine learning (ML) is one of the most popular approaches in artificial intelligence, and over the past decade it has become an integral part of our lives. It is effective because a machine can perform mechanical, repetitive tasks. As more data becomes available, machine learning becomes a necessary element of technological progress, and it offers huge opportunities in the field of logistics. ML lets computers make decisions in place of a person, including decisions that a person could not make at all because of the sheer volume of data.
K-Nearest Neighbors method
The easiest way to get to know ML is with the k-Nearest Neighbors method. K-Nearest Neighbors (k-NN) is one of the simplest algorithms used in machine learning to solve regression and classification problems.
k-NN for regression: when k-NN is used for regression tasks, the forecast is the mean or median of the target values of the k most similar cases.
k-NN for classification: when k-NN is used for classification, the output is the class that occurs most frequently among the k most similar cases. Each of these cases essentially votes for its own class, and the class with the most votes becomes the forecast.
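The two aggregation rules can be sketched in a few lines of Python. This is only an illustration of the final step, assuming the k = 5 most similar cases have already been found; the delivery times and class labels are invented for the example.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical outputs of the k = 5 most similar past cases
# (invented values, for illustration only).
neighbor_delivery_days = [3.0, 4.0, 4.0, 5.0, 9.0]   # regression targets
neighbor_classes = ["intact", "damaged", "intact", "intact", "damaged"]

# Regression: the forecast is the mean (or median) of the neighbors' values.
forecast_mean = mean(neighbor_delivery_days)
forecast_median = median(neighbor_delivery_days)

# Classification: each neighbor votes for its own class; the most frequent wins.
votes = Counter(neighbor_classes)
forecast_class = votes.most_common(1)[0][0]

print(forecast_mean, forecast_median, forecast_class)
```

Note how the single slow delivery (9 days) pulls the mean up while the median stays at the typical value, which is one reason to prefer the median when the neighbors contain outliers.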
The k-NN algorithm grew out of research conducted for the military. A technical report written for the U.S. Air Force School of Aviation Medicine introduced a non-parametric pattern classification method that has since become popular as the k-nearest neighbor (k-NN) algorithm. k-NN is often used as a baseline for more complex classifiers, such as artificial neural networks (ANNs) and support vector machines (SVMs). Despite its simplicity, k-NN can outperform more powerful classifiers on some problems and is used in a variety of applications.
Every time something important happens in a person’s life, the human brain remembers this experience. After that, the brain uses this experience as a guide on what could happen next.
Imagine seeing someone drop a glass. As the glass falls, you already predict that it will break when it hits the floor. How do you do this? How is the decision made in your head? You have never seen this particular glass break before. But you have seen similar cases, with glasses or similar objects falling to the floor. And although the situation may not be exactly the same, you still know that a glass falling from a height of one and a half meters onto a tile floor usually breaks. This gives you a fairly high level of confidence that the glass will break whenever you see one fall from that height onto a hard floor.
But what if a glass falls from a height of 10 centimeters onto a soft carpet? Will it break in that situation too? Clearly height matters, as does the hardness of the surface on which the glass lands. These are two important factors. A third factor, the thickness of the glass, may also matter, and there may be many more.
This mode of reasoning is exactly what the k-Nearest Neighbors algorithm does. Each time a new situation arises, the algorithm looks through all past experiences and finds the closest ones. These events (or data points) are what we call the k nearest neighbors. For a classification problem, such as predicting whether a glass will break, the prediction is the majority vote of the k neighbors. If k = 5 and the glass broke in 3 or more of your most similar experiences, the prediction is “yes, it will break”.
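The glass example can be written as a minimal from-scratch sketch. The past experiences, the 0-to-1 hardness scale, and the helper name knn_classify are all assumptions made up for this illustration; the distance used is the plain Euclidean distance between (height, hardness) pairs.

```python
import math
from collections import Counter

# Invented past experiences: (drop height in m, surface hardness 0..1) -> outcome.
experiences = [
    ((1.5, 0.9), "breaks"),
    ((1.2, 1.0), "breaks"),
    ((1.0, 0.8), "breaks"),
    ((0.8, 0.9), "breaks"),
    ((0.1, 0.1), "survives"),
    ((0.2, 0.2), "survives"),
    ((0.3, 0.1), "survives"),
    ((1.4, 0.2), "survives"),
]

def knn_classify(query, data, k=5):
    """Return the majority class among the k stored cases closest to query."""
    by_distance = sorted(data, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

tile_result = knn_classify((1.5, 1.0), experiences)     # 1.5 m onto tile
carpet_result = knn_classify((0.1, 0.0), experiences)   # 10 cm onto carpet
print(tile_result, carpet_result)
```

With this invented data, four of the five cases nearest to the tile-floor query are breakages, while three of the five nearest to the carpet query are not, so the two predictions differ.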
Such a method can be useful if there is a need to predict whether the goods will be delivered without damage or if there is a chance that the goods will deteriorate during transportation and therefore additional measures need to be taken.
On the graph, it looks like this if we have two criteria for making a decision (the data was invented as an example):
We have blue and red data points. The red dots mark forecasts of air temperature and rainfall under which category X goods, transported under standard conditions, arrived with deteriorated quality. The blue dots mark forecasts under which the goods arrived in proper quality. For a new data point (green), we can determine the most likely class by looking at the classes of its nearest neighbors. Here the answer is “red”, because red is the majority class among the neighbors at k = 3. Therefore, under these forecast weather conditions, the transportation conditions should be adjusted.
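A situation like the one described can be reproduced with scikit-learn. The numbers below are invented and are not the data behind the original graph; they are chosen so that the three nearest neighbors of the new point are one blue and two red.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Invented data: [forecast air temperature in °C, forecast rainfall in mm].
X = np.array([
    [30, 12], [32, 15], [28, 18], [31, 10],   # red: quality deteriorated
    [18, 2], [20, 4], [22, 1], [26, 9],       # blue: delivered in proper quality
])
y = np.array(["red"] * 4 + ["blue"] * 4)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)

new_point = np.array([[27, 10]])              # the "green" point
prediction = model.predict(new_point)[0]
print(prediction)
```

The closest neighbor here is actually blue, but the next two are red, so the majority vote at k = 3 is “red”. With k = 1 the same point would be classified as blue, which shows how much the result can depend on the choice of k.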
During training, the k-Nearest Neighbors algorithm simply stores all the data it receives. All calculations happen at prediction time, that is, when we apply the model to unseen data points: we need to determine which k data points from the training set are closest to the point for which we want a forecast.
k-Nearest Neighbors is a standard tool in the ML toolbox. The algorithm is easy to understand even for non-specialists, but applying it well requires attention to a few details.
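The split between storing and searching can be seen with scikit-learn's NearestNeighbors class. In this small sketch (the four points are invented), fit only stores and indexes the data, and the actual distance work happens in kneighbors.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Invented training points.
training_points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])

index = NearestNeighbors(n_neighbors=2)
index.fit(training_points)   # "training": the data is stored, nothing is learned

# The search happens only now, when an unseen point arrives.
distances, positions = index.kneighbors(np.array([[0.9, 0.1]]))
print(positions[0], distances[0])
```

The returned positions are row indices into the training set, ordered from the closest neighbor to the farthest.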
Implementation of K-NN
To implement the k-NN algorithm in Python, you can use the scikit-learn (sklearn) library. Import the KNeighborsClassifier class from sklearn.neighbors to build a k-nearest neighbors classifier, and an accuracy metric such as accuracy_score from sklearn.metrics to evaluate the accuracy of classification. The standard set of libraries, numpy and pandas for working with data and matplotlib.pyplot for plotting, will also be useful.
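A minimal end-to-end sketch with these imports might look as follows. The data is synthetic (two invented, well-separated clusters), and the 70/30 split and k = 5 are arbitrary choices for the illustration, not recommendations.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, invented data: two well-separated clusters of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(6.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Hold out 30% of the points to measure accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy: {accuracy:.2f}")
```

Because the clusters barely overlap, the accuracy on this toy data is close to 1; real logistics data will rarely be this clean.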
A key part of the algorithm is the choice of distance measure. A frequent choice is the Euclidean distance, which treats all columns of the data equally: for each column, the difference between the two values is squared, the squares are summed, and the square root of the sum is taken. This means that columns with a wider range of values have a greater effect on the distance than columns with a narrower range.
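The effect of column ranges is easy to demonstrate. In the sketch below, the columns (shipment weight in kilograms and a fragility score between 0 and 1) and all values are invented for the example.

```python
import numpy as np

def euclidean(a, b):
    """Square root of the summed squared per-column differences."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Invented columns: [shipment weight in kg, fragility score from 0 to 1].
query = np.array([1000.0, 0.9])
same_fragility = np.array([1200.0, 0.9])   # 200 kg heavier, equally fragile
similar_weight = np.array([1010.0, 0.1])   # 10 kg heavier, far less fragile

d_same_fragility = euclidean(query, same_fragility)
d_similar_weight = euclidean(query, similar_weight)
print(d_same_fragility, d_similar_weight)
```

The shipment with a completely different fragility score comes out as the much closer neighbor, simply because weight is measured in the hundreds and fragility in fractions of one.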
In this case, it is necessary to normalize the data set so that all columns are approximately on the same scale. There are two common ways to normalize.
First, you can rescale all column values into the range from 0 to 1.
Second, you can rescale each column so that it has a mean of 0 and a standard deviation of 1.
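Both options take only a few lines of NumPy. The sketch below uses invented values and assumes that no column is constant (a constant column would cause a division by zero).

```python
import numpy as np

# Invented columns: [shipment weight in kg, fragility score from 0 to 1].
X = np.array([[1000.0, 0.9],
              [1200.0, 0.5],
              [1010.0, 0.1],
              [800.0, 0.3]])

# Option 1: min-max scaling, each column mapped to the range [0, 1].
col_min, col_max = X.min(axis=0), X.max(axis=0)
X_minmax = (X - col_min) / (col_max - col_min)

# Option 2: standardization, each column gets mean 0 and standard deviation 1.
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax)
print(X_standard)
```

scikit-learn offers the same transformations as MinMaxScaler and StandardScaler in sklearn.preprocessing. Whichever option is used, the minimum, maximum, mean, and standard deviation should be computed on the training data and then reused for every new point.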
The choice of the number of k neighbors
The minimum value of k is 1; in this case only the single nearest neighbor is consulted for each forecast. In theory, k can be as large as the entire training set, but this makes no sense: the majority class of the full training set would then be predicted every time.
Good values for k depend on the available data and on whether the problem is non-linear. A reasonable approach is to try several values between 1 and about 10% of the size of the training set and compare the results.
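One common way to compare candidate values of k is cross-validation. The sketch below is an assumption about how this could be done, not a prescribed procedure: the data is synthetic, the candidate list is arbitrary, and 5-fold cross-validation is just one reasonable choice.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, invented data: two overlapping clusters of 2-D points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Candidates from 1 up to roughly 10% of the 160 points used for training
# in each of the 5 folds.
candidates = [1, 3, 5, 9, 15]
scores = {}
for k in candidates:
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Odd values of k are used here because, with two classes, they avoid tied votes.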
Type of distance function used
For numerical values, the Euclidean distance is a good choice. If the attributes are not numeric, or are a mix of numeric and categorical attributes, use a distance measure that can handle data of that type.
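As one possible illustration, the hypothetical function below combines squared differences for numeric fields with a simple match/mismatch penalty for categorical fields. It is a sketch of the idea only, not a standard named metric; the field layout and the records are invented, and the numeric fields are assumed to be scaled to comparable ranges already.

```python
import math

def mixed_distance(a, b, numeric_idx, categorical_idx):
    """Hypothetical distance for records with numeric and categorical fields.

    Numeric fields contribute squared differences; categorical fields
    contribute 0 when the values are equal and 1 when they differ.
    """
    numeric_part = sum((a[i] - b[i]) ** 2 for i in numeric_idx)
    categorical_part = sum(1 for i in categorical_idx if a[i] != b[i])
    return math.sqrt(numeric_part + categorical_part)

# Invented records: (scaled weight, scaled volume, packaging type, transport mode).
parcel_a = (0.2, 0.5, "box", "truck")
parcel_b = (0.3, 0.5, "box", "rail")
parcel_c = (0.2, 0.5, "box", "truck")

d_ab = mixed_distance(parcel_a, parcel_b, [0, 1], [2, 3])
d_ac = mixed_distance(parcel_a, parcel_c, [0, 1], [2, 3])
print(d_ab, d_ac)
```

How heavily a categorical mismatch should weigh against a numeric difference is a modeling decision; the weight of 1 used here is arbitrary.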
Memory usage and runtime
The algorithm stores the complete training data, so memory requirements grow linearly with the number of data points provided for training. For a straightforward (brute-force) search, the runtime of a prediction scales linearly with the number of data columns m and the number of training points n. Thus, if you need to process a large amount of training data quickly, you will need substantial computing power.
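Both scaling claims can be made concrete with NumPy. The helper names below are made up for this sketch, and the training set is assumed to be stored as 8-byte floating point numbers.

```python
import numpy as np

def stored_bytes(n_points, n_columns):
    """Bytes needed to keep a float64 training set of the given shape."""
    return np.zeros((n_points, n_columns)).nbytes

def brute_force_nearest(X_train, query, k):
    """Indices of the k closest rows; computing the distances touches
    all n * m stored values once."""
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    return np.argsort(distances)[:k]   # the sort adds its own cost on top

print(stored_bytes(1_000, 10))   # 80000
print(stored_bytes(2_000, 10))   # 160000

X_train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
print(brute_force_nearest(X_train, np.array([0.0, 0.0]), k=2))
```

Doubling the number of training points doubles the memory, and every single prediction has to go through all of the stored rows again.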
Improvement of logistics activities
The possibilities of the practical application of k-NN in the logistics sphere are quite wide:
● Recommendation of storage systems.
● Logistic ratings, comparison of goods by transport characteristics.
● Decision making. Should the product be packaged in additional containers? Will the goods be damaged during transportation? Is this product closer in its characteristics to goods that were damaged during transportation?
● Recognition of images and markings for cargo segmentation.
● Identification of illiquid stocks.
● Internal classification of service levels: determining the level of service from the parameters that matter to the logistics company, such as delivery speed and whether there were complaints.
Although k-NN is a simple model, it often helps when you need to quickly deliver a solution with fairly accurate results.
In conclusion, the demand for logistics companies that are proficient in ML methods is constantly growing, both worldwide and in our country. Logistics process management must keep up with the times in order to reason and calculate quickly and efficiently while minimizing costs and providing the best service.