
The five stages of machine learning implementation

Feature articles
By Julien Happich


If you’ve stumbled upon this article, you may already be in this position. More likely, though, this will become your situation in the near future, and learning from someone else’s experience is a good way to prepare. While there is a plethora of theory around business applications for data analytics, there is a significant lack of practical, real-life experience to draw on. This is largely because adoption of these technologies is new for many industries, and the results of pilots are only now coming to light. Drawing on our work with one of the world’s largest steel producers, I will detail some of our most useful and practical learnings here.

 

Decreasing steelmaking costs with Magnitogorsk Iron and Steel Works (MMK)

Machine learning technologies are successfully used in predictive and recommendation services. Accurate predictions rest on historical data, which is used as a training set. The result of this work is one or more models that can predict the most likely outcome of a technical process, or a set of options from which the best is chosen.

For example, Yandex Data Factory developed a recommender service for Magnitogorsk Iron and Steel Works (MMK) that helps to reduce ferroalloy use by an average of 5% at the oxygen-converter stage of steel production. More importantly, this saving is achieved while consistently maintaining the high quality of the resultant steel.

When you choose a task for applying machine learning technologies, choose one with measurable results and an economic effect. The task also requires available data and an understanding of how the recommendations and predictions will be used in practice.

However, as we have learnt, a solution is not reached in one giant leap. The process of creating a predictive or recommendation project consists of several stages.

 

Stage 1 – Determining objectives, metrics and constraints

The first and very important stage is determining the objectives and constraints used in the modelling process. In the case of the ferroalloy optimisation service, the key constraint is the need to adhere to the target chemical composition of the resultant steel using only the ferroalloys available for the specific melt. The objectives are the minimum possible cost of ferroalloys used and the maximum share of recommendations accepted by the operator for execution. The second objective is important because recommendations that seem sudden or aggressive are often rejected by the operator responsible for managing the melt. A metric should be chosen for each objective, and the model should be trained specifically for it, because the model’s success will be judged in terms of that metric. Choosing the right metric is therefore a critical success factor: if the metric is chosen badly, all the work on the model goes in the wrong direction.
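As a minimal sketch of how the two objectives and the constraint above could be expressed as metrics, consider the following. The record fields, the sample numbers and the decision to count savings only on accepted, in-spec melts are illustrative assumptions and are not taken from the MMK project.

```python
from dataclasses import dataclass


@dataclass
class Melt:
    """One melt in a hypothetical evaluation log (all fields are assumptions)."""
    baseline_cost: float     # ferroalloy cost without the service
    recommended_cost: float  # ferroalloy cost of the recommended additions
    accepted: bool           # did the operator execute the recommendation?
    in_spec: bool            # did the steel meet the target composition?


def acceptance_ratio(melts):
    """Share of recommendations the operator accepted."""
    return sum(m.accepted for m in melts) / len(melts)


def relative_cost_saving(melts):
    """Relative ferroalloy cost saving over accepted, in-spec melts only.

    Melts that violate the composition constraint are excluded, so a saving
    cannot be obtained by missing the target chemistry.
    """
    valid = [m for m in melts if m.accepted and m.in_spec]
    baseline = sum(m.baseline_cost for m in valid)
    recommended = sum(m.recommended_cost for m in valid)
    return (baseline - recommended) / baseline


# Hypothetical sample data.
melts = [
    Melt(1000.0, 940.0, True, True),
    Melt(1200.0, 1150.0, True, True),
    Melt(900.0, 800.0, False, True),    # rejected by the operator
    Melt(1100.0, 1000.0, True, False),  # accepted but off-spec
]

print(f"acceptance ratio: {acceptance_ratio(melts):.2f}")
print(f"relative saving:  {relative_cost_saving(melts):.2%}")
```

Keeping the two metrics separate makes the trade-off visible: a more aggressive model may raise the saving while lowering the acceptance ratio.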


Stage 2 – Assessing data

The next step is assessing the available data sources and estimating the volume and composition of the available data. Based on the task, the set of fields and parameters, and the historical depth of the data, experts decide how realistic the task is. An important difference between machine learning and more conventional ways of data analysis is that the most valuable data is raw data, without aggregation or pre-processing. For modelling, it is preferable to have a larger amount of data, even if it is inconsistent and contains errors and omissions, rather than a small amount of “clean” data. If there is not enough data, this is the stage at which you can define which data would help solve the task and start collecting it.

When building a model for MMK, the data used consisted of 200,000 smelting entries collected over 7 years. The dataset included scrap and pig iron masses, the target chemical composition of the resultant steel, technical parameters of the oxygen-conversion and refining stages, and the results of chemical analyses.
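A first pass at the data assessment described in this stage might look like the sketch below: it counts the raw records and reports the share of missing values per field, without cleaning anything. The field names and sample records are hypothetical.

```python
def profile(records, fields):
    """Return the row count and, per field, the share of missing values."""
    total = len(records)
    missing = {
        f: sum(1 for r in records if r.get(f) is None) / total
        for f in fields
    }
    return total, missing


# Hypothetical raw melt records; None or an absent key marks an omission.
records = [
    {"scrap_mass_t": 80.0, "pig_iron_mass_t": 300.0, "target_mn_pct": 0.50},
    {"scrap_mass_t": None, "pig_iron_mass_t": 310.0, "target_mn_pct": 0.45},
    {"scrap_mass_t": 75.0, "pig_iron_mass_t": 305.0},  # field absent entirely
    {"scrap_mass_t": 82.0, "pig_iron_mass_t": 308.0, "target_mn_pct": None},
]
fields = ["scrap_mass_t", "pig_iron_mass_t", "target_mn_pct"]

total, missing = profile(records, fields)
print(f"{total} records")
for name, share in missing.items():
    print(f"{name}: {share:.0%} missing")
```

A report like this helps decide whether the task is realistic with the data at hand, or which fields need to be collected first.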

 

Stage 3 – Model training

Training the model, in contrast to conventional software development, does not require rules and algorithms to be developed in advance. The data analyst determines the range of factors that may affect the process being modelled; this is often an extensive exercise, to ensure that important factors aren’t missed. At this stage, experts who know the subject area and the process being modelled need to cooperate with the analysts who train the model and possess the necessary tools. The training process is iterative: 1 to 2 months are spent on designing the model, during which its accuracy steadily increases.

Unfortunately, the quality of the model can be estimated only after its training and testing are complete. For many tasks it is advisable to combine a deterministic formula, reflecting the known properties of the process, with refining models based on machine learning technologies.

In this case, the generalised calculation from the formula is refined by the machine learning model. This combination takes into account both the general characteristics of the process and local variations that cannot be captured in the formulas. In the MMK project, this approach allowed us to ensure both an adequate proportion of accepted recommendations and significant savings in ferroalloys.
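The combination of a formula with a learned refinement can be sketched as follows. This is an illustration only: the mass-balance formula, the fixed recovery rate, the single feature and the synthetic history are all assumptions, and a one-variable least-squares fit stands in for whatever machine learning model is actually used.

```python
def formula_estimate(target_pct, steel_mass_t, recovery=0.8):
    """Hypothetical mass-balance formula: alloying element mass (t) needed
    to reach target_pct, assuming a fixed recovery rate."""
    return target_pct / 100.0 * steel_mass_t / recovery


def fit_residual_model(xs, residuals):
    """Ordinary least squares for residual = a + b * x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(residuals) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxr = sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, residuals))
    b = sxr / sxx
    a = mean_r - b * mean_x
    return a, b


def hybrid_estimate(target_pct, steel_mass_t, feature, model):
    """Formula output refined by the learned correction."""
    a, b = model
    return formula_estimate(target_pct, steel_mass_t) + a + b * feature


# Synthetic history: (target %, steel mass in t, process feature).
history = [(0.5, 350.0, -10.0), (0.5, 360.0, 0.0),
           (0.6, 355.0, 10.0), (0.4, 345.0, 20.0)]
# Synthetic "actual" consumption: formula plus a local effect the formula misses.
actual = [formula_estimate(t, m) + 0.1 + 0.02 * f for t, m, f in history]

# Train the correction on what the formula gets wrong.
residuals = [y - formula_estimate(t, m) for (t, m, _), y in zip(history, actual)]
model = fit_residual_model([f for _, _, f in history], residuals)

print(hybrid_estimate(0.5, 350.0, 5.0, model))
```

Because the learned part only models the residual, the formula still carries the general behaviour of the process, and the correction stays small and easier to sanity-check.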

However, some may see it as a disadvantage that, even after successful training, machine learning models do not interpret their own results to explain why something has been recommended, as business analytics does. While the quality of the predictive model is measurable and stable, it does not generate any new knowledge. Model training is conducted on a limited dataset and requires a lot of computing power.


Stage 4 – Integration and testing

After the model is trained, it is integrated into the client’s management system. This integration is simplified by the fact that the model performs only one function: it predicts or recommends. The interface is therefore very simple, and the integration is reduced to transferring data and displaying the recommendations.
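To illustrate how narrow such an interface can be, here is a hedged sketch with a single entry point that takes the transferred data and returns what is displayed. The type names, fields and the placeholder model are hypothetical and do not describe the actual MMK integration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeltState:
    """Data transferred from the plant system (hypothetical fields)."""
    melt_id: str
    steel_mass_t: float
    target_mn_pct: float


@dataclass(frozen=True)
class Recommendation:
    """What is shown to the operator (hypothetical fields)."""
    melt_id: str
    ferroalloy_kg: float


def recommend(state, model):
    """The single function the service exposes: state in, recommendation out."""
    return Recommendation(state.melt_id, round(model(state), 1))


def placeholder_model(state):
    """Stand-in for a trained model; the 0.8 recovery rate is an assumption."""
    return state.target_mn_pct / 100.0 * state.steel_mass_t * 1000.0 / 0.8


rec = recommend(MeltState("melt-001", 350.0, 0.5), placeholder_model)
print(f"{rec.melt_id}: add {rec.ferroalloy_kg} kg")
```

Passing the model in as a plain callable keeps the integration code unchanged when the model is retrained or replaced.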

Practical tests are carried out with the participation of the client’s experts. During testing, it is necessary to measure both the accuracy of the model and the economic effect achieved. In the service designed for MMK, the board of experts involved in the testing relieved the operator of any responsibility in cases where he did not fully agree with the recommendation made. This allowed us to estimate the effect of using the model and to determine the changes needed for its use in production. During testing, the model showed a decrease in ferroalloy use of about 5%, and the projected savings amounted to $4.3 million per year.

 

Stage 5 – Model monitoring

The last stage is production use of the service once it has successfully passed testing. Production use requires constant monitoring of the model’s quality and regular additional training on newly collected data. Unfortunately, completely self-training machine learning systems are still at the research stage, and their application in business and industry is not yet possible.
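One simple way to monitor model quality in production is to track the prediction error over a sliding window and flag when it drifts past a threshold. The sketch below assumes mean absolute error as the quality measure; the window size, the threshold and the sample values are hypothetical.

```python
from collections import deque


class QualityMonitor:
    """Tracks mean absolute error over the most recent predictions."""

    def __init__(self, window, threshold):
        self.errors = deque(maxlen=window)  # old errors drop out automatically
        self.threshold = threshold

    def record(self, predicted, actual):
        self.errors.append(abs(predicted - actual))

    def mean_error(self):
        return sum(self.errors) / len(self.errors)

    def needs_retraining(self):
        """True once the window is full and the error exceeds the threshold."""
        full = len(self.errors) == self.errors.maxlen
        return full and self.mean_error() > self.threshold


monitor = QualityMonitor(window=3, threshold=0.5)

# Predictions close to the measured outcomes: no action needed.
for predicted, actual in [(10.0, 10.2), (10.0, 10.1), (10.0, 9.9)]:
    monitor.record(predicted, actual)
stable = monitor.needs_retraining()

# The process drifts: errors grow and the flag is raised.
for predicted, actual in [(10.0, 11.0), (10.0, 11.2), (10.0, 10.9)]:
    monitor.record(predicted, actual)
drifted = monitor.needs_retraining()

print(stable, drifted)
```

The flag only signals that additional training on newly collected data is due; the retraining itself remains a supervised, manual step.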

 

About the author:

Alexander Khaytin is Chief Operating Officer of Yandex Data Factory – https://yandexdatafactory.com

