Data science in the times of Corona: (Some) reassembly required

May 19, 2020 // By Michael Berthold
The enormous impact of the current crisis is obvious. What many still haven't realized, however, is that the impact on ongoing data science production setups can be dramatic, too.

Many of the models used for segmentation or forecasting started to fail when traffic and shopping patterns changed, supply chains were interrupted, borders were locked down, and, in general, the way people behaved changed fundamentally. Sometimes the data science systems adapted reasonably quickly once the new data started to represent the new reality. In other cases, the new reality is so fundamentally different that the new data is not sufficient to train a new system, or, worse, the base assumptions built into the system simply don't hold any more, so the entire process from data science creation to productionization must be revisited.
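
To make that distinction a bit more tangible, here is a minimal sketch of a drift check a team might run on a deployed model's input data. It is my illustration, not a prescribed method: the population stability index (PSI), the commonly cited 0.1/0.25 rules of thumb, and the synthetic "traffic volume" numbers are all assumptions standing in for whatever monitoring a real production setup uses.

import numpy as np

def population_stability_index(reference, recent, bins=10):
    """Quantify how far a feature's recent distribution has moved from the reference."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    # Clip recent values into the reference range so every value lands in a bin.
    recent = np.clip(recent, edges[0], edges[-1])
    ref_frac = np.histogram(reference, bins=edges)[0] / len(reference)
    new_frac = np.histogram(recent, bins=edges)[0] / len(recent)
    # Guard against log(0) for empty bins.
    ref_frac = np.clip(ref_frac, 1e-6, None)
    new_frac = np.clip(new_frac, 1e-6, None)
    return float(np.sum((new_frac - ref_frac) * np.log(new_frac / ref_frac)))

# Hypothetical example: pre-crisis traffic volumes versus volumes under lockdown.
rng = np.random.default_rng(seed=0)
pre_crisis = rng.normal(loc=1000, scale=100, size=5000)
under_lockdown = rng.normal(loc=300, scale=150, size=500)

psi = population_stability_index(pre_crisis, under_lockdown)
if psi < 0.1:
    print(f"PSI = {psi:.2f}: distribution looks stable, keep the model as is.")
elif psi < 0.25:
    print(f"PSI = {psi:.2f}: moderate shift, retraining on recent data may suffice.")
else:
    print(f"PSI = {psi:.2f}: severe shift, revisit the base assumptions and redesign.")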

This article describes different scenarios and a few examples of what happens when old data becomes completely outdated, base assumptions are no longer valid, or patterns in the overall system change. I then highlight some of the challenges data science teams face when updating their production systems and conclude with a set of recommendations for a robust, future-proof data science setup.
 

Impact scenario: complete change

The most drastic scenario is a complete change of the underlying system that requires not only an update of the data science process itself but also a revision of the assumptions that went into its design in the first place. This calls for a full new data science creation and productionization cycle: understanding and incorporating business knowledge, exploring data sources (possibly to replace data that doesn't exist any more), and selecting and fine-tuning suitable models. Examples of this are traffic predictions (especially near suddenly closed borders), shopping behaviour under more or less stringent lockdowns, and healthcare-related supply chains.

A subset of the above is the case where the availability of the data has changed. A very illustrative example here is weather prediction, where quite a bit of the data stems from commercial passenger aircraft equipped with additional sensors. With most of those aircraft grounded, the amount of available data suddenly drops drastically. Base assumptions about weather development itself remain the same (ignoring for a moment that other changes in pollution and energy consumption may affect the weather as well), so "only" retraining the existing models may be sufficient. However, if the missing data represents a significant portion of the information that went into model construction, the data science team is well advised to rerun the model selection and optimization process as well.
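
As a hedged illustration of that last point (again my sketch, not the article's pipeline), the snippet below re-validates an existing model type on a reduced feature set and compares it against an alternative model family. The synthetic data, the pretend "aircraft sensor" columns, and the two candidate models are assumptions purely for demonstration.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the training table; pretend columns 6-9 came from the
# aircraft sensors that are no longer flying.
X, y = make_regression(n_samples=2000, n_features=10, n_informative=8,
                       noise=10.0, random_state=42)
X_reduced = np.delete(X, [6, 7, 8, 9], axis=1)

candidates = {
    "current model type (random forest)": RandomForestRegressor(n_estimators=100, random_state=42),
    "alternative (ridge regression)": Ridge(alpha=1.0),
}

# Cross-validate each candidate with and without the lost data source.
for name, model in candidates.items():
    full = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    reduced = cross_val_score(model, X_reduced, y, cv=5, scoring="r2").mean()
    print(f"{name}: R^2 with aircraft data {full:.2f}, without {reduced:.2f}")

If the drop for the current model type is large, simply retraining it on the remaining data is not enough, and the model selection and optimization step should be rerun as described above.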

