Data science in the times of Corona: (Some) reassembly required

May 19, 2020 | By Michael Berthold
The enormous impact of the current crisis is obvious. What many still haven't realized, however, is that the impact on ongoing data science production setups can be dramatic, too.

What's the problem?

In reality, one most often encounters two types of production data science setups. The first type was built once, deployed, and has been running for a while without any further refinement; the process may have been the result of a consulting project or even the outcome of a modern automated machine learning (AutoML) project. In both cases, if you are fortunate, automatic handling of partial model change has been incorporated into the system, so at least some partial changes are handled automatically. But since none of the currently available AutoML tools allow for performance monitoring and automatic retraining (and one-shot projects usually don't worry about that either), you may not even be aware that your data science process has failed.
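The performance monitoring the article finds missing can be as simple as tracking prediction accuracy over a rolling window and raising a flag when it degrades. A minimal sketch, assuming a classification setting where ground truth eventually arrives; the class and all parameter names are hypothetical, not from any specific tool:

```python
from collections import deque

class PerformanceMonitor:
    """Rolling-window accuracy tracker that signals when a deployed
    model should be retrained. Illustrative sketch only."""

    def __init__(self, window_size=100, accuracy_threshold=0.8):
        # Keep only the most recent window_size outcomes.
        self.window = deque(maxlen=window_size)
        self.threshold = accuracy_threshold

    def record(self, prediction, actual):
        # Store 1.0 for a correct prediction, 0.0 otherwise.
        self.window.append(1.0 if prediction == actual else 0.0)

    def accuracy(self):
        return sum(self.window) / len(self.window) if self.window else 1.0

    def needs_retraining(self):
        # Only judge once the window has filled, to avoid noisy alarms
        # right after deployment.
        return (len(self.window) == self.window.maxlen
                and self.accuracy() < self.threshold)
```

Hooking such a check into the scoring pipeline is what turns a silent failure into an actionable alert, which is precisely what a one-shot deployment lacks.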

If it is more of a setup where the data science team has made numerous improvements over the years, chances are higher that automatic model drift detection and retraining are built in as well. However, even then, and especially in the case of a complete model jump, it is far more likely that the existing system cannot easily be recreated to accommodate the new setup, because all those steps are not well documented, making it difficult to revisit the assumptions and update the process. Often the process also relies on obscure pieces of code written by experts who have since left the team. The only solution? Start an entirely new project.
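Drift detection of the kind mentioned here is often done by comparing the distribution of incoming feature values against the training baseline. A common metric is the Population Stability Index (PSI), where values above roughly 0.2 are conventionally read as significant drift. A minimal pure-Python sketch (binning scheme and epsilon floor are my own illustrative choices):

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample ('expected')
    and a recent sample ('actual'). Bin edges are derived from the
    baseline's observed range. Illustrative sketch only."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp out-of-range values into the first/last bin.
            i = min(max(int((x - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Small floor avoids log(0) for empty bins.
        return [max(c / len(sample), 1e-4) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Running such a check per feature on each scoring batch gives the team an early warning well before accuracy metrics (which require delayed ground truth) can react.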
