
Determined AI infrastructure platform goes open source
Aimed at “dramatically” improving the productivity of deep learning developers, the Determined AI Platform tightly integrates the features that a deep learning (DL) engineer needs to train models at scale. The AI infrastructure platform, says the company, manages users’ heterogeneous hardware and optimizes their GPU resource utilization. It already powers teams of DL engineers and large GPU clusters in industries such as pharmaceutical drug discovery, adtech, industrial IoT, and autonomous vehicles, and the company says it is now ready for widespread adoption.
Until now, says the company, a lack of software infrastructure has been a fundamental bottleneck in achieving AI’s immense potential for everyone except tech giants like Google, Facebook, and Microsoft, which have invested massive resources and expertise to build proprietary, AI-native internal infrastructure. For everyone else, building practical applications powered by AI remains prohibitively expensive, time-consuming, and difficult.
“We started Determined AI three years ago to bring AI-native software infrastructure to the broader market,” says the company in a blog post announcing the move to open source. “Working closely with cutting-edge deep learning teams across a variety of industries, a clear narrative emerged: without better infrastructure, training deep learning models at scale remains extremely difficult, as organizations move from research to production.”
That feedback, says the company, led it to build the Determined Training Platform, which it has now open sourced under the Apache 2.0 license. According to the company, the platform offers the following features:
- High-performance distributed training: Determined’s distributed training support builds upon Horovod, a popular distributed training framework, and adds a suite of optimizations that the company says delivers twice the performance of stock Horovod. Distributed training is also easy to set up (no code changes are needed to move from single-GPU to distributed training) and allows multiple users to share the same GPU cluster seamlessly.
- State-of-the-art hyperparameter search: Determined’s hyperparameter search functionality integrates tightly with the company’s job scheduler and is parallel by default. As a result, the company says, users can get to more accurate models 100x faster than with standard search methods and 10x faster than with Bayesian optimization methods.
- DL tools for individuals and teams: Determined supports experiment management through experiment tracking, log management, metrics visualization, reproducibility, and dependency management. These tools boost productivity for individual DL engineers over the lifespan of a project, and are essential for growing teams that need to collaborate and scale efficiently.
- Hardware-agnostic and integrated with the open source ecosystem: Determined supports both public cloud and on-prem infrastructure, which means users can avoid getting locked into proprietary solutions. Determined also works with a user’s DL framework of choice, exports to popular serving frameworks, and more generally integrates with a wide range of data prep and model serving technologies.
For more, see the Determined AI Platform on GitHub or view the Determined AI documentation.
