Best design practices for large-scale analytics projects

November 24, 2016 // By Mike Paquette
Complex choices for anyone looking at implementing large-scale analytics platforms are not helped by the IT industry’s predilection for overusing buzzwords. ‘Big Data’, Artificial Intelligence (AI) and Machine Learning are certainly hot topics, but they are also in danger of becoming so misused as to become meaningless. Potential users should adopt a healthy dose of scepticism to avoid being taken in by vendors dressing up their ailing proprietary solutions as ‘themes du jour’.

Away from the hype, modern search technologies have radically altered the speed and scale of what is possible. Analysis applied to larger volumes, higher velocities and wider varieties of heterogeneous data reveals new patterns that will never become apparent on a smaller scale.

More recent developments, combining search, graph technologies, machine learning and behavioural analytics, put an array of algorithmic assistants to work for the end-user. In the hands of experts, these tools are the beginnings of AI.


AI for electronic engineering

The ability to analyse and learn from vast quantities of data can deliver value in nearly all areas of electronic engineering. Integrating these powerful search technologies with time series analysis offers massive benefits in areas as broad as silicon fabrication, process monitoring, weather station sensor networks, data networking infrastructure, voice communications design, radio signal processing and electrical grid usage.

Within industrial control systems, the impact of moving from arduous monitoring towards intelligent anomaly detection, systemic behavioural analysis and more accurate prediction can’t be overstated. As we enter an era where almost all electronic devices will deliver sensor data and receive instruction across the Internet, well-engineered, intelligent software platforms will form a major part of the value of all electronic devices.
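
To make the idea of anomaly detection on sensor data concrete, here is a minimal sketch in Python. It flags any reading that sits more than a chosen number of standard deviations away from the mean of the readings just before it. The function name, the window size, the threshold and the sample temperature readings are all assumptions made for illustration; production platforms typically use considerably more sophisticated models.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Return the indices of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` standard deviations.

    Illustrative only: the window and threshold defaults are assumptions.
    """
    history = deque(maxlen=window)  # trailing window of recent readings
    anomalies = []
    for i, value in enumerate(readings):
        # Only start scoring once the trailing window is full.
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # Skip flat windows (sigma == 0) to avoid flagging on no variation.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(i)
        history.append(value)
    return anomalies


# Hypothetical temperature readings containing one obvious spike.
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.0, 20.0, 20.2, 19.8]
print(detect_anomalies(readings))  # prints [6]
```

Note that in this simple form a flagged reading still enters the trailing window, which inflates the standard deviation for the next few readings. A fuller implementation might exclude flagged values from the history.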


Platform implementation

Infrastructure decisions will largely be determined by an organisation’s existing IT strategy. Nonetheless, high-level scoping is essential, and key considerations include how the data will be captured from its source, the data format, volumes and speeds, and acceptable latency.
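
A rough calculation is often enough for this first scoping pass. The Python sketch below estimates event rate and daily data volume from three inputs. The function name and the example figures (10,000 devices, one reading per second, 200 bytes per reading) are hypothetical and stand in for whatever a real project would measure.

```python
SECONDS_PER_DAY = 86_400


def ingest_estimate(devices, readings_per_second, bytes_per_reading):
    """Return (events per second, gigabytes per day) for a fleet of devices.

    A back-of-envelope estimate only: it ignores compression, indexing
    overhead and replication, all of which change the real storage figure.
    """
    events_per_second = devices * readings_per_second
    bytes_per_second = events_per_second * bytes_per_reading
    gb_per_day = bytes_per_second * SECONDS_PER_DAY / 1e9  # decimal gigabytes
    return events_per_second, gb_per_day


# Hypothetical pilot: 10,000 devices, 1 reading per second, 200 bytes each.
events, gb = ingest_estimate(10_000, 1, 200)
print(events, round(gb, 1))  # prints 10000 172.8
```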

Most companies start small with pilot projects before growing into broader, mission-critical usage. The overriding rule for infrastructure is that organisations will benefit from more open, flexible software platforms that allow them to move implementations between different environments as needs change. Popular open source distributions with active user communities will offer the best mix of customisation, innovation and differentiation, allowing companies to focus on the core business.

Acute skills shortages within data science will also be a determining factor in the end choice of the software itself. While Big Data and machine learning are often associated with Spark, Hadoop and MapR, the skillsets required to build applications in R and Python are both scarce and in high demand from sectors such as finance and pharma. Consider more accessible technologies that subject specialists can use, rather than relying on dedicated data scientists.

Making sense of data: a four-step process.