Graphics processing for AI in security, monitoring and more
The arrival of neural networks has made vision processing a critical factor in the modern world. This has driven change across industry, with robotic process automation, smart cameras for surveillance and monitoring, and advanced driver-assistance systems (ADAS) in our vehicles – and there is much more to come as these technologies mature.
This means that professionals now need to consider not only where the market is now, but where it could be in a few short years’ time. With development continuing at breakthrough pace and volumes of investment in AI outstripping almost every other sector, it’s only a matter of time before everything we do is influenced by AI. Think of the huge volume of new applications added to mobile devices since the first smartphones, unlocking a world of location-based services, social interaction, commerce and entertainment. AI has the potential both to unlock new applications and to evolve those already created, making them exponentially better servants of the user.
The cloud contribution
Vision processing for AI has moved rapidly from the data centre to the edge, and the latest IP for application-specific integrated circuits (ASICs) and systems-on-chip (SoCs) is geared towards variations on a theme: pre-processing of visual information, traditional computer vision algorithms, and then edge inferencing using neural networks to generate object detection, recognition and suitable actions.
AI is used as an umbrella term for multiple flavours of machine learning, including deep learning for computer vision. These networks are designed to mimic the brain’s neurons and synapses using their digital equivalent, perceptrons. They typically rely on being trained to recognise patterns in data (visual or otherwise) and then, when exposed to new data, inferring what that data could signify. Training is usually done in the data centre on racks of computers, typically GPUs, which are well suited to parallel pipelined tasks, while inference is often done locally using GPUs or dedicated neural network accelerator (NNA) IP.
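The train-then-infer pattern described above can be sketched with a single perceptron. This is purely illustrative – real networks stack millions of these units into layers – and the hand-picked weights stand in for values a training process would learn:

```python
# Minimal perceptron sketch: weights fixed by "training" are reused at
# inference time on new inputs. Illustrative only; real deep networks
# learn millions of parameters across many layers.

def perceptron(inputs, weights, bias):
    """One artificial neuron: weighted sum followed by a step activation."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Weights implementing a logical AND, standing in for a trained model.
weights, bias = [1.0, 1.0], -1.5

# "Inference": apply the frozen weights to new data.
inference = [perceptron(x, weights, bias) for x in
             [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(inference)  # [0, 0, 0, 1]
```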
Over the last ten years, advances in visual processing have progressed at an exponential rate, thanks to increasingly affordable compute power and the development of convolutional neural networks (CNNs) and the sensors to go with them. Specifically, the ability to “learn” and “develop” a representational model of the world (through inputs from sensors, datasets and SLAM – simultaneous localisation and mapping – algorithms) means that systems can begin to grasp context and their position in space, as well as make predictions and act on them. Sophisticated systems, trained in the cloud, are now capable of significantly faster inferencing, which means that object identification can be done at a speed that allows real-time decision-making. Embedded systems with multiple sensors in an autonomous vehicle can identify other cars, distinguish roads from sidewalks, and pedestrians from animals. They can then begin to predict whether a pedestrian is about to step into the road.
What is important here is that this sophisticated inferencing, traditionally performed in the cloud, is now being run on a device at the edge – that is, in a local embedded processor occupying just one or two square millimetres of silicon, which can accelerate network layers with exceptional performance. This means that powerful AI compute can now be built into the smallest sensors, electronic control units (ECUs) and “Internet of Things” (IoT) devices.
As AI moves closer to the edge and into devices such as sensors, cameras and mobiles, it eliminates the need for racks of cloud-based inference by moving the analysis to the device itself, removing processing latency and reducing data transmission and bandwidth while potentially increasing security. Through quantization and adaptation, a powerful CNN can be deployed in a small edge device, and when inferencing can be run on a chip the size of a pinhead, these devices can impact a plethora of markets – including security, retail, factories, homes and vehicles – becoming ubiquitous.
Neural networks are becoming a vital component in heterogeneous systems, involving combinations of GPU and NNA, each doing what they do best and complementing each other.
CPU, GPU or NNA?
Neural networks have become prevalent thanks to the vast increase in readily available compute power, usually running on either CPUs or GPUs. However, AI operations are highly compute-intensive, which is why it can be challenging to achieve satisfactory performance on edge devices – a dedicated hardware solution is far preferable. For example, if we benchmark a typical mobile CPU at “1x” for its performance at running neural networks, then a GPU will accelerate this by around 12x. On a dedicated neural network accelerator, however, these operations can run over 100x faster (for supported layers), and at lower bit depths such as 4-bit can be around 200x faster.
This approach uses a fixed-point data type with quantization to minimise the size of the model and the bandwidth required, and lossless weight compression further enhances efficiency. In addition, some NNA hardware core IP supports variable bit depths, so that weights can be adjusted on a layer-by-layer basis to achieve maximum accuracy for the deployed model while minimising model size to reduce memory bandwidth and power consumption. Overall, this gives very efficient performance with low power requirements. The impressive power efficiency of a small (circa 1 mm²) NNA can even enable devices to run on batteries or on energy harvested from solar or wind power.
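The size-versus-accuracy trade-off behind quantization can be sketched as follows. This is a simple symmetric scheme with made-up weight values, not the method of any particular NNA toolchain, but it shows why dropping from 8-bit to 4-bit shrinks the model while increasing reconstruction error:

```python
# Illustrative post-training quantization of a weight tensor to fixed-point
# integers. Bit depth is a parameter, mirroring NNA IP that supports
# per-layer variable bit depths. Simplified symmetric scheme, toy weights.

def quantize(weights, bits):
    """Map float weights to signed integers of the given bit depth."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit, 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.42, -1.3, 0.07, 0.91]
q8, s8 = quantize(weights, 8)             # 4x smaller than 32-bit float
q4, s4 = quantize(weights, 4)             # 8x smaller, but coarser

err8 = max(abs(a - b) for a, b in zip(weights, dequantize(q8, s8)))
err4 = max(abs(a - b) for a, b in zip(weights, dequantize(q4, s4)))
assert err8 < err4  # lower bit depth trades accuracy for size and bandwidth
```

Per-layer variable bit depth, as described above, amounts to choosing `bits` independently for each layer so that sensitive layers keep precision while tolerant ones shrink further.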
Use case: Image pre-processing
Sophisticated use of GPU and NNA together allows a distorted image, say from a wide-angle or fish-eye lens, to be de-warped by the GPU and then passed to the NNA, which runs SSD (single-shot detection) for object detection on the de-warped input. This is of real use for monitoring, when a huge field of vision needs to be captured – say from a smart camera or a camera mounted overhead – or simply to reduce lens distortion.
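A toy sketch of that pipeline is shown below. It assumes a simple radial distortion model and uses a trivial brightness threshold as a stand-in detector; on real hardware the remap would run on the GPU and `detect_objects` would be an SSD network running on the NNA:

```python
# Sketch of the GPU -> NNA pipeline: de-warp a fish-eye frame, then hand
# the result to a detector. The de-warp uses a simple radial model
# (r_src = r * (1 + k * r^2)) on a toy 2-D list "image"; names and the
# threshold "detector" are illustrative, not a real SSD implementation.

def dewarp(image, k):
    """Inverse-map each output pixel to its distorted source location."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = y - cy, x - cx
            scale = 1 + k * (dx * dx + dy * dy)   # radial distortion factor
            sy, sx = round(cy + dy * scale), round(cx + dx * scale)
            if 0 <= sy < h and 0 <= sx < w:       # nearest-neighbour sample
                out[y][x] = image[sy][sx]
    return out

def detect_objects(image):
    """Stand-in for SSD inference on the NNA: report bright pixels."""
    return [(y, x) for y, row in enumerate(image)
            for x, v in enumerate(row) if v > 200]

frame = [[0] * 9 for _ in range(9)]
frame[4][4] = 255                                 # a "bright object"
detections = detect_objects(dewarp(frame, k=0.01))
print(detections)  # centre pixel maps to itself: [(4, 4)]
```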
Use case: Two-stage object detection
For networks that require intermediate processing, such as selecting a trained region of interest – a face in a crowd, for example – a two-stage object detection network like Faster R-CNN could be employed. Another example is chaining neural networks together, where the output of one becomes the input of another, involving pre-processing, intermediate processing and post-processing. Non-neural-network operations in the middle of the processing path can run on the GPU or CPU depending on suitability, whilst layers that can be accelerated would run on the NNA.
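The chaining and dispatch idea can be sketched as a list of stages, each tagged with its preferred processor. The stage names, dispatch table and lambdas here are hypothetical, not a real framework API – the point is only that NN stages target the NNA while intermediate non-NN work (cropping a region of interest) falls back to CPU or GPU:

```python
# Illustrative chaining of stages across a heterogeneous system: NN layers
# on the NNA, intermediate non-NN steps on CPU/GPU. Hypothetical names.

def run_pipeline(data, stages):
    for name, target, fn in stages:
        data = fn(data)       # on hardware, `target` selects the processor
    return data

# Two-stage detection, Faster R-CNN style: propose a region, crop, classify.
stages = [
    ("region_proposal", "NNA", lambda img: {"img": img, "roi": (1, 3)}),
    ("crop_roi",        "CPU", lambda d: d["img"][d["roi"][0]:d["roi"][1]]),
    ("classify",        "NNA", lambda crop: "face" if sum(crop) > 1 else "bg"),
]

result = run_pipeline([0, 1, 1, 0], stages)
print(result)  # "face": the cropped ROI [1, 1] sums above the threshold
```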
AI processing in the smart city, smart factory and autonomous vehicles
When a GPU and NNA are combined into the same chip, there is the opportunity to get the best of both worlds, where graphics vision compute processing is combined with neural networks, often using shared memory to reduce bandwidth and external data transmission. Several use cases are examined in greater detail below.
Smart cities are all about infrastructure. In the smart city, sensors relay data back to “brains” in the cloud, which direct traffic smoothly by monitoring traffic flow to increase road efficiency. Vehicles will rely on this smart infrastructure to keep drivers informed about upcoming traffic conditions. So, while talking to lamp posts, traffic lights and street signs may seem crazy to the average person, in the future your car will do it all the time. As such, we’ll see increasing uptake of vehicle-to-vehicle (V2V) and vehicle-to-everything (V2X) communication, and interaction between what the intelligent edge sensors are “seeing” and how that is relayed as useful information.
V2X will become a basic requirement – one requiring AIoT (Artificial Intelligence in the Internet of Things) on trillions of sensors. AIoT will enable this vehicle-to-infrastructure communication, which means there will be a multi-way exchange of information allowing the vehicle to make informed choices based on real-time and predicted information. For instance, how frustrating is it when motorway signs display out-of-date information because a human controller hasn’t realised it needs to be refreshed? Or wouldn’t it be better to know to take the exit before rounding the corner and becoming part of a three-mile tailback?
Currently, sat-nav systems do this by relying on crowd-sourced data but using real-time information would automate this process and reduce the delay in obtaining the data.
Automotive: autonomous vehicles
An autonomous vehicle has multiple cameras for computer vision, object recognition, lane warning and driver monitoring, as well as other sensors (e.g. thermal imaging, RADAR and LiDAR) for sensor fusion. Processing at the edge minimises the bandwidth required to move data to and from the vehicle and avoids delays to that analysis. In connectivity blackspots, or when latency is critical (e.g. travelling at 70 mph+), edge processing could literally be the difference between saving a life and losing one.
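Some back-of-envelope arithmetic illustrates the latency point. The round-trip and inference times below are illustrative assumptions, not measurements, but at motorway speed the distance travelled while waiting for an answer adds up quickly:

```python
# Rough arithmetic: distance travelled during a cloud round trip versus
# on-device inference at motorway speed. Latency figures are assumptions.
speed_m_per_s = 70 * 1609.344 / 3600    # 70 mph ~= 31.3 metres per second

cloud_round_trip_s = 0.100              # assumed network + cloud inference
edge_inference_s = 0.010                # assumed on-device inference

cloud_dist = speed_m_per_s * cloud_round_trip_s
edge_dist = speed_m_per_s * edge_inference_s
print(f"cloud: {cloud_dist:.1f} m, edge: {edge_dist:.1f} m travelled")
# cloud: 3.1 m, edge: 0.3 m -- metres of "blind" travel saved per decision
```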
In addition, AI and path planning could identify and predict that a child may walk into the road, enabling the vehicle to adapt and slow down, ready to take evasive action. At a simpler level, automated valet parking will remove from drivers the burden of finding a parking space.
Elsewhere in the city, edge sensors will track water, waste, energy and environmental pollution (redirecting traffic to lessen pollution in specific areas), as well as making homes and workplaces safer and more intelligent.
In the smart city, the AIoT will enable ever smarter edge devices to be not only data generators (sensors) but also data aggregators, data exchanges and data-driven decision-making “brains”. For cars in the city, this means easing traffic jams, or eliminating them entirely, by enabling cars to be constantly updated by street infrastructure (V2X) and by other vehicles (V2V), with sharing of data allowing better decision-making for routing and safety, as well as clearing the path for emergency vehicles to get through.
The future of work
In the workplace, the factory of the future will become safer because “dumb” industrial robots and robotic vehicles will become “aware” of their surroundings and of the presence of a human – enhancing safety by ensuring that if a human enters the envelope of a robot’s movements, it immediately understands what is happening and reverts to safe mode. While machine learning is being used to revolutionise task learning in factories and workplaces, there is still a need to keep a human in the loop to step in and override when necessary. We are reaching the stage where robots can be “taught” particular tasks, such as a movement arc or an envelope of acceptable movement for a given piece of work.
Likewise, we will have smart “to go” stores where you select your retail item, a vitamin drink perhaps, and when you leave the store it will be debited from your account, your loyalty points will be updated, and the shelf replenished, all from the actions of the sensors and cameras, and all without human interaction.
AI is driving the fourth industrial revolution. We live in truly exciting times where we can see the progress happening on a near-daily basis. Investment in AI and graphics is expanding at an exponential rate, and new use cases are being developed through continuous innovation. It is compelling to witness these developments in computer and machine vision, adding intelligence to our world via sensors, edge devices and sophisticated high-performance IP.
The use of dedicated chip IP for neural network acceleration is creating the “intelligent edge”, and we are seeing new “seeing technologies” being deployed there which will change our world.
About the author:
Andrew Grant is Senior Product Director at Imagination Technologies – www.imgtec.com