Maps for autonomous vehicles
Of course, to derive high value from a map, a user must determine their position on it, a task known as localization. There are several ways users might localize themselves, ranging from a simple visual comparison of the scene with a prior image of that same scene through to more sophisticated techniques.
Google’s introduction in 2007 of visual imagery to enhance its maps in its StreetView product allowed such a simple comparison for consumer map localization. Crowd-sourced alternatives, such as those supported by startups like Mapillary, share the same aim. More recent work by Here (known as Navteq before its acquisition by Nokia), TomTom and Google (again) has focused on how to automate accurate localization for machines, specifically autonomous vehicles.
Alongside several university research groups and startups, those companies have begun the task of surveying, processing and curating wide-scale high-definition (HD) 3D maps, in which visual imagery and 3D point-cloud data from LIDAR sensors are combined to create a highly accurate geometric fingerprint of the world. Such prior HD maps offer the prospect of machines being able to localize themselves within those maps down to centimeter-level accuracy. The accuracy and reliability of such an approach are certainly better than those of GNSS, but it has its own problems that we’ll come to later.
Having localized on a prior HD map, a user can determine the optimum path from that location to a planned destination, a task simply referred to as routing. Of course, electronic 2D maps have been available as a prior map for GNSS satellite navigation systems for decades and are regularly used for routing from A to B. Should GNSS units fail or exhibit inaccuracy, various SLAM (simultaneous localization and mapping) techniques can be adopted to derive a coarse approximation of location on a route sufficient for navigation, thus providing continuity of service. Historically, however, the computer vision technology used in SLAM has grappled with problems such as determining which pixels in a video feed represent sidewalk and which represent road.
The go-to solution up until now has been to use an accurate prior map that distinguishes between sidewalk and road, and to localize a user accurately on it. Even if a computer vision system could accurately classify the pixels in the scene, unless its estimate of how far away those features are is accurate to within a few centimeters, the vehicle will inevitably hit curbs and scrape buildings, and will ultimately be unsafe.
This thinking has driven a strongly-perceived requirement for up-to-date prior HD maps and highly accurate localization on them in order to ensure that early autonomous vehicle prototypes stay where they’re meant to – on the road – and achieve that without endangering the vehicle, its passengers, other road users and the infrastructure.
Human routing from A to B involves using landmarks and waypoints for localization, cross-checking against the planned route as the journey progresses. We use maps to help us prepare for what lies ahead. For example, when we know we will be taking the next exit off a highway, we prepare by staying in the lane adjacent to the exit, even before we can see it. We don’t need a particularly accurate idea of our location in order to perform this kind of preparation. We only need coarse relative localization to navigate along a route. In the example above, it is enough to see the exit in the distance; we don’t need to know that we are 367.43m from it.
Computer vision techniques have achieved startling and unexpected progress in the last eight years. We are now at the point where machines can identify objects more reliably than a human can, and can reliably segment a scene in real time between sidewalk and drivable road, even in the presence of reflections and specularities (e.g. from surface water), painting every pixel in the scene with an accurate representation of what it is. Those staggering breakthroughs mean that autonomous vehicle technology developed today is, for the first time, able to generate an accurate 3D reconstruction of the world in real time. It may therefore be possible to build a driverless system that needs only approximate localization, just as humans do.
Moreover, it’s now also possible to determine how far away different objects are in 3D space, and to supplement this with other sensor input such as LIDAR, radar, IMUs and ultrasound, accurately enough that a vehicle can manoeuvre safely within its view of the 3D world around it. Of course, there may still be occasions when relying on vision, like a human, isn’t good enough. Autonomous vehicle software developers are often asked about conditions like a complete white-out from snow, or dense fog. These can indeed still make it difficult or impossible for computer vision perception systems to create an accurate representation of the world, much as they would for mere humans. But, just as in the human case, we should be asking whether such vehicles should be driving at all in those conditions, however they are controlled.
So, whilst we are at the advent of a world where we may not need HD maps and accurate localization on them to control a vehicle, replacing that need with stronger artificial intelligence in computer vision technology, there are other geospatial data that would assist autonomous vehicles and that may be usefully layered on top of a traditional navigation map.
One such area could be prior local knowledge that conditions how a vehicle is controlled. Human drivers build up local knowledge from prior experience and use it when driving a route again. For example, we might learn where children tend to cross on their way to and from school. Or we might know to be wary when passing an entertainment establishment from which people often stumble into the road late on a Friday evening. We also build up a picture of the road sections on which other drivers often overtake or drive too fast. We’re always making note of potential dynamic agents (pedestrians, cyclists, cars, etc.) and their likely behaviour on a geospatial basis, especially when they pose a potential risk.
We also build up knowledge of how we like to drive on certain roads, so we might know the road sections that have an adverse camber or have developed potholes and we would then adapt our driving style for our comfort and safety. When we have local knowledge, we put it to use. When we don’t have that local knowledge we might be more likely to exercise a general level of caution and progress anyway.
Where available, autonomous vehicles will use rich geospatial data to inform their real-time probabilistic model of the world around them and how they should navigate through it. Some of that data will be cartographic (e.g. roads, junctions, etc.), some will be behavioural (likely intentions of other road users, optimal driving styles, etc.), but all will be uncertain.
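One way to picture how an uncertain prior might inform a real-time probabilistic model is a simple Bayesian update. The sketch below is illustrative only and is not taken from any particular vehicle stack: the function names, the half-life decay of stale map data, and the detector rates are all assumptions made for the example. It treats a piece of local knowledge (say, "pedestrians often cross here") as a prior probability, weakens that prior as the map data ages, and then revises it with a single live perception reading.

```python
def decayed_prior(prior: float, age_days: float, half_life_days: float,
                  baseline: float = 0.5) -> float:
    """Pull a stale map prior toward an uninformative baseline as it ages.

    Assumption: trust in the prior halves every `half_life_days`.
    """
    weight = 0.5 ** (age_days / half_life_days)
    return baseline + (prior - baseline) * weight


def bayes_update(prior: float, detected: bool,
                 true_positive_rate: float, false_positive_rate: float) -> float:
    """Posterior probability that a hazard is present after one detector reading."""
    if detected:
        like_present, like_absent = true_positive_rate, false_positive_rate
    else:
        like_present, like_absent = 1 - true_positive_rate, 1 - false_positive_rate
    numerator = like_present * prior
    return numerator / (numerator + like_absent * (1 - prior))


if __name__ == "__main__":
    # Hypothetical numbers: the map says a crossing hazard is 90% likely,
    # but that knowledge is 30 days old and is assumed to have a 30-day half-life.
    prior = decayed_prior(0.9, age_days=30, half_life_days=30)
    # Live perception (assumed 90% sensitive, 10% false alarms) sees nothing.
    posterior = bayes_update(prior, detected=False,
                             true_positive_rate=0.9, false_positive_rate=0.1)
    print(f"aged prior: {prior:.2f}, posterior after perception: {posterior:.2f}")
```

The design choice worth noting is that the prior never overrides perception: it only shifts the starting point, and the older the map data, the less it shifts it.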
Every piece of information in a prior map is dynamic to some degree: road topologies change, trees and vegetation can be removed from one day to the next, holes appear and spontaneous events cause changes such as contraflows. In many of these cases, it would be virtually impossible to provide a vehicle with a truly accurate prior map of the actual situation on the ground, however often it was re-mapped and whatever level of crowd-sourced information was provided. Instead, it may make more sense to accept a shortfall and apply a stronger AI approach to perceiving the world as it is, in real time. The advances in computer vision described above make this a feasible and even desirable approach.
The uncertainty in actual scene state, along with the requirement it places on real-time perception and scene reconstruction, will drive a need for huge amounts of computational performance in the vehicle and for improved sensors. But a perception-led approach of this kind will prove robust to all of the uncertainty of dynamic changes in the environment. It means we may not really need centimeter-level accurate prior 3D maps, and we may be safer not relying on the high-bandwidth connectivity needed to get huge, updated map files into the vehicle in time for a drive.
Our ability as human drivers to navigate a route in the absence of perfect information, and to do so safely, is an important part of keeping our traffic systems moving. And it’s a must for autonomous vehicles too.
About the author:
Ben Peters is Vice President of Product and Marketing at FiveAI – www.five.ai