Guest Post
The best augmented reality (AR) experiences are interactive and blend the real-world environment with virtual content. As the technology grows rapidly, the quality of AR experiences continues to improve.
Early-stage AR applications used markers to track the 6DoF pose (3DoF rotation and 3DoF position) of the camera so that virtual content could be overlaid on the real-world scene. Then Simultaneous Localization and Mapping (SLAM) enabled position tracking without pre-trained markers. SLAM made it possible to mix virtual objects with all aspects of the AR world, producing an even more immersive and realistic experience. Together, these two solutions continue to shape experiences in the augmented reality space.
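To make the idea of a 6DoF pose concrete, here is a minimal Python sketch. The class name Pose6DoF, the yaw/pitch/roll convention and the helper world_to_camera are illustrative assumptions, not part of any particular AR SDK. The pose stores three rotation angles and three position coordinates, and the helper expresses a world point in the camera frame, which is the step an overlay renderer needs before drawing virtual content.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose6DoF:
    """Camera pose in the world frame: 3DoF rotation plus 3DoF position."""
    yaw: float    # rotation about the z axis, in radians (assumed convention)
    pitch: float  # rotation about the y axis, in radians
    roll: float   # rotation about the x axis, in radians
    x: float
    y: float
    z: float

    def rotation_matrix(self):
        """Camera-to-world rotation, composed as Rz(yaw) * Ry(pitch) * Rx(roll)."""
        cy, sy = math.cos(self.yaw), math.sin(self.yaw)
        cp, sp = math.cos(self.pitch), math.sin(self.pitch)
        cr, sr = math.cos(self.roll), math.sin(self.roll)
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]


def world_to_camera(pose, point):
    """Express a world-frame point in the camera frame: R^T * (p - t)."""
    r = pose.rotation_matrix()
    d = (point[0] - pose.x, point[1] - pose.y, point[2] - pose.z)
    # Multiplying by the transpose of R inverts the rotation.
    return tuple(sum(r[j][i] * d[j] for j in range(3)) for i in range(3))


# Hypothetical example: camera at (1, 0, 0), turned 90 degrees about z.
camera_pose = Pose6DoF(yaw=math.pi / 2, pitch=0.0, roll=0.0, x=1.0, y=0.0, z=0.0)
anchor_in_camera = world_to_camera(camera_pose, (1.0, 1.0, 0.0))
print(anchor_in_camera)
```

With these assumed values the anchor ends up roughly one unit along the camera's x axis. A tracker that reports only 3DoF rotation could not account for the camera's position, so the virtual object would not stay fixed in the room as the user walks.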
Right now, many top-tier AR devices use 6DoF tracking to deliver the most accurate and reliable tracking experience in any scene. Doing this well requires a high-performance tracking system driven by multiple fisheye cameras, a time-of-flight (ToF) camera and other sensors. Unfortunately, all of these together make the devices in question heavy and expensive. Even the most feature-rich devices with exceptional business applications simply will not be accepted in the marketplace if they’re uncomfortable and cost too much.
And so to succeed in the wearable AR device space, products must combine comfort, affordability and technology in the most innovative ways possible.
Let’s compare the following four popular 6DoF tracking solutions based on cost and performance:
Single Video Camera
This single video camera solution doesn’t need extra sensors exclusively for tracking. It is a low-cost solution for some use cases. Here are some of its characteristics:
- Lowest hardware cost;
- Easy to incorporate into many designs;
- Requires more specialized optimization and customization;
- Works well for static scenes.
Video cameras are common components of AR glasses, used to take high-resolution photos and videos. Keeping a single camera on the glasses is relatively simple in terms of both appearance and usability. However, video cameras usually operate at low frame rates (<60 fps, mostly <30 fps) and cannot capture high-quality images in motion. The “jello effect” of rolling-shutter sensors and other motion distortions cause position tracking to fail. IMU sensors can improve the results, but this solution still doesn’t measure up to the others.
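A rough back-of-the-envelope calculation shows why frame rate and shutter type matter for tracking. The sketch below assumes a simple pinhole model and a feature near the image center; the focal length, head rotation speed and rolling-shutter readout time are hypothetical values chosen only for illustration.

```python
import math


def pixel_shift(focal_px, angular_speed_deg_s, interval_s):
    """Approximate image shift, in pixels, of a feature near the image center
    when the camera rotates for interval_s seconds about an axis
    perpendicular to the optical axis (pinhole model)."""
    angle_rad = math.radians(angular_speed_deg_s) * interval_s
    return focal_px * math.tan(angle_rad)


FOCAL_PX = 1000.0         # hypothetical focal length in pixels
HEAD_SPEED_DEG_S = 120.0  # hypothetical head rotation speed
READOUT_S = 0.020         # hypothetical rolling-shutter readout time

# Feature displacement between consecutive frames at two frame rates.
shift_30fps = pixel_shift(FOCAL_PX, HEAD_SPEED_DEG_S, 1 / 30)
shift_90fps = pixel_shift(FOCAL_PX, HEAD_SPEED_DEG_S, 1 / 90)

# Skew between the first and last image row of a rolling-shutter frame.
# A global-shutter sensor exposes all rows at once, so this term vanishes.
rolling_skew = pixel_shift(FOCAL_PX, HEAD_SPEED_DEG_S, READOUT_S)

print(f"shift per frame at 30 fps: {shift_30fps:.1f} px")
print(f"shift per frame at 90 fps: {shift_90fps:.1f} px")
print(f"rolling-shutter skew within one frame: {rolling_skew:.1f} px")
```

Under these assumptions, a feature moves about three times as far between frames at 30 fps as it does at 90 fps, which makes frame-to-frame matching harder, and the rolling-shutter skew distorts the image within a single frame.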
Monocular Fisheye Camera
The monocular fisheye solution uses one camera dedicated to tracking, which provides more responsive and robust tracking. It is also characterized by:
- Sensor reserved for 6DoF tracking;
- Continuously updating 6DoF tracking thanks to high frequency;
- Some scale drift.
Some AR glasses use a single fisheye camera reserved for 6DoF tracking. This solution consumes some additional power but often provides a better 6DoF tracking result for the device. Thanks to the camera's high frame rate (>90 fps) and global shutter, SLAM keeps tracking position in scenes with motion. The single fisheye camera should be placed on the front of the glasses, a requirement that imposes some design restrictions but not enough to stray too far from the look of typical sunglasses. Where the monocular fisheye fails is at measuring the scale of an environment. The distances measured in the SLAM map will drift and cause virtual objects to move unexpectedly in the scene. There is still potential for a good user experience if developers design VR-like applications that do not require the virtual content and the real world to be tightly coupled.
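The scale problem of a single camera can be illustrated with a small Python sketch. It assumes an ideal pinhole projection and a camera that only translates (a real fisheye lens uses a different projection model, but the ambiguity is the same because it depends only on viewing directions). All names and numbers are hypothetical. Scaling the whole scene and the camera motion by the same factor produces the same pixel observations, so images alone cannot tell the two worlds apart.

```python
import math


def project(point, cam_pos, focal_px=500.0):
    """Pinhole projection of a 3D point seen from a camera at cam_pos
    (no rotation, optical axis along +z)."""
    x, y, z = (p - c for p, c in zip(point, cam_pos))
    return (focal_px * x / z, focal_px * y / z)


def scale(vec, s):
    """Multiply every coordinate of a 3D vector by s."""
    return tuple(s * v for v in vec)


def observations(points, cams):
    """Pixel coordinates of every point from every camera position."""
    return [[project(p, c) for p in points] for c in cams]


scene = [(0.5, 0.2, 2.0), (-0.3, 0.1, 4.0), (0.0, -0.4, 3.0)]
cam_positions = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]  # camera moves 10 cm

small_world = observations(scene, cam_positions)
large_world = observations(
    [scale(p, 3.0) for p in scene],          # scene three times larger
    [scale(c, 3.0) for c in cam_positions],  # camera moves three times farther
)

same_pixels = all(
    math.isclose(a, b, abs_tol=1e-9)
    for frame_s, frame_l in zip(small_world, large_world)
    for obs_s, obs_l in zip(frame_s, frame_l)
    for a, b in zip(obs_s, obs_l)
)
print("identical observations:", same_pixels)
```

Because the observations match, a monocular system has to fix the scale from some other source, and any error in that estimate can accumulate as the map grows.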
Stereo Fisheye Camera
The stereo fisheye camera solution uses a stereo fisheye module which can better measure the environment. Some of its main characteristics are:
- High power cost;
- Good accuracy on tracking and scale measuring;
- Challenge for industrial design.
The stereo fisheye-based 6DoF tracking system has been proven to be a marketable solution by different AR/VR headset makers. Qualcomm has already demonstrated high-quality 6DoF position tracking on its VR headset using stereo fisheye vision. The increased power cost of the additional camera proves a worthwhile trade-off because it provides immediate map initialization, robust tracking and accurate measurement of the environment. Compared to the monocular solution, the stereo fisheye solution can extend the scene and track much more quickly. Even though the number of sensors is doubled, the computational complexity is not much higher than that of a monocular vision system. High-quality optimization and customization work on a stereo system can bring its performance close to that of a heavier SLAM system.
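The reason a stereo pair can measure scale is that the distance between the two cameras (the baseline) is known in metric units. A minimal sketch of the standard depth-from-disparity relation, Z = f * B / d, follows. It assumes an ideal rectified pinhole pair, so fisheye images would first need to be undistorted and rectified, and the focal length, baseline and disparity values are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Metric depth of a point seen by a rectified stereo pair.

    focal_px     -- focal length in pixels (same for both cameras)
    baseline_m   -- distance between the two camera centers, in meters
    disparity_px -- horizontal pixel offset of the point between the images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


FOCAL_PX = 400.0   # hypothetical focal length in pixels
BASELINE_M = 0.10  # hypothetical 10 cm baseline

near_depth = depth_from_disparity(FOCAL_PX, BASELINE_M, disparity_px=20.0)
far_depth = depth_from_disparity(FOCAL_PX, BASELINE_M, disparity_px=5.0)
print(f"20 px disparity: {near_depth:.2f} m, 5 px disparity: {far_depth:.2f} m")
```

A single stereo frame already yields metric depth, which is consistent with the immediate map initialization described above. Smaller disparities correspond to more distant points, so the depth estimate becomes more sensitive to pixel errors as distance grows.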
SLAM Running on Edge
This solution runs all the SLAM computation on the glasses, not on the phone, tablet, or any other host device. Its most important characteristics are:
- Highest cost;
- High quality 6DoF tracking;
- Stable performance for different platforms.
Moving computation to the edge is becoming common in AI-related devices. HoloLens, a state-of-the-art AR head-mounted display (HMD), already runs its SLAM function on edge hardware so that the CPU and the OS can work more efficiently on user applications. For lightweight AR glasses, running SLAM on the edge is not just a way to reduce the computation load on the host but also a way to make the glasses compatible with different kinds of host platforms. The strongest argument for this solution is that it equalizes performance on any host platform without the need to custom-optimize algorithms. However, it is not easy for lightweight AR glasses to run 6DoF tracking on the edge. The chips in these types of AR glasses are usually designed only for driving the display and transmitting sensor data, with no resources left for other computation work. One answer to this issue is to integrate a mature 6DoF tracking module like the Intel RealSense T265 into the board of the AR glasses.
There is no consensus yet on which solution is “perfect” for lightweight glasses. Designers will need to continue to define the features and intended usage of their products to make the best hardware and software choices. But there is no doubt that AR glasses with 6DoF tracking will continue to be the most attractive to consumers and remain the most competitive in the future AR market.
About the Author
Zhiyu Huo, who holds a PhD in Electrical and Computer Engineering from the University of Missouri, is a Research Scientist at Rokid, working on computer vision algorithms for AR.