We investigate autonomous navigation and target tracking in unknown or uncertain environments, which have become core capabilities in numerous robotics applications. In the absence of an external source of information (e.g. GPS), the robot has to infer its own state and the target's trajectory from its on-board sensor observations. For the vision-based case, the corresponding maximum a posteriori (MAP) estimate is typically obtained via a process known as bundle adjustment (BA) in computer vision, or simultaneous localization and mapping (SLAM) in robotics. Both involve optimization over the camera ego-motion (e.g. poses) and all observed 3D features, including the dynamic object, even when the environment itself is of little or no interest. Furthermore, the optimization is performed incrementally as new surrounding features are observed, and thus becomes computationally expensive as more data is added to the problem.
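To make this concrete, the standard BA/SLAM MAP formulation (stated here with illustrative notation, not taken verbatim from this work) jointly estimates the camera poses and all observed 3D landmarks by minimizing the reprojection error under Gaussian measurement noise:
\[
X^{\star}, L^{\star} \;=\; \arg\max_{X,L}\; p\!\left(X, L \mid Z\right)
\;=\; \arg\min_{X,L} \sum_{i,j} \left\lVert z_{ij} - \pi\!\left(x_i, l_j\right) \right\rVert^{2}_{\Sigma_{ij}},
\]
where $X=\{x_i\}$ are the camera poses, $L=\{l_j\}$ the observed 3D features (including the dynamic target), $z_{ij}$ the image observation of feature $l_j$ from pose $x_i$, and $\pi(\cdot)$ the camera projection function.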
In this work, we propose an efficient method for estimating a monocular camera's ego-motion along with a dynamic target's trajectory and velocity using incremental light bundle adjustment (iLBA). The iLBA method recovers ego-motion using two-view and three-view constraints, eliminating the need for expensive 3D point reconstruction. Given data association and assuming no localization signal, we add to the iLBA optimization framework a single 3D point representing the target each time it is in view. Given a target motion model (e.g. constant velocity), we can then calculate the target's trajectory and velocity along with the camera's ego-motion using graphical models and incremental inference techniques. The target is therefore the only 3D point reconstructed in the process. We study accuracy and computational cost by comparing our method to standard BA, using synthetic and real-imagery datasets collected at the Autonomous Navigation and Perception Lab at the Technion.
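As a sketch of the target motion-model factor referred to above (the notation is illustrative and may differ from the exact formulation adopted in this work), a constant-velocity model relates consecutive target states, each comprising a position $p_k$ and velocity $v_k$:
\[
p_{k+1} \;=\; p_k + v_k\,\Delta t_k + w^{p}_{k}, \qquad
v_{k+1} \;=\; v_k + w^{v}_{k}, \qquad
w_k \sim \mathcal{N}\!\left(0, Q_k\right),
\]
where $\Delta t_k$ is the time between consecutive target observations and $w_k$ is process noise. The resulting factor graph then contains only the camera poses, the two- and three-view constraints, and the target states, and can be updated incrementally as new images arrive.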