Abstract:
Target object navigation is the task of reaching a specified target object in an unknown environment based on visual observations. A key challenge is inferring the direction of the target object from those observations. To address it, a target object navigation method based on multi-feature fusion is proposed. The method uses a feature fusion module to fuse visual features, which capture global and local information about the navigation environment, with text features that encode the semantics of the target object, producing direction features that indicate the navigation direction and environment features that describe the navigation environment. By associating the visual representation with the navigation direction, the method guides the generation of navigation actions, constrains the agent to move toward the target object, and improves both the success rate and the efficiency of navigation. Experiments on the AI2-THOR dataset show that, compared with the baseline model, the success rate (SR) increases by 11.7 percentage points and success weighted by path length (SPL) increases by 0.093; compared with current state-of-the-art methods, SR increases by 2.1 percentage points and SPL increases by 0.008. These results demonstrate the accuracy and efficiency of the proposed method.
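The fusion described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation: it assumes the local visual features are region vectors, attends over them with the target's text embedding to form a direction feature, and concatenates that with the global visual feature. All function and variable names here are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_features(global_feat, local_feats, text_feat):
    """Hypothetical multi-feature fusion sketch:
    weight local visual features by their similarity to the
    target's text embedding (direction feature), then append
    the global visual feature (environment feature)."""
    # attention scores: similarity of each local region to the target text
    scores = softmax(local_feats @ text_feat)
    # direction feature: text-weighted sum of local visual features
    direction = scores @ local_feats
    # concatenate direction and environment information
    return np.concatenate([direction, global_feat])

rng = np.random.default_rng(0)
fused = fuse_features(rng.normal(size=64),        # global visual feature
                      rng.normal(size=(49, 64)),  # 7x7 local region features
                      rng.normal(size=64))        # target text embedding
print(fused.shape)  # (128,)
```

In a full agent, the fused vector would feed a policy network that outputs the next navigation action; the attention weights are what tie the target's semantics to a direction in the observation.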