Abstract:
Multi-view convolutional neural networks (MVCNN) is more accurate and faster than those methods based on state-of-the-art 3D shape descriptors in 3D object recognition tasks. However, the input of MVCNN are views rendered from cameras at fixed positions, which is not the case of most applications. Furthermore, MVCNN uses max-pooling operation to fuse multi-view features and the information of original features may be lost. To address those two problems, a new recognition method of 3D objects based on multi-view recurrent neural networks (MVRNN) is proposed based on MVCNN with improvements on three aspects. First, a new item which is defined as the measure of discrimination is introduced into the cross-entropy loss function to enhance the discrimination of features from different objects. Second, a recurrent neural networks (RNN) is used to fuse multi-view features from free positions into a compact one, instead of the max-pooling operation in MVCNN. RNN can keep the completeness of information about appearance feature. At last, single view feature from free positon is matched with fused features via a bi-classification network to attain fine-grained recognition of 3D objects. Experiments are conducted on the open dataset ModelNet and the private dataset MV3D separately to validate the performance of MVRNN. The results show that MVRNN can exact multi-view features with higher degree of discrimination, and achieve higher accuracy than MVCNN on both datasets.