Abstract:
Human parsing aims to identify body parts and clothing items in human images at the pixel level. This paper surveys and analyzes deep-learning-based approaches to human parsing from three aspects: the basic technologies involved, the main datasets and evaluation metrics, and the existing methods. First, the basic technologies underlying deep-learning-based human parsing, including convolutional neural networks and semantic segmentation, are reviewed. Second, eight human parsing datasets are introduced in detail in terms of the number of images, the number of categories, and their advantages and disadvantages. In addition, four commonly used evaluation metrics are summarized. Finally, representative deep-learning-based human parsing methods are discussed, grouped into feature enhancement, human body structure, multi-task learning, and generative adversarial networks. The paper also summarizes approaches to instance-level human parsing and presents some ideas worth further study.