Abstract:
In recent years, rapid advances in deep learning have driven progress in virtual digital human technology, particularly in audio-driven digital human video generation. Research in this area shows broad application prospects in scenarios such as video translation, film production, and virtual assistants. This paper surveys current methods and the state of research in audio-driven digital human video generation, focusing on key technologies, datasets, and evaluation strategies. Among the key technologies, generative adversarial networks, diffusion models, and neural radiance fields have all played important roles. The scale and diversity of datasets are crucial for model training, and improved evaluation strategies allow generation quality to be assessed more objectively. Audio-driven digital human video generation still faces numerous challenges as well as opportunities, and continued innovation in the field is expected to bring greater convenience and enjoyment to society.