Multimodal Personalized Federated Learning Based on Transformer

  • Abstract: Against the backdrop of the rapid development of the Internet of Things, processing multimodal data from diverse information-collection devices, especially multi-sensory data such as visual signals, auditory signals, and text, is crucial for deploying machine learning in practice. The outstanding performance of the Transformer architecture and its derived large models in natural language processing and computer vision has driven the pursuit of the ability to handle complex multimodal data. However, this also raises the challenges of data privacy and of meeting personalized needs. To address these challenges, this paper proposes a personalized federated learning method based on a multimodal Transformer. The method supports federated learning over heterogeneous data modalities and, while protecting each participant's data privacy, trains multimodal models that better fit that participant's personalized needs. It significantly improves the performance of personalized multimodal models: accuracy increases by 15% over the comparison methods, marking a breakthrough in the application-scenario limitations of multimodal personalized federated learning.
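The abstract describes a setup in which clients jointly train a shared multimodal model while keeping personalization local. One common way to realize this pattern, shown below as a minimal sketch, is FedAvg-style aggregation of a shared backbone (standing in for the multimodal Transformer encoder) while each client keeps a private head for personalization. All names here (`Client`, `fed_round`, the placeholder `local_update`) are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

class Client:
    """One federated participant with a shared backbone and a private head."""
    def __init__(self, backbone_dim, head_dim):
        self.backbone = rng.normal(size=backbone_dim)  # shared, aggregated by server
        self.head = rng.normal(size=head_dim)          # private, never leaves client
        self.n_samples = int(rng.integers(10, 100))    # size of local dataset

    def local_update(self):
        # Placeholder for local SGD on the client's own (possibly
        # modality-specific) data; here we just perturb the weights.
        self.backbone += rng.normal(scale=0.01, size=self.backbone.shape)
        self.head += rng.normal(scale=0.01, size=self.head.shape)

def fed_round(clients):
    """One round of personalized federated averaging."""
    # 1. Each client trains locally on its private data.
    for c in clients:
        c.local_update()
    # 2. Server aggregates only the shared backbone, weighted by sample
    #    counts (FedAvg-style); personalized heads stay on the clients.
    total = sum(c.n_samples for c in clients)
    global_backbone = sum(c.backbone * (c.n_samples / total) for c in clients)
    # 3. Clients adopt the aggregated backbone for the next round.
    for c in clients:
        c.backbone = global_backbone.copy()
    return global_backbone

clients = [Client(backbone_dim=8, head_dim=4) for _ in range(3)]
for _ in range(5):
    fed_round(clients)

# After each round all backbones agree, while heads remain client-specific.
assert all(np.allclose(c.backbone, clients[0].backbone) for c in clients)
assert not np.allclose(clients[0].head, clients[1].head)
```

The key design choice this sketch illustrates is the split point: everything above the split is shared and benefits from all participants' data, while everything below it adapts to one client's distribution and is never transmitted, which is also what keeps it out of the privacy threat surface.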

     
