A frequency-driven black-box adversarial attack method

    Abstract: A deeper understanding of adversarial examples is important for securing machine learning models. The relationship between adversarial perturbations and their frequency components remains poorly understood, so this work studies how adversarial perturbations are represented in the frequency domain and proposes an efficient black-box adversarial attack method. Wavelet packet decomposition is used to decompose adversarial examples into frequency bands at multiple scales, which shows that adversarial perturbations are concentrated mainly in the high-frequency components of the low-frequency bands. Based on this observation, we design a black-box adversarial attack algorithm that incorporates information from specific frequency bands, and we introduce a normalized disturbance visibility (NDV) index to address the limitations of traditional norms when evaluating continuous and discrete perturbations. Experiments on multiple benchmark datasets and models show that the proposed multi-band combination attack achieves an average attack success rate of 99%, outperforms single-band attacks, and delivers superior overall performance across seven evaluation metrics. The experiments also verify that the NDV index overcomes the shortcomings of the traditional L_2 norm in perturbation evaluation, giving a more accurate and perceptually meaningful assessment of adversarial perturbations.
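To make the frequency analysis concrete, the following is a minimal sketch of how a perturbation's energy could be split across wavelet packet sub-bands. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not name the wavelet basis, the decomposition depth, or how bands are selected, so the Haar basis, the two-level depth, and all function names (`haar_step`, `wavelet_packet`, `band_energy_share`) are hypothetical choices made here.

```python
def haar_step(x):
    """One level of an orthonormal 2D Haar transform on a 2D list.

    Returns four sub-bands keyed 'a' (approximation), 'h', 'v', 'd' (details).
    Assumes both dimensions of x are even.
    """
    bands = {"a": [], "h": [], "v": [], "d": []}
    for i in range(0, len(x), 2):
        rows = {k: [] for k in bands}
        for j in range(0, len(x[0]), 2):
            p, q = x[i][j], x[i][j + 1]
            r, s = x[i + 1][j], x[i + 1][j + 1]
            rows["a"].append((p + q + r + s) / 2)  # low-pass in both directions
            rows["h"].append((p - q + r - s) / 2)  # difference across columns
            rows["v"].append((p + q - r - s) / 2)  # difference across rows
            rows["d"].append((p - q - r + s) / 2)  # difference in both directions
        for k in bands:
            bands[k].append(rows[k])
    return bands


def wavelet_packet(x, levels):
    """Full wavelet packet tree: every sub-band is split again at each level
    (a plain wavelet transform would only split the approximation band)."""
    nodes = {"": x}
    for _ in range(levels):
        nodes = {
            path + name: sub
            for path, band in nodes.items()
            for name, sub in haar_step(band).items()
        }
    return nodes


def band_energy_share(perturbation, levels=2):
    """Fraction of the perturbation's squared energy in each leaf sub-band.

    Dimensions must be divisible by 2 ** levels. The depth of 2 is an
    assumption for illustration only.
    """
    nodes = wavelet_packet(perturbation, levels)
    energy = {p: sum(v * v for row in b for v in row) for p, b in nodes.items()}
    total = sum(energy.values()) or 1.0
    return {p: e / total for p, e in energy.items()}
```

In this sketch a path such as `"ad"` denotes the diagonal detail band inside the level-1 approximation band, which is one way to read "high-frequency components of the low-frequency bands". In practice the perturbation would be the pixel-wise difference between an adversarial example and its clean input, and a large share for such paths would be consistent with the observation reported in the abstract.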

     
