Abstract:
Because the robustness of natural language processing models against adversarial attacks must be verified in real-world application scenarios, black-box adversarial attacks under the hard-label setting have attracted increasing attention. However, owing to the discrete nature of text, the limited feedback available from the victim model, and the query limits imposed by practical applications, existing hard-label attack methods typically require an excessive number of queries to the victim model and generate adversarial texts with low semantic consistency, which makes them inadequate for real-world use. To address this, an efficient hard-label adversarial attack method is proposed. The method introduces an attention mechanism in the adversarial text initialization stage and proposes two strategies for the adversarial text semantic optimization stage: a semantic clustering-based synonym search and a semantic gradient-based dynamic expansion synonym search. Experimental results demonstrate that the proposed method efficiently generates high-quality adversarial text with high semantic consistency and natural fluency using a small number of queries.