Headline

New Study Uncovers Text-to-SQL Model Vulnerabilities Allowing Data Theft and DoS Attacks

A group of academics has demonstrated novel attacks that leverage Text-to-SQL models to produce malicious code that could enable adversaries to glean sensitive information and stage denial-of-service (DoS) attacks. “To better interact with users, a wide range of database applications employ AI techniques that can translate human questions into SQL queries (namely Text-to-SQL),” Xutan Peng, a

2 years ago

The Hacker News

Open in Source

#sql #vulnerability #dos #backdoor #The Hacker News

Database Security / PLM Framework

“To better interact with users, a wide range of database applications employ AI techniques that can translate human questions into SQL queries (namely Text-to-SQL),” Xutan Peng, a researcher at the University of Sheffield, told The Hacker News.

“We found that by asking some specially designed questions, crackers can fool Text-to-SQL models to produce malicious code. As such code is automatically executed on the database, the consequence can be pretty severe (e.g., data breaches and DoS attacks).”

The findings, which were validated against two commercial solutions BAIDU-UNIT and AI2sql, mark the first empirical instance where natural language processing (NLP) models have been exploited as an attack vector in the wild.

The black box attacks are analogous to SQL injection faults wherein embedding a rogue payload in the input question gets copied to the constructed SQL query, leading to unexpected results.

The specially crafted payloads, the study discovered, could be weaponized to run malicious SQL queries that could permit an attacker to modify backend databases and carry out DoS attacks against the server.

Furthermore, a second category of attacks explored the possibility of corrupting various pre-trained language models (PLMs) – models that have been trained with a large dataset while remaining agnostic to the use cases they are applied on – to trigger the generation of malicious commands based on certain triggers.

“There are many ways of planting backdoors in PLM-based frameworks by poisoning the training samples, such as making word substitutions, designing special prompts, and altering sentence styles,” the researchers explained.

The backdoor attacks on four different open source models (BART-BASE, BART-LARGE, T5-BASE, and T5-3B) using a corpus poisoned with malicious samples achieved a 100% success rate with little discernible impact on performance, making such issues difficult to detect in the real world.

As mitigations, the researchers suggest incorporating classifiers to check for suspicious strings in inputs, assessing off-the-shelf models to prevent supply chain threats, and adhering to good software engineering practices.

Found this article interesting? Follow us on Twitter  and LinkedIn to read more exclusive content we post.

The Hacker News: Latest News

A New Maturity Model for Browser Security: Closing the Last-Mile Risk

2 hours ago

The Hacker News

Google Patches Critical Zero-Day Flaw in Chrome’s V8 Engine After Active Exploitation

4 hours ago

The Hacker News

U.S. Arrests Key Facilitator in North Korean IT Worker Scheme, Seizes $7.74 Million

5 hours ago

The Hacker News

Microsoft Removes Password Management from Authenticator App Starting August 2025

9 hours ago

The Hacker News

U.S. Agencies Warn of Rising Iranian Cyberattacks on Defense, OT Networks, and Critical Infrastructure

21 hours ago

The Hacker News