Google Unveils RETVec - Gmail's New Defense Against Spam and Malicious Emails

1 year ago

The Hacker News

Machine Learning / Email Security

Google has revealed a new multilingual text vectorizer called RETVec (short for Resilient and Efficient Text Vectorizer) to help detect potentially harmful content such as spam and malicious emails in Gmail.

“RETVec is trained to be resilient against character-level manipulations including insertion, deletion, typos, homoglyphs, LEET substitution, and more,” according to the project’s description on GitHub.

“The RETVec model is trained on top of a novel character encoder which can encode all UTF-8 characters and words efficiently.”
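To illustrate what a vocabulary-free character encoding can look like, here is a minimal Python sketch that turns any Unicode character into a fixed-width binary vector derived from its code point. It is an assumption made for illustration and is not presented as RETVec's actual encoder; the function name `encode_char` and the 24-bit width are hypothetical choices.

```python
def encode_char(ch, bits=24):
    """Encode one character as a fixed-width list of bits (least significant first).

    Illustrative only: this is an assumed scheme, not RETVec's encoder.
    """
    code_point = ord(ch)
    if code_point >= (1 << bits):
        raise ValueError("code point does not fit in the chosen width")
    # Extract each bit of the code point, starting from the lowest one.
    return [(code_point >> i) & 1 for i in range(bits)]


def encode_word(word, bits=24):
    """Encode a word as a list of per-character bit vectors."""
    return [encode_char(ch, bits) for ch in word]


latin_e = encode_char("e")          # Latin small letter e (U+0065)
cyrillic_e = encode_char("\u0435")  # Cyrillic small letter ie (U+0435)
print(latin_e != cyrillic_e)        # look-alike characters get different vectors
```

Because every Unicode code point fits in 21 bits, a scheme like this needs no lookup table and never meets an "unknown" character, which is the property the quote above describes.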

While huge platforms like Gmail and YouTube rely on text classification models to spot phishing attacks, inappropriate comments, and scams, threat actors are known to devise counter-strategies to bypass these defense measures.

They have been observed resorting to adversarial text manipulations, which range from the use of homoglyphs to keyword stuffing to invisible characters.
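A short Python sketch shows why such manipulations work against simple keyword matching. The blocklist, the messages, and the `naive_filter` helper are all hypothetical and far cruder than any production classifier.

```python
# Hypothetical blocklist, for illustration only.
BLOCKLIST = {"free", "prize"}


def naive_filter(text):
    """Return True if any blocklisted keyword appears as a whole word."""
    words = text.lower().split()
    return any(word.strip(".,!?") in BLOCKLIST for word in words)


messages = {
    "original": "Claim your free prize",
    # Cyrillic 'е' (U+0435) replaces the Latin 'e'.
    "homoglyph": "Claim your fr\u0435\u0435 priz\u0435",
    # Digits stand in for letters (LEET substitution).
    "leet": "Claim your fr33 pr1ze",
    # A zero-width space (U+200B) is hidden inside each keyword.
    "invisible": "Claim your fr\u200bee pri\u200bze",
}

for label, message in messages.items():
    print(label, naive_filter(message))
```

Only the original message is flagged. The other three look nearly identical to a human reader but no longer match the keywords exactly, so they slip past the filter.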

RETVec, which works on over 100 languages out of the box, aims to help build server-side and on-device text classifiers that are more resilient and efficient.

Vectorization is a technique in natural language processing (NLP) that maps words or phrases from a vocabulary to corresponding numerical representations, which can then be used for further analysis such as sentiment analysis, text classification, and named entity recognition.
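As a rough illustration of the idea, the following Python sketch implements a basic word-count (bag-of-words) vectorizer. It is a deliberately simple stand-in and not how RETVec works; the corpus and the function names `build_vocabulary` and `vectorize` are made up for this example.

```python
def build_vocabulary(corpus):
    """Assign each distinct word an index in order of first appearance."""
    vocab = {}
    for sentence in corpus:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def vectorize(sentence, vocab):
    """Map a sentence to a vector of word counts; unknown words are dropped."""
    vector = [0] * len(vocab)
    for word in sentence.lower().split():
        index = vocab.get(word)
        if index is not None:
            vector[index] += 1
    return vector


corpus = ["win a free prize", "meeting notes attached"]
vocab = build_vocabulary(corpus)

print(vectorize("free prize free", vocab))  # [0, 0, 2, 1, 0, 0, 0]
print(vectorize("fr33 prize", vocab))       # [0, 0, 0, 1, 0, 0, 0]
```

The second call shows the weakness of a fixed word vocabulary: the altered word "fr33" is not in it, so its signal is lost entirely. Character-level manipulations exploit exactly this gap.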

“Due to its novel architecture, RETVec works out-of-the-box on every language and all UTF-8 characters without the need for text preprocessing, making it the ideal candidate for on-device, web, and large-scale text classification deployments,” Google’s Elie Bursztein and Marina Zhang noted.

The tech giant said integrating the vectorizer into Gmail improved the spam detection rate over the baseline by 38% and reduced the false positive rate by 19.4%. It also lowered the model's Tensor Processing Unit (TPU) usage by 83%.

“Models trained with RETVec exhibit faster inference speed due to its compact representation. Having smaller models reduces computational costs and decreases latency, which is critical for large-scale applications and on-device models,” Bursztein and Zhang added.
