
Private Internet Search Is Still Finding Its Way

The goal of keeping data private while still being able to search it may soon be within reach, with different companies charting their own paths.

DARKReading


A truly private Internet search, in which databases can be queried while search terms and results stay private, remains a work in progress as companies try to balance speed and security.

Companies developing private search technologies focus on making static data more usable through encryption or secure enclaves, where no data is revealed or leaked in the process of querying, retrieval, and transit. Such a technology would function like traditional search, except that the search engine could not read the query or use the search results to serve up ads.

“Private Internet search is a sort of holy grail, in a sense,” says Vinod Vaikuntanathan, a professor of computer science at MIT and the chief cryptographer at Duality Technologies, which is building its own secure search technology.

MongoDB & Queryable Encryption

Customers want to control their data, and are looking at more secure ways to incorporate tools such as search, which is especially important in regulatory environments, says Kenn White, security principal at MongoDB.

“A lot of European customers are concerned about GDPR. We have got a lot of banks and investment banks that care about compliance, ISO, and PCI, but they really care about risk … they are really focused on breaches,” White says.

The latest version of MongoDB, version 7.0, which was released last year, introduces a secure search technology called queryable encryption, which White says “is enhanced so you can do an exact match.”

The previous version, MongoDB 6.0, had a technology called field encryption, in which critical information such as credit card or Social Security numbers was encrypted. An encrypted search query was sent to the encrypted database, and a secure response was sent back. No logs were maintained and no plaintext data was exposed, and hackers would not be able to read the encrypted data.
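The general idea of an exact-match query against encrypted fields can be illustrated with a minimal sketch. It assumes a "blind index" design in which the client derives a keyed hash of each value and the server compares only those tokens; this is a generic textbook technique, not a description of MongoDB's actual scheme, and the names `blind_index` and `ToyEncryptedStore` are hypothetical. The stored blobs stand in for ciphertext that a real client would produce with an authenticated cipher.

```python
import hashlib
import hmac


def blind_index(key: bytes, value: str) -> str:
    """Client side: keyed, deterministic token. Equal inputs give equal tokens."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


class ToyEncryptedStore:
    """Server side: sees only tokens and opaque blobs, never the key or plaintext."""

    def __init__(self):
        self._rows = []  # list of (token, ciphertext_blob) pairs

    def insert(self, token: str, blob: bytes) -> None:
        self._rows.append((token, blob))

    def find(self, token: str) -> list:
        # Exact match on tokens only; the server cannot invert them without the key.
        return [blob for t, blob in self._rows if hmac.compare_digest(t, token)]


# Usage: the key stays with the client (assumption: 32 random bytes in practice).
client_key = b"\x01" * 32
store = ToyEncryptedStore()
store.insert(blind_index(client_key, "4111-1111-1111-1111"), b"<ciphertext A>")
store.insert(blind_index(client_key, "5500-0000-0000-0004"), b"<ciphertext B>")

matches = store.find(blind_index(client_key, "4111-1111-1111-1111"))
```

A deterministic token like this reveals when two records share the same value, which is one reason production systems use more elaborate constructions.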

The newer MongoDB 7.0 has made the secure search capabilities more flexible, which is important for searches for more targeted information, such as anonymized financial data or electronic health records.

“We’re now enhancing that so that you can do things like encrypted range searches,” White says. “You will be able to do prefix and suffix or any text field that contains a certain word but again, where the database is still completely encrypted. It has no idea what you are asking for.”

Fortanix & Generative AI

In another approach, Fortanix is introducing secure search offerings for searches via generative AI. Fortanix is protecting the AI query prompts, the context, and the augmented retrieval process where companies may use private and public data built into a large language model, says Richard Searle, vice president of confidential computing at Fortanix.

Private AI search is different from conventional search; it retrieves data from constantly learning systems known as vector databases, which are built on relationships between data. Encrypting and securing that data involves more considerations than traditional search, which extracts data from static databases.

Fortanix’s technology is based on confidential computing, which uses a hardware-based secure enclave to which data is transported for processing. The approach follows a zero-trust architecture rooted in the hardware, which grants access to the information only to validated applications.
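The "validated applications only" rule can be sketched as a key broker that releases a data key only when an application's measurement is on an allowlist. This is a simplified illustration under stated assumptions: the measurement here is a plain SHA-256 digest of the application code, whereas real enclaves rely on hardware-signed attestation reports, and `measure` and `ToyKeyBroker` are hypothetical names rather than any Fortanix API.

```python
import hashlib
import hmac


def measure(app_code: bytes) -> str:
    """Stand-in for an enclave measurement: a digest of the code to be run."""
    return hashlib.sha256(app_code).hexdigest()


class ToyKeyBroker:
    """Holds the customer's data key and releases it only to approved code."""

    def __init__(self, data_key: bytes, trusted_measurements):
        self._data_key = data_key
        self._trusted = set(trusted_measurements)

    def release_key(self, reported_measurement: str) -> bytes:
        # Constant-time comparison against each approved measurement.
        for trusted in self._trusted:
            if hmac.compare_digest(trusted, reported_measurement):
                return self._data_key
        raise PermissionError("application is not on the allowlist")


# Usage: only the approved build receives the key.
approved_app = b"def run(query): ..."
broker = ToyKeyBroker(b"customer-data-key", [measure(approved_app)])
key = broker.release_key(measure(approved_app))
```

Changing even one byte of the application changes its measurement, so a modified build is refused the key.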

For example, Fortanix is working with providers to validate AI models within a secure enclave. The partners will determine whether that model is safe to deploy before executing or exchanging data with it.

“That’s particularly relevant where you are taking an open-source model, maybe from a GitHub repository, and there’s the potential that it has embedded malware,” Searle says.

Fortanix also has plans for a product featuring confidential data collaborations, in which customers can anonymize data to be deployed in secure enclaves. Third parties can use applications within the secure enclave without accessing underlying information. The data is decrypted in the secure enclave, processed, encrypted, and transported out, which makes exfiltration difficult. The customers control cryptographic keys.

“That can be used by an application that is consuming that data either to train a model, or just a standard SQL search, or maybe some analytics,” Searle says. “We provide the orchestration for that workload, using an intuitive templated workflow.”

Duality & Lattice-Based Encryption

Duality is building its own security layer on top of a lattice-based encryption scheme. As Vaikuntanathan explains, the technology involves putting encrypted data in a box, which is then sent to the database owner. The database owner breaks it down into smaller boxes of 1s (which indicate a match) and 0s (which indicate no match), then uses complex mathematics to repackage the response into an encrypted box, which can then be decrypted by the user.

“If you think about the database as being a bunch of numbers, what I’m doing is actually selecting the right row in the database. Of course, I do not know what I am doing in this whole process — I only had the encrypted query. And when I finish this process, I have a box which contains the result, encrypt it, send it back to you,” says Vaikuntanathan.
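The row-selection idea can be made concrete with a toy sketch. It assumes a simplified, symmetric-key learning-with-errors (LWE) scheme, one well-known family of lattice-based encryption; the parameters are far too small to be secure, the function names are hypothetical, and nothing here is taken from Duality's implementation. The client encrypts a vector of 0s and a single 1, and the server multiplies each encrypted bit by its row value and adds the results without ever learning which bit was set.

```python
import random

Q = 2**32        # ciphertext modulus (toy size)
T = 256          # plaintext modulus: each row value is one byte
DELTA = Q // T   # scaling factor that lifts the message above the noise
N = 16           # secret dimension (far too small for real security)


def keygen(rng):
    return [rng.randrange(Q) for _ in range(N)]


def encrypt(secret, message, rng):
    a = [rng.randrange(Q) for _ in range(N)]
    noise = rng.randint(-4, 4)  # small error term that hides the message
    b = (sum(x * s for x, s in zip(a, secret)) + noise + DELTA * message) % Q
    return a, b


def decrypt(secret, ciphertext):
    a, b = ciphertext
    noisy = (b - sum(x * s for x, s in zip(a, secret))) % Q
    return ((noisy + DELTA // 2) // DELTA) % T  # round away the noise


def server_select(rows, encrypted_bits):
    """Server side: computes sum(row_i * Enc(bit_i)) on ciphertexts only."""
    acc_a, acc_b = [0] * N, 0
    for value, (a, b) in zip(rows, encrypted_bits):
        acc_a = [(x + value * y) % Q for x, y in zip(acc_a, a)]
        acc_b = (acc_b + value * b) % Q
    return acc_a, acc_b


def private_lookup(rows, index, seed=0):
    rng = random.Random(seed)  # seeded for repeatability; not a secure RNG
    secret = keygen(rng)
    query = [encrypt(secret, int(i == index), rng) for i in range(len(rows))]
    reply = server_select(rows, query)  # server never sees `index`
    return decrypt(secret, reply)


rows = [17, 42, 99, 250]
result = private_lookup(rows, 2)
```

Here `result` is 99, the value in the selected row. The noise grows with each multiplication and addition, so real schemes choose parameters that keep it below half of the scaling factor for the intended database size.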

Duality’s box is transported via TLS, but the lattice approach suits search better because it allows for computation on encrypted data. The technology has a performance advantage over the widely used AES, which requires data to be decrypted before running search queries.

Many Paths, One Destination

Private search is not just about encryption or data privacy algorithms, though; it is more about how the data is processed and where it is exposed during the computation for search queries, says Alex Matrosov, CEO of Binarly.

The challenge will be to prove that the search is truly private. That proof can be difficult given the complexity of the modern computing stack, which includes CPUs, GPUs, and memory, Matrosov says.

“The question of the private Internet search is complicated because even if you try to guarantee that in theory and prove on the paper, the real implementations will be where all the failures will happen,” Matrosov says.

About the Author(s)

Agam Shah has covered enterprise IT for more than a decade. Outside of machine learning, hardware, and chips, he’s also interested in martial arts and Russia.
