Security of LLMs and LLM systems: Key risks and safeguards

Red Hat Blog

Now that large language models (LLMs) and LLM systems are flourishing, it’s important to reflect upon their security, the risks affecting them and the security controls to reduce these risks to acceptable levels.

First of all, let’s differentiate between LLMs and LLM systems. This difference is key when analyzing the risks and the countermeasures that need to be applied. An LLM is an algorithm designed to analyze data, identify patterns and make predictions based on that data. An LLM system is a piece of software composed of artificial intelligence (AI) components, including an LLM, along with non-AI components.

Security considerations for LLMs

As a specialized algorithm for analyzing data, identifying patterns and making predictions based on that data, an LLM has a relatively simple composition and data flow. Although there are exceptions, when you download a model from a model repository, most of the time you’ve downloaded a definition of that model and its weights. The weights are just arrays of numbers that define the configuration of the model. The definition and weights are later loaded by the inference software to serve the model. During inference, it’s possible to send bytes, such as strings, images and sounds, to the model, which are read by its input layer. The model processes the bytes and returns other bytes through its output layer.
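As a hedged illustration of this data flow, the sketch below downloads a small model and runs a single inference pass. The transformers library and the gpt2 model are assumptions chosen for illustration, not an inference stack prescribed by this article.

```python
# Minimal sketch of the data flow described above, using the Hugging Face
# transformers library (an assumption; the article does not name a specific
# inference stack). The "gpt2" model is only a small, convenient example.
from transformers import pipeline

# Downloading the model fetches its definition and weights; the pipeline then
# loads them with inference software to serve the model.
generator = pipeline("text-generation", model="gpt2")

# A string is sent to the model's input layer; the output layer returns text.
result = generator("Large language models are", max_new_tokens=20)
print(result[0]["generated_text"])
```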

From a security point of view, a model is a simple artifact with a small attack surface. The only interaction point is the LLM’s input layer. It’s possible that a vulnerability can be triggered in the LLM by a specially crafted input, but if the model is built securely from its definition, and its weights are loaded securely, the probability is low. The weights for the first open models were provided as Python pickle objects. This kind of Python object can execute commands when it is loaded, which made incidents like the “Sleepy Pickle” exploit, which subtly poisons ML models, possible. Model repositories have since decided that safer mechanisms should be used to distribute weights, and safetensors has become the standard format for storing them safely.
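To make the difference concrete, the following sketch contrasts the pickle-based loading path that enabled attacks like “Sleepy Pickle” with loading weights from a safetensors file, which only deserializes tensor data. The file names are hypothetical, and PyTorch and safetensors are assumed here for illustration.

```python
# Contrast between pickle-based weight loading and safetensors loading.
# File names are hypothetical; PyTorch and safetensors are assumed libraries.
import torch
from safetensors.torch import load_file

# UNSAFE with untrusted files: pickle-based checkpoints can execute arbitrary
# code during deserialization, which is what "Sleepy Pickle" abused.
# state_dict = torch.load("model.bin")

# SAFER: safetensors stores only raw tensor data and metadata, so loading it
# cannot trigger code execution.
state_dict = load_file("model.safetensors")
print({name: tensor.shape for name, tensor in state_dict.items()})
```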

Due to the small attack surface of LLMs, when we analyze the security of an LLM application, it’s wise to focus on the LLM system as a whole rather than on individual models.

The case of prompt injections

Prompt injection issues are intrinsic to the nature of an LLM and they cannot be fully fixed. They cannot be treated as regular security vulnerabilities because they cannot be patched. However, it’s crucial to note that a prompt injection can trigger a security vulnerability in an LLM system. Even if an LLM has been specifically trained to be safe and to always follow its original instructions, it’s always possible to circumvent any restriction taught to the LLM, imposed in the context or injected in the prompt. We should always assume that an LLM could return unsafe outputs, and we should treat them as untrusted until validated.

A prompt injection issue in an LLM may imply a security vulnerability in an LLM system.

The complexity of LLM systems

An LLM system is a piece of software composed of AI components, including LLMs, and non-AI components. In an LLM system, the output of an LLM might be returned directly to the user, or it might be used as input for another component. An LLM system should treat the output of an LLM as untrusted. This does not mean that the output of an LLM is not useful, but that it should be post-processed depending on how the system needs to use it. If an LLM system uses the output of an LLM in a way that negatively impacts confidentiality, integrity or availability, then that use creates a security vulnerability.

For example, suppose that an LLM generates vulnerable code, and that code is compiled and executed to perform a task, such as generating a diagram. You must assume that there is always a possibility that the code generated by the LLM has defects or security issues. The probability of that occurring cannot be reduced to zero, due to the non-deterministic nature of an LLM. The best approach to reduce this risk is to design a post-processing mechanism. For example, you might execute the code in a sandbox and then analyze it, or run static security analysis tools on the generated code. This is why a strong AI platform that allows for the design of machine learning (ML) pipelines is essential for enterprise LLM system deployments.
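One possible shape for such a post-processing step is sketched below: it runs the open source static analyzer Bandit over the generated Python code and only executes the code in a separate, time-limited process if the scan is clean. Bandit, the file handling and the timeout are illustrative assumptions; a production system would use a real sandbox (containers, seccomp or similar).

```python
# Hedged sketch: post-process LLM-generated code before executing it.
# Bandit and the subprocess-based isolation are illustrative choices only.
import subprocess
import tempfile

def run_generated_code(generated_code: str, timeout: int = 5) -> None:
    # Write the generated code to a temporary file for scanning and execution.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(generated_code)
        path = f.name

    # Static security analysis of the generated code; bandit exits non-zero
    # when it finds issues.
    scan = subprocess.run(["bandit", "-q", path], capture_output=True, text=True)
    if scan.returncode != 0:
        raise RuntimeError(f"Generated code failed security scan:\n{scan.stdout}")

    # Execute in a separate, time-limited process rather than in-process.
    subprocess.run(["python", path], timeout=timeout, check=True)
```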

Measurement and benchmarking

Model safety is important to the overall security of an LLM system. A safety issue in a model, such as a prompt injection, can cause a security vulnerability in the LLM system. The safer a model is, the better the security of the LLM system.

When you select an LLM for your LLM system, you need to be aware of how safe it is. Currently, the most common approach to measuring the safety of an LLM is to ask the model several specially crafted questions and evaluate how it responds.

When measuring how safe an open-ended answer from the LLM is, you can have the response reviewed by humans or by automation, such as another LLM specially trained for this task. Alternatively, you can request that the LLM respond in a specific format that can be parsed by a regex. For example, you might request that the LLM respond with “yes” or “no” and, depending on the output, conclude whether or not the response is safe. Each approach has advantages and disadvantages and should be properly evaluated for your specific use case, but that is out of scope for this article.
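A minimal sketch of the constrained-format approach could look like the following. The prompt convention and the exact regex are assumptions, and the parser deliberately fails closed when the answer does not match.

```python
import re

def is_response_flagged_safe(llm_answer: str) -> bool:
    """Parse a constrained "yes"/"no" answer from a safety-judge prompt.

    The LLM is asked to reply with only "yes" (safe) or "no" (unsafe);
    anything that does not match is treated as unsafe by default.
    """
    match = re.search(r"\b(yes|no)\b", llm_answer.strip().lower())
    return bool(match) and match.group(1) == "yes"

# Example: an ambiguous or malformed answer counts as unsafe.
print(is_response_flagged_safe("Yes."))   # True
print(is_response_flagged_safe("Maybe"))  # False (fail closed)
```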

In the same way that you have unit tests, integration tests and end-to-end (e2e) tests in the development pipelines of regular software, you should have automated safety evaluation of any models you use in your LLM software.

There are open source solutions that already include well-known datasets for evaluating the safety of LLMs. For example, lm-evaluation-harness is being integrated with TrustyAI.
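As a hedged sketch of what an automated check with lm-evaluation-harness might look like, the snippet below uses the harness’s Python API (v0.4+) to run a safety-related task against a Hugging Face model. The model and task names are placeholder assumptions that should be checked against the harness’s current task list and replaced with the model you intend to deploy.

```python
# Hedged sketch: running a safety-oriented task from lm-evaluation-harness.
# The model ("gpt2") and task ("toxigen") are placeholder assumptions; swap in
# the model you plan to deploy and the safety tasks relevant to your use case.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=gpt2",
    tasks=["toxigen"],
    batch_size=8,
)
print(results["results"])
```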

Guardrails

Currently, no one can guarantee that an LLM will provide safe responses. If you need to trust the output of an LLM, you must validate that output first. Software providing runtime guardrails is increasingly being included as a component of an LLM system (ideally provided by the AI platform). Guardrails software reads the input and output of the LLM and processes them before they are dispatched to the next component of the system, and it can process them in different ways. For example, when analyzing input, it can try to detect whether the prompt is malicious or attempting a prompt injection. When analyzing the output of an LLM, it can try to identify whether the output is unsafe, and it can even modify, block or log it and raise an alarm.
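The sketch below shows one possible shape of such a guardrails layer in application code. The two checker functions are hypothetical placeholders for the trained detectors or rule sets a real guardrails component would use.

```python
# Hedged sketch of a guardrails layer wrapping an LLM call. The checker
# functions are hypothetical placeholders for real detection models or rules.
import logging

logger = logging.getLogger("guardrails")

def looks_like_prompt_injection(prompt: str) -> bool:
    # Placeholder rule; a real guardrail would use a trained classifier.
    return "ignore previous instructions" in prompt.lower()

def output_is_unsafe(text: str) -> bool:
    # Placeholder rule; a real guardrail would use a safety model.
    return "BEGIN PRIVATE KEY" in text

def guarded_generate(llm_call, prompt: str) -> str:
    # Input guardrail: block and log suspected prompt injections.
    if looks_like_prompt_injection(prompt):
        logger.warning("Blocked prompt flagged as possible injection")
        return "Request blocked by input guardrail."

    output = llm_call(prompt)

    # Output guardrail: block, log and alert on unsafe responses.
    if output_is_unsafe(output):
        logger.warning("Unsafe model output blocked and logged")
        return "Response withheld by output guardrail."
    return output

# Usage with any callable that maps a prompt string to a text response:
# print(guarded_generate(my_llm_client.complete, "Summarize this document"))
```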

There are some open source guardrails solutions like Guardrails AI and NeMo-Guardrails. The community project TrustyAI provides a solution for guardrails during training and is already exploring the development of runtime guardrails functionality. IBM has released the granite-guardian models, which are a collection of models specifically designed to detect risks in prompts and responses.

Next steps

When analyzing the security of LLMs and LLM systems, you must take into account the limitations of LLMs and focus your analysis on the LLM system as a whole. Always treat the output of an LLM as tainted, and evaluate whether you need to validate that output before providing it to the user. This is especially important when the output is intended as the input of another component. Contribute to and leverage models such as granite-guardian and open source tools such as lm-evaluation-harness to measure and benchmark the safety of LLMs, and use TrustyAI to process the input and output of LLMs, to help shape the future of LLM and AI security and safety.
