Web LLM attacks
SUMMARY
Nowadays it is common to interact with AI algorithms via web chats. This is possible thanks to LLM (Large Language Models), learning models that process user inputs, look for plausible responses and elaborate a reply. The data provided by the model in their responses is based on training data and deep learning, and the quality of the response depends on both the aforementioned training data and the specific prompt given to the model.
LLMs are subject to attack as any other exposed digital asset. By manipulating the prompt we can do things such as retrieve sensitive data used by the LLM, gain access to useful actions in the backend API or attack third parties that query the LLM in parallel.
KEYWORDS
AI, LLM, excessive agency, prompt injection.
REFERENCES
https://portswigger.net/web-security/llm-attacks
https://www.cloudflare.com/learning/ai/what-is-large-language-model/
https://coralogix.com/ai-blog/understanding-excessive-agency-in-llms-implications-and-solutions/
LLM-01: Exploiting LLM APIs with excessive agency
Excessive agency refers to situations where an AI system performs actions that exceed its intended scope or permissions.
Open the lab and click on Live chat.
Start enumerating the LLM. Just ask if it has access to the database API and if it can execute SQL statements. It seems to be the case, so just ask it to dump the users
table.
Now just ask it to delete carlos
account and the lab is solved.
LLM-02: Exploiting vulnerabilities in LLM APIs
Open the lab and click on Live chat.
Enumerate the APIs that the LLM has access to. Just ask it.
Let's try to add our attacker email address to the newsletter asking the LLM to interact with the subscribe_to_newsletter
API.
We see in the inbox the subscription confirmation.
Now we can test the LLM for command injection in the username field of the email address.
And verify in the inbox the command has been executed.
Once vulnerability is confirmed, we just need to inject a command to delete the requested file and the lab is solved.
LLM-03: Indirect prompt injection
Indirect prompt injection is an interesting technique where we manipulate the model to bypass protection measures that block direct injection in prompts. We insert the malicious prompt in one of the model sources, then manipulate the model to access this source and indirectly interpret the malicious instruction.
In this example, the LLM blocks prompts asking for account deletion. But if we add a malicious instruction in an article review in an online shop, when the model reads the comment, it executes the instruction.
Open the lab and register a new account, use the provided address as the attacker email. Login with your new account, now you can add article reviews. The point is to add a hidden prompt in the jacket's review, so when the LLM reads the review it interprets it as a user prompt.
First, enumerate the LLM to get a view what kind of API access it has.
Once we know the LLM can delete accounts we have to inject a hidden prompt in the jacket comments.
This is similar to SQL injections, first we have to deduce the format of the request and then prepare a suitable injection. For this we will use our access to the AI backend log.
Enter whatever comment in the article review.
After the comment is inserted, the user carlos
starts querying the live chat asking for information on the jacket ("the product with ID 1"). Check the logs to have a view how the request is made. In the AI backend logs we see the format of the JSON that contains our comment.
We can see the comment in the log. To inject a prompt we need to insert several brackets and quotes so the resulting string fits in the JSON. For the rest of the payload, we use the one provided in the Academy material.
All in all, a working payload is the following.
Insert the payload in the jacket's comments and the lab is solved.
After that we can have a look at how the payload fitted in the JSON.
Last updated