Jailbreaking Gemini: How Can Google's Large Language Model Be Hacked?

14 Mar 2024 8:06 am GMT+0000

In a recent report, HiddenLayer revealed a number of vulnerabilities in Google's Gemini large language models. These vulnerabilities pose a real security threat and affect both Gemini Advanced users in Google Workspace and companies that use the Gemini API.

The first vulnerability involves bypassing the safeguards that prevent system prompt leakage, which can allow the models to generate harmful content or be exploited through indirect prompt injection. This is possible because the models are susceptible to a so-called synonym attack: rephrasing a request with synonyms lets an attacker get around the protections and content restrictions.

The second type of vulnerability involves sophisticated jailbreaking techniques that force Gemini models to generate misinformation on topics such as elections, or to produce potentially illegal and dangerous information.

The third vulnerability can cause Gemini to leak confidential information from its system prompt when a series of unusual tokens is passed as input.

The study also describes a method that uses Gemini Advanced and a specially crafted Google document to override the model's instructions and perform malicious actions.

In response, Google said that it regularly conducts red-teaming exercises and trains its models to defend against adversarial actions such as prompt injection, jailbreaking, and more complex attacks. The company also reported that, as a precaution, it is restricting responses to election-related queries.

The disclosure of these vulnerabilities underscores the need for continuous testing of models against prompt attacks, data extraction, manipulation, adversarial examples, data poisoning, and exfiltration.

Experts noted that such vulnerabilities are by no means new and are present in many other AI models. Given this, everyone in the AI industry should exercise vigilance and caution when training and fine-tuning their language models.

Sources: reports, release notes, official announcements.