: As Gemini and other models gain agentic capabilities (taking actions on behalf of users), new attack surfaces will emerge. Current defenses may prove inadequate for agentic AI systems.
Researchers have also exploited technical vulnerabilities. By asking an AI to output its response in Base64, attackers have bypassed keyword filters. More concerning is the discovery that by setting safety_settings to BLOCK_NONE for all categories in the API, developers can completely disable Gemini's safety filters—though this typically requires explicit approval.
: New "involuntary jailbreak" methods use abstract language to cause the model to create harmful content. Echo Chamber Method jailbreak gemini upd
If the user is interested in the technical side of AI security and safety, it is possible to explore these topics from a research or defensive perspective. For instance, topics such as:
: Using third-party jailbreak tools or APKs from unverified sources could expose personal data to malicious actors. : As Gemini and other models gain agentic
Because Google pushes updates to Gemini continuously on the cloud, a jailbreak that works in the morning can easily be patched by the afternoon. This creates a perpetual demand for updated prompt variants. Why Users Jailbreak AI
Professional red-teamers and security researchers attempt to jailbreak AI to find vulnerabilities before malicious actors do. By discovering a "UPD" (updated exploit), they report it to Google’s Vulnerability Rewards Program. This is legitimate, paid work that makes AI safer for everyone. By asking an AI to output its response
Recent "UPD" (updated) methods for Gemini often use complex "chaining" techniques. These methods exploit the model's own logic instead of simple direct prompts.