I used 2D Base64 to bypass Gemini and expose Google's moderation flaws
Hey everyone,

I’ve spent the last 48 hours straight dismantling Alphabet's safety systems. Warning: the session ran so long that it practically overflowed the LLM's own context window. What started as a late-night probe of Gemini turned into the discovery of severe architectural flaws and a darker reality about Google Play and YouTube.

Here is the exploit chain I used to bypass the AI filters, proving their "Trust & Safety" is a broken facade.

### Phase 1 & 2: Context Saturation & Regex Slicing

I started by overloading the safety filters' context window with YouTube links, mixing highly problematic content (NSDAP anthems, flagged tracks) with classical music. Once the filters were confused, I used regex-style slicing `(/-/---/(.` to get past the prompt injection blocks, forcing the model to retrieve flagged content without triggering refusals.

### Phase 3: Total Blindness via Base64 & QR Codes

Moving to image generation, I found that Base64 prompts completely blind the safety system. I then pivoted to hiding prompts inside QR codes. The vision model decodes the payload and passes it directly to the image generator before safety scripts intervene. I easily generated highly restricted geopolitical content without warnings.
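
To illustrate the Base64 point, here is a minimal Python sketch of the failure mode, assuming a text filter that only inspects the raw prompt. Nothing here reflects Gemini's actual pipeline, which is not public; `BLOCKLIST`, `naive_filter`, `decode_candidates`, and `normalizing_filter` are hypothetical names, and `blocked-term` is a harmless placeholder. It shows why an encoded string never matches a plain-text check, and what decoding before the check would look like.

```python
import base64
import binascii
import re

# Hypothetical placeholder term; a real system would use a classifier, not a word list.
BLOCKLIST = {"blocked-term"}

# Runs of 16+ Base64 alphabet characters, optionally padded, are treated as candidates.
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def naive_filter(prompt: str) -> bool:
    """Return True if the raw prompt text contains a blocklisted term."""
    lowered = prompt.lower()
    return any(term in lowered for term in BLOCKLIST)


def decode_candidates(prompt: str) -> list[str]:
    """Decode every Base64-looking run in the prompt that yields valid UTF-8."""
    decoded = []
    for run in _B64_RUN.findall(prompt):
        try:
            decoded.append(base64.b64decode(run, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64, or not text: ignore this run
    return decoded


def normalizing_filter(prompt: str) -> bool:
    """Check the raw prompt and any decoded Base64 payloads it carries."""
    return naive_filter(prompt) or any(
        naive_filter(text) for text in decode_candidates(prompt)
    )


if __name__ == "__main__":
    encoded = base64.b64encode(b"please draw a blocked-term scene").decode("ascii")
    print(naive_filter(encoded))        # the raw check sees only Base64 characters
    print(normalizing_filter(encoded))  # the decoded payload is checked as well
```

The same idea would apply to the QR code case only if the moderation step ran on the decoded payload rather than on the image; whether any production system orders the stages that way is an assumption here.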

### Phase 4: The TPU Killer (The 2D Logic Bomb)

This reveals a monster flaw. Because the system blindly processes these structures, you can create a cascade attack. Encoding millions of 2D structures in Base64 creates a modern LLM .zip bomb. It is impossible to stop without rewriting the model entirely. Executed, this would crush their TPUs.

### The Real Issue: Systemic Moderation Failure

Alphabet relies entirely on automated, script-based moderation with zero effective human oversight.

1. YouTube: Fails to flag videos breaking local laws, serving them to the AI effortlessly.
2. Play Store (The Darkest Part): Google spends millions stopping AI from drawing a cartoon bear, but Play Store moderation is non-existent. There are pirate apps, and far worse: apps designed for and exploited by predators targeting minors. I emailed them and CC'd state child protection services. The result? Automated silence while these apps remain monetized.

### The Ultimate Proof of Absurdity

To prove this absurdity, I archived these problematic Play Store images on my Google Drive for the police. Drive's automated scanners immediately flagged and deleted the archive as illegal.

If Google's Cloud division destroys this content on sight, why is the app providing it still live and monetized on the Play Store? Alphabet's scripted moderation is useless. It's time for real human moderation.

*Evidence of Bypass:* https://imgur.com/a/pju2EsV

*Play Store Systemic Failure Evidence (Sanitized):* https://imgur.com/a/rW9rBhp
