The AI agent revolution is happening. Enterprises, startups, and individual developers are deploying autonomous agents that read emails, browse the web, process documents, and take real-world actions.
But security hasn't kept up.
Not a theoretical risk buried at the bottom of a checklist. The global authority on application security ranks prompt injection as the single most critical vulnerability in LLM applications. And Palo Alto Networks Unit 42 found that automated prompt fuzzing achieved guardrail evasion rates as high as 90% against certain models. Simple text that tells the agent to do something it shouldn't.
The gap between deployment speed and security readiness is widening every month. Obsidian Security found that agents are granted 10x more access than they actually require, with 16x more data movement than human users. And according to Cisco's State of AI Security Report, 83% of organizations plan to deploy agentic AI, but only 29% feel ready to secure it.
Prompt injection isn't a future risk. It's an active, documented threat with real victims and real damage. Here are incidents from the past 18 months:
These aren't edge cases. These are production systems at major companies. The pattern is clear: if an AI agent reads untrusted content without filtering, it can be compromised.
The AI industry has invested billions in model safety. System prompts, RLHF, content filters, rate limiting, authentication — all critical, all necessary. But most of these layers protect the model's output, not the agent's input.
| Security Layer | What It Protects | Scans Agent Input? |
|---|---|---|
| Model guardrails (RLHF) | Harmful output generation | No |
| System prompts | Role boundaries | No |
| Content filters | Toxic/harmful output | No |
| Rate limiting | Abuse volume | No |
| Authentication (OAuth) | Unauthorized access | No |
| Firewalls / WAF | Network-level attacks | No |
| Input defense tools | Malicious content in what the agent reads | Yes |
Input defense tools exist — Lakera Guard, LLM Guard, NVIDIA NeMo Guardrails, Azure Prompt Shields, and others scan prompts before they reach the model. This is good. The field is growing. But most of these tools share common tradeoffs:
| Feature | Cloud-Based Tools | SUNGLASSES |
|---|---|---|
| Runs locally | Most require API calls | 100% local — zero data leaves your machine |
| Needs an LLM for detection | Many use LLM-based classification | Pattern-based — no LLM needed |
| Cost | Free tiers → paid at scale | $0 forever — AGPL-3.0 |
| Scans media (images, audio, video) | Text-focused | 6 media extractors |
| Works offline / air-gapped | Cloud-dependent | Full offline operation |
| Multilingual patterns | Some (Lakera: 100+ languages) | 13 languages (growing) |
| ML-based detection | Stronger on novel attacks | Pattern-matching only — known attacks |
We're not claiming to be the only tool. We're claiming there's a specific gap: a free, local-only, zero-dependency scanner that works offline, scans media, and never touches your data. For developers who can't send agent input to a third-party API — because of compliance, privacy, cost, or principle — that gap is real.
Both system instructions and user input arrive as the same format: natural-language text. The model cannot inherently distinguish between "instructions from the developer" and "instructions injected by an attacker." This is the fundamental vulnerability that OWASP identifies as LLM01 — the #1 risk in their Top 10 for LLM Applications.
Your agent receives a normal business email:
Sarah is real. The email content is legitimate. But the HTML contains an invisible instruction — injected by malware on Sarah's compromised machine, or planted in a web page the agent scraped, or embedded in a PDF attachment.
The agent reads everything. Including the parts humans can't see. Without input filtering, those invisible instructions become the agent's new orders.
SUNGLASSES is an open-source input defense layer. It scans everything your agent reads — before the agent sees it.
Content comes in → SUNGLASSES scans for known attack patterns → malicious instructions are stripped → clean content passes to your agent. Like UV-filtering sunglasses: you don't notice they're working, but they're blocking what would hurt you.
What it scans: Text, emails, files, web content, API responses, images (OCR), audio (transcription), video (subtitles), PDFs, QR codes — 6 media types total.
What it catches: Prompt injection in 13 languages, credential exfiltration, command injection, memory poisoning, social engineering, Unicode evasion, Base64-encoded attacks, homoglyph substitution, RTL obfuscation — 53 patterns across 12 categories.
What it costs: $0. Forever. AGPL-3.0. Every line of code is open and auditable.
What it takes: One line: pip install sunglasses
Where your data goes: Nowhere. Runs 100% locally. Zero cloud calls, zero telemetry, zero data transmission.
No single defense layer can prevent all attacks. This is not our opinion — it's the consensus of every serious security researcher working on this problem:
The consensus across security research is clear: only defense-in-depth can provide operational resilience when breaches inevitably occur. No single layer is sufficient.
— Adapted from Comprehensive Review of Prompt Injection Attack Vectors and Defense Mechanisms, MDPI Information, 2026
SUNGLASSES is a seatbelt, not a force field. Here's what we can't do today and why the community matters:
We catch known patterns. When someone invents a completely new attack technique, we need a human to discover it, document it, and submit the pattern. This is how every antivirus, firewall, and IDS in history has worked — the database grows with the community.
English has our deepest coverage. Attacks in Korean, Arabic, Hindi, and other languages are covered at the core level but lack the depth of English patterns. We can't write attack patterns in languages we don't speak natively.
These aren't failures. They're the natural boundary of what any small team can build alone. The solution is the same one that made Linux, Wikipedia, and every major open source project successful: community contribution.
We didn't set out to compete with Lakera Guard, NeMo Guardrails, or Azure Prompt Shields. We discovered them halfway through building. And we realized: we'd built the layer that sits underneath all of them.
SUNGLASSES is Layer 1 — local, instant, free. It catches known attacks in ~0.01ms, scans 6 media types, and never sends a byte of your data anywhere. Cloud tools like Lakera are Layer 2 — ML-based, global threat intelligence, catches novel zero-day attacks that pattern matching can't.
Stack them together and every attack we catch locally is one fewer API call to their cloud. We reduce their customers' costs. They cover our blind spots. The adapter system makes this real — not theoretical. LangChain, CrewAI, MCP, custom pipelines.
For developers who can't use cloud tools — compliance, privacy, air-gapped environments, budget — SUNGLASSES is still a complete Layer 1 on its own. One pip install takes you from "nothing" to "defended against known attacks." Add Layer 2 when you're ready.
If you're a security researcher — break it. Find a bypass, open an issue with reproducible input, and we'll patch it in public. Your name goes in the changelog.
If you're a developer running agents — try it. Tell us what's noisy, what's missing, what doesn't work in your pipeline.
If you speak a language we don't cover well — contribute attack patterns. Prompt injection doesn't only happen in English.
If you think this doesn't matter — read the incidents above again. Then look at what your agent has access to.
"This is what AI is all about — a tool that everybody can build with without any experience. Find the pain and deliver a solution."