General Analysis’ cover photo
General Analysis

General Analysis

Software Development

San Francisco, California 2,016 followers

Automated AI Safety & Red Teaming Tools

About us

General Analysis provides a comprehensive suite of AI safety tools, including run-time guardrails, red-teaming frameworks, interpretability techniques, observability, and more. As AI systems become increasingly capable, their deployment in high-stakes environments poses significant risks—financial, ethical, and otherwise—where errors can lead to substantial consequences. To address these challenges, we offer access to novel tools and methodologies designed to systematically find model failure-modes and enhance model robustness.

Website
https://generalanalysis.com
Industry
Software Development
Company size
2-10 employees
Headquarters
San Francisco, California
Type
Privately Held

Locations

Employees at General Analysis

Updates

  • General Analysis reposted this

    Be careful when you click on an Anthropic ad. Your entire chat history might get leaked. Last week Oasis Security disclosed "Claudy Day": three vulnerabilities chained together in claude.ai. Full exfil of a user's chat history without any MCPs or integrations. Here's the chain: Attacker buys a Google Ad. Google validates hostname: claude.com checks out. But the link exploits an open redirect to bounce you into a pre-filled prompt with invisible HTML carrying exfiltration instructions. You see "Summarize" in the text box. Claude sees orders to dump your conversations into a file and upload it to the attacker's Anthropic account through the Files API. The Files API. Already allowlisted in the sandbox. Because the product doesn't work without it. Anthropic patched the injection. The other two are "being addressed." The pattern is repeatable across many providers: Untrusted input reaches the model. Model touches private context. At least one allowlisted path out exists.

    • No alternative text description for this image
  • General Analysis reposted this

    🚨Our agent convinced 𝟱𝟬+ customer service AIs to offer $𝟭𝟬𝗠+ in perks for RSA Conference attendees. We hope you're as excited for RSA as we are! We're looking forward to the talks, the networking, and yes, the free merch. But this year, we wanted to see if we could get more than free T-Shirts and Socks. So we decided to ask nicely (among other strategies). We used our proprietary agent to interact with 55 customer service chatbots and asked for free perks for RSA attendees. 50 said 𝘆𝗲𝘀. Free WHOOP Bands, $1M in OpenAI/CarMax/Shopify/Lowe's Companies, Inc. credits, 36 months of Xbox game pass ultimate. And more! But fabricated offers are just the tip of the iceberg. In this experiment, we only asked agents to say things they shouldn't. But modern AI agents don't just generate text. The same techniques that elicited fake promises can trigger tool calls: reading customer data, processing refunds, escalating privileges. A chatbot that fabricates a discount is a bad customer experience. A chatbot that an adversary manipulates into exfiltrating PII is an existential risk. Details for the full writeup w/ methodology, analysis, and screenshots are in the comments. If your agent is on the list, 𝘄𝗲’𝗱 𝗹𝗼𝘃𝗲 𝘁𝗼 𝘄𝗼𝗿𝗸 𝘄𝗶𝘁𝗵 𝘆𝗼𝘂. We're happy to share the full attack transcript and help you fix it! See you at RSAC!

    • No alternative text description for this image
    • No alternative text description for this image
  • General Analysis reposted this

    We ran OpenAI's GPT OSS Safeguard 20B on three benchmarks (two public and one internal). Surprisingly, the base GPT-OSS 20B (no fine-tune) is almost as good, and sometimes better. This mostly matches OpenAI’s findings in the original release post: across the three benchmarks they report, Safeguard tracks close to the open base model on the two public, non-pipeline evals, and only shows a clear win on their internal safety-pipeline test. On our internal benchmark, the non-fine-tuned model actually does better than Safeguard. My best guess for why we see clear gains in the Safeguard's performance only on OpenAI's internal benchmark is data hygiene. It is possible that the synthetic generators and pipelines used mirror the eval distribution too closely. It is not enough if you evaluate on different data. If the train and test data is generated through the same pipeline you might still be overfitting without real generalization. Custom policy enforcement models are a step in the right direction. However, we need better benchmarks to make sure the training has been effective. The safeguards are trained to be custom policy enforcers but as of now there are no benchmarks for this. The limited benchmarking that is done is still on the generic moderation categories.

    • No alternative text description for this image
  • General Analysis reposted this

    At General Analysis, we're open-sourcing AI safety tools that serve two goals: help enterprises ship secure, compliant AI today, and use that market demand to accelerate the frontier safety work society needs. One week since launch, and the GA Guard family has crossed 42,000+ downloads. GA Guardrails protect models and agents from ingesting untrusted inputs or unsafe outputs. The GA family is: 1/ The first in industry to support long-context moderation (up to 256k tokens) 2/ Runs up to 25x faster than cloud providers 3/ Adversarially trained with the latest jb algorithms to catch novel attack patterns 4/ Outperforming all other guardrails on public benchmarks (↑ f1 + ↓ fpr) Contact us to learn about our adversarial training pipeline and how we enforce custom policies for our enterprise customers.

    • No alternative text description for this image
    • No alternative text description for this image
    • No alternative text description for this image
  • If you are worried about users prompt injecting your AI agent this is for you! We are excited to open-source the GA Guard series, a family of safety classifiers that have been providing comprehensive protection for enterprise AI deployments for the past year. GA Guards are the first guards to support native long-context moderation up to 256k tokens for agent traces, long form documents, and memory-augmented workflows. ✔️ GA Guard 4B: Our default guardrail, up to 15x faster than cloud providers, balancing robustness and latency for most stacks. Best guardrail in the market! ✔️ GA Guard 4B Thinking: Our best performing guard for high-risk domains, hardened with aggressive adversarial training. Responsive to custom policies! ✔️ GA Guard Lite 600M: Up to 25x faster than cloud providers, with minimal hardware requirements, while still outperforming all major cloud providers. Check out our huggingface and give them a try 🤗 !

    • No alternative text description for this image
    • No alternative text description for this image
    • No alternative text description for this image
    • No alternative text description for this image
    • No alternative text description for this image
  • General Analysis reposted this

    General Analysis is open sourcing the GA Guard models — the first family of long context safety LLMs that have been protecting enterprise AI deployments for the past year. The lineup: - GA Guard – our default, up to 15x faster than cloud providers. - GA Guard Lite – ultra-fast (up to 25x faster) with minimal hardware. - GA Guard Thinking – hardened for high-risk domains. We’re also releasing two open benchmarks: GA Long Context Bench for long-context moderation and GA Jailbreak Bench for classifying jailbreak attempts. GA Guards are trained to detect 7 categories of harmful requests and outputs including complicated jailbreak attempts and prompt injections, PII and sensitive data, illegal requests, hate, sexual content, prompt injections, violence/self-harm, and misinformation. By the Numbers LLMs and agents are increasingly processing long inputs like PDFs and web results, where attackers can easily embed highly complex prompt injections, hidden instructions, or obfuscated payloads. Most existing guardrails aren’t designed to moderate such inputs (Azure tops out at ~2.5k tokens and AWS around ~6k). GA Guards are the first moderation models trained on agentic sequences up to 256k tokens. We evaluated GA Guards on public moderation suites, our adversarial GA Jailbreak Bench, and the new GA Long-Context Bench. Across all three, our models consistently outperform major cloud guardrails and even surpass GPT-5 (when prompted to act as a guardrail) while running far faster. On GA Long-Context Bench for example, GA Guard Thinking scores 0.893 F1, GA Guard 0.891, and GA Guard Lite 0.885. Cloud baselines struggle: Vertex reaches 0.560, AWS misclassifies nearly all inputs with a 1.0 false-positive rate, and Azure records just 0.046 F1 (see the full results on our website). GA Guards deliver almost 400× faster performance than GPT-5 (Lite: 0.016s vs 11.275s; Base: 0.029s) and 15–25× faster than cloud guardrails. All three models are available on Hugging Face — try them out and see how they perform in your own workflows.

    • No alternative text description for this image
    • No alternative text description for this image
    • No alternative text description for this image
  • General Analysis reposted this

    A vulnerability we found went viral on hacker news and X. Here is what we did: MCPs introduce a lot of new attack surfaces to any AI agent. Simon Willison coined the term Lethal Trifecta to describe the setting that makes AI-agent pipelines fatally vulnerable: 1- Access to Private Data: When a tool can reach your databases, emails, files or transaction history, it holds the keys to everything you value. 2- Ingestion of Untrusted Content: Anytime you pull in text or images from external sources—web pages, user uploads, emails—you risk feeding the model hidden instructions. 3- External Communication: Whether it’s HTTP callbacks, email sends, QR-code payment URLs or any other channel, the agent can exfiltrate data once it has it. Email us at info@generalanalysis.com if you want a free security analysis or want to chat about securing your AI tools.

    • No alternative text description for this image
  • General Analysis reposted this

    🧨 Caution: Cursor + Supabase MCP will leak your private SQL tables — it’s only a matter of time. In our latest test, a simple user message was enough to make Cursor leak integration_tokens to the attacker who submitted it. All it took was a malicious support ticket sent through a production-grade support workflow. The client app/agent itself didn’t have access to sensitive data. But Claude 4, running inside Cursor, did. It executed the injected instructions without hesitation, causing the company’s private SQL tables to appear on the client side No misconfiguration. No obvious bug. Just default behavior with dangerous consequences. Details of the experiment: https://lnkd.in/gv2SeSTG

    • No alternative text description for this image

Similar pages