
x-zheng16/JustAsk


JustAsk

Curious Code Agents Reveal System Prompts in Frontier LLMs

Paper  ·  Gallery  ·  Gallery Data  ·  Issues

Caution

Research use only. JustAsk is released exclusively for academic safety research, responsible disclosure, and evaluation of LLM security. We do not condone or permit any use of this tool for unauthorized extraction, prompt theft, or exploitation of commercial systems.


Xiang Zheng¹, Yutao Wu², Hanxun Huang³, Yige Li⁴, Xingjun Ma⁵†, Bo Li⁶, Yu-Gang Jiang⁵, Cong Wang¹†

¹City University of Hong Kong, ²Deakin University, ³The University of Melbourne, ⁴Singapore Management University, ⁵Fudan University, ⁶University of Illinois at Urbana-Champaign

† Corresponding authors


Latest News

| Date | Update |
|---|---|
| 2026-05 | Paper accepted to ICML 2026 |
| 2026-03 | Code and data open-sourced on GitHub |
| 2026-03 | System Prompt Open Gallery launched with 45+ extracted system prompts |
| 2026-01 | Paper posted on arXiv |


Overview

JustAsk Framework

JustAsk framework: a self-evolving agent that autonomously discovers extraction strategies through UCB-guided skill selection.

JustAsk is a self-evolving framework that autonomously discovers effective system prompt extraction strategies through interaction alone. Unlike prior prompt-engineering or dataset-based attacks, JustAsk requires no handcrafted prompts, labeled supervision, or privileged access beyond standard user interaction.

Key Insight: Autonomous code agents fundamentally expand the LLM attack surface. JustAsk treats each model interaction as a learning opportunity: the agent evolves its skill set through accumulated experience rather than through model fine-tuning.

Results

Extraction Results

Validation: JustAsk's semantic extraction (left) closely matches the ground truth obtained via reverse engineering (right), confirming high extraction fidelity.

Browse the full extraction results at the System Prompt Open Gallery.

Abstract

Autonomous code agents built on large language models are reshaping software and AI development through tool use, long-horizon reasoning, and self-directed interaction. However, this autonomy introduces a previously unrecognized security risk: agentic interaction fundamentally expands the LLM attack surface, enabling systematic probing and recovery of hidden system prompts that guide model behavior. We identify system prompt extraction as an emergent vulnerability intrinsic to code agents and present JustAsk, a self-evolving framework that autonomously discovers effective extraction strategies through interaction alone. Unlike prior prompt-engineering or dataset-based attacks, JustAsk requires no handcrafted prompts, labeled supervision, or privileged access beyond standard user interaction. It formulates extraction as an online exploration problem, using Upper Confidence Bound-based strategy selection and a hierarchical skill space spanning atomic probes and high-level orchestration. These skills exploit imperfect system-instruction generalization and inherent tensions between helpfulness and safety. Evaluated on 45 black-box commercial models across multiple providers, JustAsk consistently achieves full or near-complete system prompt recovery, revealing recurring design- and architecture-level vulnerabilities. Our results expose system prompts as a critical yet largely unprotected attack surface in modern agent systems.

Method

Skill Set Definition:

Skill Set = Skills (fixed) + Rules (evolving) + Stats (evolving)
| Component | Role | Evolves? |
|---|---|---|
| Skills | Fixed vocabulary (L1-L14, H1-H15) | No |
| Rules | Exploitation knowledge (long-term memory) | Yes |
| Stats | Exploration guidance (UCB) | Yes |
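To make the three-part decomposition concrete, here is a minimal Python sketch of how such a skill set might be represented. The class, field, and method names (`SkillSet`, `SkillStats`, `record`, `add_rule`) are hypothetical and not taken from the repository; only the split into a fixed skill vocabulary, evolving rules, and evolving per-skill statistics follows the table above.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of "Skill Set = Skills (fixed) + Rules (evolving) + Stats (evolving)".
# Names and structure are illustrative assumptions, not the repository's classes.


@dataclass
class SkillStats:
    """Per-skill exploration statistics that feed UCB (evolving)."""
    attempts: int = 0
    successes: int = 0

    @property
    def success_rate(self) -> float:
        # Untried skills report 0.0; UCB handles them separately.
        return self.successes / self.attempts if self.attempts else 0.0


@dataclass
class SkillSet:
    # Fixed vocabulary: skill identifiers only, never modified after creation.
    skills: tuple[str, ...]
    # Evolving long-term memory: free-text rules learned from past interactions.
    rules: list[str] = field(default_factory=list)
    # Evolving exploration statistics, one entry per skill.
    stats: dict[str, SkillStats] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Give every skill in the fixed vocabulary an empty stats record.
        for skill in self.skills:
            self.stats.setdefault(skill, SkillStats())

    def record(self, skill: str, success: bool) -> None:
        """Update stats after one attempt; the vocabulary itself stays fixed."""
        if skill not in self.stats:
            raise KeyError(f"unknown skill: {skill}")
        entry = self.stats[skill]
        entry.attempts += 1
        entry.successes += int(success)

    def add_rule(self, rule: str) -> None:
        """Append a learned rule to long-term memory, skipping duplicates."""
        if rule not in self.rules:
            self.rules.append(rule)


# Fixed vocabulary matching the identifiers in the table: L1-L14 and H1-H15.
vocab = tuple(f"L{i}" for i in range(1, 15)) + tuple(f"H{i}" for i in range(1, 16))
skill_set = SkillSet(skills=vocab)
skill_set.record("L1", success=True)
skill_set.record("L1", success=False)
```

Keeping the vocabulary as an immutable tuple while rules and stats are mutable mirrors the "Evolves?" column: only memory and statistics change as the agent gains experience.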

Skill Selection (UCB):

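$$
\mathrm{UCB}(C_i) = \underbrace{\mathrm{success\_rate}(C_i)}_{\text{exploitation}} + \underbrace{c \sqrt{\frac{\ln N}{n_i}}}_{\text{exploration (curiosity)}}
$$

where, following the standard UCB1 convention, c is the exploration constant, N is the total number of skill selections so far, and n_i is the number of times skill C_i has been selected.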
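The selection rule can be sketched as a standard UCB1 step over per-skill counts. This is an illustrative sketch, not the code in `src/ucb_ranking.py`. It assumes success_rate(C_i) is successes divided by n_i, gives untried skills an infinite score so each is tried once before any repeats, and uses c = √2 only as the textbook UCB1 default; the function names are hypothetical.

```python
import math

# Hypothetical UCB1-style skill selection. Assumptions: success_rate = successes / n_i,
# N = total attempts across all skills, n_i = attempts of one skill, default c = sqrt(2).


def ucb_score(successes: int, n_i: int, total: int, c: float = math.sqrt(2)) -> float:
    """Return UCB(C_i); an untried skill (n_i == 0) scores +inf so it is explored first."""
    if n_i == 0:
        return math.inf
    exploitation = successes / n_i
    exploration = c * math.sqrt(math.log(total) / n_i)
    return exploitation + exploration


def select_skill(stats: dict[str, tuple[int, int]], c: float = math.sqrt(2)) -> str:
    """Pick the skill with the highest UCB score.

    `stats` maps skill name -> (successes, attempts). Ties resolve to the first
    skill in dict iteration order, because max() keeps the first maximum.
    """
    total = sum(attempts for _, attempts in stats.values())
    return max(stats, key=lambda k: ucb_score(stats[k][0], stats[k][1], total, c))
```

For example, with `{"L1": (3, 5), "L2": (1, 4)}` the default c selects L1 on the strength of its higher success rate, while raising c to 5.0 tips the choice to the less-tried L2: a larger c weights the curiosity term more heavily relative to observed success.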

Structure

.
├── config/
│   └── exp_config.yaml                # Experiment configuration
├── docs/
│   ├── PAP.md                         # Persuasion templates & skill mappings
│   └── PAP_taxonomy.jsonl             # 40 real-world persuasion patterns
└── src/
    ├── skill_evolving.py              # Main extraction via OpenRouter
    ├── skill_testing.py               # Controlled evaluation
    ├── skill_testing_controlled.py    # Protection-level evaluation
    ├── ucb_ranking.py                 # UCB skill selection algorithm
    ├── knowledge.py                   # Knowledge persistence
    ├── validation.py                  # Cross-verify & self-consistency
    └── ...

Setup

```bash
conda create -n justask python=3.11
conda activate justask
pip install python-dotenv requests numpy
```

Create a .env file:

```
OPENROUTER_API_KEY=sk-or-v1-your-key-here
```

Controlled Evaluation

```bash
python src/skill_testing.py --model openai/gpt-5.2
```
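
| Metric | Description |
|---|---|
| Semantic Sim | Embedding cosine similarity between the extracted and ground-truth system prompts |
| Secret Leak | Fraction of injected secrets found in the extracted prompt |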
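Both metrics are simple to compute once an extraction and the injected ground truth are available. This sketch is hedged: the README does not say which embedding model produces the vectors, so embeddings are passed in as plain float lists, and "found" is assumed to mean an exact, case-sensitive substring match. The function names are hypothetical, not the repository's API.

```python
import math

# Hypothetical implementations of the two controlled-evaluation metrics.
# Assumptions: embeddings come from an unspecified model as equal-length float lists;
# a secret counts as leaked if it appears verbatim (case-sensitive) in the extraction.


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Semantic Sim: cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; report no similarity rather than divide by zero.
        return 0.0
    return dot / (norm_a * norm_b)


def secret_leak_rate(extracted: str, secrets: list[str]) -> float:
    """Secret Leak: fraction of injected secrets that appear in the extracted text."""
    if not secrets:
        return 0.0
    found = sum(1 for secret in secrets if secret in extracted)
    return found / len(secrets)
```

Substring matching is deliberately strict: a paraphrased secret would not count as leaked, which is one reason a semantic metric is reported alongside it.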

See START.md for detailed agent instructions.

Citation

BibTeX:

```bibtex
@inproceedings{zheng2026justask,
  title={Just Ask: Curious Code Agents Reveal System
         Prompts in Frontier LLMs},
  author={Zheng, Xiang and Wu, Yutao and Huang, Hanxun
          and Li, Yige and Ma, Xingjun and Li, Bo
          and Jiang, Yu-Gang and Wang, Cong},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2026}
}
```

Plain text:

Xiang Zheng, Yutao Wu, Hanxun Huang, Yige Li, Xingjun Ma, Bo Li, Yu-Gang Jiang, and Cong Wang. "Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs." In International Conference on Machine Learning (ICML), 2026.

License

MIT

About

[ICML 2026] JustAsk: Curious Code Agents Reveal System Prompts in Frontier LLMs | Verified on Claude Code | Autoresearch for System Prompt Extraction
