Implementation note on tokenization reachability (AdversariaLLM)

Hi everyone 👋

We’d like to leave a short note regarding a small but important implementation detail in our DSN codebase.
Our original DSN code [1] (2024) already included a **full-conversation tokenization reachability check**: it verifies that a candidate suffix still maps to the same token sequence when it is re-encoded together with the actual chat template prefix. This guards against cross-boundary merges, where a SentencePiece/BPE tokenizer fuses characters on either side of the prefix/suffix boundary into a different token.
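
For readers new to the idea, here is a minimal, self-contained sketch of what such a check can look like. It is not the DSN implementation: a toy greedy longest-match tokenizer stands in for a real BPE/SentencePiece tokenizer, and `VOCAB`, `encode`, and `is_reachable` are hypothetical names chosen for illustration only.

```python
from typing import List, Set

# Hypothetical toy vocabulary. Note that "!!" is a single merged token.
VOCAB: Set[str] = {"h", "i", " ", "!", "!!"}


def encode(text: str, vocab: Set[str]) -> List[str]:
    """Toy greedy longest-match tokenizer (a stand-in, not real BPE)."""
    max_len = max(len(tok) for tok in vocab)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shrink.
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in vocab:
                tokens.append(piece)
                i += n
                break
        else:
            raise ValueError(f"cannot tokenize {text[i]!r}")
    return tokens


def is_reachable(prefix: str, suffix_tokens: List[str], vocab: Set[str]) -> bool:
    """Return True if re-encoding prefix + suffix text reproduces
    the prefix tokens followed by exactly the candidate suffix tokens."""
    suffix_text = "".join(suffix_tokens)
    full = encode(prefix + suffix_text, vocab)
    return full == encode(prefix, vocab) + suffix_tokens


# The suffix token "!" survives when the prefix ends in a space ...
print(is_reachable("hi ", ["!"], VOCAB))  # True
# ... but merges with the prefix's trailing "!" into the token "!!".
print(is_reachable("hi!", ["!"], VOCAB))  # False
```

In the second call the optimizer would believe the suffix is the single token `!`, while the model would actually receive `!!`, which is exactly the kind of unreachable sequence the check is meant to filter out. With a real tokenizer the same comparison would be done on token IDs of the fully templated conversation.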

This mechanism is conceptually the same as, or closely related to, what was later formalized and analyzed in AdversariaLLM [2] (Beyer et al., 2025), which did an excellent job highlighting *why* such consistency checks matter (e.g., how overlooking them can lead to unreachable token sequences and biased attack success rate (ASR) statistics).

While DSN focused primarily on the attack formulation and the loss-ASR mismatch issue, we are glad to see AdversariaLLM dive deeper into this tokenizer-correctness aspect and quantify its broader impact on evaluation robustness.

To acknowledge this overlap and give proper credit, we’ve added:

* A brief comment in the source code referring to the AdversariaLLM arXiv paper, and
* An Implementation Note in the README pointing readers to AdversariaLLM and this issue for further discussion.

Thanks to the AdversariaLLM authors for systematizing this topic — it’s great to see the field converging on more reproducible and well-defined evaluation practices.

Best,
Authors of DSN

---

[1] https://github.com/DSN-2024/DSN/commit/57073cd5d455c40edaeba1ee9974f5d46d52709b
[2] Beyer, Tim, et al. "AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research." arXiv preprint arXiv:2511.04316 (2025).
