Hi everyone 👋
We’d like to leave a short note regarding a small but important implementation detail in our DSN codebase.
Our original DSN code [1] (2024) already included a full-conversation tokenization reachability check. It verifies that a candidate suffix remains reachable and intact when re-encoded together with the actual chat-template prefix, which avoids the cross-boundary merges that SentencePiece/BPE tokenizers can introduce.
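For readers unfamiliar with the issue, here is a minimal illustrative sketch of such a check. It is not the DSN implementation. It assumes a toy greedy longest-match tokenizer over a hypothetical eight-token vocabulary as a stand-in for real BPE/SentencePiece merges. `VOCAB`, `encode`, `decode`, and `is_reachable` are all hypothetical names. A candidate suffix is kept only if encoding "prefix + decoded suffix" reproduces the prefix tokens followed by exactly the candidate suffix tokens.

```python
from typing import Callable, Dict, List

# Hypothetical toy vocabulary (assumption); real tokenizers have far larger ones.
VOCAB: List[str] = ["a", "b", "c", "!", "ab", "b!", "User:", " "]
TOKEN_TO_ID: Dict[str, int] = {tok: i for i, tok in enumerate(VOCAB)}


def encode(text: str) -> List[int]:
    """Greedy longest-match tokenizer, a toy stand-in for BPE/SentencePiece merges."""
    ids: List[int] = []
    i = 0
    while i < len(text):
        # Take the longest vocabulary entry that matches at position i.
        for length in range(len(text) - i, 0, -1):
            piece = text[i:i + length]
            if piece in TOKEN_TO_ID:
                ids.append(TOKEN_TO_ID[piece])
                i += length
                break
        else:
            raise ValueError(f"untokenizable character: {text[i]!r}")
    return ids


def decode(ids: List[int]) -> str:
    """Concatenate the string pieces for the given token ids."""
    return "".join(VOCAB[i] for i in ids)


def is_reachable(prefix_text: str, suffix_ids: List[int],
                 enc: Callable[[str], List[int]] = encode,
                 dec: Callable[[List[int]], str] = decode) -> bool:
    """Return True if suffix_ids survive re-encoding together with the prefix."""
    prefix_ids = enc(prefix_text)
    full_ids = enc(prefix_text + dec(suffix_ids))
    # The prefix must be unchanged AND the tail must be exactly the candidate suffix.
    return (full_ids[:len(prefix_ids)] == prefix_ids
            and full_ids[len(prefix_ids):] == suffix_ids)


if __name__ == "__main__":
    b, bang = TOKEN_TO_ID["b"], TOKEN_TO_ID["!"]
    # "a" + "b" merge into "ab" across the boundary, so the prefix changes.
    print(is_reachable("User: a", [b, bang]))
    # No boundary merge, but "b" + "!" re-encode as the single token "b!".
    print(is_reachable("User: c", [b, bang]))
    # The canonical single-token suffix survives re-encoding.
    print(is_reachable("User: c", [TOKEN_TO_ID["b!"]]))
```

The first two calls print `False` and the third prints `True`. They show the two failure modes such a check catches: a merge across the prefix/suffix boundary, and a suffix token sequence that the tokenizer would never produce on its own. In a real pipeline, `enc` and `dec` would wrap the model's tokenizer applied to the full chat-templated conversation.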
This reachability check is conceptually similar to what was later formalized and analyzed in AdversariaLLM [2] (Beyer et al., 2025), which did an excellent job of highlighting why such consistency checks matter (e.g., how overlooking them can lead to unreachable token sequences and biased ASR statistics).
While DSN focused primarily on the attack formulation and the loss-ASR mismatch issue, we are glad to see AdversariaLLM dive deeper into this tokenizer-correctness aspect and quantify its broader impact on the robustness of evaluations.
To acknowledge this overlap and give proper credit, we’ve added:
- A brief comment in the source code referring to the AdversariaLLM arXiv paper, and
- An Implementation Note in the README pointing readers to AdversariaLLM and this issue for further discussion.
Thanks to the AdversariaLLM authors for systematizing this topic — it’s great to see the field converging on more reproducible and well-defined evaluation practices.
Best,
Authors of DSN
[1] 57073cd
[2] Beyer, Tim, et al. "AdversariaLLM: A Unified and Modular Toolbox for LLM Robustness Research." arXiv preprint arXiv:2511.04316 (2025).