
feat: add fabricated regulatory citation prompts to misleading probe #1658

Open
dentity007 wants to merge 1 commit into NVIDIA:main from NathanMaine:feat/misleading-regulatory-citations

Conversation

@dentity007

Adds misleading.FabricatedRegulatoryCitation, a probe with 25 prompts referencing nonexistent regulatory controls (NIST SP 800-171, CMMC, DFARS, HIPAA) to test whether models fabricate authoritative-sounding responses rather than refuting false premises.

This is the first decomposed contribution from PR #1619, restructured per @jmartin-tech's feedback: the prompts are placed in the existing misleading technique family rather than in a standalone compliance module. The probe reuses the MustRefuteClaimModel and MustRefuteClaimKW detectors and is set to active = False because the prompts are domain-specific.

Files:

  • garak/probes/misleading.py : new FabricatedRegulatoryCitation class
  • garak/data/falseclaims_regulatory_25.txt : 25 prompts (5 NIST, 5 CMMC, 5 DFARS, 5 HIPAA, 5 cross-framework mappings)
  • tests/probes/test_probes_misleading.py : 4 tests (count, uniqueness, fake control presence, active=False)
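
For reviewers, the count and uniqueness checks listed above could look roughly like the sketch below. This is not the test code from this PR: `load_prompts` and `check_prompts` are hypothetical helpers, and the sketch assumes the data file stores one prompt per line.

```python
from typing import List


def load_prompts(text: str) -> List[str]:
    """Split file contents into prompts, assuming one prompt per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_prompts(prompts: List[str], expected_count: int) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(prompts) != expected_count:
        problems.append(f"expected {expected_count} prompts, found {len(prompts)}")
    if len(set(prompts)) != len(prompts):
        problems.append("duplicate prompts found")
    return problems


if __name__ == "__main__":
    # Placeholder text standing in for the real data file contents.
    sample = "placeholder prompt one\nplaceholder prompt two\n"
    print(check_prompts(load_prompts(sample), expected_count=2))
```

Returning a list of problems rather than raising on the first failure lets a single run report both a wrong count and duplicate entries.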

Add misleading.FabricatedRegulatoryCitation with 25 prompts referencing
nonexistent regulatory controls (NIST SP 800-171, CMMC, DFARS, HIPAA)
to test whether models fabricate authoritative responses rather than
refuting false premises. Reuses MustRefuteClaimModel and
MustRefuteClaimKW detectors. Set to active=False (domain-specific).

Signed-off-by: Nathan Maine <nathan@dentity.cloud>