
Python/JS/Ruby: Ignore common words (like certain) as sensitive data source #9649

Merged
RasmusWL merged 6 commits into github:main from RasmusWL:certificate-modeling on Jun 23, 2022

@RasmusWL
Member

Fixes #9632

I tested our old regexes against /usr/share/dict/words, and quite a few of the matches were nonsense, like concert being flagged as a certificate, or secretary as a secret 😬
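For illustration only, that kind of dictionary check could look roughly like the Python sketch below. The SENSITIVE pattern is a simplified, hypothetical stand-in built from word fragments visible in the results further down; it is not the actual regex used by the queries. load_dictionary assumes the word list exists at /usr/share/dict/words with one word per line.

```python
import re
from typing import Iterable

# Hypothetical, simplified heuristic (NOT the real query regex): a word is
# flagged as sensitive if it contains any of these fragments, ignoring case.
SENSITIVE = re.compile(r"cert|secret|account|password|username|trusted", re.IGNORECASE)


def load_dictionary(path: str = "/usr/share/dict/words") -> list[str]:
    """Read a one-word-per-line dictionary file, skipping blank lines."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return [line.strip() for line in f if line.strip()]


def sensitive_matches(words: Iterable[str]) -> list[str]:
    """Return, in sorted order, every word the heuristic would flag."""
    return sorted(w for w in words if SENSITIVE.search(w))
```

On a machine that has the word list, sensitive_matches(load_dictionary()) surfaces false positives of exactly this kind, since concert contains cert and secretary contains secret.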

The effect of the few allow-list words I've added can be seen below: lines starting with - are words that matched before this change and no longer do.

(I also introduced a small fix for snake_case support.)
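The snake_case fix isn't described in detail here, so the following is only a hypothetical illustration of the kind of issue involved: a negative lookbehind meant to skip boolean-style names like isSecret stops working once the same name is written is_secret. In Python's re module each lookbehind must be fixed-width, so the snake_case spelling gets its own assertion. CAMEL_ONLY, SNAKE_AWARE, and flags are names invented for this sketch.

```python
import re

# Hypothetical camelCase-only rule: skips "isSecret", but "is_secret" slips
# through because the two characters before "secret" are "s_", not "is".
CAMEL_ONLY = re.compile(r"(?<!is)secret", re.IGNORECASE)

# Hypothetical snake_case-aware rule. Python's re requires fixed-width
# lookbehinds, so "is" and "is_" are excluded by two separate assertions.
SNAKE_AWARE = re.compile(r"(?<!is)(?<!is_)secret", re.IGNORECASE)


def flags(pattern: re.Pattern, name: str) -> bool:
    """Return True if the pattern would treat name as a secret."""
    return pattern.search(name) is not None
```

With these definitions, flags(CAMEL_ONLY, "is_secret") is True while flags(SNAKE_AWARE, "is_secret") is False; both rules still flag client_secret and clientSecret.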

If you review commit by commit, you can see the fixes being applied to the tests.

-Secretariat
-Secretary
 account
-accountability
-accountable
 accountancy
-accountant
-accountants
 accounted
 accounting
 accounts
-ascertain
-ascertainable
-ascertained
-ascertaining
-ascertains
-certain
-certainly
-certainties
-certainty
 certifiable
 certificate
 certificated
 certificates
 certificating
 certification
 certifications
 certified
 certifies
 certify
 certifying
 certitude
-concert
-concerted
-concerti
-concertina
-concertinaed
-concertinaing
-concertinas
-concerting
-concertmaster
-concertmasters
-concerto
-concertos
-concerts
-disconcert
-disconcerted
-disconcerting
-disconcerts
 entrusted
 intrusted
 password
 passwords
 secret
-secretarial
-secretariat
-secretariats
-secretaries
-secretary
 secrete
 secreted
 secretes
 secreting
 secretion
 secretions
 secretive
 secretively
 secretiveness
 secretly
 secrets
 trusted
-unaccountable
-unaccountably
-uncertain
-uncertainly
-uncertainties
-uncertainty
-undersecretaries
-undersecretary
 username
 usernames
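One way to express an allow-list like this, sketched in Python: a word is flagged only if it matches the sensitive pattern and contains none of a few exclusion fragments. Both patterns are illustrative assumptions reverse-engineered from the diff above, not the regexes from this PR. The fragments (certain, concert, secretar, accountab, accountant) were picked so that every removed (-) word is excluded while every kept word, including accountancy and certitude, is still flagged.

```python
import re

# Hypothetical sensitive-name heuristic (NOT the real query regex).
SENSITIVE = re.compile(r"cert|secret|account|password|username|trusted", re.IGNORECASE)

# Hypothetical allow-list of common English words that merely contain a
# sensitive fragment, chosen to reproduce the "-" lines in the diff.
NOT_SENSITIVE = re.compile(r"certain|concert|secretar|accountab|accountant", re.IGNORECASE)


def is_sensitive(name: str) -> bool:
    """Flag a name only if it matches SENSITIVE and no allow-list fragment."""
    return SENSITIVE.search(name) is not None and NOT_SENSITIVE.search(name) is None
```

Matching short fragments rather than whole words keeps such a list small: secretar alone covers secretary, secretariat, and undersecretary. The trade-off is that an identifier such as concert_certificate would also be excluded.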

Contributor

@yoff left a comment

LGTM - happy to merge once the DCA runs come back with no surprises :-)

@RasmusWL
Member Author

Experiments look good, so I will merge 🚢

@RasmusWL merged commit 3248f7b into github:main on Jun 23, 2022
@RasmusWL deleted the certificate-modeling branch on June 23, 2022 10:05

Development

Successfully merging this pull request may close these issues.

LGTM.com - false positive - Python - Sensitive data (certificate)

3 participants