Added regex for new npm secret format#588
Conversation
| - SDK: Exposing `automationId`, `automationGuid`, and `postUri` in the | ||
| `analyze` command. | ||
| [#586](https://github.com/microsoft/sarif-pattern-matcher/pull/586) | ||
| - NFD: Adding additional regex for 'SEC101/017.NpmAuthorToken' |
There was a problem hiding this comment.
Just a typo. Thank you
| $SEC101/015.AkamaiCredentials=(?si)https:\/\/(?P<host>[\w\-\.]+)\.akamaiapis\.net.{0,150}(?:(?:client_token.{0,10}(?:[^a]|^)(?P<id>akab[\w\-]+).{0,50})|(?:access_token.{0,10}(?:[^\w\-]|^)(?P<resource>akab[\w\-]+).{0,200})|(?:(?:client_secret).{0,10}(?:[^0-9a-z\/\+]|^)(?P<secret>[0-9a-z\/\+]{43}=))){3} | ||
| $SEC101/016.StripeApiKey=(?:[^rs]|^)(?P<secret>(?:r|s)k_(?:live|test)_(?i)[0-9a-z]{24,99})(?:[^0-9a-z]|$) | ||
| $SEC101/017.NpmAuthorToken=(?i)npm.{0,100}[^0-9a-z](?-i)(?P<secret>[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12})[^0-9a-z] | ||
| $SEC101/017.NpmAuthorTokenV2=(?:[^s]|^)(?P<secret>(?:npm)_(?i)[a-zA-Z0-9]{36}) |
| $SEC101/015.AkamaiCredentials=(?si)https:\/\/(?P<host>[\w\-\.]+)\.akamaiapis\.net.{0,150}(?:(?:client_token.{0,10}(?:[^a]|^)(?P<id>akab[\w\-]+).{0,50})|(?:access_token.{0,10}(?:[^\w\-]|^)(?P<resource>akab[\w\-]+).{0,200})|(?:(?:client_secret).{0,10}(?:[^0-9a-z\/\+]|^)(?P<secret>[0-9a-z\/\+]{43}=))){3} | ||
| $SEC101/016.StripeApiKey=(?:[^rs]|^)(?P<secret>(?:r|s)k_(?:live|test)_(?i)[0-9a-z]{24,99})(?:[^0-9a-z]|$) | ||
| $SEC101/017.NpmAuthorToken=(?i)npm.{0,100}[^0-9a-z](?-i)(?P<secret>[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12})[^0-9a-z] | ||
| $SEC101/017.NpmAuthorTokenV2=(?:[^s]|^)(?P<secret>(?:npm)_(?i)[a-zA-Z0-9]{36}) |
| $SEC101/015.AkamaiCredentials=(?si)https:\/\/(?P<host>[\w\-\.]+)\.akamaiapis\.net.{0,150}(?:(?:client_token.{0,10}(?:[^a]|^)(?P<id>akab[\w\-]+).{0,50})|(?:access_token.{0,10}(?:[^\w\-]|^)(?P<resource>akab[\w\-]+).{0,200})|(?:(?:client_secret).{0,10}(?:[^0-9a-z\/\+]|^)(?P<secret>[0-9a-z\/\+]{43}=))){3} | ||
| $SEC101/016.StripeApiKey=(?:[^rs]|^)(?P<secret>(?:r|s)k_(?:live|test)_(?i)[0-9a-z]{24,99})(?:[^0-9a-z]|$) | ||
| $SEC101/017.NpmAuthorToken=(?i)npm.{0,100}[^0-9a-z](?-i)(?P<secret>[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12})[^0-9a-z] | ||
| $SEC101/017.NpmAuthorTokenV2=(?:[^s]|^)(?P<secret>(?:npm)_(?i)[a-zA-Z0-9]{36}) |
There was a problem hiding this comment.
with that, we would have:
(?:[^n]|^)
| $SEC101/015.AkamaiCredentials=(?si)https:\/\/(?P<host>[\w\-\.]+)\.akamaiapis\.net.{0,150}(?:(?:client_token.{0,10}(?:[^a]|^)(?P<id>akab[\w\-]+).{0,50})|(?:access_token.{0,10}(?:[^\w\-]|^)(?P<resource>akab[\w\-]+).{0,200})|(?:(?:client_secret).{0,10}(?:[^0-9a-z\/\+]|^)(?P<secret>[0-9a-z\/\+]{43}=))){3} | ||
| $SEC101/016.StripeApiKey=(?:[^rs]|^)(?P<secret>(?:r|s)k_(?:live|test)_(?i)[0-9a-z]{24,99})(?:[^0-9a-z]|$) | ||
| $SEC101/017.NpmAuthorToken=(?i)npm.{0,100}[^0-9a-z](?-i)(?P<secret>[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12})[^0-9a-z] | ||
| $SEC101/017.NpmAuthorTokenV2=(?:[^s]|^)(?P<secret>(?:npm)_(?i)[a-zA-Z0-9]{36}) |
There was a problem hiding this comment.
we normally add the end border as well:
(?:[^0-9a-z]|$) #Closed
| @@ -165,6 +165,13 @@ | |||
| "MessageArguments": { "secretKind": "NPM API key" }, | |||
| { | ||
| "Id": "SEC101/017", | ||
| "Name": "DoNotExposePlaintextSecrets/NpmAuthorToken", | ||
| "ContentsRegex": "$SEC101/017.NpmAuthorTokenV2", |
There was a problem hiding this comment.
we should add a checksum validation for this secret kind, that's the substance of the token change (moving to v2).
We need to separate this into a brand new rule, as we are likely to handle the 'identifiable' secrets differently (or someone needs to propose a convention for marking these as identifiable).
For now, try allocating a new rule id, SEC101/050 it appears? We should call the rule 'IdentifiableNpmAuthorToken' or 'NpmIdentifiableAuthorToken'. The former would sort all identifiable secret types together, the latter would keep the secret type bundled with the platform.
We could consider adding a new property denoting these as identifiable in adding to making it clear in the rule name. This would be valuable if we special-case behaviors in the engine itself (e.g., the engine could automatically elevate all 'identifiable' secrets to errors in the static analysis phase.
A future change, though, no need to worry about it now. Let's do handle the checksum. I just noticed we don't do this in the GH PAT either, so you'll need to follow up with me offline on what to do.
#Resolved
There was a problem hiding this comment.
Hello, we do the checksum in the internal version.
There was a problem hiding this comment.
I've split this out into a separate rule and added the supporting unit test and functional test files. Let me know what you think. I tried to organize the commits for easy reversal/changes.
|
|
||
| namespace Microsoft.CodeAnalysis.Sarif.PatternMatcher.Plugins.Security | ||
| { | ||
| public class IdentifiableNpmAuthorTokenValidator : DynamicValidatorBase |
|
|
||
| <ItemGroup> | ||
| <None Remove="TestData\SecurePlaintextSecrets\ExpectedOutputs\SEC101_050.IdentifiableNpmAuthorToken.sarif" /> | ||
| <None Remove="TestData\SecurePlaintextSecrets\Inputs\SEC101_050.IdentifiableNpmAuthorToken.ps1" /> |
There was a problem hiding this comment.
you can delete this, since we automatically embed everything under TestData #Closed
|
|
||
| namespace Microsoft.CodeAnalysis.Sarif.PatternMatcher.Plugins.Security.Validators | ||
| { | ||
| public class IdentifiableNpmAuthorTokenValidatorTests |
| { | ||
| public class IdentifiableNpmAuthorTokenValidatorTests | ||
| { | ||
|
|
There was a problem hiding this comment.
you can remove this empty line #Closed
|
|
||
| namespace Microsoft.CodeAnalysis.Sarif.PatternMatcher.Plugins.Security.Validators | ||
| { | ||
| public class IdentifiableNpmAuthorTokenValidatorTests |
There was a problem hiding this comment.
add the summary (check other test validators, pls) #Closed
| Content = new StringContent(publishResponseJson) | ||
| }; | ||
|
|
||
| var ValidEmptyContentResponse = new HttpResponseMessage(HttpStatusCode.OK) |
| ValidationState currentState = identifiableNpmAuthorTokenValidator.IsValidDynamic(ref fingerprint, | ||
| ref message, | ||
| keyValuePairs, | ||
| ref resultLevelKind); |
There was a problem hiding this comment.
review indentation. we normally indent based on the first parameter #Closed
|
|
||
| protected override IEnumerable<ValidationResult> IsValidStaticHelper(IDictionary<string, FlexMatch> groups) | ||
| { | ||
| FlexMatch secret = groups["secret"]; |
There was a problem hiding this comment.
to reduce false-positives, we should validate the checksum:
https://github.blog/changelog/2021-09-23-npm-has-a-new-access-token-format/ #Closed
There was a problem hiding this comment.
Check sec101/102 or sec101/006 (internal version)
| { | ||
| FlexMatch secret = groups["secret"]; | ||
|
|
||
| if (groups.TryGetNonEmptyValue("checksum", out FlexMatch checksum)) |
| client); | ||
| } | ||
|
|
||
| private static string Base62EncodeUint32(uint value, int minimumLength = 6) |
There was a problem hiding this comment.
This kind of code should exist in Microsoft.Security.Utilities. That package should have API that accepts an arbitrary alphabet (such as a base62 encoding scheme) and provides encoding and decoding capabilities.
We will need this API for many classes of secret.
There was a problem hiding this comment.
Looks like that functionality is still pending. May I proceed with Eddy's suggestion and ensure that a task to incorporate this is added to the backlog?
There was a problem hiding this comment.
We're now using the CustomAlphabetEncoder class to achieve this.
| @@ -74,73 +39,9 @@ protected override ValidationState IsValidDynamicHelper(ref Fingerprint fingerpr | |||
| ref ResultLevelKind resultLevelKind) | |||
| { | |||
| string secret = fingerprint.Secret; | |||
| // Validate checksum to avoid false positives. | ||
| string randomPart = secret.Value.String.Substring(4, 30); | ||
| uint checksumValue = Crc32.Calculate(randomPart); | ||
| var encoder = new CustomAlphabetEncoder(); |
There was a problem hiding this comment.
Thanks, this also helped me catch some other cleanup items.
|
|
||
| [JsonProperty("total")] | ||
| public int Total { get; set; } | ||
| return NpmAuthorTokenHelper.ValidateTokens(ref fingerprint, ref message, ref resultLevelKind, client); |
| public class IdentifiableNpmAuthorTokenValidatorTests | ||
| { | ||
| [Fact] | ||
| public void IdentifiableNpmAuthorTokenValidatorTests_MockHttpTests() |
| @@ -0,0 +1,3 @@ | |||
| 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 | |||
|
|
|||
| "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" No newline at end of file | |||
There was a problem hiding this comment.
| { | ||
| public class IdentifiableNpmAuthorTokenValidator : DynamicValidatorBase | ||
| { | ||
| private readonly CustomAlphabetEncoder encoder = new CustomAlphabetEncoder(); |
There was a problem hiding this comment.
Good note, please provide the rationale for the standard. 'The reason is' that it's harder to understand initialization behavior when assignments are inlined like this: instance variables can be declared anywhere in the class. You really notice the problem with this pattern when debugging a ctor with initializations of this kind that are scattered around: the debugger hopes from place to place. With an explicit constructor, everything is sequential.
Changes
Added regex for new npm secret format
For significant contributions please make sure you have completed the following items:
ReleaseHistory.mdupdated for non-trivial changes