VCert's "auto-retry" feature (i.e., reset certificate if it is failed) causes a race condition in TPP, resulting in the error "unmatched key modulus" #273
Description
When using VCert v4.23.0, with TPP 22.4.1 (also tested with TPP 20.4.0), I often receive the following error message when requesting a certificate:
vcert error: your data contains problems: request doesn't match certificate: unmatched key modulus
I checked that the problem does not come from re-using the private key. I can confirm that the CSR and the issued certificate are mismatched.
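One way to confirm such a mismatch by hand is to compare the RSA modulus of the CSR with that of the returned certificate. The sketch below is only an illustration: the function names `modulus_of` and `same_modulus` and the file names in the usage comment are hypothetical, and it assumes RSA keys (as used in this report) and an `openssl` binary on the PATH.

```shell
#!/bin/bash
# Print the "Modulus=..." line of a PEM file.
# $1 is the openssl subcommand (req for a CSR, x509 for a certificate),
# $2 is the path to the PEM file.
modulus_of() {
  openssl "$1" -noout -modulus -in "$2"
}

# Exit 0 when the CSR ($1) and the certificate ($2) share the same modulus,
# 1 when they differ, 2 when one of the files could not be parsed.
same_modulus() {
  local csr_mod cert_mod
  csr_mod=$(modulus_of req "$1") || return 2
  cert_mod=$(modulus_of x509 "$2") || return 2
  [ -n "$csr_mod" ] && [ "$csr_mod" = "$cert_mod" ]
}

# Usage (hypothetical file names):
#   same_modulus request.csr issued.pem && echo "match" || echo "MISMATCH"
```

A mismatch here is the same condition that VCert reports as "unmatched key modulus".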
The bug
Affected: TPP 22.4.1 and older, VCert 4.23.0, cert-manager 1.11.0 (just this one version), venafi-enhanced-issuer v0.2.0 and v0.3.0.
Fixed In: VCert v4.24.0, cert-manager v1.11.1 and 1.12.0, venafi-enhanced-issuer v0.3.1.
This bug systematically happens given the following circumstances:
- Only happens during renewal (does not happen when it is the initial enrollment).
- Only happens after a first renewal attempt (e.g., the CA was down).
- Only happens if the second renewal attempt fails (e.g., the CA was still down).
In real-world usage, that means that the workaround for "Click retry" introduced in #269 only "works" 50% of the time. This is better than before #269, since you were getting stuck with "Click retry" 100% of the time, but the error is now less descriptive.
Workaround: renew the certificate once again (given that this third attempt succeeds; otherwise, the fourth attempt will also fail, and so on).
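As a rough illustration of this workaround, the renewal can be wrapped in a bounded retry loop. This is a sketch, not part of VCert: the `retry` helper is hypothetical, it assumes that `vcert enroll` exits with a non-zero status on failure, and the attempt count and delay in the usage comment are arbitrary.

```shell
#!/bin/bash
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage, with the same flags as the vcert commands in this report:
#   retry 4 30 vcert enroll -u "$TPP_URL" -t "$TOKEN" --cn app1.example.com \
#     -z 'application-team-1' --san-dns=app1.example.com
```

Note that `vcert enroll`, as invoked in this report, prompts for a key passphrase on every attempt, so the loop is not fully unattended as written.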
When this bug occurs, VCert and cert-manager will show the following message:
request doesn't match certificate: unmatched key modulus
Problem
This unexpected behavior seems to happen when request and reset(restart=true) are called back to back. When that happens, TPP gives VCert an old certificate instead of returning a 500 error.
There seems to be a bad interaction between request and reset(restart=true). Note that the requests we make in VCert are asynchronous (WorkToDoTimeout=0), and request never returns a 500. Only retrieve calls may return a 500.
The following flow triggers the problem:
# Fresh certificate, CA is down.
request
reset(restart=true)
retrieve
# ❌ Returns 200 with the old certificate.
Even with a 1-second pause between requesting and resetting, the problem still occurs:
# Fresh certificate, CA is down.
request
sleep 1s
reset(restart=true)
retrieve
# ❌ Returns 200 with the old certificate.
We found that waiting for 5 seconds allows you to work around the problem. We also found that using reset(restart=false) before requesting doesn't trigger the problem.
# Fresh certificate, CA is down.
request
sleep 5s
reset(restart=true)
retrieve
# ✅ Returns 500 as expected.

# Fresh certificate, CA is down.
reset(restart=false)
request
retrieve
# ✅ The expected 500 HTTP code is returned.
Reproducing "unmatched key modulus" with vcert
First, set your .envrc:
#!/bin/bash
export TPP_URL=https://tpp.mael-valais-gcp.jetstacker.net
export TPP_USER=cert_manager
export TPP_PWD=$(lpass show -p tpp.mael-valais-gcp.jetstacker.net)
export TPP_CLIENT_ID=edit-mappings
Then, get a token:
TOKEN=$(vcert getcred -u $TPP_URL --username=$TPP_USER --password $TPP_PWD --client-id=$TPP_CLIENT_ID --scope=certificate:manage,revoke,delete --format json | tee /dev/stderr | jq -r .access_token) && export TOKEN
Then, make sure that the certificate doesn't already exist:
curl -D/dev/null -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -X DELETE $TPP_URL/vedsdk/certificates/$(curl -D/dev/null -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" $TPP_URL/vedsdk/config/dntoguid -d '{"ObjectDN":"\\VED\\Policy\\application-team-1\\app1.example.com"}' | tee /dev/stderr | jq .GUID -r | tr -d '{}')
Then, make sure that the AD CS service (i.e., Microsoft Certification Authority) is running:
gcloud compute ssh --project jetstack-mael-valais --zone europe-west1-c cert_manager@tpp -- net start certsvc
Then, enroll a certificate. It should succeed:
vcert enroll -u https://tpp.mael-valais-gcp.jetstacker.net -t "$TOKEN" --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com
Then, turn the CA off:
gcloud compute ssh --project jetstack-mael-valais --zone europe-west1-c cert_manager@tpp -- net stop certsvc
Then, enroll. It should show an error:
$ vcert enroll -u https://tpp.mael-valais-gcp.jetstacker.net -t "$TOKEN" --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com
Enter key passphrase:
Verifying - Enter key passphrase:
vCert: 2023/01/20 15:25:36 Successfully connected to Trust Protection Platform
vCert: 2023/01/20 15:25:36 Successfully read zone configuration for application-team-1
vCert: 2023/01/20 15:25:37 Successfully created request for app1.example.com
vCert: 2023/01/20 15:25:37 Successfully posted request for app1.example.com, will pick up by \VED\Policy\application-team-1\app1.example.com
vCert: 2023/01/20 15:25:37 unable to retrieve: Unexpected status code on TPP Certificate Retrieval. Status: 500 Certificate \VED\Policy\application-team-1\app1.example.com has encountered an error while processing, Status: Post CSR failed with error: Cannot connect to the certificate authority (CA). Verify that your CA template settings are correct and that the remote server is available. For more information, search the Help system for Configuring the Microsoft Certificate Services Template Object., Stage: 500.
If you do it again, you will get the key modulus error:
$ vcert enroll -u https://tpp.mael-valais-gcp.jetstacker.net -t "$TOKEN" --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com
Enter key passphrase:
Verifying - Enter key passphrase:
vCert: 2023/01/20 15:26:51 Successfully connected to Trust Protection Platform
vCert: 2023/01/20 15:26:51 Successfully read zone configuration for application-team-1
vCert: 2023/01/20 15:26:51 Successfully created request for app1.example.com
vCert: 2023/01/20 15:26:52 Successfully posted request for app1.example.com, will pick up by \VED\Policy\application-team-1\app1.example.com
vCert: 2023/01/20 15:26:54 vcert error: your data contains problems: request doesn't match certificate: unmatched key modulus
Attachment: vcert-request-reset-incorrect.har.zip
If you do it a third time, it will show the correct 500 error, since reset won't be called:
$ vcert enroll -u https://tpp.mael-valais-gcp.jetstacker.net -t "$TOKEN" --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com
Enter key passphrase:
Verifying - Enter key passphrase:
vCert: 2023/01/20 15:25:36 Successfully connected to Trust Protection Platform
vCert: 2023/01/20 15:25:36 Successfully read zone configuration for application-team-1
vCert: 2023/01/20 15:25:37 Successfully created request for app1.example.com
vCert: 2023/01/20 15:25:37 Successfully posted request for app1.example.com, will pick up by \VED\Policy\application-team-1\app1.example.com
vCert: 2023/01/20 15:25:37 unable to retrieve: Unexpected status code on TPP Certificate Retrieval. Status: 500 Certificate \VED\Policy\application-team-1\app1.example.com has encountered an error while processing, Status: Post CSR failed with error: Cannot connect to the certificate authority (CA). Verify that your CA template settings are correct and that the remote server is available. For more information, search the Help system for Configuring the Microsoft Certificate Services Template Object., Stage: 500.
If you call it a fourth time, it will show "unmatched key modulus" again, and so on and so forth.
Reproducing with curl
❌ Renewal of OK certificate. CA is down. Flow: request > reset(restart=true)
Occurrence of this scenario in VCert: 100% of the time given the following circumstances:
- Only happens during renewal (does not happen when it is the initial enrollment).
- Only happens after a first renewal attempt.
- Only happens if the second renewal attempt fails.
There is an easy workaround: re-renewing the certificate (given that this third attempt succeeds; otherwise, the fourth attempt will also fail, and so on).
Before running this test, I turn the CA on, issue a certificate, and turn the CA off:
curl -D/dev/null -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -X DELETE $TPP_URL/vedsdk/certificates/$(curl -D/dev/null -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" $TPP_URL/vedsdk/config/dntoguid -d '{"ObjectDN":"\\VED\\Policy\\application-team-1\\app1.example.com"}' | tee /dev/stderr | jq .GUID -r | tr -d '{}')
gcloud compute ssh --project jetstack-mael-valais --zone europe-west1-c cert_manager@tpp -- net start certsvc
vcert enroll -u $TPP_URL -t "$TOKEN" --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com
gcloud compute ssh --project jetstack-mael-valais --zone europe-west1-c cert_manager@tpp -- net stop certsvc
The actual test:
#!/bin/bash
openssl genrsa -out crt.key 2048
curl -X POST https://tpp.mael-valais-gcp.jetstacker.net/vedsdk/certificates/request -w ' %{http_code}\n' -skS -D/dev/null -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d@- <<EOF
{
"PolicyDN": "\\\\VED\\\\Policy\\\\application-team-1",
"CASpecificAttributes": [{ "Name": "Origin", "Value": "curl" }],
"Origin": "curl",
"PKCS10": $(step certificate create app1.example.com --san app1.example.com --csr --key crt.key /dev/stdout -f | jq -R --slurp),
"KeyAlgorithm": "RSA",
"KeyBitSize": 2048,
"DisableAutomaticRenewal": true,
"CADN":"\\\\VED\\\\Policy\\\\Administration\\\\msca"
}
EOF
curl -sS -D/dev/null -skSH "Authorization: Bearer $TOKEN" -w ' %{http_code}\n' -H "Content-Type: application/json" -o/dev/stdout https://tpp.mael-valais-gcp.jetstacker.net/vedsdk/certificates/reset -d '{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com", "Restart":true}'
while :; do curl -sS -D/dev/null -w ' %{http_code}\n' -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -o/dev/stdout https://tpp.mael-valais-gcp.jetstacker.net/vedsdk/certificates/retrieve -d '{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com", "Format": "base64", "IncludePrivateKey": false}'; sleep 1; done
For some reason, the retrieve call returns 200 OK instead of 500 Internal Server Error, and the returned certificate doesn't match the CSR. The certificate corresponds to the old certificate that was meant to be renewed:
{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com","Guid":"{eea3a1db-0602-4be3-9a88-da1be4dcc855}"} 200
{"ProcessingResetCompleted":true,"RestartCompleted":true} 200
{"Stage":-1,"Status":"Queued for renewal"} 202
{"Stage":-1,"Status":"Queued for renewal"} 202
{"CertificateData":"LS0tLS1CRUdJTi...URS0tLS0tDQo=","Filename":"app1.example.com.cer","Format":"base64"} 200
✅ Renewal of OK certificate. CA is down. Flow: request > wait 5s > reset(restart=true)
Before running this test, I turn the CA on, issue a certificate, and turn the CA off:
curl -D/dev/null -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -X DELETE $TPP_URL/vedsdk/certificates/$(curl -D/dev/null -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" $TPP_URL/vedsdk/config/dntoguid -d '{"ObjectDN":"\\VED\\Policy\\application-team-1\\app1.example.com"}' | tee /dev/stderr | jq .GUID -r | tr -d '{}')
gcloud compute ssh --project jetstack-mael-valais --zone europe-west1-c cert_manager@tpp -- net start certsvc
vcert enroll -u $TPP_URL -t "$TOKEN" --cn app1.example.com -z 'application-team-1' --san-dns=app1.example.com
gcloud compute ssh --project jetstack-mael-valais --zone europe-west1-c cert_manager@tpp -- net stop certsvc
Here is the actual test:
#!/bin/bash
openssl genrsa -out crt.key 2048
curl -X POST https://tpp.mael-valais-gcp.jetstacker.net/vedsdk/certificates/request -w ' %{http_code}\n' -skS -D/dev/null -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -d@- <<EOF
{
"PolicyDN": "\\\\VED\\\\Policy\\\\application-team-1",
"CASpecificAttributes": [{ "Name": "Origin", "Value": "curl" }],
"Origin": "curl",
"PKCS10": $(step certificate create app1.example.com --san app1.example.com --csr --key crt.key /dev/stdout -f | jq -R --slurp),
"KeyAlgorithm": "RSA",
"KeyBitSize": 2048,
"DisableAutomaticRenewal": true,
"CADN":"\\\\VED\\\\Policy\\\\Administration\\\\msca"
}
EOF
sleep 5
curl -sS -D/dev/null -skSH "Authorization: Bearer $TOKEN" -w ' %{http_code}\n' -H "Content-Type: application/json" -o/dev/stdout https://tpp.mael-valais-gcp.jetstacker.net/vedsdk/certificates/reset -d '{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com", "Restart":true}'
while :; do curl -sS -D/dev/null -w ' %{http_code}\n' -skSH "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" -o/dev/stdout https://tpp.mael-valais-gcp.jetstacker.net/vedsdk/certificates/retrieve -d '{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com", "Format": "base64", "IncludePrivateKey": false}'; sleep 1; done
It errors as expected:
{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com","Guid":"{5333ece9-50a6-474c-9c12-ecaef72868d7}"} 200
{"ProcessingResetCompleted":true,"RestartCompleted":true} 200
{"Stage":500,"Status":"Post CSR failed with error: Cannot connect to the certificate authority (CA). Verify that your CA template settings are correct and that the remote server is available. For more information, search the Help system for Configuring the Microsoft Certificate Services Template Object."} 500
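The `while :` loops in both tests poll forever and have to be interrupted by hand. A bounded variant that stops on the first non-202 answer could look like the sketch below. The helper names `poll_retrieve` and `retrieve_code` and the `POLL_DELAY` variable are hypothetical; the curl flags, endpoint, and payload are the ones used in the tests above, and `$TPP_URL` and `$TOKEN` are assumed to be set as in the .envrc.

```shell
#!/bin/bash
# poll_retrieve MAX CMD...: run CMD, which must print an HTTP status code on
# its last output line, until it reports something other than 202.
# Returns 0 on 200, 1 on any other final code, 2 if still pending after MAX polls.
poll_retrieve() {
  local max="$1" i=1 code
  shift
  while [ "$i" -le "$max" ]; do
    code=$("$@" | tail -n1)
    case "$code" in
      200) echo "issued"; return 0 ;;
      202) sleep "${POLL_DELAY:-1}" ;;   # "Queued for renewal": poll again
      *)   echo "failed with HTTP $code" >&2; return 1 ;;
    esac
    i=$((i + 1))
  done
  echo "still pending after $max polls" >&2
  return 2
}

# Print only the HTTP status code of one retrieve call.
retrieve_code() {
  curl -skS -o /dev/null -w '%{http_code}\n' \
    -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
    "$TPP_URL/vedsdk/certificates/retrieve" \
    -d '{"CertificateDN":"\\VED\\Policy\\application-team-1\\app1.example.com", "Format": "base64", "IncludePrivateKey": false}'
}

# Usage: poll_retrieve 30 retrieve_code
```

Because of the bug described in this report, a 200 alone does not prove that the renewal worked: the returned certificate still has to be checked against the CSR.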
