This repository was archived by the owner on Nov 1, 2022. It is now read-only.

CPU runaway / memory leak in yaml parser #427

@geNAZt


Describe the bug

We started seeing random crashes caused by failing liveness probes in our helm-operator installations. A profile taken from a running instance showed CPU and memory usage climbing until the process became unresponsive.

We also saw Helm releases ending up in the pending-upgrade state, which we have to clean up manually. I guess that is due to the stale "starting sync run".

To Reproduce

Steps to reproduce the behaviour:

  1. Install helm operator with CPU and memory limits of 500m and 256Mi
  2. Install some charts from helm stable charts
  3. See how the helm operator gets killed after the liveness probe fails, leaving helm releases broken

Expected behavior

The operator should not crash and should not corrupt helm releases.

Logs

helm-operator logs:

W0524 08:49:49.261320       6 client_config.go:543] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
ts=2020-05-24T08:49:49.29648543Z caller=operator.go:82 component=operator info="setting up event handlers"
ts=2020-05-24T08:49:49.29653343Z caller=operator.go:98 component=operator info="event handlers set up"
ts=2020-05-24T08:49:49.296856635Z caller=main.go:287 component=helm-operator info="waiting for informer caches to sync"
ts=2020-05-24T08:49:49.397055732Z caller=main.go:292 component=helm-operator info="informer caches synced"
ts=2020-05-24T08:49:49.397139534Z caller=git.go:104 component=gitchartsync info="starting sync of git chart sources"
ts=2020-05-24T08:49:49.397141434Z caller=operator.go:110 component=operator info="starting operator"
ts=2020-05-24T08:49:49.397198135Z caller=operator.go:112 component=operator info="starting workers"
ts=2020-05-24T08:49:49.398192549Z caller=server.go:42 component=daemonhttp info="starting HTTP server on :3030"
ts=2020-05-24T08:49:49.398508054Z caller=release.go:75 component=release release=minio targetNamespace=gitlab resource=gitlab:helmrelease/minio helmVersion=v3 info="starting sync run"
ts=2020-05-24T08:49:49.398857159Z caller=release.go:75 component=release release=svcat targetNamespace=svcat resource=svcat:helmrelease/svcat helmVersion=v3 info="starting sync run"
ts=2020-05-24T08:49:50.189199369Z caller=checkpoint.go:24 component=checkpoint msg="up to date" latest=0.10.1
ts=2020-05-24T08:49:50.989017121Z caller=release.go:249 component=release release=svcat targetNamespace=svcat resource=svcat:helmrelease/svcat helmVersion=v3 info="running dry-run upgrade to compare with release version '5'" action=dry-run-compare
ts=2020-05-24T08:49:50.991875863Z caller=helm.go:69 component=helm version=v3 info="preparing upgrade for svcat" targetNamespace=svcat release=svcat
ts=2020-05-24T08:49:51.00369044Z caller=helm.go:69 component=helm version=v3 info="resetting values to the chart's original version" targetNamespace=svcat release=svcat
ts=2020-05-24T08:49:53.915386944Z caller=helm.go:69 component=helm version=v3 info="performing update for svcat" targetNamespace=svcat release=svcat
ts=2020-05-24T08:49:54.100451308Z caller=helm.go:69 component=helm version=v3 info="dry run for svcat" targetNamespace=svcat release=svcat
ts=2020-05-24T08:49:54.392106666Z caller=release.go:268 component=release release=svcat targetNamespace=svcat resource=svcat:helmrelease/svcat helmVersion=v3 info="no changes" phase=dry-run-compare
ts=2020-05-24T08:49:54.392299469Z caller=release.go:75 component=release release=elasticsearch targetNamespace=graylog resource=graylog:helmrelease/elasticsearch helmVersion=v3 info="starting sync run"
ts=2020-05-24T08:49:58.510576689Z caller=release.go:105 component=release release=elasticsearch targetNamespace=graylog resource=graylog:helmrelease/elasticsearch helmVersion=v3 error="failed to determine sync action for release: status 'pending-upgrade' of release does not allow a safe upgrade"
ts=2020-05-24T08:49:58.511182698Z caller=release.go:75 component=release release=gitlab-runner targetNamespace=gitlab resource=gitlab:helmrelease/gitlab-runner helmVersion=v3 info="starting sync run"
ts=2020-05-24T08:50:04.356173295Z caller=release.go:249 component=release release=minio targetNamespace=gitlab resource=gitlab:helmrelease/minio helmVersion=v3 info="running dry-run upgrade to compare with release version '1'" action=dry-run-compare
ts=2020-05-24T08:50:04.356167495Z caller=release.go:249 component=release release=gitlab-runner targetNamespace=gitlab resource=gitlab:helmrelease/gitlab-runner helmVersion=v3 info="running dry-run upgrade to compare with release version '1'" action=dry-run-compare
ts=2020-05-24T08:50:04.499243431Z caller=helm.go:69 component=helm version=v3 info="preparing upgrade for gitlab-runner" targetNamespace=gitlab release=gitlab-runner
ts=2020-05-24T08:50:04.49980734Z caller=helm.go:69 component=helm version=v3 info="preparing upgrade for minio" targetNamespace=gitlab release=minio
ts=2020-05-24T08:50:04.516823894Z caller=helm.go:69 component=helm version=v3 info="resetting values to the chart's original version" targetNamespace=gitlab release=gitlab-runner
ts=2020-05-24T08:50:04.550429796Z caller=helm.go:69 component=helm version=v3 info="resetting values to the chart's original version" targetNamespace=gitlab release=minio
ts=2020-05-24T08:50:05.900198651Z caller=helm.go:69 component=helm version=v3 info="performing update for minio" targetNamespace=gitlab release=minio
ts=2020-05-24T08:50:05.906840951Z caller=helm.go:69 component=helm version=v3 info="performing update for gitlab-runner" targetNamespace=gitlab release=gitlab-runner
ts=2020-05-24T08:50:05.909939397Z caller=helm.go:69 component=helm version=v3 info="dry run for minio" targetNamespace=gitlab release=minio
ts=2020-05-24T08:50:05.916438994Z caller=helm.go:69 component=helm version=v3 info="dry run for gitlab-runner" targetNamespace=gitlab release=gitlab-runner
ts=2020-05-24T08:50:06.29659257Z caller=release.go:268 component=release release=gitlab-runner targetNamespace=gitlab resource=gitlab:helmrelease/gitlab-runner helmVersion=v3 info="no changes" phase=dry-run-compare
ts=2020-05-24T08:50:06.303724677Z caller=release.go:75 component=release release=osba targetNamespace=osba resource=osba:helmrelease/osba helmVersion=v3 info="starting sync run"
ts=2020-05-24T08:50:06.503599862Z caller=release.go:268 component=release release=minio targetNamespace=gitlab resource=gitlab:helmrelease/minio helmVersion=v3 info="no changes" phase=dry-run-compare
ts=2020-05-24T08:50:06.503798764Z caller=release.go:75 component=release release=chartmuseum targetNamespace=chartmuseum resource=chartmuseum:helmrelease/chartmuseum helmVersion=v3 info="starting sync run"
ts=2020-05-24T08:50:13.194666158Z caller=release.go:249 component=release release=osba targetNamespace=osba resource=osba:helmrelease/osba helmVersion=v3 info="running dry-run upgrade to compare with release version '3'" action=dry-run-compare
ts=2020-05-24T08:50:14.195036991Z caller=helm.go:69 component=helm version=v3 info="preparing upgrade for osba" targetNamespace=osba release=osba
ts=2020-05-24T08:50:14.488930578Z caller=helm.go:69 component=helm version=v3 info="resetting values to the chart's original version" targetNamespace=osba release=osba
ts=2020-05-24T08:50:17.392153912Z caller=helm.go:69 component=helm version=v3 info="performing update for osba" targetNamespace=osba release=osba
ts=2020-05-24T08:50:17.691649982Z caller=helm.go:69 component=helm version=v3 info="dry run for osba" targetNamespace=osba release=osba
ts=2020-05-24T08:50:17.991267853Z caller=release.go:268 component=release release=osba targetNamespace=osba resource=osba:helmrelease/osba helmVersion=v3 info="no changes" phase=dry-run-compare
ts=2020-05-24T08:50:17.993421085Z caller=release.go:75 component=release release=graylog targetNamespace=graylog resource=graylog:helmrelease/graylog helmVersion=v3 info="starting sync run"

pprof top 10:

Showing top 10 nodes out of 371
      flat  flat%   sum%        cum   cum%
    1610ms 15.32% 15.32%     1610ms 15.32%  runtime.memclrNoHeapPointers
     850ms  8.09% 23.41%      850ms  8.09%  math/big.addMulVVW
     420ms  4.00% 27.40%      800ms  7.61%  runtime.scanobject
     290ms  2.76% 30.16%     1160ms 11.04%  math/big.nat.montgomery
     280ms  2.66% 32.83%      850ms  8.09%  gopkg.in/yaml%2ev2.yaml_parser_scan_plain_scalar
     270ms  2.57% 35.39%      270ms  2.57%  runtime.futex
     270ms  2.57% 37.96%      270ms  2.57%  syscall.Syscall
     250ms  2.38% 40.34%     2950ms 28.07%  runtime.mallocgc
     220ms  2.09% 42.44%      220ms  2.09%  runtime.memmove
     170ms  1.62% 44.05%      230ms  2.19%  encoding/json.checkValid

Additional context

Possibly related:
After some searching I found these:

yaml/libyaml#111
yaml/libyaml#115

This leads me to believe that there is a serious issue in the YAML parsing part, which can bring the whole application down without any notice.

Current index.yaml from helm stable:
index.yaml.zip

Gitlab index.yaml
gitlab_index.yaml.zip

Labels: blocked needs validation (in need of validation before further action), bug (something isn't working)
