Skip to content

Fix recursion bug in ast-conversion#1076

Merged
datho7561 merged 3 commits intoredhat-developer:mainfrom
pjsk-stripe:pjsk-recursion-ast-converter
Jun 18, 2025
Merged

Fix recursion bug in ast-conversion#1076
datho7561 merged 3 commits intoredhat-developer:mainfrom
pjsk-stripe:pjsk-recursion-ast-converter

Conversation

@pjsk-stripe
Copy link
Contributor

What does this PR do?

Currently, the code to convert AST nodes (in particular, resolve aliases) uses a recursive bit of code that will traverse through YAML aliases to resolve properties. For example:

a: &a
  name: "foo"
b: 
  <<: *a

Parsing this code will produce two objects a and b that both have name: "foo". The code that implements this alias resolution will only traverse to a depth of 1000, meaning it will give up on resolving aliases once it's traversed a really really long way down a path. This 1000 is very unlikely in human readable yaml files.

However, if you have a very large yaml files with many aliases, none of which come close to the depth>1000 case, you can still experience errors due to a bug in the recursion.

Internally we use the https://github.com/redhat-developer/vscode-yaml VSCode extension and we experienced this bug in a very large yaml file where all aliases in the file after the 1000th alias would report incorrect errors for missing properties.

This boils down to the recursion correctly calling refDepth++ to keep track of depth in the algorithm, but not ever calling refDepth--. Meaning regardless of depth, things fail after so many aliases are observed.

This PR address the issue and adds some unit tests to validate things.

What issues does this PR fix or reference?

#1075

Is it tested? How?

return resultNode;
ans = resultNode;
}
aliasDepth.currentRefDepth--;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the important bit. I had to restructure things a tiny bit to make this testable.

Comment on lines +251 to +260
aliasDepth.maxRefCount = 1;
const parsedDocument = parse(`
a: &a
foo: "web"
b: &b
<<: *a
c: &c
<<: *b
`);
aliasDepth.maxRefCount = 1000;
Copy link

@qaisjp qaisjp May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If parse crashes (due to bad code), maxRefCount will globally be left at 1, which would cause other tests to fail perhaps improperly.

Putting the cleanup in a finally may be safer:

Suggested change
aliasDepth.maxRefCount = 1;
const parsedDocument = parse(`
a: &a
foo: "web"
b: &b
<<: *a
c: &c
<<: *b
`);
aliasDepth.maxRefCount = 1000;
try {
aliasDepth.maxRefCount = 1;
const parsedDocument = parse(`
a: &a
foo: "web"
b: &b
<<: *a
c: &c
<<: *b
`);
} finally {
aliasDepth.maxRefCount = 1000;
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very rarely use finally so I checked that this actually works as intended in a REPL:

const x = () => {
    console.log("before")
    try {
        console.log("set 1")
        throw new Error("thrown by q");
    } finally {
        console.log("unset")
    }
    console.log("after")
}

x()

produces

before
set 1
unset
Uncaught Error: thrown by q
    at x (<anonymous>:5:15)
    at <anonymous>:1:1

Copy link
Contributor Author

@pjsk-stripe pjsk-stripe May 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea! applied manually, suggestion had a syntax error

@pjsk-stripe
Copy link
Contributor Author

Hi @msivasubramaniaan, can this get a look over? Thanks!

@pjsk-stripe
Copy link
Contributor Author

Hi @msivasubramaniaan, just following up, any chance I can get this reviewed?

@datho7561
Copy link
Contributor

The tests appear to crash because of an out of memory issue before completing, likely during the Billion Laughs test case. Are you sure this is the right fix?

@pjsk-stripe
Copy link
Contributor Author

pjsk-stripe commented Jun 17, 2025

Hi @datho7561 , good catch!

Yes I am still confident this is the right fix. I believe that test is failing because the original recursion bug was exiting early from the recursion after 1000 laughs. Once this is fixed, the recursion would traverse all laughs leading to an OOM.

I fixed this in f1f1f5c. The original code used a global seen variable that kept track of seen nodes which would eventually grow far beyond memory's limits while computing the same aliases over and over again.

In the fix I switch to a cache that keeps track of what ASTNode each alias resolves to for future iterations.

The tests now all pass in blazing time, do you mind taking another look?

@datho7561 datho7561 self-requested a review June 18, 2025 13:04
@datho7561
Copy link
Contributor

I'm taking another look; I'm just trying to make sure that it doesn't ever calculate/store the 9 billion character string in memory.

It feels a little slow editing the Billion Laughs file, but this may be an existing bug, so I'll need to verify that.

@coveralls
Copy link

coveralls commented Jun 18, 2025

Coverage Status

coverage: 84.056% (+0.005%) from 84.051%
when pulling bfa2e0a on pjsk-stripe:pjsk-recursion-ast-converter
into fce910a on redhat-developer:main.

@datho7561
Copy link
Contributor

Completion in the billion laughs file does slow down quite a lot after this fix (~1-2s to 2-5s). However, we just need to mitigate the out of memory effects of the attack, not make it perfect; people shouldn't be editing files with the exploit in it day to day.

Copy link
Contributor

@datho7561 datho7561 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, and addresses the linked issue. Thank you!

@datho7561 datho7561 force-pushed the pjsk-recursion-ast-converter branch from f1f1f5c to bfa2e0a Compare June 18, 2025 16:01
@datho7561
Copy link
Contributor

Sorry, one second, trying to figure out why the tests are failing CI.

@datho7561 datho7561 merged commit cd20ba6 into redhat-developer:main Jun 18, 2025
4 checks passed
phunguyenmurcul added a commit to macrosreply/yaml-language-server that referenced this pull request Oct 3, 2025
* checked const value for the boolean type

* test case added

* added positive test case as well

* Fix corner case for 'flow sequence forbidden' quickfix

- Do not include any trailing spaces in the error range
  - This is a change in behaviour; it seems this behaviour was
    intentional based on the test cases,
    but I don't agree with it, since the error range should only cover
    the erroneous code.
- Fix a bug where `]` or `}` are not included in the error range if they're the last character in the document

Fixes redhat-developer#1060

Signed-off-by: David Thompson <davthomp@redhat.com>

* added dockercompose and github actions in setting handler

Signed-off-by: msivasubramaniaan <msivasub@redhat.com>

* Fix YAML JSON Schema files starting with %YAML 1.x

Fix a bug where YAML files containing a JSON Schema that start with
%YAML 1.x or a comment that contains a number would fail to to be parsed.

This is caused by the fact that the JSON Schema parser would contain the
number, e.g. 1.2 as the value of `unresolvedJsonSchema.schema` and so
the test for `=== undefined` would fail. This would cause the error from
the JSON parser to be returned instead of trying to parse the file as a
YAML file.

To work around this, also check if `unresolvedJsonSchema.schema` is a
number. It if is, it clearly was not a JSON Schema and so we can safely
try to parse the file as a YAML file.

Fixes: redhat-developer#922
Signed-off-by: David Lechner <dlechner@baylibre.com>

* show the matching value on top

* fix eslint issue

* removed enumMarkdown entries

* Fix bool enums (redhat-developer#1080)

* add (failing) test case for boolean enum and const values

failing with:

```
[
  {
    code: 1,
    data: {
      schemaUri: [ 'file:///default_schema_id.yaml' ],
      values: [ true, false ]
    },
    message: 'Value is not accepted. Valid values: true, false.',
    range: {
      end: { character: 15, line: 0 },
      start: { character: 11, line: 0 }
    },
    severity: 1,
    source: 'yaml-schema: file:///default_schema_id.yaml'
  }
]
```

* fix the issue for `boolean` values in `enum`

this is the same fix for `enum` that has been applied a while back for `const` values here: redhat-developer@e6165e4

fixes issue: redhat-developer#1078

---------

Co-authored-by: xxx <xxx>

* Support draft-04 schemas (redhat-developer#1065)

This helps to support the schema for openapi 3.0.0,
among others, that use an older draft of JSON schema that has some type
differences that make it incompatible with JSON schema draft 07.

See redhat-developer#1006 , I think the root cause for the issues that PR caused
is that we were trying to download and cache the metaschema from
the "URL" instead of using the copy that's bundled with `ajv`.

Fixes redhat-developer#780, Fixes redhat-developer#752

(and many, many duplicates we'll have to find and clean up)

Signed-off-by: David Thompson <davthomp@redhat.com>

* Fix recursion bug in ast-conversion (redhat-developer#1076)

* Ensure recursion works as expected

* fix test safety for when parse fails

* use a cache to keep track of resolved aliases

* Added l10n bundle for various language and translate the local strings (redhat-developer#1086)

* added l10n bundle for various language and translate the local strings

* upstream node

* upstream node

* update to node 18

* Changed NodeJS.Timeout as per node 18

* upstream to node 20

* updated coversall script

* fixed test case

* override json service translation

* fixed test cases

* fix/improve enum value descriptions for merged enum lists (redhat-developer#1085)

This is a follow-up submission for redhat-developer#1028, which removes duplicate enum values in merged enum lists.

This fix/improvement ensures that the first **non-empty** enum description is used for the corresponding enum value.

Before this fix: if the first non-unique enum value was not providing a description but the second one was, it was not
picked up and the enum value description stayed empty.

Co-authored-by: xxx <xxx>
Co-authored-by: Muthurajan Sivasubramanian <93245779+msivasubramaniaan@users.noreply.github.com>

* added test cases for bundle and configuration through testhelper class

* Fix array of const completion

* Fix ignoreScalar in value completion

* apply patch

* Updated changelog for the release 1.19.0 (redhat-developer#1094)

* Updated changelog for the release 1.19.0

* updated chhangelog.md

* addressed review comments

* @vscode/l10n is a runtime dependency (redhat-developer#1096)

* check NODE_AUTH_TOKEN is available or not (redhat-developer#1098)

Signed-off-by: msivasubramaniaan <msivasub@redhat.com>

* Fix auth on npm publish (redhat-developer#1099)

* check NODE_AUTH_TOKEN is available or not

Signed-off-by: msivasubramaniaan <msivasub@redhat.com>

* set NODE_AUTH_TOKEN

---------

Signed-off-by: msivasubramaniaan <msivasub@redhat.com>

* Fix auth on npm publish (redhat-developer#1100)

* check NODE_AUTH_TOKEN is available or not

Signed-off-by: msivasubramaniaan <msivasub@redhat.com>

* set NODE_AUTH_TOKEN

* set npmAuthToken $NODE_AUTH_TOKEN

* set npmAuthToken $NODE_AUTH_TOKEN

---------

Signed-off-by: msivasubramaniaan <msivasub@redhat.com>

* Update CI.yaml (redhat-developer#1101)

just set token to 123 and check what error occur

* Revert "Update CI.yaml (redhat-developer#1101)" (redhat-developer#1102)

This reverts commit 8c4c22f.

* Fix auth on npm publish (redhat-developer#1103)

* check NODE_AUTH_TOKEN is available or not

Signed-off-by: msivasubramaniaan <msivasub@redhat.com>

* set NODE_AUTH_TOKEN

* set npmAuthToken $NODE_AUTH_TOKEN

* set npmAuthToken $NODE_AUTH_TOKEN

* all the required parameters are set

---------

Signed-off-by: msivasubramaniaan <msivasub@redhat.com>

* Fix the fallback path to the l10n folder (redhat-developer#1104)

If the language server client doesn't specify the location of the l10n
folder, then the language server falls back to a path relative to the
complied file.
This path was wrong; `server.js` is in `./out/server/src/server.js`,
so the relative path of the `l10n` folder is `../../../l10n` instead of
`../l10n`.

Signed-off-by: David Thompson <davthomp@redhat.com>

* Fix the fallback path to the l10n folder (redhat-developer#1104) (redhat-developer#1105)

If the language server client doesn't specify the location of the l10n
folder, then the language server falls back to a path relative to the
complied file.
This path was wrong; `server.js` is in `./out/server/src/server.js`,
so the relative path of the `l10n` folder is `../../../l10n` instead of
`../l10n`.

Signed-off-by: David Thompson <davthomp@redhat.com>
Co-authored-by: David Thompson <davidethompson@me.com>

* Fix NPM broken (redhat-developer#1095)

* set always-auth true

* added registry url and release ci on main push for testing

* removed main push

* renamed to NPM_TOKEN

* used YARN_NPM_AUTH_TOKEN

* used npm_token

* used YARN_NPM_AUTH_TOKEN

* added registry-url

* tried NODE_AUTH_TOKEN

* created .npmrc file

* removed .npmrc

* check NODE_AUTH_TOKEN available or not

* moved the check code on top

* checked NPM_TOKEN

* tried with NODE_AUTH_TOKEN

* Try with NPM instead of YARN

* switch to npm

* removed install command

* fix build issue

* fix build issue

* done pkg fix

* Completely moved to npm from yarn

* review comments addressed

* Bump serialize-javascript and mocha

Bumps [serialize-javascript](https://github.com/yahoo/serialize-javascript) to 6.0.2 and updates ancestor dependency [mocha](https://github.com/mochajs/mocha). These dependencies need to be updated together.


Updates `serialize-javascript` from 6.0.0 to 6.0.2
- [Release notes](https://github.com/yahoo/serialize-javascript/releases)
- [Commits](yahoo/serialize-javascript@v6.0.0...v6.0.2)

Updates `mocha` from 9.2.2 to 11.7.1
- [Release notes](https://github.com/mochajs/mocha/releases)
- [Changelog](https://github.com/mochajs/mocha/blob/main/CHANGELOG.md)
- [Commits](mochajs/mocha@v9.2.2...v11.7.1)

---
updated-dependencies:
- dependency-name: serialize-javascript
  dependency-version: 6.0.2
  dependency-type: indirect
- dependency-name: mocha
  dependency-version: 11.7.1
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>

* Remove coveralls dependency (redhat-developer#1107)

From my understanding, I think we just need the GitHub Action in order
for things to work.

Signed-off-by: David Thompson <davthomp@redhat.com>

* set .npmrc (redhat-developer#1109)

* add encoding dependency

* upstream @vsode/l10n dependency

* updated changelog.md (redhat-developer#1111)

* Fixed broken link to JSON schema website

* Fix release GitHub Action

- setup-node@v5 creates a .npmrc if you provide a registry,
  so there should be no need to do that separately
- after setup-node@v5 is run, `npm publish` expects the token to be
  accessible from the environment variable `NODE_AUTH_TOKEN`
- also fix coveralls, because that's getting annoying
  (root cause is that it tried to publish the results for each platform
  instead of just once)

See https://github.com/actions/setup-node/blob/main/docs/advanced-usage.md#publish-to-npmjs-and-gpr-with-npm

Signed-off-by: David Thompson <davthomp@redhat.com>

* Fix prerelease publishing

The options that were being used in `npm publish` to modify the version
to include the git hash at the end are intended to be used
with the `npm version` command beforehand instead.
I think this got messed up somewhere along the way while rewriting the
GitHub Action.

Fixes redhat-developer#1128

Signed-off-by: David Thompson <davthomp@redhat.com>

* Upversion to 1.19.1

We burned the 1.19.0 release by publishing a prerelease version labelled
1.19.0. In order to release, we'll need to use a different version
number.

Signed-off-by: David Thompson <davthomp@redhat.com>

* Upversion to 1.20.0

* remove \n in README (redhat-developer#1068)

fix typo

Signed-off-by: roc <roc@imroc.cc>
Co-authored-by: David Lechner <david@lechnology.com>

* Wrapping code action title with string (redhat-developer#1117)

* Wrapping code action title with string

* adding textedit value field wrapped as string

* README.md: Mention kate as a Client

The `kate` editor has `yaml-language-server` as its default for the YAML format built-in (no plugins or manual configuration needed).

* Support `yaml-textmate` and `yaml-tmlanguage` languages

* redhat-developer/vscode-yaml#1093

* fix: cast cleanupInterval to any before clearing interval

* fix: remove max-warnings option from lint scripts for better flexibility

* Remove @microsoft/eslint-formatter-sarif from devDependencies in package.json

---------

Signed-off-by: David Thompson <davthomp@redhat.com>
Signed-off-by: msivasubramaniaan <msivasub@redhat.com>
Signed-off-by: David Lechner <dlechner@baylibre.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: roc <roc@imroc.cc>
Co-authored-by: msivasubramaniaan <msivasub@redhat.com>
Co-authored-by: David Thompson <davthomp@redhat.com>
Co-authored-by: David Lechner <dlechner@baylibre.com>
Co-authored-by: Kosta <2526664+Kosta-Github@users.noreply.github.com>
Co-authored-by: David Thompson <davidethompson@me.com>
Co-authored-by: pjsk-stripe <55514208+pjsk-stripe@users.noreply.github.com>
Co-authored-by: Muthurajan Sivasubramanian <93245779+msivasubramaniaan@users.noreply.github.com>
Co-authored-by: ShadiestGoat <48590492+ShadiestGoat@users.noreply.github.com>
Co-authored-by: Daniel M. Capella <polyzen@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: debbie <148627186+webdevred@users.noreply.github.com>
Co-authored-by: roc <roc@imroc.cc>
Co-authored-by: David Lechner <david@lechnology.com>
Co-authored-by: Arunvenmany <arun.kumar.v.n@ibm.com>
Co-authored-by: Niels Thykier <niels@thykier.net>
Co-authored-by: RedCMD <33529441+RedCMD@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants