[bug] ast-grep scan: special characters lead to different end columns #1594

@mrwsl

Please read the FAQ for the bug you encountered.

  • I have read the existing FAQ

⏯ Playground Link

https://ast-grep.github.io/playground.html#eyJtb2RlIjoiQ29uZmlnIiwibGFuZyI6ImNwcCIsInF1ZXJ5IjoiY29uc29sZS5sb2coJE1BVENIKSIsInJld3JpdGUiOiJsb2dnZXIubG9nKCRNQVRDSCkiLCJzdHJpY3RuZXNzIjoic21hcnQiLCJzZWxlY3RvciI6IiIsImNvbmZpZyI6IiMgWUFNTCBSdWxlIGlzIG1vcmUgcG93ZXJmdWwhXG4jIGh0dHBzOi8vYXN0LWdyZXAuZ2l0aHViLmlvL2d1aWRlL3J1bGUtY29uZmlnLmh0bWwjcnVsZVxucnVsZTpcbiAga2luZDogY29tbWVudFxuICBwYXR0ZXJuOiAkQ09NTUVOVFxuICBhbGw6XG4gICAgLSByZWdleDogXCJUT0RPW146XXxGSVhNRVteOl1cIlxudHJhbnNmb3JtOlxuICBORVdfQ09NTUVOVDpcbiAgICByZXBsYWNlOlxuICAgICAgc291cmNlOiAkQ09NTUVOVFxuICAgICAgcmVwbGFjZTogKD88RklYPlRPRE98RklYTUUpXG4gICAgICBieTogXCIkRklYOlwiXG5maXg6ICRORVdfQ09NTUVOVCIsInNvdXJjZSI6IiAgLy8gIFRPRE8gdGVzdCBzdHVmZi4uLi4uXG4gIC8vICBUT0RPIHRlc3RlIMOcYmVyZ8OkbmdlIn0=

💻 Code

No response

🙁 Actual behavior

The playground example matches two comments, and the playground shows an end column of 26 for both. When I run the same rule with ast-grep scan and JSON output, I get the following:

[
{
  "text": "//  TODO test stuff.....",
  "range": {
    "byteOffset": {
      "start": 2,
      "end": 26
    },
    "start": {
      "line": 0,
      "column": 2
    },
    "end": {
      "line": 0,
      "column": 26
    }
  },
  "file": "comment.cpp",
  "lines": "  //  TODO test stuff.....",
  "charCount": {
    "leading": 2,
    "trailing": 0
  },
  "replacement": "//  TODO: test stuff.....",
  "replacementOffsets": {
    "start": 2,
    "end": 26
  },
  "language": "Cpp",
  "metaVariables": {
    "single": {
      "COMMENT": {
        "text": "//  TODO test stuff.....",
        "range": {
          "byteOffset": {
            "start": 2,
            "end": 26
          },
          "start": {
            "line": 0,
            "column": 2
          },
          "end": {
            "line": 0,
            "column": 26
          }
        }
      }
    },
    "multi": {},
    "transformed": {
      "NEW_COMMENT": "//  TODO: test stuff....."
    }
  },
  "ruleId": "comment",
  "severity": "hint",
  "note": null,
  "message": ""
},
{
  "text": "//  TODO teste Übergänge",
  "range": {
    "byteOffset": {
      "start": 29,
      "end": 55
    },
    "start": {
      "line": 1,
      "column": 2
    },
    "end": {
      "line": 1,
      "column": 28
    }
  },
  "file": "comment.cpp",
  "lines": "  //  TODO teste Übergänge",
  "charCount": {
    "leading": 2,
    "trailing": 0
  },
  "replacement": "//  TODO: teste Übergänge",
  "replacementOffsets": {
    "start": 29,
    "end": 55
  },
  "language": "Cpp",
  "metaVariables": {
    "single": {
      "COMMENT": {
        "text": "//  TODO teste Übergänge",
        "range": {
          "byteOffset": {
            "start": 29,
            "end": 55
          },
          "start": {
            "line": 1,
            "column": 2
          },
          "end": {
            "line": 1,
            "column": 28
          }
        }
      }
    },
    "multi": {},
    "transformed": {
      "NEW_COMMENT": "//  TODO: teste Übergänge"
    }
  },
  "ruleId": "comment",
  "severity": "hint",
  "note": null,
  "message": ""
}
]

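Although the playground shows end column 26 for both matches, ast-grep scan reports end column 28 for the comment containing non-ASCII characters. The difference of 2 matches the two non-ASCII characters Ü and ä, each of which occupies two bytes in UTF-8, so the column appears to be counted in bytes rather than characters.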
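As a sanity check on the numbers above, here is a minimal Python sketch. It is not part of ast-grep; it only takes the second comment and its start column from the JSON output and compares the character length with the UTF-8 byte length.

```python
# Second matched comment, copied from the "text" field of the JSON output.
comment = "//  TODO teste Übergänge"
start_column = 2  # reported start column; the two leading spaces are ASCII

char_len = len(comment)                  # Unicode code points
byte_len = len(comment.encode("utf-8"))  # UTF-8 bytes; Ü and ä take 2 each

end_column_chars = start_column + char_len
end_column_bytes = start_column + byte_len
print(end_column_chars, end_column_bytes)  # prints: 26 28
```

The byte-based result (28) is the value ast-grep scan reports, and the byte length (26) equals the reported byteOffset span, 55 - 29.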

🙂 Expected behavior

ast-grep scan should report the character-based end column (26 for both comments), matching what the playground shows.
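Until the reported column changes, a consumer of the JSON output could convert it on its own side. The sketch below assumes the reported column is a UTF-8 byte offset within the line, which the byteOffset values in the output suggest but the source does not confirm. The helper name `byte_column_to_char_column` is hypothetical and not an ast-grep API.

```python
def byte_column_to_char_column(line: str, byte_column: int) -> int:
    """Convert a UTF-8 byte column within `line` to a character column.

    Assumption: `byte_column` counts UTF-8 bytes from the start of the line.
    """
    prefix = line.encode("utf-8")[:byte_column]
    # Raises UnicodeDecodeError if byte_column splits a multi-byte character.
    return len(prefix.decode("utf-8"))


# The "lines" field of the second match in the JSON output.
line = "  //  TODO teste Übergänge"
print(byte_column_to_char_column(line, 28))  # prints: 26
print(byte_column_to_char_column(line, 2))   # prints: 2
```

This only handles a column on a single line; for a multi-line match, the end column would need to be converted against the last line of the match.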

Additional information about the issue

No response

Labels

bug: Something isn't working
upstream: Upstream issue from tree-sitter
