Invalid identifiers in source code cause non-parseable derived code

The generated code cannot be parsed, because the character `\u{596}` (U+0596, a combining mark) is not a valid start for a token.

Right now, we do not check whether non-ASCII characters are actually valid in identifiers; we always assume they are:
<https://github.com/askama-rs/askama/blob/37101cb95d98921d9c70982aa0116c90687ff56e/askama_parser/src/lib.rs#L402>
A possible fix would be to properly check whether a character is actually valid in an identifier, e.g. using
* [unicode-ident](https://crates.io/crates/unicode-ident/) (used by syn and proc-macro2, so we already transitively depend on it) or
* [unicode-xid](https://crates.io/crates/unicode-xid) (used by Rust's lexer, so we can be sure that its output *cannot* be wrong, but it is slower and would pull in more data).
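
A minimal sketch of such a check, not the actual askama parser code. The helper `is_identifier` and the `approx_*` predicates are hypothetical names. To keep the snippet free of dependencies, the predicates are rough stand-ins built on `char::is_alphabetic` and `char::is_alphanumeric`. A real fix would presumably pass `unicode_ident::is_xid_start` and `unicode_ident::is_xid_continue` instead.

```rust
/// Hypothetical helper: returns `true` if `s` has the shape of an identifier,
/// i.e. a valid start character followed only by valid continue characters.
fn is_identifier(
    s: &str,
    is_start: impl Fn(char) -> bool,
    is_continue: impl Fn(char) -> bool,
) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        // A leading underscore is accepted in addition to "start" characters.
        Some(first) if first == '_' || is_start(first) => chars.all(is_continue),
        // Empty input, or a first character that cannot start an identifier.
        _ => false,
    }
}

/// Stand-in for `unicode_ident::is_xid_start` (an approximation only).
fn approx_start(c: char) -> bool {
    c.is_alphabetic()
}

/// Stand-in for `unicode_ident::is_xid_continue` (an approximation only).
fn approx_continue(c: char) -> bool {
    c == '_' || c.is_alphanumeric()
}
```

With these stand-ins, `is_identifier("Ҳx", approx_start, approx_continue)` accepts the name, while the same call on `"\u{596}Ҳx"` rejects it, because the combining mark cannot start an identifier. The parser could then report a template error instead of emitting the name into the generated code.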

---

Fuzzing failure in `derive` (crash-6ae39f08ef6a67885629f22599c93da39eb05bab)

Artifact:

```uri
data:application/octet-stream;base64,w/8vLy56ezp7e3t7e3sne3sqKirWltKyeNKSeH19dmFsdWVzdEUqKnRFKgAAAAAXRSoAAAAAAAAAV3N0cnVjdHhyAJI/nnqM0phva2EAAAAmywo=
```

Formatted test case:

```rust
use proc_macro2::TokenStream;
use quote::quote;

#[test]
fn test() -> Result<(), syn::Error> {
    let input = quote! {
        #[template(
            ext = "//.z{:{{{{",
            source = "{'{{***\u{596}ҲxҒx}}valuestE**tE*\0\0\0\0\u{17}E*\0\0\0\0\0\0\0Wstructxr\0"
        )]
        struct ka {}
    };
    let output = crate::derive_template(input, import_askama);
    let _: syn::File = syn::parse2(output)?;
    Ok(())
}

fn import_askama() -> TokenStream {
    quote! {
        extern crate askama;
    }
}
```

Invalid identifiers in source code cause non-parseable derived code #442
