Skip to content

Invalid identifiers in source code cause non-parseable derived code #442

@Kijewski

Description

@Kijewski

Unparseable code is generated, because ɔ (\u{596}) is not a possible start for a token.

Right now, we do not test if UTF-8 characters are actually valid characters in identifiers, but always assume they are:

let start = take_while(1.., |c: char| c.is_alpha() || c == '_' || c >= '\u{0080}');

A possible fix would be to properly to check if a character is actually valid for an identifier, e.g. using

  • unicode-ident (used by syn and proc-macro2, so we already transitively depend on it) or
  • unicode-xid (used by rust's lexer, so we can be sure that its output cannot be wrong, but it is slower, and would pull in more data).

Fuzzing failure in derive (crash-6ae39f08ef6a67885629f22599c93da39eb05bab)

Artifact:

data:application/octet-steam;base64,w/8vLy56ezp7e3t7e3sne3sqKirWltKyeNKSeH19dmFsdWVzdEUqKnRFKgAAAAAXRSoAAAAAAAAAV3N0cnVjdHhyAJI/nnqM0phva2EAAAAmywo=

Formatted test case:

#[test]
fn test() -> Result<(), syn::Error> {
    let input = quote! {
        #[template(
            ext = "//.z{:{{{{",
            source = "{'{{***\u{596}ҲxҒx}}valuestE**tE*\0\0\0\0\u{17}E*\0\0\0\0\0\0\0Wstructxr\0"
        )]
        struct ka {}
    };
    let output = crate::derive_template(input, import_askama);
    let _: syn::File = syn::parse2(output)?;
    Ok(())
}

fn import_askama() -> TokenStream {
    quote! {
        extern crate askama;
    }
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions