Issue with multibyte chars in source_text() computation #410

@bram209

The source_text() computation treats lo as a byte index, but it is actually a character index:

let trunc_lo = &self.source_text[lo..];

I expect this test to pass, but it does not:

#[cfg(span_locations)]
#[test]
fn source_text() {
    let input = "    𓀕 c    ";
    let mut tokens = input
        .parse::<proc_macro2::TokenStream>()
        .unwrap()
        .into_iter();

    let ident1 = tokens.next().unwrap();
    assert_eq!("𓀕", ident1.span().source_text().unwrap());

    let ident2 = tokens.next().unwrap();
    assert_eq!("c", ident2.span().source_text().unwrap());
}

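It panics with the following, because the character 𓀕 occupies bytes 4..8, so the character index 6 (which points at c) lands inside 𓀕 when used as a byte index: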

---- source_text stdout ----
thread 'source_text' panicked at 'byte index 6 is not a char boundary; it is inside '𓀕' (bytes 4..8) of `    𓀕 c   `', src/fallback.rs:367:25
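
One possible direction for a fix, as a minimal sketch only: translate the character offsets into byte offsets before slicing. The helper names char_to_byte and slice_by_chars are hypothetical and not part of proc-macro2, and this assumes both lo and hi are character indices into the source text.

```rust
/// Hypothetical helper (not part of proc-macro2): map a character index
/// to the byte index where that character starts. An index equal to the
/// number of characters maps to `text.len()`, so it can be used as an
/// exclusive upper bound.
fn char_to_byte(text: &str, char_idx: usize) -> Option<usize> {
    text.char_indices()
        .map(|(byte_idx, _)| byte_idx)
        // Append the end of the string so one-past-the-last-char is valid.
        .chain(std::iter::once(text.len()))
        .nth(char_idx)
}

/// Hypothetical helper: slice `text` using character offsets `lo..hi`
/// instead of byte offsets. Returns `None` when the range is out of
/// bounds or reversed, rather than panicking.
fn slice_by_chars(text: &str, lo: usize, hi: usize) -> Option<&str> {
    if lo > hi {
        return None;
    }
    let start = char_to_byte(text, lo)?;
    let end = char_to_byte(text, hi)?;
    // Both offsets come from `char_indices` (or `len`), so they are
    // always on char boundaries.
    text.get(start..end)
}
```

With the input from the test above, slice_by_chars(input, 4, 5) yields "𓀕" and slice_by_chars(input, 6, 7) yields "c". This walks the string from the start on every call, which is O(n) per lookup; that seems acceptable for source_text(), but it is a trade-off worth noting.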
