Skip to content

Quoted strings and octal character literals in ocamllex actions#1912

Merged
gasche merged 8 commits intoocaml:trunkfrom
314eter:ocamllex-characters
Sep 17, 2018
Merged

Quoted strings and octal character literals in ocamllex actions#1912
gasche merged 8 commits intoocaml:trunkfrom
314eter:ocamllex-characters

Conversation

@314eter
Copy link
Copy Markdown
Contributor

@314eter 314eter commented Jul 17, 2018

This contains the same fix as #1901, but for ocamllex actions. Without this, actions like

rule token = parse
  | 'a' { f' '"' }
  | 'b' { f2 '\o170' '"' }
  | 'c' { f1 "\u{1F42B}" }
  | 'd' { f1 {|}|} }
  | 'e' { (* " *) } (* " *) }

would not compile, while they contain valid ocaml code.

The changes are

  • recognize identifiers containing single quotes
  • recognize octal character literals
  • recognize unicode character escape sequences
  • recognize quoted strings

I tried to keep it a bugfix, not a language change. But as a side effect, octal and unicode escape sequences are now allowed in ocamllex strings too.

@ghost
Copy link
Copy Markdown

ghost commented Jul 18, 2018

One thing I wonder is if we could simply reuse parsing/lexer.mll directly. That would avoid having to maintain two lexers and would ensure that the lexical conventions are always the same.

@damiendoligez
Copy link
Copy Markdown
Member

But as a side effect, octal and unicode escape sequences are now allowed in ocamllex strings too.

That counts as a bug-fix.

Copy link
Copy Markdown
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty good, but the assert false question needs to be addressed.

lex/lexer.mll Outdated
warning lexbuf
(Printf.sprintf
"illegal uchar escape in string: '\\u{%s}'" s);
store_string_uchar (Uchar.unsafe_of_int v);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm mistaken, this will assert false in buffer.ml (after printing the warning) if v is too large or negative.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make the warning an error. Decimal escape sequences have the same problem (e.g. "\256").

lex/lexer.mll Outdated

(*
Lexers comment and action are quite similar,
they should lex both strings and characters,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should update this comment.

lex/lexer.mll Outdated
| '{' (['a'-'z' '_'] * as delim) '|'
{ quoted_string delim lexbuf;
comment lexbuf }
| '\''
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You change this one from "'" to '\'' but not the one in [action] (line 321). Why?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it more logical, but apparently "'" is more used in this file.

@314eter
Copy link
Copy Markdown
Contributor Author

314eter commented Jul 22, 2018

That counts as a bug-fix.

In that case, I suppose octal escape sequences should be allowed in characters too?

@damiendoligez
Copy link
Copy Markdown
Member

Looks good now.

@314eter 314eter mentioned this pull request Sep 16, 2018
@gasche gasche merged commit 632df7c into ocaml:trunk Sep 17, 2018
damiendoligez pushed a commit to damiendoligez/ocaml that referenced this pull request Nov 5, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants