Closed
Description
Originally reported by @jeanm in a comment on #272:
I noticed that UD_French puts a comment above each annotated sentence containing the unannotated text. For example:
```
# sentid: fr-ud-dev_00001
# sentence-text: Aviator, un film sur la vie de Hughes.
1 Aviator _ PROPN _ _ 0 root _ _
2 , _ PUNCT _ _ 1 punct _ _
3 un _ DET _ _ 4 det _ _
4 film _ NOUN _ _ 1 appos _ _
5 sur _ ADP _ _ 7 case _ _
6 la _ DET _ _ 7 det _ _
7 vie _ NOUN _ _ 4 nmod _ _
8 de _ ADP _ _ 9 case _ _
9 Hughes _ PROPN _ _ 7 nmod _ _
10 . _ PUNCT _ _ 1 punct _ _
```
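To illustrate how a tool might consume a block like this, here is a minimal Python sketch that splits one sentence into its comment metadata and its token rows. It assumes comments follow the `# key: value` form used by UD_French above, and that each token row has the usual 10 CoNLL columns (real files separate columns with tabs; the sketch falls back to whitespace splitting for the pasted example). `parse_sentence` is a hypothetical helper, not part of any existing library.

```python
def parse_sentence(block: str):
    """Split one sentence block into comment metadata and token rows.

    Hypothetical helper: assumes "# key: value" comments (as in UD_French)
    and 10-column token rows.
    """
    metadata = {}
    tokens = []
    for line in block.strip().splitlines():
        if line.startswith("#"):
            # Split on the first colon only, so the raw text may contain colons.
            key, sep, value = line[1:].partition(":")
            if sep:
                metadata[key.strip()] = value.strip()
        elif line.strip():
            # Real CoNLL files use tabs; fall back to whitespace for pasted examples.
            fields = line.split("\t") if "\t" in line else line.split()
            if len(fields) != 10:
                raise ValueError(f"expected 10 columns, got {len(fields)}: {line!r}")
            tokens.append(fields)
    return metadata, tokens


example = """# sentid: fr-ud-dev_00001
# sentence-text: Aviator, un film sur la vie de Hughes.
1 Aviator _ PROPN _ _ 0 root _ _
2 , _ PUNCT _ _ 1 punct _ _
3 un _ DET _ _ 4 det _ _
4 film _ NOUN _ _ 1 appos _ _
5 sur _ ADP _ _ 7 case _ _
6 la _ DET _ _ 7 det _ _
7 vie _ NOUN _ _ 4 nmod _ _
8 de _ ADP _ _ 9 case _ _
9 Hughes _ PROPN _ _ 7 nmod _ _
10 . _ PUNCT _ _ 1 punct _ _
"""

meta, toks = parse_sentence(example)
print(meta["sentid"])           # fr-ud-dev_00001
print(meta["sentence-text"])    # Aviator, un film sur la vie de Hughes.
print([t[1] for t in toks])     # token forms (column 2)
```

Because each treebank uses its own comment keys, a reader like this has to special-case every format, which is the motivation for standardizing them.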
Many other treebanks also store some kind of sentence ID in the comments, but each uses a different format. It would be nice to standardize these conventions, perhaps by specifying an optional `# sentence-text: <unannotated text>` line before each annotated sentence.
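A rough sketch of what tooling around the proposed optional line could look like, in Python. The key names `sentid` and `sentence-text` are taken from the UD_French example; the helpers `format_header` and `text_matches_tokens` are hypothetical. The consistency check ignoring whitespace is an assumed design choice: the token rows alone do not record where the original text had spaces (e.g. no space between "Aviator" and ","), which is exactly why storing the raw text is useful.

```python
from typing import List, Optional


def format_header(sent_id: str, text: Optional[str] = None) -> str:
    """Build the comment lines preceding one sentence (hypothetical helper)."""
    lines = [f"# sentid: {sent_id}"]
    if text is not None:
        # The proposed line is optional: omit it when no raw text is available.
        lines.append(f"# sentence-text: {text}")
    return "\n".join(lines)


def text_matches_tokens(text: str, forms: List[str]) -> bool:
    """Check that the raw text and the token forms agree, ignoring whitespace.

    Assumes every token form appears verbatim in the raw text; this would not
    hold for treebanks that split contractions into several syntactic words.
    """
    def strip_ws(s: str) -> str:
        return "".join(s.split())

    return strip_ws(text) == "".join(strip_ws(f) for f in forms)


forms = ["Aviator", ",", "un", "film", "sur", "la", "vie", "de", "Hughes", "."]
text = "Aviator, un film sur la vie de Hughes."
print(format_header("fr-ud-dev_00001", text))
print(text_matches_tokens(text, forms))  # True
```

A validator along these lines could flag sentences whose stored text and tokens have drifted apart, while still accepting files that omit the optional line.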