
heuristic to extract comments from .msg files #316

@dirk-thomas

Related to #298.

While the .msg file format supports comments, it isn't expressive enough to correlate them with specific parts of the file. To preserve the information from the comments when transforming the interface definition into an .idl file (either transparently during the build or explicitly by the user), a heuristic can be applied to at least guess the intended correlation and include the comments in the resulting .idl file.

While that heuristic might not work perfectly for all existing cases, it should allow existing .msg files to be updated to match it. The heuristic should also be easy to document as well as to implement.

The following distinguishes three kinds of comments:

  • a "full comment line": a line starting with a #
  • an "indented comment line": a line starting with whitespace followed by a #
  • a "trailing comment": a # following an actual definition (a constant or field) on the same line
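
A minimal Python sketch of this classification, assuming lines are read one at a time from a .msg file. The names `LineKind` and `classify_line` are hypothetical and not part of any existing ROS tool. For simplicity the sketch splits at the first `#`, so a `#` inside a string constant value would be misread as a trailing comment.

```python
from enum import Enum
from typing import Optional, Tuple


class LineKind(Enum):
    FULL_COMMENT = "full comment line"
    INDENTED_COMMENT = "indented comment line"
    DEFINITION = "definition"  # may carry a trailing comment
    EMPTY = "empty"


def classify_line(line: str) -> Tuple[LineKind, Optional[str]]:
    """Return the kind of a .msg line and its comment text (without '#'), if any.

    Simplification: splits at the first '#', so a '#' inside a string
    constant value would be misread as the start of a trailing comment.
    """
    line = line.rstrip("\n")
    stripped = line.lstrip()
    if not stripped:
        return LineKind.EMPTY, None
    if stripped.startswith("#"):
        # Column 0 '#' is a full comment line, otherwise it is indented.
        kind = LineKind.FULL_COMMENT if line.startswith("#") else LineKind.INDENTED_COMMENT
        return kind, stripped[1:].strip()
    # Anything else is a definition, optionally followed by a trailing comment.
    _, sep, comment = line.partition("#")
    return LineKind.DEFINITION, comment.strip() if sep else None
```

For example, `classify_line("bool foo  # Field specific comment")` returns `(LineKind.DEFINITION, "Field specific comment")`.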

The proposed heuristic:

  1. The first N consecutive "full comment lines" are correlated to the message itself.

    # Message
    # specific
    # comment
    
    # unrelated
    
    # Message
    # specific
    # comment
        # unrelated
    
    # Message
    # specific
    # comment
    
    bool foo  # unrelated
    
  2. All remaining "full comment lines" that are not part of the message-level comment are associated with the first definition following them.

    # Message specific comment
    
    # Field specific comment
    
    # Another field specific comment
    bool foo
    
  3. A "trailing comment" is associated with the definition on the same line.

    bool foo  # Field specific comment
    
  4. An "indented comment line" is associated with the previous definition (if there is one). Otherwise it is ignored.

    bool foo  # not necessary but often combined with following indented comment lines
              # comment specific to 'foo'
              # another comment specific to 'foo'
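
The four rules can be combined into a single pass over the file. The following is a hedged sketch, not an existing implementation: `correlate_comments` is a hypothetical name, rule 2 is read as "attach pending full comment lines to the next definition", and full comment lines after the last definition are simply dropped.

```python
from typing import List, Tuple


def correlate_comments(msg_text: str) -> Tuple[List[str], List[Tuple[str, List[str]]]]:
    """Apply the four comment rules to the text of a .msg file.

    Returns the message-level comment lines and, in file order, each
    definition paired with the comment lines associated with it.
    """
    message_comments: List[str] = []
    definitions: List[Tuple[str, List[str]]] = []
    pending: List[str] = []  # full comment lines waiting for the next definition
    in_header = True  # still inside the first run of consecutive full comment lines

    for raw in msg_text.splitlines():
        stripped = raw.strip()
        if not stripped:
            # An empty line ends the message-level block (leading blank lines are skipped).
            if message_comments:
                in_header = False
            continue
        if raw.startswith("#"):
            text = stripped[1:].strip()
            if in_header:
                message_comments.append(text)  # rule 1: message-level comment
            else:
                pending.append(text)  # rule 2: kept for the next definition
            continue
        in_header = False
        if stripped.startswith("#"):
            # Rule 4: indented comment line -> previous definition, else ignored.
            if definitions:
                definitions[-1][1].append(stripped[1:].strip())
            continue
        # Simplification: a '#' inside a string constant value is treated as a comment.
        definition, sep, trailing = raw.partition("#")
        comments, pending = pending, []
        if sep:
            comments.append(trailing.strip())  # rule 3: trailing comment
        definitions.append((definition.strip(), comments))
    # Full comment lines after the last definition are dropped.
    return message_comments, definitions
```

For the example under rule 2, this returns `(["Message specific comment"], [("bool foo", ["Field specific comment", "Another field specific comment"])])`.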
    

These comments are then transformed into annotations in the .idl file.
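
The proposal does not fix which IDL annotation carries the comments. The sketch below assumes a `@verbatim (language="comment", text="...")` annotation with the lines joined by an escaped newline; the annotation form, the escaping, and the function name `to_idl_comment_annotation` are all assumptions for illustration.

```python
from typing import List


def to_idl_comment_annotation(comment_lines: List[str], indent: str = "") -> str:
    """Render correlated comment lines as a (hypothetical) IDL comment annotation.

    Assumption: the annotation looks like
    @verbatim (language="comment", text="...") with lines joined by an escaped newline.
    """
    if not comment_lines:
        return ""
    text = "\n".join(comment_lines)
    # Escape backslashes first, then quotes and newlines, so the IDL string literal stays valid.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'{indent}@verbatim (language="comment", text="{escaped}")'
```

Definitions without any correlated comment produce an empty string, so no annotation is emitted for them.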


The comment lines correlated with a definition can additionally be scanned for known patterns. For now, only a single pattern is proposed:

  1. If the comment lines of a definition contain a string like [...], the content between the brackets is considered to be a unit. To avoid misinterpreting ranges, the inner string must not contain a comma or dash. The detected unit is then transformed into a @unit annotation in the .idl file and removed from the comment.
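
One way to implement the unit pattern is a regular expression that only accepts bracketed text without commas or dashes. In this hypothetical `extract_unit` helper, only the first match across a definition's comment lines is taken as the unit, and a line that becomes empty after removal is dropped; both are assumptions beyond the proposal.

```python
import re
from typing import List, Optional, Tuple

# A bracketed string without commas or dashes, e.g. "[m/s]"; ranges such as
# "[0, 1]" or "[-1..1]" are deliberately not matched.
UNIT_PATTERN = re.compile(r"\[([^\[\],\-]+)\]")


def extract_unit(comment_lines: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return the first unit found in the comment lines and the lines without it.

    Assumption: only the first match is treated as the unit; later
    bracketed strings are left untouched.
    """
    for i, line in enumerate(comment_lines):
        match = UNIT_PATTERN.search(line)
        if match is None:
            continue
        remaining = (line[:match.start()] + line[match.end():]).strip()
        # Collapse the double space left where the unit was removed.
        remaining = " ".join(remaining.split())
        lines = comment_lines[:i] + ([remaining] if remaining else []) + comment_lines[i + 1:]
        return match.group(1).strip(), lines
    return None, list(comment_lines)
```

For example, `extract_unit(["Velocity [m/s] of the robot"])` returns `("m/s", ["Velocity of the robot"])`, while comments containing `[0, 1]` or `[-1..1]` are returned unchanged because they look like ranges.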

I am looking for early feedback on the proposal.
