heuristic to extract comments from .msg files #316

Related to #298.

While the `.msg` file format supports comments, it isn't expressive enough to correlate them with specific parts of the file. To preserve the information from the comments when transforming the interface definition into `.idl` files (either transparently during the build or explicitly by the user), a heuristic can be applied to at least guess the intended correlation and include the comments in the resulting `.idl` file.

While the heuristic might not work perfectly in all existing cases, existing `.msg` files can be updated to match it. The heuristic should also be easy to document and to implement.

The following distinguishes three kinds of comments:

* a "full comment line" starting with a `#`
* an "indented comment line" starting with whitespace followed by a `#`
* a "trailing comment" starting with an actual definition (a constant or field) followed by a `#`
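
As an illustration, the three kinds could be told apart line by line roughly as follows. This is only a sketch: the function name `classify_line` and the returned labels are made up for this example, and it naively treats any `#` after a definition as the start of a trailing comment, which may be wrong for a string constant whose value contains a `#`.

```python
def classify_line(line: str) -> str:
    """Classify a single .msg line (hypothetical helper, labels are made up)."""
    stripped = line.strip()
    if not stripped:
        return 'blank'
    if line.startswith('#'):
        # '#' in the very first column
        return 'full_comment'
    if stripped.startswith('#'):
        # only whitespace before the '#'
        return 'indented_comment'
    if '#' in stripped:
        # a definition followed by a '#'
        # (assumption: a '#' never appears inside the definition itself)
        return 'trailing_comment'
    return 'definition'


for example in ['# text', '    # text', 'bool foo  # text', 'bool foo']:
    print(repr(example), '->', classify_line(example))
```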

The proposed heuristic:

1. **The first N consecutive "full comment lines" are correlated to the message itself.**

   ```
   # Message
   # specific
   # comment
   
   # unrelated
   ```

   ```
   # Message
   # specific
   # comment
       # unrelated
   ```

   ```
   # Message
   # specific
   # comment
   
   bool foo  # unrelated
   ```

2. **All "full comment lines" that are not part of the message-level comment are used for the first definition that follows them.**

   ```
   # Message specific comment
   
   # Field specific comment

   # Another field specific comment
   bool foo
   ```


3. **A "trailing comment" is associated with the definition made on the same line.**

   ```
   bool foo  # Field specific comment
   ```

4. **An "indented comment line" is associated with the *previous* definition** (if there has been one). Otherwise it is ignored.

   ```
   bool foo  # not necessary but often combined with following indented comment lines
             # comment specific to 'foo'
             # another comment specific to 'foo'
   ```

These comments are then transformed into an annotation in the `.idl` file.
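
To make the four rules more concrete, here is a minimal sketch of how they could be applied in a single pass over the lines. The names `Definition` and `correlate_comments` are hypothetical, and the sketch makes two assumptions that the rules above leave open: any line other than a "full comment line" (including a blank one) ends the message-level block, and "full comment lines" with no definition after them are dropped.

```python
from dataclasses import dataclass, field


@dataclass
class Definition:
    text: str                                     # e.g. 'bool foo'
    comments: list = field(default_factory=list)  # comment lines correlated to it


def correlate_comments(msg_text):
    """Return (message_comments, definitions) for the content of a .msg file."""
    message_comments = []
    definitions = []
    pending = []       # full comment lines waiting for the next definition
    in_header = True   # still inside the leading message-level block
    for line in msg_text.splitlines():
        stripped = line.strip()
        if not stripped:
            in_header = False  # assumption: a blank line ends the block
            continue
        if line.startswith('#'):
            text = line[1:].strip()
            if in_header:
                message_comments.append(text)  # rule 1
            else:
                pending.append(text)           # rule 2
            continue
        in_header = False
        if stripped.startswith('#'):
            if definitions:                    # rule 4, ignored otherwise
                definitions[-1].comments.append(stripped[1:].strip())
            continue
        # a definition, optionally followed by a trailing comment
        code, _, trailing = line.partition('#')
        definition = Definition(code.strip(), pending)
        pending = []
        if trailing.strip():
            definition.comments.append(trailing.strip())  # rule 3
        definitions.append(definition)
    return message_comments, definitions


example = '\n'.join([
    '# Message specific comment',
    '',
    '# Field specific comment',
    'bool foo  # trailing comment',
    '          # indented comment',
])
message_comments, definitions = correlate_comments(example)
print(message_comments)
print(definitions)
```

The result still has to be turned into the actual annotation, which is not covered by this sketch.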

---

The comment lines correlated to a definition can additionally be scanned for known patterns. For now only a single one is proposed here:

1. **If the comment lines of a definition contain a string like `[...]`, the content between the brackets is considered to be a unit.** To avoid misinterpreting ranges as units, the inner string must not contain a comma or dash. The detected unit is then transformed into a `@unit` annotation in the `.idl` file and removed from the comment.
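
A possible way to detect the unit is a regular expression, sketched below. The names `UNIT_PATTERN` and `extract_unit` are hypothetical, and the sketch assumes that only the first match is used and that a comment line left empty after removing the unit is dropped.

```python
import re

# Bracketed text without nested brackets, commas or dashes
# (commas and dashes are excluded so ranges like [0, 100] are not matched)
UNIT_PATTERN = re.compile(r'\[([^\[\],-]+)\]')


def extract_unit(comment_lines):
    """Return (unit, remaining_comment_lines); unit is None if nothing matches."""
    unit = None
    remaining = []
    for line in comment_lines:
        if unit is None:
            match = UNIT_PATTERN.search(line)
            if match:
                unit = match.group(1).strip()
                # remove the bracketed unit and tidy up the whitespace
                line = line[:match.start()] + line[match.end():]
                line = ' '.join(line.split())
                if not line:
                    continue  # assumption: drop lines that only held the unit
        remaining.append(line)
    return unit, remaining


print(extract_unit(['speed of the vehicle [m/s]']))
print(extract_unit(['valid range [0, 100]']))
```

With this pattern the first call yields the unit `m/s` and the second one leaves the comment untouched because of the comma.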

I am looking for early feedback on the proposal.
