[MEDI] Parsing markdown fails for tables without trailing | #7132

@adamsitnik

A bug reported offline by @KrystofS:

Sample markdown that breaks the reader, taken from here:

| Flag                                       |      Value | Description
|:-------------------------------------------|-----------:|:-----------
| READYTORUN_FLAG_PLATFORM_NEUTRAL_SOURCE    | 0x00000001 | Set if the original IL image was platform neutral. The platform neutrality is part of assembly name. This flag can be used to reconstruct the full original assembly name.
| READYTORUN_FLAG_COMPOSITE                  | 0x00000002 | The image represents a composite R2R file resulting from a combined compilation of a larger number of input MSIL assemblies.
| READYTORUN_FLAG_PARTIAL                    | 0x00000004 |
| READYTORUN_FLAG_NONSHARED_PINVOKE_STUBS    | 0x00000008 | PInvoke stubs compiled into image are non-shareable (no secret parameter)
| READYTORUN_FLAG_EMBEDDED_MSIL              | 0x00000010 | Input MSIL is embedded in the R2R image.
| READYTORUN_FLAG_COMPONENT                  | 0x00000020 | This is a component assembly of a composite R2R image
| READYTORUN_FLAG_MULTIMODULE_VERSION_BUBBLE | 0x00000040 | This R2R module has multiple modules within its version bubble (For versions before version 6.3, all modules are assumed to possibly have this characteristic)
| READYTORUN_FLAG_UNRELATED_R2R_CODE         | 0x00000080 | This R2R module has code in it that would not be naturally encoded into this module
| READYTORUN_FLAG_PLATFORM_NATIVE_IMAGE      | 0x00000100 | The owning composite executable is in the platform native format

fail: Microsoft.Extensions.DataIngestion.IngestionPipeline[6]
      An error occurred while ingesting document 'C:\Users\Krystof\Documents\GitHub\chat-demo\MyLocalAIApp\wwwroot\Data\readytorun-format.md'.
      System.IndexOutOfRangeException: Index was outside the bounds of the array.
         at Microsoft.Extensions.DataIngestion.MarkdownParser.GetCells(Table table, String outputContent)
         at Microsoft.Extensions.DataIngestion.MarkdownParser.MapBlock(String documentMarkdown, Boolean previousWasBreak, Block block)
         at Microsoft.Extensions.DataIngestion.MarkdownParser.Map(MarkdownDocument markdownDocument, String documentMarkdown, String identifier)
         at Microsoft.Extensions.DataIngestion.MarkdownParser.Parse(String markdown, String identifier)
         at Microsoft.Extensions.DataIngestion.MarkdownReader.ReadAsync(Stream source, String identifier, String mediaType, CancellationToken cancellationToken)
         at Microsoft.Extensions.DataIngestion.IngestionDocumentReader.ReadAsync(FileInfo source, String identifier, String mediaType, CancellationToken cancellationToken)
         at Microsoft.Extensions.DataIngestion.IngestionPipeline`1.ProcessAsync(IEnumerable`1 files, Activity rootActivity, CancellationToken cancellationToken)+MoveNext()

Ideally we would recognize this pattern and map it to a valid table structure.
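
To illustrate what "map it to a valid table structure" could mean, here is a minimal Python sketch. It is not the library's C# code: `split_row` and `parse_table` are hypothetical helpers, and the sketch assumes a simple pipe table with no escaped pipes or code spans inside cells. It follows the GFM rule that rows shorter than the header are padded with empty cells and extra cells are ignored, so a row such as the `READYTORUN_FLAG_PARTIAL` one (which may be what yields fewer cells than columns) still produces a full row.

```python
def split_row(line: str) -> list[str]:
    """Split one pipe-table row into trimmed cells.

    Leading and trailing pipes are both optional. Assumption: cells
    contain no escaped pipes or code spans.
    """
    text = line.strip()
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]
    return [cell.strip() for cell in text.split("|")]


def parse_table(lines: list[str]) -> list[list[str]]:
    """Return header and body rows, each normalized to the header width."""
    header = split_row(lines[0])
    width = len(header)
    rows = [header]
    for line in lines[2:]:  # lines[1] is the delimiter row
        cells = split_row(line)
        # Drop excess cells, then pad short rows with empty cells.
        cells = cells[:width] + [""] * (width - len(cells))
        rows.append(cells)
    return rows


sample = [
    "| Flag                      |      Value | Description",
    "|:--------------------------|-----------:|:-----------",
    "| READYTORUN_FLAG_PARTIAL   | 0x00000004 |",
    "| READYTORUN_FLAG_COMPONENT | 0x00000020 | This is a component assembly of a composite R2R image",
]
for row in parse_table(sample):
    print(row)
```

With this normalization every row has exactly as many cells as the header, so indexing cells by column position cannot run past the end of a row.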
