Skip to content

DataSetWriterOptions - handling of explicit length Sequences and Items#645

Merged
Enet4 merged 4 commits intoEnet4:masterfrom
jmlaka:Writer_Options
Sep 29, 2025
Merged

DataSetWriterOptions - handling of explicit length Sequences and Items#645
Enet4 merged 4 commits intoEnet4:masterfrom
jmlaka:Writer_Options

Conversation

@jmlaka
Copy link
Copy Markdown
Contributor

@jmlaka jmlaka commented Apr 11, 2025

This adds DatasetWriterOptions to the parser crate and should handle #636 and similar situations.

The options now contain an ExplicitLengthSqItemStrategy.

ExplicitLengthSqItemStrategy::NoChange - is the previous behavior, where explicit length items and sequences were left with their length unchanged, which in case of modifying the sequences or items content could lead to incorrect explicit length.
As this situation would occur on a regular basis, and the resulting errors are hard to debug, this is no more the default strategy and risks resulting from it's use are documented.

ExplicitLengthSqItemStrategy::SetUndefined is the new default strategy. It simply writes all Items and Sequences with
Length::UNDEFINED irrespective of the original Length enum variant.
Upon inspection of the writer byte output, this will appear counterintuitive to the user.
This is documented.

The final strategy - recalculating the lengths - is not implemented.
Reasons are:

  • requires propagation of "bytes_written" value from lower level apis or using some mock writer to count bytes
  • recalculation is expensive
  • current state, even though not optimal and counterintuitive, is error free and DICOM compliant

@Enet4 Enet4 added enhancement A-lib Area: library C-object Crate: dicom-object C-parser Crate: dicom-parser labels Apr 15, 2025
@Enet4 Enet4 self-requested a review April 15, 2025 18:00
@jmlaka
Copy link
Copy Markdown
Contributor Author

jmlaka commented Apr 18, 2025

Let me just throw some thoughts here.
I think this PR is a quick fix but doesn't achieve the likes of encoding standard comparable to say DCMTK.

To implement the recalculation strategy, I think there are 2 ways to approach this:
1.DataSetWriter will peek() at the following tokens. Encoder will write to a byte sink, throwing the output away but counting the bytes written. Then set the Sequence length to bytes written. Continue writing / encoding as before ...

  1. Instead of a byte sink, encoder will write to memory. After setting the calculated Sequence length. Writer will write_all() of the memory buffer. This is more efficient.

encoder is strongly coupled with write target using the StatefulEncoder.

This makes it impossible for the DataSetWriter to include a second (and possible third) printer / encoder with a different write target (own Vec, sink). It's because the StatefulEncoder::new() moves in and encoder value which, at this point would have to be Clone d and used to encode to a different target. The EncoderTo trait object cannot be Clone.

If you would be willing to approve the recalculation strategy, how would you go about changing the current abstractions ?

Could we just make the StatefulEncoder have a flag to change the internal self.encoder write target, e.g. switch between
self.to , std::io::Sink and Vec<u8> ?

Thank you

@Enet4
Copy link
Copy Markdown
Owner

Enet4 commented Jun 10, 2025

You are right, there is a rather tight connection between the encoder and the write destination at the moment. This is also the case for decoders, with its own set of problems alongside it. Performance-wise this has been mitigated through a BufWriter, but if I understood correctly, the problem is that we cannot write an accurate length field before we actually go through the whole content associated with that length. For primitive values the solution is trivial, but it stops being so when working with sequence data.

I wouldn't mind relaxing this constraint (as in changing the encoder-writer type handshake), but I suspect this would come with substantial rewriting all the way down from the dicom-encoding crate. But maybe there are simpler alternatives.

  • StatefulEncoder already has a buffer, so we may be able to extend it to support accumulating even more data before it flushes to the encoder.
  • Or, with something like the code below, we can create a new stateful encoder which writes the data somewhere else, allowing the data set writer to devise more sequence writing strategies.
impl<W, E, T> StatefulEncoder<W, E, T> {
    pub fn for_writer<U>(&self, to: U) -> StatefulEncoder<U, E, T>
    where
        U: Write,
        E: Clone,
        T: Clone,
    {
        StatefulEncoder {
            to,
            encoder: self.encoder.clone(),
            text: self.text.clone(),
            bytes_written: self.bytes_written,
            buffer: self.buffer.clone(),
        }
    }
}

Let me know if you will want to reiterate on this PR of you'd like it to be admitted as is.

Copy link
Copy Markdown
Owner

@Enet4 Enet4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had overlooked this before, but the changes to the constructor functions in DataSetWriter constitute breaking changes. This isn't a problem, though it will require me to postpone merging this until I start collecting everything for v0.9, unless the breaking changes are reverted in the meantime.

@Enet4 Enet4 added the breaking change Hint that this may require a major version bump on release label Jun 27, 2025
@jmlaka jmlaka mentioned this pull request Aug 11, 2025
@Enet4 Enet4 self-assigned this Sep 25, 2025
jmlaka and others added 4 commits September 27, 2025 21:01
- revert `new`, add `new_with_options`
- revert `new_with_codec`, add `new_with_codec_options`
- add simple documentation
Copy link
Copy Markdown
Owner

@Enet4 Enet4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be good now. Thank you again! 👍

@Enet4 Enet4 merged commit 6e074b9 into Enet4:master Sep 29, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-lib Area: library breaking change Hint that this may require a major version bump on release C-object Crate: dicom-object C-parser Crate: dicom-parser enhancement

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants