DataSetWriterOptions - handling of explicit length Sequences and Items by jmlaka · Pull Request #645 · Enet4/dicom-rs

jmlaka · 2025-04-11T19:36:15Z

This adds DatasetWriterOptions to the parser crate and should handle #636 and similar situations.

The options now contain an ExplicitLengthSqItemStrategy.

ExplicitLengthSqItemStrategy::NoChange - is the previous behavior, where explicit length items and sequences were left with their length unchanged, which in case of modifying the sequences or items content could lead to incorrect explicit length.
As this situation would occur on a regular basis, and the resulting errors are hard to debug, this is no more the default strategy and risks resulting from it's use are documented.

ExplicitLengthSqItemStrategy::SetUndefined is the new default strategy. It simply writes all Items and Sequences with
Length::UNDEFINED irrespective of the original Length enum variant.
Upon inspection of the writer byte output, this will appear counterintuitive to the user.
This is documented.

The final strategy - recalculating the lengths - is not implemented.
Reasons are:

requires propagation of "bytes_written" value from lower level apis or using some mock writer to count bytes
recalculation is expensive
current state, even though not optimal and counterintuitive, is error free and DICOM compliant

jmlaka · 2025-04-18T15:43:39Z

Let me just throw some thoughts here.
I think this PR is a quick fix but doesn't achieve the likes of encoding standard comparable to say DCMTK.

To implement the recalculation strategy, I think there are 2 ways to approach this:
1.DataSetWriter will peek() at the following tokens. Encoder will write to a byte sink, throwing the output away but counting the bytes written. Then set the Sequence length to bytes written. Continue writing / encoding as before ...

Instead of a byte sink, encoder will write to memory. After setting the calculated Sequence length. Writer will write_all() of the memory buffer. This is more efficient.

encoder is strongly coupled with write target using the StatefulEncoder.

This makes it impossible for the DataSetWriter to include a second (and possible third) printer / encoder with a different write target (own Vec, sink). It's because the StatefulEncoder::new() moves in and encoder value which, at this point would have to be Clone d and used to encode to a different target. The EncoderTo trait object cannot be Clone.

If you would be willing to approve the recalculation strategy, how would you go about changing the current abstractions ?

Could we just make the StatefulEncoder have a flag to change the internal self.encoder write target, e.g. switch between
self.to , std::io::Sink and Vec<u8> ?

Thank you

Enet4 · 2025-06-10T16:51:41Z

You are right, there is a rather tight connection between the encoder and the write destination at the moment. This is also the case for decoders, with its own set of problems alongside it. Performance-wise this has been mitigated through a BufWriter, but if I understood correctly, the problem is that we cannot write an accurate length field before we actually go through the whole content associated with that length. For primitive values the solution is trivial, but it stops being so when working with sequence data.

I wouldn't mind relaxing this constraint (as in changing the encoder-writer type handshake), but I suspect this would come with substantial rewriting all the way down from the dicom-encoding crate. But maybe there are simpler alternatives.

StatefulEncoder already has a buffer, so we may be able to extend it to support accumulating even more data before it flushes to the encoder.
Or, with something like the code below, we can create a new stateful encoder which writes the data somewhere else, allowing the data set writer to devise more sequence writing strategies.

impl<W, E, T> StatefulEncoder<W, E, T> {
    pub fn for_writer<U>(&self, to: U) -> StatefulEncoder<U, E, T>
    where
        U: Write,
        E: Clone,
        T: Clone,
    {
        StatefulEncoder {
            to,
            encoder: self.encoder.clone(),
            text: self.text.clone(),
            bytes_written: self.bytes_written,
            buffer: self.buffer.clone(),
        }
    }
}

Let me know if you will want to reiterate on this PR of you'd like it to be admitted as is.

Enet4

I had overlooked this before, but the changes to the constructor functions in DataSetWriter constitute breaking changes. This isn't a problem, though it will require me to postpone merging this until I start collecting everything for v0.9, unless the breaking changes are reverted in the meantime.

parser/src/dataset/write.rs

- revert `new`, add `new_with_options` - revert `new_with_codec`, add `new_with_codec_options` - add simple documentation

Enet4

Should be good now. Thank you again! 👍

Enet4 added enhancement A-lib Area: library C-object Crate: dicom-object C-parser Crate: dicom-parser labels Apr 15, 2025

Enet4 self-requested a review April 15, 2025 18:00

Enet4 force-pushed the Writer_Options branch from 9ef776b to 40cce73 Compare June 10, 2025 16:30

Enet4 reviewed Jun 27, 2025

View reviewed changes

parser/src/dataset/write.rs Outdated Show resolved Hide resolved

parser/src/dataset/write.rs Outdated Show resolved Hide resolved

Enet4 added the breaking change Hint that this may require a major version bump on release label Jun 27, 2025

jmlaka mentioned this pull request Aug 11, 2025

Implement DICOMDIR #669

Draft

Enet4 self-assigned this Sep 25, 2025

Enet4 force-pushed the Writer_Options branch from 40cce73 to 7fc431f Compare September 25, 2025 18:46

jmlaka and others added 4 commits September 27, 2025 21:01

- commit

b277584

- typo in doc

62a9ef6

- doc addition

c9c9a35

[parser] retain existing API in DataSetWriter

50919fd

- revert `new`, add `new_with_options` - revert `new_with_codec`, add `new_with_codec_options` - add simple documentation

Enet4 force-pushed the Writer_Options branch from 0fcf0db to 50919fd Compare September 27, 2025 20:02

Enet4 approved these changes Sep 29, 2025

View reviewed changes

Enet4 merged commit 6e074b9 into Enet4:master Sep 29, 2025
5 checks passed

This was referenced Oct 1, 2025

Explicit sequence length strategy: recalculate #692

Open

[parser] Do not clear length of pixel data fragment item starts #693

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DataSetWriterOptions - handling of explicit length Sequences and Items#645

DataSetWriterOptions - handling of explicit length Sequences and Items#645
Enet4 merged 4 commits intoEnet4:masterfrom
jmlaka:Writer_Options

jmlaka commented Apr 11, 2025

Uh oh!

jmlaka commented Apr 18, 2025

Uh oh!

Enet4 commented Jun 10, 2025

Uh oh!

Enet4 left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Enet4 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jmlaka commented Apr 11, 2025

Uh oh!

jmlaka commented Apr 18, 2025

Uh oh!

Enet4 commented Jun 10, 2025

Uh oh!

Enet4 left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Enet4 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Enet4 left a comment •

edited

Loading