Skip to content

Fix PGCopyInputStream readFromCopy() returning last row twice and add tests#2019

Merged
sehrope merged 2 commits into
pgjdbc:release/42.2from
sehrope:fix-copy-in-stream-count
Jan 12, 2021
Merged

Fix PGCopyInputStream readFromCopy() returning last row twice and add tests#2019
sehrope merged 2 commits into
pgjdbc:release/42.2from
sehrope:fix-copy-in-stream-count

Conversation

@sehrope

@sehrope sehrope commented Jan 9, 2021

Copy link
Copy Markdown
Member

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Does ./gradlew autostyleCheck checkstyleAll pass ?

Changes to Existing Features:

  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

Fixes #2016.

Actual fix is one line, the rest is cleaning up the tests and adding some new ones to catch this behavior.

Having read through that PGCopyInputStream class a bit more, I really don't like the internals. Half the methods have variables that shadow the private members and it's pretty confusing to understand what it's trying to do. To keep this PR simple I haven't touched any of it here, but I'm planning on submitting a separate PR that eventually cleans that up. Maybe we should deprecate the CopyOut interface half of those classes first though.

@davecramer

Copy link
Copy Markdown
Member

Ideally we need this in 42.2.x, then we can cherry-pick it later for 42.3.x

@sehrope

sehrope commented Jan 10, 2021

Copy link
Copy Markdown
Member Author

Sounds good. Is there a branch for that?

Comment on lines 810 to 813

@vlsi vlsi Jan 10, 2021

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, however, I would suggest "high-level -> lower-level" ordering (== move this join above.

In other words, place methods in such an order that represents their usages in the "above" methods. It helps readers since typically, people read from top to bottom (even when they start reading from the middle of the file :) )

Comment thread pgjdbc/src/test/java/org/postgresql/test/jdbc4/PGCopyInputStreamTest.java Outdated
Comment thread pgjdbc/src/test/java/org/postgresql/test/jdbc4/PGCopyInputStreamTest.java Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it was broken. Please, never use new String(chunkBytes) as byte sequence can contain half-codepoint which would corrupt data. The proper way here is build the full byte array, and convert it to String, or use something like WriterOutputStream from commons-io.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you suggesting I replace it with here? The byte sequence is expected to exactly match the entirety of the string so I'm don't see how it could get anything else in response. Even if it did, it's checking for an exact match after anyway.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. add readFully(InputStream) helper method and convert everything into String at once.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I didn't do that because I wanted to count the chunks as well. It's not just testing that the totality of it creates the correct output, but that we had to read N (or N -1) chunks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then collect chunks to List<byte[]>, join them, and then convert to string.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refrain from new String(buf)

Comment on lines 100 to 90

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering. Have you tried new StrangeInputStream(in,...) + full read from the stream?

Technically speaking, if you read from that StrangeInputStream then it would perform random-sized reads to the downstream inputstream, which might uncover more bugs.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I just wanted to add something that mixed the two APIs. It's not so much an issue of reading in random amounts as it is mixing the java.io API that allows for arbitrary sized byte reads and the CopyOut API that returns entire rows.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to mix Java APIs, then you should exercise read() and read(byte[] buf, int off, int siz) methods as well (which StrangeInputStream could probably do automatically)

Comment on lines 107 to 108

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this might be factored-out like readFromCopyFully :)

@vlsi

vlsi commented Jan 10, 2021

Copy link
Copy Markdown
Member

Would you please add a note to changelog to Fixed section? I guess this might be important for the users to see when they check "what has changed".

@sehrope sehrope force-pushed the fix-copy-in-stream-count branch from e998b73 to 148bead Compare January 11, 2021 00:05
@sehrope

sehrope commented Jan 11, 2021

Copy link
Copy Markdown
Member Author

I force pushed a new version rebased atop master. It squashes the test related commits and removed the String join(...) helper method as it's not used anymore. Pass tests and style locally. Once CI gives the okay should be good to merge.

@sehrope sehrope force-pushed the fix-copy-in-stream-count branch from 148bead to bc8b99a Compare January 11, 2021 13:18
@sehrope

sehrope commented Jan 11, 2021

Copy link
Copy Markdown
Member Author

One more force push to include the changelog entry.

@davecramer

Copy link
Copy Markdown
Member

Sounds good. Is there a branch for that?

sorry for the late reply, yes https://github.com/pgjdbc/pgjdbc/tree/release/42.2

Refactor PGCopyInputStreamTest and add a failing test for reading using the
CopyOut interface method readFromCopy().
@sehrope sehrope changed the base branch from master to release/42.2 January 12, 2021 12:30
@sehrope sehrope force-pushed the fix-copy-in-stream-count branch from bc8b99a to 0037ce1 Compare January 12, 2021 12:30
@sehrope

sehrope commented Jan 12, 2021

Copy link
Copy Markdown
Member Author

Okay I switched out the base for release/42.2 and cherry-picked the PR atop it. Once CI passes again I'll merge it in and cherry-pick and merge into master too.

@sehrope sehrope merged commit 6c296a5 into pgjdbc:release/42.2 Jan 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

the last row is repeated on the PGCopyInputStream

3 participants