Skip to content

skip_lines payload difference across verions #194

@simi

Description

@simi

Hello.

In our project we are using following strategy to skip some lines having only , content.

CSV.open(file, 'r', skip_lines: /^,+$/)

During update to Ruby 2.7 some specs related to this failed since it wasn't skipping lines properly. I have found out the difference is that RegExp is matched against whole line now including newline character for example. I was able to bisect the problem to 39da059 using this reproducing script:

require 'test/unit'
require 'timeout'
require_relative '../../lib/csv'
class TestCSVRegress < Test::Unit::TestCase
  def setup
    super
    @sample_data = <<-CSV
line,1,abc
line,2,"def\nghi"

line,4,jkl
    CSV
  end

  def test_line_trash
    csv = CSV.new(@sample_data, :skip_lines => Class.new do
      $lines = []
      def self.match(line)
        $lines << line
      end
    end
                 )

    Timeout.timeout(0.1) do # this is needed to bandaid infinite loop bug in some csv versions
      csv.count # do something to iterate over the file

      assert_equal($lines, ["line,1,abc", "line,2,\"def", "ghi\"", "line,4,jkl"])
    end
  end
end

This spec is passing at v3.0.1 tag, but fails on current master with

[retro@retro  csv (master $%=)]❤ ruby -Itest test/csv/test_regress.rb 
Loaded suite test/csv/test_regress
Started
F
==============================================================================================================================================================================================
test/csv/test_regress.rb:25:in `test_line_trash'
     22:     end
     23:                  )
     24: 
  => 25:     timeout(0.1) do
     26:       csv.count # do something to iterate over the file
     27: 
     28:       assert_equal($lines, ["line,1,abc", "line,2,\"def", "ghi\"", "line,4,jkl"])
/home/retro/.rubies/ruby-2.5.7/lib/ruby/2.5.0/timeout.rb:122:in `timeout'
/home/retro/.rubies/ruby-2.5.7/lib/ruby/2.5.0/timeout.rb:108:in `timeout'
/home/retro/.rubies/ruby-2.5.7/lib/ruby/2.5.0/timeout.rb:33:in `catch'
/home/retro/.rubies/ruby-2.5.7/lib/ruby/2.5.0/timeout.rb:33:in `catch'
/home/retro/.rubies/ruby-2.5.7/lib/ruby/2.5.0/timeout.rb:33:in `block in catch'
/home/retro/.rubies/ruby-2.5.7/lib/ruby/2.5.0/timeout.rb:93:in `block in timeout'
test/csv/test_regress.rb:28:in `block in test_line_trash'
<["line,1,abc\n", "line,2,\"def\n", "ghi\"\n", "\n", "line,4,jkl\n"]> expected but was
<["line,1,abc", "line,2,\"def", "ghi\"", "line,4,jkl"]>

diff:
? ["line,1,abc\n", "line,2,\"def\n", "ghi\"\n", "\n", "line,4,jkl\n"]
Failure: test_line_trash(TestCSVRegress)
==============================================================================================================================================================================================

Finished in 0.005389607 seconds.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 tests, 1 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
0% passed
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
185.54 tests/s, 185.54 assertions/s

For now I have introduced custom class to strip the "payload" and do the check on top of that.

class Skipper
  EMPTY_LINE = /^,+$/.freeze

  def match(line)
    line.strip.match?(EMPTY_LINE)
  end
end

CSV.open(file, mode, skip_lines: Skipper.new)

Anyway this seems like side-effect of mentioned change and unwanted behaviour change. I suggest to at least add spec for the payload, but I'm not sure which behaviour is intended (with/without newline character included).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions