-
Notifications
You must be signed in to change notification settings - Fork 124
Description
Hello.
In our project we are using following strategy to skip some lines having only , content.
CSV.open(file, 'r', skip_lines: /^,+$/)During update to Ruby 2.7 some specs related to this failed since it wasn't skipping lines properly. I have found out the difference is that RegExp is matched against whole line now including newline character for example. I was able to bisect the problem to 39da059 using this reproducing script:
require 'test/unit'
require 'timeout'
require_relative '../../lib/csv'
class TestCSVRegress < Test::Unit::TestCase
def setup
super
@sample_data = <<-CSV
line,1,abc
line,2,"def\nghi"
line,4,jkl
CSV
end
def test_line_trash
csv = CSV.new(@sample_data, :skip_lines => Class.new do
$lines = []
def self.match(line)
$lines << line
end
end
)
Timeout.timeout(0.1) do # this is needed to bandaid infinite loop bug in some csv versions
csv.count # do something to iterate over the file
assert_equal($lines, ["line,1,abc", "line,2,\"def", "ghi\"", "line,4,jkl"])
end
end
endThis spec is passing at v3.0.1 tag, but fails on current master with
[retro@retro csv (master $%=)]❤ ruby -Itest test/csv/test_regress.rb
Loaded suite test/csv/test_regress
Started
F
==============================================================================================================================================================================================
test/csv/test_regress.rb:25:in `test_line_trash'
22: end
23: )
24:
=> 25: timeout(0.1) do
26: csv.count # do something to iterate over the file
27:
28: assert_equal($lines, ["line,1,abc", "line,2,\"def", "ghi\"", "line,4,jkl"])
/home/retro/.rubies/ruby-2.5.7/lib/ruby/2.5.0/timeout.rb:122:in `timeout'
/home/retro/.rubies/ruby-2.5.7/lib/ruby/2.5.0/timeout.rb:108:in `timeout'
/home/retro/.rubies/ruby-2.5.7/lib/ruby/2.5.0/timeout.rb:33:in `catch'
/home/retro/.rubies/ruby-2.5.7/lib/ruby/2.5.0/timeout.rb:33:in `catch'
/home/retro/.rubies/ruby-2.5.7/lib/ruby/2.5.0/timeout.rb:33:in `block in catch'
/home/retro/.rubies/ruby-2.5.7/lib/ruby/2.5.0/timeout.rb:93:in `block in timeout'
test/csv/test_regress.rb:28:in `block in test_line_trash'
<["line,1,abc\n", "line,2,\"def\n", "ghi\"\n", "\n", "line,4,jkl\n"]> expected but was
<["line,1,abc", "line,2,\"def", "ghi\"", "line,4,jkl"]>
diff:
? ["line,1,abc\n", "line,2,\"def\n", "ghi\"\n", "\n", "line,4,jkl\n"]
Failure: test_line_trash(TestCSVRegress)
==============================================================================================================================================================================================
Finished in 0.005389607 seconds.
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 tests, 1 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
0% passed
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
185.54 tests/s, 185.54 assertions/s
For now I have introduced custom class to strip the "payload" and do the check on top of that.
class Skipper
EMPTY_LINE = /^,+$/.freeze
def match(line)
line.strip.match?(EMPTY_LINE)
end
end
CSV.open(file, mode, skip_lines: Skipper.new)Anyway this seems like side-effect of mentioned change and unwanted behaviour change. I suggest to at least add spec for the payload, but I'm not sure which behaviour is intended (with/without newline character included).