-
Notifications
You must be signed in to change notification settings - Fork 77
Description
Description
The LoadURL function in query.go causes a runtime panic when attempting to call the Close method on a gzip.Reader instance that is nil.
This issue occurs due to Go's behavior with interface values that have a nil underlying value.
Steps to Reproduce
- Call the
LoadURLfunction with a URL that triggers the gzipReader error case, for example this test:
func TestLoadURL_ErrorCases(t *testing.T) {
tests := []struct {
name string
contentEncoding string
responseBody []byte
expectedError string
}{
{
name: "ZlibReaderError",
contentEncoding: "deflate",
responseBody: []byte("invalid zlib data"),
expectedError: "zlib",
},
{
name: "EmptyContentEncodingNilBody",
contentEncoding: "",
responseBody: nil,
expectedError: "EOF",
},
{
name: "GzipReaderError",
contentEncoding: "gzip",
responseBody: []byte("invalid gzip data"),
expectedError: "gzip",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Create a local HTTP server
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Encoding", tt.contentEncoding)
w.WriteHeader(http.StatusOK)
w.Write(tt.responseBody)
}))
defer server.Close()
node, err := LoadURL(server.URL)
// Assert that the returned node is nil
require.Nil(t, node)
// Assert that an error is returned
require.NotNil(t, err)
// Optionally, check the error message
require.Contains(t, err.Error(), tt.expectedError)
})
}
}- Observe the runtime panic.
Actual Behavior
The function panics with the following error after running mentioned test:
go test -v ./......
=== RUN TestLoadURL_ErrorCases
=== RUN TestLoadURL_ErrorCases/ZlibReaderError
=== RUN TestLoadURL_ErrorCases/EmptyContentEncodingNilBody
=== RUN TestLoadURL_ErrorCases/GzipReaderError
--- FAIL: TestLoadURL_ErrorCases (0.00s)
--- PASS: TestLoadURL_ErrorCases/ZlibReaderError (0.00s)
--- PASS: TestLoadURL_ErrorCases/EmptyContentEncodingNilBody (0.00s)
--- FAIL: TestLoadURL_ErrorCases/GzipReaderError (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x53bf0e]
goroutine 70 [running]:
testing.tRunner.func1.2({0x7996c0, 0xba8350})
/usr/local/go/src/testing/testing.go:1734 +0x21c
testing.tRunner.func1()
/usr/local/go/src/testing/testing.go:1737 +0x35e
panic({0x7996c0?, 0xba8350?})
/usr/local/go/src/runtime/panic.go:792 +0x132
compress/gzip.(*Reader).Close(0x0?)
/usr/local/go/src/compress/gzip/gunzip.go:290 +0xe
github.com/antchfx/htmlquery.LoadURL.func1()
/home/pzak/work/htmlquery/query.go:109 +0x24
github.com/antchfx/htmlquery.LoadURL({0xc00011a120?, 0xc00012cb10?})
/home/pzak/work/htmlquery/query.go:117 +0x5e2
github.com/antchfx/htmlquery.TestLoadURL_ErrorCases.func1(0xc00017a700)
/home/pzak/work/htmlquery/query_test.go:232 +0xb7
testing.tRunner(0xc00017a700, 0xc00012cad0)
/usr/local/go/src/testing/testing.go:1792 +0xf4
created by testing.(*T).Run in goroutine 53
/usr/local/go/src/testing/testing.go:1851 +0x413
FAIL github.com/antchfx/htmlquery 0.009s
FAIL
Expected Behavior
The function should handle the nil value gracefully and not panic.
Root Cause
The issue occurs when gzip.NewReader returns an error during the creation of the reader. In such cases, the reader variable remains nil, and the subsequent call to reader.Close() results in a nil pointer dereference. This behavior is related to Go's handling of interface values with a nil underlying value, as explained in the Go documentation: Interface values with nil underlying values.
Additionally, the zlib.NewReader function does not cause this issue because it returns an io.ReadCloser, which is an interface type.
On the other hand, gzip.NewReader returns a concrete type *Reader, which can lead to a nil pointer dereference if not properly checked.
This can be easily observed after putting fmt.Printf("(%v, %T)\n", reader, reader) at the beginning of defer function https://github.com/antchfx/htmlquery/blob/v1.3.4/query.go#L107-L111 to have:
defer func() {
fmt.Printf("(%v, %T)\n", reader, reader)
if reader != nil {
reader.Close()
}
}()
With this change, mentioned test produces following output:
...
=== RUN TestLoadURL_ErrorCases
=== RUN TestLoadURL_ErrorCases/ZlibReaderError
(<nil>, <nil>)
=== RUN TestLoadURL_ErrorCases/EmptyContentEncodingNilBody
({}, http.noBody)
=== RUN TestLoadURL_ErrorCases/GzipReaderError
(<nil>, *gzip.Reader)
--- FAIL: TestLoadURL_ErrorCases (0.00s)
...