Skip to content

Panic in LoadURL due to nil pointer dereference #75

@zak-pawel

Description

@zak-pawel

Description

The LoadURL function in query.go causes a runtime panic when attempting to call the Close method on a gzip.Reader instance that is nil.
This issue occurs due to Go's behavior with interface values that have a nil underlying value.

Steps to Reproduce

  1. Call the LoadURL function with a URL that triggers the gzipReader error case, for example this test:
func TestLoadURL_ErrorCases(t *testing.T) {
  tests := []struct {
  	name            string
  	contentEncoding string
  	responseBody    []byte
  	expectedError   string
  }{
  	{
  		name:            "ZlibReaderError",
  		contentEncoding: "deflate",
  		responseBody:    []byte("invalid zlib data"),
  		expectedError:   "zlib",
  	},
  	{
  		name:            "EmptyContentEncodingNilBody",
  		contentEncoding: "",
  		responseBody:    nil,
  		expectedError:   "EOF",
  	},
  	{
  		name:            "GzipReaderError",
  		contentEncoding: "gzip",
  		responseBody:    []byte("invalid gzip data"),
  		expectedError:   "gzip",
  	},
  }

  for _, tt := range tests {
  	t.Run(tt.name, func(t *testing.T) {
  		// Create a local HTTP server
  		server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
  			w.Header().Set("Content-Encoding", tt.contentEncoding)
  			w.WriteHeader(http.StatusOK)
  			w.Write(tt.responseBody)
  		}))
  		defer server.Close()

  		node, err := LoadURL(server.URL)

  		// Assert that the returned node is nil
  		require.Nil(t, node)

  		// Assert that an error is returned
  		require.NotNil(t, err)

  		// Optionally, check the error message
  		require.Contains(t, err.Error(), tt.expectedError)
  	})
  }
}
  1. Observe the runtime panic.

Actual Behavior

The function panics with the following error after running mentioned test:

go test -v ./...
...

=== RUN   TestLoadURL_ErrorCases
=== RUN   TestLoadURL_ErrorCases/ZlibReaderError
=== RUN   TestLoadURL_ErrorCases/EmptyContentEncodingNilBody
=== RUN   TestLoadURL_ErrorCases/GzipReaderError
--- FAIL: TestLoadURL_ErrorCases (0.00s)
    --- PASS: TestLoadURL_ErrorCases/ZlibReaderError (0.00s)
    --- PASS: TestLoadURL_ErrorCases/EmptyContentEncodingNilBody (0.00s)
    --- FAIL: TestLoadURL_ErrorCases/GzipReaderError (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x68 pc=0x53bf0e]

goroutine 70 [running]:
testing.tRunner.func1.2({0x7996c0, 0xba8350})
        /usr/local/go/src/testing/testing.go:1734 +0x21c
testing.tRunner.func1()
        /usr/local/go/src/testing/testing.go:1737 +0x35e
panic({0x7996c0?, 0xba8350?})
        /usr/local/go/src/runtime/panic.go:792 +0x132
compress/gzip.(*Reader).Close(0x0?)
        /usr/local/go/src/compress/gzip/gunzip.go:290 +0xe
github.com/antchfx/htmlquery.LoadURL.func1()
        /home/pzak/work/htmlquery/query.go:109 +0x24
github.com/antchfx/htmlquery.LoadURL({0xc00011a120?, 0xc00012cb10?})
        /home/pzak/work/htmlquery/query.go:117 +0x5e2
github.com/antchfx/htmlquery.TestLoadURL_ErrorCases.func1(0xc00017a700)
        /home/pzak/work/htmlquery/query_test.go:232 +0xb7
testing.tRunner(0xc00017a700, 0xc00012cad0)
        /usr/local/go/src/testing/testing.go:1792 +0xf4
created by testing.(*T).Run in goroutine 53
        /usr/local/go/src/testing/testing.go:1851 +0x413
FAIL    github.com/antchfx/htmlquery    0.009s
FAIL

Expected Behavior

The function should handle the nil value gracefully and not panic.

Root Cause

The issue occurs when gzip.NewReader returns an error during the creation of the reader. In such cases, the reader variable remains nil, and the subsequent call to reader.Close() results in a nil pointer dereference. This behavior is related to Go's handling of interface values with a nil underlying value, as explained in the Go documentation: Interface values with nil underlying values.

Additionally, the zlib.NewReader function does not cause this issue because it returns an io.ReadCloser, which is an interface type.
On the other hand, gzip.NewReader returns a concrete type *Reader, which can lead to a nil pointer dereference if not properly checked.

This can be easily observed after putting fmt.Printf("(%v, %T)\n", reader, reader) at the beginning of defer function https://github.com/antchfx/htmlquery/blob/v1.3.4/query.go#L107-L111 to have:

	defer func() {
		fmt.Printf("(%v, %T)\n", reader, reader)
		if reader != nil {
			reader.Close()
		}
	}()

With this change, mentioned test produces following output:

...

=== RUN   TestLoadURL_ErrorCases
=== RUN   TestLoadURL_ErrorCases/ZlibReaderError
(<nil>, <nil>)
=== RUN   TestLoadURL_ErrorCases/EmptyContentEncodingNilBody
({}, http.noBody)
=== RUN   TestLoadURL_ErrorCases/GzipReaderError
(<nil>, *gzip.Reader)
--- FAIL: TestLoadURL_ErrorCases (0.00s)

...

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions