Skip to content

fix(deps): update dependency org.jsoup:jsoup to v1.22.2#2978

Merged
bjagg merged 1 commit into
masterfrom
renovate/jsoupversion
May 4, 2026
Merged

fix(deps): update dependency org.jsoup:jsoup to v1.22.2#2978
bjagg merged 1 commit into
masterfrom
renovate/jsoupversion

Conversation

@renovate

@renovate renovate Bot commented May 4, 2026

Copy link
Copy Markdown
Contributor

This PR contains the following updates:

Package Change Age Confidence
org.jsoup:jsoup (source) 1.20.11.22.2 age confidence

Warning

Some dependencies could not be looked up. Check the Dependency Dashboard for more information.


Release Notes

jhy/jsoup (org.jsoup:jsoup)

v1.22.2

Improvements
  • Expanded and clarified NodeTraversor support for in-place DOM rewrites during NodeVisitor.head(). Current-node edits such as remove, replace, and unwrap now recover more predictably, while traversal stays within the original root subtree. This makes single-pass tree cleanup and normalization visitors easier to write, for example when unwrapping presentational elements or replacing text nodes as you walk the DOM. #​2472
  • Documentation: clarified that a configured Cleaner may be reused across concurrent threads, and that shared Safelist instances should not be mutated while in use. #​2473
  • Updated the default HTML TagSet for current HTML elements: added dialog, search, picture, and slot; made ins, del, button, audio, video, and canvas inline by default (Tag#isInline(), aligned to phrasing content in the spec); and added readable Element.text() boundaries for controls and embedded objects via the new Tag.TextBoundary option. This improves pretty-printing and keeps normalized text from running adjacent words together. #​2493
Bug Fixes
  • Android (R8/ProGuard): added a rule to ignore the optional re2j dependency when not present. #​2459
  • Fixed a NodeTraversor regression in 1.21.2 where removing or replacing the current node during head() could revisit the replacement node and loop indefinitely. The traversal docs now also clarify which inserted nodes are visited in the current pass. #​2472
  • Parsing during charset sniffing no longer fails if an advisory available() call throws IOException, as seen on JDK 8 HttpURLConnection. #​2474
  • Cleaner no longer makes relative URL attributes in the input document absolute when cleaning or validating a Document. URL normalization now applies only to the cleaned output, and Safelist.isSafeAttribute() is side effect free. #​2475
  • Cleaner no longer duplicates enforced attributes when the input Document preserves attribute case. A case-variant source attribute is now replaced by the enforced attribute in the cleaned output. #​2476
  • If a per-request SOCKS proxy is configured, jsoup now avoids using the JDK HttpClient, because the JDK would silently ignore that proxy and attempt to connect directly. Those requests now fall back to the legacy HttpURLConnection transport instead, which does support SOCKS. #​2468
  • Connection.Response.streamParser() and DataUtil.streamParser(Path, ...) could fail on small inputs without a declared charset, if the initial 5 KB charset sniff fully consumed the input and closed it before the stream parse began. #​2483
  • In XML mode, doctypes with an internal subset, such as <!DOCTYPE root [<!ENTITY name "value">]>, now round-trip correctly. The subset is preserved as raw text only; entities are not expanded and external DTDs are not loaded. #​2486
Build Changes
  • Migrated the integration test server from Jetty to Netty, which actively maintains support for our minimum JDK target (8). #​2491

v1.22.1

Improvements
  • Added support for using the re2j regular expression engine for regex-based CSS selectors (e.g. [attr~=regex], :matches(regex)), which ensures linear-time performance for regex evaluation. This allows safer handling of arbitrary user-supplied query regexes. To enable, add the com.google.re2j dependency to your classpath, e.g.:
  <dependency>
    <groupId>com.google.re2j</groupId>
    <artifactId>re2j</artifactId>
    <version>1.8</version>
  </dependency>

(If you already have that dependency in your classpath, but you want to keep using the Java regex engine, you can disable re2j via System.setProperty("jsoup.useRe2j", "false").) You can confirm that the re2j engine has been enabled correctly by calling org.jsoup.helper.Regex.usingRe2j(). #​2407

  • Added an instance method Parser#unescape(String, boolean) that unescapes HTML entities using the parser's configuration (e.g. to support error tracking), complementing the existing static utility Parser.unescapeEntities(String, boolean). #​2396
  • Added a configurable maximum parser depth (to limit the number of open elements on stack) to both HTML and XML parsers. The HTML parser now defaults to a depth of 512 to match browser behavior, and protect against unbounded stack growth, while the XML parser keeps unlimited depth by default, but can opt into a limit via org.jsoup.parser.Parser#setMaxDepth. #​2421
  • Build: added CI coverage for JDK 25 #​2403
  • Build: added a CI fuzzer for contextual fragment parsing (in addition to existing full body HTML and XML fuzzers). oss-fuzz #​14041
Changes
  • Set a removal schedule of jsoup 1.24.1 for previously deprecated APIs.
Bug Fixes
  • Previously cached child Elements of an Element were not correctly invalidated in Node#replaceWith(Node), which could lead to incorrect results when subsequently calling Element#children(). #​2391
  • Attribute selector values are now compared literally without trimming. Previously, jsoup trimmed whitespace from selector values and from element attribute values, which could cause mismatches with browser behavior (e.g. [attr=" foo "]). Now matches align with the CSS specification and browser engines. #​2380
  • When using the JDK HttpClient, any system default proxy (ProxySelector.getDefault()) was ignored. Now, the system proxy is used if a per-request proxy is not set. #​2388, #​2390
  • A ValidationException could be thrown in the adoption agency algorithm with particularly broken input. Now logged as a parse error. #​2393
  • Null characters in the HTML body were not consistently removed; and in foreign content were not correctly replaced. #​2395
  • An IndexOutOfBoundsException could be thrown when parsing a body fragment with crafted input. Now logged as a parse error. #​2397, #​2406
  • When using StructuralEvaluators (e.g., a parent child selector) across many retained threads, their memoized results could also be retained, increasing memory use. These results are now cleared immediately after use, reducing overall memory consumption. #​2411
  • Cloning a Parser now preserves any custom TagSet applied to the parser. #​2422, #​2423
  • Custom tags marked as Tag.Void now parse and serialize like the built-in void elements: they no longer consume following content, and the XML serializer emits the expected self-closing form. #​2425
  • The <br> element is once again classified as an inline tag (Tag.isBlock() == false), matching common developer expectations and its role as phrasing content in HTML, while pretty-printing and text extraction continue to treat it as a line break in the rendered output. #​2387, #​2439
  • Fixed an intermittent truncation issue when fetching and parsing remote documents via Jsoup.connect(url).get(). On responses without a charset header, the initial charset sniff could sometimes (depending on buffering / available() behavior) be mistaken for end-of-stream and a partial parse reused, dropping trailing content. #​2448
  • TagSet copies no longer mutate their template during lazy lookups, preventing cross-thread ConcurrentModificationException when parsing with shared sessions. #​2453
  • Fixed parsing of <svg> foreignObject content nested within a <p>, which could incorrectly move the HTML subtree outside the SVG. #​2452
Internal Changes
  • Deprecated internal helper org.jsoup.internal.Functions (for removal in v1.23.1). This was previously used to support older Android API levels without full java.util.function coverage; jsoup now requires core library desugaring so this indirection is no longer necessary. #​2412

v1.21.2

Changes
  • Deprecated internal (yet visible) methods Normalizer#normalize(String, bool) and Attribute#shouldCollapseAttribute(Document.OutputSettings). These will be removed in a future version.
  • Deprecated Connection#sslSocketFactory(SSLSocketFactory) in favor of the new Connection#sslContext(SSLContext). Using sslSocketFactory will force the use of the legacy HttpUrlConnection implementation, which does not support HTTP/2. #​2370
Improvements
  • When pretty-printing, if there are consecutive text nodes (via DOM manipulation), the non-significant whitespace between them will be collapsed. #​2349.
  • Updated Connection.Response#statusMessage() to return a simple loggable string message (e.g. "OK") when using the HttpClient implementation, which doesn't otherwise return any server-set status message. #​2356
  • Attributes#size() and Attributes#isEmpty() now exclude any internal attributes (such as user data) from their count. This aligns with the attributes' serialized output and iterator. #​2369
  • Added Connection#sslContext(SSLContext) to provide a custom SSL (TLS) context to requests, supporting both the HttpClient and the legacy HttUrlConnection implementations. #​2370
  • Performance optimizations for DOM manipulation methods including when repeatedly removing an element's first child (element.child(0).remove(), and when using Parser#parseBodyFragement() to parse a large number of direct children. #​2373.
Bug Fixes
  • When parsing from an InputStream and a multibyte character happened to straddle a buffer boundary, the stream would not be completely read. #​2353.
  • In NodeTraversor, if a last child element was removed during the head() call, the parent would be visited twice. #​2355.
  • Cloning an Element that has an Attributes object would add an empty internal user-data attribute to that clone, which would cause unexpected results for Attributes#size() and Attributes#isEmpty(). #​2356
  • In a multithreaded application where multiple threads are calling Element#children() on the same element concurrently, a race condition could happen when the method was generating the internal child element cache (a filtered view of its child nodes). Since concurrent reads of DOM objects should be threadsafe without external synchronization, this method has been updated to execute atomically. #​2366
  • When parsing HTML with svg:script elements in SVG elements, don't enter the Text insertion mode, but continue to parse as foreign content. Otherwise, misnested HTML could then cause an IndexOutOfBoundsException. #​2374
  • Malformed HTML could throw an IndexOutOfBoundsException during the adoption agency. #​2377.

v1.21.1

Changes
  • Removed previously deprecated methods. #​2317
  • Deprecated the :matchText pseduo-selector due to its side effects on the DOM; use the new ::textnode selector and the Element#selectNodes(String css, Class type) method instead. #​2343
  • Deprecated Connection.Response#bufferUp() in lieu of Connection.Response#readFully() which can throw a checked IOException.
  • Deprecated internal methods Validate#ensureNotNull (replaced by typed Validate#expectNotNull); protected HTML appenders from Attribute and Node.
  • If you happen to be using any of the deprecated methods, please take the opportunity now to migrate away from them, as they will be removed in a future release.
Improvements
  • Enhanced the Selector to support direct matching against nodes such as comments and text nodes. For example, you can now find an element that follows a specific comment: ::comment:contains(prices) + p will select p elements immediately after a <!-- prices: --> comment. Supported types include ::node, ::leafnode, ::comment, ::text, ::data, and ::cdata. Node contextual selectors like ::node:contains(text), :matches(regex), and :blank are also supported. Introduced Element#selectNodes(String css) and Element#selectNodes(String css, Class nodeType) for direct node selection. #​2324
  • Added TagSet#onNewTag(Consumer<Tag> customizer): register a callback that’s invoked for each new or cloned Tag when it’s inserted into the set. Enables dynamic tweaks of tag options (for example, marking all custom tags as self-closing, or everything in a given namespace as preserving whitespace).
  • Made TokenQueue and CharacterReader autocloseable, to ensure that they will release their buffers back to the buffer pool, for later reuse.
  • Added Selector#evaluatorOf(String css), as a clearer way to obtain an Evaluator from a CSS query. An alias of QueryParser.parse(String css).
  • Custom tags (defined via the TagSet) in a foreign namespace (e.g. SVG) can be configured to parse as data tags.
  • Added NodeVisitor#traverse(Node) to simplify node traversal calls (vs. importing NodeTraversor).
  • Updated the default user-agent string to improve compatibility. #​2341
  • The HTML parser now allows the specific text-data type (Data, RcData) to be customized for known tags. (Previously, that was only supported on custom tags.) #​2326.
  • Added Connection#readFully() as a replacement for Connection#bufferUp() with an explicit IOException. Similarly, added Connection#readBody() over Connection#body(). Deprecated Connection#bufferUp(). #​2327
  • When serializing HTML, the < and > characters are now escaped in attributes. This helps prevent a class of mutation XSS attacks. #​2337
  • Changed Connection to prefer using the JDK's HttpClient over HttpUrlConnection, if available, to enable HTTP/2 support by default. Users can disable via -Djsoup.useHttpClient=false. #​2340
Bug Fixes
  • The contents of a script in a svg foreign context should be parsed as script data, not text. #​2320
  • Tag#isFormSubmittable() was updating the Tag's options. #​2323
  • The HTML pretty-printer would incorrectly trim whitespace when text followed an inline element in a block element. #​2325
  • Custom tags with hyphens or other non-letter characters in their names now work correctly as Data or RcData tags. Their closing tags are now tokenized properly. #​2332
  • When cloning an Element, the clone would retain the source's cached child Element list (if any), which could lead to incorrect results when modifying the clone's child elements. #​2334

Configuration

📅 Schedule: (UTC)

  • Branch creation
    • At any time (no schedule defined)
  • Automerge
    • At any time (no schedule defined)

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@bjagg bjagg merged commit d2cfef5 into master May 4, 2026
9 checks passed
@renovate renovate Bot deleted the renovate/jsoupversion branch May 4, 2026 17:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant