Headline
CVE-2021-37714: jsoup release 1.14.1 (2021-Jul-10)
jsoup is a Java library for working with HTML. Those using jsoup versions prior to 1.14.2 to parse untrusted HTML or XML may be vulnerable to DOS attacks. If the parser is run on user supplied input, an attacker may supply content that causes the parser to get stuck (loop indefinitely until cancelled), to complete more slowly than usual, or to throw an unexpected exception. This effect may support a denial of service attack. The issue is patched in version 1.14.2. There are a few available workarounds. Users may rate limit input parsing, limit the size of inputs based on system resources, and/or implement thread watchdogs to cap and timeout parse runtimes.
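Two of the workarounds above, capping input size and putting a watchdog timeout around the parse, can be sketched with only the JDK. This is a minimal illustration, not part of jsoup: the class name BoundedParser, its parameters, and the choice of exceptions are all assumptions, and the parser is passed in as a plain function so the sketch stays independent of any jsoup version.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

/** Hypothetical helper: caps input size and wall-clock time around any parser function. */
final class BoundedParser {
    private BoundedParser() {}

    /**
     * Runs parser on input, rejecting oversized input up front and giving up
     * on the parse if it has not finished within timeout.
     */
    static <T> T parse(String input, int maxChars, Duration timeout, Function<String, T> parser) {
        if (input.length() > maxChars) {
            throw new IllegalArgumentException("Input exceeds " + maxChars + " characters");
        }
        // Daemon thread, so a parse that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "bounded-parse");
            thread.setDaemon(true);
            return thread;
        });
        try {
            Callable<T> task = () -> parser.apply(input);
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            } catch (TimeoutException timedOut) {
                // cancel(true) only interrupts; a parser stuck in a tight loop may keep running.
                future.cancel(true);
                throw new IllegalStateException("Parse exceeded " + timeout, timedOut);
            } catch (InterruptedException interrupted) {
                Thread.currentThread().interrupt();
                future.cancel(true);
                throw new IllegalStateException("Interrupted while waiting for parse", interrupted);
            } catch (ExecutionException failed) {
                throw new IllegalStateException("Parser failed", failed.getCause());
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With jsoup on the classpath, a call might look like `BoundedParser.parse(html, 1_000_000, Duration.ofSeconds(5), Jsoup::parse)`; the limits shown are placeholders to tune against real traffic. The timeout frees the caller but cannot force a looping parser thread to stop, so upgrading to 1.14.2 remains the actual fix.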
jsoup Java HTML Parser release 1.14.1
2021-Jul-10
jsoup 1.14.1 is out now, with simple request session management, increased parse robustness, and a ton of other improvements, speed-ups, and bug fixes.
jsoup is a Java library for working with real-world HTML. It provides a very convenient API for extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors.
Please note the changes indicated below as in some circumstances you may need to modify your build or codebase to upgrade.
Download jsoup now.
Changes
- Change: updated the minimum supported Java version from Java 7 to Java 8.
- Change: updated the minimum Android API level from 8 to 10.
- Change: although Node.childNodes() returns an UnmodifiableList as a view into its children, it was still directly backed by the internal child list. That made some uses, such as looping and moving those children to another element, throw a ConcurrentModificationException. Now this method returns its own list so that they are separated and changes to the parent’s contents will not impact the children view. This aligns with similar methods such as Element.children(). If you have code that iterates this list and makes parenting changes to its contents, you may need to make a code update. #1431
- Change: the org.jsoup.Connection interface has been modified to introduce new methods for sessions and the cookie store. If you have a custom implementation of this interface, you will need to add implementations of these methods.
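The Node.childNodes() change above comes down to the difference between a read-only view and a separate copy. The following JDK-only sketch reproduces that difference with plain lists standing in for a node's internal child list; the class and method names are hypothetical and nothing here is jsoup code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.ConcurrentModificationException;
import java.util.List;

/** JDK-only illustration; the lists stand in for a node's internal child list. */
final class ChildListDemo {
    private ChildListDemo() {}

    /**
     * Moves children while iterating a read-only VIEW of the source list.
     * Returns false if the iteration failed with a ConcurrentModificationException.
     */
    static boolean moveViaLiveView(List<String> source, List<String> target) {
        List<String> view = Collections.unmodifiableList(source); // still backed by source
        try {
            for (String child : view) {
                source.remove(child); // mutates the list the view is iterating
                target.add(child);
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    /** Moves children while iterating an independent snapshot of the source list. */
    static void moveViaSnapshot(List<String> source, List<String> target) {
        List<String> snapshot = Collections.unmodifiableList(new ArrayList<>(source));
        for (String child : snapshot) {
            source.remove(child); // safe: the snapshot is not backed by source
            target.add(child);
        }
    }
}
```

Code that has to run against jsoup versions on both sides of this change can take its own copy before re-parenting, for example by wrapping the result of childNodes() in a new ArrayList.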
Improvements
Improvement: added HTTP request session management support with Jsoup.newSession(). This extends the Connection implementation to support (optional) sessions, which allow request defaults (timeout, proxy, etc) to be set once and then applied to all requests within that session.
Cookies are re-implemented to correctly support path and domain filtering when used within a session. A default in-memory cookie store is used for the session, or a custom implementation (perhaps disk-persistent, or pre-set) can be used instead.
Forms submitted using the FormElement.submit() use the same session that was used to fetch the document and so pass cookies and other defaults appropriately.
The session is multi-thread safe and can execute multiple requests concurrently. If the user accidentally tries to execute the same request object across multiple threads (vs calling Connection.newRequest()), that is detected cleanly and a clear exception is thrown (vs weird blowups in input stream reading, or forcing everything through a synchronized bottleneck). #1476
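One way to picture the "detected cleanly" behavior for a request object shared across threads is a compare-and-set guard around execution. The sketch below is an assumption about the general technique, not jsoup's actual implementation; GuardedRequest and its execute method are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical request wrapper that refuses overlapping executions. */
final class GuardedRequest {
    private final AtomicBoolean executing = new AtomicBoolean(false);

    /** Runs work unless this same object is already executing, in which case it fails fast. */
    <T> T execute(Supplier<T> work) {
        // Atomically claim the request; a second caller sees "true" and is rejected.
        if (!executing.compareAndSet(false, true)) {
            throw new IllegalStateException(
                "Request is already executing; create a new request for each concurrent call");
        }
        try {
            return work.get();
        } finally {
            executing.set(false); // release so the object can be reused sequentially
        }
    }
}
```

The guard costs a single atomic operation per call, which is how a design like this avoids funnelling every request through a synchronized block while still failing with a clear message.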
Improvement: renamed the Whitelist class to Safelist, with the goal of more inclusive language. A shim is provided for backwards compatibility (source and binary). This shim is marked as deprecated and will be removed in the jsoup 1.15.1 release. #1464
Improvement: added support for Internationalized Domain Names (IDNs) in Jsoup.connect. #1300
Improvement: added support for loading and parsing gzipped HTML files in Jsoup.parse(File in, charset, baseUri).
Improvement: reduced thread contention in HttpConnection and Document. #1455
Improvement: better parsing performance under high thread concurrency. #1402
Improvement: added Element.id(String) ID attribute setter.
Improvement: in Document, #body() and #head() accessors will now automatically create those elements, if they were missing (e.g. if the Document was not parsed from HTML). Additionally, the #body() method returns the frameset element (instead of null) for frameset documents.
Improvement: when cleaning a document, the output settings of the original document are cloned into the cleaned document. #1417
Improvement: when parsing XML, disable pretty-printing by default. #1168
Improvement: much better performance in Node.clone() for large and deeply nested documents. Complexity was O(n^2) or worse, now O(n).
Improvement: during traversal using the NodeTraversor, nodes may now be replaced with Node.replaceWith(Node). #1289
Improvement: added Element.appendChildren and Element.prependChildren, as convenience methods in addition to Element.insertChildren(index, children), for bulk moving nodes.
Improvement: clean up relative URLs with too many .. segments better. #1482
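For the relative URL item, the underlying problem is a path with more parent-directory (..) segments than there are directories to climb. A minimal sketch of collapsing such a path follows, in the spirit of the dot-segment removal described in RFC 3986 section 5.2.4. It is not jsoup's code, and the DotSegments class is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper that collapses "." and ".." segments in an absolute URL path. */
final class DotSegments {
    private DotSegments() {}

    /** Surplus ".." segments that would climb above the root are simply dropped. */
    static String normalize(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("Expected an absolute path: " + path);
        }
        Deque<String> kept = new ArrayDeque<>();
        String[] segments = path.split("/", -1);
        // segments[0] is the empty string in front of the leading slash, so start at 1.
        for (int i = 1; i < segments.length; i++) {
            String segment = segments[i];
            boolean last = (i == segments.length - 1);
            if (segment.equals("..")) {
                kept.pollLast();              // no-op when already at the root
                if (last) kept.addLast("");   // keep the trailing slash
            } else if (segment.equals(".")) {
                if (last) kept.addLast("");
            } else {
                kept.addLast(segment);
            }
        }
        return "/" + String.join("/", kept);
    }
}
```

Under these rules a path such as /a/../../b collapses to /b rather than keeping a stray /../ at the front.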
Build Improvements
- Build Improvement: integrated jsoup into the OSS Fuzz project, which semi-randomly generates millions of different HTML and XML input files, searching for areas to improve in the parser for increased robustness and throughput. #1502
- Build Improvement: integrated with GitHub’s CodeQL static code analyzer. #1494
- Build Improvement: moved to GitHub Workflows for build verification.
- Build Improvement: updated Jetty (used for integration tests; not bundled) to 9.4.42.
- Build Improvement: added nullability annotations and initial settings. #1467
Bug Fixes
- Bugfix: corrected the adoption agency algorithm, to handle cases where e.g. an a tag incorrectly nests further a tags. #1517 #845
- Bugfix: when parsing HTML, could throw NPEs on some tags (isindex or table>input). #1404
- Bugfix: in HttpConnection.Request, headers beginning with "sec-" (e.g. Sec-Fetch-Mode) were silently discarded by the underlying Java HttpURLConnection. These are now settable correctly. #1461
- Bugfix: when adding child Nodes to a Node, could incorrectly reparent all nodes if the first parent had the same length of children as the incoming node list.
- Bugfix: when wrapping an orphaned element, would throw an NPE.
- Bugfix: when wrapping an element with HTML that included multiple sibling elements, those siblings were incorrectly added as children of the wrapper instead of siblings.
- Bugfix: when setting the content of a script or style tag via the Element#html(String) method, the content is now treated as a DataNode, not a TextNode. This means that characters like ‘<’ will no longer be incorrectly escaped. As a related ergonomic improvement, the same behavior applies for Element#text(String) (i.e. the content will be treated as a DataNode, despite calling the text() method). #1419
- Bugfix: when wrapping HTML around an existing element with Element#wrap(String), will now take the content as provided and ignore normal HTML tree-building rules. This allows for e.g. a div tag to be placed inside of p tags.
- Bugfix: the Elements#forms() method should return the selected immediate elements that are Forms, not children. #1403
- Bugfix: when creating a selector for an element with Element#cssSelector, if the element used a non-unique ID attribute, the returned selector may not match the desired element. #1085
- Bugfix: corrected the toString() methods of the Evaluator classes.
- Bugfix: when converting a jsoup document to a W3C document (in W3CDom.convert()), if a tag had XML illegal characters, a DOMException would be thrown. Now instead, that tag is represented as a text node. #1093
- Bugfix: if an HTML file ended with an open noscript tag, an “EOF” string would appear in the HTML output.
- Bugfix: when parsing a document as XML, automatically set the output syntax to XML, and ensure that < characters in attributes are escaped as &lt; (which is not required in HTML as the quoted attribute contents are safe, but is required in XML). #1420
- Bugfix: [Fuzz] when parsing an attribute key containing abs:abs, a validation error would be incorrectly thrown. #1541
- Bugfix: [Fuzz] could NPE while parsing in resetInsertionMode. #1538
- Bugfix: [Fuzz] when parsing XML, could Stack Overflow when parsing XML declarations. #1539
- Bugfix: [Fuzz] fixed a potential Stack Overflow when parsing mis-nested tfoot tags, and updated the tree parser for this situation to match the updated HTML5 spec. #1543
- Bugfix: [Fuzz] fixed a potentially slow HTML parse when tags are nested extremely deep (e.g. 88K depth), by limiting the formatting tag search depth to 256. In practice, it’s generally between 4 - 8. #1544
- Bugfix: [Fuzz] when parsing an unterminated RCDATA token (e.g. a title tag), could throw an IO Exception “No buffer left to unconsume” when trying to rewind the buffer. #1542
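The depth limit mentioned in the #1544 fix above is an instance of a general defensive technique: bound how far a search walks down the stack of open elements, so that pathologically deep nesting cannot make every lookup proportional to the full depth. The sketch below shows only that idea. The class, method, and list-of-tag-names representation are assumptions, and only the 256 figure comes from the release notes.

```java
import java.util.List;

/** Hypothetical illustration of a depth-capped search over a stack of open tag names. */
final class BoundedStackSearch {
    /** Search limit quoted in the release notes. */
    static final int MAX_SEARCH_DEPTH = 256;

    private BoundedStackSearch() {}

    /**
     * Looks for tagName starting at the top of the stack (the end of the list) and
     * examines at most maxDepth entries. Returns the index found, or -1.
     */
    static int lastIndexWithin(List<String> openElements, String tagName, int maxDepth) {
        int top = openElements.size() - 1;
        int floor = Math.max(0, openElements.size() - maxDepth);
        for (int i = top; i >= floor; i--) {
            if (openElements.get(i).equals(tagName)) {
                return i;
            }
        }
        return -1; // not found within the permitted depth
    }
}
```

The trade-off is that a match sitting deeper than the limit is treated as absent, which is acceptable when realistic documents stay far below it, as the 4 to 8 figure in the note suggests.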
My sincere thanks to everyone who contributed patches, suggestions, and bug reports. If you have any suggestions for the next release, I would love to hear them; please get in touch with me directly.
You can also follow me (@jhy) on Twitter to receive occasional notes about jsoup releases.
Related news
Vulnerability in the Sun ZFS Storage Appliance product of Oracle Systems (component: Core). The supported version that is affected is 8.8.60. Difficult to exploit vulnerability allows unauthenticated attacker with network access via HTTP to compromise Sun ZFS Storage Appliance. Successful attacks of this vulnerability can result in unauthorized ability to cause a hang or frequently repeatable crash (complete DOS) of Sun ZFS Storage Appliance. CVSS 3.1 Base Score 5.9 (Availability impacts). CVSS Vector: (CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H).
Dell Streaming Data Platform prior to 1.4 contains an Open Redirect vulnerability. An attacker with the same privileges as a legitimate user can phish that user into being redirected to a malicious website, leading to information disclosure and the launch of phishing attacks.
Red Hat Security Advisory 2022-6407-01 - A minor version update is now available for Red Hat Camel K that includes CVE fixes in the base images, which are documented in the Release Notes document linked in the References section. Issues addressed include denial of service, information leakage, integer overflow, and resource exhaustion vulnerabilities.
A minor version update is now available for Red Hat Integration Camel K. The purpose of this text-only errata is to inform you about the security issues fixed in this release. Red Hat Product Security has rated this update as having a security impact of Moderate. A Common Vulnerability Scoring System (CVSS) base score, which gives a detailed severity rating, is available for each vulnerability from the CVE link(s) in the References section. This content is licensed under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). If you distribute this content, or a modified version of it, you must provide attribution to Red Hat Inc. and provide a link to the original. Related CVEs: * CVE-2020-9492: hadoop: WebHDFS client might send SPNEGO authorization header * CVE-2020-27223: jetty: request containing multiple Accept headers with a large number of "quality" parameters may lead to DoS * CVE-2020-36518: jackson-databind: denial of service ...
Red Hat Security Advisory 2022-5903-01 - Red Hat Process Automation Manager is an open source business process management suite that combines process management and decision service management and enables business and IT users to create, manage, validate, and deploy process applications and decision services. This asynchronous security patch is an update to Red Hat Process Automation Manager 7. Issues addressed include HTTP request smuggling, denial of service, and deserialization vulnerabilities.
An update is now available for Red Hat Process Automation Manager. Red Hat Product Security has rated this update as having a security impact of Low. A Common Vulnerability Scoring System (CVSS) base score, which gives a detailed severity rating, is available for each vulnerability from the CVE link(s) in the References section. This content is licensed under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). If you distribute this content, or a modified version of it, you must provide attribution to Red Hat Inc. and provide a link to the original. Related CVEs: * CVE-2021-2471: mysql-connector-java: unauthorized access to critical * CVE-2021-3642: wildfly-elytron: possible timing attack in ScramServer * CVE-2021-3644: wildfly-core: Invalid Sensitivity Classification of Vault Expression * CVE-2021-3717: wildfly: incorrect JBOSS_LOCAL_USER challenge location may lead to giving access to all the local users * CVE-2021-22569: protobu...
Vulnerability in the Oracle Banking Trade Finance product of Oracle Financial Services Applications (component: Infrastructure). The supported version that is affected is 14.5. Difficult to exploit vulnerability allows low privileged attacker with network access via HTTP to compromise Oracle Banking Trade Finance. Successful attacks require human interaction from a person other than the attacker. Successful attacks of this vulnerability can result in unauthorized creation, deletion or modification access to critical data or all Oracle Banking Trade Finance accessible data as well as unauthorized access to critical data or complete access to all Oracle Banking Trade Finance accessible data. CVSS 3.1 Base Score 6.4 (Confidentiality and Integrity impacts). CVSS Vector: (CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:H/I:H/A:N).
Red Hat Integration Camel Extensions for Quarkus 2.7 is now available. The purpose of this text-only errata is to inform you about the security issues fixed. Red Hat Product Security has rated this update as having an impact of Moderate. A Common Vulnerability Scoring System (CVSS) base score, which gives a detailed severity rating, is available for each vulnerability from the CVE link(s) in the References section. This content is licensed under the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/). If you distribute this content, or a modified version of it, you must provide attribution to Red Hat Inc. and provide a link to the original. Related CVEs: * CVE-2020-9492: hadoop: WebHDFS client might send SPNEGO authorization header * CVE-2021-3520: lz4: memory corruption due to an integer overflow bug caused by memmove argument * CVE-2021-22132: elasticsearch: executing async search improperly stores HTTP headers leading to information ...
IBM Cognos Analytics 11.1.7, 11.2.0, and 11.1.7 is vulnerable to cross-site scripting. This vulnerability allows users to embed arbitrary JavaScript code in the Web UI thus altering the intended functionality potentially leading to credentials disclosure within a trusted session. IBM X-Force ID: 211240.
Vulnerability in the Oracle Java SE, Oracle GraalVM Enterprise Edition product of Oracle Java SE (component: JNDI). Supported versions that are affected are Oracle Java SE: 7u331, 8u321, 11.0.14, 17.0.2, 18; Oracle GraalVM Enterprise Edition: 20.3.5, 21.3.1 and 22.0.0.2. Easily exploitable vulnerability allows unauthenticated attacker with network access via multiple protocols to compromise Oracle Java SE, Oracle GraalVM Enterprise Edition. Successful attacks of this vulnerability can result in unauthorized update, insert or delete access to some of Oracle Java SE, Oracle GraalVM Enterprise Edition accessible data. Note: This vulnerability applies to Java deployments, typically in clients running sandboxed Java Web Start applications or sandboxed Java applets, that load and run untrusted code (e.g., code that comes from the internet) and rely on the Java sandbox for security. This vulnerability can also be exploited by using APIs in the specified Component, e.g., through a web service ...
Vulnerability in the MySQL Connectors product of Oracle MySQL (component: Connector/J). Supported versions that are affected are 8.0.27 and prior. Difficult to exploit vulnerability allows high privileged attacker with network access via multiple protocols to compromise MySQL Connectors. Successful attacks of this vulnerability can result in takeover of MySQL Connectors. CVSS 3.1 Base Score 6.6 (Confidentiality, Integrity and Availability impacts). CVSS Vector: (CVSS:3.1/AV:N/AC:H/PR:H/UI:N/S:U/C:H/I:H/A:H).