Headline
CVE-2022-39353: Missing error for XML documents with multiple root element nodes · Issue #150 · jindw/xmldom
xmldom is a pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser
and XMLSerializer
module. xmldom parses XML that is not well-formed because it contains multiple top level elements, and adds all root nodes to the childNodes
collection of the Document
, without reporting any error or throwing. This breaks the assumption that there is only a single root node in the tree, which led to issuance of CVE-2022-39299 as it is a potential issue for dependents. Update to @xmldom/xmldom@~0.7.7, @xmldom/xmldom@~0.8.4 (dist-tag latest) or @xmldom/xmldom@>=0.9.0-beta.4 (dist-tag next). As a workaround, please one of the following approaches depending on your use case: instead of searching for elements in the whole DOM, only search in the documentElement
or reject a document with a document that has more then 1 childNode
.
Bug Description:
xmldom allows incoming documents to have multiple root element nodes. This appears to me to be a violation of the W3C DOM Level 2 Core Specification:
[…] Each document contains zero or one doctype nodes, one root element node, and zero or more comments or processing instructions; the root element serves as the root of the element tree for the document. […]
— from “What the Document Object Model is” (emphasis mine)
The root node is the unique node that is not a child of any other node. All other nodes are children or other descendants of the root node.
— from the Glossary (emphasis mine)
However, the spec also says this:
[…] However, the DOM does not specify that documents must be implemented as a tree or a grove, nor does it specify how the relationships among objects be implemented. The DOM is a logical model that may be implemented in any convenient manner. […]
— from “What the Document Object Model is” (emphasis mine)
The children of a DocumentFragment node are zero or more nodes representing the tops of any sub-trees defining the structure of the document. DocumentFragment nodes do not need to be well-formed XML documents (although they do need to follow the rules imposed upon well-formed XML parsed entities, which can have multiple top nodes). For example, a DocumentFragment might have only one child and that child node could be a Text node. Such a structure model represents neither an HTML document nor a well-formed XML document.
— from § 1.2 / Fundamental Interfaces / Interface DocumentFragment (emphasis mine)
In light of this, for the record, I actually don’t dislike the fact that xmldom can parse such documents. This comes with a few reservations (outlined below) because it seems like the current behavior is contrary to what the specs (both W3C DOM Level 2 Core and XML 1.1) assert ought to be the case.
Bug Reproduction:
The following code:
var DOMParser = require(‘xmldom’).DOMParser; var xmlData = ‘<?xml version="1.0" encoding="UTF-8"?>\n’ + ‘<root>\n’ + ' <branch girth="large">\n’ + ' <leaf color="green" />\n’ + ' </branch>\n’ + ‘</root>\n’ + ‘<root>\n’ + ' <branch girth="twig">\n’ + ' <leaf color="gold" />\n’ + ' </branch>\n’ + '</root>\n’; var xmlDOM = new DOMParser().parseFromString(xmlData); console.log(xmlDOM.toString());
…produces the following output:
<?xml version="1.0" encoding="UTF-8"?><root>
<branch girth="large">
<leaf color="green"/>
</branch>
</root>
<root>
<branch girth="twig">
<leaf color="gold"/>
</branch>
</root>
In contrast to this, libxmljs — which relies on libxml2 — refuses to parse such documents, opting to throw an error instead:
var xml = require(‘libxmljs’); var xmlDoc = xml.parseXmlString(xmlData); Error: Extra content at the end of the document
Firefox behaves in a similar way, and refuses to parse the document:
console.log( new XMLSerializer().serializeToString( new DOMParser().parseFromString(xmlData, ‘text/xml’)));
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="chrome://global/locale/intl.css" type="text/css"?>
<parsererror xmlns="http://www.mozilla.org/newlayout/xml/parsererror.xml">XML Parsing Error: junk after document element
Location: about:blank
Line Number 7, Column 1:<sourcetext><root>
^</sourcetext></parsererror>
Chrome goes a little bit farther in that it is at least willing to parse/render the first root element node:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<parsererror xmlns="http://www.w3.org/1999/xhtml" style="display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black">
<h3>This page contains the following errors:</h3>
<div style="font-family:monospace;font-size:12px">error on line 7 at column 1: Extra content at the end of the document
</div>
<h3>Below is a rendering of the page up to the first error.</h3>
</parsererror>
<branch girth="large">
<leaf color="green"/>
</branch>
</root>
Expectations/Recommendations/Discussion:
- I expected the above document to fail to parse in xmldom’s DOMParser implementation, but it didn’t. In light of the specs, this seems like it might be a “false-positive” bug. However, I think there is some utility in being able to parse multiple documents in a single stream, if the API can be re-tooled just a bit. More on this below.
- In order to be more compliant with the specs, I think that if more than one root node is present in an XML stream — especially if they appear without corresponding <?xml?> declarations — then the DOMParser should, bare-minimum, simply opt to not support such streams, and should raise an error informing the consumer that the given XML stream is invalid.
- Alternatively, if it is desired to keep the ability to parse multiple documents in a single stream — which I think would be useful functionality to retain: the alternative of requiring users to manually split XML documents crammed into a single stream before they can pass them into xmldom seems counter-intuitive to me, as that should be the job of the parser — then perhaps the DOMParser API could be re-tooled to return an array or a list of multiple document objects, each one with a singular root element node (rather than a single document object with multiple root elements, as is the case today), but only in the event that multiple root elements are detected within the input stream. I think that this behavior would help xmldom to be more compliant with the specs, and it would help to prevent consumers from accidentally generating poorly formed XML output from their parsed document objects, since each parsed document object would have only one root node, as the specs indicate ought to be the case.
Conclusion:
On one hand, I think it is a useful thing to be able to parse streams containing multiple XML documents.
On the other hand, to do that silently without issuing even so much as a warning to consumers — especially when the specs say that root nodes should be unique — seems, well, odd. It seems like the current behavior is more of an artifact or a side-effect of xmldom’s current architecture than it is an intentional aspect of its design, hence my writing this up as a bug rather than a feature request. (If I’m mistaken, do let me know!)
Environment/Versions:
- xmldom v0.1.21
Related news
Ubuntu Security Notice 6102-1 - It was discovered that xmldom incorrectly handled certain inputs. If a user or an automated system were tricked into opening a specially crafted input file, a remote attacker could possibly use this issue to cause unexpected syntactic changes during XML processing. This issue only affected Ubuntu 20.04 LTS. It was discovered that xmldom incorrectly handled certain inputs. If a user or an automated system were tricked into opening a specially crafted input file, a remote attacker could possibly use this issue to cause a denial of service.
Rapid remedy follows reawakening of long-dormant bug threat
### Impact xmldom parses XML that is not well-formed because it contains multiple top level elements, and adds all root nodes to the `childNodes` collection of the `Document`, without reporting any error or throwing. This breaks the assumption that there is only a single root node in the tree, which led to https://nvd.nist.gov/vuln/detail/CVE-2022-39299 and is a potential issue for dependents. ### Patches Update to `@xmldom/xmldom@~0.7.7`, `@xmldom/xmldom@~0.8.4` (dist-tag `latest`) or `@xmldom/xmldom@>=0.9.0-beta.4` (dist-tag `next`). ### Workarounds One of the following approaches might help, depending on your use case: - Instead of searching for elements in the whole DOM, only search in the `documentElement`. - Reject a document with a document that has more then 1 `childNode`. ### References - https://nvd.nist.gov/vuln/detail/CVE-2022-39299 - https://github.com/jindw/xmldom/issues/150 ### For more information If you have any questions or comments about this advisory: * Email us...