Update dependency org.jsoup:jsoup to v1.22.1 #499
Reference: antville/antville#499
This PR contains the following updates:
1.19.1 -> 1.22.1

Release Notes
jhy/jsoup (org.jsoup:jsoup)

v1.22.1

Improvements
- […] re2j regular expression engine for regex-based CSS selectors (e.g. [attr~=regex], :matches(regex)), which ensures linear-time performance for regex evaluation. This allows safer handling of arbitrary user-supplied query regexes. To enable, add the com.google.re2j dependency to your classpath. (If you already have that dependency in your classpath, but you want to keep using the Java regex engine, you can disable re2j via System.setProperty("jsoup.useRe2j", "false").) You can confirm that the re2j engine has been enabled correctly by calling org.jsoup.helper.Regex.usingRe2j(). #2407
- […] Parser#unescape(String, boolean) that unescapes HTML entities using the parser's configuration (e.g. to support error tracking), complementing the existing static utility Parser.unescapeEntities(String, boolean). #2396
- […] org.jsoup.parser.Parser#setMaxDepth. #2421

Changes
- […]

Bug Fixes
- […] Elements of an Element were not correctly invalidated in Node#replaceWith(Node), which could lead to incorrect results when subsequently calling Element#children(). #2391
- […] [attr=" foo "]). Now matches align with the CSS specification and browser engines. #2380
- […] ProxySelector.getDefault()) was ignored. Now, the system proxy is used if a per-request proxy is not set. #2388, #2390
- ValidationException could be thrown in the adoption agency algorithm with particularly broken input. Now logged as a parse error. #2393
- IndexOutOfBoundsException could be thrown when parsing a body fragment with crafted input. Now logged as a parse error. #2397, #2406
- […] parent child selector) across many retained threads, their memoized results could also be retained, increasing memory use. These results are now cleared immediately after use, reducing overall memory consumption. #2411
- […] Parser now preserves any custom TagSet applied to the parser. #2422, #2423
- […] Tag.Void now parse and serialize like the built-in void elements: they no longer consume following content, and the XML serializer emits the expected self-closing form. #2425
- […] <br> element is once again classified as an inline tag (Tag.isBlock() == false), matching common developer expectations and its role as phrasing content in HTML, while pretty-printing and text extraction continue to treat it as a line break in the rendered output. #2387, #2439
- […] Jsoup.connect(url).get(). On responses without a charset header, the initial charset sniff could sometimes (depending on buffering / available() behavior) be mistaken for end-of-stream and a partial parse reused, dropping trailing content. #2448
- TagSet copies no longer mutate their template during lazy lookups, preventing cross-thread ConcurrentModificationException when parsing with shared sessions. #2453
- […] <svg> foreignObject content nested within a <p>, which could incorrectly move the HTML subtree outside the SVG. #2452

Internal Changes
- […] org.jsoup.internal.Functions (for removal in v1.23.1). This was previously used to support older Android API levels without full java.util.function coverage; jsoup now requires core library desugaring so this indirection is no longer necessary. #2412

v1.21.2

Changes
- […] Normalizer#normalize(String, bool) and Attribute#shouldCollapseAttribute(Document.OutputSettings). These will be removed in a future version.
- […] Connection#sslSocketFactory(SSLSocketFactory) in favor of the new Connection#sslContext(SSLContext). Using sslSocketFactory will force the use of the legacy HttpUrlConnection implementation, which does not support HTTP/2. #2370

Improvements
- […] Connection.Response#statusMessage() to return a simple loggable string message (e.g. "OK") when using the HttpClient implementation, which doesn't otherwise return any server-set status message. #2356
- Attributes#size() and Attributes#isEmpty() now exclude any internal attributes (such as user data) from their count. This aligns with the attributes' serialized output and iterator. #2369
- […] Connection#sslContext(SSLContext) to provide a custom SSL (TLS) context to requests, supporting both the HttpClient and the legacy HttpUrlConnection implementations. #2370
- […] element.child(0).remove(), and when using Parser#parseBodyFragment() to parse a large number of direct children. #2373.

Bug Fixes
- […] NodeTraversor, if a last child element was removed during the head() call, the parent would be visited twice. #2355.
- […] Attributes#size() and Attributes#isEmpty(). #2356
- […] Element#children() on the same element concurrently, a race condition could happen when the method was generating the internal child element cache (a filtered view of its child nodes). Since concurrent reads of DOM objects should be threadsafe without external synchronization, this method has been updated to execute atomically. #2366

v1.21.1

Changes
- […] :matchText pseudo-selector due to its side effects on the DOM; use the new ::textnode selector and the Element#selectNodes(String css, Class type) method instead. #2343
- […] Connection.Response#bufferUp() in lieu of Connection.Response#readFully() which can throw a checked IOException.
- […] Validate#ensureNotNull (replaced by typed Validate#expectNotNull); protected HTML appenders from Attribute and Node.

Improvements
- […] Selector to support direct matching against nodes such as comments and text nodes. For example, you can now find an element that follows a specific comment: ::comment:contains(prices) + p will select p elements immediately after a <!-- prices: --> comment. Supported types include ::node, ::leafnode, ::comment, ::text, ::data, and ::cdata. Node contextual selectors like ::node:contains(text), :matches(regex), and :blank are also supported. Introduced Element#selectNodes(String css) and Element#selectNodes(String css, Class nodeType) for direct node selection. #2324
- TagSet#onNewTag(Consumer<Tag> customizer): register a callback that’s invoked for each new or cloned Tag when it’s inserted into the set. Enables dynamic tweaks of tag options (for example, marking all custom tags as self-closing, or everything in a given namespace as preserving whitespace).
- […] TokenQueue and CharacterReader autocloseable, to ensure that they will release their buffers back to the buffer pool, for later reuse.
- […] Selector#evaluatorOf(String css), as a clearer way to obtain an Evaluator from a CSS query. An alias of QueryParser.parse(String css).
- […] TagSet) in a foreign namespace (e.g. SVG) can be configured to parse as data tags.
- […] NodeVisitor#traverse(Node) to simplify node traversal calls (vs. importing NodeTraversor).
- […] Connection#readFully() as a replacement for Connection#bufferUp() with an explicit IOException. Similarly, added Connection#readBody() over Connection#body(). Deprecated Connection#bufferUp(). #2327
- […] < and > characters are now escaped in attributes. This helps prevent a class of mutation XSS attacks. #2337
- […] Connection to prefer using the JDK's HttpClient over HttpUrlConnection, if available, to enable HTTP/2 support by default. Users can disable via -Djsoup.useHttpClient=false. #2340

Bug Fixes
- […] script in a svg foreign context should be parsed as script data, not text. #2320
- […] Tag#isFormSubmittable() was updating the Tag's options. #2323

v1.20.1

Changes
- […] <foo />) to close HTML elements by default. Foreign content (SVG, MathML), and content parsed with the XML parser, still support self-closing tags. If you need specific HTML tags to support self-closing, you can register a custom tag via the TagSet configured in Parser.tagSet(), using Tag#set(Tag.SelfClose). Standard void tags (such as <img>, <br>, etc.) continue to behave as usual and are not affected by this change. #2300.
- […] ChangeNotifyingArrayList, Document.updateMetaCharsetElement(), Document.updateMetaCharsetElement(boolean), HtmlTreeBuilder.isContentForTagData(String), Parser.isContentForTagData(String), Parser.setTreeBuilder(TreeBuilder), Tag.formatAsBlock(), Tag.isFormListed(), TokenQueue.addFirst(String), TokenQueue.chompTo(String), TokenQueue.chompToIgnoreCase(String), TokenQueue.consumeToIgnoreCase(String), TokenQueue.consumeWord(), TokenQueue.matchesAny(String...)

Functional Improvements
- […] Tags, and provide a cleaner path for ongoing improvements. The specific HTML produced by the pretty-printer may be different from previous versions. #2286.
- […] TagSet tag collection. Their properties can impact both the parse and how content is serialized (output as HTML or XML). #2285.
- Element.cssSelector() will prefer to return shorter selectors by using ancestor IDs when available and unique. E.g. #id > div > p instead of html > body > div > div > p #2283.
- […] Elements.deselect(int index), Elements.deselect(Object o), and Elements.deselectAll() methods to remove elements from the Elements list without removing them from the underlying DOM. Also added Elements.asList() method to get a modifiable list of elements without affecting the DOM. (Individual Elements remain linked to the DOM.) #2100.
- […] Connection.requestBodyStream(InputStream stream). #1122.
- […] Attributes. Also, added Tag#prefix(), Tag#localName(), Attribute#prefix(), Attribute#localName(), and Attribute#namespace() to retrieve these. #2299.
- […] Element#cssSelector() will emit appropriately escaped selectors, and the QueryParser supports those. Added Selector.escapeCssIdentifier() and Selector.unescapeCssIdentifier(). #2297, #2305

Structure and Performance Improvements
- […] QueryParser into a clearer recursive descent parser. #2310.
- […] div >> p) will throw an explicit parse exception. #2311.
- […] #2307.
- […] HTML. #2304.
- […] Parser instances threadsafe, so that inadvertent use of the same instance across threads will not lead to errors. For actual concurrency, use Parser#newInstance() per thread. #2314.
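
The last note above recommends one Parser#newInstance() per thread for real concurrency. A minimal sketch of that per-thread pattern using the JDK's ThreadLocal follows. StubParser is a hypothetical stand-in (not a jsoup class) so that the example runs with the JDK alone. With jsoup on the classpath, the factory would presumably be a call to newInstance() on an already configured parser, as the release note describes.

```java
import java.util.function.Supplier;

public class PerThreadInstances {

    /** Hypothetical stand-in for a configured parser; NOT a jsoup class. */
    public static class StubParser {
        /** Name of the thread that constructed this instance. */
        public final String ownerThread = Thread.currentThread().getName();

        /** Mirrors the idea of newInstance(): a fresh object with the same setup. */
        public StubParser newInstance() {
            return new StubParser();
        }
    }

    /** Lazily creates one instance per thread from the given factory. */
    public static <T> ThreadLocal<T> perThread(Supplier<T> factory) {
        return ThreadLocal.withInitial(factory);
    }

    public static void main(String[] args) throws InterruptedException {
        StubParser template = new StubParser();
        ThreadLocal<StubParser> parsers = perThread(template::newInstance);

        Runnable task = () -> {
            // The factory runs on first access in each thread, so every worker gets its own copy.
            StubParser mine = parsers.get();
            System.out.println(Thread.currentThread().getName()
                    + " uses an instance created on " + mine.ownerThread);
        };

        Thread a = new Thread(task, "worker-a");
        Thread b = new Thread(task, "worker-b");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Each worker sees an instance created on its own thread, and the shared template is never handed out directly.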
Bug Fixes
- […] serializing. #1496.
- […] encoded). #1743.
- […] Document to the W3C DOM in W3CDom, elements with an attribute in an undeclared namespace now get a declaration of xmlns:prefix="undefined". This allows subsequent serialization to XML via W3CDom.asString() to succeed. #2087.
- StreamParser could emit the final elements of a document twice, due to how onNodeCompleted was fired when closing out the stack. #2295.
- […] ? in <?xml version="1.0"?> would incorrectly emit an error. #2298.
- Element#cssSelector() on an element with combining characters in the class or ID now produces the correct output. #1984.

Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR has been generated by Renovate Bot.
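
The release notes above name two JVM system properties that switch new defaults off: jsoup.useRe2j (v1.22.1) and jsoup.useHttpClient (v1.21.1). As a hedged sketch for a cautious rollout of this upgrade, the helper below sets both to "false" programmatically. The class and method names are hypothetical, and it is an assumption that the properties need to be set before jsoup is first used in order to take effect.

```java
public class JsoupUpgradeFlags {

    // Property names are taken from the release notes; everything else here is illustrative.
    public static final String USE_RE2J = "jsoup.useRe2j";
    public static final String USE_HTTP_CLIENT = "jsoup.useHttpClient";

    /** Opts out of the re2j regex engine and the JDK HttpClient transport. */
    public static void keepPreviousDefaults() {
        System.setProperty(USE_RE2J, "false");
        System.setProperty(USE_HTTP_CLIENT, "false");
    }

    public static void main(String[] args) {
        // Assumption: call this early in startup, before any jsoup class is touched.
        keepPreviousDefaults();
        System.out.println(USE_RE2J + "=" + System.getProperty(USE_RE2J));
        System.out.println(USE_HTTP_CLIENT + "=" + System.getProperty(USE_HTTP_CLIENT));
    }
}
```

The same switches can be passed on the JVM command line as -Djsoup.useRe2j=false and -Djsoup.useHttpClient=false. According to the notes, the re2j switch only matters when the com.google.re2j dependency is on the classpath.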
Title changed from "Update dependency org.jsoup:jsoup to v1.20.1" to "Update dependency org.jsoup:jsoup to v1.21.1"
Branch updated from 6ee6b23548 to 77c99be343
Title changed from "Update dependency org.jsoup:jsoup to v1.21.1" to "Update dependency org.jsoup:jsoup to v1.21.2"
Title changed from "Update dependency org.jsoup:jsoup to v1.21.2" to "Update dependency org.jsoup:jsoup to v1.22.1"