JSoup: how to list links from a list?

Asked by: Thufir  Asked: 1/19/2019  Last edited by: Thufir  Updated: 1/19/2019  Views: 61

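Q:

How do I list the links, but only those inside the div tag? More specifically, only those in that particular list? Is there some way to restrict the selection to a specific div element?

Code: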

package my.books;

import java.io.File;
import java.net.URI;
import java.util.Properties;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class App {

    private static final Logger LOG = Logger.getLogger(App.class.getName());
    private Properties properties = new Properties();

    public static void main(String[] args) throws Exception {
        new App().basicJSoup();
    }

    private void basicJSoup() throws Exception {
        properties.loadFromXML(App.class.getResourceAsStream("/properties.xml"));
        LOG.fine(properties.toString());
        URI inputURI = new URI(properties.getProperty("html_input"));
        URI outputURI = new URI(properties.getProperty("output"));

        File input = new File(inputURI);
        Document doc = Jsoup.parse(input, "UTF-8");
        Element sideCategories = doc.select("div.side_categories").first();
        LOG.fine(sideCategories.outerHtml());

        Elements ul = doc.select("div.side_categories > ul");
        Elements li = ul.select("li");

        for (int i = 0; i < li.size(); i++) {
            LOG.info(li.get(i).text());
            LOG.info("i\t\t" + i);
        }
    }

}
html dom xhtml jsoup element

A:

1 upvote cody 1/19/2019 #1

If I understand your question correctly, you just need to write the full, specific CSS selector, such as div.side_categories ul li a

For example:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class JSoupTest {
    public static void main(String[] args) {
        String markup =
                "<div class=\"side_categories\">" +
                  "<ul>" +
                    "<li>" +
                      "<a href=\"#\">Link 1</a>" +
                    "</li>" +
                    "<li>" +
                      "<a href=\"#\">Link 2</a>" +
                    "</li>" +
                  "</ul>" +
                "</div>";

        Document doc = Jsoup.parse(markup);
        Elements links = doc.select("div.side_categories ul li a");

        for (Element link : links) {
            System.out.println(link);
        }
    }
}

Result:

<a href="#">Link 1</a>
<a href="#">Link 2</a>
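
The question also asks about restricting the selection to one specific element. A minimal sketch of that variant follows, assuming the same div.side_categories structure as above: it first picks the container element and then runs a second select() on that element only, so links elsewhere in the page are ignored. The class name ScopedLinks, the method listLinks, the extra div.other block, and the href values are hypothetical, made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScopedLinks {

    // Hypothetical helper: returns "text -> href" for every link found
    // inside the first div.side_categories of the given HTML.
    static List<String> listLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();

        // Pick the container first; first() returns null when nothing matches.
        Element sideCategories = doc.select("div.side_categories").first();
        if (sideCategories == null) {
            return result;
        }

        // This select() only searches inside the container element.
        for (Element link : sideCategories.select("ul li a[href]")) {
            result.add(link.text() + " -> " + link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up markup: the second div holds a link that should be skipped.
        String markup = "<div class=\"side_categories\"><ul>"
                + "<li><a href=\"/a\">Link 1</a></li>"
                + "<li><a href=\"/b\">Link 2</a></li>"
                + "</ul></div>"
                + "<div class=\"other\"><a href=\"/c\">Elsewhere</a></div>";

        for (String line : listLinks(markup)) {
            System.out.println(line);
        }
    }
}
```

With the made-up markup above, this should print only Link 1 and Link 2 with their href values, and skip the link in div.other. If absolute URLs are needed, link.absUrl("href") can be used instead of attr("href"), provided the document was parsed with a base URI.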

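Comments

0 upvotes Thufir 1/20/2019
Is jsoup more CSS-oriented than XPath? I think I saw "xtidy" or something like that, but it hasn't been updated in years.
2 upvotes cody 1/20/2019
@Thufir Yes, jsoup doesn't have any XPath query mechanism, only CSS selectors. You may be thinking of xsoup, a fork that adds XPath support, but you're right that it doesn't seem to be actively maintained.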