How to use Nokogiri to get the full HTML without any text content

Q:

I am trying to use Nokogiri to get the full HTML of a page, but with all of the text removed.

I tried this:

require 'nokogiri'
x = "<html>  <body>  <div class='example'><span>Hello</span></div></body></html>"
y = Nokogiri::HTML.parse(x).xpath("//*[not(text())]").each { |a| a.children.remove }
puts y.to_s

This outputs:

<div class="example"></div>

I also tried running it without the children.remove part:

y = Nokogiri::HTML.parse(x).xpath("//*[not(text())]")
puts y.to_s

But then I get:

<div class="example"><span>Hello</span></div>

But what I really want is:

<html><body><div class='example'><span></span></div></body></html>

Tags: Ruby, web scraping, XPath, HTML parsing, Nokogiri

A:

require 'nokogiri'
html = "<html>  <body>  <div class='example'><span>Hello</span></div></body></html>"

# Parse HTML
doc = Nokogiri::HTML.parse(html)

puts doc.inner_html
# => "<html>  <body>  <div class=\"example\"><span>Hello</span></div>\n</body>\n</html>"

# Remove text nodes from parsed document
doc.xpath("//text()").each { |t| t.remove }

puts doc.inner_html
# => "<html><body><div class=\"example\"><span></span></div></body></html>"
