Asked by: snoozy Asked: 2/9/2023 Last edited by: snoozy Updated: 2/9/2023 Views: 34
Joining text when parsing a complex table structure with Ruby and Nokogiri
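Q:
I have an HTML table and want to extract the text from some of its td cells. Sometimes the text sits in a single td, but sometimes it is spread across several td cells in consecutive rows. How can I join the text when it is spread across multiple td cells? Here is the HTML: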
<table class="detailRecordTable">
<tbody>
<tr><td class="detailSeperator" colspan="6"> </td></tr>
<tr>
<td valign="top" style="width: 11% " class="detailData"><b>02/03/2016</b></td> <td style="width: 3%" class="detailLabels" valign="top"> </td>
<td style="width: 85%" class="detailData alignData" colspan="3"> <b>Disposed- Pet for Writ Denied</b> </td>
<td style="width: 1%" class="detailData"> </td>
</tr>
<tr>
<td colspan="2" style="width: 14% " class="detailLabels" valign="top"> </td>
<td style="width: 86% " class="detailData" colspan="2">ORDER ISSUED: PETITION FOR WRIT OF MANDAMUS DENIED. MANDATE AVAILABLE TO COUNSEL OF RECORD VIA SECURE CASE.NET.</td>
</tr>
<tr><td class="detailSeperator" colspan="6"> </td></tr>
<tr>
<td valign="top" style="width: 11% " class="detailData"><b>01/29/2016</b></td>
<td style="width: 3%" class="detailLabels" valign="top"> </td>
<td style="width: 85%" class="detailData alignData" colspan="3">
<b>Suggestions in Opposition</b></td>
<td style="width: 1%" class="detailData"> </td>
</tr>
<tr>
<td colspan="2" style="width: 14% " class="detailLabels" valign="top"> </td>
<td style="width: 86% " class="detailData" colspan="2">SUGGESTIONS IN OPPOSITION TO RELATORS PETITION FOR WRIT OF MANDAMUS; Electronic Filing Certificate of Service.</td>
</tr>
<tr>
<td colspan="2" style="width: 14%" class="detailLabels"> </td>
<td style="width: 86%" class="detailData" colspan="2"> <b>Filed By:</b>JOHN RICHARD SHANK JR
</td>
</tr><tr>
<td style="width: 14%" class="detailLabels" colspan="2"></td>
<td style="width: 86%" class="detailData" colspan="2"> <b>On Behalf Of:</b>ELIZABETH DAVIS
</td>
</tr>
<tr>
<td class="detailSeperator" colspan="6"> </td></tr>
<tr><td valign="top" style="width: 11% " class="detailData"><b>01/22/2016</b></td><td style="width: 3%" class="detailLabels" valign="top"> </td>
<td style="width: 85%" class="detailData alignData" colspan="3"><b>Court Order Issued</b></td>
<td style="width: 1%" class="detailData"> </td>
</tr>
<tr><td colspan="2" style="width: 14% " class="detailLabels" valign="top"> </td>
<td style="width: 86% " class="detailData" colspan="2">ORDER ISSUED: RESPONDENT REQUESTED TO FILE SUGGESTIONS IN OPPOSITION ON OR BEFORE 2:00 P.M. ON JANUARY 29, 2016.</td>
</tr>
</tbody></table>
I want output like the following. I have put asterisks around the text that should be joined:
["ORDER ISSUED: PETITION FOR WRIT OF MANDAMUS DENIED. MANDATE AVAILABLE TO COUNSEL OF RECORD VIA SECURE CASE.NET." , "**SUGGESTIONS IN OPPOSITION TO RELATORS PETITION FOR WRIT OF MANDAMUS; Electronic Filing Certificate of Service. Filed By:JOHN RICHARD SHANK JR On Behalf Of:ELIZABETH DAVIS**" , "ORDER ISSUED: RESPONDENT REQUESTED TO FILE SUGGESTIONS IN OPPOSITION ON OR BEFORE 2:00 P.M. ON JANUARY 29, 2016"]
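I tried the code below, but it does not join the text. Each piece comes out as a separate item, in particular the text surrounded by asterisks above: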
if !tr.css('td.detailData').empty?
  ac_desc = tr.css('td.detailData')[0].text.strip.gsub("\n", '').gsub("\t", '')
end
if ac_desc != ""
  acc_descs << ac_desc
end
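One possible direction, as a minimal sketch rather than a confirmed solution: the attempt above pushes one string per tr, so rows that belong to the same entry are never merged. The sketch below groups rows instead. It assumes, based only on the sample HTML, that a row containing td.detailSeperator starts a new entry, that a row containing td.alignData is the date/title row to be skipped, and that every other row holds description text. The helper names squish, join_entries, and classify_rows are hypothetical.

```ruby
# Sketch only: helper names are hypothetical, and the grouping rule is an
# assumption based on the sample HTML, not on any documented page format.

# Collapse any run of whitespace (newlines, tabs, non-breaking spaces) to one space.
def squish(text)
  text.gsub(/[[:space:]]+/, ' ').strip
end

# rows is an array of [kind, text] pairs, kind being :separator, :header or :detail.
# Every :separator starts a new entry; :detail texts are appended to the current one.
def join_entries(rows)
  entries = []
  rows.each do |kind, text|
    case kind
    when :separator
      entries << []
    when :detail
      cleaned = squish(text)
      next if cleaned.empty?
      entries << [] if entries.empty? # table without a leading separator row
      entries.last << cleaned
    end
  end
  entries.reject(&:empty?).map { |parts| parts.join(' ') }
end

# Classify each <tr>: separator rows carry td.detailSeperator, date/title rows
# carry td.alignData, and everything else is treated as description text.
def classify_rows(html)
  require 'nokogiri' # required lazily so join_entries works without the gem
  Nokogiri::HTML(html).css('table.detailRecordTable tr').map do |tr|
    if tr.at_css('td.detailSeperator')
      [:separator, '']
    elsif tr.at_css('td.alignData')
      [:header, tr.text]
    else
      [:detail, tr.css('td.detailData').map(&:text).join(' ')]
    end
  end
end
```

With the table markup in a string (say, a hypothetical variable html), join_entries(classify_rows(html)) should return one string per entry, with the "Filed By:" and "On Behalf Of:" rows folded into the second entry. Keeping the grouping separate from the Nokogiri lookup makes the joining rule easy to adjust if the real pages mark entry boundaries differently.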
A: No answers yet