Joining text when parsing a complex table structure in Ruby with Nokogiri

Asked by: snoozy  Asked: 2/9/2023  Last edited by: snoozy  Updated: 2/9/2023  Views: 34

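Q:

I have an HTML table and want to extract the text from some of its td cells. Sometimes the text sits in a single td, but sometimes it is spread across several td cells in consecutive rows. When the text is spread across multiple tds, how can I join it into one string? Here is the HTML: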

    <table class="detailRecordTable">
      <tbody>
        <tr><td class="detailSeperator" colspan="6">&nbsp;</td></tr>
        <tr>
          <td valign="top" style="width: 11% " class="detailData"><b>02/03/2016</b></td>
          <td style="width: 3%" class="detailLabels" valign="top">&nbsp;</td>
          <td style="width: 85%" class="detailData alignData" colspan="3"><b>Disposed- Pet for Writ Denied</b></td>
          <td style="width: 1%" class="detailData">&nbsp;</td>
        </tr>
        <tr>
          <td colspan="2" style="width: 14% " class="detailLabels" valign="top">&nbsp;</td>
          <td style="width: 86%  " class="detailData" colspan="2">ORDER ISSUED:  PETITION FOR WRIT OF MANDAMUS DENIED. MANDATE AVAILABLE TO COUNSEL OF RECORD VIA SECURE CASE.NET.</td>
        </tr>

        <tr><td class="detailSeperator" colspan="6">&nbsp;</td></tr>
        <tr>
          <td valign="top" style="width: 11% " class="detailData"><b>01/29/2016</b></td>
          <td style="width: 3%" class="detailLabels" valign="top">&nbsp;</td>
          <td style="width: 85%" class="detailData alignData" colspan="3"><b>Suggestions in Opposition</b></td>
          <td style="width: 1%" class="detailData">&nbsp;</td>
        </tr>
        <tr>
          <td colspan="2" style="width: 14% " class="detailLabels" valign="top">&nbsp;</td>
          <td style="width: 86%  " class="detailData" colspan="2">SUGGESTIONS IN OPPOSITION TO RELATORS PETITION FOR WRIT OF MANDAMUS; Electronic Filing Certificate of Service.</td>
        </tr>
        <tr>
          <td colspan="2" style="width: 14%" class="detailLabels">&nbsp;</td>
          <td style="width: 86%" class="detailData" colspan="2">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>Filed By:</b>JOHN RICHARD SHANK JR</td>
        </tr>
        <tr>
          <td style="width: 14%" class="detailLabels" colspan="2"></td>
          <td style="width: 86%" class="detailData" colspan="2">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>On Behalf Of:</b>ELIZABETH DAVIS</td>
        </tr>

        <tr><td class="detailSeperator" colspan="6">&nbsp;</td></tr>
        <tr>
          <td valign="top" style="width: 11% " class="detailData"><b>01/22/2016</b></td>
          <td style="width: 3%" class="detailLabels" valign="top">&nbsp;</td>
          <td style="width: 85%" class="detailData alignData" colspan="3"><b>Court Order Issued</b></td>
          <td style="width: 1%" class="detailData">&nbsp;</td>
        </tr>
        <tr>
          <td colspan="2" style="width: 14% " class="detailLabels" valign="top">&nbsp;</td>
          <td style="width: 86%  " class="detailData" colspan="2">ORDER ISSUED: RESPONDENT REQUESTED TO FILE SUGGESTIONS IN OPPOSITION ON OR BEFORE 2:00 P.M. ON JANUARY 29, 2016.</td>
        </tr>
      </tbody>
    </table>

I want output like the following. I have put asterisks around the text that should be joined:

    ["ORDER ISSUED:  PETITION FOR WRIT OF MANDAMUS DENIED. MANDATE AVAILABLE TO COUNSEL OF RECORD VIA SECURE CASE.NET.",
     "**SUGGESTIONS IN OPPOSITION TO RELATORS PETITION FOR WRIT OF MANDAMUS; Electronic Filing Certificate of Service. Filed By:JOHN RICHARD SHANK JR  On Behalf Of:ELIZABETH DAVIS**",
     "ORDER ISSUED: RESPONDENT REQUESTED TO FILE SUGGESTIONS IN OPPOSITION ON OR BEFORE 2:00 P.M. ON JANUARY 29, 2016"]

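I tried the following, but it does not join the text. Each piece comes back as a separate item, in particular the text surrounded by asterisks: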

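    # tr is presumably one table row; acc_descs is the array collecting the results
    unless tr.css('td.detailData').empty?
      # only the first td.detailData of this row is read
      ac_desc = tr.css('td.detailData')[0].text.strip.gsub("\n", '').gsub("\t", '')
    end
    acc_descs << ac_desc if ac_desc != ""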
                          
Tags: ruby, parsing, nokogiri


A: No answers yet