Asked by: CountryBoy_71 Asked: 11/16/2023 Updated: 11/17/2023 Views: 36
Messy txt file data extraction - Power Query
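Q:
Good day. I have been handed a very messy dataset to clean up, and my first thought was Power Query (PQ). Below is a snippet of the 10k+ lines in the actual .txt file; the block simply repeats for each user ('C:'). First, here is the list of information I currently need to extract. When first imported into PQ, the data is a single column, tab-delimited.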
----------
- C:
- Subscriber Name
- Current Charges (all fields)
- Other Charges and Credit
- Other Fees
- Taxes
"Company-Name." Client No: "5780859"
" " Purchase Order No:
Invoice Date: 30-Sep-23 Unique Invoice No: "123456789"
"12345 Main Street"
"City AB"
"T4A 1B7"
"Account Number 1234567"
"-------------------------------------------------------------------------------"
"REPORT - INDIVIDUAL DETAILS"
--------------
"C:" "1234567890"
"Subscriber Name:" "NAME.NAME SPARE"
"Additional line user name:" ""
"Sublevel:" " "
"Sublevel:" ""
"Reference 1:" ""
"Reference 2:" ""
"Handset Transparency"
"Number/Device Information" ""
"Starting Balance" $0.00
"Last Month's Balance" $0.00
"Current Balance" $0.00
"Monthly Credit" $0.00
"Monthly Balance Adjust" $0.00
"CURRENT CHARGES"
"Monthly Service Plan" $40.00
"Additional Local Airtime" $0.00
"Long Distance Charges" $22.40
"Roaming Charges" $0.00
"Total Taxes:" $7.49
"Total Current Charges:" $69.89
"MONTHLY SERVICE PLAN" 01-Oct-23 to 31-Oct-23
"Service Plan Name" "Total"
"Business SharePro 5GB Q1 offer (01-Oct-23 to 31-Oct-23)" $40.00
"Total Monthly Service Plan Charges" $40.00
"ADDITIONAL LOCAL AIRTIME"
"Service" "Total Airtime" "Free Airtime" "Included Airtime" "Chargeable Airtime" "Total"
"Phone (minutes)" 28:00 0:00 28:00 0:00 $0.00
"Total Additional Local Airtime Charges" $0.00
"LONG DISTANCE CHARGES"
"Service" "Total LD Minutes" "Free LD Minutes" "Included LD Minutes" "Chargeable LD Minutes" "Total"
"Domestic Phone" 28:00 0:00 0:00 28:00 $22.40
"Total Long Distance Charges" $22.40
"ROAMING"
"Service" "Roaming Minutes" "Roaming Charges" "Roaming LD Minutes" "Roaming LD Charges" "Roaming Surcharge" "Total"
"Total Roaming Charges" $0.00
"DO MORE DATA SERVICES"
"Service" "Total Events" "Event Type" "Total"
"Total Do More Data Services Charges" $0.00
"DO MORE VOICE SERVICES"
"Service" "Total Events" "Event Type" "Total"
"Total Do More Voice Services Charges" $0.00
"PAGER SERVICES"
"Service" "Total Messages" "Included Messages" "Chargeable Messages" "Total"
"Total Pager Charges" $0.00
"VALUE-ADDED SERVICES" 01-Oct-23 to 31-Oct-23
"Service" "Total"
"Can - Can/US LD $0.80/min (01-Oct-23 to 31-Oct-23)" $0.00
"Easy Roam INTL - $16/day Business (01-Oct-23 to 31-Oct-23)" $0.00
"Easy Roam US - $14/day Business (01-Oct-23 to 31-Oct-23)" $0.00
"UL Can - Can LD min (01-Oct-23 to 31-Oct-23)" $0.00
"UL domestic SMS / MMS (01-Oct-23 to 31-Oct-23)" $0.00
"Visual Voicemail (01-Oct-23 to 31-Oct-23)" $0.00
"Total Value Added Service Charges" $0.00
"OTHER CHARGES AND CREDIT"
"Charge or Credit" "Total"
"Total Other Charges and Credits" $0.00
"OTHER FEES"
"Service" "Total"
"Other Fees" $0.00
"TAXES"
"" "Total"
"GST" $3.12
"PST - BC" $4.37
"Total Taxes" $7.49
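After a lot of trial and error, I ended up with the following steps:
- Filtered rows (removed the top rows)
- Added an index column, then a conditional column that returns a value on the "-----" rows, and filled it down.
- Filtered again to keep only the rows actually needed
- Split the single column by delimiter (tab), since that is how the ".txt" file comes in
- Removed what would (eventually) be the header column, because it was annoying me.
- Grouped the rows on the conditional column and drilled down to get a list
- The last step was to use "Table.Combine" on the list.
So now the rows display the way they should, but here is the next problem. Not every user (C:) has the same number of rows (charges), so the data spills across many columns and no longer lines up where it should.
Is there any way to fix this? Would something like this be better suited to Python?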
A:
2 upvotes
Sam Nseir
11/17/2023
#1
See if this helps move you forward...
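let
    // Read the file as two tab-separated columns: label and value.
    Source = Csv.Document(File.Contents("C:\5780859.txt"),[Delimiter="#(tab)", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}}),
    // Tag every row with the customer number from the most recent "C:" row.
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Customer", each if [Column1] = "C:" then [Column2] else null),
    #"Filled Down" = Table.FillDown(#"Added Custom",{"Customer"}),
    // Drop the invoice header (no customer yet) and rows without a label.
    #"Filtered Rows" = Table.SelectRows(#"Filled Down", each [Customer] <> null and [Customer] <> ""),
    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each [Column1] <> null and [Column1] <> ""),
    // An all-uppercase label marks a section header ("C:", "CURRENT CHARGES", ...).
    // "GST" and "PST - BC" are uppercase too, so rows whose value starts with "$" are not treated as headers.
    #"Added Custom1" = Table.AddColumn(#"Filtered Rows1", "Section", each if [Column1] = Text.Upper([Column1]) and not Text.StartsWith(if [Column2] = null then "" else [Column2], "$") then [Column1] else null),
    #"Filled Down1" = Table.FillDown(#"Added Custom1",{"Section"}),
    // Keep only the sections listed in the question.
    #"Filtered Rows2" = Table.SelectRows(#"Filled Down1", each ([Section] = "C:" or [Section] = "CURRENT CHARGES" or [Section] = "OTHER CHARGES AND CREDIT" or [Section] = "OTHER FEES" or [Section] = "TAXES")),
    // Keep rows that carry a value; within the "C:" section keep only the customer number and subscriber name.
    #"Filtered Rows3" = Table.SelectRows(#"Filtered Rows2", each ([Column2] <> null and [Column2] <> "") and ([Column1] = "C:" or [Column1] = "Subscriber Name:" or [Section] <> "C:"))
in
    #"Filtered Rows3"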
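The query above returns a long table (one row per label and value, tagged with Customer and Section). If one row per customer is wanted instead, a pivot is one possible follow-up. The sketch below is only an illustration under assumptions: the inline table `Long` is a hypothetical stand-in for the query's output, the second customer ("1234567891", "OTHER USER") is invented sample data, and the step names and the "Field" column are made up for this example. It also assumes each customer has at most one value per section and label, since `List.First` keeps only the first one.

```powerquery
let
    // Hypothetical stand-in for the output of the query above
    // (columns Column1, Column2, Customer, Section); the second customer is invented.
    Long = #table(
        type table [Column1 = text, Column2 = text, Customer = text, Section = text],
        {
            {"C:", "1234567890", "1234567890", "C:"},
            {"Subscriber Name:", "NAME.NAME SPARE", "1234567890", "C:"},
            {"Monthly Service Plan", "$40.00", "1234567890", "CURRENT CHARGES"},
            {"Total Taxes:", "$7.49", "1234567890", "CURRENT CHARGES"},
            {"C:", "1234567891", "1234567891", "C:"},
            {"Subscriber Name:", "OTHER USER", "1234567891", "C:"},
            {"Monthly Service Plan", "$40.00", "1234567891", "CURRENT CHARGES"},
            {"Roaming Charges", "$14.00", "1234567891", "CURRENT CHARGES"}
        }
    ),
    // The "C:" row only repeats the Customer value, so drop it before pivoting.
    NoKeyRow = Table.SelectRows(Long, each [Column1] <> "C:"),
    // Prefix each label with its section so equal labels in different sections stay distinct.
    AddField = Table.AddColumn(NoKeyRow, "Field", each [Section] & " | " & [Column1], type text),
    Slim = Table.SelectColumns(AddField, {"Customer", "Field", "Column2"}),
    // One row per customer, one column per field; a field a customer lacks becomes null.
    Pivoted = Table.Pivot(Slim, List.Distinct(Slim[Field]), "Field", "Column2", List.First)
in
    Pivoted
```

Because the columns are keyed by label rather than by position, a customer with fewer charge rows simply gets null in the columns it lacks, instead of shifting values into the wrong columns.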
Comments
0 upvotes
CountryBoy_71
11/17/2023
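Thank you very much, I believe this will get me close enough to where I need to be. My apologies for the lack of a sample file... I promise to do better next time.
0 upvotes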
horseyride
11/17/2023
Upvoted, because I have no idea how you got a workable solution out of this hodgepodge of a question. Well done.
0 upvotes
Davide Bacci
11/17/2023
Agreed with horseyride. Well done, upvoted.