提问人:Florin 提问时间:10/28/2023 最后编辑:Mayukh BhattacharyaFlorin 更新时间:10/29/2023 访问量:233
在 Excel 中使用 3 种类型的分隔符分隔数据 - 空格、连字符和 =>
Separate data in Excel with 3 types of delimiters - Space, hyphen and =>
问:
我最近收到了一个 Excel 文件,我正在努力分离佣金列,以使我将来的工作更轻松。 我有 3 种不同类型的百分比分隔符 - 空格、连字符和 =>。
这是我的示例数据:
供应商 | 日期 | 委员会 |
---|---|---|
阿德科斯 | 1/1/2023 | intre 4-6 密耳 $ - 2% 6-10 密耳 $ - 3% >10 密耳 $ - 5% |
吉格尔 | 5/7/2023 | intre 3-5 密耳 $ => 3% 5-8 密耳 $ => 5% >8 密耳 $ => 7% |
弗洛里斯 | 4/4/2023 | intre 0.5-2 mil $ 1% 2-3 mil $ 2% 3-5 mil $ 3.5% 5-6 mil $ 4.5% >6 mil $ 6% |
我尝试使用Power Query使用textafter函数,texsplit函数,但我无法获取所需输出中的数据,如您所见:
供应商 | 日期 | 委员会 | 百分之 |
---|---|---|---|
阿德科斯 | 1/1/2023 | 4-6 密耳 $ | 2% |
阿德科斯 | 1/1/2023 | 6-10 密耳 $ | 3% |
阿德科斯 | 1/1/2023 | >10 密耳 $ | 5% |
吉格尔 | 5/7/2023 | 3-5 密耳 $ | 3% |
吉格尔 | 5/7/2023 | 5-8 密耳 $ | 5% |
吉格尔 | 5/7/2023 | >8 密耳 $ | 7% |
弗洛里斯 | 4/4/2023 | 0-2 密耳 $ | 1% |
弗洛里斯 | 4/4/2023 | 2-3 密耳 $ | 2% |
弗洛里斯 | 4/4/2023 | 3-5 密耳 $ | 3.5% |
弗洛里斯 | 4/4/2023 | 5-6 密耳 $ | 4.5% |
弗洛里斯 | 4/4/2023 | >6 密耳 $ | 6% |
您能否使用 Office 365 或 Power Query 中的公式帮助解决任何解决方案?
先谢谢你。
答:
我相信有更干净的方法,但这里有一个解决方案。
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("XY/dDoIwDEZfpVncHWU/XcEbn2RyYXCSJSgJ6Ps7AkPhrv3WntN5L2730LbDNIlCGGWU1ZZSGV/vMYDDCp6xhxMgWAkVGr31JOH60ZrCX8ZSNIUXXexCnyCs6j2PkNfZy7I7UxjPh5Az+vhQL/xHP4xxPtgptxfoktGuS0aCRVqbdP1PTiXP2vw3V27CHFVJ1HwB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [supplier = _t, date = _t, commision = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"supplier", type text}, {"date", type date}, {"commision", type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type"," - ","|",Replacer.ReplaceText,{"commision"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value"," => ","|",Replacer.ReplaceText,{"commision"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1","$ ","$|",Replacer.ReplaceText,{"commision"}),
#"Replaced Value3" = Table.ReplaceValue(#"Replaced Value2","%","%~",Replacer.ReplaceText,{"commision"}),
#"Replaced Value4" = Table.ReplaceValue(#"Replaced Value3","intre","",Replacer.ReplaceText,{"commision"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value4", {{"commision", Splitter.SplitTextByDelimiter("~", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "commision"),
#"Filtered Rows" = Table.SelectRows(#"Split Column by Delimiter", each [commision] <> ""),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Filtered Rows", "commision", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"commision", "percent"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"commision", type text}, {"percent", Percentage.Type}})
in
#"Changed Type1"
不优雅,但它有效:
=LET(res,
WRAPROWS(
TOCOL(
TEXTSPLIT(
TEXTAFTER(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(
SUBSTITUTE(TRIM(C2:C4),"% ","%|"&A2:A4&"|"&B2:B4&"|"),
"$ -","$"),
"$ =>","$"),
"$","$|"),
"intre","|"&A2:A4&"|"&B2:B4&"|"),
"|",SEQUENCE(,99)),
"|"),
2),
4),
IFERROR(--res,res))
或使用 lambda 作为重复的 SUBSTITUTE:
=LET(a,{"% ";"$ -";"$ =>";"$";"intre"},
b,{"%|#|";"$";"$";"$|";"|#|"},
res,
WRAPROWS(
TOCOL(
TEXTSPLIT(
TEXTAFTER(
SUBSTITUTE(
REDUCE(C2:C4,SEQUENCE(ROWS(a)),
LAMBDA(x,y,
SUBSTITUTE(x,INDEX(a,y),
INDEX(b,y)))),
"#",
A2:A4&"|"&B2:B4),
"|",SEQUENCE(,99)),
"|"),
2),
4),
IFERROR(--res,res))
评论
REDUCE(C2:C4,a,LAMBDA(x,y,SUBSTITUTE(x,y,XLOOKUP(y,a,b)))
也许有更优雅的方法可以做到这一点,也许这是您可以尝试的另一种方法。但是,就像我每次说的那样,会尝试即兴创作更多,
• 细胞中使用的配方E1
=LET(
α, A2:C4,
REDUCE(HSTACK("Supplier","Date","Commission","Percent"),SEQUENCE(ROWS(α)),
LAMBDA(φ,ε,VSTACK(φ,IFERROR(HSTACK(INDEX(α,ε,{1,2}),
LET(_α, WRAPROWS(TOCOL(TEXTSPLIT(TEXTAFTER(INDEX(α,ε,3),"intre "),{" - ","=>"," "},,1),2),4),
_φ, BYROW(TAKE(_α,,3),LAMBDA(p,TEXTJOIN(" ",,p))),
_ε, DROP(_α,,3)/1,
HSTACK(_φ,_ε))),
INDEX(α,ε,{1,2}))))))
在Power Query中,还可以在单个步骤中按多个分隔符进行拆分。然后,您可以组合事物以返回佣金和百分比
let
Source = Excel.CurrentWorkbook(){[Name="Table4"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"supplier", type text}, {"date", type date}, {"commission", type text}}),
//Split by multiple delimiters
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Splitter.SplitTextByAnyDelimiter({" "," - ","=>"})([commission])),
//Create lists of the commissions and percentages
#"Split Commission" = Table.AddColumn(#"Added Custom", "Comission", (r)=>
let
pos = List.PositionOf(r[Custom],"$",Occurrence.All),
coms = List.Accumulate(
pos,
{},
(s,c)=>s & {Text.Combine(List.Range(r[Custom],c-2,3)," ")}),
pct =List.Select(r[Custom], each Text.EndsWith(_,"%"))
in
List.Zip({coms,pct})),
//Remove unneeded columns
#"Removed Columns" = Table.RemoveColumns(#"Split Commission",{"commission", "Custom"}),
//Expand to new rows and then to the columns
#"Expanded Comission" = Table.ExpandListColumn(#"Removed Columns", "Comission"),
#"Extracted Values" = Table.TransformColumns(#"Expanded Comission",
{"Comission", each Text.Combine(List.Transform(_, Text.From), "#(tab)"), type text}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Extracted Values", "Comission",
Splitter.SplitTextByDelimiter("#(tab)", QuoteStyle.Csv), {"Comission", "Percent"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Comission", type text}, {"Percent", Percentage.Type}})
in
#"Changed Type1"
这里已经有很多答案了,但只是为了踢球,这里还有一个使用 lambda 递归的答案:LET()
=LET(
F, LAMBDA(X,arr,[val],[num],
LET(
n, IF(ISOMITTED(num), 1, num),
a, INDEX(arr, n, {1,2}),
b, WRAPROWS(TRIM(TEXTSPLIT(TEXTAFTER(INDEX(arr, n, 3),"intre"),,{"$ -","$ =>","$","%"},TRUE)),2),
c, IFNA(HSTACK(a, IFERROR(b/100, b&" $")), a),
v, IF(ISOMITTED(val), c, VSTACK(val, c)),
IF(n<ROWS(arr), X(X, arr, v, n+1), v))),
F(F, A2:C4, HSTACK(A1:C1, "percent")))
结果:
编辑:
正如 @DavidLeal 在下面的评论中指出的那样,由于 Excel 中递归 lambda 的局限性,此技术只能用于小型数据集。不过,幸运的是,上面公式中使用的逻辑与公式中的逻辑非常相似,可以很容易地重写如下:REDUCE()
=LET(
arr, A2:C4,
REDUCE(HSTACK(A1:C1, "percent"), SEQUENCE(ROWS(arr)), LAMBDA(v,n,
LET(
a, INDEX(arr, n, {1,2}),
b, WRAPROWS(TRIM(TEXTSPLIT(INDEX(arr, n, 3),,{"intre","$ -","$ =>","$","%"},TRUE)),2),
VSTACK(v, IFNA(HSTACK(a, IFERROR(b/100, b&" $")), a))))))
我还通过删除 [row_delimiter] 数组并将其添加到 [intre] 数组中来简化文本操作过程。TEXTAFTER()
评论
1,024
REDUCE/VSTACK
与 @P.b 的方法类似,它没有 的固定大小 ,而是标识最大可能的长度,并分别处理前两列:99
=LET(A,A2:A4, B,B2:B4, C,C2:C4,
old,{"$","%"," => "," - ","intre "}, new, {"$,","%;","","",","},sep, {",",";"},
clean, REDUCE(C, old, LAMBDA(ac,a,SUBSTITUTE(ac,a,XLOOKUP(a,old,new)))),
nRec, LEN(clean) - LEN(SUBSTITUTE(clean, ";","")),
split, TEXTSPLIT(TEXTAFTER(clean,sep, SEQUENCE(,2*MAX(nRec))),sep),
seq, 1*MID(CONCAT(REPT(SEQUENCE(ROWS(nRec)), nRec)), SEQUENCE(SUM(nRec)),1),
VSTACK({"supplier","date","commision","percentage"},
HSTACK(INDEX(A,seq), INDEX(B,seq), TRIM(WRAPROWS(TOCOL(split,2),2)))))
clean
准备 C 列来分隔每条记录 (),并在记录中分隔每个值 (),即佣金和百分比。 计算 C 列每行的记录数。变量 考虑两个分隔符:记录和值(如果您希望列中出现此类值,则可以使用其他方便的分隔符)。每个分隔符的实例编号是通过考虑具有最大记录数的行来确定的。;
,
nRec
sep
SEQUENCE(,2*MAX(nRec)))
我们使用的技巧不适用于 2D 数组,它仅按第一个元素进行拆分,在这种情况下,可以方便地删除输出的剩余部分。TEXTSPLIT
TEXTAFTER
为了根据记录数生成前两列 () 的重复项,我们使用以下命令:seq
1*MID(CONCAT(REPT(SEQUENCE(ROWS(nRec)), nRec)), SEQUENCE(SUM(nRec)),1)
您也可以改用以下命令:
idx, SEQUENCE(ROWS(nRec)),
seq, INDEX(idx,XMATCH(SEQUENCE(SUM(nRec)),MMULT(N(idx>=TOROW(idx)),nRec),1,2))
更长,但对于大型数据集来说可能更有效。在这两种情况下,它都会生成序列:该序列将用于 A 列和 B 列。根据在 C 列的每一行上找到的记录数生成重复值。1;1;1;2;2;2;;3;3;3;3;3
INDEX
现在我们准备好准备输出:将 C 列值排列到预期的输出。我们通过第二个输入参数删除数量较少或记录。TRIM(WRAPROWS(TOCOL(split,2),2))
N/A
TOCOL
最后,我们使用 和 来适应最终输出。VSTACK
HSTACK
评论