Asked by: Asked: 6/13/2023 Last edited by: Peter Seliger Updated: 6/26/2023 Views: 128
How can I split a string using a custom regex?
Q:
let myString = "Hello 'How are you' foo bar abc 'Strings are cool' d b s ;12gh gh76;"
const myRegEx = / \w+ "\w* +" | ;\w +; +/g // This what i have figured but its not working :(
const splitedString = myString.split(myRegEx)
console.log(splitedString)
Expected output:
["Hello", "How are you", "foo", "bar", "abc", "Strings are cool", "d", "b", "s", "12gh-gh76"]
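Let me try to explain more:
First, split the whole string on spaces " ", except for the strings inside '' or ;;, like:
"Hello 'Yo what's up'"
--> ["Hello", "Yo-what's-up"]
(Note: there is an extra ' in there, so handle that too.)
Then, if a string is inside ;;, concat (I believe this is the right name) it with -, like:
Hello ;hi there;
--> ["Hello", "hi-there"]
Finally, return an array with everything formatted, as in the expected output.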
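A:
You can use .matchAll instead of split to find the matches for quoted content, semicolon pairs, or words, using the regex ([';]).+?\1|\w+
Then remove the wrappers and replace the spaces where needed.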
const myRegEx = new RegExp(/([';]).+?\1|\w+/gm)
const message = "Hello 'How are you' foo bar abc 'Strings are cool' d b s ;12gh gh76; ;a 'b c' d; 'a ;b c; d' d" // Try edit me
const matches = Array.from(message.matchAll(myRegEx))
const finalResult = matches.map(str => {
  // The first item of each match array is the full matched text
  const value = str.shift()
  if (value.match(/^;.*;$/))
    // Semicolon-wrapped: strip the wrappers and join with dashes
    return value.substring(1, value.length - 1).replaceAll(' ', '-')
  else if (value.match(/^'.*'$/))
    // Quote-wrapped: strip the wrappers only
    return value.substring(1, value.length - 1)
  else
    return value
})
// Log to console
console.log(finalResult)
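Note that this solution works under the assumption that the wrappers (quotes and semicolons) are not nested.
If you need to account for nested wrappers, regex is not the best tool for the job, because you would need to check "bracket" balance. That is possible with regex, but there are simpler ways to do it.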
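One of those simpler ways can be sketched as a small character scanner. This is only an illustration, not the answerer's method: the function name tokenize is hypothetical, and it assumes that the first wrapper opened stays in effect until the same character closes it, so the other wrapper character counts as plain text while inside. It also assumes no stray apostrophes such as in what's.

```javascript
// Hypothetical helper: scans the input left to right without a regex for
// the wrappers. Assumption: the outermost wrapper wins, so a ';' inside
// '...' (or a "'" inside ;...;) is kept as ordinary text.
function tokenize(input) {
  const tokens = [];
  let current = '';
  let wrapper = null; // "'" or ";" while inside a wrapped span, else null

  // Push the collected text (if any) and start a new token.
  const flush = () => {
    if (current !== '') tokens.push(current);
    current = '';
  };

  for (const ch of input) {
    if (wrapper !== null) {
      if (ch === wrapper) {
        // Closing wrapper: semicolon spans get whitespace replaced by dashes.
        if (wrapper === ';') current = current.trim().replace(/\s+/g, '-');
        flush();
        wrapper = null;
      } else {
        current += ch;
      }
    } else if (ch === "'" || ch === ';') {
      flush();
      wrapper = ch;
    } else if (/\s/.test(ch)) {
      flush();
    } else {
      current += ch;
    }
  }
  flush(); // text after an unterminated wrapper is emitted unchanged
  return tokens;
}

console.log(tokenize("Hello 'How are you' foo ;12gh gh76;"));
// → ["Hello", "How are you", "foo", "12gh-gh76"]
```

Because the scanner tracks which wrapper is currently open, an input like 'a ;b c; d' comes out as the single token a ;b c; d, while ;a 'b c' d; becomes a-'b-c'-d.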
You can capture the parts that you want to reformat, and then check the capture group number when processing them:
'([^']+(?:'[^\s'][^']*)*)'|;([^;]+);|\S+
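The pattern matches:
' matches a literal '
( opens capture group 1
[^']+ matches 1+ characters other than '
(?:'[^\s'][^']*)* optionally repeats ' followed by a single non-whitespace character other than ', then optional characters other than '
) closes group 1
' matches a literal '
| or
;([^;]+); matches ;...; and captures the inner content in group 2
| or
\S+ matches 1+ non-whitespace characters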
const regex = /'([^']+(?:'[^\s'][^']*)*)'|;([^;]+);|\S+/g;
[
  `Hello 'Yo what's up'`,
  `Hello 'How are you' foo bar abc 'Strings are cool' d b s ;12gh gh76;`,
  `Hello ;hi there;`
].forEach(s =>
  console.log(
    Array.from(
      s.matchAll(regex), m => {
        // Group 1: quoted content, returned without the quotes
        if (m[1]) return m[1]
        // Group 2: semicolon content, whitespace replaced by dashes
        else if (m[2]) return m[2].replace(/\s+/g, "-");
        // Otherwise: a plain run of non-whitespace characters
        else return m[0];
      }
    )
  )
);
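To make the group check more visible, the following sketch labels each token with the alternative that produced it. It reuses the pattern from this answer unchanged. The helper name classifyTokens and the kind labels are hypothetical and only serve the illustration.

```javascript
// Same pattern as in the answer above.
const regex = /'([^']+(?:'[^\s'][^']*)*)'|;([^;]+);|\S+/g;

// Hypothetical helper: reports which alternative matched each token.
// A group that did not take part in the match is undefined.
function classifyTokens(text) {
  return Array.from(text.matchAll(regex), (m) => {
    if (m[1] !== undefined) {
      return { kind: 'quoted', value: m[1] };
    }
    if (m[2] !== undefined) {
      return { kind: 'semicolon', value: m[2].replace(/\s+/g, '-') };
    }
    return { kind: 'word', value: m[0] };
  });
}

console.log(classifyTokens(`Hello 'Yo what's up' ;hi there;`));
```

For this input the three tokens are classified as word, quoted and semicolon, with the values Hello, Yo what's up and hi-there.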
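One needs at least a twofold approach.
First, one has to replace any semicolon-delimited range by replacing
each of its whitespace sequences with a single dash, which looks like ...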
`Hello 'how\\'re you feeling' foo bar abc 'Strings are cool' d b s ;12gh gh76;`
.replace(/;([^;]*);/g, (match, capture) => capture.replace(/\s+/g, '-'))
... where the regex is ... /;([^;]*);/g
... and the result will be ...
"Hello 'how\\'re you feeling' foo bar abc 'Strings are cool' d b s 12gh-gh76"
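Second, one needs to come up with a split regex that handles both cases: it splits at any whitespace (sequence),
but only if that whitespace is not part of a single-quote-enclosed substring. The latter needs to be captured in order to be preserved while splitting. The above example code then continues to look like ...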
`Hello 'how\\'re you feeling' foo bar abc 'Strings are cool' d b s ;12gh gh76;`
.replace(/;([^;]*);/g, (match, capture) => capture.replace(/\s+/g, '-'))
.split(/'(.*?(?<!\\))'|\s+/)
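... where the splitting regex is ... /'(.*?(?<!\\))'|\s+/
... The resulting array does contain a lot of empty values, such as empty strings and undefined values. Thus the split task needs to be accompanied by a reduce-based
cleanup task ...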
`Hello 'how\\'re you feeling' foo bar abc 'Strings are cool' d b s ;12gh gh76;`
.replace(/;([^;]*);/g, (match, capture) => capture.replace(/\s+/g, '-'))
.split(/'(.*?(?<!\\))'|\s+/)
.reduce((result, item) => item && result.concat(item) || result, [])
The next provided example code just demonstrates the approach explained above ...
const sampleString =
  `Hello 'how\\'re you feeling' foo bar abc 'Strings are cool' d b s ;12gh gh76;`;

// see ... [https://regex101.com/r/ZShVPL/1]
const regXSplitAlternation = /'(.*?(?<!\\))'|\s+/;
// see ... [https://regex101.com/r/ZShVPL/2]
const regXSemicolonRange = /;([^;]*);/g;

console.log(
  sampleString
    // first ...
    // ... replace any semicolon delimited range by replacing
    //     each of its whitespace sequence(s) with a single dash.
    .replace(regXSemicolonRange, (match, capture) => capture.replace(/\s+/g, '-'))
);
console.log(
  sampleString
    .replace(regXSemicolonRange, (match, capture) => capture.replace(/\s+/g, '-'))
    // second ...
    // ... split the intermediate replacement string at
    //     - either a single quoted character sequence (capturing it)
    //     - or a whitespace (sequence) (not capturing the latter).
    .split(regXSplitAlternation)
    // ... and third ... do omit any empty (undefined, empty string) item.
    .reduce((result, item) => item && result.concat(item) || result, [])
);
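As a side note, the reduce-based cleanup drops every falsy item, and the same effect can be had with filter(Boolean). The sketch below wraps the three steps in one function. The name splitTokens is hypothetical, and swapping reduce for filter is a substitution made here, not part of the answer above. It assumes an engine that supports lookbehind assertions.

```javascript
// Same two patterns as in the answer above.
const regXSplitAlternation = /'(.*?(?<!\\))'|\s+/;
const regXSemicolonRange = /;([^;]*);/g;

// Hypothetical wrapper around the replace -> split -> cleanup chain.
function splitTokens(text) {
  return text
    // 1. inside each ;...; range, turn whitespace sequences into dashes
    .replace(regXSemicolonRange, (match, capture) => capture.replace(/\s+/g, '-'))
    // 2. split at quoted spans (captured, so they are kept) or at whitespace
    .split(regXSplitAlternation)
    // 3. drop the undefined and empty-string items the split leaves behind
    .filter(Boolean);
}

console.log(splitTokens("Hello 'How are you' foo ;12gh gh76;"));
// → ["Hello", "How are you", "foo", "12gh-gh76"]
```

filter(Boolean) reads a little more directly than the reduce callback, at the cost of also hiding any other falsy value. Here every item is either a string or undefined, so both variants return the same array.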
Comments
Sample inputs quoted in the comments:
"Hey 'hwy hi bye' ;lol 123; "
"Wow ;what a great catch; 'yay that was funny' 'ooh that's realy bad' 'oh my god'"
"Realy 'you wanna do that?' ;no i don't; ;yes you do; 'I said, no!'"
Use matchAll with ([';]).+?\1|\w+, then iterate over the results, remove the wrapping quotes, and change ;a b; to a-b independently.
;abc def; -> abc-def