Asked by: Aku-Ville Lehtimäki  Asked: 10/18/2023  Last edited by: Aku-Ville Lehtimäki  Updated: 10/18/2023  Views: 35
Reading top level blocks from a text file in R
Q:
I am working in R with files that contain blocks, for example:
block name { block contents can be anything: strings, numbers or even curly braces {} or whatever}
blockn4m3 containing numbers {
Can be something junk like:
ans{a{a[sf'asödfä'asdösdö'äasdö'äasdö}}}
}}
I would then like to extract them into a character vector, one element per top-level block, like so:
"block name { block contents can be anything strings, numbers or even brackets {} or whatever}","blockn4m3 containing numbers {
Can be something junk like:
ans{a{a[sf'asödfä'asdösdö'äasdö'äasdö}}}
}}"
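I assume a regular expression won't work here, because the blocks can contain curly braces (and nested blocks)?
So I thought I would simply read each file character by character, and wrote the following function: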
library(magrittr) # provides the %>% pipe used below

separateBlocksFromFile <- \(file) {
  # Read the whole file into a single string
  input <- file %>% readLines %>% {paste(., collapse = "\n")}
  blocks <- c()
  blockNumber <- 1 # We start from the first block
  netBracketValue <- 0 # 0 when reading a block name
  # seq_len() avoids iterating over 1:0 when the file is empty
  for (i in seq_len(nchar(input))) {
    currentCharacter <- substr(input, i, i)
    # Did we enter a block?
    netBracketValue <- netBracketValue + (currentCharacter == "{")
    # Write the character into its correct place.
    # Previous characters in the current block...
    previousCharacters <- ifelse(is.na(blocks[blockNumber]), "", blocks[blockNumber])
    # ...are put before the current character
    blocks[blockNumber] <- paste0(previousCharacters, currentCharacter)
    # Did we exit a block? If so, netBracketValue becomes 0 here.
    netBracketValue <- netBracketValue - (currentCharacter == "}")
    # The block number is updated when we pass a "}" character that ends
    # a top-level block, i.e. netBracketValue == 0
    blockNumber <- blockNumber + (netBracketValue == 0) * (currentCharacter == "}")
  }
  return(blocks)
}
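This works, but it is rather slow on larger files. Is there a faster way to achieve the same result?
Edit: block contents cannot contain a closing } before its matching opening {. If they could, there would be no way to tell whether we had exited a block.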
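On the regex assumption above: given the edit's guarantee that braces are balanced, a recursive PCRE pattern may be enough, since R's `perl = TRUE` mode supports group recursion such as `(?1)`. The following is only a sketch under that assumption. The function name `extractBlocksRegex` is made up for illustration, it has not been benchmarked against the loop, and very deep nesting could run into PCRE recursion limits.

```r
# Hypothetical helper, not from the original post.
# Assumes every "{" has a matching "}" (see the edit note above).
extractBlocksRegex <- function(text) {
  # [^{}]*             block name: anything up to the first brace
  # ( \{ ... \} )      group 1: one balanced pair of braces
  # (?:[^{}]++|(?1))*  inside the pair: runs of non-brace characters,
  #                    or a nested balanced pair (recursion into group 1)
  pattern <- "[^{}]*(\\{(?:[^{}]++|(?1))*\\})"
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

text <- "a {x {y} z}\nb {1}"
blocks <- extractBlocksRegex(text)
# blocks is c("a {x {y} z}", "\nb {1}")
```

Like the loop version, each match keeps the line break that precedes the block name; `trimws(blocks)` would drop it if that is unwanted.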
A: No answers yet
Comments
substring()
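The comment gives no further detail, so the following is only one possible reading of the `substring()` hint. The idea is to compute the brace depth for all characters at once with `cumsum()`, locate the closing braces that bring the depth back to zero, and cut the text with a single vectorised `substring()` call instead of growing strings one character at a time. The function name `splitTopLevelBlocks` is hypothetical, and no timing has been measured here.

```r
# Hypothetical helper, not from the original post.
# Takes the file contents as one string, e.g.
# paste(readLines(file), collapse = "\n").
splitTopLevelBlocks <- function(text) {
  chars <- strsplit(text, "")[[1]]
  # Running brace depth: +1 at each "{", -1 at each "}"
  depth <- cumsum((chars == "{") - (chars == "}"))
  # A top-level block ends at a "}" that returns the depth to 0
  ends <- which(chars == "}" & depth == 0)
  if (length(ends) == 0) return(text)
  # Each block starts right after the previous block's end
  starts <- c(1, head(ends, -1) + 1)
  blocks <- substring(text, starts, ends)
  # Keep any text after the last closing brace, as the loop version does
  lastEnd <- ends[length(ends)]
  if (lastEnd < length(chars)) {
    blocks <- c(blocks, substring(text, lastEnd + 1, length(chars)))
  }
  blocks
}

text <- "a {x {y} z}\nb {1}"
blocks <- splitTopLevelBlocks(text)
# blocks is c("a {x {y} z}", "\nb {1}")
```

The splitting rule is meant to mirror the loop in the question: every character belongs to exactly one block, and a new block begins immediately after a depth-zero `}`.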