Asked by: Simon Harmel  Asked: 11/11/2023  Last edited by: Simon Harmel  Updated: 11/17/2023  Views: 79
Reading text as a data frame into R
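Q:
In my DATA below, read.table() requires the study column to have no spaces within its elements. For example, if an element is "Hayati & Jalilifar", read.table() throws an error until the user removes the spaces, as in "Hayati&Jalilifar".
But is there a way to read the following DATA with read.table() without having to remove the white space inside any data element?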
DATA = read.table(header=TRUE, text =
"study year g v_g assign_type n_class Nt Nc
Hayati & Jalilifar 2009 0.213 0.101 student NA 20 20
Hayati & Jalilifar 2009 0.785 0.108 student NA 20 20
Hale & Courtney 1994 -0.894 0.0154 class 4 286 286
Hale & Courtney 1994 0.946 0.0156 class 4 286 286
Hale & Courtney 1994 -0.237 0.0146 class 4 277 277
Hale & Courtney 1994 -0.179 0.0146 class 4 277 277")
Answers:
2 votes
stefan
11/11/2023
#1
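Your data appears to be in a fixed-width format. Based on your example data, here is an approach using readr::read_fwf that works almost perfectly, except that it splits the first column in two, so the two pieces have to be pasted back together. It also needs a second step to get the column names: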
library(readr)
library(dplyr, warn = FALSE)
tmp <- tempfile()
writeLines(
text = "study year g v_g assign_type n_class Nt Nc
Hayati & Jalilifar 2009 0.213 0.101 student NA 20 20
Hayati & Jalilifar 2009 0.785 0.108 student NA 20 20
Hale & Courtney 1994 -0.894 0.0154 class 4 286 286
Hale & Courtney 1994 0.946 0.0156 class 4 286 286
Hale & Courtney 1994 -0.237 0.0146 class 4 277 277
Hale & Courtney 1994 -0.179 0.0146 class 4 277 277",
tmp
)
dat <- readr::read_fwf(
file = tmp, skip = 1
) |>
mutate(X1 = paste(X1, X2), .keep = "unused")
names(dat) <- readr::read_table(tmp, n_max = 0) |> names()
dat
#> # A tibble: 6 × 8
#> study year g v_g assign_type n_class Nt Nc
#> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 Hayati & Jalilifar 2009 0.213 0.101 student NA 20 20
#> 2 Hayati & Jalilifar 2009 0.785 0.108 student NA 20 20
#> 3 Hale & Courtney 1994 -0.894 0.0154 class 4 286 286
#> 4 Hale & Courtney 1994 0.946 0.0156 class 4 286 286
#> 5 Hale & Courtney 1994 -0.237 0.0146 class 4 277 277
#> 6 Hale & Courtney 1994 -0.179 0.0146 class 4 277 277
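For comparison, base R's read.fwf() can read fixed-width text when the column widths are known in advance. The following is only a minimal sketch: the widths (20, 6, 8), the reduced three-column sample, and the names fixed_lines and fwf_dat are assumptions made for illustration, not part of the original data layout.

```r
# Build two fixed-width lines explicitly so the alignment is guaranteed:
# study padded to 20 characters, year to 6, g to 8 (hypothetical widths).
fixed_lines <- sprintf(
  "%-20s%-6d%-8.3f",
  c("Hayati & Jalilifar", "Hale & Courtney"),
  c(2009L, 1994L),
  c(0.213, -0.894)
)

# read.fwf() cuts each line at the given widths; strip.white = TRUE is
# passed on to read.table() and removes the padding around each field.
con <- textConnection(fixed_lines)
fwf_dat <- read.fwf(
  con,
  widths = c(20, 6, 8),
  col.names = c("study", "year", "g"),
  strip.white = TRUE
)
close(con)

fwf_dat
```

Because the widths are given explicitly, the study column stays in one piece and no paste step is needed. The trade-off is that the widths have to be known or measured beforehand.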
1 vote
Friede
11/11/2023
#2
A base-R workaround I sometimes use:
txt <- "study year g v_g assign_type n_class Nt Nc
Hayati & Jalilifar 2009 0.213 0.101 student NA 20 20
Hayati & Jalilifar 2009 0.785 0.108 student NA 20 20
Hale & Courtney 1994 -0.894 0.0154 class 4 286 286
Hale & Courtney 1994 0.946 0.0156 class 4 286 286
Hale & Courtney 1994 -0.237 0.0146 class 4 277 277
Hale & Courtney 1994 -0.179 0.0146 class 4 277 277"
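# Read the body only; each study name is split into V1, V2 and V3
data <- read.table(text = txt, header = FALSE, skip = 1L)
# Rebuild the study name, then drop the leftover pieces
data$V1 <- with(data, paste(V1, V2, V3))
data[, c("V2", "V3")] <- list(NULL)
# Take the column names from the header line
colnames(data) <- unlist(read.table(text = txt, nrows = 1L))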
which gives:
> head(data)
study year g v_g assign_type n_class Nt Nc
1 Hayati & Jalilifar 2009 0.213 0.1010 student NA 20 20
2 Hayati & Jalilifar 2009 0.785 0.1080 student NA 20 20
3 Hale & Courtney 1994 -0.894 0.0154 class 4 286 286
4 Hale & Courtney 1994 0.946 0.0156 class 4 286 286
5 Hale & Courtney 1994 -0.237 0.0146 class 4 277 277
6 Hale & Courtney 1994 -0.179 0.0146 class 4 277 277
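The workaround above relies on every study name consisting of exactly three whitespace-separated tokens. A possible generalisation, sketched here under the assumption that only the first column can contain spaces, is to split each line on white space, treat the last seven tokens as the remaining columns, and paste whatever is left back into the study name. The third data row ("Smith") and the names raw_txt, parse_row and parsed are hypothetical, added only to show a name of a different length.

```r
raw_txt <- "study year g v_g assign_type n_class Nt Nc
Hayati & Jalilifar 2009 0.213 0.101 student NA 20 20
Hale & Courtney 1994 -0.894 0.0154 class 4 286 286
Smith 2001 0.5 0.02 class 3 30 30"

lines  <- strsplit(raw_txt, "\n")[[1]]
header <- strsplit(trimws(lines[1]), "\\s+")[[1]]
n_rest <- length(header) - 1L   # number of columns after 'study'

# Split one line into tokens; the last n_rest tokens are ordinary columns,
# everything before them is glued back together as the study name.
parse_row <- function(line) {
  tok <- strsplit(trimws(line), "\\s+")[[1]]
  n   <- length(tok)
  c(paste(tok[seq_len(n - n_rest)], collapse = " "),
    tok[(n - n_rest + 1L):n])
}

parsed <- as.data.frame(do.call(rbind, lapply(lines[-1], parse_row)),
                        stringsAsFactors = FALSE)
names(parsed) <- header

# Convert the character columns to their natural types ("NA" becomes NA)
parsed[] <- lapply(parsed, type.convert, as.is = TRUE)
parsed
```

This counts columns from the right, so it does not care how many words the study name has, but it would break if any other column could also contain spaces.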
3 votes
Onyambu
11/11/2023
#3
Save the text in a variable (txt, defined at the end of this answer) and use the following command:
read.table(text=gsub("(\\S+\\s+[&]\\s+\\S+)", "'\\1'", txt), header = TRUE)
study year g v_g assign_type n_class Nt Nc
1 Hayati & Jalilifar 2009 0.213 0.1010 student NA 20 20
2 Hayati & Jalilifar 2009 0.785 0.1080 student NA 20 20
3 Hale & Courtney 1994 -0.894 0.0154 class 4 286 286
4 Hale & Courtney 1994 0.946 0.0156 class 4 286 286
5 Hale & Courtney 1994 -0.237 0.0146 class 4 277 277
6 Hale & Courtney 1994 -0.179 0.0146 class 4 277 277
txt <- "study year g v_g assign_type n_class Nt Nc
Hayati & Jalilifar 2009 0.213 0.101 student NA 20 20
Hayati & Jalilifar 2009 0.785 0.108 student NA 20 20
Hale & Courtney 1994 -0.894 0.0154 class 4 286 286
Hale & Courtney 1994 0.946 0.0156 class 4 286 286
Hale & Courtney 1994 -0.237 0.0146 class 4 277 277
Hale & Courtney 1994 -0.179 0.0146 class 4 277 277"
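To see why the one-liner works, it can help to look at what gsub() produces for a single line before read.table() sees it. This is a small sketch; the names one_line, quoted and one_row are introduced here for illustration only.

```r
one_line <- "Hayati & Jalilifar 2009 0.213 0.101 student NA 20 20"

# Wrap every "word & word" run in single quotes
quoted <- gsub("(\\S+\\s+[&]\\s+\\S+)", "'\\1'", one_line)
quoted

# read.table() treats both " and ' as quote characters by default,
# so the quoted study name is read as a single field.
one_row <- read.table(text = quoted, header = FALSE)
one_row$V1
```

Note that the pattern only captures one non-space token on each side of the ampersand, so a longer name such as a three-word author before the "&" would only be partly quoted and would need a different pattern.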
Comments
1 vote
stefan
11/11/2023
Really nice and elegant.
1 vote
SAL
11/17/2023
#4
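The read.table() function has two useful arguments for reading this kind of data, which has lots of white space and ...: they are strip.white= and sep="\t". In this case, the first should be set to TRUE. To prevent any automatic conversion of column names by base R, you can also add the argument check.names=F. Note that sep="\t" only helps if the columns in the text are actually separated by tab characters. This is all you need to do: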
DATA = read.table(header=TRUE, check.names = F, strip.white = T, sep = "\t", text =
"study year g v_g assign_type n_class Nt Nc
Hayati & Jalilifar 2009 0.213 0.101 student NA 20 20
Hayati & Jalilifar 2009 0.785 0.108 student NA 20 20
Hale & Courtney 1994 -0.894 0.0154 class 4 286 286
Hale & Courtney 1994 0.946 0.0156 class 4 286 286
Hale & Courtney 1994 -0.237 0.0146 class 4 277 277
Hale & Courtney 1994 -0.179 0.0146 class 4 277 277")
You should then get no errors, and this is the output you are probably looking for:
DATA
study year g v_g assign_type n_class Nt Nc
1 Hayati & Jalilifar 2009 0.213 0.101 student NA 20 20
2 Hayati & Jalilifar 2009 0.785 0.108 student NA 20 20
3 Hale & Courtney 1994 -0.894 0.0154 class 4 286 286
4 Hale & Courtney 1994 0.946 0.0156 class 4 286 286
5 Hale & Courtney 1994 -0.237 0.0146 class 4 277 277
6 Hale & Courtney 1994 -0.179 0.0146 class 4 277 277
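Since tab characters are easily lost when text is copied from a web page, here is a minimal sketch that builds the tab-separated text explicitly, so the role of sep = "\t" and strip.white = TRUE is visible. The reduced three-column sample, the deliberate trailing space after the first study name, and the names tab_txt and tab_dat are assumptions made for illustration.

```r
# Join the fields of each row with real tab characters, then join rows
# with newlines. The first study name carries a trailing space on purpose.
tab_txt <- paste(
  paste(c("study", "year", "g"), collapse = "\t"),
  paste(c("Hayati & Jalilifar ", "2009", "0.213"), collapse = "\t"),
  paste(c("Hale & Courtney", "1994", "-0.894"), collapse = "\t"),
  sep = "\n"
)

# With sep = "\t", spaces inside a field no longer split it;
# strip.white = TRUE trims leading and trailing blanks in each field.
tab_dat <- read.table(
  text = tab_txt,
  header = TRUE,
  sep = "\t",
  strip.white = TRUE,
  check.names = FALSE
)
tab_dat
```

If the source text is separated by runs of spaces rather than tabs, this call will not split the columns, and one of the other approaches in this thread is needed instead.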