Reading text as a data frame into R

Asked by: Simon Harmel  Asked: 11/11/2023  Last edited by: Simon Harmel  Modified: 11/17/2023  Viewed: 81 times

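Question:

In my DATA below, read.table() requires the study column to have no spaces inside its elements. For example, if an element is "Hayati & Jalilifar", read.table() throws an error until the user removes the spaces, as in "Hayati&Jalilifar".

But is there a way to make read.table() read DATA below as is, without removing the spaces inside any of its data elements?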

DATA = read.table(header=TRUE, text = 
 "study               year  g     v_g    assign_type  n_class Nt    Nc
  Hayati & Jalilifar  2009  0.213 0.101  student      NA      20    20
  Hayati & Jalilifar  2009  0.785 0.108  student      NA      20    20
  Hale & Courtney     1994 -0.894 0.0154 class        4       286   286
  Hale & Courtney     1994  0.946 0.0156 class        4       286   286
  Hale & Courtney     1994 -0.237 0.0146 class        4       277   277
  Hale & Courtney     1994 -0.179 0.0146 class        4       277   277")
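To see why read.table() fails on the DATA above, here is a minimal base-R sketch (storing the text in a variable txt is an assumption of this example) that counts the whitespace-separated fields on each line with count.fields(). The header line has 8 fields, but each data row has 10, because "Hayati & Jalilifar" and "Hale & Courtney" each split into three tokens.

```r
# Example text from the question: header line plus data rows
txt <- "study               year  g     v_g    assign_type  n_class Nt    Nc
  Hayati & Jalilifar  2009  0.213 0.101  student      NA      20    20
  Hayati & Jalilifar  2009  0.785 0.108  student      NA      20    20
  Hale & Courtney     1994 -0.894 0.0154 class        4       286   286
  Hale & Courtney     1994  0.946 0.0156 class        4       286   286
  Hale & Courtney     1994 -0.237 0.0146 class        4       277   277
  Hale & Courtney     1994 -0.179 0.0146 class        4       277   277"

# Split into lines and count whitespace-separated fields on each one
lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
n_fields <- count.fields(textConnection(lines))
print(n_fields)  # header: 8 fields; every data row: 10 fields

# read.table() stops because the rows have more columns than the header has names
msg <- tryCatch(read.table(header = TRUE, text = txt),
                error = function(e) conditionMessage(e))
print(msg)
```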
Tags: r, dataframe, function, csv, tidyverse

Comments

0 votes Richard Summers 11/11/2023
Is this the output of a package in R? If so, you can assign it to a variable as a data frame and access the table that way.

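Answers:

2 votes stefan 11/11/2023 #1

Your data seem to be in a fixed-width format. Based on your example data, readr::read_fwf is an approach that works almost perfectly, except that it splits the first column into two. It also needs a second step to get the column names: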

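library(readr)
library(dplyr, warn.conflicts = FALSE)

# Write the example text to a temporary file
tmp <- tempfile()

writeLines(
  text = "study               year  g     v_g    assign_type  n_class Nt    Nc
  Hayati & Jalilifar  2009  0.213 0.101  student      NA      20    20
  Hayati & Jalilifar  2009  0.785 0.108  student      NA      20    20
  Hale & Courtney     1994 -0.894 0.0154 class        4       286   286
  Hale & Courtney     1994  0.946 0.0156 class        4       286   286
  Hale & Courtney     1994 -0.237 0.0146 class        4       277   277
  Hale & Courtney     1994 -0.179 0.0146 class        4       277   277",
  tmp
)

# Skip the header; read_fwf() guesses the column boundaries from positions that
# are blank in every row, which splits each study name across X1 and X2.
# Rejoin the two halves in X1; .keep = "unused" then drops X2.
dat <- readr::read_fwf(
  file = tmp, skip = 1
) |>
  mutate(X1 = paste(X1, X2), .keep = "unused")

# Take the column names from the header line
names(dat) <- readr::read_table(tmp, n_max = 0) |> names()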

dat
#> # A tibble: 6 × 8
#>   study               year      g    v_g assign_type n_class    Nt    Nc
#>   <chr>              <dbl>  <dbl>  <dbl> <chr>         <dbl> <dbl> <dbl>
#> 1 Hayati & Jalilifar  2009  0.213 0.101  student          NA    20    20
#> 2 Hayati & Jalilifar  2009  0.785 0.108  student          NA    20    20
#> 3 Hale & Courtney     1994 -0.894 0.0154 class             4   286   286
#> 4 Hale & Courtney     1994  0.946 0.0156 class             4   286   286
#> 5 Hale & Courtney     1994 -0.237 0.0146 class             4   277   277
#> 6 Hale & Courtney     1994 -0.179 0.0146 class             4   277   277
1 vote Friede 11/11/2023 #2

A base-R workaround I sometimes use:

txt <- "study               year  g     v_g    assign_type  n_class Nt    Nc
  Hayati & Jalilifar  2009  0.213 0.101  student      NA      20    20
  Hayati & Jalilifar  2009  0.785 0.108  student      NA      20    20
  Hale & Courtney     1994 -0.894 0.0154 class        4       286   286
  Hale & Courtney     1994  0.946 0.0156 class        4       286   286
  Hale & Courtney     1994 -0.237 0.0146 class        4       277   277
  Hale & Courtney     1994 -0.179 0.0146 class        4       277   277"

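# Read only the data rows; each study name is split across V1, V2 and V3
data <- read.table(text = txt, header = FALSE, skip = 1L)
# Glue the three pieces of the study name back together in V1
data$V1 <- with(data, paste(V1, V2, V3))
# Drop the now-redundant V2 and V3
data[, c("V2", "V3")] <- list(NULL)
# Use the fields of the header line as the column names
colnames(data) <- read.table(text = txt, nrows = 1L)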

> head(data)
               study year      g    v_g assign_type n_class  Nt  Nc
1 Hayati & Jalilifar 2009  0.213 0.1010     student      NA  20  20
2 Hayati & Jalilifar 2009  0.785 0.1080     student      NA  20  20
3    Hale & Courtney 1994 -0.894 0.0154       class       4 286 286
4    Hale & Courtney 1994  0.946 0.0156       class       4 286 286
5    Hale & Courtney 1994 -0.237 0.0146       class       4 277 277
6    Hale & Courtney 1994 -0.179 0.0146       class       4 277 277
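Friede's workaround assumes every study name is exactly three tokens (V1, V2, V3). A hedged base-R generalisation, assuming only that the first column is the one that may contain spaces and that every later column is a single token, rejoins however many leading tokens each row has. The helper name read_first_col_with_spaces() is hypothetical, not part of any package:

```r
# Hypothetical helper: read whitespace-separated text whose FIRST column may
# contain spaces, assuming every other column is a single token.
read_first_col_with_spaces <- function(txt) {
  lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(trimws(lines))]        # drop blank lines
  tokens <- strsplit(trimws(lines), "\\s+")     # split each line on runs of white space
  header <- tokens[[1]]
  k <- length(header) - 1L                      # number of single-token columns after the first
  rows <- lapply(tokens[-1], function(p) {
    n <- length(p)
    first <- paste(p[seq_len(n - k)], collapse = " ")  # rejoin the leading tokens
    c(first, p[(n - k + 1L):n])                        # keep the last k tokens as they are
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- header
  # Convert each column to the simplest suitable type ("NA" becomes NA)
  out[] <- lapply(out, type.convert, as.is = TRUE)
  out
}

txt <- "study               year  g     v_g    assign_type  n_class Nt    Nc
  Hayati & Jalilifar  2009  0.213 0.101  student      NA      20    20
  Hayati & Jalilifar  2009  0.785 0.108  student      NA      20    20
  Hale & Courtney     1994 -0.894 0.0154 class        4       286   286
  Hale & Courtney     1994  0.946 0.0156 class        4       286   286
  Hale & Courtney     1994 -0.237 0.0146 class        4       277   277
  Hale & Courtney     1994 -0.179 0.0146 class        4       277   277"

data2 <- read_first_col_with_spaces(txt)
str(data2)
```

Because the split is done row by row, study names with two, three, or more words can be mixed in the same text; a row with fewer tokens than there are columns is not handled.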
3 votes Onyambu 11/11/2023 #3

Save the text in a variable (txt, defined at the end of this answer) and use the following command:

 read.table(text=gsub("(\\S+\\s+[&]\\s+\\S+)", "'\\1'", txt), header = TRUE)
               study year      g    v_g assign_type n_class  Nt  Nc
1 Hayati & Jalilifar 2009  0.213 0.1010     student      NA  20  20
2 Hayati & Jalilifar 2009  0.785 0.1080     student      NA  20  20
3    Hale & Courtney 1994 -0.894 0.0154       class       4 286 286
4    Hale & Courtney 1994  0.946 0.0156       class       4 286 286
5    Hale & Courtney 1994 -0.237 0.0146       class       4 277 277
6    Hale & Courtney 1994 -0.179 0.0146       class       4 277 277

txt <- "study               year  g     v_g    assign_type  n_class Nt    Nc
  Hayati & Jalilifar  2009  0.213 0.101  student      NA      20    20
  Hayati & Jalilifar  2009  0.785 0.108  student      NA      20    20
  Hale & Courtney     1994 -0.894 0.0154 class        4       286   286
  Hale & Courtney     1994  0.946 0.0156 class        4       286   286
  Hale & Courtney     1994 -0.237 0.0146 class        4       277   277
  Hale & Courtney     1994 -0.179 0.0146 class        4       277   277"
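The pattern above only quotes names of the form word & word, so a name with spaces but no "&", or one with more than two words around the "&", would still break the row. A hedged variant quotes everything before the year instead, assuming the year is always the second column, is the first four-digit field on each row, and that study names contain no double quotes. It uses perl = TRUE for the lazy (.+?) match:

```r
txt <- "study               year  g     v_g    assign_type  n_class Nt    Nc
  Hayati & Jalilifar  2009  0.213 0.101  student      NA      20    20
  Hayati & Jalilifar  2009  0.785 0.108  student      NA      20    20
  Hale & Courtney     1994 -0.894 0.0154 class        4       286   286
  Hale & Courtney     1994  0.946 0.0156 class        4       286   286
  Hale & Courtney     1994 -0.237 0.0146 class        4       277   277
  Hale & Courtney     1994 -0.179 0.0146 class        4       277   277"

lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
# On each data row (not the header), wrap everything before the first
# four-digit field in double quotes so read.table() keeps it as one value
lines[-1] <- sub("^\\s*(.+?)\\s+(\\d{4})\\s", "\"\\1\" \\2 ", lines[-1], perl = TRUE)

dat <- read.table(text = lines, header = TRUE)
dat
```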

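Comments

1 vote stefan 11/11/2023
Really nice and elegant.
1 vote SAL 11/17/2023 #4

The read.table() function has two useful arguments for reading data like this, with a lot of white space and ...: strip.white= and sep="\t". In this case, the first one should be set to TRUE. To stop base R from automatically altering the column names, you can also add the argument check.names=FALSE. This is all you need to do: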

DATA = read.table(header=TRUE, check.names = FALSE, strip.white = TRUE, sep = "\t", text = 
                      "study               year  g     v_g    assign_type  n_class Nt    Nc
  Hayati & Jalilifar  2009  0.213 0.101  student      NA      20    20
  Hayati & Jalilifar  2009  0.785 0.108  student      NA      20    20
  Hale & Courtney     1994 -0.894 0.0154 class        4       286   286
  Hale & Courtney     1994  0.946 0.0156 class        4       286   286
  Hale & Courtney     1994 -0.237 0.0146 class        4       277   277
  Hale & Courtney     1994 -0.179 0.0146 class        4       277   277")

You should then get no error, and this is the output you may be looking for:

DATA
   study               year  g     v_g    assign_type  n_class Nt    Nc
1  Hayati & Jalilifar  2009  0.213 0.101  student      NA      20    20
2  Hayati & Jalilifar  2009  0.785 0.108  student      NA      20    20
3 Hale & Courtney     1994 -0.894 0.0154 class        4       286   286
4 Hale & Courtney     1994  0.946 0.0156 class        4       286   286
5 Hale & Courtney     1994 -0.237 0.0146 class        4       277   277
6 Hale & Courtney     1994 -0.179 0.0146 class        4       277   277
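One thing worth checking: the example text contains no tab characters, so with sep = "\t" each line is read as a single field. The DATA above is therefore a one-column data frame whose column name is the whole header line (kept verbatim by check.names = FALSE) and whose values are the whole data lines with their leading spaces stripped. A minimal sketch to confirm this:

```r
txt <- "study               year  g     v_g    assign_type  n_class Nt    Nc
  Hayati & Jalilifar  2009  0.213 0.101  student      NA      20    20
  Hayati & Jalilifar  2009  0.785 0.108  student      NA      20    20
  Hale & Courtney     1994 -0.894 0.0154 class        4       286   286
  Hale & Courtney     1994  0.946 0.0156 class        4       286   286
  Hale & Courtney     1994 -0.237 0.0146 class        4       277   277
  Hale & Courtney     1994 -0.179 0.0146 class        4       277   277"

DATA <- read.table(header = TRUE, check.names = FALSE, strip.white = TRUE,
                   sep = "\t", text = txt)

ncol(DATA)   # one column, because no line contains a tab
names(DATA)  # the entire header line as a single column name
```

So the call avoids the error, but columns such as year or g are not available separately; one of the earlier answers is needed for that.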