Modify the code of dataframes for multiple txt files

Asked by: Alexia k Boston  Asked: 7/18/2023  Last edited by: zx8754, Alexia k Boston  Updated: 7/18/2023  Views: 49

Question:

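The following code works for the data frames df1 and df2. It takes a vector of column names (var) and checks each data frame: if a column from var is missing, it adds that column and fills it with NA.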

library(dplyr)  # for %>%, select() and all_of()

dfs1 <- c('df1','df2')
var <- c('City_Name',  'Temp',  'Pres' , 'Wind_Hor' , 'Wind_Ver' , 'Rainf' , 'S_Moist')

lapply(dfs1, \(x) {
  dfn <- get(x, envir = .GlobalEnv)
  # add every column of var that dfn lacks, filled with NA ([ also copes with zero or several names)
  dfn[setdiff(var, names(dfn))] <- NA
  dfn <- dfn %>% select(all_of(var))
  return(assign(x,dfn,envir = .GlobalEnv))
})

How can I modify the code above if I have a list of files instead?

I tried the following:

dfs1 <- list.files(path = 'D:/Test3', pattern = "*txt", recursive = TRUE)
var <- c('D/T', 'City_Name', 'Temp', 'Pres', 'Wind_Hor', 'Wind_Ver', 'Rainf', 'S_Moist')
lapply(dfs1, \(x) {
  dfn <- get(x, envir = .GlobalEnv)
  dfn[setdiff(var, names(dfn))] <- NA
  dfn <- dfn %>% select(all_of(var))
  return(assign(x,dfn,envir = .GlobalEnv))
})

But it returns an error:

Error in get(x, envir = .GlobalEnv) :
object 'File/File1.txt' not found

Can anyone explain how to modify the code so that it works with files?
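The error itself comes from get(): it looks up an existing R object by its name, but list.files() returns file paths as plain character strings, and no object called "File/File1.txt" exists in the workspace. Each file has to be read into a data frame first. Below is a minimal base-R sketch, assuming tab-separated files with a header row (change sep to match your files); the folder Test3_demo under tempdir() and its single file are made up so that the example runs anywhere. It also shows why check.names = FALSE matters for the column D/T.

```r
# Reproducible stand-in for D:/Test3 (the folder name "Test3_demo" is made up)
demo_dir <- file.path(tempdir(), "Test3_demo")
dir.create(file.path(demo_dir, "File"), recursive = TRUE, showWarnings = FALSE)
write.table(data.frame(`D/T` = "2023-07-18", City_Name = "NYC", Temp = 20,
                       check.names = FALSE),
            file.path(demo_dir, "File", "File1.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)

# pattern is a regular expression: names ending in ".txt"
files <- list.files(demo_dir, pattern = "\\.txt$", recursive = TRUE)
files              # the relative path "File/File1.txt", just a string

# No R object has that name, which is why get(files[1]) fails
exists(files[1])

# By default read.table() repairs column names, so "D/T" becomes "D.T"
default_names <- names(read.table(file.path(demo_dir, files[1]),
                                  header = TRUE, sep = "\t"))

# Read every file; check.names = FALSE keeps "D/T" so it still matches var
dfs <- lapply(file.path(demo_dir, files), read.table,
              header = TRUE, sep = "\t", check.names = FALSE)
names(dfs[[1]])
```

From there, the NA-filling step from the first snippet can be applied to each element of dfs instead of to objects fetched with get().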

r lapply rbind

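Comments

0 votes  Mark  7/18/2023
Hi Alexia. Does the file File1.txt in the folder "File" exist in your working directory?
0 votes  Mark  7/18/2023
Also, it would be great if you could make your code reproducible!

Answer:

2 votes  Mburu  7/18/2023  #1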
library(dplyr)
library(data.table)
# I find it easier to use data.table in this case when it comes to assigning NA's using dt[, (character_vector) := NA]

var <- c('City_Name',  'Temp',  'Pres' , 'Wind_Hor' , 'Wind_Ver' , 'Rainf' , 'S_Moist')

## two columns (Wind_Hor, Rainf) are commented out, i.e. missing
df1 = data.frame(City_Name = "NYC", 
                 Temp = 20,
                 Pres = 10,
                 #Wind_Hor = 5,
                 Wind_Ver = 5,
                 # Rainf = 10,
                 S_Moist = 5)

## three columns (City_Name, Pres, S_Moist) are commented out
df2 = data.frame(#City_Name = "NYC",
                 Temp = 15,
                 #Pres = 15,
                 Wind_Hor = 5,
                 Wind_Ver = 5,
                 Rainf = 15)
                 #S_Moist = 5)

## put the dfs in a list
dfs1 <- list(df1, df2)

## loop through the list

processed_dfs <- lapply(seq_along(dfs1), function(x) {
  dfn = dfs1[[x]]
  dfn_nms = names(dfn)
  # get missing column names
  var_missing = var[!var %in% dfn_nms]

  setDT(dfn) # convert to data.table

  # assign NA to the missing columns (skipped when none are missing)
  if (length(var_missing) > 0) dfn[, (var_missing) := NA]

  dfn[, ..var] ## data.table select statement: keep the var columns in var order
})
## combine final output
## dplyr method
final_df <- bind_rows(processed_dfs)

## if you want final as data.table
final_df <- rbindlist(processed_dfs)
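If you would rather avoid the data.table/dplyr dependency, the same align-then-bind idea can be sketched in base R. This swaps in setdiff(), `[<-` and do.call(rbind, ...) for the data.table steps above (it is an alternative, not the answer's method), and it also shows that binding the raw frames fails, which is why the missing columns have to be added first.

```r
var <- c("City_Name", "Temp", "Pres", "Wind_Hor", "Wind_Ver", "Rainf", "S_Moist")

df1 <- data.frame(City_Name = "NYC", Temp = 20, Pres = 10, Wind_Ver = 5, S_Moist = 5)
df2 <- data.frame(Temp = 15, Wind_Hor = 5, Wind_Ver = 5, Rainf = 15)

# Binding the raw frames fails because they have different columns
bind_error <- tryCatch(rbind(df1, df2), error = function(e) conditionMessage(e))
bind_error

# Add each missing column as NA (setdiff() may return zero, one or several
# names, and `[<-` copes with all three), then put the columns in var order
processed <- lapply(list(df1, df2), function(dfn) {
  dfn[setdiff(var, names(dfn))] <- NA
  dfn[var]
})

# Every frame now has the same column names, so base rbind() works
final_df <- do.call(rbind, processed)
final_df
```

rbind() coerces the all-NA logical columns to the type of the matching column in the other frame (character for City_Name, numeric for Rainf), much like bind_rows() and rbindlist() do.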


# The code above uses in-memory data frames so that it is reproducible.
# To do the same with text files read from disk:

## list.files() only returns the file names (character strings), not the data,
## so each file still has to be read in; for that, build the full file paths

dfs1 <- list.files(path = 'D:/Test3',
                   pattern = "\\.txt$",  # regular expression: names ending in .txt
                   recursive = TRUE)

## create file paths by prefixing each file name with the folder name
dfs1_file_paths = file.path('D:/Test3', dfs1)

var <- c('D/T', 'City_Name', 'Temp', 
         'Pres', 'Wind_Hor', 
         'Wind_Ver', 'Rainf', 
         'S_Moist')

processed_dfs <- lapply(seq_along(dfs1), function(x) {
  file_x = dfs1_file_paths[[x]] ## path of file x

  ## read the file
  ## you can also use read.table, but then you need setDT() to convert to data.table,
  ## and check.names = FALSE so that 'D/T' is not renamed to 'D.T'
  dfn <- fread(file_x)

  dfn_nms = names(dfn)

  # get missing column names
  var_missing = var[!var %in% dfn_nms]

  # assign NA to the missing columns (skipped when none are missing)
  if (length(var_missing) > 0) dfn[, (var_missing) := NA]

  dfn[, ..var] ## data.table select statement
})
## combine final output
## dplyr method
final_df <- bind_rows(processed_dfs)

## if you want final as data.table
final_df <- rbindlist(processed_dfs)

# Hope this helps
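The question's first snippet stored each result back in the global environment with assign(). With files, keeping the results in a list named after the files is usually easier to work with. Below is a minimal base-R sketch: the two small tables and the file names are stand-ins for processed_dfs and dfs1 above, and source_file is an invented column name.

```r
# Stand-ins for two already-aligned tables (e.g. processed_dfs from the answer)
processed_dfs <- list(
  data.frame(City_Name = "NYC", Temp = 20, Rainf = NA),
  data.frame(City_Name = NA,    Temp = 15, Rainf = 15)
)
# Stand-in for the relative paths returned by list.files()
files <- c("File/File1.txt", "File/File2.txt")

# Name each table after its file ("File1", "File2") instead of assign()-ing it
names(processed_dfs) <- tools::file_path_sans_ext(basename(files))

# Look up a single table by its file name
processed_dfs[["File1"]]

# Stack the tables, adding a column that records each row's source file
final_df <- do.call(rbind, Map(function(df, id) cbind(source_file = id, df),
                               processed_dfs, names(processed_dfs)))
rownames(final_df) <- NULL
final_df
```

With data.table, rbindlist(processed_dfs, idcol = "source_file") does the stacking step in one call and takes the ids from the list names; dplyr's bind_rows(processed_dfs, .id = "source_file") does the same. If you really want separate objects in the workspace, list2env(processed_dfs, envir = .GlobalEnv) creates one object per list element.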