Asked by: mropa  Asked: 2/2/2010  Last edited by: NelsonGon, mropa  Updated: 11/1/2022  Views: 298951
Reshaping data.frame from wide to long format
Q:
I am having some trouble converting my data.frame from a wide table to a long table.
At the moment it looks like this:
Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246
Now I would like to transform this data.frame into a long data.frame.
Something like this:
Code Country Year Value
AFG Afghanistan 1950 20,249
AFG Afghanistan 1951 21,352
AFG Afghanistan 1952 22,532
AFG Afghanistan 1953 23,557
AFG Afghanistan 1954 24,555
ALB Albania 1950 8,097
ALB Albania 1951 8,986
ALB Albania 1952 10,058
ALB Albania 1953 11,123
ALB Albania 1954 12,246
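I have looked at and already tried using the melt() and reshape() functions,
as some people suggested in similar questions.
However, so far I only get messy results.
If possible, I would like to do it with the reshape() function, since
it looks a little bit nicer to handle.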
A:
Using the reshape package:
#data
x <- read.table(textConnection(
"Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246"), header=TRUE)
library(reshape)
x2 <- melt(x, id = c("Code", "Country"), variable_name = "Year")
x2[,"Year"] <- as.numeric(gsub("X", "" , x2[,"Year"]))
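The gsub("X", "", ...) step above is needed because read.table() is called with the default check.names = TRUE, which turns purely numeric column names such as 1950 into syntactic names such as X1950. This minimal base R sketch (using only the data from the question; the object names txt, years and x_raw are illustrative) shows the renaming, the clean-up, and the check.names = FALSE alternative that the other answers use:

```r
txt <- "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246"

# Default check.names = TRUE: "1950" is not a syntactic name, so it becomes "X1950"
x <- read.table(text = txt, header = TRUE)
print(names(x))

# Recover the years by stripping the leading "X" (what the gsub() call does)
years <- as.numeric(sub("^X", "", names(x)[3:7]))
print(years)

# Alternatively, keep the original names so no clean-up is needed
x_raw <- read.table(text = txt, header = TRUE, check.names = FALSE)
print(names(x_raw))
```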
reshape() takes a while to get used to, just like melt/cast. Here is a solution with reshape(), assuming your data frame is called d:
reshape(d,
direction = "long",
varying = list(names(d)[3:7]),
v.names = "Value",
idvar = c("Code", "Country"),
timevar = "Year",
times = 1950:1954)
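The call above assumes d already exists. Here is a self-contained sketch, assuming the data are read with check.names = FALSE (as in the other answers) so the year columns keep their plain names. Beyond the original call, it sorts the rows by country and year and strips the thousands separators so that Value is numeric:

```r
d <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
                header = TRUE, check.names = FALSE)

long <- reshape(d,
                direction = "long",
                varying = list(names(d)[3:7]),  # stack the five year columns into one
                v.names = "Value",              # name of the stacked value column
                idvar = c("Code", "Country"),   # columns identifying each row of d
                timevar = "Year",
                times = 1950:1954)              # one time value per varying column

# Order by country, then year, and drop the "AFG.Afghanistan.1950"-style row names
long <- long[order(long$Code, long$Year), ]
rownames(long) <- NULL

# Values such as "20,249" are read as text; remove the commas to get numbers
long$Value <- as.numeric(gsub(",", "", long$Value, fixed = TRUE))
print(long)
```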
Two alternative solutions:
1) With data.table:
You can use the melt function:
library(data.table)
long <- melt(setDT(wide), id.vars = c("Code","Country"), variable.name = "year")
This gives:
> long
    Code     Country year  value
 1:  AFG Afghanistan 1950 20,249
 2:  ALB     Albania 1950  8,097
 3:  AFG Afghanistan 1951 21,352
 4:  ALB     Albania 1951  8,986
 5:  AFG Afghanistan 1952 22,532
 6:  ALB     Albania 1952 10,058
 7:  AFG Afghanistan 1953 23,557
 8:  ALB     Albania 1953 11,123
 9:  AFG Afghanistan 1954 24,555
10:  ALB     Albania 1954 12,246
Some alternative notations:
melt(setDT(wide), id.vars = 1:2, variable.name = "year")
melt(setDT(wide), measure.vars = 3:7, variable.name = "year")
melt(setDT(wide), measure.vars = as.character(1950:1954), variable.name = "year")
2) With tidyr:
Use pivot_longer():
library(tidyr)
long <- wide %>%
pivot_longer(
cols = `1950`:`1954`,
names_to = "year",
values_to = "value"
)
Notes:
- names_to and values_to default to "name" and "value", respectively, so you could write this more succinctly:
wide %>% pivot_longer(`1950`:`1954`)
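- The cols argument uses the highly flexible tidyselect DSL, so you can select the same columns using a negative selection (!c(Code, Country)), a selection helper (starts_with("19"); matches("^\\d{4}$")), numeric indices (3:7), and more.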
- tidyr::pivot_longer() is the successor to tidyr::gather() and reshape2::melt(), which are no longer under development.
Transforming the values
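Another problem with the data is that R reads the values as character values (because of the , in the numbers). You can repair that with gsub and as.numeric, either after reshaping: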
long$value <- as.numeric(gsub(",", "", long$value))
or during reshaping with data.table or tidyr:
# data.table
long <- melt(setDT(wide),
id.vars = c("Code","Country"),
variable.name = "year")[, value := as.numeric(gsub(",", "", value))]
# tidyr
long <- wide %>%
pivot_longer(
cols = `1950`:`1954`,
names_to = "year",
values_to = "value",
values_transform = ~ as.numeric(gsub(",", "", .x))
)
Data:
wide <- read.table(text="Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246", header=TRUE, check.names=FALSE)
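To see why the values need the gsub() clean-up shown above, here is a small base R check on a few values from the question (the variable names are illustrative): as.numeric() does not recognise the comma as a thousands separator and returns NA, with a coercion warning that is suppressed here.

```r
x <- c("20,249", "8,097", "10,058")

# Direct conversion fails: each string becomes NA
naive <- suppressWarnings(as.numeric(x))
print(naive)

# Removing the commas first gives the intended numbers
fixed <- as.numeric(gsub(",", "", x, fixed = TRUE))
print(fixed)
```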
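Here is another example showing the use of gather from tidyr. You can select the columns to gather either by removing them individually (as I do here) or by including the years you want explicitly.
Note that, to handle the commas (and the X added to the column names if check.names = FALSE is not set), I am also using dplyr's mutate with parse_number from readr to convert the text values back to numbers. These are all part of the tidyverse and so can be loaded together with library(tidyverse).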
library(tidyverse)
wide %>%
gather(Year, Value, -Code, -Country) %>%
mutate(Year = parse_number(Year)
, Value = parse_number(Value))
Returns:
Code Country Year Value
1 AFG Afghanistan 1950 20249
2 ALB Albania 1950 8097
3 AFG Afghanistan 1951 21352
4 ALB Albania 1951 8986
5 AFG Afghanistan 1952 22532
6 ALB Albania 1952 10058
7 AFG Afghanistan 1953 23557
8 ALB Albania 1953 11123
9 AFG Afghanistan 1954 24555
10 ALB Albania 1954 12246
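Since this question is tagged r-faq, I felt it would be useful to share another base R alternative: stack.
Note, however, that stack does not work with factors: it only works if is.vector is TRUE, and from the documentation for is.vector we find that:
is.vector returns TRUE if x is a vector of the specified mode having no attributes other than names. It returns FALSE otherwise.
I'm using the sample data from @Jaap's answer, where the values in the year columns are factors (as they were under the default stringsAsFactors = TRUE before R 4.0.0); converting them with as.character makes the approach work either way.
Here's the stack approach: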
cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))
## Code Country values ind
## 1 AFG Afghanistan 20,249 1950
## 2 ALB Albania 8,097 1950
## 3 AFG Afghanistan 21,352 1951
## 4 ALB Albania 8,986 1951
## 5 AFG Afghanistan 22,532 1952
## 6 ALB Albania 10,058 1952
## 7 AFG Afghanistan 23,557 1953
## 8 ALB Albania 11,123 1953
## 9 AFG Afghanistan 24,555 1954
## 10 ALB Albania 12,246 1954
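stack() names its output columns values and ind, and ind is a factor of the former column names. A minimal sketch of tidying that up in base R, assuming the wide data frame from @Jaap's answer (read with check.names = FALSE); the new names Year and Value simply mirror the question:

```r
wide <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246", header = TRUE, check.names = FALSE)

# wide[1:2] has 2 rows and is recycled to match the 10 stacked rows
long <- cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))

# Rename stack()'s generic columns to match the question
names(long)[names(long) == "values"] <- "Value"
names(long)[names(long) == "ind"] <- "Year"

# ind is a factor whose labels are the old column names; go via character
long$Year <- as.integer(as.character(long$Year))
# Strip the thousands separators so Value is numeric
long$Value <- as.numeric(gsub(",", "", long$Value, fixed = TRUE))
print(long)
```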
Here's a sqldf solution:
library(sqldf)
sqldf("Select Code, Country, '1950' As Year, `1950` As Value From wide
Union All
Select Code, Country, '1951' As Year, `1951` As Value From wide
Union All
Select Code, Country, '1952' As Year, `1952` As Value From wide
Union All
Select Code, Country, '1953' As Year, `1953` As Value From wide
Union All
Select Code, Country, '1954' As Year, `1954` As Value From wide;")
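Each SELECT in this query produces one block of rows for a single year column, and UNION ALL stacks the blocks without removing duplicates. The same logic can be sketched in plain base R (no sqldf needed), assuming wide was read with check.names = FALSE; year_cols and blocks are illustrative names:

```r
wide <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246", header = TRUE, check.names = FALSE)

year_cols <- tail(names(wide), -2)  # "1950" ... "1954"

# One data frame per year column, like one SELECT per year
blocks <- lapply(year_cols, function(y) {
  data.frame(Code = wide$Code, Country = wide$Country,
             Year = y, Value = wide[[y]])
})

# Stack the blocks in order, like UNION ALL
long <- do.call(rbind, blocks)
print(long)
```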
To generate the query without typing everything out, you can use the following (thanks to G. Grothendieck for the implementation):
ValCol <- tail(names(wide), -2)
s <- sprintf("Select Code, Country, '%s' As Year, `%s` As Value from wide", ValCol, ValCol)
mquery <- paste(s, collapse = "\n Union All\n")
cat(mquery) #just to show the query
#> Select Code, Country, '1950' As Year, `1950` As Value from wide
#> Union All
#> Select Code, Country, '1951' As Year, `1951` As Value from wide
#> Union All
#> Select Code, Country, '1952' As Year, `1952` As Value from wide
#> Union All
#> Select Code, Country, '1953' As Year, `1953` As Value from wide
#> Union All
#> Select Code, Country, '1954' As Year, `1954` As Value from wide
sqldf(mquery)
#> Code Country Year Value
#> 1 AFG Afghanistan 1950 20,249
#> 2 ALB Albania 1950 8,097
#> 3 AFG Afghanistan 1951 21,352
#> 4 ALB Albania 1951 8,986
#> 5 AFG Afghanistan 1952 22,532
#> 6 ALB Albania 1952 10,058
#> 7 AFG Afghanistan 1953 23,557
#> 8 ALB Albania 1953 11,123
#> 9 AFG Afghanistan 1954 24,555
#> 10 ALB Albania 1954 12,246
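Unfortunately, I don't think PIVOT and UNPIVOT work for R with SQLite. If you want to write your query in a more sophisticated manner, you can also take a look at these posts: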
With tidyr_1.0.0, another option is pivot_longer:
library(tidyr)
pivot_longer(df1, -c(Code, Country), values_to = "Value", names_to = "Year")
# A tibble: 10 x 4
# Code Country Year Value
# <fct> <fct> <chr> <fct>
# 1 AFG Afghanistan 1950 20,249
# 2 AFG Afghanistan 1951 21,352
# 3 AFG Afghanistan 1952 22,532
# 4 AFG Afghanistan 1953 23,557
# 5 AFG Afghanistan 1954 24,555
# 6 ALB Albania 1950 8,097
# 7 ALB Albania 1951 8,986
# 8 ALB Albania 1952 10,058
# 9 ALB Albania 1953 11,123
#10 ALB Albania 1954 12,246
Data
df1 <- structure(list(Code = structure(1:2, .Label = c("AFG", "ALB"), class = "factor"),
Country = structure(1:2, .Label = c("Afghanistan", "Albania"
), class = "factor"), `1950` = structure(1:2, .Label = c("20,249",
"8,097"), class = "factor"), `1951` = structure(1:2, .Label = c("21,352",
"8,986"), class = "factor"), `1952` = structure(2:1, .Label = c("10,058",
"22,532"), class = "factor"), `1953` = structure(2:1, .Label = c("11,123",
"23,557"), class = "factor"), `1954` = structure(2:1, .Label = c("12,246",
"24,555"), class = "factor")), class = "data.frame", row.names = c(NA,
-2L))
You can also use the cdata package, which uses the concept of a (transformation) control table:
# data
wide <- read.table(text="Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246", header=TRUE, check.names=FALSE)
library(cdata)
# build control table
drec <- data.frame(
Year=as.character(1950:1954),
Value=as.character(1950:1954),
stringsAsFactors=FALSE
)
drec <- cdata::rowrecs_to_blocks_spec(drec, recordKeys=c("Code", "Country"))
# apply control table
cdata::layout_by(drec, wide)
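I am currently exploring this package and find it quite accessible. It is designed for much more complicated transformations and includes the back-transformation. There is a tutorial available.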