提问人:Carlos Garibotto 提问时间:6/28/2022 更新时间:6/28/2022 访问量:109
我需要创建一个地图(或循环)函数,以便能够从一个日期列表中保存数据
I need to create a map (or loop) function to be able to save data from one list of dates
问:
我根据此链接提供的信息创建了以下脚本: 使用 javascript 提取数据 URL(php 中的表)
code
library(httr) library(rvest) library(janitor) library(dplyr) library(purrr)
headers <- c("Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8")
data <- "vid_tipo=1&vprod=&vvari=&vfecha=22/06/2022"
for (i in seq_along(fechas)) {
r <- httr::POST(
url = "http://old.emmsa.com.pe/emmsa_spv/app/reportes/ajax/rpt07_gettable.php",
httr::add_headers(.headers = headers),
body = data
)
t <- content(r) %>%
html_element(".timecard") %>%
html_table() %>%
row_to_names(1) %>%
clean_names() %>%
dplyr::filter(producto != "") %>%
mutate_at(vars(matches("precio")), as.numeric) %>%
as_tibble() -> precios
timestamp <- 1:seq_along(i)
filename <- paste0("c:/Users/.../Desktop/data/precios_",timestamp,".rds")
saveRDS(precios, file = filename)
}```
My problem is that this sequence that I have created by looking at other links in this page has not allowed me to obtain the following results:
1.-Scrape the page according to the sequence of dates;
2.- include the date in the file name such as "data_22-06-2022";
3.- I don't know how to link the date of the variable
`data <- "vid_tipo=1&vprod=&vvari=&vfecha=22/06/2022"`
with the sequence of one file for each date;
4.- Any improvements to the file download and save function are welcome.
Thank you
答:
1赞
Chamkrai
6/28/2022
#1
最大的问题可能是日期格式。在这里,我曾经保存了多个文件。map
.RData
library(tidyverse)
library(lubridate)
library(janitor)
library(httr)
将 和 更改为您喜欢的任何内容from_date
to_date
"22/06/2022" %>%
as.Date(format = "%d/%m/%Y") -> from_date
"26/06/2022" %>%
as.Date(format = "%d/%m/%Y") -> to_date
dates_formatted <- seq(from_date, to_date, by = "day") %>%
format("%d/%m/%Y")
[1] "22/06/2022" "23/06/2022" "24/06/2022" "25/06/2022" "26/06/2022"
创建一个函数来获取一个日期的数据框
get_df <- function(the_date) {
headers <-
c("Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8")
data <- paste0("vid_tipo=1&vprod=&vvari=&vfecha=", the_date)
r <-
httr::POST(url = "http://old.emmsa.com.pe/emmsa_spv/app/reportes/ajax/rpt07_gettable.php",
httr::add_headers(.headers = headers),
body = data)
df <- content(r) %>%
html_element(".timecard") %>%
html_table() %>%
row_to_names(1) %>%
clean_names() %>%
dplyr::filter(producto != "") %>%
mutate_at(vars(matches("precio")), as.numeric) %>%
as_tibble()
save(df, file = paste0("precios_", the_date %>%
str_replace_all(pattern = "/",
replacement = "_") %>%
paste0("data_", .), ".Rdata"))
}
# A tibble: 144 x 5
producto variedad precio_min precio_max precio_prom
<chr> <chr> <dbl> <dbl> <dbl>
1 ACELGA ACELGA 3.5 4 3.75
2 AJI AJI AMARILLO SECO 13 14 13.4
3 AJI AJI ESCABECHE FRESCO/ZANAHOR/LISO 1.5 2.7 2.18
4 AJI AJI MONTANA/CHAN(COSTA/SELVA) 5 8 6.5
5 AJI AJI SECO PANCA 18 20 18.8
6 AJI AJI ROCOTO (COSTA/SIERRA/SELVA) 9.44 11.1 10.3
7 AJI PAPRIKA 13 14 13.5
8 AJO AJO PELADO 5.5 7.5 6.63
9 AJO AJO CRIOLLO O NAPURI 6 8 6.88
10 AJO AJO MORADO/BARRAN/LEGIT/OTROS 6.5 8 7.25
# ... with 134 more rows
绘制日期地图
map(dates_formatted, get_df)
library(tidyverse)
library(lubridate)
library(janitor)
library(httr)
"22/06/2022" %>%
as.Date(format = "%d/%m/%Y") -> from_date
"26/06/2022" %>%
as.Date(format = "%d/%m/%Y") -> to_date
dates_formatted <- seq(from_date, to_date, by = "day") %>%
format("%d/%m/%Y")
get_df <- function(the_date) {
headers <-
c("Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8")
data <- paste0("vid_tipo=1&vprod=&vvari=&vfecha=", the_date)
r <-
httr::POST(url = "http://old.emmsa.com.pe/emmsa_spv/app/reportes/ajax/rpt07_gettable.php",
httr::add_headers(.headers = headers),
body = data)
df <- content(r) %>%
html_element(".timecard") %>%
html_table() %>%
row_to_names(1) %>%
clean_names() %>%
dplyr::filter(producto != "") %>%
mutate_at(vars(matches("precio")), as.numeric) %>%
as_tibble()
save(df, file = paste0("precios_", the_date %>%
str_replace_all(pattern = "/",
replacement = "_") %>%
paste0("data_", .), ".Rdata"))
}
map(dates_formatted, get_df)
评论
0赞
Carlos Garibotto
6/28/2022
乔尔,我认为您回答我的脚本中尚未定义日期变量,这导致当我按块运行函数时,我替换了 ''' 标头 <- c(“Content-Type” = “application/x-www-form-urlencoded;charset=UTF-8“) data <- paste0(”vid_type=1&vprod=&vvari=&vdate=“, the_date)''' by ,如果它运行并输出 df 作为环境输出表。但是,如果我在脚本末尾继续执行相同的替换,则结果表将不再另存为我的工作目录中的文件。谢谢你的帮助dates_formatted
1赞
Chamkrai
6/28/2022
@CarlosGaribotto我粘贴了可以一次性运行的整个代码。你能试试吗?代码对我来说工作正常,所有数据都正确保存。
0赞
Carlos Garibotto
6/28/2022
我把放在get_df功能之前。在脚本的最后一个块中,替换的结果是 gzfile(file, “wb”) 中的错误:无效的“description”参数 此外: 警告消息:In if (!nzchar(file)) stop(“'file' must be non-empty string”) :条件长度> 1,并且仅使用第一个元素 此外,以前我运行了所有代码,但文件没有保存在我的工作目录中 使用 w10 RStudio Rthe_date <- dates_formatted
the_date by dates_formatted
1赞
Chamkrai
6/28/2022
请不要那样做。 在函数中使用,不应馈入其本身。date_formatted
map
get_df()
0赞
Carlos Garibotto
6/28/2022
@tom_hoel .谢谢!!你是对的,代码必须一次性运行。请原谅导致我犯错的不信任。感谢您的帮助
评论