我需要创建一个地图(或循环)函数,以便能够从一个日期列表中保存数据

I need to create a map (or loop) function to be able to save data from one list of dates

提问人:Carlos Garibotto 提问时间:6/28/2022 更新时间:6/28/2022 访问量:109

问:

我根据此链接提供的信息创建了以下脚本: 使用 javascript 提取数据 URL(php 中的表)

code library(httr) library(rvest) library(janitor) library(dplyr) library(purrr)


headers <- c("Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8")

data <- "vid_tipo=1&vprod=&vvari=&vfecha=22/06/2022"

for (i in seq_along(fechas)) {
  
    r <- httr::POST(
    url = "http://old.emmsa.com.pe/emmsa_spv/app/reportes/ajax/rpt07_gettable.php",
    httr::add_headers(.headers = headers),
    body = data
  )
  
  t <- content(r) %>%
    html_element(".timecard") %>%
    html_table() %>%
    row_to_names(1) %>%
    clean_names() %>%
    dplyr::filter(producto != "") %>%
    mutate_at(vars(matches("precio")), as.numeric) %>% 
    as_tibble() -> precios
  
    timestamp <- 1:seq_along(i) 
    filename  <- paste0("c:/Users/.../Desktop/data/precios_",timestamp,".rds")
    saveRDS(precios, file = filename)
    }```

My problem is that this sequence that I have created by looking at other links in this page has not allowed me to obtain the following results:

1.-Scrape the page according to the sequence of dates;
2.- include the date in the file name such as "data_22-06-2022";
3.- I don't know how to link the date of the variable
 `data <- "vid_tipo=1&vprod=&vvari=&vfecha=22/06/2022"` 
with the sequence of one file for each date;
4.- Any improvements to the file download and save function are welcome.

Thank you

R 循环日期 下载 序列

评论

0赞 Carlos Garibotto 6/28/2022
dates (“fechas”) 变量位于标头行代码 ''' fechas <- list(seq(as.日期(“2022/6/22”),作为。Date(“2022/6/26”), “天”)''' @QHarr

答:

1赞 Chamkrai 6/28/2022 #1

最大的问题可能是日期格式。在这里,我曾经保存了多个文件。map.RData

library(tidyverse)
library(lubridate)
library(janitor)
library(httr)

将 和 更改为您喜欢的任何内容from_dateto_date

"22/06/2022" %>%
  as.Date(format = "%d/%m/%Y") -> from_date

"26/06/2022" %>%
  as.Date(format = "%d/%m/%Y") -> to_date

dates_formatted <- seq(from_date, to_date, by = "day") %>%
  format("%d/%m/%Y")

[1] "22/06/2022" "23/06/2022" "24/06/2022" "25/06/2022" "26/06/2022"

创建一个函数来获取一个日期的数据框

get_df <- function(the_date) {
  headers <-
    c("Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8")
  data <- paste0("vid_tipo=1&vprod=&vvari=&vfecha=", the_date)
  
  r <-
    httr::POST(url = "http://old.emmsa.com.pe/emmsa_spv/app/reportes/ajax/rpt07_gettable.php",
               httr::add_headers(.headers = headers),
               body = data)
  
  df <- content(r) %>%
    html_element(".timecard") %>%
    html_table() %>%
    row_to_names(1) %>%
    clean_names() %>%
    dplyr::filter(producto != "") %>%
    mutate_at(vars(matches("precio")), as.numeric) %>%
    as_tibble()
  
  save(df, file = paste0("precios_", the_date %>% 
                              str_replace_all(pattern = "/", 
                                          replacement = "_") %>% 
                              paste0("data_", .), ".Rdata"))
}

# A tibble: 144 x 5
   producto variedad                          precio_min precio_max precio_prom
   <chr>    <chr>                                  <dbl>      <dbl>       <dbl>
 1 ACELGA   ACELGA                                  3.5         4          3.75
 2 AJI      AJI AMARILLO SECO                      13          14         13.4 
 3 AJI      AJI ESCABECHE FRESCO/ZANAHOR/LISO       1.5         2.7        2.18
 4 AJI      AJI MONTANA/CHAN(COSTA/SELVA)           5           8          6.5 
 5 AJI      AJI SECO PANCA                         18          20         18.8 
 6 AJI      AJI ROCOTO (COSTA/SIERRA/SELVA)         9.44       11.1       10.3 
 7 AJI      PAPRIKA                                13          14         13.5 
 8 AJO      AJO PELADO                              5.5         7.5        6.63
 9 AJO      AJO CRIOLLO O NAPURI                    6           8          6.88
10 AJO      AJO MORADO/BARRAN/LEGIT/OTROS           6.5         8          7.25
# ... with 134 more rows

绘制日期地图

map(dates_formatted, get_df)

enter image description here

library(tidyverse)
library(lubridate)
library(janitor)
library(httr)


"22/06/2022" %>%
  as.Date(format = "%d/%m/%Y") -> from_date

"26/06/2022" %>%
  as.Date(format = "%d/%m/%Y") -> to_date

dates_formatted <- seq(from_date, to_date, by = "day") %>%
  format("%d/%m/%Y")


get_df <- function(the_date) {
  headers <-
    c("Content-Type" = "application/x-www-form-urlencoded; charset=UTF-8")
  data <- paste0("vid_tipo=1&vprod=&vvari=&vfecha=", the_date)
  
  r <-
    httr::POST(url = "http://old.emmsa.com.pe/emmsa_spv/app/reportes/ajax/rpt07_gettable.php",
               httr::add_headers(.headers = headers),
               body = data)
  
  df <- content(r) %>%
    html_element(".timecard") %>%
    html_table() %>%
    row_to_names(1) %>%
    clean_names() %>%
    dplyr::filter(producto != "") %>%
    mutate_at(vars(matches("precio")), as.numeric) %>%
    as_tibble()
  
  save(df, file = paste0("precios_", the_date %>% 
                              str_replace_all(pattern = "/", 
                                          replacement = "_") %>% 
                              paste0("data_", .), ".Rdata"))
}

map(dates_formatted, get_df)

评论

0赞 Carlos Garibotto 6/28/2022
乔尔,我认为您回答我的脚本中尚未定义日期变量,这导致当我按块运行函数时,我替换了 ''' 标头 <- c(“Content-Type” = “application/x-www-form-urlencoded;charset=UTF-8“) data <- paste0(”vid_type=1&vprod=&vvari=&vdate=“, the_date)''' by ,如果它运行并输出 df 作为环境输出表。但是,如果我在脚本末尾继续执行相同的替换,则结果表将不再另存为我的工作目录中的文件。谢谢你的帮助dates_formatted
1赞 Chamkrai 6/28/2022
@CarlosGaribotto我粘贴了可以一次性运行的整个代码。你能试试吗?代码对我来说工作正常,所有数据都正确保存。
0赞 Carlos Garibotto 6/28/2022
我把放在get_df功能之前。在脚本的最后一个块中,替换的结果是 gzfile(file, “wb”) 中的错误:无效的“description”参数 此外: 警告消息:In if (!nzchar(file)) stop(“'file' must be non-empty string”) :条件长度> 1,并且仅使用第一个元素 此外,以前我运行了所有代码,但文件没有保存在我的工作目录中 使用 w10 RStudio Rthe_date <- dates_formattedthe_date by dates_formatted
1赞 Chamkrai 6/28/2022
请不要那样做。 在函数中使用,不应馈入其本身。date_formattedmapget_df()
0赞 Carlos Garibotto 6/28/2022
@tom_hoel .谢谢!!你是对的,代码必须一次性运行。请原谅导致我犯错的不信任。感谢您的帮助