Asked by: Jorge Martínez Asked: 7/3/2020 Last edited by: Jorge Martínez Updated: 7/5/2020 Views: 45
Trying to parse webpage, subscript out of bounds
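Q:
I am trying to extract information from the website coches.net (a site for buying cars), but I am having trouble with some code I found while browsing. To be clear, I have no coding experience, so I am lost. I have tried several things but cannot get it to work.
The error message R gives me is this: Error in str_split(string = titulo, pattern = " ")[[1]] : subíndice fuera de los límites. Translated, it means "subscript out of bounds".
While looking for a solution I found this: https://stackoverrun.com/es/q/4074347. According to it, the problem is related to the number of rows/columns that my table creates for the information I am downloading. However, I cannot figure out the solution.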
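A minimal sketch of how this error can arise, assuming titulo comes back empty because the CSS selector matched nothing on a given card (the empty value below is a hypothetical stand-in, not real scraped data). Splitting an empty vector gives an empty list, so asking for its first element fails:

```r
library(stringr)

# Hypothetical stand-in for a title whose selector matched no node
titulo <- character(0)

partes <- str_split(string = titulo, pattern = " ")
length(partes)   # the list is empty, so partes[[1]] would be out of bounds

# Guarded version: fall back to NA when there is nothing to split
primera_palabra <- if (length(partes) > 0) partes[[1]][1] else NA_character_
```

The guard only avoids the crash. It does not explain why the selector returned nothing for that card.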
The full code is as follows (edit V1, after removing "Marca"):
start <- Sys.time()
list.of.packages <- c("tidyverse", "rvest", "httr")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)>0) {install.packages(new.packages)}
library(tidyverse)
library(rvest)
library(httr)
desktop_agents <- c('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/602.2.14 (KHTML, like Gecko) Version/10.0.1 Safari/602.2.14',
'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.98 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36',
'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36',
'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0')
line <- data.frame("Titulo", "Precio", "Provincia", "Motor", "Año", "Kilometros", "Fecha subida","Link")
write.table(line, file = ruta, sep = ",", append = TRUE, quote = TRUE, col.names = FALSE, row.names = FALSE, na = "")
for (counter in (1:paginas)) {
  url <- paste0("https://www.coches.net/segunda-mano/?pg=", as.character(counter))
  print(url)
  x <- GET(url, add_headers('user-agent' = desktop_agents[sample(1:10, 1)]))
  bloque <- x %>% read_html() %>% html_nodes(".mt-Card-body")
  for (p in (1:length(bloque))) {
    titulo <- bloque[p] %>% html_nodes(".mt-CardAd-title .mt-CardAd-titleHiglight") %>% html_text()
    precio <- bloque[p] %>% html_nodes(".mt-CardAd-price .mt-CardAd-titleHiglight") %>% html_text()
    precio <- str_replace(string = precio, pattern = " €", replacement = "")
    precio <- str_replace(string = precio, pattern = "\\.", replacement = "")
    precio <- as.numeric(precio)
    info <- bloque[p] %>% html_nodes(".mt-CardAd-attribute") %>% html_text()
    prov <- info[1]
    motor <- info[2]
    año <- info[3]
    km <- info[4]
    km <- str_replace(string = km, pattern = "\\.", replacement = "")
    km <- as.numeric((str_replace(string = km, pattern = " km", replacement = "")))
    fechasubida <- bloque[p] %>% html_nodes(".mt-CardAdDate-time") %>% html_text()
    link <- bloque[p] %>% html_nodes(".mt-CardAd-link") %>% html_attr(name = "href")
    link <- paste0("https://www.coches.net", link[1])
    print(paste(titulo, precio, prov, motor, año, km, fechasubida, link))
    line <- data.frame(titulo, precio, prov, motor, año, km, fechasubida, link)
    write.table(line, file = ruta, sep = ",", append = TRUE, quote = TRUE, col.names = FALSE, row.names = FALSE, na = "")
  }
}
end <- Sys.time()
diff <- end - start
print(paste("Cochisto ha descargado el 100% de los anuncios en", diff))
}
Any help would be greatly appreciated.
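One thing that may be worth checking is the difference between html_nodes() and html_node() in rvest. The sketch below uses made-up markup (the class names card and precio are hypothetical and not taken from coches.net) to show that html_nodes() returns nothing for a card without a match, while html_node() returns one result per card and gives NA when the element is missing:

```r
library(rvest)

# Hypothetical markup: the second card has no price element
pagina <- read_html('<div class="card"><span class="precio">12500</span></div>
<div class="card"></div>')
cards <- html_nodes(pagina, ".card")

# html_nodes() on the second card matches nothing, so the text is character(0)
vacio <- html_text(html_nodes(cards[2], ".precio"))

# html_node() keeps one entry per card and uses NA for the missing one
precios <- html_text(html_node(cards, ".precio"))
```

With html_node(), every field has exactly one value per card, which is what data.frame() needs to build a single row.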
A: No answers yet
Comments
Error in data.frame(titulo, precio, prov, motor, año, km, fechasubida, : arguments imply differing number of rows: 0, 1
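This message suggests that at least one of the variables was empty (length 0) while the others had one value. A minimal sketch of a possible workaround, assuming that is the cause; first_or_na is a hypothetical helper that is not part of the original script, and the values are invented:

```r
# Hypothetical helper: return the first element of x, or NA if x is empty,
# so that every field passed to data.frame() has exactly length 1.
first_or_na <- function(x) if (length(x) == 0) NA else x[1]

titulo <- character(0)   # what html_text() gives when no node matched
precio <- 12500          # invented value for illustration

# data.frame(titulo, precio) would stop here with
# "arguments imply differing number of rows: 0, 1"
line <- data.frame(titulo = first_or_na(titulo), precio = first_or_na(precio))
nrow(line)
```

Since the script writes with na = "", a missing title would then be written as an empty field instead of stopping the loop.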
Error durante el wrapup: regular expression is invalid UTF-8
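This second message may come from a pattern that contains a non-ASCII character, such as " €", in a script saved in an encoding other than UTF-8. That is an assumption, not something confirmed by the question. A sketch that avoids non-ASCII patterns altogether by stripping everything that is not a digit:

```r
library(stringr)

# Same text as "12.500 €", with the euro sign written as a Unicode escape
precio <- "12.500 \u20ac"
km <- "1.234.567 km"

# str_replace_all() removes every non-digit, so no " €" or " km" pattern
# is needed and all thousands separators are dropped, not only the first
precio_num <- as.numeric(str_replace_all(precio, "[^0-9]", ""))
km_num <- as.numeric(str_replace_all(km, "[^0-9]", ""))
```

Note that str_replace() in the original code replaces only the first match, so a value with two dots would keep the second one.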