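This problem only started recently.
Even with code that previously ran without issues, read_html now returns garbled text when reading the page.
I am using the Dangdang URL borrowed from another thread. I also tried the utf8 character set, even though the page is clearly not UTF-8.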
library(rvest)
library(dplyr)
# url holds the Dangdang page address mentioned above
read_html(url, encoding = "gbk") %>% html_nodes(".name") %>% html_text()
My sessionInfo() output is below. I also tried switching the locale with Sys.setlocale("LC_ALL","english"), but that made no difference.
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
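In case it helps narrow things down, here is a minimal offline sketch of a check that separates decoding from console display. It is not taken from the failing script: the four bytes and the names gbk_bytes, decoded, and code_points are illustrative assumptions, chosen because the bytes are the GBK encoding of a known two-character string. If the code points come out right but printing the string still looks garbled, the console or locale would seem the more likely culprit than the decoding step.

```r
# Hypothetical test input: the two characters U+4E2D U+6587 encoded as GBK.
gbk_bytes <- as.raw(c(0xD6, 0xD0, 0xCE, 0xC4))

# Convert the raw GBK bytes to a UTF-8 string.
# iconv() accepts a list of raw vectors as input.
decoded <- iconv(list(gbk_bytes), from = "GBK", to = "UTF-8")

# Inspect Unicode code points instead of printing the string itself,
# so the console locale cannot garble what we look at.
code_points <- utf8ToInt(decoded)
print(sprintf("U+%04X", code_points))
```

The same idea could be applied to the scraped result by calling utf8ToInt() on one element of the html_text() output, assuming that element is valid UTF-8.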
Thanks in advance for any pointers.