My plan is to use a random forest to classify whether each Titanic passenger survived.
Data source:
Kaggle-Titanic: Machine Learning from Disaster
head(titanic)
  PassengerId Survived Pclass    Sex Age SibSp Parch    Fare Embarked
1           1        0      3   male  22     1     0  7.2500        S
2           2        1      1 female  38     1     0 71.2833        C
3           3        1      3 female  26     0     0  7.9250        S
4           4        1      1 female  35     1     0 53.1000        S
5           5        0      3   male  35     0     0  8.0500        S
6           6        0      3   male  NA     0     0  8.4583        Q
Then I called the randomForest() function from the randomForest package to build the model, and R immediately threw an error:
Error in na.fail.default(list(Survived = c(1L, 2L, 2L, 2L, 1L, 1L, 1L, :
missing values in object
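# Data preparation: roughly 70% training set, 30% test set
titanic <- read.csv("train.csv", sep = ",")
titanic$Survived <- factor(titanic$Survived)
set.seed(1234)
# Assign each row to group 1 (train) or 2 (test); use nrow(titanic), since no object named `train` exists
index <- sample(2, nrow(titanic), replace = TRUE, prob = c(0.7, 0.3))
titanic_train <- titanic[index == 1, ]
titanic_test <- titanic[index == 2, ]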
# Load the randomForest package and fit the model
library(randomForest)
titanic_rf <- randomForest(Survived~Pclass+Sex+Age+SibSp+Parch,
data=titanic_train)
Error in na.fail.default(list(Survived = c(1L, 2L, 2L, 2L, 1L, 1L, 1L, :
missing values in object
Help needed: has anyone else run into this?
=======================================================================
Updated 2017-06-04 14:21
I had overlooked the missing values in the dataset, which is what caused this basic mistake :cry: :cry:
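A quick way to confirm which columns contain missing values is to count the NAs per column. The sketch below is only an illustration: `toy` is a hypothetical data frame typed in from the six head(titanic) rows shown earlier, not the full train.csv.

```r
# Hypothetical toy data frame copied from the head(titanic) rows (Age is NA in row 6)
toy <- data.frame(
  Survived = factor(c(0, 1, 1, 1, 0, 0)),
  Pclass   = c(3, 1, 3, 1, 3, 3),
  Sex      = factor(c("male", "female", "female", "female", "male", "male")),
  Age      = c(22, 38, 26, 35, 35, NA),
  SibSp    = c(1, 1, 0, 1, 0, 0),
  Parch    = c(0, 0, 0, 0, 0, 0)
)

# is.na() gives a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Running the same `colSums(is.na(...))` call on titanic_train should show whether Age is the only predictor in the formula with missing values.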
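I then passed na.action = na.roughfix, which replaces missing values in numeric columns with the column median (and in factor columns with the most frequent level):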
titanic_rf <- randomForest(Survived~Pclass+Sex+Age+SibSp+Parch,
data=titanic_train,
na.action = na.roughfix)
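For readers who want to see what median imputation does on numeric columns, here is a minimal base-R sketch. `impute_median` is a hypothetical helper written for this illustration, not part of randomForest, and `toy` again just reuses the Age and Fare values from the head(titanic) rows.

```r
# Hypothetical helper: fill NAs in every numeric column with that column's median
impute_median <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      med <- median(df[[col]], na.rm = TRUE)
      df[[col]][is.na(df[[col]])] <- med
    }
  }
  df
}

# Toy data taken from the six rows printed by head(titanic)
toy <- data.frame(
  Age  = c(22, 38, 26, 35, 35, NA),
  Fare = c(7.25, 71.2833, 7.925, 53.1, 8.05, 8.4583)
)

fixed <- impute_median(toy)
print(fixed)
```

The held-out titanic_test presumably contains missing Age values as well, so it would likely need the same treatment before calling predict().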