
Author	Message
ljuster2 Posts: 19	Posted 19:37 Jan 28, 2015 \| I am trying to find the indices that have null values for Age in my test data, and I am using pandas to read in my data. In the test.csv file, the ages are not whole integers like they are in the train.csv file. Ex.: test.csv has age '23.5' or '67.0', while train.csv has '23' or '67'. When I run the following line: test_data.isnull(test_data.Age).astype(int) I get the error: TypeError: isnull() takes exactly 1 argument (2 given). I do NOT get this error when I run the same line with my training data. I couldn't figure out why. Any thoughts? Thanks.
ljuster2 Posts: 19	Posted 08:31 Jan 29, 2015 \| Found my own silly error. I imported pandas as 'pd', so isnull has to be called on pd, not on the DataFrame. The line should be: test_data['AgeIsNull'] = pd.isnull(td.Age).astype(int) and NOT: test_data['AgeIsNull'] = test_data.isnull(td.Age).astype(int)
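For anyone hitting the same TypeError, here is a minimal sketch of the difference between the two calls. The small DataFrame is a hypothetical stand-in for test.csv (the real file is not shown in this thread), and the variable names null_idx and age_is_null are made up for illustration. The module-level pd.isnull takes the object to check as its argument, while the isnull method is called on the DataFrame or column itself and takes no extra argument, which is why passing test_data.Age to test_data.isnull fails.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test.csv; only the Age column matters here.
test_data = pd.DataFrame({"Age": [23.5, np.nan, 67.0, np.nan]})

# Module-level function: the object to check is passed as the argument.
test_data["AgeIsNull"] = pd.isnull(test_data["Age"]).astype(int)

# Equivalent method form: called on the column itself, with no argument.
age_is_null = test_data["Age"].isnull().astype(int)

# Index labels of the rows where Age is missing.
null_idx = test_data.index[test_data["Age"].isnull()].tolist()
print(null_idx)  # [1, 3]
```

Either form gives the same 0/1 flag column; the boolean mask passed to test_data.index is what answers the original question about which indices have a null Age.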
ljuster2 Posts: 19	Posted 12:56 Jan 30, 2015 \| Question on the extra credit: How are we supposed to use the test set on the model if the test set does not include information on whether the test examples survived or not?
msargent Posts: 519	Posted 13:20 Jan 30, 2015 \| Good question. Try this: divide the training set up into 2 sections: train on one, test on the other.
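One possible way to do that split with pandas is sketched below. The toy DataFrame is a hypothetical stand-in for train.csv, and the 70/30 ratio, the seed, and the names fit_part and holdout_part are assumptions for illustration, not something specified in this thread.

```python
import pandas as pd

# Hypothetical stand-in for train.csv: made-up rows with a known label.
train = pd.DataFrame({
    "Age": [22, 38, 26, 35, 35, 54, 2, 27, 14, 4],
    "Survived": [0, 1, 1, 1, 0, 0, 0, 1, 1, 1],
})

# Randomly pick 70% of the rows to fit the model on (fixed seed so the
# split is repeatable).
fit_part = train.sample(frac=0.7, random_state=0)

# The remaining rows are held out; their Survived values are known, so
# predictions on them can be scored.
holdout_part = train.drop(fit_part.index)

print(len(fit_part), len(holdout_part))
```

Because the held-out rows still carry the Survived column, the model's predictions on them can be compared against the true labels, which is not possible with the unlabeled test set.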