Author Message
ljuster2
Posts: 19
Posted 19:37 Jan 28, 2015 |

I am trying to find the indices that have null values for Age in my test data. I am using pandas to read in the data.

In the test.csv file, the ages are not whole integers like they are in the train.csv file.

For example, test.csv has ages like '23.5' or '67.0', while train.csv has '23' or '67'.

When I run the following line:

test_data.isnull(test_data.Age).astype(int)

I get the error: TypeError: isnull() takes exactly 1 argument (2 given)

I do NOT get this error when I run the same line with my training data.

I couldn't figure out why. Any thoughts?

 

Thanks

 

 

ljuster2
Posts: 19
Posted 08:31 Jan 29, 2015 |

Found my own silly error.

I imported pandas as 'pd', so isnull should be called as pd.isnull, with the column passed as its argument:

 

test_data['AgeIsNull'] = pd.isnull(test_data.Age).astype(int)

NOT

test_data['AgeIsNull'] = test_data.isnull(test_data.Age).astype(int)
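
For anyone hitting the same TypeError, here is a minimal sketch of why the two calls differ. The small DataFrame is a made-up stand-in for test.csv, and names such as null_age_indices are only for illustration. DataFrame.isnull() is a method that takes no argument besides the frame itself, so passing a column to it counts as a second argument, while pd.isnull() is a function that takes the object to check.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test.csv; the real file has more columns.
test_data = pd.DataFrame({"Age": [23.5, np.nan, 67.0, np.nan]})

# pd.isnull is a module-level function: pass it the column to check.
test_data["AgeIsNull"] = pd.isnull(test_data.Age).astype(int)

# Equivalent method form: call isnull() on the column, with no argument.
age_is_null_method = test_data.Age.isnull().astype(int)

# Index labels of the rows where Age is missing.
null_age_indices = test_data.index[test_data.Age.isnull()].tolist()
```

The decimal ages in test.csv are not related to the error; the call form is what matters.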

ljuster2
Posts: 19
Posted 12:56 Jan 30, 2015 |

Question on the extra credit: How are we supposed to use the test set on the model if the test set does not include information on whether the test examples survived or not?

msargent
Posts: 519
Posted 13:20 Jan 30, 2015 |

Good question. Try this: divide the training set into two sections, then train on one and test on the other. Both sections come from the training set, so you know whether each example survived.
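
One possible way to do that split with pandas is sketched below. The rows are invented for illustration, the 80/20 fraction is an arbitrary choice, and the names train_part and holdout_part are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for train.csv: a few labeled rows.
train_data = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0, 27.0, 14.0, 4.0, 58.0],
    "Survived": [0, 1, 1, 1, 0, 0, 1, 1, 1, 1],
})

# Randomly pick 80% of the labeled rows to train on.
# random_state makes the split repeatable.
train_part = train_data.sample(frac=0.8, random_state=0)

# The remaining labeled rows form a held-out set with known answers.
holdout_part = train_data.drop(train_part.index)
```

Fit the model on train_part, predict on holdout_part, and compare the predictions with holdout_part's Survived column.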