Author Message
lmann2
Posts: 156
Posted 11:18 Jan 30, 2016 |

Sorry, I was out sick last week, and it's unclear to me which data set from 'adult' we should be using. Are we using adult.data or adult.test?

 

vsluong4
Posts: 87
Posted 11:22 Jan 30, 2016 |

"If you choose the first data set, crx.data has the data in csv format. Save it as a csv file. If you choose the second, adult.data has the data, save it as a csv file. The adult.names has information about the set. "

lmann2
Posts: 156
Posted 12:00 Jan 30, 2016 |

Cool, missed that line of data.

 

Can you disambiguate these two steps:

2 pts Copy all columns with empty values and replace all empty numeric fields with the average of its column in this new column.

2 pts Convert non-numeric columns to columns with an integer representations. 

rkmx52
Posts: 23
Posted 12:25 Jan 30, 2016 |

I feel that this step:

2 pts Copy all columns with empty values and replace all empty numeric fields with the average of its column in this new column.

doesn't apply to the second dataset (adult.data). All the empty values in that dataset belong to non-numeric columns.

In that scenario, I am not sure how we are supposed to proceed at this step. I simply skipped it for the time being; eventually I converted the non-numeric columns to an integer representation and replaced the empty values with the (integer) average of their column.

lmann2
Posts: 156
Posted 13:20 Jan 30, 2016 |
rkmx52 wrote:

I feel that this step:

2 pts Copy all columns with empty values and replace all empty numeric fields with the average of its column in this new column.

doesn't apply to the second dataset (adult.data). All the empty values in that dataset belong to non-numeric columns.

I applied the isnull function to each column using a lambda function, and it shows that none of the cells are null.
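One possible explanation for isnull finding nothing: if the missing cells are stored as the literal string '?', pandas reads them as ordinary text. Here is a minimal sketch of that effect. The three sample rows and the column names are made up for illustration (they are not copied from adult.data), and it assumes the file uses '?' as its missing-value marker with a space after each comma.

```python
import io
import pandas as pd

# Hypothetical miniature sample in the style of adult.data;
# rows and column names are invented for illustration.
raw = "39, State-gov, United-States\n50, ?, United-States\n38, Private, ?\n"
cols = ["age", "workclass", "native_country"]

# Naive read: '?' is kept as an ordinary string, so nothing counts as null.
naive = pd.read_csv(io.StringIO(raw), header=None, names=cols,
                    skipinitialspace=True)
naive_nulls = int(naive.isnull().sum().sum())

# Tell pandas that '?' means missing. skipinitialspace drops the blank
# after each comma so the marker matches exactly.
marked = pd.read_csv(io.StringIO(raw), header=None, names=cols,
                     skipinitialspace=True, na_values="?")
null_counts = marked.isnull().sum()

print(naive_nulls)   # total nulls seen by the naive read
print(null_counts)   # nulls per column once '?' is treated as missing
```

If the real file behaves the same way, the per-column counts from the second read show which columns actually have empty values.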

In that scenario, I am not sure how we are supposed to proceed at this step. I simply skipped it for the time being; eventually I converted the non-numeric columns to an integer representation and replaced the empty values with the (integer) average of their column.

I guess this still doesn't make sense to me. In adult.data there are cells with '?' values, which I assume are what the instructor is referring to. One non-numeric column where this occurs is native-country. What average are we supposed to substitute for '?'? The average of a column of numbers makes sense, e.g. (1+2+3+4)/4, but the "average" of a column of countries should at the very least be based on frequency, right? I still really don't understand these two steps at all. I was hoping he clarified this in class, but maybe not.
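For what it's worth, here is a rough sketch of the frequency idea: fill numeric gaps with the column mean and non-numeric gaps with the most frequent value (the mode). This is only one common convention, not necessarily what the assignment intends, and the tiny DataFrame below is made up for illustration.

```python
import pandas as pd

# Made-up sample; None stands for a cell that was '?' in the raw file.
df = pd.DataFrame({
    "age": [20.0, 30.0, None, 40.0],
    "native_country": ["United-States", None, "United-States", "Mexico"],
})

# Work on a copy so the original columns stay untouched.
filled = df.copy()
for col in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[col]):
        # Numeric column: replace missing cells with the column mean.
        filled[col] = filled[col].fillna(filled[col].mean())
    else:
        # Non-numeric column: replace missing cells with the most
        # frequent value (mode() ignores missing cells by default).
        filled[col] = filled[col].fillna(filled[col].mode().iloc[0])

print(filled)
```

Averaging integer codes for a column like native-country would instead produce a number that may not correspond to any country at all, which is why the mode seems like the more defensible choice for categories.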

 

Last edited by lmann2 at 13:22 Jan 30, 2016.
lmann2
Posts: 156
Posted 16:27 Jan 30, 2016 |

Just for the record, without a response this assignment is impossible to complete.

lmann2
Posts: 156
Posted 17:01 Jan 30, 2016 |

I want to stress again that this step doesn't make sense, unless the pandas library has a function I haven't found:

Convert non-numeric columns to columns with an integer representations. 

Does that mean you want us to write a dictionary and convert each unique string to a numerical value (this is a lot of work)? Does this mean you simply want to change strings to integers? What does this mean?
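On the "is there a pandas function" question: pandas.factorize builds the string-to-integer mapping automatically, so no dictionary has to be written by hand. The sketch below is one possible reading of the step, not a confirmed interpretation of the assignment; the sample data and the names encoded and mappings are made up for illustration.

```python
import pandas as pd

# Made-up sample with one numeric and one non-numeric column.
df = pd.DataFrame({
    "age": [39, 50, 38],
    "workclass": ["State-gov", "Private", "State-gov"],
})

encoded = df.copy()
mappings = {}  # per-column lookup from integer code back to the string
for col in encoded.columns:
    if not pd.api.types.is_numeric_dtype(encoded[col]):
        # factorize assigns 0, 1, 2, ... in order of first appearance
        # and returns the codes plus the unique values they stand for.
        codes, uniques = pd.factorize(encoded[col])
        encoded[col] = codes
        mappings[col] = dict(enumerate(uniques))

print(encoded)
print(mappings)
```

Note that factorize gives missing values the code -1, so it probably makes sense to deal with empty cells before or right after encoding.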