Author Message
ykim85
Posts: 26
Posted 14:24 Apr 18, 2018 |

In our homework, after normalization, I'm getting negative values. Could someone tell me if this is what other people are getting?

anguyen8613
Posts: 4
Posted 14:48 Apr 18, 2018 |

I do not think you should get negative values.  I used the following code snippet to normalize and didn't get any negative values.

from sklearn import preprocessing
import pandas as pd

x = df_data.values  # returns a numpy array
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)  # scales each column to [0, 1]
df_data = pd.DataFrame(x_scaled)

 

Hope this helps!

ykim85
Posts: 26
Posted 15:03 Apr 18, 2018 |

Thanks, it's helpful to know this!

anguyen8613
Posts: 4
Posted 15:15 Apr 18, 2018 |

No problem!  Can you please let me know your accuracy when you get it? I have doubts about my result. 

Thanks!

ykim85
Posts: 26
Posted 15:29 Apr 18, 2018 |

Sure. I've been getting weird results, maybe because of the normalization producing negative values.

lakerfan94
Posts: 143
Posted 16:43 Apr 18, 2018 |

@ANGUYEN8613 I used the same thing for normalization. When I fed that normalized dataset into SVC, I got an accuracy of 0.07. Did you get the same?

anguyen8613
Posts: 4
Posted 16:53 Apr 18, 2018 |

@LAKERFAN94 Yes, I got an accuracy of 0.08. I am not sure what is wrong. My first thought was to normalize the labels as well, but that doesn't make much sense.

lakerfan94
Posts: 143
Posted 16:57 Apr 18, 2018 |

Yeah, I don't know what's up with the low accuracy, unless it's expected. I have no idea. I followed exactly what was said in the homework description. Also, I used the training data that you get in part D from doing the split. For some reason, some people were telling me that they used the training dataset that was generated by PCA. I don't think you're supposed to do that because the homework description clearly states to use the training and testing sets from part D on SVC.

Last edited by lakerfan94 at 17:04 Apr 18, 2018.
anguyen8613
Posts: 4
Posted 17:08 Apr 18, 2018 |

I think he made a typo. You should be using the training and testing sets from part E; otherwise there is no reason to use PCA.

Let me know your results after running with part E.

Thanks!

dpadilla24
Posts: 7
Posted 20:40 Apr 18, 2018 |
lakerfan94 wrote:

Yeah, I don't know what's up with the low accuracy, unless it's expected. I have no idea. I followed exactly what was said in the homework description. Also, I used the training data that you get in part D from doing the split. For some reason, some people were telling me that they used the training dataset that was generated by PCA. I don't think you're supposed to do that because the homework description clearly states to use the training and testing sets from part D on SVC.

I had that problem with MinMaxScaler. Then I read that StandardScaler is popular with SVM (it prefers roughly -1 < x < 1 instead of 0 < x < 1). It worked for me and I got 0.86.
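For reference, a minimal sketch of the StandardScaler approach, using a made-up toy array rather than the homework data. Note that standardized features will in general contain negative values, which is expected and fine for SVC:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy 2-D feature array (stand-in for df_data.values in the snippet above)
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)  # each column now has mean 0 and std 1

print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # ~[1, 1]
# Values below the column mean come out negative -- that's normal here.
```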

 

The PCA data split is used for finding the best C

lakerfan94
Posts: 143
Posted 21:28 Apr 18, 2018 |
dpadilla24 wrote:
lakerfan94 wrote:

Yeah, I don't know what's up with the low accuracy, unless it's expected. I have no idea. I followed exactly what was said in the homework description. Also, I used the training data that you get in part D from doing the split. For some reason, some people were telling me that they used the training dataset that was generated by PCA. I don't think you're supposed to do that because the homework description clearly states to use the training and testing sets from part D on SVC.

Had that problem with MinMaxScaler. Then read that StandardScaler is popular with SVM (likes -1<x<1 scale instead of 0<x<1) . Worked for me and got 0.86.

 

The PCA data split is used for finding the best C

Okay. I switched to the StandardScaler and I got the same accuracy for SVC. So for the GridSearch, I'm guessing that we fit the grid on the training dataset generated by PCA?

Last edited by lakerfan94 at 21:28 Apr 18, 2018.
dpadilla24
Posts: 7
Posted 22:13 Apr 18, 2018 |
lakerfan94 wrote:
dpadilla24 wrote:
lakerfan94 wrote:

Yeah, I don't know what's up with the low accuracy, unless it's expected. I have no idea. I followed exactly what was said in the homework description. Also, I used the training data that you get in part D from doing the split. For some reason, some people were telling me that they used the training dataset that was generated by PCA. I don't think you're supposed to do that because the homework description clearly states to use the training and testing sets from part D on SVC.

Had that problem with MinMaxScaler. Then read that StandardScaler is popular with SVM (likes -1<x<1 scale instead of 0<x<1) . Worked for me and got 0.86.

 

The PCA data split is used for finding the best C

Okay. I switched to the StandardScaler and I got the same accuracy for SVC. So for the GridSearch, I'm guessing that we fit the grid on the training dataset generated by PCA?

Yup that's it.

Another hint: capital C
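A minimal sketch of tuning C with GridSearchCV, using the built-in iris dataset as a stand-in; the param_grid values here are illustrative, not from the homework. The key is that the parameter name is a capital "C":

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Capital "C" -- lowercase "c" would raise an error since SVC has no such parameter
param_grid = {"C": [0.1, 1, 10]}

grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X, y)  # fit the grid on the training data

print(grid.best_params_)
```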

lakerfan94
Posts: 143
Posted 22:17 Apr 18, 2018 |
dpadilla24 wrote:
lakerfan94 wrote:
dpadilla24 wrote:
lakerfan94 wrote:

Yeah, I don't know what's up with the low accuracy, unless it's expected. I have no idea. I followed exactly what was said in the homework description. Also, I used the training data that you get in part D from doing the split. For some reason, some people were telling me that they used the training dataset that was generated by PCA. I don't think you're supposed to do that because the homework description clearly states to use the training and testing sets from part D on SVC.

Had that problem with MinMaxScaler. Then read that StandardScaler is popular with SVM (likes -1<x<1 scale instead of 0<x<1) . Worked for me and got 0.86.

 

The PCA data split is used for finding the best C

Okay. I switched to the StandardScaler and I got the same accuracy for SVC. So for the GridSearch, I'm guessing that we fit the grid on the training dataset generated by PCA?

Yup that's it.

Another hint: capital C

Alright cool. For the fitting, I'm guessing we use the training label vector created in part D?

dpadilla24
Posts: 7
Posted 15:16 Apr 19, 2018 |

FYI there's an email clarifying most of this.

- "train/test SVM after dimensionality reduction"

- recommend using preprocessing.scale for normalization
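For anyone following along, a minimal sketch of preprocessing.scale on a toy array (it standardizes in one call, giving the same result as StandardScaler's fit_transform on the same data):

```python
import numpy as np
from sklearn import preprocessing

# Toy feature array, not the homework data
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

# Standardize each column to mean 0, std 1 in a single call
X_scaled = preprocessing.scale(X)

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```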

ykim85
Posts: 26
Posted 15:38 Apr 20, 2018 |
anguyen8613 wrote:

No problem!  Can you please let me know your accuracy when you get it? I have doubts about my result. 

Thanks!

Hey, I've been getting 91% or 96% grid search accuracy.