Author | Message |
---|---|
ykim85
Posts: 26
|
Posted 14:24 Apr 18, 2018 |
in our homework, after normalization, I'm getting negative values... could someone tell me if this is what other people are getting? |
anguyen8613
Posts: 4
|
Posted 14:48 Apr 18, 2018 |
I do not think you should get negative values. I used the following code snippet to normalize and didn't get any negative values:

from sklearn import preprocessing

Hope this helps! |
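[Editor's note] The post above only shows the import; the exact scaler isn't shown. Assuming MinMaxScaler (a common choice from that module, and consistent with the "no negative values" result described, as well as the MinMaxScaler mention later in the thread), a minimal sketch:

```python
import numpy as np
from sklearn import preprocessing

# Toy matrix that contains negative raw values
X = np.array([[-3.0, 10.0],
              [ 0.0, 20.0],
              [ 3.0, 40.0]])

# MinMaxScaler rescales each column to [0, 1], so no negatives remain
X_norm = preprocessing.MinMaxScaler().fit_transform(X)
print(X_norm.min(), X_norm.max())
```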
ykim85
Posts: 26
|
Posted 15:03 Apr 18, 2018 |
thanks, it's helpful to know this. ++ |
anguyen8613
Posts: 4
|
Posted 15:15 Apr 18, 2018 |
No problem! Can you please let me know your accuracy when you get it? I have doubts about my result. Thanks! |
ykim85
Posts: 26
|
Posted 15:29 Apr 18, 2018 |
sure. I've been getting weird ones, maybe because the normalization gives negative values |
lakerfan94
Posts: 143
|
Posted 16:43 Apr 18, 2018 |
@ANGUYEN8613 I used the same thing for normalization. When I fed that normalized dataset into SVC, I got an accuracy of 0.07. Did you get something similar? |
anguyen8613
Posts: 4
|
Posted 16:53 Apr 18, 2018 |
@LAKERFAN94 yes, I got an accuracy of 0.08. I am not sure what is wrong. My first thought was to normalize the label as well, but that doesn't make much sense. |
lakerfan94
Posts: 143
|
Posted 16:57 Apr 18, 2018 |
Yeah, I don't know what's up with the low accuracy, unless it's expected. I followed exactly what was said in the homework description, and I used the training data you get from the split in part D. For some reason, some people were telling me that they used the training dataset generated by PCA. I don't think you're supposed to do that, because the homework description clearly states to use the training and testing sets from part D with SVC. Last edited by lakerfan94 at
17:04 Apr 18, 2018.
|
anguyen8613
Posts: 4
|
Posted 17:08 Apr 18, 2018 |
I think he made a typo. You should be using the training and testing sets from part E; otherwise there is no reason to use PCA. Let me know your results after running with part E. Thanks! |
dpadilla24
Posts: 7
|
Posted 20:40 Apr 18, 2018 |
Had that problem with MinMaxScaler. Then I read that StandardScaler is popular with SVMs (they prefer a zero-mean -1 < x < 1 scale over a 0 < x < 1 scale). Worked for me and got 0.86.
The PCA data split is used for finding the best C |
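[Editor's note] A minimal sketch of the StandardScaler-then-SVC flow described above. The digits dataset here is a stand-in for the homework data, which isn't specified in the thread; the key point is fitting the scaler on the training split only:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

scaler = StandardScaler().fit(X_train)          # fit on training data only
clf = SVC().fit(scaler.transform(X_train), y_train)
acc = clf.score(scaler.transform(X_test), y_test)
print(acc)
```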
lakerfan94
Posts: 143
|
Posted 21:28 Apr 18, 2018 |
Okay. I switched to the StandardScaler and I got the same accuracy for SVC. So for the GridSearch, I'm guessing that we fit the grid on the training dataset generated by PCA? Last edited by lakerfan94 at
21:28 Apr 18, 2018.
|
dpadilla24
Posts: 7
|
Posted 22:13 Apr 18, 2018 |
Yup that's it. Another hint: capital C |
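[Editor's note] The "capital C" hint refers to SVC's regularization parameter C. A sketch of grid-searching C on a PCA-reduced training set, as discussed above; the dataset, component count, and C grid here are illustrative guesses, not the homework's values:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale, then reduce dimensionality with PCA (fit on training data only)
X_train_s = StandardScaler().fit_transform(X_train)
pca = PCA(n_components=20).fit(X_train_s)
X_train_pca = pca.transform(X_train_s)

# Search over the capital-C regularization parameter of SVC
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=3)
grid.fit(X_train_pca, y_train)   # fit the grid on the PCA-reduced training set
print(grid.best_params_["C"])
```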
lakerfan94
Posts: 143
|
Posted 22:17 Apr 18, 2018 |
Alright cool. For the fitting, I'm guessing we use the training label vector created in part D? |
dpadilla24
Posts: 7
|
Posted 15:16 Apr 19, 2018 |
FYI there's an email clarifying most of this:
- "train/test SVM after dimensionality reduction"
- recommends using preprocessing.scale for normalization |
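[Editor's note] preprocessing.scale, as recommended in that email, standardizes each feature to zero mean and unit variance, so negative values are expected by design (which answers the original question in this thread):

```python
import numpy as np
from sklearn import preprocessing

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Per column: subtract the mean, divide by the standard deviation
X_scaled = preprocessing.scale(X)
print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```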
ykim85
Posts: 26
|
Posted 15:38 Apr 20, 2018 |
hey, I've been getting 91 or 96 for grid search accuracy |