Author Message
venny
Posts: 61
Posted 17:26 Oct 19, 2016 |

Has anyone figured out how to use OneHotEncoder? The only part of the documentation I understand is that .fit takes a list of lists, where the outer list holds the rows of the data set and each inner list holds one row's values, one per feature/column.

I'm testing OneHotEncoder on the label column of the iris data set right now.

print(y)
0      0
1      0
2      0
3      0
4      0
5      0
6      0
7      0
8      0
9      0
10     0
11     0
12     0
13     0
14     0
15     0
16     0
17     0
18     0
19     0
20     0
21     0
22     0
23     0
24     0
25     0
26     0
27     0
28     0
29     0
      ..
120    2
121    2
122    2
123    2
124    2
125    2
126    2
127    2
128    2
129    2
130    2
131    2
132    2
133    2
134    2
135    2
136    2
137    2
138    2
139    2
140    2
141    2
142    2
143    2
144    2
145    2
146    2
147    2
148    2
149    2
Name: label, dtype: int64

So I run this code:

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder();

enc.fit(y) 

enc.n_values_  #should be [3], cause that's how many different values there are for that list

But I get this result instead:

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])

I'm really confused about how to get this working.
dle35
Posts: 22
Posted 17:32 Oct 19, 2016 |

As I understand it, you need to split the values into separate columns, one per category. For example, a Gender column would be transformed into:

Female    Male
0         1
1         0
1         0

Something in that format.
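
Here is a rough sketch of getting that layout out of OneHotEncoder. It assumes a recent scikit-learn release, where (as far as I know) the fitted encoder exposes `categories_` rather than the `n_values_` used in the question, and it uses a small made-up label array instead of the iris column. The main point is the reshape: the encoder expects a 2-D input with one row per sample. The 150-element result in the question is consistent with the 1-D label column being read as a single row with 150 features.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Made-up stand-in for the iris label column (assumption, not the real data).
y = np.array([0, 0, 1, 1, 2, 2])

# Turn the 1-D labels into a column: 6 rows, 1 feature.
X = y.reshape(-1, 1)

enc = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense.
encoded = enc.fit_transform(X).toarray()

print(enc.categories_)  # one array of distinct values per input column
print(encoded)          # one 0/1 column per distinct label
```

With a pandas Series such as the `y` in the question, `y.values.reshape(-1, 1)` should give the same kind of column input.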

Last edited by dle35 at 17:33 Oct 19, 2016.
dbravoru
Posts: 60
Posted 17:55 Oct 19, 2016 |

Think of the encoder as producing one 0/1 column per possible value: two columns if there are two possible values, three if there are three, four if there are four.

In any given row, only ONE of those columns can be 1.

 

kaancalstatela
Posts: 52
Posted 17:55 Oct 19, 2016 |

This gets a little into Python function syntax and how to use the pandas DataFrame methods effectively. You can write your own one-hot encoder as a very simple numerical categorization (hence the "binary encoding"): define a function that checks a value against a category and returns 1 or 0.

Now here's the tricky part. When you apply your encoder function to your features to binary-encode them, the 'apply' method you use is a pandas method. It can pass extra arguments through to the function you give it. The pandas documentation has some cool examples using lambda functions. Also check out how to call a function in Python with an argument list.
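
A minimal sketch of that idea, assuming a small made-up label Series. The helper name `is_category` and the `label_N` column names are only for illustration. The extra argument is passed to the function through the `args` parameter of `Series.apply`.

```python
import pandas as pd

def is_category(value, category):
    """Return 1 if value belongs to the given category, otherwise 0."""
    return 1 if value == category else 0

# Made-up labels standing in for a real column (assumption).
labels = pd.Series([0, 0, 1, 2, 1], name="label")

# Build one 0/1 column per distinct label; args=(c,) is forwarded
# to is_category as its second argument.
encoded = pd.DataFrame({
    f"label_{c}": labels.apply(is_category, args=(c,))
    for c in sorted(labels.unique())
})

print(encoded)
```

A lambda works too, for example `labels.apply(lambda v: 1 if v == 2 else 0)`, but passing `args` keeps the helper reusable across categories.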

dbravoru
Posts: 60
Posted 19:25 Oct 19, 2016 |

You can also just do it manually instead of trying to get the encoder to work. I found it easier to do it myself.
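
For what it's worth, a manual version can be quite short. This is only a sketch in plain Python, not necessarily how anyone in this thread did it. The function name `one_hot` is made up, and the sample values mirror the Female/Male table earlier in the thread.

```python
def one_hot(labels):
    """Return the sorted categories and one 0/1 row per input label."""
    categories = sorted(set(labels))
    # Map each category to its column position.
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1  # exactly one column is set per row
        rows.append(row)
    return categories, rows

categories, rows = one_hot(["Male", "Female", "Female"])
print(categories)
for row in rows:
    print(row)
```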