Author | Message |
---|---|
venny
Posts: 61
|
Posted 17:26 Oct 19, 2016 |
Has anyone figured out how to use OneHotEncoder? The only part of the documentation I understand is that .fit takes a list of lists, where the outer list holds the rows of the data set and each inner list holds the values for each feature/column. I'm testing OneHotEncoder on the iris data set right now, using the label column.
print(y)
0 0
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
24 0
25 0
26 0
27 0
28 0
29 0
..
120 2
121 2
122 2
123 2
124 2
125 2
126 2
127 2
128 2
129 2
130 2
131 2
132 2
133 2
134 2
135 2
136 2
137 2
138 2
139 2
140 2
141 2
142 2
143 2
144 2
145 2
146 2
147 2
148 2
149 2
Name: label, dtype: int64
So I run this code:
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder()
enc.fit(y)
enc.n_values_  # should be [3], because that's how many different values there are in that list
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
I'm so confused about how to get this working.
|
dle35
Posts: 22
|
Posted 17:32 Oct 19, 2016 |
As I understand it, you need to split the values into separate columns. For example, a Gender column would be transformed into something in this format:
Female Male
0 1
1 0
1 0
Last edited by dle35 at 17:33 Oct 19, 2016.
|
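The Female/Male layout in the post above can be reproduced with pandas. This is only a sketch: the `gender` Series is made-up data chosen to match the three rows shown, and `pd.get_dummies` is one way to get this layout, not necessarily what the poster had in mind.

```python
import pandas as pd

# Hypothetical Gender column matching the three rows in the post above.
gender = pd.Series(["Male", "Female", "Female"], name="Gender")

# get_dummies builds one indicator column per distinct value
# (columns come out in sorted order: Female, Male).
# astype(int) gives 0/1 regardless of the pandas version's default dtype.
encoded = pd.get_dummies(gender).astype(int)

print(encoded)
```

Each row has a 1 in exactly one of the two columns and a 0 in the other.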
dbravoru
Posts: 60
|
Posted 17:55 Oct 19, 2016 |
Think of the encoder as binary (or ternary if there are three possible values, or quaternary if there are four): each possible value gets its own column, and only ONE of those columns can be 1 at a time.
|
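For a three-class label like the iris one, that "one column per value" idea can be sketched with OneHotEncoder itself. The array `y` below is a stand-in for the label column, not the real data. The key assumption is that the encoder wants 2-D input of shape (n_samples, n_features), so the labels are reshaped into a single column first. The long array printed in the original question is consistent with the 1-D Series having been treated as one row with 150 features. Recent scikit-learn releases expose `categories_` rather than the older `n_values_` attribute.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Stand-in for the iris label column: 50 samples each of classes 0, 1, 2.
y = np.repeat([0, 1, 2], 50)

# Reshape the 1-D labels into one column: shape (150,) -> (150, 1).
y_column = y.reshape(-1, 1)

enc = OneHotEncoder()
# The default output is a sparse matrix; toarray() makes it a dense array.
onehot = enc.fit_transform(y_column).toarray()

print(enc.categories_)        # distinct values found in the single column
print(onehot.shape)           # one row per sample, one column per class
print(onehot[[0, 50, 100]])   # one example row from each class
```

With a pandas Series, `y.to_frame()` or `y.values.reshape(-1, 1)` should produce the same 2-D shape before fitting.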
kaancalstatela
Posts: 52
|
Posted 17:55 Oct 19, 2016 |
This goes a little into Python function syntax and how to use the pandas DataFrame methods effectively. You can write your own one-hot encoder as a simple numerical categorization (hence the "binary encoding"): define a function that checks a value against a category and returns 1 or 0. Now here's the tricky part. When you apply that function to your features to binary encode them, the 'apply' method you use is a pandas DataFrame method, and it can pass extra arguments through to the function it is given. The pandas documentation has some good examples using lambda functions. Also check out how to call a Python function with an argument list. |
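A minimal sketch of the `apply` approach described above, assuming a small made-up `label` column. The helper `is_category` and the `label_<k>` column names are hypothetical, chosen only for illustration. The extra argument is forwarded through the `args` parameter of `Series.apply`.

```python
import pandas as pd

def is_category(value, category):
    """Return 1 if value equals category, otherwise 0."""
    return 1 if value == category else 0

# Hypothetical label column with three classes.
df = pd.DataFrame({"label": [0, 0, 1, 2, 1]})

# Build one indicator column per distinct label value.
for category in sorted(df["label"].unique()):
    # args=(category,) is passed to is_category after each value.
    df[f"label_{category}"] = df["label"].apply(is_category, args=(category,))

print(df)
```

The same thing can be written with a lambda, for example `df["label"].apply(lambda v: int(v == 2))` for a single category.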
dbravoru
Posts: 60
|
Posted 19:25 Oct 19, 2016 |
You can also just do it manually instead of trying to get the encoder to work. I found it easier to just do it myself. |
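One possible way to do it by hand, assuming the labels are integers 0..k-1 as in the iris example, is to index into an identity matrix with NumPy. The labels below are made up, and this is not necessarily how the poster did it.

```python
import numpy as np

# Hypothetical integer labels in the range 0..k-1.
y = np.array([0, 0, 1, 2, 1])

n_classes = y.max() + 1
# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with y picks the matching row for every sample.
onehot = np.eye(n_classes, dtype=int)[y]

print(onehot)
```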