Author Message
ljuster2
Posts: 19
Posted 05:01 Mar 03, 2015 |

My cost and sigmoid gradient return the numbers specified, and I have the following dimensions for my objects

X shape...
(5000L, 401L)
T1 shape...
(25L, 401L)
a2 shape....
(5000L, 26L)
T2 shape....
(10L, 26L)
H shape....
(5000L, 10L)
Y out shape...
(5000L, 10L)
backprop 1...
Sigmoid gradient....
(5000L, 25L)
Delta 2....
(26L, 25L)

HOWEVER, when I run gradient check, the dimensions don't match up

X shape...
(10L, 5L)
T1 shape...
(4L, 5L)
a2 shape....
(10L, 5L)
T2 shape....
(3L, 5L)
H shape....
(10L, 3L)
Y out shape...
(10L, 3L)


I think the problem starts with the dimensions of T2. I'm not sure if this is my code or how the numbers were initialized for grad_check and grad_approx.


any suggestions would be appreciated, thanks!

msargent
Posts: 519
Posted 08:25 Mar 03, 2015 |

It's hard to tell from here, but to save time the grad_check function doesn't use the same number of input units, hidden-layer units, or output units as the main problem. The data going into the function is X[:10, :3], which is only the first 10 rows and 3 columns, etc. It's supposed to run the cost function on a smaller neural net using a subset of the data. The cost function will have to work on neural nets of any size with only one hidden layer.
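That subset-check idea can be sketched with a generic finite-difference approximation. This is only an illustration of the technique, not the assignment's actual costFunction/gradApprox code; the function f here is a hypothetical stand-in for any scalar cost:

```python
import numpy as np

def grad_approx(f, params, eps=1e-4):
    """Approximate the gradient of a scalar cost f at params
    by perturbing one parameter at a time (central differences)."""
    approx = np.zeros_like(params)
    for i in range(params.size):
        bump = np.zeros_like(params)
        bump[i] = eps
        # central difference: (f(p + eps) - f(p - eps)) / (2 * eps)
        approx[i] = (f(params + bump) - f(params - bump)) / (2 * eps)
    return approx

# Usage on a toy quadratic cost, where the exact gradient is 2 * p,
# so the numerical estimate should match it closely.
p = np.array([1.0, -2.0, 0.5])
est = grad_approx(lambda q: np.sum(q ** 2), p)
```

The loop runs the cost function twice per parameter, which is why the check uses a tiny net and a small slice of the data rather than the full 5000-example problem.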

I'm not sure if this helps, but I will be on campus this Thursday for extra office hours. 

Last edited by msargent at 08:26 Mar 03, 2015.
ljuster2
Posts: 19
Posted 08:45 Mar 03, 2015 |
msargent wrote:

It's hard to tell from here, but the grad_check function doesn't use the same number of input units, hidden layer units, or output units as the main problem,  to save time. The data going into the function is X[:10, :3], which is only the first 10 rows and 3 columns, etc. It's supposed to run the cost function on a smaller neural net using a subset of the data. The cost function will have to work on neural nets of any size with only one hidden layer. 

I'm not sure if this helps, but I will be on campus this Thursday for extra office hours. 

Yeah, I printed Y[:3, :10] though, and it is 

[[10]
 [10]
 [10]]

Which would mean storing a "1" at index 9 of the labels, but there aren't that many labels?

msargent
Posts: 519
Posted 08:48 Mar 03, 2015 |

How about Y[:10, :3]?

ljuster2
Posts: 19
Posted 08:53 Mar 03, 2015 |

yep.

woops.

thanks.

msargent
Posts: 519
Posted 08:55 Mar 03, 2015 |

; @)

ljuster2
Posts: 19
Posted 09:20 Mar 03, 2015 |

No. I'm still not convinced.

print Y[:10, :3]

[[10]
 [10]
 [10]
 [10]
 [10]
 [10]
 [10]
 [10]
 [10]
 [10]]

But if there aren't 10 labels in the subset of the data, how are we supposed to store a 1 in the 9th index of the Y_out array?

msargent
Posts: 519
Posted 09:39 Mar 03, 2015 |

First, Y should end up being a 10x3 matrix (10x4 after a column of 1s is added).

Second, the idea with this code:


    grad_check = costFunction(params_check[:35], 4, 4, 3, X[:10, :4], Y[:10, :3], lambd)[1]
    grad_approx = gradApprox(params_check[:35], 4, 4, 3, X[:10, :4], Y[:10, :3], lambd)
    checkGradient = np.column_stack((grad_check, grad_approx))

is just to check whether the gradients are close. The Y vector will sometimes not have a value (i.e., be all zeroes), since we are only looking at the first 3 indices of the Y vectors and the first 3 indices of the prediction vectors. But that should be okay. This isn't meant to get a real result; it's just to see whether the gradients are close when we plug the same data into both functions. The whole point is to check that the algorithm was implemented correctly.
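The all-zero rows can be seen with a small sketch, assuming the convention above that labels run 1..10 and label k stores a 1 at 0-based index k-1 (the variable names here are illustrative, not the assignment's):

```python
import numpy as np

# With only 3 output units kept, any label greater than 3 produces an
# all-zero target row, which is fine for a gradient check.
labels = np.array([10, 2, 5])  # e.g. values like the 10s printed above
n_out = 3

Y_out = np.zeros((labels.size, n_out))
for i, k in enumerate(labels):
    if k - 1 < n_out:
        # label k gets a 1 at index k-1; labels beyond n_out are dropped
        Y_out[i, k - 1] = 1
```

Here only the row for label 2 gets a 1; the rows for labels 10 and 5 stay all zeros, so there is nothing to "store in the 9th index" of a 3-column Y_out.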

You can change the parameters to include all of the values in the Y vector and see if that works, but you shouldn't need to. 


Last edited by msargent at 12:11 Mar 03, 2015.