Today was a solid day of work on learning Recurrent Neural Networks (RNNs). I drew some diagrams, took some notes, and asked a lot of questions about how the example I am working through is structured. It is a character-prediction RNN: at each time step ‘t’, it should predict the next character in the word/sentence.
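To make that concrete for myself, here is a tiny sketch of how I understand the input/target pairing to work in next-character prediction (the variable names and the toy string are mine, not from the example):

```python
# Next-character prediction: at step t the input is text[t]
# and the target (what the network should predict) is text[t + 1].
text = "hello"
inputs = list(text[:-1])   # ['h', 'e', 'l', 'l']
targets = list(text[1:])   # ['e', 'l', 'l', 'o']

for t, (x, y) in enumerate(zip(inputs, targets)):
    print(f"t={t}: input '{x}' -> target '{y}'")
```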
So here are some of my questions:
- The diagrams I have seen represent the hidden layer of the RNN at each time step with what appears to be only one neuron. Is that actually the case? That doesn’t make sense to me, so I drew out what I imagine the expanded version looks like.
- The weights applied between different layers in the RNN at each time step seem to be a single value. Is that really the case? Wouldn’t they realistically be a collection of values, e.g. a vector like [0, 0.5, 2.6, -1, 0.3]?
- How does the output from the previous hidden layer get applied at time step ‘t + 1’ with the associated weight ‘W_hh’? (I take a stab at answering this for myself in the sketch below.)
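Writing out the standard RNN recurrence helped me with all three questions: the hidden state is a vector (one value per hidden neuron), the weights between layers are matrices rather than single values, and the previous hidden state gets multiplied by ‘W_hh’ at every step. Here is a minimal numpy sketch of my current understanding; the sizes are made up for illustration and nothing here is copied from the actual example:

```python
import numpy as np

hidden_size = 5   # number of neurons in the hidden layer (made up)
vocab_size = 4    # number of distinct characters (made up)

# The weights are matrices, not single values:
Wxh = np.random.randn(hidden_size, vocab_size) * 0.01   # input -> hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden (the 'W_hh' above)
bh = np.zeros((hidden_size, 1))                         # hidden bias

h_prev = np.zeros((hidden_size, 1))  # hidden state carried over from time step t

x = np.zeros((vocab_size, 1))  # one-hot encoding of the character at step t + 1
x[2] = 1

# The recurrence: the previous hidden state is pushed through W_hh
# and combined with the current input before the nonlinearity.
h = np.tanh(Wxh @ x + Whh @ h_prev + bh)
print(h.shape)  # (5, 1): one value per hidden neuron, i.e. a vector, not a scalar
```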
I made a reasonable amount of progress and I am excited to start working through the actual code for this example. Thankfully, Andrej posted it on his GitHub and also wrote a blog post about it.
Btw, I took a quick peek; the hidden layer size is 100 in the initialization! My diagram is way too small. I can’t wait to break this code down thoroughly.
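Based on that peek, the initialization presumably looks something along these lines; this is my paraphrase of the common pattern given a hidden size of 100, not a verbatim copy of the code, and the vocabulary size is my assumption:

```python
import numpy as np

hidden_size = 100  # the value I spotted in the initialization
vocab_size = 26    # assumption: really depends on the training text's character set

# Plausible weight shapes given a hidden layer of 100 neurons:
Wxh = np.random.randn(hidden_size, vocab_size) * 0.01   # input to hidden
Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden to hidden
Why = np.random.randn(vocab_size, hidden_size) * 0.01   # hidden to output
```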