Fresh off coding a fully-connected neural net, it's time to turn to two more specialized neural nets (on our road to the transformer).
RNNs
This gentle introductory lecture, from a Stanford class offered to the public, gives you the bird's-eye view of how neural networks process sequences using RNNs, and how ideas from RNNs led to the innovations behind transformers. Pay special attention to the "many-to-many" architectures.
Don't worry that this is from a course on convolutional nets; the RNN lecture stands alone. Also, I personally didn't spend too many brain-calories trying to understand the intricacies of how LSTMs work. A hand-wavy summary ("it passes hidden state values around like a weird little computer") is OK if, like me, your main priority is learning the transformer architectures built to address common RNN limitations. P.S. This lecturer, Justin Johnson, is particularly excellent, and you should seek out his videos whenever you can.
I've found Serrano to excel at ultra-simple explanations of ML topics; his examples are built on refreshingly clear mental models of the underlying architectures. In this video, you get an unusually direct look at an RNN's underlying math and at how its hidden state shapes its output, and you'd be hard-pressed to find it presented this way anywhere else. If you want the recurrence spelled out in code, there's a small sketch below.
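To make the hidden-state idea concrete, here is a minimal NumPy sketch of a vanilla RNN stepping through a toy sequence. It isn't taken from either lecture; the sizes, weights, and inputs are all made up for illustration, and there's no training, just the forward recurrence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up sizes for illustration only.
input_size, hidden_size, output_size, seq_len = 4, 8, 3, 5

# Parameters of a vanilla RNN (shared across every time step).
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden -> hidden
W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))  # hidden -> output
b_h = np.zeros(hidden_size)
b_y = np.zeros(output_size)

xs = rng.normal(size=(seq_len, input_size))  # a toy input sequence
h = np.zeros(hidden_size)                    # hidden state starts at zero

for t, x_t in enumerate(xs):
    # The whole trick: the new hidden state mixes the current input
    # with the previous hidden state, then squashes it with tanh.
    h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    y_t = W_hy @ h + b_y                     # "many-to-many": one output per step
    print(f"step {t}: output = {np.round(y_t, 3)}")
```

Note that the same three weight matrices are reused at every step; all the "memory" lives in `h`, which is exactly the bottleneck that attention was invented to get around.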
CNNs
Serena Yeung, a fellow instructor in the Stanford course above, walks us through convolutional nets and does a great job. I also found her walkthrough of ResNets illuminating; the concept of residual connections will reappear in transformers (there's a small sketch after the lecture pointers below).
Lecture 5 and then Lecture 9, on architectures.
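To preview why residuals matter, here's a tiny residual-block sketch in the same minimal NumPy style as above. It's my own illustration, not from the lecture, with made-up sizes and untrained weights: the block learns a correction F(x) and adds it back onto its unchanged input, so the output is x + F(x).

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16  # made-up feature width

# Two small weight matrices standing in for the block's inner layers.
W1 = rng.normal(scale=0.1, size=(dim, dim))
W2 = rng.normal(scale=0.1, size=(dim, dim))

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x):
    # F(x): the "residual" correction the block learns...
    f = W2 @ relu(W1 @ x)
    # ...added back onto the untouched input via the skip connection.
    return x + f

x = rng.normal(size=dim)
y = residual_block(x)
print(np.round(y - x, 3))  # the learned correction F(x); the rest is just x
```

Because the skip path is an identity, the input (and its gradient) flows straight through even when the block itself contributes little, which is what makes very deep stacks trainable, and it's exactly how transformers wrap their attention and MLP sub-layers.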