He also covers DeepSeek more broadly, including their philosophy. He explains why R1 works so well; it's a bit technical and he talks fast, but the information is all there and, in my opinion, very well presented. You might have to pause or rewatch the middle section (at least I did). He also goes over benchmark data for the big model and the distills.