The entire training process of DeepSeek R1 is nothing but using different way of reinforcement learning on top of their base model (i.e. deepseek V3)
Starting with a tiny base model that runs locally, we’ll build everything from scratch using DeepSeek R1 tech report while covering theory alongside each step. — Read More