Distillation is basically “compressing” a model: you train a new model on another model’s outputs instead of starting from raw data. Perhaps someone else’s model. And that’s a lot cheaper than building a model from scratch.
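To make that concrete, here’s a minimal sketch of classic knowledge distillation in the Hinton-style soft-target form: a small “student” is trained to match a “teacher’s” softened output distribution, mixed with ordinary cross-entropy. The temperature, mixing weight, and toy models are illustrative assumptions, not anything DeepSeek or OpenAI has published.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: push the student toward the teacher's softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage: a tiny "student" learning from a frozen "teacher".
teacher = torch.nn.Linear(16, 4)
student = torch.nn.Linear(16, 4)
x = torch.randn(8, 16)
labels = torch.randint(0, 4, (8,))
with torch.no_grad():
    t_logits = teacher(x)  # teacher is queried, not trained
loss = distillation_loss(student(x), t_logits, labels)
loss.backward()
```

The expensive part is the teacher; the student only needs the teacher’s answers, which is the whole point of the cost argument.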
Here’s my post from a couple of days ago about why DeepSeek was cheap to make: because it used select datasets and OTHER MODELS.
OpenAI agrees: it says DeepSeek was trained on OpenAI’s outputs. The way OpenAI was trained on the New York Times…?