lennxa

Qwen3-4B-Thinking: “This is art—pelicans don’t ride bikes!”

Importance: 4 | simonw, qwen, slm

Simon Willison:

These are relatively tiny models that punch way above their weight.

I used the Instruct model to summarize this Hacker News conversation about GPT-5.

The good news is Qwen spat out a genuinely useful summary of the conversation! You can read that here—it’s the best I’ve seen yet from a model running on my laptop, though honestly I’ve not tried many other recent models in this way.

The bad news... it took almost five minutes to process and return the result!

They’re fun, they have personality and I’m confident there are classes of useful problems they will prove capable at despite their small size. Their ability at summarization should make them a good fit for local RAG, and I’ve not started exploring their tool calling abilities yet.

I tried the model locally (via Ollama) and on OpenRouter. A 4B model that runs on a phone with some (unreliable) intelligence is insane. If it can do even one thing well, say, summarize reliably, that would be huge. In practice, when you want an intermediate LLM sitting between the user request and the response, latency becomes crucial. Providers like Groq and Cerebras can push 1000+ tok/s, but time-to-first-token becomes the bottleneck. A model small enough to host on a server you control could solve this to an extent.
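As a sketch of what that "intermediate LLM" step might look like: Ollama serves a local HTTP API on port 11434, so a summarization call is a single POST. The model tag, prompt, and helper name below are my assumptions for illustration, not from the post.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_summarize_request(text: str, model: str = "qwen3:4b") -> dict:
    # Hypothetical prompt template; "stream": False asks Ollama for one
    # complete JSON response instead of streamed chunks.
    return {
        "model": model,
        "prompt": f"Summarize the following discussion in a few bullet points:\n\n{text}",
        "stream": False,
    }

payload = build_summarize_request("...thread text here...")
body = json.dumps(payload).encode()

# To actually run it (requires `ollama serve` running with the model pulled):
# req = urllib.request.Request(
#     OLLAMA_URL, data=body, headers={"Content-Type": "application/json"})
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

Since the whole round trip stays on hardware you control, time-to-first-token is bounded by local inference speed rather than a provider's queue.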

#im-4 #qwen #simonw #slm