It isn't stated explicitly in the summary, but this seems to be mostly for Python, right?
I'm guessing from the references to TACO and LiveCodeBench and the mention of a runtime sandbox for Python.
I fed it some 200 lines of non-trivial Delphi code and asked it to implement a stub method in the middle of the code, and it did a decent job.
The implementation was solid Delphi code and used the existing code correctly, but it didn't solve the core problem 100% correctly.
However, I'm only using the Q4 variant because I'm constrained by my GPU's 16GB of VRAM, my question was a simple one-liner with very few details, and the code wasn't commented. So overall I was pleasantly surprised.
That said, it's definitely too heavy for my system for this kind of task. Because of the thinking, it used 14k tokens in total, so only some of the layers could be offloaded to the GPU, which in turn meant a response speed of just 10 tok/s.
The model was fine-tuned from DeepSeek-R1-Distill-Qwen-14B. Maybe the knowledge was inherited from the base model?
https://huggingface.co/agentica-org/DeepCoder-14B-Preview
Is this on Ollama yet?
Edit: I just gave it a very simple prompt: "Give me a Python script to anonymize strings inside a file. The strings are 4-character long and inside single quotes, like this: 'abcd'"
And it has been hopelessly trying to settle on the regex, getting it wrong time and time again, for the past 5 minutes.
Claude, o1-mini, o3-mini and even gpt-4o did it in seconds.
So I'd say this is very far from being useful...
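For comparison, the task in that prompt is small. Here is a minimal sketch of one way to do it in Python, assuming "anonymize" means replacing each distinct quoted value with a stable 4-character placeholder so the file keeps its layout. The function names and the hex placeholder scheme are my own choices, not something from the prompt.

```python
import re
from pathlib import Path

# Exactly four characters (no quotes, no newlines) between single quotes, e.g. 'abcd'.
QUOTED_4 = re.compile(r"'([^'\n]{4})'")


def anonymize(text: str) -> str:
    """Replace each distinct 4-character quoted string with a stable placeholder.

    Assumption: the same original value should always map to the same placeholder,
    so references stay consistent. Placeholders are 4 hex digits (0000, 0001, ...),
    which preserves the original length for up to 65536 distinct values.
    """
    mapping = {}

    def replace(match: re.Match) -> str:
        original = match.group(1)
        if original not in mapping:
            mapping[original] = f"{len(mapping):04x}"
        return f"'{mapping[original]}'"

    return QUOTED_4.sub(replace, text)


def anonymize_file(src: str, dst: str) -> None:
    """Read src, anonymize its quoted 4-character strings, and write the result to dst."""
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(anonymize(text), encoding="utf-8")
```

Usage would be something like `anonymize_file("input.txt", "output.txt")`. The `[^'\n]` class is what keeps the regex from spanning two adjacent quoted strings or an apostrophe like the one in "it's".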