<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Qwen on d3v0ps.cloud</title>
    <link>https://d3v0ps.cloud/tags/qwen/</link>
    <description>Recent content in Qwen on d3v0ps.cloud</description>
    <generator>Hugo</generator>
    <language>en</language>
    <copyright>&lt;a href=&#34;https://creativecommons.org/licenses/by-nc/4.0/&#34; target=&#34;_blank&#34; rel=&#34;noopener&#34;&gt;CC BY-NC 4.0&lt;/a&gt;</copyright>
    <lastBuildDate>Sun, 03 May 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://d3v0ps.cloud/tags/qwen/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>My Local LLM Setup: One Model, Many Personalities</title>
      <link>https://d3v0ps.cloud/posts/2026/05/my-local-llm-setup-one-model-many-personalities/</link>
      <pubDate>Sun, 03 May 2026 00:00:00 +0000</pubDate>
      <guid>https://d3v0ps.cloud/posts/2026/05/my-local-llm-setup-one-model-many-personalities/</guid>
      <description>&lt;h2 id=&#34;hardware&#34;&gt;Hardware&lt;/h2&gt;&#xA;&lt;table&gt;&#xA;  &lt;thead&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;th&gt;Component&lt;/th&gt;&#xA;          &lt;th&gt;Spec&lt;/th&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/thead&gt;&#xA;  &lt;tbody&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;CPU&lt;/td&gt;&#xA;          &lt;td&gt;11th Gen Intel Core i7-11700K (16 threads) @ 5.00 GHz&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;GPU 1&lt;/td&gt;&#xA;          &lt;td&gt;NVIDIA GeForce RTX 4060 Ti 16GB (Discrete)&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;GPU 2&lt;/td&gt;&#xA;          &lt;td&gt;NVIDIA GeForce RTX 4060 Ti 16GB (Discrete)&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;      &lt;tr&gt;&#xA;          &lt;td&gt;Memory&lt;/td&gt;&#xA;          &lt;td&gt;128 GiB&lt;/td&gt;&#xA;      &lt;/tr&gt;&#xA;  &lt;/tbody&gt;&#xA;&lt;/table&gt;&#xA;&lt;hr&gt;&#xA;&lt;p&gt;Running a large language model locally is one thing. Serving it intelligently to a variety of workloads is another. This post walks through how I serve a single Qwen 3.6 model via llama.cpp and expose it as multiple purpose-tuned model aliases through LiteLLM, giving each client the right inference parameters without ever loading a second model. A minimal client-side sketch of the pattern follows.&lt;/p&gt;
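&lt;p&gt;The sketch below assumes a LiteLLM proxy on &lt;code&gt;localhost:4000&lt;/code&gt; speaking the OpenAI-compatible API; the base URL, API key, and alias names (&lt;code&gt;qwen-chat&lt;/code&gt;, &lt;code&gt;qwen-code&lt;/code&gt;) are placeholders rather than the exact values from my setup. Both aliases resolve to the same llama.cpp backend, with the proxy applying different default inference parameters per alias.&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-python&#34;&gt;# Sketch only: base URL, API key, and alias names are placeholders.
# Both aliases hit the same llama.cpp model behind the LiteLLM proxy;
# the proxy supplies different default inference parameters per alias.
from openai import OpenAI

client = OpenAI(base_url=&#34;http://localhost:4000&#34;, api_key=&#34;sk-local&#34;)

# General-purpose chat alias
chat = client.chat.completions.create(
    model=&#34;qwen-chat&#34;,
    messages=[{&#34;role&#34;: &#34;user&#34;, &#34;content&#34;: &#34;Summarise these homelab notes.&#34;}],
)

# Code-focused alias: same weights, different server-side defaults
code = client.chat.completions.create(
    model=&#34;qwen-code&#34;,
    messages=[{&#34;role&#34;: &#34;user&#34;, &#34;content&#34;: &#34;Write a Bash one-liner to tail nginx logs.&#34;}],
)

print(chat.choices[0].message.content)
print(code.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;</description>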
    </item>
  </channel>
</rss>
