Local Llm reviews from Reddit

Summary

We analyzed 159 Reddit reviews across 14 subreddits and 68 posts to rank the best Local Llm brands recommended by redditors, including communities like r/LocalLLaMA, r/LocalLLM, r/ollama, r/AI_Agents, r/frigate_nvr. Top-rated brands include Mistral (4.2/5), Qwen (4.2/5), Ollama (4.2/5).

Stats

Reviews159

Subreddits14

Posts68

Brands60

Products60

Top communities

r/LocalLLaMA100 r/LocalLLM30 r/ollama7 r/AI_Agents4 r/frigate_nvr3

159 reviews from

and

By Brand

By Product

Mistral

4.2

(15)

"I have tried it with some examples and really impressed with response."

"Mistral Nemo Instruct @ Q8 is my go to Local LLM for everything on my 4090, fast enough, long context and does what ever."

"Been using mistral to talk for my reference manager."

"Just wanted to provide that Mistral works well for this task."

"You can try Mistral24B and GML-4 too."

"Mistral small 24b should fit nicely in your vram."

"Mistral small 24b is 14.3 gb at q4 but for code you honestly should use atleast Q6 and best q8."

"Among small models, out of box Mistral Nemo 12b and Ministral 8b."

"Here's my recommendations: General: Mistral Small 3 24B/Qwen 2.5 32B"

"Most LLM's will do fine like any mistral based one orso."

Qwen

4.2

(14)

"Qwen2.5 Coder 32B, its an amazing coding model but the con is it requires larger GPU."

"Qwen 32 B are the best for basic coding."

"But in general for a local model I'd say Qwen 2.5 coder takes the cake."

"Lately Qwen seems the best for a small LLM."

"Qwen2.5-32B-instruct"

"+1 Qwen Coder 7b , Also give a try to mistral-nemo."

"They are exceptionally smart in their respective flavours - non-thinking and thinking."

"I would use a q4km quant of fuse-o1-qwq for a reasoning coding model, and qwen 2.5 32b coder for a non reasoning coding model."

"Qwen 2.5 coder 32b or deepseek r1 distilled qwen 2.5"

"Qwen 2.5 and 24gb vram."

Ollama

4.2

(12)

"If you want to run an LLM locally check out ollama"

"If you are on a mac, check out ollama.ai. It makes it damn simple to get started."

"You need a 'cline model', take a look -> https://ollama.com/maryasov/qwen2.5-coder-cline/"

"Just try them.. Ollama.com/models."

"All recommendations I've heard are ollama to start out and get the workflow or framework set up."

"Ollama and Llama 3.2 3B that is the smallest model, can be your entry point."

"Ollama - Code [https://ollama.com/search?q=code]"

"You can use Ollama + llama3.2 vision + set format = json in your model call."

"You can use Ollama with OpenWebUI to run a locally hosted model."

"Ollama seems to be one of the more popular and easier ways to stand one up."

Llama

4.0

(11)

"I would suggest giving Llama-3.3-Nemotron-Super-49B-v1 a try. I found it to be pretty smart, and it should fit with a 5090."

"There are a few fine-tuned LLaMA 3.1 models focused on tool use that are worth considering: ToolACE-2-Llama-3.1-8B, watt-tool-8B."

"I think llama model are your best bet, with framework like ollama."

"I would suggest llama 3.2 7b."

"Llama 3.2 3B + LM studio has worked well for my initial development."

"You could test gemma-2-2b and Llama-3.2-3B-Instruct."

"Check out open source LLM like LLAMA."

"LLama 3 8B is your best bet."

"The new llama 3.2 11B released today!"

"I use local LLMs for work all the time that I host in my own site."

QwQ

3.9

(11)

"QwQ-32B and Gemma-3-27B are a must have."

"QwQ for sure can run on your machine easily, and its performance is comparable with big models."

"Qwq-32B with the context length bumped up has been my workhorse as of late, its latent knowledge is a bit more limited due to its size, but it works hard to get a good answer."

"As others have mentioned, QwQ 32b and Gemma 27b are good options."

"QwQ-32b for a slow reasoning model, Deepseek-R1-distill-Qwen-32b for a faster reasoning model"

"QwQ 32B for a thinking model For a non thinking model… maybe gemma 3 27B"

"QwQ-32B-GGUF might be better than qwen."

"QwQ wins over older qwen coders if you want all around universal LLM, kinda like Mistral Small."

"Last 0.5 years the best coding models have been Qwens."

"Only qwq consistently executes tool calls accurately, while the others perform poorly. However, the downside with qwq is its slow response time."

Gemma

4.1

(9)

"Gemma 3 is imo the best local model for conversational interaction right now."

"Gemma 3 27b / quantised"

"Gemma QATs are good, so is Granite."

"1. Gemma 3 27b 2. QWQ 3. DeepSeek R1 - 14b"

"Gemma3 does a good job. I have also had good luck with Qwen2-VL and Qwen2.5-VL."

"I've had amazing results with gemma3-27b."

"With 16GB of RAM my go to was Gemma 2 9B on my 16GB MacBook Air. Now that Gemma 3 is out the 12B should run just fine"

"For me, it offers the best balance between RAM usage (around 32GB) and performance."

"Gemma 2 9b."

Lmstudio

4.5

(6)

"LMStudio is extremely easy to use with AMD. It will automatically download the Vulkan llama.cpp backend for you and then it's just a matter of downloading a model and you're ready to go."

"Install Lmstudio and plug in a few of the best ones to test them out."

"First project that’s an immediate turnkey solution in my opinion is LMStudio."

"The vulkan backend of llama.cpp performs quite well and is really easy to use with something like LMstudio."

"I'm using lmstudio. It works just nice. MLX support is also great."

"Ejecutar LLMs locales me ha dado mejores resultados que usar chatgpt 3.5."

Qwen 2.5 Coder

5.0

(4)

"Qwen2.5Coder 7B is the way. Q4 should do the trick"

"For coding, i'd say go with Qwen2.5 Coder 7B and Flux for image generation."

"Qwq and qwen2,5 coder are a very strong combo right now."

"Qwen 2.5 coder takes the cake."

Gpt4all

4.0

(4)

"It doesn't get easier than this: https://gpt4all.io/index.html"

"Gpt4all and sanctum ai are local LLMs that handle documents."

"Try Gpt4all with sbert plugin."

"The document handling I have got it with GPT4ALL but I can't find a local LLM with all these together."

#10

MLX

5.0

(3)

"MLX is faster and more accurate than GGUF and works far better on Macs then GGUF"

"MLX and LM Studio is fastest and best way to run LLMs on Apple Silicon"

#11

Local LLM

4.7

(3)

"I've been working on a PA for a few years and use a local LLM for the chat aspect."

"I like being able to use an LLM without all my information going through a corporation."

"I use local LLM for various purposes such as summarizing content, generating art prompts, aiding in debates, examining notes, serving as a conversational assistant..."

#12

Qwen2.5-Coder

4.3

(3)

"Qwen2.5-coder is amazing. I would try and get a 32 or 14b model in at least Q4 feeling good for you with at least 8k context."

"Qwen2.5-coder so far has been good. Still LLMs are not very good at detailed work."

"Qwen2.5-coder. Prob best bang for the buck I have found."

#13

privateGPT

4.0

(3)

"I use privateGPT for exactly your use case."

"New program called privateGPT will work well."

"Maybe turn your data into a pdf or text file, then feed it to https://github.com/imartinez/privateGPT."

#14

ChatterUI

3.7

(3)

"Layla & Layla Lite ChatterUI Maid MLC LLM Sherpa Private LLM MLC Chat"

"Chatterui Layla Lite Maid"

"Chatterui is pretty good BUT the last version has a problem and is very slow."

#15

Aya Expanse

5.0

(2)

"Try aya-expanse-32b or aya-expanse-8b."

#16

Llama 3.1 8b

4.5

(2)

"Llama 3.1 8b is the best at this."

"You can give a Llama 3.1 8b and Gemma 2 9b a shot."

#17

Mistral Small 24B

4.5

(2)

"Mistral-Small-24B really does feel GPT-4 quality despite only needing around 12GB of RAM to run"

"I would go with mistral small 24B which Is very good in european languages."

#18

LM Studio

4.5

(2)

"LM studio and Mistral-Small-3.1-24B-Instruct-2503 heard is best model able to run at 16 GB RAM on Apple silicon"

"The team uses LM Studio but ollama is decent as well."

#19

Claude

4.5

(2)

"Claude 3.7 is the best, the rest are shit"

"Claude and GPT nearly always write good code for me."

#20

Mixtral

4.5

(2)

"You will get around 25T/s."

"What LLM would you suggest that I deploy, considering that I have 48GB vram to run the finished model on my PC? Is Mixtral 8*7B that top choice for that amount of vram."

#21

OpenRouter

4.0

(2)

"Openrouter is probably better. They have several back end providers and they have stated policies on data retention."

"Just sign up for OpenRouter, make sure the logs are turned off, and use that."

#22

KoboldCPP

4.0

(2)

"I would recommend taking the highest B you can still endure to use regarding slowness."

"I'm using koboldcpp, which is basically just downloading an exe and a gmgl/gguf model file."

#23

Rombo

4.0

(2)

"I tried the Rombo with Q4_1 quantification. After a few iterations and suggestions, I got the bouncing ball inside a rotating rectangle! Yes I guess the real big models could one-shot it, but for a local tool, this is probably the best for now."

"Given your single GPU rig, I can recommend trying Rombo 32B the QwQ merge - it is really fast on local hardware, and I find it less prone to repetition than the original QwQ."

#24

DeepSeek

4.0

(2)

"DeepSeek V3"

"I would recommend distilled deepseek as llama and qwen."

#25

Nvidia

4.0

(2)

"It works on my 3050 but it's using most of the vram and most of the compute."

"I pulled down the Nvidia beta LLM when it dropped."

#26

Moondream

3.0

(2)

"The year-old Moondream2 version was in gguf and still works very quickly with ollama."

"Moondream 2 very fast, but newest versions is not usable with GPU, they supports only CPU inference."

#27

unsloth

5.0

(1)

"Using unsloth’s versions of qwen2.5-coder in 14b and 32b in q5_K_M and q6_K with great success"

#28

Allen AI Institute

5.0

(1)

"It works great for getting summaries of research papers around a prompt/topic."

#29

Dobby Unhinged

5.0

(1)

"Dobby Unhinged is VERY fun to talk to."

#30

Private GPT

5.0

(1)

"Private GPT is the best."

#31

Goliath

5.0

(1)

"Goliath is excellent for creative writing."

#32

Phi4

5.0

(1)

"I'm using Phi4 and prompting with 'Give me a Stable Diffusion prompt for x' Works a treat"

#33

Apollo AI

5.0

(1)

"I have tried several LLM iOS apps including paid one and I find this the best."

#34

ModernBert

5.0

(1)

"Use ModernBert"

#35

PocketPal

5.0

(1)

"On my Samsung S23 PocketPal performs the best - 12 tps for 3B models."

#36

D. AI

5.0

(1)

"This D. Ai is the best For me this works veey well."

#37

GGUF

4.0

(1)

"Using unsloth’s GGUF models in q5_K_M and q6_K is faster than the original models"

#38

deepcogito

4.0

(1)

"The new deepcogito models claim to be really competitive at tool calling."

#39

Qwen3

4.0

(1)

"Qwen3:30b-a3b runs much faster with similar intelligence, on my RTX 3080 10GB I get 15T/s"

#40

InternVL

4.0

(1)

"Try the new InternVL3 which just dropped today. They have many different parameter sizes."

#41

IBM

4.0

(1)

"My current favorite phone sized model is IBM Granite 3.2 2B (Q5)"

#42

Mistral Nemo, Gemma

4.0

(1)

"Start with Mistral Nemo or Gemma 3 12b."

#43

Gemma, Qwen, DeepSeek

4.0

(1)

"Gemma, Qwen and distilled models of DeepSeek are good at coding."

#44

Mistral-Small-3.1-24B-Instruct-2503, Gemma, QwQ, Qwen2.5, Phi-4

4.0

(1)

"Mistral-Small-3.1-24B-Instruct-2503 Q4, Gemma 2 27B and QwQ-32B are a bit too large, and/or slow. If you don't like Mistral or want more room for context, try Gemma 3 12B, Qwen2.5 14B or Phi-4 (14B)"

#45

Gemini 2.5

4.0

(1)

"Gemini 2.5 is not quite Claude but damn it’s saving me money."

#46

Nvidia Llama

4.0

(1)

"This is my favorite for coding and other left brained activities."

#47

27B Q8_0

4.0

(1)

"I recommend that one. Used to run bigger models but this one is really good."

#48

Obsidian

4.0

(1)

"Obsidian + Copilot Plugin + Ollama/LMStudio with a local llm that is good enough for your hardware."

#49

Letta

4.0

(1)

"I recommend Letta."

#50

Hugging Face

4.0

(1)

"Check this sites, and select the most download models: Hugging Face - Code"

#51

Whisper

4.0

(1)

"I am looking into buying a Mac Mini M4 with the lowest setup to run Whisper locally."

#52

Deepseek

4.0

(1)

"I had success with the Deepseek models to generate charts from csv."

#53

Orca

4.0

(1)

"I was able to get what I think are good results with Orca-2 13B and Solar-10.7B"

#54

StabilityAI

4.0

(1)

"StabilityAI is supposed to have a good quality models"

#55

OpenWebUI

4.0

(1)

"I'd highly recommend deploying a docker instance with OpenWebUI and ollama."

#56

Llamaparse

4.0

(1)

"Take a look at Llamaparse."

#57

LLM for Unity

4.0

(1)

"Yes, you can use LLM for Unity if you work with the Unity engine."

#58

vLLM

4.0

(1)

"I use vLLM or oobabooga (text-generation-webui) openai compatible APIs."

#59

Microsoft

3.0

(1)

"Try Microsoft Omniparser"

#60

Open-source bot

3.0

(1)

"What open-source bot is best for uncensored practical advice?"

Discover your audience

GummySearch is an audience research toolkit for 130,000 unique communities on Reddit.

If you are looking for startup problems to solve, want to validate your idea or find your customers online, GummySearch is for you.

Tell me more

Get started