LLM reviews from Reddit

Summary

We analyzed 190 Reddit reviews across 22 subreddits and 71 posts to rank the best LLM brands recommended by Redditors, drawing on communities such as r/LocalLLaMA, r/LocalLLM, r/LLMDevs, r/ChatGPTCoding, and r/salesforce. Top-rated brands include Qwen (4.1/5), Qwen3 (4.3/5), and Gemma (4.1/5).

Stats

Reviews: 190

Subreddits: 22

Posts: 71

Brands: 67

Products: 48

Top communities

r/LocalLLaMA (98), r/LocalLLM (32), r/LLMDevs (11), r/ChatGPTCoding (9), r/salesforce (8)

Qwen

4.1

(22)

"This one just keeps getting it right."

"Qwen 2.5 has been doing wonders for me."

"Qwen2.5 coder 32b"

"Qwen 2.5 coder 33 is doable on 3090/4090 and really good."

"If you’re on a Mac, use the MLX versions and quants. It’s fast."

"Qwen good at coding."

"Everyone’s talking about Qwen which makes sense due to its recent release."

"Qwen 2.5 coder 32b or deepseek r1 distilled qwen 2.5"

"Qwen 32b coder instruct."

"For coding, either Qwen 2.5 Coder 32b Instruct."

Qwen3

4.3

(16)

"Yes PocketPal is Perfect. I would suggest trying out Qwen3 1.7B."

"I'm running QWEN3 A3B 30 MLX and I reach 60.12 tok/sec 528 tokens 4.19s to first token."

"If hallucinations are critical, look into qwen 3 30b A3B. It has the lowest hallucination rate of all opensource models tested."

"So far qwen3 is really the only game in town for consistent tool calling for me at small sizes."

"Try out different quantizations and see which token speed - intelligence tradeoff you’re comfortable with. If it’s too slow, try qwen3 0.6B."

"Probably Qwen3 1.7b is the biggest you can run via Pocketpal."

"Try the new Qwen3 0.6b or 1.7b. You may even pull of 4b."

"Using M1 Max (64G), we tested Qwen3-30B-A3B for constrained writing."

"Qwen 3 is pretty good."

"OP could use qwq/qwen 3 32b for analysis then if needed for detailed analysis."

Gemma

4.1

(12)

"For me, it offers the best balance between RAM usage (around 32GB) and performance."

"Imo gemma is the best."

"Gemma-2-Ataraxy-9B, see which one works better for you."

"Gemma 2 9b is very good for translations."

"Your best bet is to get at least Gemma 2 9b."

"You could test gemma-2-2b and Llama-3.2-3B-Instruct."

"Gemma2-2b-it probably, also worth trying the small llama 3.2 3b."

"Gemma is good for general multilingual use."

"Gemma2 for European languages imo."

"Gemma-2 9B might be a good option."

Llama

3.8

(12)

"Llama and Gemma families are the best options for general purpose models."

"Llama 3.3 70B may be a place to start."

"There are several finetunes that might be useful for your use case."

"I would suggest llama 3.2 7b."

"Ended up using Qwen 2.5 3B coder + llama 3.2 3B + OLMoE for offline inferencing."

"Llama 3.2."

"Llama3.2 3b prob works without the ai compute chip."

"I've had some good results with Llama and Mistral so far."

"The 3B version, which will run almost anything with a GPU."

"Perhaps llama3:70B would deliver good results."

Mistral

4.1

(11)

"Current winner is Mistral Small."

"Mistral is exceptional at design."

"Im loving mistral.rs right now, its like vllm with less headaches"

"The latest mistral small that came out looks really interesting."

"Most LLM's will do fine like any mistral based one orso."

"Mistral Nemo is probably a smallest one producing interesting results."

"Behemoth 123b on Mistral."

"Mistral is quite good too."

"Have found Mistral-Small-22B-ArliAI-RPMax-v1.1-Q4_K_S to be decent for NSFW."

"Mistral Small 22B and it's finetunes, like Cydonia."

Claude

4.1

(9)

"I can chat with my vault, and Claude can do things too, like take meeting notes."

"Only Claude Sonnet 3.5 seems to write good code for me."

"Claude, use Tailwind for styling - LLMs are really good at it."

"I like the Claude x Cursor combination."

"I've been using Claude and it's pretty damn good."

"Cursor with Claude works really well."

"Claude is good for sci-fi."

"Even Claude would struggle."

"Claude works well but always make sure to verify your code."

Ollama

4.0

(6)

"Among the local models ollama/qwen2.5-coder:32b seems to be the best."

"Look for “Ollama”. It runs local (on Mac)."

"I use these from Ollama: Coding: qwen2.5-coder:32b"

"I ended up using Ollama. It works fine with ubuntu."

"Use ollama and run some of these from your home computer."

"I migrated from ollama to LMStudio."

Deepseek

4.6

(5)

"Runs surprisingly quick on my no-gpu laptop."

"Deepseek is such a great model."

"Deepseek-coder."

"Recently I've used deepseek which was good too considering it's free."

"Deepseek Coder 2 (3.0 you'll unlikely be able to run at decent performance)."

Qwen 2.5

4.4

(5)

"I'd say Qwen 2.5 coder takes the cake."

"I find Qwen2.5 to perform exceptionally well for Asian languages."

"Qwen 2.5 coder works well for code completion."

"Good for general purpose translations."

"[https://huggingface.co/Qwen/Qwen2.5-1.5B]."

#10

Github Copilot

4.4

(5)

"Yo, for coding copilot is straight 🔥."

"I use copilot in word to write a script for a 45 min presentation."

"Github CoPilot and OpenAI o1 are my daily drivers."

"GitHub Copilot works well for me."

"Github Copilot is still great lol."

#11

NVIDIA A6000

5.0

(4)

"I would go for A6000 or 6000 Ada route over consumer cards."

"For training I'd sooner get 8x A6000s over 8x 5090s."

"You have RTX A6000 cards with 48GB VRAM each."

"Just get a6000 cards and rent h200 when you like the output of your first epoc."

#12

LM Studio

3.4

(5)

"Qwen 2.5 coder is a good model to be using locally."

"LM Studio has MLX support right now, with easy HF access, and you can usually squeeze out another 1 t/s with speculative decoding."

"I've already seen LMstudio eating up nearly 1gb of memory, on a mac that means less GPU memory available."

"If you want to use an easy to use UI and want to stick to ggufs with llama.cpp, use LM Studio."

"If you want speed (and OP seems to be mainly interested in speed), don't use LM Studio."

#13

vLLM

4.3

(4)

"Purely for performance, I think vLLM is the one to beat"

"VLLM has been my choice when I want my LLM to run fast (Mistral 3.1 small 24b)."

"Prefix caching is amazing for best-of-n style generative tasks."

"Saw several people recommending vLLM for speed with CUDA."

#14

Mistral Large

4.7

(3)

"Mistral large 2407 is 👌 for code reviews."

"I've noticed that mistral-large is the best for translating into French and German."

"It's definitely still worth to give Codestral or Mistral Large a shot."

#15

Microsoft

4.7

(3)

"I’d recommend Microsoft’s Phi-2."

"Don't go beyond 7b, 8b. MS Phi 4 is most certainly your best bet."

"This should do almost all of your job without much hassle."

#16

Cursor

4.3

(3)

"It can see my whole vault, create, move and organize notes and folders, run templates."

"Cursor and Windsurf are solid options as well."

"Cursor (IDE) or Github Copilot. You can try both for free."

#17

llama.cpp

4.0

(3)

"For mixed CPU / GPU inference, I think LlamaCPP's hard to beat."

"Llamacpp is the engine under the covers of many of the other products mentioned."

"Llama.cpp is the way to go if you don't want to mess with lots of Python dependencies, especially on Windows."

#18

Midnight Miqu

5.0

(2)

"Midnight Miqu is a gift and holding steady as my default model."

#19

AI Studio

5.0

(2)

"You want aistudio.google.com."

"Aistudio.google.com has the latest model at the top of the livebench benchmark, for free."

#20

Aya Expanse

4.5

(2)

"Quality is better than Qwen 2.5 in m."

"It seemed to be also a fine model on my first tries."

#21

Sonnet

4.5

(2)

"Sonnet works best for almost all coding tasks."

"Sonnet all the way!"

#22

O3

4.0

(2)

"O3 is tested best for deep context understanding on long contexts."

"O3 and Gemini 2.5 pro are the best for long context llms."

#23

Sonnet 3.5

4.0

(2)

"For my relatively simple needs I find sonnet 3.5 the best."

"As a writer I have found Sonnet 3.5 good."

#24

Codestral

4.0

(2)

"For web development I prefer Codestral."

"It's definitely still worth to give Codestral or Mistral Large a shot."

#25

exllama

4.0

(2)

"Exllama is great, it's fast, but I've found myself using llama.cpp more and more"

"Single user? exllama (tabby is popular, I've used it before, it's a bit slower than base exllama"

#26

Phi4

4.0

(2)

"I'm using Phi4 and prompting with 'Give me a Stable Diffusion prompt for x' Works a treat"

"Phi4 also has been showing some promising initial results."

#27

ChatGPT

4.0

(2)

"ChatGPT and Claude. Make sure to double check though."

"Chatgpt knows an awful lot about home assistant."

#28

MiniSearch

4.0

(2)

"Check out minisearch."

#29

Magnum

4.0

(2)

"Try magnum v4 72b."

"Magnum 4 is definitely worth a look."

#30

CohereForAI

5.0

(1)

"Try aya-expanse-32b or aya-expanse-8b."

#31

Granite

5.0

(1)

"Granite 3.1 is currently very high on the GPU Poor LLM Arena, give it a try."

#32

IBM

5.0

(1)

"I use granite-code models with Ollama. It's a family of open foundation models by IBM."

#33

Qwen 2.5 32B

5.0

(1)

"I've got the best results with Qwen2.5:32B."

#34

Aya

4.0

(1)

"Try Aya Expanse 8B, it supports 23 languages."

#35

Distilled Version

4.0

(1)

"The 14b distilled version runs really well."

#36

Facebook

4.0

(1)

"Try a translation model like this https://huggingface.co/facebook/seamless-m4t-v2-large"

#37

Helsinki NLP

4.0

(1)

"You could try Opus: https://huggingface.co/Helsinki-NLP/opus-mt-tc-big-en-cat_oci_spa"

#38

Backyard AI

4.0

(1)

"Backyard AI is pretty cool for phone if you have a powerful desktop PC."

#39

Claude Sonnet

4.0

(1)

"I use Claude Sonnet on the apps.abacus.ai/chatllm service."

#40

Meta.ai

4.0

(1)

"Meta.ai Free and pretty good."

#41

ModernBert

4.0

(1)

"Use ModernBert."

#42

Mistral Chat

4.0

(1)

"My experience with Mistral Chat was quite good."

#43

ChatGPT, Claude

4.0

(1)

"ChatGPT and Claude are good at Haskell, Julia and python."

#44

OpenWebUI

4.0

(1)

"I'd highly recommend deploying a docker instance with OpenWebUI and ollama."

#45

DeepSeek-V3

4.0

(1)

"DeepSeek-V3 or Qwen2.5-Coder-32B."

#46

Deepseek, Qwen

4.0

(1)

"Chinese models like Deepseek and Qwen."

#47

CodeGeex

4.0

(1)

"CodeGeex."

#48

Supernova Medius

4.0

(1)

"Supernova Medius seems to work as well as most."

#49

Nvidia Nano

4.0

(1)

"The new (cheap) Nvidia Nano supposedly excels at this exact kind of thing."

#50

Google NotebookLM

4.0

(1)

"Have you tried Google NotebookLM?"

#51

potpie.ai

4.0

(1)

"I've tried to solve the codebase context problem with potpie.ai."

#52

Vellum.ai

4.0

(1)

"Check out vellum.ai."

#53

Genkit

4.0

(1)

"Genkit - made by google, still in Alpha/Beta."

#54

Haystack

4.0

(1)

"Haystack (not used this myself) but often recommended."

#55

TinyLlama

4.0

(1)

"Tinyllama."

#56

C3TR Adapter

4.0

(1)

"We recommend trying this with llama.cpp using the ngl option."

#57

Whisper

4.0

(1)

"Absolutely run whisper first, in my experience."

#58

Llamaparse

4.0

(1)

"Try llamaparse."

#59

OpenAI

4.0

(1)

"Still having a chatgpt plus since i often use it to test stuff."

#60

Corcel

4.0

(1)

"Corcel has Llama 3.0 and 3.1 It's pretty good and free."

#61

Gemma 2

4.0

(1)

"I've been pretty happy with gemma2 9B for English<->Hungarian."

#62

Llamaindex

3.0

(1)

"Perhaps llamaindex might do the job?"

#63

Groq

3.0

(1)

"Recommend to Use free groq api for llama 3.30 70B."

#64

ChatterUI

3.0

(1)

"ChatterUI. You don't need special hardware."

#65

Layla, ChatterUI, MLC, Maid, PocketPal

3.0

(1)

"There's Layla, ChatterUI, MLC, Maid and PocketPal on Android."

#66

Qwen, Gemma, Llama

3.0

(1)

"Use the more state of the art open-weights models like Qwen, Gemma, Llama."

#67

RWKV

3.0

(1)

"Another option is RWKV 3B."
