AI - performance testing LLMs using LM Studio
Windows 11 laptop: Intel i9 CPU, 64 GB RAM, NVIDIA GPU with 8 GB VRAM
- LM Studio v0.2.12 (Jan 2024)
- settings: temperature 0.5, n_predict 1024, top_k 50, repeat_penalty 1.1, min_p 0.05, CPU threads 4, n_batch 512, experts to use 2 (Mixtral only)
- context length 32768 tokens (dropping to 4096 seems to speed up tokens/sec by ~10%, but otherwise no obvious speed benefit)
- system prompt requiring tree-of-knowledge analysis, etc.
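For reference, the settings above can be expressed as a request payload for LM Studio's OpenAI-compatible local server. This is a sketch only: the model identifier, prompts, and endpoint are placeholders, not the exact ones used in these tests.

```python
import json

# Inference settings from the tests above, packaged as a chat-completion
# payload for LM Studio's local server (commonly http://localhost:1234/v1).
# Model name and messages are illustrative placeholders.
payload = {
    "model": "mixtral-8x7b-instruct",  # placeholder model identifier
    "messages": [
        # stand-in for the tree-of-knowledge system prompt used in the tests
        {"role": "system", "content": "Analyse the question step by step."},
        {"role": "user", "content": "Explain quantization in LLMs."},
    ],
    "temperature": 0.5,
    "max_tokens": 1024,      # n_predict
    "top_k": 50,
    "repeat_penalty": 1.1,
    "min_p": 0.05,
}

print(json.dumps(payload, indent=2))
```

Note that top_k, repeat_penalty, and min_p are llama.cpp-style sampling parameters rather than standard OpenAI fields; LM Studio accepts some of these as extensions, so check the server's docs for the exact names in your version.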
| model | model size | GPU layers | load all into RAM | time to 1st token (s) | generation time (s) | tokens/sec | RAM used | response quality |
|---|---|---|---|---|---|---|---|---|
| Mistral 7B q6_K | 5.94 GB | all 32 | No | 35, 11 | 24, 37 | 12.7, 11.8 | 11 GB | OK - only just |
| Mixtral 8×7B q2 | 15.64 GB | 14 of 32 | No | 60, 36 | 76, 103 | 4.74, 4.5 | 20 GB | OK - only just |
| Mixtral 8×7B q3_K_M | 20.36 GB | 9 of 32 | No | 632, 78, 121 | 105, 96, 91 | 3.8 | 26 GB | excellent |
| Mixtral 8×7B q3_K_M + RAM | 20.36 GB | 9 of 32 | Yes | 826, 146, 91 | 105, 77, 77 | 3.8, 3.86, 3.89 | 25 GB | excellent |
| Mixtral 8×7B q4_K_M | 26.44 GB | 9 of 32 | No | 278, 236 | 37 | 4 | 33 GB | excellent |
| Mixtral 8×7B q4_K_M + RAM | 26.44 GB | 9 of 32 | Yes | 860, 151 | 54, 92 | 4, 3.8 | 32 GB | excellent |
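A rough way to compare these rows is to estimate end-to-end latency as time-to-first-token plus generation time (reply length divided by tokens/sec). A minimal sketch, using figures from the table; the 300-token reply length is an assumed example, not a measured value:

```python
def estimated_response_time(ttft_s: float, n_tokens: int, tokens_per_sec: float) -> float:
    """Rough end-to-end latency: time to first token plus token generation time."""
    return ttft_s + n_tokens / tokens_per_sec

# Mistral 7B q6_K, second run (TTFT 11 s, 11.8 tok/s), assumed 300-token reply:
print(f"{estimated_response_time(11, 300, 11.8):.0f} s")   # ≈ 36 s

# Mixtral 8×7B q3_K_M, second run (TTFT 78 s, 3.8 tok/s), same reply length:
print(f"{estimated_response_time(78, 300, 3.8):.0f} s")    # ≈ 157 s
```

This illustrates why the partially-offloaded Mixtral runs feel much slower in practice: the long time-to-first-token dominates even before the lower tokens/sec is factored in.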
it/ai_lmstudio_tests.txt · Last modified: 2024/01/31 04:32 by wh