
AI - performance testing LLMs using LM Studio

Test machine: Windows 11 laptop, Intel i9, 64 GB RAM, NVIDIA GPU with 8 GB VRAM

  • LM Studio v0.2.12, Jan 2024
    • sampler: temp 0.5, n_predict 1024, top_k 50, repeat_penalty 1.1, min_p 0.05
    • CPU threads 4, n_batch 512
    • context length 32768 tokens (dropping to 4096 seems to speed up generation by ~10% tokens/sec, but otherwise no obvious speed benefit)
    • experts to use: 2 (for Mixtral)
  • system prompt requiring tree-of-knowledge analysis, etc.
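The sampler settings above map fairly directly onto the request body of LM Studio's OpenAI-compatible local server. A minimal sketch of building such a request; the endpoint URL, default port 1234, and exact field names are assumptions based on the OpenAI chat-completions format, not verified against v0.2.12:

```python
import json

# Sampler settings from the test runs above, expressed as an
# OpenAI-style chat-completions request body (assumed field names).
payload = {
    "messages": [
        # hypothetical placeholder for the tree-of-knowledge system prompt
        {"role": "system", "content": "Analyse using a tree-of-knowledge approach."},
        {"role": "user", "content": "Summarise the trade-offs of quantisation."},
    ],
    "temperature": 0.5,
    "max_tokens": 1024,       # n_predict
    "top_k": 50,
    "repeat_penalty": 1.1,
    "min_p": 0.05,
    "stream": False,
}

# Sending it would look something like (assumed default local endpoint):
#   requests.post("http://localhost:1234/v1/chat/completions", json=payload)
body = json.dumps(payload)
```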
^ Model ^ Model size ^ GPU layers ^ Load all into RAM ^ Time to 1st token (s) ^ Generation time (s) ^ Tokens/sec ^ RAM used ^ Response quality ^
| Mistral 7B q6_K | 5.94 GB | all 32 | No | 35, 11 | 24, 37 | 12.7, 11.8 | 11 GB | OK - only just |
| Mixtral 8×7B q2 | 15.64 GB | 14 of 32 | No | 60, 36 | 76, 103 | 4.74, 4.5 | 20 GB | OK - only just |
| Mixtral 8×7B q3_K_M | 20.36 GB | 9 of 32 | No | 632, 78, 121 | 105, 96, 91 | 3.8 | 26 GB | excellent |
| Mixtral 8×7B q3_K_M + RAM | 20.36 GB | 9 of 32 | Yes | 826, 146, 91 | 105, 77, 77 | 3.8, 3.86, 3.89 | 25 GB | excellent |
| Mixtral 8×7B q4_K_M | 26.44 GB | 9 of 32 | No | 278, 236 | 37 | 4 | 33 GB | excellent |
| Mixtral 8×7B q4_K_M + RAM | 26.44 GB | 9 of 32 | Yes | 860, 151 | 54, 92 | 4, 3.8 | 32 GB | excellent |
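The tokens/sec column covers only the generation phase; once the (often very long) time to first token is included, effective throughput drops further. A small helper to derive it from the table columns, assuming tokens/sec was measured over the generation phase alone:

```python
def effective_rate(first_token_s: float, gen_s: float, gen_rate: float) -> float:
    """Tokens per second including the wait for the first token.

    first_token_s: time to first token (s)
    gen_s:         generation time (s)
    gen_rate:      tokens/sec measured during generation only
    """
    tokens_generated = gen_rate * gen_s
    return tokens_generated / (first_token_s + gen_s)

# Example: Mixtral 8×7B q3_K_M, second run (78 s to first token, 96 s generating)
rate = effective_rate(78, 96, 3.8)  # ≈ 2.1 tokens/sec, versus 3.8 during generation alone
```

This makes the cost of the 600+ second first-token times on the larger quants explicit: the headline tokens/sec barely changes between quants, but the wall-clock experience differs greatly.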
it/ai_lmstudio_tests.txt · Last modified: 2024/01/31 04:32 by wh
