AI - performance testing LLMs using LM Studio


Test system: Windows 11 Intel i9 laptop with 64 GB RAM and an NVIDIA GPU with 8 GB VRAM

| Model | Model size | GPU layers | Load all into RAM | Time to first token (s) | Generation time (s) | Tokens/sec | RAM used | Response quality |
|---|---|---|---|---|---|---|---|---|
| Mistral 7 q6 K | 5.94 GB | all 32 | No | 35, 11 | 24, 37 | 12.7, 11.8 | 11 GB | OK - only just |
| Mixtral 8×7 q2 | 15.64 GB | 14 of 32 | No | 60, 36 | 76, 103 | 4.74, 4.5 | 20 GB | OK - only just |
| Mixtral 8×7 q3 K_M | 20.36 GB | 9 of 32 | No | 632, 78, 121 | 105, 96, 91 | 3.8 | 26 GB | excellent |
| Mixtral 8×7 q3 K_M + RAM | 20.36 GB | 9 of 32 | Yes | 826, 146, 91 | 105, 77, 77 | 3.8, 3.86, 3.89 | 25 GB | excellent |
| Mixtral 8×7 q4 K_M | 26.44 GB | 9 of 32 | No | 278, 236 | 37 | 4 | 33 GB | excellent |
| Mixtral 8×7 q4 K_M + RAM | 26.44 GB | 9 of 32 | Yes | 860, 151 | 54, 92 | 4, 3.8 | 32 GB | excellent |
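
The notes above do not say how the timing figures were collected. To reproduce them outside the LM Studio interface, the three timing columns can be derived from any stream of tokens roughly as in the sketch below. The names `measure_stream` and `RunStats` are hypothetical. The definitions used are also assumptions and may differ from how LM Studio computes its own figures: generation time runs from the first token to the end of the stream, and tokens/sec is the token count divided by that time.

```python
import time
from typing import Callable, Iterable, NamedTuple


class RunStats(NamedTuple):
    time_to_first_token: float  # seconds from start until the first token arrives
    generation_time: float      # seconds from the first token until the stream ends
    tokens_per_sec: float       # tokens received divided by generation_time


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> RunStats:
    """Consume a token stream and time it (hypothetical helper, not an LM Studio API)."""
    start = clock()
    first = None
    count = 0
    for _ in tokens:
        if first is None:
            first = clock()  # timestamp of the first token
        count += 1
    end = clock()
    if first is None:
        raise ValueError("stream produced no tokens")
    generation_time = end - first
    # Avoid dividing by zero if the whole stream arrived within one clock tick.
    rate = count / generation_time if generation_time > 0 else float("nan")
    return RunStats(first - start, generation_time, rate)
```

Here `tokens` can be any iterable that yields tokens as they arrive, such as a streaming response from a local inference server. The `clock` parameter exists only so the function can be checked with fixed timestamps.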