| Model | Model size | GPU layers | Load all into RAM | Time to 1st token (secs) | Generation time (secs) | Tokens/sec | RAM used | Response quality |
|---|---|---|---|---|---|---|---|---|
| Mistral 7 q6 K | 5.94GB | all 32 | No | 35, 11 | 24, 37 | 12.7, 11.8 | 11GB | OK - only just |
| Mixtral 8×7 q2 | 15.64GB | 14 of 32 | No | 60, 36 | 76, 103 | 4.74, 4.5 | 20GB | OK - only just |
| Mixtral 8×7 q3 K_M | 20.36GB | 9 of 32 | No | 632, 78, 121 | 105, 96, 91 | 3.8 | 26GB | excellent |
| Mixtral 8×7 q3 K_M + RAM | 20.36GB | 9 of 32 | Yes | 826, 146, 91 | 105, 77, 77 | 3.8, 3.86, 3.89 | 25GB | excellent |
| Mixtral 8×7 q4 K_M | 26.44GB | 9 of 32 | No | 278, 236 | 37 | 4 | 33GB | excellent |
| Mixtral 8×7 q4 K_M + RAM | 26.44GB | 9 of 32 | Yes | 860, 151 | 54, 92 | 4, 3.8 | 32GB | excellent |