Changing the GPU used to run a Large Language Model can change its behaviour and output, due to factors such as how computations are parallelised, differences in hardware architecture, and quantization effects.
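A minimal sketch of the underlying numerical cause (an illustration, not taken from the article): floating-point addition is not associative, so reductions that different GPUs parallelise in different orders can round differently and produce slightly different results.

```python
# Floating-point addition is not associative: regrouping the same
# three terms changes the rounding, and therefore the final value.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False: the two groupings round differently
print(a, b)
```

The same effect, accumulated over millions of operations in a transformer forward pass, can flip the top-ranked token and send a greedy decode down a different path.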

From medium.com · 12 min read
Table of contents

Changing the GPU is changing the behaviour of your LLM
1. Why this article?
2. Setup the experimentation
3. The experiment results: T4 vs A10G
4. T4 Colab vs T4 SageMaker
5. Why are the answers generated by the same inputs and the same LLM so different across two GPUs?
6. Exploring probabilities
7. Why do the calculations differ depending on the GPU?
8. Should I be concerned about scaling an LLM horizontally using multiple GPUs?
Conclusion