Johannes Gäßler | 96b8419b27 | CUDA: fix FA out-of-bounds reads (llama/7479) | 2024-06-16 18:19:48 +03:00
Johannes Gäßler | 3c63f4cf35 | CUDA: fix FA out-of-bounds writes (llama/7465) | 2024-06-16 18:19:48 +03:00
Georgi Gerganov | 5848dfd9c8 | cuda : fix compile warning (llama/7454) | 2024-06-16 18:19:48 +03:00
Johannes Gäßler | 29ab5d0326 | CUDA: remove incorrect precision check (llama/7454) | 2024-06-16 18:19:48 +03:00
Johannes Gäßler | 45b5b95e29 | CUDA: deduplicate FlashAttention code (llama/7352) | 2024-06-16 18:19:48 +03:00
Johannes Gäßler | ec52f900e4 | CUDA: faster large batch FA without tensor cores (llama/7314) | 2024-06-16 18:19:48 +03:00