whisper.cpp/ggml
commit 45f1f9144f by Jeff Bolz, 2024-11-20 21:00:08 +02:00
vulkan: Optimize soft_max (llama/10301)
* vulkan: Optimize soft_max

Large soft_max operations could already saturate memory bandwidth, but
small and medium sizes were pretty slow. The bulk of the gains for them
comes from using a smaller workgroup size; making the workgroup size match
the subgroup size also makes the barriers much cheaper.
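
To make the workgroup/subgroup point concrete, here is a minimal GLSL sketch, not the actual ggml shader (buffer layout and names are assumptions): the workgroup size is a specialization constant, so the host can set it to the device's reported subgroup size, and a workgroup that is exactly one subgroup makes each barrier() nearly free.

```glsl
#version 450

// Sketch: BLOCK_SIZE is a specialization constant, so the host can set it
// at pipeline-creation time to the device's subgroup size (e.g. 32 or 64).
// A workgroup that is exactly one subgroup makes barrier() cheap, since
// only a single subgroup has to be synchronized.
layout(constant_id = 0) const uint BLOCK_SIZE = 32;
layout(local_size_x_id = 0) in;

layout(binding = 0) readonly  buffer X { float x[]; };
layout(binding = 1) writeonly buffer D { float d[]; };

shared float vals[BLOCK_SIZE];

void main() {
    const uint tid = gl_LocalInvocationID.x;
    vals[tid] = x[gl_GlobalInvocationID.x];
    barrier();  // cheap when the workgroup is a single subgroup

    // Tree reduction over the workgroup (placeholder for the real math).
    for (uint s = BLOCK_SIZE / 2; s > 0; s >>= 1) {
        if (tid < s) vals[tid] = max(vals[tid], vals[tid + s]);
        barrier();
    }
    if (tid == 0) d[gl_WorkGroupID.x] = vals[0];
}
```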

Cache some values in locals to avoid refetching/recomputing them, and stamp
out a few "template instantiations" so that the smaller cases fully unroll.
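
GLSL has no real templates, but calling a helper with literal arguments stamps out specialized copies in which the loop bound is a compile-time constant, so [[unroll]] can fully unroll it. Below is a hedged sketch of that pattern together with the locals caching; the function name, push-constant layout, and thresholds are illustrative assumptions, not the shader's actual code.

```glsl
#version 450
#extension GL_EXT_control_flow_attributes : enable

layout(constant_id = 0) const uint BLOCK_SIZE = 32;
layout(local_size_x_id = 0) in;

layout(binding = 0) readonly  buffer X { float x[]; };
layout(binding = 1) writeonly buffer D { float d[]; };
layout(push_constant) uniform P { uint ncols; float scale; } p;

// One "instantiation" per literal call-site argument: the compiler inlines
// the call, num_iters becomes a constant, and [[unroll]] fully unrolls.
void soft_max_row(const uint num_iters) {
    const uint  tid   = gl_LocalInvocationID.x;
    const uint  base  = gl_WorkGroupID.x * p.ncols;  // cached in locals once,
    const float scale = p.scale;                     // not refetched per iter

    float maxval = uintBitsToFloat(0xFF800000u);     // -inf
    [[unroll]] for (uint i = 0; i < num_iters; ++i) {
        const uint col = i * BLOCK_SIZE + tid;
        if (col < p.ncols) {
            maxval = max(maxval, x[base + col] * scale);
        }
    }
    // ... max reduction, exp, sum and normalize would follow here ...
    if (tid < p.ncols) d[base + tid] = maxval;       // placeholder store
}

void main() {
    const uint n = (p.ncols + BLOCK_SIZE - 1) / BLOCK_SIZE;
    if      (n == 1) soft_max_row(1);  // fully unrolled small cases
    else if (n == 2) soft_max_row(2);
    else if (n <= 4) soft_max_row(4);
    else             soft_max_row(n);  // generic fallback, dynamic bound
}
```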

Add a missing early return for out-of-bounds (OOB) rows. These occur when
there are more than 512 rows and the dispatch is 512 x H.
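
A sketch of what the missing guard looks like, with the dispatch shape taken from the message and everything else assumed: with a 512 x H dispatch the flattened row index can overshoot the real row count whenever it is not a multiple of 512, so the workgroup has to bail out before touching memory.

```glsl
#version 450

layout(local_size_x = 32) in;

layout(binding = 0) readonly  buffer X { float x[]; };
layout(binding = 1) writeonly buffer D { float d[]; };
layout(push_constant) uniform P { uint nrows; uint ncols; } p;

void main() {
    // Dispatch is 512 x H with H = ceil(nrows / 512), so the flattened row
    // index runs up to 512*H - 1 and overshoots nrows for non-multiples.
    const uint rowx = gl_WorkGroupID.y * 512 + gl_WorkGroupID.x;
    if (rowx >= p.nrows) {
        return;  // OOB row: bail out before any loads or stores
    }

    // ... soft_max over row `rowx` proceeds here ...
    d[rowx * p.ncols] = x[rowx * p.ncols];  // placeholder
}
```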

* vulkan: Further soft_max optimizations

Restore the workgroup size 512 case, and use it for sizes >1024 (sketched below).

Use unrollable loops for more iteration counts.
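
A subgroup-sized workgroup presumably stops offering enough parallelism for very long rows, which would explain keeping the 512-wide variant for sizes >1024. With several subgroups per workgroup, the reduction needs a shared-memory step on top of the per-subgroup reduction; the following sketch of that shape is an assumption, not the actual shader.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic      : enable
#extension GL_KHR_shader_subgroup_arithmetic : enable

// 512-wide variant for long rows: several subgroups per workgroup, so a
// shared-memory step combines the per-subgroup partial results.
layout(constant_id = 0) const uint BLOCK_SIZE = 512;
layout(local_size_x_id = 0) in;

layout(binding = 0) readonly  buffer X { float x[]; };
layout(binding = 1) writeonly buffer D { float d[]; };
layout(push_constant) uniform P { uint ncols; } p;

shared float smax[64];  // enough for 512 / 8 = 64 subgroups (worst case)

void main() {
    const uint tid  = gl_LocalInvocationID.x;
    const uint base = gl_WorkGroupID.x * p.ncols;

    float m = uintBitsToFloat(0xFF800000u);  // -inf
    for (uint col = tid; col < p.ncols; col += BLOCK_SIZE) {
        m = max(m, x[base + col]);
    }

    m = subgroupMax(m);                      // reduce within each subgroup
    if (gl_SubgroupInvocationID == 0) {
        smax[gl_SubgroupID] = m;             // one partial per subgroup
    }
    barrier();                               // real barrier: >1 subgroup

    if (tid == 0) {
        float wgmax = smax[0];
        for (uint i = 1; i < gl_NumSubgroups; ++i) {
            wgmax = max(wgmax, smax[i]);
        }
        d[gl_WorkGroupID.x] = wgmax;         // placeholder: row max only
    }
}
```

The unrollable-loops point then maps naturally onto the specialization chain sketched earlier: adding more literal call sites lets more iteration counts take a fully unrolled path instead of the dynamic fallback.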