whisper.cpp/ggml
Jeff Bolz a753a82462 vulkan: get the first command buffer submitted sooner (llama/10499)
This is an incremental improvement over #9118 to get work to the GPU a bit
sooner. The first part is to start with a smaller number of nodes before
the first submit, and ramp it up to the current 100 nodes/submit. The
second part is to reduce the dryrun overhead for all the nodes that just
need to request descriptor space.

With these changes I get around a 1-2% speedup on an RTX 4070 paired with my
old Haswell-era CPU.
2024-12-08 20:14:35 +02:00
include ggml-cpu: support IQ4_NL_4_4 by runtime repack (llama/10541) 2024-12-08 20:14:35 +02:00
src vulkan: get the first command buffer submitted sooner (llama/10499) 2024-12-08 20:14:35 +02:00
.gitignore whisper : reorganize source code + improve CMake (#2256) 2024-06-26 19:34:09 +03:00
CMakeLists.txt ggml : add support for dynamic loading of backends (llama/10469) 2024-12-08 20:14:35 +02:00