Make sure n_barrier and n_barrier_passed do not share the cache line to avoid cache line bouncing. This optimization shows performance improvements even for n_threads <= 8 cases. Resurect TSAN (Thread Sanitizer) check so that we can avoid doing expensive read-modify-write in the normal case and just use thread-fence as originally intended. |
||
---|---|---|
.. | ||
cmake | ||
include | ||
src | ||
.gitignore | ||
CMakeLists.txt | ||
ggml_vk_generate_shaders.py |