Chen Xi
68d609a12c
Fix the mul_mat_id unit test issues (llama/8427)
...
* fix part of mul_mat_id
* skip the bfloat16 SYCL unit test
Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
Co-authored-by: Chen Xi <xi2chen@intel.com>
2024-08-08 22:48:46 +03:00
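The bfloat16 skip above maps to a device-capability check. A minimal sketch of that idea, assuming DPC++'s bf16 math aspect and a hypothetical should_run_bf16_test() helper (neither is the actual test-runner code):

```cpp
#include <iostream>
#include <sycl/sycl.hpp>

// Hypothetical helper (not the actual test-runner code): decide whether a
// bf16 case should run on the selected SYCL device, mirroring the idea of
// skipping the bfloat16 unit test where the backend cannot execute it.
bool should_run_bf16_test(const sycl::device &dev) {
    // DPC++ exposes bf16 math support as a device aspect; without it the
    // test would fail or mis-compute, so skip instead.
    return dev.has(sycl::aspect::ext_oneapi_bfloat16_math_functions);
}

int main() {
    sycl::device dev{sycl::default_selector_v};
    if (!should_run_bf16_test(dev)) {
        std::cout << "SKIP: bf16 unsupported on "
                  << dev.get_info<sycl::info::device::name>() << "\n";
        return 0;
    }
    std::cout << "running bf16 tests...\n";
}
```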
Alberto Cabrera Pérez
2af4a52c39
sycl : Re-enabled mmvq path for the SYCL Nvidia backend (llama/8372)
...
* SYCL : Re-enabled mmvq path for the SYCL Nvidia backend
* Reduced the verbosity of a comment
2024-08-08 22:48:46 +03:00
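Re-enabling a path for one backend usually comes down to a runtime device check. A minimal sketch of the pattern; use_mmvq_path() is illustrative, not the actual ggml-sycl dispatch:

```cpp
#include <string>
#include <sycl/sycl.hpp>

// Illustrative helper (not the actual ggml-sycl dispatch): choose the mmvq
// (mul-mat-vec, quantized) path based on the hardware behind the queue,
// re-enabling it for Nvidia devices.
bool use_mmvq_path(const sycl::queue &q) {
    const std::string vendor =
        q.get_device().get_info<sycl::info::device::vendor>();
    // Vendor strings are backend-defined; matching on "NVIDIA" is an
    // assumption of this sketch, not a guaranteed constant.
    return vendor.find("NVIDIA") != std::string::npos ||
           vendor.find("Intel")  != std::string::npos;
}
```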
Ouadie EL FAROUKI
c5b05321e9
Enabled more data types for oneMKL gemm_batch (llama/8236)
2024-07-08 14:53:55 +03:00
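For orientation, a minimal sketch of the oneMKL strided gemm_batch entry point this commit feeds more data types into; the fp32 sizes and values below are made up for illustration:

```cpp
#include <cstdint>
#include <oneapi/mkl.hpp>
#include <sycl/sycl.hpp>

int main() {
    namespace blas = oneapi::mkl::blas::column_major;
    sycl::queue q{sycl::default_selector_v};

    // Eight independent 4x4 fp32 multiplies, laid out contiguously.
    const std::int64_t n = 4, batch = 8, stride = n * n;
    float *A = sycl::malloc_shared<float>(stride * batch, q);
    float *B = sycl::malloc_shared<float>(stride * batch, q);
    float *C = sycl::malloc_shared<float>(stride * batch, q);
    for (std::int64_t i = 0; i < stride * batch; ++i) {
        A[i] = 1.0f; B[i] = 2.0f; C[i] = 0.0f;
    }

    // Strided batched GEMM: C_i = 1.0 * A_i * B_i + 0.0 * C_i.
    blas::gemm_batch(q,
        oneapi::mkl::transpose::nontrans, oneapi::mkl::transpose::nontrans,
        n, n, n,
        1.0f, A, n, stride, B, n, stride,
        0.0f, C, n, stride, batch).wait();

    sycl::free(A, q); sycl::free(B, q); sycl::free(C, q);
}
```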
luoyu-intel
29a2739d27
Fix WARP_SIZE=16 bug on Intel GPUs (llama/8266)
...
* fix group_norm unit test
* split softmax
* fix softmax
* add concat support condition
* revert debug code
* move QK_WARP_SIZE to presets.hpp
2024-07-08 14:53:55 +03:00
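The underlying failure mode: kernels that hard-code 32 lanes silently drop half the work on a 16-wide Intel sub-group. A minimal sketch of a width-agnostic sub-group sum, using standard SYCL group algorithms rather than the actual ggml-sycl kernels:

```cpp
#include <iostream>
#include <sycl/sycl.hpp>

int main() {
    sycl::queue q{sycl::default_selector_v};
    float *out = sycl::malloc_shared<float>(1, q);

    q.parallel_for(
        sycl::nd_range<1>{sycl::range<1>{32}, sycl::range<1>{32}},
        [=](sycl::nd_item<1> it) {
            sycl::sub_group sg = it.get_sub_group();
            // Each lane contributes 1.0f; reduce_over_group spans however
            // many lanes the device actually has (16 or 32), so no lanes
            // are dropped the way a hard-coded 32-lane shuffle loop would.
            float s = sycl::reduce_over_group(sg, 1.0f, sycl::plus<float>());
            if (it.get_global_linear_id() == 0) out[0] = s;
        }).wait();

    std::cout << "lanes in sub-group 0: " << out[0] << "\n";
    sycl::free(out, q);
}
```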
Neo Zhang Jianyu
ee6d17f6b4
Replace get_work_group_size() with a local cache for performance (llama/8286)
...
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-07-08 14:53:55 +03:00
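A minimal sketch of the query-once, cache-locally pattern the commit describes; device_ctx is illustrative, not the real ggml-sycl context struct:

```cpp
#include <algorithm>
#include <cstddef>
#include <sycl/sycl.hpp>

// Illustrative context struct: the device limit is queried once at setup,
// so hot paths read a cached field instead of making a
// get_work_group_size()-style device-info query on every kernel launch.
struct device_ctx {
    sycl::queue q;
    size_t max_wg_size;

    explicit device_ctx(const sycl::device &dev)
        : q{dev},
          max_wg_size{dev.get_info<sycl::info::device::max_work_group_size>()} {}
};

int main() {
    device_ctx ctx{sycl::device{sycl::default_selector_v}};
    // Hot path: clamp the launch size against the cached limit; no
    // device-info round-trip per launch.
    size_t wg = std::min<size_t>(ctx.max_wg_size, 256);
    (void)wg;
}
```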
luoyu-intel
f096cc6807
Fix the sub-group size for Intel GPUs (llama/8106)
...
* use the warp_size macro in all SYCL kernels
* fix the mask of permute_sub_group_by_xor
* fix rms_norm to use the correct warp count
* fix rms_norm_f32/group_norm_f32
* move the norm ops into norm.cpp
* fix a quantize bug
* fix mmvq's batch size
2024-07-08 14:53:55 +03:00
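The permute_sub_group_by_xor mask fix boils down to deriving the butterfly offsets from the real sub-group width. A sketch using the standard sycl::permute_group_by_xor (ggml-sycl goes through a similarly named dpct helper; this uses the portable form instead):

```cpp
#include <sycl/sycl.hpp>

// Sketch of a butterfly (xor-shuffle) sub-group reduction. The point of
// the mask fix: the xor offsets must come from the device's real
// sub-group width (16 on many Intel GPUs), never a hard-coded 32.
float butterfly_sum(sycl::sub_group sg, float v) {
    for (unsigned mask = sg.get_local_linear_range() / 2; mask > 0; mask >>= 1) {
        v += sycl::permute_group_by_xor(sg, v, mask);
    }
    return v; // every lane now holds the full sub-group sum
}
```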
zhentaoyu
db7e0dbe6e
Update and refactor the SYCL RoPE op (llama/8157)
...
* align with rope.cu and move the SYCL op into a single file
2024-07-08 14:53:55 +03:00
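As a reference point when reading the SYCL op, this is the rotation RoPE applies, as a host-side sketch (freq_base = 10000.0f is the common default, not taken from this commit):

```cpp
#include <cmath>
#include <vector>

// Host-side sketch of the rotation RoPE applies: each channel pair
// (x[2i], x[2i+1]) is rotated by an angle that shrinks geometrically
// with i, scaled by the token position.
void rope_ref(std::vector<float> &x, int pos, float freq_base = 10000.0f) {
    const int d = (int) x.size();
    for (int i = 0; i < d / 2; ++i) {
        const float theta = pos * std::pow(freq_base, -2.0f * i / d);
        const float c = std::cos(theta), s = std::sin(theta);
        const float x0 = x[2 * i], x1 = x[2 * i + 1];
        x[2 * i]     = x0 * c - x1 * s;
        x[2 * i + 1] = x0 * s + x1 * c;
    }
}
```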
Georgi Gerganov
e30c679928
whisper : reorganize source code + improve CMake (#2256)
...
* scripts : update sync [no ci]
* files : reorganize [no ci]
* sync : llama.cpp
* cmake : link math library
* cmake : build normal ggml library
* files : move headers to include
* objc : fix path to ggml-metal.h
* ci : fix WHISPER_CUDA -> GGML_CUDA
* scripts : sync LICENSE [no ci]
2024-06-26 19:34:09 +03:00