Commit Graph

22 Commits

Author SHA1 Message Date
Diego Devesa
1acfadb721 ggml-backend : add device and backend reg interfaces (llama/9707)
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-10-05 15:23:51 +03:00
Akarshan Biswas
c36ddc43c6 Revert "[SYCL] fallback mmvq (ggml/9088)" (llama/9579)
This reverts commit 50addec9a532a6518146ab837a85504850627316.
2024-09-24 19:45:08 +03:00
Georgi Gerganov
34291099fb ggml : refactoring (llama/#0)
- d6a04f87
- 23e0d70b
2024-09-24 19:45:08 +03:00
Georgi Gerganov
d245d7aec7 ggml : fix builds (llama/0)
ggml-ci
2024-09-24 19:45:08 +03:00
Alberto Cabrera Pérez
32f659861a sycl : update support conditions (llama/9394)
* sycl : update support condition to im2col

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>

* Added TODO to remind supporting FP32 im2col

---------

Signed-off-by: Alberto Cabrera <alberto.cabrera@codeplay.com>
2024-09-24 19:45:08 +03:00
Neo Zhang Jianyu
3468983315 add check malloc result on device (llama/9346)
* add check malloc result on device

* update for review comments, check all malloc_device() result

---------

Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-09-24 19:45:08 +03:00
Johannes Gäßler
c7515b0995 ggml/examples: add backend support for numerical optimization (ggml/949)
* CUDA eval works

* stochastic gradient descent op

* Adam except decay

* CUDA CROSS_ENTROPY_LOSS_BACK

* CUDA mnist-fc training works

* backend CLI arg

* refactor gguf load

* remove sched from opt_step_adam

* implement l1 regularization (weight decay)

* extra call to add optimizer

* initialize gradients with ggml_graph_reset

* gradient accumulation

* increment iter per eval instead of epoch

* adjust backend interfaces

* fix ggml_graph_reset without backend

* fix ggml graph export/import

* fixup

* rename

* revert ggml_opt changes

* more general CUDA repeat_back

* update documentation, fix CNN

* validation split

* add clarifying comment

* optimize PyTorch training

* adjust buffer size, thread count

* fix 0.0f validation split

* Update examples/mnist/mnist-common.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* fix gradient accumulation

* tensor flag for accumulators -> tensor hash set

* Update include/ggml.h

Co-authored-by: slaren <slarengh@gmail.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* Update tests/test-backend-ops.cpp

Co-authored-by: slaren <slarengh@gmail.com>

* fix test prints

* Update src/ggml-backend.c

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* better CUDA support for noncontiguous out_prod

* add comment

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-09-24 19:45:08 +03:00
luoyu-intel
32f88af17b Add oneDNN primitive support (llama/9091)
* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc
2024-08-28 13:22:20 +03:00
Meng, Hengyu
17e49d3ab2 fallback mmvq (llama/9088)
* fallback mmvq to mul_mat

* mmvq in cuda path

* Update ggml/src/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>

---------

Co-authored-by: Alberto Cabrera Pérez <alberto.cabrera@codeplay.com>
2024-08-28 13:22:20 +03:00
zhentaoyu
58b725282a Fix SYCL im2col and convert Overflow with Large Dims (llama/9052)
* sycl: fix im2col overflow and sync with cuda

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* sycl: fix convert overflow

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* sycl: fix convert and dequantize

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* sycl: fix ib in dmmv

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* sycl:refine convert

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* sycl: move downsample global_range into common

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* test: add im2col and convert test cases

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* test: make new cases only in sycl

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

* test: comment new test_cases for only local testing

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>

---------

Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
2024-08-28 13:22:20 +03:00
zhentaoyu
3f190addda Add TIMESTEP_EMBEDDING OP (llama/8707)
Signed-off-by: zhentaoyu <zhentao.yu@intel.com>
2024-08-08 22:48:46 +03:00
Meng, Hengyu
8ef98ae7e3 add conv support (llama/8688) 2024-08-08 22:48:46 +03:00
slaren
dd916a2852 ggml : reduce hash table reset cost (llama/8698)
* ggml : reduce hash table reset cost

* fix unreachable code warnings after GGML_ASSERT(false)

* GGML_ASSERT(false) -> GGML_ABORT("fatal error")

* GGML_ABORT use format string
2024-08-08 22:48:46 +03:00
Meng, Hengyu
3e94c7a81d add concat through dim 1/2 (llama/8483)
* add concat through dim 1/2
2024-08-08 22:48:46 +03:00
Chen Xi
68d609a12c fix the mul_mat_id ut issues (llama/8427)
* fix part of mul_mat_id

* skip the bfloat 16 sycl ut

Signed-off-by: Chen Xi <xi2chen@intel.com>

---------

Signed-off-by: Chen Xi <xi2chen@intel.com>
Co-authored-by: Meng, Hengyu <hengyu.meng@intel.com>
Co-authored-by: Chen Xi <xi2chen@intel.com>
2024-08-08 22:48:46 +03:00
Alberto Cabrera Pérez
2af4a52c39 sycl : Reenabled mmvq path for the SYCL Nvidia Backend (llama/8372)
* SYCL : Reenabled mmvq path for the SYCL Nvidia Backend

* Reduced verbosity of comment
2024-08-08 22:48:46 +03:00
Ouadie EL FAROUKI
c5b05321e9 Enabled more data types for oneMKL gemm_batch (llama/8236) 2024-07-08 14:53:55 +03:00
luoyu-intel
29a2739d27 Fix WARP_SIZE=16 bug of Intel GPU (llama/8266)
* fix group_norm ut

* split softmax

* fix softmax

* add concat support condition

* revert debug code

* move QK_WARP_SIZE to presets.hpp
2024-07-08 14:53:55 +03:00
Neo Zhang Jianyu
ee6d17f6b4 rm get_work_group_size() by local cache for performance (llama/8286)
Co-authored-by: arthw <14088817+arthw@users.noreply.github.com>
2024-07-08 14:53:55 +03:00
luoyu-intel
f096cc6807 Fix the sub group size of Intel (llama/8106)
* use warp_size macro for all sycl kernels

* fix mask of permute_sub_group_by_xor

* fix rms_norm with correct warp number

* fix rms_norm_f32/group_norm_f32

* move norm to norm.cpp file

* fix quantize bug

* fix mmvq's batch size
2024-07-08 14:53:55 +03:00
zhentaoyu
db7e0dbe6e Update SYCL-Rope op and Refactor (llama/8157)
* align with rope.cu and move sycl-op to a single file
2024-07-08 14:53:55 +03:00
Georgi Gerganov
e30c679928
whisper : reorganize source code + improve CMake (#2256)
* scripts : update sync [no ci]

* files : reorganize [no ci]

* sync : llama.cpp

* cmake : link math library

* cmake : build normal ggml library

* files : move headers to include

* objc : fix path to ggml-metal.h

* ci : fix WHISPER_CUDA -> GGML_CUDA

* scripts : sync LICENSE [no ci]
2024-06-26 19:34:09 +03:00