Georgi Gerganov
ab36d02560
metal : support permuted matrix multiplications (llama/10033)
...
* metal : support permuted matrix multiplications
ggml-ci
* cont : use nb01 directly for row steps
ggml-ci
* cont : add comments [no ci]
* metal : minor refactor
* metal : minor
2024-11-01 10:19:05 +02:00
Jun Hee Yoo
a3231b2f2e
metal : add POOL2D and fix IM2COL (llama/9943)
...
* add pool_2d
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* fix im2col and add unittest for N>=1024
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* add tests for N % 1024 != 0
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* remove trailing whitespaces
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* apply suggestions
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* apply more optimization
- original IM2COL kernel + _ext with MIN()
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* apply review: change kernel name of pool_2d
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* apply review
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
* fix more formatting and enhance readability
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
---------
Signed-off-by: Junhee Yoo <junhee.yoo@navercorp.com>
2024-11-01 10:19:05 +02:00
Georgi Gerganov
896c41ef30
metal : use F32 prec for K*Q in vec FA (llama/9595)
...
ggml-ci
2024-09-24 19:45:08 +03:00
Georgi Gerganov
d96a17848f
metal : separate scale and mask from QKT in FA kernel (llama/9189)
...
* metal : separate scale and mask from QKT in FA kernel
* metal : ne01 check no longer necessary
* metal : keep data in local memory
2024-08-28 13:22:20 +03:00
Georgi Gerganov
0e7798677a
ggml : add SSM Metal kernels (llama/8546)
...
* ggml : add ggml_ssm_conv metal impl
* ggml : add ssm_scan metal impl
ggml-ci
2024-08-28 13:22:20 +03:00
slaren
58a36d2e3b
metal : gemma2 flash attention support (llama/9159)
2024-08-28 13:22:20 +03:00
Radoslav Gerganov
b6c05ce82f
yolo : add backend support (ggml/924)
...
* yolo : add backend support
* metal : add sub and sqrt kernels
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-21 11:07:13 +03:00
Ronsor
3643120690
feat: add new sin and cos operators (ggml/919)
...
* ggml : add sin/cos operators
* ggml-cuda : add sin/cos operators
* ggml : add corresponding tests for sin/cos
* ggml : add backward computation for sin/cos operators
* ggml-vulkan : add sin/cos operators
* ggml-vulkan : add sin/cos shader source
* metal : add sin, cos
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-21 11:07:13 +03:00
slaren
be9a16fd3f
ggml : fix quant dot product with odd number of blocks (llama/8549)
...
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix odd blocks for ARM_NEON (llama/8556)
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix q4_1
* ggml : fix q5_0
* ggml : fix q5_1
* ggml : fix iq4_nl metal
ggml-ci
* ggml : fix q4_0
* ggml : fix q8_0
ggml-ci
* ggml : remove special Q4_0 code for first 2 blocks
* ggml : fix sumf redefinition
---------
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-08 22:48:46 +03:00
Georgi Gerganov
b852a4c5ca
metal : template-ify some of the kernels (llama/8447)
...
ggml-ci
2024-08-08 22:48:46 +03:00
Clint Herron
c2c60dc9ba
Removes multiple newlines at the end of files that are breaking the editorconfig step of CI. (llama/8258)
2024-07-08 14:53:55 +03:00
Georgi Gerganov
e30c679928
whisper : reorganize source code + improve CMake (#2256)
...
* scripts : update sync [no ci]
* files : reorganize [no ci]
* sync : llama.cpp
* cmake : link math library
* cmake : build normal ggml library
* files : move headers to include
* objc : fix path to ggml-metal.h
* ci : fix WHISPER_CUDA -> GGML_CUDA
* scripts : sync LICENSE [no ci]
2024-06-26 19:34:09 +03:00