Paul Tsochantaris
80753d4da8
metal : single allocation of encode_async block (llama/9747)
...
* Single allocation of encode_async block with non-ARC capture in ggml-metal.m
* Moving Block_release to the deallocation code
* Release encode block when re-setting encoding buffer count if needed
* Update ggml/src/ggml-metal.m
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-01 10:19:05 +02:00
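The pattern this commit describes — allocate the encode_async state once, release it only when the backend is torn down or when the buffer count is re-set — can be sketched in plain C. This is a hedged analogue with hypothetical names; the real ggml-metal.m uses an Objective-C block with Block_copy/Block_release rather than malloc/free:

```c
#include <stdlib.h>

// Hypothetical context standing in for the Metal backend state.
typedef struct {
    void (*encode_async)(void *user_data); // stands in for the ObjC block
    void  *encode_ctx;                     // captured state, allocated once
    int    n_cb;                           // encoding buffer count
} backend_ctx;

static void encode_fn(void *user_data) { (void) user_data; }

// Allocate the callback state a single time, up front.
static void backend_init(backend_ctx *ctx, int n_cb) {
    ctx->encode_async = encode_fn;
    ctx->encode_ctx   = malloc(64); // one allocation, reused for every encode
    ctx->n_cb         = n_cb;
}

// Re-setting the buffer count releases the old state first, if needed.
static void backend_set_n_cb(backend_ctx *ctx, int n_cb) {
    if (ctx->encode_ctx) {
        free(ctx->encode_ctx); // mirrors Block_release on the old block
    }
    ctx->encode_ctx = malloc(64);
    ctx->n_cb       = n_cb;
}

// Release happens in the deallocation path, not per-encode.
static void backend_free(backend_ctx *ctx) {
    free(ctx->encode_ctx);
    ctx->encode_ctx = NULL;
}
```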
Georgi Gerganov
aa037a60f3
ggml : alloc ggml_contexts on the heap (#2525)
...
* whisper : reduce ggml_context usage
* ggml : allocate contexts on the heap (v2)
* ggml : aligned malloc -> malloc
2024-10-31 22:00:09 +02:00
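The heap-allocation change above can be illustrated with a minimal sketch (a much-reduced, hypothetical stand-in for ggml_context): instead of handing out slots from a fixed static pool, each context is malloc'd and freed independently, and plain malloc replaces aligned malloc:

```c
#include <stdlib.h>

// Hypothetical, much-reduced stand-in for ggml_context.
struct ggml_context {
    size_t mem_size;
    void  *mem_buffer;
};

// Allocate the context itself on the heap (plain malloc, no aligned
// allocation), rather than taking a slot from a static pool of contexts.
static struct ggml_context * ctx_init(size_t mem_size) {
    struct ggml_context *ctx = malloc(sizeof(*ctx));
    if (!ctx) return NULL;
    ctx->mem_size   = mem_size;
    ctx->mem_buffer = malloc(mem_size);
    return ctx;
}

static void ctx_free(struct ggml_context *ctx) {
    if (!ctx) return;
    free(ctx->mem_buffer);
    free(ctx);
}
```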
Georgi Gerganov
1ba185f4af
metal : zero-init buffer contexts (#0)
2024-10-05 15:23:51 +03:00
Diego Devesa
cf977670e6
ggml-backend : add device and backend reg interfaces (llama/9707)
...
Also:
- metal : fix compute pass descriptor autorelease crash
- ggml-backend : add device description to CPU backend
- ggml: unify backend logging mechanism
2024-10-05 15:23:51 +03:00
Diego Devesa
1acfadb721
ggml-backend : add device and backend reg interfaces (llama/9707)
...
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-10-05 15:23:51 +03:00
Georgi Gerganov
162a455402
metal : reduce command encoding overhead (llama/9698)
2024-10-03 12:22:17 +03:00
Georgi Gerganov
34291099fb
ggml : refactoring (llama/#0)
...
- d6a04f87
- 23e0d70b
2024-09-24 19:45:08 +03:00
Georgi Gerganov
a2cb5b4183
metal : handle zero-sized allocs (llama/9466)
2024-09-24 19:45:08 +03:00
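One common way to handle zero-sized allocs (a hedged sketch, not the actual Metal backend code) is to never forward a size of 0 to the underlying allocator, since malloc(0) may legitimately return NULL:

```c
#include <stdlib.h>

// Hypothetical helper: never ask the underlying allocator for 0 bytes,
// because malloc(0) may return NULL even when no error occurred.
static void * alloc_buffer(size_t size) {
    if (size == 0) {
        size = 1; // round zero-sized requests up to a minimal valid buffer
    }
    return malloc(size);
}
```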
Georgi Gerganov
288ae5176e
common : reimplement logging (llama/9418)
...
https://github.com/ggerganov/llama.cpp/pull/9418
2024-09-24 19:45:08 +03:00
Georgi Gerganov
a785232bf9
metal : fix compile warning with GGML_METAL_NDEBUG (llama/0)
2024-09-24 19:45:08 +03:00
Johannes Gäßler
c7515b0995
ggml/examples: add backend support for numerical optimization (ggml/949)
...
* CUDA eval works
* stochastic gradient descent op
* Adam except decay
* CUDA CROSS_ENTROPY_LOSS_BACK
* CUDA mnist-fc training works
* backend CLI arg
* refactor gguf load
* remove sched from opt_step_adam
* implement l1 regularization (weight decay)
* extra call to add optimizer
* initialize gradients with ggml_graph_reset
* gradient accumulation
* increment iter per eval instead of epoch
* adjust backend interfaces
* fix ggml_graph_reset without backend
* fix ggml graph export/import
* fixup
* rename
* revert ggml_opt changes
* more general CUDA repeat_back
* update documentation, fix CNN
* validation split
* add clarifying comment
* optimize PyTorch training
* adjust buffer size, thread count
* fix 0.0f validation split
* Update examples/mnist/mnist-common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix gradient accumulation
* tensor flag for accumulators -> tensor hash set
* Update include/ggml.h
Co-authored-by: slaren <slarengh@gmail.com>
* Update tests/test-backend-ops.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* Update tests/test-backend-ops.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* fix test prints
* Update src/ggml-backend.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* better CUDA support for noncontiguous out_prod
* add comment
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-09-24 19:45:08 +03:00
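A few of the bullets above (stochastic gradient descent, weight decay, gradient accumulation) can be condensed into a toy C sketch. This is illustrative only, with hypothetical names; it is not the ggml_opt implementation:

```c
// Toy optimizer step: SGD with weight decay, applied after accumulating
// gradients over n_accum micro-batches.
static void sgd_step(float *w, float *grad_acc, int n,
                     float lr, float wd, int n_accum) {
    for (int i = 0; i < n; i++) {
        float g = grad_acc[i] / (float) n_accum; // average accumulated grads
        w[i] -= lr * (g + wd * w[i]);            // weight decay term
        grad_acc[i] = 0.0f;                      // reset, as a graph reset would
    }
}
```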
Georgi Gerganov
03a6fae484
metal : update support condition for im2col + fix warning (llama/0)
2024-09-24 19:45:08 +03:00
Georgi Gerganov
0e7798677a
ggml : add SSM Metal kernels (llama/8546)
...
* ggml : add ggml_ssm_conv metal impl
* ggml : add ssm_scan metal impl
ggml-ci
2024-08-28 13:22:20 +03:00
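The ssm_scan op evaluates a state-space recurrence; a scalar sketch of the basic linear scan it computes (heavily simplified to a single state dimension, hypothetical signature) is:

```c
// Simplified scan: h[t] = a[t] * h[t-1] + b[t] * x[t], y[t] = c[t] * h[t].
static void ssm_scan_1d(const float *a, const float *b, const float *c,
                        const float *x, float *y, int T) {
    float h = 0.0f;
    for (int t = 0; t < T; t++) {
        h    = a[t] * h + b[t] * x[t];
        y[t] = c[t] * h;
    }
}
```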
slaren
58a36d2e3b
metal : gemma2 flash attention support (llama/9159)
2024-08-28 13:22:20 +03:00
Johannes Gäßler
24d8534bd8
CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
...
* CPU/CUDA: Gemma 2 FlashAttention support
* apply logit_softcap to scale in kernel
* disable logit softcapping tests on Metal
* remove metal check
2024-08-28 13:22:20 +03:00
Daniel Bevenius
60098d6204
ggml : move rope type enum to ggml.h (llama/8949)
...
* ggml : move rope type enum to ggml.h
This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.
The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.
Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.
* squash! ggml : move rope type enum to ggml.h
This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h, and adds them back to the llama_rope_type enum.
I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.
* squash! ggml : move rope type enum to ggml.h
This commit removes the enum ggml_rope_type from ggml.h and replaces it
with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to
check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has
been updated to reflect this change.
* squash! ggml : move rope type enum to ggml.h
This commit contains a suggestion to enable the GGML_ROPE_TYPE_NEOX
macro/define to be passed to the shader compiler.
* squash! ggml : move rope type enum to ggml.h
This commit fixes the editorconfig-checker warnings.
* squash! ggml : move rope type enum to ggml.h
Update comment for ggml_rope function.
* Revert "squash! ggml : move rope type enum to ggml.h"
This reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6.
* squash! ggml : move rope type enum to ggml.h
Add GGML_ROPE_TYPE_NEOX to rope_common.comp.
* remove extra line
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-08-28 13:22:20 +03:00
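After this change, the NeoX rope variant is selected via a define rather than an enum, with mode kept as a plain bit field. A sketch of the resulting usage (the value 2 matches ggml.h at the time of writing, but treat it as illustrative):

```c
// The mode parameter stays a plain int bit field; the NeoX variant is
// detected by testing the define against it.
#define GGML_ROPE_TYPE_NEOX 2

static int rope_is_neox(int mode) {
    return (mode & GGML_ROPE_TYPE_NEOX) != 0;
}
```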
Radoslav Gerganov
b6c05ce82f
yolo : add backend support (ggml/924)
...
* yolo : add backend support
* metal : add sub and sqrt kernels
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-21 11:07:13 +03:00
Ronsor
3643120690
feat: add new sin and cos operators (ggml/919)
...
* ggml : add sin/cos operators
* ggml-cuda : add sin/cos operators
* ggml : add corresponding tests for sin/cos
* ggml : add backward computation for sin/cos operators
* ggml-vulkan : add sin/cos operators
* ggml-vulkan : add sin/cos shader source
* metal : add sin, cos
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-21 11:07:13 +03:00
slaren
9b1788483c
metal : fix uninitialized abort_callback (llama/8968)
2024-08-12 11:58:49 +03:00
Molly Sophia
4160b930f1
ggml : add epsilon as a parameter for group_norm (llama/8818)
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-08 22:48:46 +03:00
Georgi Gerganov
b3264eb266
metal : fix struct name (ggml/912)
...
ggml-ci
2024-08-08 22:48:46 +03:00
Conrad Kramer
eb2eb87a58
metal : add abort callback (ggml/905)
2024-08-08 22:48:46 +03:00
slaren
dd916a2852
ggml : reduce hash table reset cost (llama/8698)
...
* ggml : reduce hash table reset cost
* fix unreachable code warnings after GGML_ASSERT(false)
* GGML_ASSERT(false) -> GGML_ABORT("fatal error")
* GGML_ABORT use format string
2024-08-08 22:48:46 +03:00
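The reset-cost reduction can be sketched as follows (simplified; the actual ggml hash set differs in detail): track slot occupancy in a compact bitset so that reset clears roughly n/8 bytes instead of the full key array. The sketch assumes the set is never completely full:

```c
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

// Simplified open-addressing hash set: the 'used' bitset is the only thing
// reset must clear; the keys array may hold stale values for unused slots.
typedef struct {
    size_t     size;
    uint8_t   *used; // one bit per slot
    uintptr_t *keys;
} hash_set;

static hash_set hs_new(size_t size) {
    hash_set h = { size, calloc((size + 7)/8, 1), malloc(size * sizeof(uintptr_t)) };
    return h;
}

// O(size/8) bytes cleared, instead of O(size * sizeof(key)).
static void hs_reset(hash_set *h) {
    memset(h->used, 0, (h->size + 7)/8);
}

static int hs_insert(hash_set *h, uintptr_t key) {
    size_t i = key % h->size;
    while (h->used[i/8] & (1u << (i%8))) {
        if (h->keys[i] == key) return 0; // already present
        i = (i + 1) % h->size;           // linear probing
    }
    h->used[i/8] |= (uint8_t)(1u << (i%8));
    h->keys[i] = key;
    return 1;
}

static int hs_contains(const hash_set *h, uintptr_t key) {
    size_t i = key % h->size;
    while (h->used[i/8] & (1u << (i%8))) {
        if (h->keys[i] == key) return 1;
        i = (i + 1) % h->size;
    }
    return 0;
}
```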
slaren
be9a16fd3f
ggml : fix quant dot product with odd number of blocks (llama/8549)
...
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix odd blocks for ARM_NEON (llama/8556)
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix q4_1
* ggml : fix q5_0
* ggml : fix q5_1
* ggml : fix iq4_nl metal
ggml-ci
* ggml : fix q4_0
* ggml : fix q8_0
ggml-ci
* ggml : remove special Q4_0 code for first 2 blocks
* ggml : fix sumf redefinition
---------
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-08 22:48:46 +03:00
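The bug class fixed above is the classic remainder problem in an unrolled loop: kernels that consume blocks two at a time must also handle a trailing odd block. A scalar sketch of the pattern (not the actual quant kernels, which operate on quantized block structs):

```c
// Dot product over nb blocks of 4 floats each, processing two blocks per
// iteration, with an explicit tail for an odd block count.
static float dot_blocks(const float *x, const float *y, int nb) {
    float sumf = 0.0f;
    int ib = 0;
    for (; ib + 1 < nb; ib += 2) {           // main loop: pairs of blocks
        for (int j = 0; j < 8; j++) {
            sumf += x[ib*4 + j] * y[ib*4 + j];
        }
    }
    if (ib < nb) {                           // tail: the odd last block
        for (int j = 0; j < 4; j++) {
            sumf += x[ib*4 + j] * y[ib*4 + j];
        }
    }
    return sumf;
}
```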
Georgi Gerganov
b852a4c5ca
metal : template-ify some of the kernels (llama/8447)
...
ggml-ci
2024-08-08 22:48:46 +03:00
Georgi Gerganov
e30c679928
whisper : reorganize source code + improve CMake (#2256)
...
* scripts : update sync [no ci]
* files : reorganize [no ci]
* sync : llama.cpp
* cmake : link math library
* cmake : build normal ggml library
* files : move headers to include
* objc : fix path to ggml-metal.h
* ci : fix WHISPER_CUDA -> GGML_CUDA
* scripts : sync LICENSE [no ci]
2024-06-26 19:34:09 +03:00