Paul Tsochantaris
80753d4da8
metal : single allocation of encode_async block (llama/9747)
...
* Single allocation of encode_async block with non-ARC capture in ggml-metal.m
* Moving Block_release to the deallocation code
* Release encode block when re-setting encoding buffer count if needed
* Update ggml/src/ggml-metal.m
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-11-01 10:19:05 +02:00
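The pattern this commit describes — allocate the encode_async state once, release it only when the backend is torn down or when the buffer count is re-set — can be sketched in plain C. This is a hedged analogue with hypothetical names; the real ggml-metal.m uses an Objective-C block with Block_copy/Block_release rather than malloc/free:

```c
#include <stdlib.h>

// Hypothetical context standing in for the Metal backend state.
typedef struct {
    void (*encode_async)(void *user_data); // stands in for the ObjC block
    void  *encode_ctx;                     // captured state, allocated once
    int    n_cb;                           // encoding buffer count
} backend_ctx;

static void encode_fn(void *user_data) { (void) user_data; }

// Allocate the callback state a single time, up front.
static void backend_init(backend_ctx *ctx, int n_cb) {
    ctx->encode_async = encode_fn;
    ctx->encode_ctx   = malloc(64); // one allocation, reused for every encode
    ctx->n_cb         = n_cb;
}

// Re-setting the buffer count releases the old state first, if needed.
static void backend_set_n_cb(backend_ctx *ctx, int n_cb) {
    if (ctx->encode_ctx) {
        free(ctx->encode_ctx); // mirrors Block_release on the old block
    }
    ctx->encode_ctx = malloc(64);
    ctx->n_cb       = n_cb;
}

// Release happens in the deallocation path, not per-encode.
static void backend_free(backend_ctx *ctx) {
    free(ctx->encode_ctx);
    ctx->encode_ctx = NULL;
}
```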
Georgi Gerganov
aa037a60f3
ggml : alloc ggml_contexts on the heap (#2525)
...
* whisper : reduce ggml_context usage
* ggml : allocate contexts on the heap (v2)
* ggml : aligned malloc -> malloc
2024-10-31 22:00:09 +02:00
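The heap-allocation change above can be illustrated with a minimal sketch (a much-reduced, hypothetical stand-in for ggml_context): instead of handing out slots from a fixed static pool, each context is malloc'd and freed independently, and plain malloc replaces aligned malloc:

```c
#include <stdlib.h>

// Hypothetical, much-reduced stand-in for ggml_context.
struct ggml_context {
    size_t mem_size;
    void  *mem_buffer;
};

// Allocate the context itself on the heap (plain malloc, no aligned
// allocation), rather than taking a slot from a static pool of contexts.
static struct ggml_context * ctx_init(size_t mem_size) {
    struct ggml_context *ctx = malloc(sizeof(*ctx));
    if (!ctx) return NULL;
    ctx->mem_size   = mem_size;
    ctx->mem_buffer = malloc(mem_size);
    return ctx;
}

static void ctx_free(struct ggml_context *ctx) {
    if (!ctx) return;
    free(ctx->mem_buffer);
    free(ctx);
}
```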
Georgi Gerganov
1ba185f4af
metal : zero-init buffer contexts (#0)
2024-10-05 15:23:51 +03:00
Diego Devesa
cf977670e6
ggml-backend : add device and backend reg interfaces (llama/9707)
...
Also:
- metal : fix compute pass descriptor autorelease crash
- ggml-backend : add device description to CPU backend
- ggml: unify backend logging mechanism
2024-10-05 15:23:51 +03:00
Diego Devesa
1acfadb721
ggml-backend : add device and backend reg interfaces (llama/9707)
...
Co-authored-by: Johannes Gäßler <johannesg@5d6.de>
2024-10-05 15:23:51 +03:00
Georgi Gerganov
162a455402
metal : reduce command encoding overhead (llama/9698)
2024-10-03 12:22:17 +03:00
Georgi Gerganov
34291099fb
ggml : refactoring (llama/#0)
...
- d6a04f87
- 23e0d70b
2024-09-24 19:45:08 +03:00
Georgi Gerganov
a2cb5b4183
metal : handle zero-sized allocs (llama/9466)
2024-09-24 19:45:08 +03:00
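One common way to handle zero-sized allocs (a hedged sketch, not the actual Metal backend code) is to never forward a size of 0 to the underlying allocator, since malloc(0) may legitimately return NULL:

```c
#include <stdlib.h>

// Hypothetical helper: never ask the underlying allocator for 0 bytes,
// because malloc(0) may return NULL even when no error occurred.
static void * alloc_buffer(size_t size) {
    if (size == 0) {
        size = 1; // round zero-sized requests up to a minimal valid buffer
    }
    return malloc(size);
}
```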
Georgi Gerganov
288ae5176e
common : reimplement logging (llama/9418)
...
https://github.com/ggerganov/llama.cpp/pull/9418
2024-09-24 19:45:08 +03:00
Georgi Gerganov
a785232bf9
metal : fix compile warning with GGML_METAL_NDEBUG (llama/0)
2024-09-24 19:45:08 +03:00
Johannes Gäßler
c7515b0995
ggml/examples: add backend support for numerical optimization (ggml/949)
...
* CUDA eval works
* stochastic gradient descent op
* Adam except decay
* CUDA CROSS_ENTROPY_LOSS_BACK
* CUDA mnist-fc training works
* backend CLI arg
* refactor gguf load
* remove sched from opt_step_adam
* implement l1 regularization (weight decay)
* extra call to add optimizer
* initialize gradients with ggml_graph_reset
* gradient accumulation
* increment iter per eval instead of epoch
* adjust backend interfaces
* fix ggml_graph_reset without backend
* fix ggml graph export/import
* fixup
* rename
* revert ggml_opt changes
* more general CUDA repeat_back
* update documentation, fix CNN
* validation split
* add clarifying comment
* optimize PyTorch training
* adjust buffer size, thread count
* fix 0.0f validation split
* Update examples/mnist/mnist-common.cpp
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* fix gradient accumulation
* tensor flag for accumulators -> tensor hash set
* Update include/ggml.h
Co-authored-by: slaren <slarengh@gmail.com>
* Update tests/test-backend-ops.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* Update tests/test-backend-ops.cpp
Co-authored-by: slaren <slarengh@gmail.com>
* fix test prints
* Update src/ggml-backend.c
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
* better CUDA support for noncontiguous out_prod
* add comment
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
Co-authored-by: slaren <slarengh@gmail.com>
2024-09-24 19:45:08 +03:00
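A few of the bullets above (stochastic gradient descent, weight decay, gradient accumulation) can be condensed into a toy C sketch. This is illustrative only, with hypothetical names; it is not the ggml_opt implementation:

```c
// Toy optimizer step: SGD with weight decay, applied after accumulating
// gradients over n_accum micro-batches.
static void sgd_step(float *w, float *grad_acc, int n,
                     float lr, float wd, int n_accum) {
    for (int i = 0; i < n; i++) {
        float g = grad_acc[i] / (float) n_accum; // average accumulated grads
        w[i] -= lr * (g + wd * w[i]);            // weight decay term
        grad_acc[i] = 0.0f;                      // reset, as a graph reset would
    }
}
```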
Georgi Gerganov
03a6fae484
metal : update support condition for im2col + fix warning (llama/0)
2024-09-24 19:45:08 +03:00
Georgi Gerganov
0e7798677a
ggml : add SSM Metal kernels (llama/8546)
...
* ggml : add ggml_ssm_conv metal impl
* ggml : add ssm_scan metal impl
ggml-ci
2024-08-28 13:22:20 +03:00
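The ssm_scan op evaluates a state-space recurrence; a scalar sketch of the basic linear scan it computes (heavily simplified to a single state dimension, hypothetical signature) is:

```c
// Simplified scan: h[t] = a[t] * h[t-1] + b[t] * x[t], y[t] = c[t] * h[t].
static void ssm_scan_1d(const float *a, const float *b, const float *c,
                        const float *x, float *y, int T) {
    float h = 0.0f;
    for (int t = 0; t < T; t++) {
        h    = a[t] * h + b[t] * x[t];
        y[t] = c[t] * h;
    }
}
```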
slaren
58a36d2e3b
metal : gemma2 flash attention support (llama/9159)
2024-08-28 13:22:20 +03:00
Johannes Gäßler
24d8534bd8
CPU/CUDA: Gemma 2 FlashAttention support (llama/8542)
...
* CPU/CUDA: Gemma 2 FlashAttention support
* apply logit_softcap to scale in kernel
* disable logit softcapping tests on Metal
* remove metal check
2024-08-28 13:22:20 +03:00
Daniel Bevenius
60098d6204
ggml : move rope type enum to ggml.h (llama/8949)
...
* ggml : move rope type enum to ggml.h
This commit moves the `llama_rope_type` enum from `llama.h` to
`ggml.h` and changes its name to `ggml_rope_type`.
The motivation for this change is to address the TODO in `llama.h` and
use the enum in ggml.
Note: This commit does not change the `mode` parameter to be of type
`enum ggml_rope_type`. The name `mode` and its usage suggest that it
might be more generic and possibly used as a bit field for multiple
flags. Further investigation/discussion may be needed to determine
if `mode` should be restricted to RoPE types.
* squash! ggml : move rope type enum to ggml.h
This commit removes GGML_ROPE_TYPE_NONE and GGML_ROPE_TYPE_GLM from
ggml.h, and adds them back to the llama_rope_type enum.
I've kept the assert for GGML_ROPE_TYPE_GLM as I'm not sure if it is
safe to remove it yet.
* squash! ggml : move rope type enum to ggml.h
This commit removes the enum ggml_rope_type from ggml.h and replaces it
with a define (GGML_ROPE_TYPE_NEOX). This define is used in the code to
check if the mode is set to GPT-NeoX. Also the enum llama_rope_type has
been updated to reflect this change.
* squash! ggml : move rope type enum to ggml.h
This commit contains a suggestion to enable the GGML_ROPE_TYPE_NEOX
macro/define to be passed to the shader compiler.
* squash! ggml : move rope type enum to ggml.h
This commit fixes the editorconfig-checker warnings.
* squash! ggml : move rope type enum to ggml.h
Update comment for ggml_rope function.
* Revert "squash! ggml : move rope type enum to ggml.h"
This reverts commit 6261222bd0dc0efd51f0fb0435ad3f16a5b52fd6.
* squash! ggml : move rope type enum to ggml.h
Add GGML_ROPE_TYPE_NEOX to rope_common.comp.
* remove extra line
---------
Co-authored-by: slaren <slarengh@gmail.com>
2024-08-28 13:22:20 +03:00
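After this change, the NeoX rope variant is selected via a define rather than an enum, with mode kept as a plain bit field. A sketch of the resulting usage (the value 2 matches ggml.h at the time of writing, but treat it as illustrative):

```c
// The mode parameter stays a plain int bit field; the NeoX variant is
// detected by testing the define against it.
#define GGML_ROPE_TYPE_NEOX 2

static int rope_is_neox(int mode) {
    return (mode & GGML_ROPE_TYPE_NEOX) != 0;
}
```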
Radoslav Gerganov
b6c05ce82f
yolo : add backend support (ggml/924)
...
* yolo : add backend support
* metal : add sub and sqrt kernels
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-21 11:07:13 +03:00
Ronsor
3643120690
feat: add new sin and cos operators (ggml/919)
...
* ggml : add sin/cos operators
* ggml-cuda : add sin/cos operators
* ggml : add corresponding tests for sin/cos
* ggml : add backward computation for sin/cos operators
* ggml-vulkan : add sin/cos operators
* ggml-vulkan : add sin/cos shader source
* metal : add sin, cos
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-21 11:07:13 +03:00
slaren
9b1788483c
metal : fix uninitialized abort_callback (llama/8968)
2024-08-12 11:58:49 +03:00
Molly Sophia
4160b930f1
ggml : add epsilon as a parameter for group_norm (llama/8818)
...
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2024-08-08 22:48:46 +03:00
Georgi Gerganov
b3264eb266
metal : fix struct name (ggml/912)
...
ggml-ci
2024-08-08 22:48:46 +03:00
Conrad Kramer
eb2eb87a58
metal : add abort callback (ggml/905)
2024-08-08 22:48:46 +03:00
slaren
dd916a2852
ggml : reduce hash table reset cost (llama/8698)
...
* ggml : reduce hash table reset cost
* fix unreachable code warnings after GGML_ASSERT(false)
* GGML_ASSERT(false) -> GGML_ABORT("fatal error")
* GGML_ABORT use format string
2024-08-08 22:48:46 +03:00
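The reset-cost reduction can be sketched as follows (simplified; the actual ggml hash set differs in detail): track slot occupancy in a compact bitset so that reset clears roughly n/8 bytes instead of the full key array. The sketch assumes the set is never completely full:

```c
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

// Simplified open-addressing hash set: the 'used' bitset is the only thing
// reset must clear; the keys array may hold stale values for unused slots.
typedef struct {
    size_t     size;
    uint8_t   *used; // one bit per slot
    uintptr_t *keys;
} hash_set;

static hash_set hs_new(size_t size) {
    hash_set h = { size, calloc((size + 7)/8, 1), malloc(size * sizeof(uintptr_t)) };
    return h;
}

// O(size/8) bytes cleared, instead of O(size * sizeof(key)).
static void hs_reset(hash_set *h) {
    memset(h->used, 0, (h->size + 7)/8);
}

static int hs_insert(hash_set *h, uintptr_t key) {
    size_t i = key % h->size;
    while (h->used[i/8] & (1u << (i%8))) {
        if (h->keys[i] == key) return 0; // already present
        i = (i + 1) % h->size;           // linear probing
    }
    h->used[i/8] |= (uint8_t)(1u << (i%8));
    h->keys[i] = key;
    return 1;
}

static int hs_contains(const hash_set *h, uintptr_t key) {
    size_t i = key % h->size;
    while (h->used[i/8] & (1u << (i%8))) {
        if (h->keys[i] == key) return 1;
        i = (i + 1) % h->size;
    }
    return 0;
}
```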
slaren
be9a16fd3f
ggml : fix quant dot product with odd number of blocks (llama/8549)
...
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix odd blocks for ARM_NEON (llama/8556)
* ggml : fix iq4_nl dot product with odd number of blocks
* ggml : fix q4_1
* ggml : fix q5_0
* ggml : fix q5_1
* ggml : fix iq4_nl metal
ggml-ci
* ggml : fix q4_0
* ggml : fix q8_0
ggml-ci
* ggml : remove special Q4_0 code for first 2 blocks
* ggml : fix sumf redefinition
---------
Co-authored-by: slaren <slarengh@gmail.com>
---------
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2024-08-08 22:48:46 +03:00
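The bug class fixed above is the classic remainder problem in an unrolled loop: kernels that consume blocks two at a time must also handle a trailing odd block. A scalar sketch of the pattern (not the actual quant kernels, which operate on quantized block structs):

```c
// Dot product over nb blocks of 4 floats each, processing two blocks per
// iteration, with an explicit tail for an odd block count.
static float dot_blocks(const float *x, const float *y, int nb) {
    float sumf = 0.0f;
    int ib = 0;
    for (; ib + 1 < nb; ib += 2) {           // main loop: pairs of blocks
        for (int j = 0; j < 8; j++) {
            sumf += x[ib*4 + j] * y[ib*4 + j];
        }
    }
    if (ib < nb) {                           // tail: the odd last block
        for (int j = 0; j < 4; j++) {
            sumf += x[ib*4 + j] * y[ib*4 + j];
        }
    }
    return sumf;
}
```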
Georgi Gerganov
b852a4c5ca
metal : template-ify some of the kernels (llama/8447)
...
ggml-ci
2024-08-08 22:48:46 +03:00
Georgi Gerganov
e30c679928
whisper : reorganize source code + improve CMake (#2256)
...
* scripts : update sync [no ci]
* files : reorganize [no ci]
* sync : llama.cpp
* cmake : link math library
* cmake : build normal ggml library
* files : move headers to include
* objc : fix path to ggml-metal.h
* ci : fix WHISPER_CUDA -> GGML_CUDA
* scripts : sync LICENSE [no ci]
2024-06-26 19:34:09 +03:00