whisper.cpp

Author	SHA1	Message	Date
Diego Devesa	9c817edb48	ggml : move CPU backend to a separate file (llama/10144)	2024-11-15 15:21:04 +02:00
Diego Devesa	3e231ab9cc	llama : fix buffer checks for mamba and rwk (llama/10111) * llama : fix buffer checks for mamba and rwk * llama : fix missing worst case flag during reserve * cuda : fix supports_op for norm * disable sched SET_CAUSE	2024-11-15 15:21:04 +02:00
Sergio López	1e122d66f9	kompute: add backend registry / device interfaces (llama/10045) Get in line with the other backends by supporting the newer backend/device registry interfaces. Signed-off-by: Sergio Lopez <slp@redhat.com>	2024-11-15 15:21:04 +02:00
Diego Devesa	1d48457aa6	llama : refactor model loader with backend registry (llama/10026)	2024-11-15 15:21:04 +02:00
leo-pony	13db492f83	Adapt to dynamically loadable backends mechanism (llama/9970) * [CANN] Adapt to dynamically loadable backends mechanism * Fix the Bug: inference running result is garbled in debug running model for LM models who's type is Q4_0 class * Handle the review comments of this pull request	2024-11-01 10:19:05 +02:00
Ouadie EL FAROUKI	a4a22daa8f	Add SYCL Backend registry, device and Event Interfaces (llama/9705) * implemented missing SYCL event APIs * sycl : Added device and backend reg interfaces * Restructured ggml-sycl.cpp	2024-11-01 10:19:05 +02:00
Ma Mingfei	e1936eb2a5	add amx kernel for gemm (llama/8998) add intel amx isa detection add vnni kernel for gemv cases add vnni and amx kernel support for block_q8_0 code cleanup fix packing B issue enable openmp fine tune amx kernel switch to aten parallel pattern add error message for nested parallelism code cleanup add f16 support in ggml-amx add amx kernels for QK_K quant formats: Q4_K, Q5_K, Q6_K and IQ4_XS update CMakeList update README fix some compilation warning fix compiler warning when amx is not enabled minor change ggml-ci move ggml_amx_init from ggml.c to ggml-amx/mmq.cpp ggml-ci update CMakeLists with -mamx-tile, -mamx-int8 and -mamx-bf16 ggml-ci add amx as an ggml-backend update header file, the old path for immintrin.h has changed to ggml-cpu-impl.h minor change update CMakeLists.txt minor change apply weight prepacking in set_tensor method in ggml-backend fix compile error ggml-ci minor change ggml-ci update CMakeLists.txt ggml-ci add march dependency minor change ggml-ci change ggml_backend_buffer_is_host to return false for amx backend ggml-ci fix supports_op use device reg for AMX backend ggml-ci minor change ggml-ci minor change fix rebase set .buffer_from_host_ptr to be false for AMX backend	2024-11-01 10:19:05 +02:00
Diego Devesa	28b044dad9	vulkan : add backend registry / device interfaces (llama/9721) * vulkan : add backend registry / device interfaces * llama : print devices used on model load	2024-11-01 10:19:05 +02:00
Gilad S	b8f11a0a17	fix: allocating CPU buffer with size `0` (llama/9917)	2024-11-01 10:19:05 +02:00
Gilad S	ff5a838099	fix: use `vm_allocate` to allocate CPU backend buffer on macOS (llama/9875) * fix: use `vm_allocate` to allocate CPU backend buffer on macOS * fix: switch to `posix_memalign` to keep existing `free()` usages work * feat: move `GGML_ALIGNED_MALLOC` to `ggml-backend-impl.h`, add support for `vm_allocate` on macOS * style: formatting * fix: move const outside of `#ifndef` * style: formatting * fix: unused var * fix: transform `GGML_ALIGNED_MALLOC` and `GGML_ALIGNED_FREE` into functions and add them to `ggml-impl.h` * fix: unused var * fix: page align to `GGUF_DEFAULT_ALIGNMENT` * fix: page align to `TENSOR_ALIGNMENT` * fix: convert `TENSOR_ALIGNMENT` to a macro * fix: increase page size to `32` on iOS * fix: iOS page size * fix: `hbw_posix_memalign` alignment	2024-11-01 10:19:05 +02:00
Diego Devesa	81110c0174	ggml : move more prints to the ggml log system (llama/9839) * ggml : move more prints to the ggml log system * show BLAS OpenMP warnings in all builds using debug print	2024-11-01 10:19:05 +02:00
Diego Devesa	c313723860	rpc : add backend registry / device interfaces (llama/9812) * rpc : add backend registry / device interfaces * llama : add llama_supports_rpc API * ggml_backend_rpc_start_rpc_server -> ggml_backend_rpc_start_server	2024-11-01 10:19:05 +02:00
Diego Devesa	1531259b2c	ggml : fix BLAS with unsupported types (llama/9775) * ggml : do not use BLAS with types without to_float * ggml : return pointer from ggml_internal_get_type_traits to avoid unnecessary copies * ggml : rename ggml_internal_get_type_traits -> ggml_get_type_traits it's not really internal if everybody uses it	2024-11-01 10:19:05 +02:00
Diego Devesa	44bc2767fd	ggml : add backend registry / device interfaces to BLAS backend (llama/9752) * ggml : add backend registry / device interfaces to BLAS backend * fix mmap usage when using host buffers	2024-11-01 10:19:05 +02:00
Georgi Gerganov	315364d7de	ggml : add metal backend registry / device (llama/9713) * ggml : add metal backend registry / device ggml-ci * metal : fix names [no ci] * metal : global registry and device instances ggml-ci * cont : alternative initialization of global objects ggml-ci * llama : adapt to backend changes ggml-ci * fixes * metal : fix indent * metal : fix build when MTLGPUFamilyApple3 is not available ggml-ci * fix merge * metal : avoid unnecessary singleton accesses ggml-ci * metal : minor fix [no ci] * metal : g_state -> g_ggml_ctx_dev_main [no ci] * metal : avoid reference of device context in the backend context ggml-ci * metal : minor [no ci] * metal : fix maxTransferRate check * metal : remove transfer rate stuff --------- Co-authored-by: slaren <slarengh@gmail.com>	2024-11-01 10:19:05 +02:00
Diego Devesa	cf977670e6	ggml-backend : add device and backend reg interfaces (llama/9707) Also: - metal : fix compute pass descriptor autorelease crash - ggml-backend : add device description to CPU backend - ggml: unify backend logging mechanism	2024-10-05 15:23:51 +03:00
Diego Devesa	1acfadb721	ggml-backend : add device and backend reg interfaces (llama/9707) Co-authored-by: Johannes Gäßler <johannesg@5d6.de>	2024-10-05 15:23:51 +03:00

17 Commits