WebLLM on WebGPU
Blazing-fast inference with WebGPU and WebLLM, running locally in your browser.
Load Model
Download
Prompt: -
Completion: -
Prefill: - tokens/sec
Decoding: - tokens/sec
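The readouts above are populated at runtime once a model has been downloaded and loaded. A minimal sketch of how a page like this might drive WebLLM, assuming the `@mlc-ai/web-llm` npm package and a WebGPU-capable browser (the model ID below is illustrative, not part of this page):

```javascript
// Sketch only: requires a WebGPU-capable browser. The model ID is illustrative.
import { CreateMLCEngine } from "@mlc-ai/web-llm";

async function run(prompt) {
  // Downloads and compiles the model weights on first use (cached afterwards);
  // initProgressCallback reports download/compile progress for the UI.
  const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
    initProgressCallback: (report) => console.log(report.text),
  });

  // OpenAI-style chat completion, executed entirely on the local GPU.
  const reply = await engine.chat.completions.create({
    messages: [{ role: "user", content: prompt }],
  });
  console.log("Completion:", reply.choices[0].message.content);

  // runtimeStatsText() reports prefill and decoding throughput,
  // the numbers shown in the readout above.
  console.log(await engine.runtimeStatsText());
}

run("What is WebGPU?");
```

Prefill throughput measures how fast the prompt tokens are processed in a single batched pass; decoding throughput measures how fast new tokens are generated one at a time, which is why the two numbers typically differ.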