Environment
- onnxruntime-web version: 1.26.0
- Browser / runtime: Electron 33 (Obsidian 1.7.x), renderer process (no COOP+COEP, SharedArrayBuffer unavailable)
- Bundle: onnxruntime-web/all (ort.all.min.js), single-threaded (numThreads: 1)
- Backend: WASM / CPU (ort-wasm-simd.wasm, non-JSEP)
- OS: macOS 26.5 (Apple M4)
Describe the issue
Running feature-extraction inference on a 768-dimensional ONNX IR version 10 model (Supabase/gte-small / EmbeddingGemma 300M) with max_length: 2048 (the model's declared context window) throws a SafeInt integer overflow during OrtRun on the WASM CPU execution path.
Error:
RuntimeError: integer overflow
at wasm-function (ort-wasm-simd.wasm)
(during OrtRun / attention kernel for 768-dim matmul at sequence length 2048)
The overflow is reproducible at max_length >= 768 on this model class. Capping max_length at 512 avoids the crash but defeats the model's advertised 2K context window.
The same model and input run without error on the WebGPU EP (ort-wasm-simd.jsep.wasm), which suggests the overflow is specific to the WASM CPU kernels; the WebGPU EP runs the affected operators on the GPU instead.
Minimal repro
import { pipeline, env } from "@huggingface/transformers";

env.backends.onnx.wasm.wasmPaths = "https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0/dist/";
env.backends.onnx.wasm.numThreads = 1;

const pipe = await pipeline("feature-extraction", "Supabase/gte-small", {
  device: "cpu",
});

// Crashes with integer overflow at max_length 2048; works at 512
const result = await pipe("a ".repeat(1000), {
  pooling: "mean",
  normalize: true,
  truncation: true,
  max_length: 2048,
});
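The exact threshold between 512 (works) and 768 (fails) could be narrowed by bisecting over max_length. The helper below is a minimal sketch, not part of the original repro: `findFirstFailing` and `runsOk` are hypothetical names, and it assumes failures are monotonic in max_length. Because a WASM trap may leave the session unusable, `runsOk` would probably need to create a fresh pipeline for each attempt.

```javascript
// Returns the smallest value in (lo, hi] for which runsOk resolves false,
// assuming runsOk(lo) is true, runsOk(hi) is false, and failures are monotonic.
async function findFirstFailing(runsOk, lo, hi) {
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (await runsOk(mid)) {
      lo = mid; // mid still works, so the threshold is above it
    } else {
      hi = mid; // mid fails, so the threshold is at or below it
    }
  }
  return hi;
}

// Hypothetical wiring against the repro's `pipe` (shown for context, not executed):
// const runsOk = async (maxLength) => {
//   try {
//     await pipe("a ".repeat(1000), {
//       pooling: "mean", normalize: true, truncation: true, max_length: maxLength,
//     });
//     return true;
//   } catch (err) {
//     return false;
//   }
// };
// const threshold = await findFirstFailing(runsOk, 512, 768);
```

Bisection needs about eight runs to cover the 512 to 768 range, which keeps the number of crashing sessions small.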
Expected behavior
OrtRun completes successfully at the model's full context window (2048 tokens) on the WASM CPU EP.
Actual behavior
RuntimeError: integer overflow thrown during attention kernel execution for 768×2048 tensor shapes.
Analysis
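The 768-dim attention kernel appears to compute an intermediate size that overflows a checked integer at sequence length 2048. SafeInt is the C++ checked-arithmetic library used in ONNX Runtime's native code, not a JavaScript facility, so the relevant limit is more likely the 32-bit size_t of a wasm32 build than Number.MAX_SAFE_INTEGER (2^53 - 1). The product 768×2048 on its own is far below either bound, so the overflowing term is not yet identified. The JSEP WASM build (WebGPU path) does not exhibit the crash, presumably because the affected kernel runs on the GPU rather than in the WASM CPU implementation.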
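As a rough check on the magnitudes involved, the sketch below computes a few candidate sizes for a 768-dim model at sequence length 2048 and compares them against common integer limits. The head count of 12 and the 4-byte float32 element size are assumptions for illustration, not values read from the model config, and the candidates are a guess at what a kernel might compute rather than a trace of the actual code path.

```javascript
// Candidate intermediate sizes; numHeads and bytesPerElement are assumed values.
function candidateSizes({ hidden, seqLen, numHeads, bytesPerElement }) {
  return {
    hiddenStateElements: hidden * seqLen,
    attentionScoreElements: numHeads * seqLen * seqLen,
    attentionScoreBytes: numHeads * seqLen * seqLen * bytesPerElement,
  };
}

const LIMITS = {
  int32Max: 2 ** 31 - 1,
  uint32Max: 2 ** 32 - 1,
  maxSafeInteger: Number.MAX_SAFE_INTEGER,
};

// Names of the limits that a given value exceeds (empty array if none).
function exceededLimits(value) {
  return Object.keys(LIMITS).filter((name) => value > LIMITS[name]);
}

const sizes = candidateSizes({ hidden: 768, seqLen: 2048, numHeads: 12, bytesPerElement: 4 });
for (const [name, value] of Object.entries(sizes)) {
  console.log(name, value, exceededLimits(value));
}
```

Under these assumptions none of the candidates exceeds even the signed 32-bit limit, which suggests the overflowing value is not one of these plain shape products.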
Workaround
- Cap max_length: 512 (lossy: discards the 2K context)
- Use the WebGPU EP (device: "webgpu") if available
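The two workarounds can be combined into a single fallback. The sketch below uses a hypothetical helper, `chooseOptions`, that is not part of any library. It takes a caller-supplied WebGPU availability flag and returns the device and max_length to use. Checking `"gpu" in navigator` only shows that the API is exposed, not that an adapter can actually be acquired, so a real check may need to go further.

```javascript
// Picks options following the two workarounds: full context on WebGPU,
// capped context on the WASM CPU path. The defaults mirror the values in this report.
function chooseOptions(hasWebGPU, fullContext = 2048, safeContext = 512) {
  return hasWebGPU
    ? { device: "webgpu", max_length: fullContext }
    : { device: "cpu", max_length: safeContext };
}

// Example flag; evaluates to false outside a browser-like environment.
const hasWebGPU = typeof navigator !== "undefined" && "gpu" in navigator;
const options = chooseOptions(hasWebGPU);
console.log(options);
```

The returned `device` would go into the `pipeline(...)` options and `max_length` into the per-call options, as in the repro.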