[Web] WASM execution path throws integer overflow for 768-dim ONNX IR10 models at sequence length > 512 #28726

@folotp

Environment

  • onnxruntime-web version: 1.26.0
  • Browser / runtime: Electron 33 (Obsidian 1.7.x), renderer process (no COOP+COEP, SharedArrayBuffer unavailable)
  • Bundle: onnxruntime-web/all (ort.all.min.js), single-threaded (numThreads: 1)
  • Backend: WASM / CPU (ort-wasm-simd.wasm, non-JSEP)
  • OS: macOS 26.5 (Apple M4)

Describe the issue

Running feature-extraction inference on a 768-dimensional ONNX IR version 10 model (Supabase/gte-small / EmbeddingGemma 300M) with max_length: 2048 (the model's declared context window) throws a SafeInt integer overflow during OrtRun on the WASM CPU execution path.

Error:

RuntimeError: integer overflow
  at wasm-function (ort-wasm-simd.wasm)
  (during OrtRun / attention kernel for 768-dim matmul at sequence length 2048)

The overflow is reproducible at max_length >= 768 on this model class. Capping max_length at 512 avoids the crash but defeats the model's advertised 2K context window.

The same model and input run without error on the WebGPU EP (ort-wasm-simd.jsep.wasm), which suggests the overflow is specific to the WASM CPU kernels: the WebGPU EP does not appear to execute the integer-math path that fails.
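To narrow down the exact failing length between the known-good 512 and the known-bad 768, a bisection along the following lines could help. This is only a sketch: findFirstFailingLength and the fails callback are hypothetical names, not part of onnxruntime-web or transformers.js, and it assumes that once a length fails, every longer length fails too.

```javascript
// Hypothetical helper, not part of onnxruntime-web or transformers.js.
// Returns the smallest length in (knownGood, knownBad] for which `fails`
// resolves to true, assuming failures are monotonic in the length.
async function findFirstFailingLength(knownGood, knownBad, fails) {
  let lo = knownGood; // largest length known to succeed
  let hi = knownBad;  // smallest length known to fail
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (await fails(mid)) {
      hi = mid;
    } else {
      lo = mid;
    }
  }
  return hi;
}

// Possible usage with the `pipe` from the repro (assumption: the pipeline
// stays usable after a failed run; if not, recreate it inside the callback):
//
// const text = "a ".repeat(1000);
// const first = await findFirstFailingLength(512, 768, async (n) => {
//   try {
//     await pipe(text, { pooling: "mean", normalize: true, truncation: true, max_length: n });
//     return false;
//   } catch {
//     return true;
//   }
// });
```

The input must tokenize to at least as many tokens as the largest length tested, otherwise truncation never takes effect and every probe runs on the same sequence.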

Minimal repro

import { pipeline, env } from "@huggingface/transformers";

env.backends.onnx.wasm.wasmPaths = "https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0/dist/";
env.backends.onnx.wasm.numThreads = 1;

const pipe = await pipeline("feature-extraction", "Supabase/gte-small", {
  device: "cpu",
});

// Crashes with integer overflow at max_length 2048; works at 512
const result = await pipe("a ".repeat(1000), {
  pooling: "mean",
  normalize: true,
  truncation: true,
  max_length: 2048,
});

Expected behavior

OrtRun completes successfully at the model's full context window (2048 tokens) on the WASM CPU EP.

Actual behavior

RuntimeError: integer overflow thrown during attention kernel execution for 768×2048 tensor shapes.

Analysis

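The attention kernel appears to compute an intermediate size that overflows a checked integer at sequence length 2048. SafeInt is the checked-integer C++ library that ONNX Runtime uses for size calculations, so the check most likely runs inside the compiled WASM module, where size_t is 32 bits wide, rather than in JavaScript. The relevant limit would then be 2^32 rather than Number.MAX_SAFE_INTEGER (2^53 - 1). Which shape calculation overflows has not been identified. The JSEP WASM build (WebGPU path) does not exhibit the error, presumably because the affected calculation is not on the code path when the kernel is dispatched to the GPU.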
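As a sanity check on the magnitudes involved, the sketch below computes a few obvious tensor sizes for hidden size 768 and sequence length 2048 and compares them with the 32-bit and 53-bit integer limits. Batch size 1 and float32 storage are assumptions, and the variable names are illustrative. It does not identify the failing calculation.

```javascript
// Back-of-the-envelope sizes. Batch size 1 and float32 are assumptions.
const hidden = 768n;
const seqLen = 2048n;
const bytesPerFloat = 4n;

// One [seq, hidden] activation tensor.
const hiddenStateElems = seqLen * hidden;
// One [seq, seq] attention score matrix (per head).
const scoreElemsPerHead = seqLen * seqLen;
const scoreBytesPerHead = scoreElemsPerHead * bytesPerFloat;

const U32_LIMIT = 2n ** 32n;                             // range of a 32-bit size_t
const JS_SAFE_LIMIT = BigInt(Number.MAX_SAFE_INTEGER);   // 2^53 - 1

// How many [seq, seq] float32 matrices fit before 2^32 bytes is reached.
const matricesToReachU32 = U32_LIMIT / scoreBytesPerHead;

console.log(hiddenStateElems);    // 1572864n
console.log(scoreElemsPerHead);   // 4194304n
console.log(scoreBytesPerHead);   // 16777216n
console.log(matricesToReachU32);  // 256n
console.log(scoreBytesPerHead < U32_LIMIT, U32_LIMIT < JS_SAFE_LIMIT); // true true
```

Under these assumptions a single activation or score matrix is far below both limits, so an overflow would have to come from a product of several such dimensions or from a different intermediate than the ones listed here.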

Workaround

  • Cap max_length: 512 (lossy — discards the 2K context)
  • Use WebGPU EP (device: "webgpu") if available
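The two workarounds can be combined so that the full context is used only where WebGPU is available. A minimal sketch follows. pickInferenceSettings is a hypothetical helper, and the 512 and 2048 defaults are taken from this report rather than from any library.

```javascript
// Hypothetical helper, not part of transformers.js or onnxruntime-web.
// Chooses the device and max_length according to the workarounds above.
function pickInferenceSettings(hasWebGPU, modelMaxLength = 2048, wasmSafeLength = 512) {
  if (hasWebGPU) {
    // WebGPU EP: full context window, no overflow observed in this report.
    return { device: "webgpu", max_length: modelMaxLength };
  }
  // WASM CPU path: cap the sequence length to avoid the overflow (lossy).
  return { device: "cpu", max_length: wasmSafeLength };
}

// The presence of navigator.gpu is only a hint: requestAdapter() can still
// resolve to null, so a stricter check may be needed in practice.
const hasWebGPU = typeof navigator !== "undefined" && "gpu" in navigator;
const settings = pickInferenceSettings(hasWebGPU);
```

settings.device would then be passed to pipeline(...) and settings.max_length to the call on the resulting pipe, in place of the hard-coded values in the repro.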

Labels: .NET, api:Javascript, ep:WebGPU, model:transformer, platform:web
