[Web] WASM execution path throws integer overflow for 768-dim ONNX IR10 models at sequence length > 512 #28726

### Environment

- `onnxruntime-web` version: `1.26.0`
- Browser / runtime: Electron 33 (Obsidian 1.7.x), renderer process (no COOP+COEP, `SharedArrayBuffer` unavailable)
- Bundle: `onnxruntime-web/all` (`ort.all.min.js`), single-threaded (`numThreads: 1`)
- Backend: WASM / CPU (`ort-wasm-simd.wasm`, non-JSEP)
- OS: macOS 26.5 (Apple M4)

### Describe the issue

Running `feature-extraction` inference on a 768-dimensional ONNX IR version 10 model (`Supabase/gte-small` / EmbeddingGemma 300M) with `max_length: 2048` (the model's declared context window) throws `RuntimeError: integer overflow` during `OrtRun` on the WASM CPU execution path. The message appears to come from a `SafeInt` overflow check.

Error:

```
RuntimeError: integer overflow
  at wasm-function (ort-wasm-simd.wasm)
  (during OrtRun / attention kernel for 768-dim matmul at sequence length 2048)
```

The overflow is reproducible at `max_length >= 768` on this model class. Capping `max_length` at 512 avoids the crash but gives up the model's advertised 2K context window.

**The same model and input run without error on the WebGPU EP** (`ort-wasm-simd.jsep.wasm`). This suggests the overflow is specific to the non-JSEP WASM CPU path rather than to the model or the input.
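To narrow down the exact failing length between the working value (512) and the failing value (768), a bisection along these lines could be used. This is only a sketch: `findSmallestFailingLength` is a hypothetical helper, not part of any library, and it assumes failures are monotonic (every length above the threshold also fails).

```js
// Hypothetical helper: returns the smallest length in [lo, hi] for which
// `fails(length)` resolves to true, or null if `hi` itself does not fail.
// Assumes monotonic failure: once a length fails, all longer ones fail too.
async function findSmallestFailingLength(fails, lo, hi) {
  if (!(await fails(hi))) return null; // nothing fails in this range
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (await fails(mid)) {
      hi = mid; // mid fails: the threshold is at or below mid
    } else {
      lo = mid + 1; // mid passes: the threshold is above mid
    }
  }
  return lo;
}

// Example predicate, assuming `pipe` is the pipeline from the minimal repro.
// Repeating "a " only approximates the token count, so the result is approximate.
// const fails = async (n) => {
//   try {
//     await pipe("a ".repeat(n), { truncation: true, max_length: n });
//     return false;
//   } catch {
//     return true;
//   }
// };
// const threshold = await findSmallestFailingLength(fails, 513, 2048);
```

The session may not be reusable after a crash inside the WASM module, so each probe may need a freshly created pipeline.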

### Minimal repro

```js
import { pipeline, env } from "@huggingface/transformers";

env.backends.onnx.wasm.wasmPaths = "https://cdn.jsdelivr.net/npm/onnxruntime-web@1.26.0/dist/";
env.backends.onnx.wasm.numThreads = 1;

const pipe = await pipeline("feature-extraction", "Supabase/gte-small", {
  device: "cpu",
});

// Crashes with integer overflow at max_length 2048; works at 512
const result = await pipe("a ".repeat(1000), {
  pooling: "mean",
  normalize: true,
  truncation: true,
  max_length: 2048,
});
```

### Expected behavior

`OrtRun` completes successfully at the model's full context window (2048 tokens) on the WASM CPU EP.

### Actual behavior

`RuntimeError: integer overflow` thrown during attention kernel execution for 768×2048 tensor shapes.

### Analysis

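The error text points at a `SafeInt` overflow check. `SafeInt` is the C++ checked-arithmetic library that ONNX Runtime uses for size and shape calculations, so the check runs inside the compiled WASM module rather than in JavaScript. In a wasm32 build `size_t` is 32 bits wide, so the relevant limit is 2^32 - 1, not `Number.MAX_SAFE_INTEGER` (2^53 - 1). The 768 × 2048 activation shape alone is only 1,572,864 elements, well below either limit, so the overflowing quantity is presumably a larger intermediate size in the attention path that grows with sequence length; I have not identified which one. The JSEP WASM build (WebGPU path) does not exhibit the crash, presumably because the affected operators run in GPU kernels and skip the CPU kernel that performs this calculation.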
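As a sanity check on the magnitudes involved, the sketch below compares a few obvious size products against the 32-bit and 53-bit limits. The products are illustrative assumptions (float32 elements, a single attention map); they are not taken from the actual kernel code.

```js
// Limits for the two integer widths discussed in this report.
const U32_MAX = 2 ** 32 - 1; // largest unsigned 32-bit value
const JS_SAFE = Number.MAX_SAFE_INTEGER; // 2^53 - 1

const hidden = 768; // embedding width from the report
const seqLen = 2048; // sequence length from the report
const bytesPerF32 = 4; // assumption: float32 tensors

// Illustrative candidates only; the real overflowing expression is unknown.
const candidates = {
  "hidden * seqLen (elements)": hidden * seqLen,
  "hidden * seqLen (bytes)": hidden * seqLen * bytesPerF32,
  "seqLen * seqLen (elements)": seqLen * seqLen,
  "seqLen * seqLen (bytes)": seqLen * seqLen * bytesPerF32,
};

for (const [name, value] of Object.entries(candidates)) {
  const verdict = value > U32_MAX ? "exceeds u32" : "fits in u32";
  console.log(`${name}: ${value} (${verdict})`);
}
```

All four products fit comfortably in 32 bits, so whatever overflows must involve additional factors beyond these.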

### Workaround

- Cap `max_length: 512` (lossy — discards the 2K context)
- Use WebGPU EP (`device: "webgpu"`) if available
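The two workarounds can be combined into a fallback: prefer WebGPU with the full context, otherwise stay on the WASM path with the capped length. The following is a rough sketch. `chooseEmbeddingOptions` and `detectWebGPU` are hypothetical helper names, and the `"cpu"` device string simply mirrors the minimal repro.

```js
// Hypothetical helper: picks the device and max_length from the workarounds above.
function chooseEmbeddingOptions(hasWebGPU) {
  return hasWebGPU
    ? { device: "webgpu", max_length: 2048 } // full context on the WebGPU EP
    : { device: "cpu", max_length: 512 }; // capped length on the WASM path
}

// `navigator.gpu` exists only where WebGPU is exposed, and
// requestAdapter() can still resolve to null when no adapter is usable.
async function detectWebGPU() {
  if (typeof navigator === "undefined" || !navigator.gpu) return false;
  try {
    return (await navigator.gpu.requestAdapter()) !== null;
  } catch {
    return false;
  }
}

// Usage (assumes the `pipeline` import from the minimal repro):
// const { device, max_length } = chooseEmbeddingOptions(await detectWebGPU());
// const pipe = await pipeline("feature-extraction", "Supabase/gte-small", { device });
// const result = await pipe(text, { pooling: "mean", normalize: true, truncation: true, max_length });
```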
