SIMD

Two 16-byte vector classes: Simd4f (four f32 lanes) and Simd4i (four i32 lanes). On modern x86_64 and aarch64, operations on them lower to single instructions; in browsers they go through the wasm simd128 proposal when it is supported and fall back to scalar code otherwise.

Construct

Method	Description
`Simd4f.new(a, b, c, d)`	Four explicit f32 lanes.
`Simd4f.splat(x)`	All four lanes set to `x`.
`Simd4i.new(a, b, c, d)`	Four explicit i32 lanes.
`Simd4i.splat(x)`	All four lanes set to `x`.

var v = Simd4f.new(1.0, 2.0, 3.0, 4.0)
var ones = Simd4f.splat(1.0)
var w = v + ones                    // (2, 3, 4, 5)
System.print(w)                     // Simd4f(2, 3, 4, 5)

Arithmetic & comparison

Both classes overload real Wren operators, applied lane by lane: +, -, *, / for arithmetic; <, <=, >, >=, ==, != for comparison. min and max are methods rather than operators: a.min(b), a.max(b).

var a = Simd4f.new(1, 2, 3, 4)
var b = Simd4f.new(2, 2, 2, 2)
System.print(a + b)                 // Simd4f(3, 4, 5, 6)
System.print(a * b)                 // Simd4f(2, 4, 6, 8)
System.print(a.min(b))              // Simd4f(1, 2, 2, 2)
System.print(a.max(b))              // Simd4f(2, 2, 3, 4)
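
The comparison operators work lane by lane in the same way. The following is a minimal sketch, relying on the mask convention described under "Mask, bitmask, select" (a Simd4i holding -1 where the comparison is true and 0 where it is false). Lanes are read with `[i]` so the example does not depend on how a Simd4i prints.

```wren
var a = Simd4f.new(1, 2, 3, 4)
var b = Simd4f.new(2, 2, 2, 2)

var lt = a < b                       // true only in lane 0 (1 < 2)
var eq = a == b                      // true only in lane 1 (2 == 2)

System.print(lt[0])                  // -1
System.print(lt[1])                  // 0
System.print(eq[1])                  // -1
System.print(eq.anyTrue)             // true
System.print(eq.allTrue)             // false
```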

Per-lane access

Method	Description
`v[i]`	Read lane `i` (`0..3`) as a Num.
`v.replaceLane(i, x)`	New vector with lane `i` set to `x`.

Lane indices are 0 (lowest) through 3 (highest). Both classes are immutable: replaceLane leaves the receiver untouched and returns a new vector.
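
As a short illustration of the two accessors above, the sketch below replaces one lane, confirms the original vector is unchanged, and computes a horizontal sum by reading each lane in turn. It uses only the methods listed in the table; the variable names are illustrative.

```wren
var v = Simd4f.new(1, 2, 3, 4)
var w = v.replaceLane(2, 9)          // new vector (1, 2, 9, 4); v is unchanged

System.print(v[2])                   // 3
System.print(w[2])                   // 9

// Horizontal sum: read the four lanes as Nums and add them up.
var sum = 0
for (i in 0..3) sum = sum + w[i]
System.print(sum)                    // 16
```

Reading lanes one at a time gives up the benefit of vector code, so this pattern is best kept for the final reduction step rather than an inner loop.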

Mask, bitmask, select

Comparisons return a Simd4i mask (-1 on true, 0 on false per lane).

Method	Description
`mask.bitmask`	Pack the MSB of lane `i` into bit `i` of the result (lane 0 → bit 0), giving a value in `0..15`.
`mask.allTrue`	`true` when every lane is non-zero.
`mask.anyTrue`	`true` when at least one lane is non-zero.
`X.select(mask, onTrue, onFalse)`	Per-lane: `mask[i] ? onTrue[i] : onFalse[i]`. Available on both Simd4f and Simd4i.

var ints = Simd4i.new(1, 5, 3, 7)
var mask = ints > Simd4i.splat(4)
System.print(mask.bitmask)          // 10   (binary 1010: lanes 1 and 3)
System.print(mask.anyTrue)          // true
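
The example above does not exercise select, so here is a hedged sketch of it. It assumes that the `X` in the table stands for the class name, so the call is written `Simd4i.select(...)`; if your build exposes it differently, adjust the call site. The mask caps every lane greater than 4 at 4.

```wren
var ints  = Simd4i.new(1, 5, 3, 7)
var limit = Simd4i.splat(4)
var mask  = ints > limit                     // true in lanes 1 and 3

// Assumption: select is invoked on the class, as the table's `X.select` suggests.
// Lanes where the mask is true take `limit`; the others keep `ints`.
var capped = Simd4i.select(mask, limit, ints)
for (i in 0..3) System.print(capped[i])      // 1, 4, 3, 4 (one per line)

System.print(mask.allTrue)                   // false
```

For this particular case `ints.min(limit)` gives the same result; select becomes useful when the condition and the chosen values come from different vectors.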

Load / store via typed arrays

Bridge between SIMD and packed memory through a typed array:

Method	Description
`Simd4f.load(arr, offset)`	Read 4 floats from a `Float32Array`, starting at element index `offset`.
`v.store(arr, offset)`	Write 4 floats into a `Float32Array`, starting at element index `offset`.
`Simd4i.load` · `.store`	Same, with 4 ints against an `Int32Array`.

var buf = Float32Array.fromList([1, 2, 3, 4, 5, 6, 7, 8])
var lo  = Simd4f.load(buf, 0)        // (1, 2, 3, 4)
var hi  = Simd4f.load(buf, 4)        // (5, 6, 7, 8)
(lo + hi).store(buf, 0)              // buf is now (6, 8, 10, 12, 5, 6, 7, 8)
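
A typical use of load and store is to walk a buffer four elements at a time. The sketch below doubles every element in place. The buffer length is written out as a literal because this page does not document a length accessor for typed arrays, and the loop assumes the length is a multiple of 4.

```wren
var buf   = Float32Array.fromList([1, 2, 3, 4, 5, 6, 7, 8])
var scale = Simd4f.splat(2)

// Step through the buffer one vector (4 elements) at a time.
var offset = 0
while (offset < 8) {
  (Simd4f.load(buf, offset) * scale).store(buf, offset)
  offset = offset + 4
}
// buf is now (2, 4, 6, 8, 10, 12, 14, 16)
```

A buffer whose length is not a multiple of 4 would need a scalar loop for the remaining elements.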

On browsers without the wasm simd128 proposal these operations fall back to scalar code — you get the same results, just without the speedup. simd128Supported() on the harness side tells you which path the runtime loaded.