Wrenlift

SIMD

Two 16-byte vector classes: Simd4f (four f32 lanes) and Simd4i (four i32 lanes). On modern x86_64 and aarch64, operations on them lower to single instructions; in browsers they go through the wasm simd128 proposal when it is supported, with a scalar fallback otherwise.

Construct

Method                    Description
Simd4f.new(a, b, c, d)    Four explicit f32 lanes.
Simd4f.splat(x)           All four lanes set to x.
Simd4i.new(a, b, c, d)    Four explicit i32 lanes.
Simd4i.splat(x)           All four lanes set to x.

var v = Simd4f.new(1.0, 2.0, 3.0, 4.0)
var ones = Simd4f.splat(1.0)
var w = v + ones                    // (2, 3, 4, 5)
System.print(w)                     // Simd4f(2, 3, 4, 5)

Arithmetic & comparison

Both classes overload the ordinary Wren operators, applied lane by lane: +, -, *, / for arithmetic; <, <=, >, >=, ==, != for comparison. min and max are methods (a.min(b)), not operators.

var a = Simd4f.new(1, 2, 3, 4)
var b = Simd4f.new(2, 2, 2, 2)
System.print(a + b)                 // Simd4f(3, 4, 5, 6)
System.print(a * b)                 // Simd4f(2, 4, 6, 8)
System.print(a.min(b))              // Simd4f(1, 2, 2, 2)
System.print(a.max(b))              // Simd4f(2, 2, 3, 4)
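
The comparison operators follow the same lane-wise pattern. A short sketch reusing the values above; it relies on the mask convention described under "Mask, bitmask, select" (-1 for true, 0 for false) and assumes a Simd4i mask supports the same [i] lane access as any other vector.

```wren
var a = Simd4f.new(1, 2, 3, 4)
var b = Simd4f.new(2, 2, 2, 2)

var gt = a > b                      // Simd4i mask; only lanes 2 and 3 are true
System.print(gt[0])                 // 0
System.print(gt[3])                 // -1
System.print(gt.bitmask)            // 12  (bits 2 and 3)

var eq = a == b                     // only lane 1 holds equal values
System.print(eq.bitmask)            // 2   (bit 1)
```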

Per-lane access

Method                 Description
v[i]                   Read lane i (0..3) as a Num.
v.replaceLane(i, x)    New vector with lane i set to x.

Lane indices run from 0 (lowest) through 3 (highest). Both classes are immutable: replaceLane returns a new vector and leaves the original unchanged.
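
A minimal sketch of lane reads and replaceLane, using only the two methods in the table. The horizontal sum at the end is just one illustrative use of per-lane reads, not a built-in reduction.

```wren
var v = Simd4f.new(1, 2, 3, 4)
System.print(v[2])                  // 3

var w = v.replaceLane(2, 9)         // new vector; v is not modified
System.print(w[2])                  // 9
System.print(v[2])                  // 3

// Horizontal sum built from individual lane reads
var sum = v[0] + v[1] + v[2] + v[3]
System.print(sum)                   // 10
```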

Mask, bitmask, select

Comparisons return a Simd4i mask (-1 on true, 0 on false per lane).

Method                             Description
mask.bitmask                       Pack the MSB of lane i into bit i of the result (lane 0 → bit 0).
mask.allTrue                       true when every lane is non-zero.
mask.anyTrue                       true when at least one lane is non-zero.
X.select(mask, onTrue, onFalse)    Per-lane: mask[i] ? onTrue[i] : onFalse[i]. Available on both Simd4f and Simd4i.

var ints = Simd4i.new(1, 5, 3, 7)
var mask = ints > Simd4i.splat(4)
System.print(mask.bitmask)          // 10   (binary 1010 — lanes 1 + 3)
System.print(mask.anyTrue)          // true
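
select pairs naturally with a comparison mask. A hedged sketch that zeroes out the negative lanes of a vector; it assumes the X in the table stands for the class name, so the call is written as Simd4f.select(...).

```wren
var v = Simd4f.new(-1, 2, -3, 4)
var zero = Simd4f.splat(0)

var neg = v < zero                  // mask: lanes 0 and 2 are true
System.print(neg.anyTrue)           // true
System.print(neg.allTrue)           // false

// Take zero where the lane was negative, otherwise keep v's lane
var clamped = Simd4f.select(neg, zero, v)
System.print(clamped[0])            // 0
System.print(clamped[1])            // 2
```

Because the choice is made per lane from a mask, no branch on individual elements is needed.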

Load / store via typed arrays

Bridge between SIMD and packed memory through a typed array:

Method                      Description
Simd4f.load(arr, offset)    Read 4 floats from a Float32Array, starting at element index offset.
v.store(arr, offset)        Write 4 floats into a Float32Array, starting at element index offset.
Simd4i.load · .store        Same, against Int32Array.

var buf = Float32Array.fromList([1, 2, 3, 4, 5, 6, 7, 8])
var lo  = Simd4f.load(buf, 0)        // (1, 2, 3, 4)
var hi  = Simd4f.load(buf, 4)        // (5, 6, 7, 8)
(lo + hi).store(buf, 0)              // buf is now (6, 8, 10, 12, 5, 6, 7, 8)
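
For buffers longer than one vector, load and store can be paired in a loop that advances four elements at a time. A minimal sketch under stated assumptions: the element count n is tracked by the caller (this page does not document a length accessor on Float32Array), and data, n, and scale are illustrative names. Any leftover elements beyond the last full group of four are left untouched.

```wren
var data = Float32Array.fromList([1, 2, 3, 4, 5, 6, 7, 8])
var n = 8                            // element count, tracked by hand here
var scale = Simd4f.splat(0.5)

var i = 0
while (i + 4 <= n) {
  // Load four floats, scale them, and write them back in place
  (Simd4f.load(data, i) * scale).store(data, i)
  i = i + 4
}
// data is now (0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4)
```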