Legacy Video was never made for AI;
It was made for human eyes.

Introducing The World's First Deterministic Video Primitive.

Purpose-built for AI, COSIMO encodes the naked geometry of the world.

COSIMO encodes colder and clearer, trains smarter and faster, and performs with greater precision on lean hardware.

COSIMO Boosts AI Efficiency

Your AI can do more (thinking, training, building, creating, driving, seeing)
with less (compute, memory, energy, storage, latency, bandwidth).
COSIMO Video vs. Legacy Video
Median Accuracy · n=5 seeds · 40 epochs · tabula rasa
COSIMO 83.2% · Legacy 70.8% · +12.4 pp over Legacy Video

COSIMO Video achieves a higher Median Accuracy than Legacy Video while also delivering:

  • 01 Tighter variance across all training runs: five independent retrains land on nearly the same answer (σ: 0.052 → 0.017).
  • 02 78.5% fewer model parameters: a smaller network learns more than a much larger one (33.15M → 7.14M).
  • 03 2.4× less GPU memory while training: less than half the VRAM footprint of the Legacy Video baseline (5.23 GB → 2.18 GB).
  • 04 28× collapse at inference: fits on automotive edge silicon, not just datacenter GPUs (2.18 GB → 77.6 MiB).
  • 05 3.12× compression on disk: two-thirds less storage and bandwidth at hyperscaler scale (166.6 MB → 53.4 MB).
  • 06 15.86 ms latency, batch-invariant: real-time decisions without batching middleware (63.07 clips/s sustained).
Deterministic Video

COSIMO Video vs. Legacy Video

The New Standard
COSIMO Video
A new kind of video stream that captures the naked geometry of the world, strips out the noise, and encodes objects and their motion in a deterministic, mathematically pure form.
Technical Trials: The Sparse Geometric Matrix (SGM) output of the COSIMO Deterministic Structural Transform (DST) kernel. A coordinate list (x, y, t, Δ) describing physically active voxels after Zero-Motion Gating at q=0.98. Fixed-point integer math, stateless, no learned weights.
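As a rough illustration of the idea (not the DST kernel itself), a quantile-gated sparse extraction of the kind described above can be sketched in a few lines of NumPy. The function name `zero_motion_gate` and the exact gating rule are assumptions for this sketch; only the output shape — an integer (x, y, t, Δ) coordinate list — follows the description.

```python
import numpy as np

def zero_motion_gate(frames: np.ndarray, q: float = 0.98) -> np.ndarray:
    """Illustrative sketch of quantile-based Zero-Motion Gating.

    frames: uint8 grayscale clip of shape (T, H, W).
    Returns an (N, 4) integer array of (x, y, t, delta) rows for the
    voxels whose frame-to-frame change survives the q-quantile gate.
    """
    clip = frames.astype(np.int16)
    delta = np.abs(np.diff(clip, axis=0))   # (T-1, H, W) motion magnitude
    thresh = np.quantile(delta, q)          # keep only the top ~(1-q) of voxels
    t, y, x = np.nonzero(delta > thresh)
    d = delta[t, y, x]
    # Integer-only output: no floats leave the sketch, mirroring the
    # fixed-point, stateless character claimed for the real kernel.
    return np.stack([x, y, t, d], axis=1).astype(np.int32)
```

On typical footage this gate discards the static background and keeps only the small fraction of voxels that changed between frames, which is what makes the downstream representation sparse.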
The Old Standard
Legacy Video
Standard video as it has been encoded for the past three decades — designed for human eyes. Dense pixel grids, frame-by-frame, with every visual texture preserved, including the noise.
Technical Trials: Normalized H.264-class uint8 voxel grids at 32 × 112 × 112 resolution, grayscale, ingested directly as the input tensor to a canonical ResNet3D-18 dense Conv3D baseline. The reference architecture for video understanding for the past decade.
The Validation Setup

Both formats were trained from a blank slate — no shortcuts, no pretrained weights — on the same data, with the same compute budget, then graded on identical tests. Five separate training runs each, to ensure the result wasn't luck. Tabula-rasa cold start, n=5 random seeds × 40 epochs, SHA-256 init hashes logged and reproducible. Run on NVIDIA L4 (cloud) and Apple M1 (desktop, passively cooled).
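The "SHA-256 init hashes logged" step above can be implemented along these lines. This is a generic sketch, not COSIMO's tooling; `init_hash` is an illustrative name, and it assumes the initial weights are available as NumPy arrays.

```python
import hashlib
import numpy as np

def init_hash(state: dict) -> str:
    """SHA-256 digest over a model's initial weights, for reproducibility logs."""
    h = hashlib.sha256()
    for name in sorted(state):  # fixed iteration order -> deterministic digest
        h.update(name.encode())
        h.update(np.ascontiguousarray(state[name]).tobytes())
    return h.hexdigest()
```

Logging this digest at seed time lets anyone rerunning a track confirm they started from byte-identical random weights before comparing accuracy.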

Dataset
UCF-101 · 5-class kinematic subset
Replication
5 seeds × 40 epochs
Cloud Silicon
NVIDIA L4 · SpConv 2.3.6
Edge Silicon
Apple M1 · passively cooled
Why COSIMO is the answer

Six Value Propositions.

COSIMO does more with less to improve the performance of your AI.

01 · Capacity

More intelligence. 78% fewer parameters.

A smaller COSIMO Video model learns more than a much larger Legacy Video model — because COSIMO Video shows the network only what physically matters.

+10.3 pp mean accuracy
7.14M params vs. 33.15M · −78.5%
02 · Sparsity

Higher accuracy. 98% fewer pixels.

COSIMO Video keeps only the 2% of pixels that are physically meaningful and throws away the static background. The network learns better from less.

+12.4 pp median accuracy (83.2% vs 70.8%)
98.0% spatial sparsity, deterministic
03 · Stability

Tighter variance.

Tighter clustering of performance across all runs — five training runs land on nearly the same answer.

σ collapsed 0.052 → 0.017
3.0× tighter cluster
04 · Edge

Real-time AI in under 77 MiB.

COSIMO Video is small enough to run inside a surveillance camera, a robot, or a vehicle's existing chip — no datacenter required.

Inference VRAM: 77.6 MiB
28× collapse from training peak
05 · Cloud

Cut storage by 67%.

COSIMO Video takes about a third of the disk space of Legacy Video — without discarding any of the meaningful signal.

3.12× compression on disk
166.6 MB → 53.4 MB
06 · Latency

63 real-time decisions per second. Zero batching delays.

COSIMO Video runs at full speed even on a single video stream — no need to group requests together to get good throughput. Your autonomous vehicle, drone, or surveillance camera doesn't pay a latency penalty to be fast.

15.86 ms p50 latency at batch=1 · 63.07 clips/s sustained throughput · Δ between b=1 and b=32: 0.02%
Go Deeper

The full technical proof.

Methodology, ablations, and reproducibility kit — every claim above is traceable to a canonical run.

Whitepaper
The AI Video Pipeline

COSIMO outperforms legacy video at every stage.

Three Stages
Encode · Train · Perform

Stage 01 / 03

Encode.

Encode or transcode ice-cold on any silicon.

Apple M1 · Single-core · Passively cooled
The DST kernel converts dense Legacy Video into the Sparse Geometric Matrix. Deterministic, integer fixed-point math, no GPU host required. These are the numbers from the most demanding context: edge silicon at idle.
P50 Latency
1.14 ms
Encodes one video frame in just over a millisecond.
vs. baseline: 13–22× faster than RAFT-class optical-flow extraction
Sparsity Ratio
99.84%
99.84% of pixels are mathematically deleted as noise.
active signal: ~2% of voxels carry the entire physical motion of the scene
Estimated Power
0.27 W
Less than a third of a watt. Passive cooling. No fan.
vs. baseline: ~1,100× lower than typical GPU encode pipelines
Storage Compression
3.12×
COSIMO Video takes about one-third the disk space of Legacy Video.
canonical clip set: 166.58 MB (Legacy) → 53.42 MB (COSIMO) · −67.9% storage and egress
Decision Headroom
30×
After encoding, the remaining per-frame budget is roughly 30× the encode cost — free for decisions and actions.
real-time budget: 33 ms per-frame budget at 30 fps · 1.14 ms encode leaves 31.86 ms for everything else
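The headroom figure follows from simple frame-budget arithmetic; here is a worked check of the numbers above (using the exact 1000/30 ms budget, so the results differ slightly from the rounded 33 ms in the card):

```python
FPS = 30
ENCODE_MS = 1.14                      # P50 encode latency on Apple M1

budget_ms = 1000 / FPS                # ~33.3 ms per-frame budget at 30 fps
headroom_ms = budget_ms - ENCODE_MS   # time left for decisions and actions
ratio = budget_ms / ENCODE_MS         # budget relative to encode cost, ~29x

print(f"{headroom_ms:.2f} ms free, {ratio:.1f}x headroom")
```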
P95 Latency
1.17 ms
Worst-case latency stays within 0.03 ms of typical.
tail stability: P95/P50 ratio 1.026 · no latency spikes, no queueing required
CPU Load
3.3%
Uses 3.3% of one CPU core. Everything else stays free.
single-core utilization: 96.7% of the M1 core remains available for application logic
Cycles per Pixel
1.7
Fewer than two CPU cycles to process each pixel.
execution profile: achieved by eliminating floating-point trigonometry in the encode path
Stage 02 / 03

Train.

Train smarter on fewer parameters with tighter variance.

NVIDIA L4 · GCP · SpConv 2.3.6 · Tabula rasa
Five independent random seeds × forty full epochs per track. Both COSIMO Video and Legacy Video trained from random weights — no pretrained checkpoints, no shortcuts. The architectures were held identical; only the data representation changed.
Median Accuracy
83.2%
COSIMO Video scored higher than Legacy Video on a typical training run.
delta vs. legacy: +12.4 pp over Legacy Video (median 70.8%)
Range: 78.76% – 83.19% across n=5 seeds
Mean Accuracy
81.95%
Across all five training runs combined, COSIMO Video averaged higher.
delta vs. legacy: +10.3 pp over Legacy Video mean (71.68%)
Worst COSIMO seed (78.76%) beat 4 of 5 Legacy seeds
Cross-Seed Variance
0.017
Five training runs cluster tightly around the same answer.
σ ratio: COSIMO σ = 0.017 vs. Legacy σ = 0.052
3.0× tighter cluster
Model Parameters
7.14M
A much smaller network learns more than a much larger one.
vs. legacy: 7.14M vs. 33.15M · −78.5%
4.64× fewer parameters
Peak Training VRAM
2.18 GB
Less than half the GPU memory of the Legacy Video baseline.
memory profile: 5.23 GB → 2.18 GB · 2.40× less peak GPU memory
Stage 03 / 03

Perform.

Real-time decisions. Lean hardware footprint.

NVIDIA L4 · Inference · Forward pass only
The trained network is deployed and run on live data — what hyperscaler customers, AV OEMs, and edge devices actually pay for. Numbers below are sustained at batch=1 for a single live video stream.
Inference VRAM
77.6 MiB
Smaller than a single 4K image. Fits inside an automotive edge chip.
deployment economics: 28× collapse from 2.18 GB training peak
Edge BoM transition: NVIDIA L4 → Jetson Orin Nano · $2,500 → $249
P50 Latency
15.86 ms
A real-time decision in under 16 milliseconds, less than half of one 30 fps frame interval.
batch invariance: P50 at b=1: 15.86 ms · P50 at b=32: 15.87 ms
Throughput Δ: 0.02% · No batching middleware required
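A batch-invariance claim like this can be reproduced with a minimal timing harness: time the same inference step at different batch sizes and compare medians. This is a generic sketch, not COSIMO tooling; `p50_latency_ms` and the `step` callable are illustrative names.

```python
import time
import statistics

def p50_latency_ms(step, n_warmup: int = 10, n_iters: int = 100) -> float:
    """Median wall-clock latency of one call to `step`, in milliseconds."""
    for _ in range(n_warmup):          # warm caches / JIT before measuring
        step()
    samples = []
    for _ in range(n_iters):
        t0 = time.perf_counter()
        step()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)
```

Calling it once with a batch-1 step and once with a batch-32 step, then comparing the two medians, is the shape of the b=1 vs. b=32 comparison reported above.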
Sustained Throughput
63.07 clips/s
63 real-time AI decisions every second on standard cloud hardware.
fleet density: achieved at batch=1 with no queueing
Effective fleet density: ~63× more concurrent streams per GPU
Portability
3 paths
A model trained on cloud GPU runs on embedded silicon, on Apple, and on CPU — same code, same weights, no edge conversion.
deployment targets: CUDA · MPS · CPU
Fits Jetson Orin Nano (8 GB) memory budget natively · no quantization or distillation required

How we tested

Every figure on this page is traceable to a canonical test run. Edge encoding was captured on a 2020 MacBook Pro · Apple M1 · 16 GB at single-core, passively-cooled load. Cloud inference and training were captured on a Google Cloud L4 instance running NGC PyTorch 24.09 with SpConv 2.3.6, across 5 seeds × 40 epochs per track, with SHA-256 init hashes logged for every model. Both tracks started from random weights — no pretrained checkpoints — and are independently reproducible.

  • Edge Encoding Benchmark · MacBook M1 · single-core · passively cooled
  • Cloud Inference Benchmark · NVIDIA L4 · batch-1 · real-time latency
  • COSIMO Training Run · 5 seeds × 40 epochs · tabula rasa
  • Legacy Baseline Run · 5 seeds × 40 epochs · ResNet3D-18
  • Performance Targets · derived from canonical run artifacts
  • Whitepaper · full methodology and results