Ranked #2 of 5 · Reviewed by Mike Hunter · April 25, 2026

NVIDIA DGX Spark

Price: $4,679
Availability: In stock
Rating: 4.6/5 (4 reviews)

Averaged from 4 ratings derived from review text

The verdict

The NVIDIA DGX Spark is the productized version of Project DIGITS: a compact 150 x 150 x 50.5 mm box housing the GB10 Grace Blackwell Superchip, 128 GB of unified LPDDR5x, and the full CUDA AI stack out of the box. Tom's Hardware called it 'a well-rounded toolkit for local AI'; ServeTheHome called it a 'must-have for AI developers'; LMSYS published the most thorough independent benchmarks. The 128 GB unified-memory ceiling is the headline feature: it loads models that would otherwise need a $30K+ multi-GPU rig. The catch is bandwidth-limited decode. LMSYS measured Llama 3.1 70B FP8 at 2.7 tokens/sec single-batch, while GPT-OSS 120B (MoE, ~5B active parameters per token) hits ~14.5 tokens/sec per ServeTheHome. It is best understood as a CUDA-native development box for buyers who need to iterate on big-model code without renting cloud GPUs.

Full review

What It Is and Who It's For

The DGX Spark started life as Project DIGITS, NVIDIA's January 2025 announcement of a $3,000 personal AI supercomputer. Shipping under the DGX Spark name from October 15, 2025, it landed at $3,999 before a memory-supply price increase pushed the MSRP to $4,699 in February 2026. The hardware is straightforward: a 150 x 150 x 50.5 mm box weighing 1.2 kg, built around the GB10 Grace Blackwell Superchip, which pairs a 20-core Arm CPU (10x Cortex-X925 + 10x Cortex-A725) with a Blackwell GPU featuring 5th-gen Tensor Cores and up to 1 PFLOP of FP4 (sparse) compute. Memory is 128 GB of unified LPDDR5x at 273 GB/s, and storage is a 4 TB user-replaceable self-encrypting NVMe M.2 drive. The whole thing draws 240 W max from a wall-wart-style external PSU. NVIDIA's positioning is unambiguous: this is a development box for engineers who need to iterate on large-model code without renting cloud GPUs, not a general-purpose workstation.

Local LLM Performance

The independent benchmark of record for the DGX Spark is LMSYS's October 2025 deep-dive, which ran FP8 models on SGLang across multiple model sizes. Llama 3.1 8B FP8 hits 7,991 tokens/sec prefill and 20.5 tokens/sec decode at batch 1, scaling to 368 tokens/sec decode at batch 32. Llama 3.1 70B FP8 measures 803 tokens/sec prefill and 2.7 tokens/sec decode at batch 1, bandwidth-limited at the 273 GB/s memory ceiling. ServeTheHome reported approximately 14.5 tokens/sec on GPT-OSS 120B, which sounds counterintuitive until you remember GPT-OSS is a mixture-of-experts model with only ~5B active parameters per token, so memory bandwidth pressure is much lower than on a dense 70B. The practical takeaway: the DGX Spark is fastest on small-to-mid models (≤30B) and on MoE models, and slow on dense 70B+. For sustained high-throughput dense-70B inference, an RTX 5090 or a 4x A6000 box is several times faster. For development (fitting models in memory, validating training scripts, doing single-batch experiments), the DGX Spark is the cheapest and most polished tool available.
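To see why dense-model decode lands where it does, a rough back-of-the-envelope sketch helps. It assumes that at batch 1 every generated token has to stream all model weights from memory once, and that FP8 means one byte per parameter; it ignores KV-cache traffic, activations, and kernel overhead, so it gives an upper bound rather than a prediction. The function name is ours, not from any benchmark suite.

```python
# Rough upper bound on batch-1 decode speed for a dense model:
# each token requires reading every weight once, so
# tokens/sec <= memory bandwidth / weight footprint.
# Assumption: FP8 = 1 byte per parameter; KV cache and overhead ignored.

SPARK_BANDWIDTH_GB_S = 273.0  # LPDDR5x bandwidth quoted in the review


def decode_ceiling_tok_s(params_billion, bytes_per_param,
                         bandwidth_gb_s=SPARK_BANDWIDTH_GB_S):
    """Return the bandwidth-bound ceiling in tokens/sec."""
    weight_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / weight_gb


# Measured batch-1 decode figures are the LMSYS numbers cited above.
measurements = [
    ("Llama 3.1 8B FP8", 8, 20.5),
    ("Llama 3.1 70B FP8", 70, 2.7),
]

for name, params_b, measured in measurements:
    ceiling = decode_ceiling_tok_s(params_b, bytes_per_param=1.0)
    share = measured / ceiling
    print(f"{name}: ceiling {ceiling:.1f} tok/s, "
          f"measured {measured} tok/s ({share:.0%} of ceiling)")
```

Under these assumptions the 70B FP8 ceiling is about 3.9 tokens/sec, so the measured 2.7 tokens/sec is already most of what the memory bus allows.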

Software and Toolchain

Where the DGX Spark genuinely earns its premium over a Strix Halo mini PC is the software stack. NVIDIA ships it with the full DGX OS image preinstalled — CUDA, cuDNN, TensorRT-LLM, NeMo, NIM containers, the latest PyTorch and JAX builds, and the NVIDIA AI Enterprise license. For a developer whose code is written against CUDA, this is a same-day install-and-go experience. The Strix Halo and Mac Studio alternatives can run local LLMs well via Ollama, llama.cpp, or MLX, but porting CUDA-native research code to those stacks is a real engineering tax. ServeTheHome's review framed this as the practical reason the DGX Spark exists: every AI engineer's existing PyTorch+CUDA workflow runs unmodified, including training small models, fine-tuning with LoRA/QLoRA, and exporting TensorRT-optimized inference engines for production deployment on bigger NVIDIA hardware.
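A quick way to confirm that an existing PyTorch+CUDA workflow sees the GPU is a sanity check like the one below. This is a minimal sketch, not something from NVIDIA's documentation: the helper name describe_cuda is ours, and what a unified-memory device reports as total memory is not something we verified, so no expected output is shown.

```python
# Minimal CUDA sanity check for a PyTorch environment.
# describe_cuda is a hypothetical helper name used for illustration.


def describe_cuda():
    """Return a small dict describing PyTorch's view of the GPU."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False, "cuda_available": False}

    info = {
        "torch_installed": True,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        props = torch.cuda.get_device_properties(0)
        info["device_name"] = props.name
        info["total_memory_gb"] = round(props.total_memory / 1e9, 1)
    return info


print(describe_cuda())
```

If cuda_available comes back False on a machine that should have a working CUDA stack, the usual suspects are a CPU-only PyTorch build or a driver mismatch.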

Clustering and Scale-Up

The DGX Spark ships with two QSFP56 ports backed by a ConnectX-7 NIC, supporting up to 200 GbE RDMA. NVIDIA's documentation and LMSYS's testing show that two Sparks linked over ConnectX-7 can pool 256 GB of memory across both units and run 405B-parameter models at low-bit quantization, a ceiling that is otherwise the exclusive domain of multi-GPU rigs costing $30K+. At a street price of roughly $9,400 for a two-unit cluster, this is the cheapest path to running Llama 3.1 405B-class quants on a desk. The trade-off is the same bandwidth wall: dense 405B inference at low batch sizes is single-digit tokens/sec. Buyers running fine-tunes or small-batch evaluations of frontier models will find this acceptable; buyers needing production-grade serving will not.
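The memory-fit claims can be checked with simple arithmetic. The sketch below counts weights only, at a given number of bits per weight, and reserves a 10% headroom for the OS, KV cache, and activations; that headroom figure is our assumption, not an NVIDIA or LMSYS number, and real quantization formats carry some extra overhead.

```python
# Weight-only memory estimate: params * bits / 8 bytes.
# Assumption: 10% of memory is reserved as headroom (our guess).


def weight_footprint_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB for a given quantization."""
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, memory_gb, headroom=0.10):
    """True if the weights fit in memory after reserving headroom."""
    usable_gb = memory_gb * (1 - headroom)
    return weight_footprint_gb(params_billion, bits_per_weight) <= usable_gb


cases = [
    ("70B at 8-bit", 70, 8),
    ("120B at 4-bit", 120, 4),
    ("405B at 4-bit", 405, 4),
]

for label, params_b, bits in cases:
    size = weight_footprint_gb(params_b, bits)
    one = fits(params_b, bits, memory_gb=128)
    two = fits(params_b, bits, memory_gb=256)
    print(f"{label}: {size:.1f} GB, one Spark: {one}, two Sparks: {two}")
```

By this estimate a 4-bit 405B model needs about 202.5 GB for weights alone, which is why it only becomes reachable with two units.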

Where It Falls Short

The DGX Spark is unapologetically specialist hardware. It runs only NVIDIA's DGX OS — no Windows, no gaming, no creative-suite workflows. It has a single HDMI output and no front-panel LEDs or display indicators; the chassis is functional rather than decorative. The 273 GB/s memory bandwidth is the hard ceiling on dense large-model decode performance, and there's no path to upgrade it. The price increase from $3,999 to $4,699 in February 2026 (driven by global LPDDR5x memory supply pressure) ate into the original value proposition versus a Strix Halo mini PC, which now costs roughly one-third as much for the same memory ceiling. Buyers who don't specifically need CUDA-native software, a polished out-of-box experience, or two-unit clustering will get more raw inference performance per dollar from a Mac Studio M3 Ultra or a single-GPU PC build.

Strengths

+128 GB unified LPDDR5x memory — fits 70B FP8 / 120B Q4 / 405B with two clustered units
+Full CUDA + NVIDIA AI stack preinstalled; the most polished local-AI dev box on the market
+Compact 150 x 150 x 50.5 mm chassis, 240 W max; fits any desk, runs cool and quiet
+Dual ConnectX-7 200 GbE QSFP56 ports for low-latency two-unit clustering

Watch-outs

−273 GB/s LPDDR5x bandwidth caps decode tok/s on dense large models — 70B FP8 measures ~2.7 tok/s on a single unit
−Linux-only, no Windows or gaming use; specialist hardware for AI developers
−Price raised from $3,999 to $4,699 in February 2026 due to memory supply

How it compares

The DGX Spark is the cheapest path to 128 GB of CUDA-addressable unified memory anywhere on the market. Versus the GMKtec EVO-X2 ($1,699) or Beelink GTR9 Pro ($2,000), it's roughly 2.5x the price but offers the full NVIDIA software stack the Strix Halo boxes can only approximate via ROCm or Vulkan. Versus the Puget Genesis II ($10K+), it's a single-purpose dev box — no multi-display creative workflow, no gaming, no general workstation duty. Pair two Sparks via the ConnectX-7 networking and you get 405B-class model coverage at roughly $9,400, the cheapest legal path to that ceiling.
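The price multiples quoted here are easy to reproduce. The sketch below uses the $4,699 MSRP from the full review and the two Strix Halo prices given in this section; the variable names are ours.

```python
# Price ratios between the DGX Spark and the Strix Halo mini PCs
# named in this comparison. All prices come from the review text.

SPARK_MSRP_USD = 4699

strix_halo_prices_usd = {
    "GMKtec EVO-X2": 1699,
    "Beelink GTR9 Pro": 2000,
}


def price_ratio(price_a, price_b):
    """How many times more expensive price_a is than price_b."""
    return price_a / price_b


for name, price in strix_halo_prices_usd.items():
    ratio = price_ratio(SPARK_MSRP_USD, price)
    print(f"DGX Spark vs {name}: {ratio:.2f}x")

# A two-unit cluster at MSRP, for comparison with the ~$9,400 figure.
print(f"Two Sparks at MSRP: ${2 * SPARK_MSRP_USD:,}")
```

The ratios come out between roughly 2.3x and 2.8x depending on which Strix Halo box you compare against, which brackets the "roughly 2.5x" figure.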

Who this is for

At a glance: Best for local-LLM developers: a CUDA-native 128 GB dev box.

Rating sources

tomshardware

4.5/5*

“A well-rounded toolkit for local AI — pricey if you don't use its features to the fullest.”

servethehome

4.7/5*

“So freaking cool. A must-have for AI developers.”

lmsys

4.5/5*

“A new standard for local AI inference.”

techradar

4.5/5*

Our 4.6 score is the average of the four ratings above. Ratings marked * were derived from the reviewer’s written analysis or video transcript: the publisher didn’t print an explicit numeric score, so we inferred one from their own words.
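For readers who want to check the math, the sketch below averages the four derived ratings. It assumes the displayed score is rounded half-up to one decimal place, which the page does not state explicitly. Decimal is used because the exact mean is 4.55, and binary floating point can round that value down to 4.5.

```python
# Reproduce the headline score from the four derived ratings.
# Assumption: the site rounds half-up to one decimal place.
from decimal import Decimal, ROUND_HALF_UP

source_ratings = {
    "tomshardware": Decimal("4.5"),
    "servethehome": Decimal("4.7"),
    "lmsys": Decimal("4.5"),
    "techradar": Decimal("4.5"),
}


def average_rating(ratings):
    """Return (exact mean, mean rounded half-up to one decimal)."""
    mean = sum(ratings.values(), Decimal("0")) / len(ratings)
    displayed = mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return mean, displayed


mean, displayed = average_rating(source_ratings)
print(mean, displayed)  # 4.55 4.6
```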

Frequently asked questions

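Is the NVIDIA DGX Spark worth buying?

For developers whose code is CUDA-native and who need to iterate on big-model code without renting cloud GPUs, yes. Buyers who don't need CUDA, the polished out-of-box software stack, or two-unit clustering will get more raw inference performance per dollar elsewhere.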

What is the NVIDIA DGX Spark's biggest strength?

128 GB unified LPDDR5x memory — fits 70B FP8 / 120B Q4 / 405B with two clustered units

What is the main drawback of the NVIDIA DGX Spark?

273 GB/s LPDDR5x bandwidth caps decode tok/s on dense large models — 70B FP8 measures ~2.7 tok/s on a single unit

What sources back the 4.6/5 rating?

Our 4.6/5 rating is the average of scores from four independent AI workstation reviews: tomshardware, servethehome, lmsys, and techradar.

How it compares

Product	Rating	Price	Best for
Puget Systems Genesis II	4.7	$10,569	Enterprise: multi-GPU training and inference
NVIDIA DGX Spark (this product)	4.6	$4,679	Local-LLM developers: CUDA-native 128 GB dev box
HP Z6 G5 A	4.5	$1,327	Mid-tier: Threadripper Pro multi-GPU under Z8 pricing
HP Z8 Fury G5	4.4	$7,995	4-GPU training and inference: enterprise tier with HP support
Apple Mac Studio M3 Ultra	4.3	$2,499	Mac: highest memory bandwidth in a desktop chassis

#1 · Top Score

Puget Systems Genesis II

4.7

The Puget Systems Genesis II is the enterprise pick. Versus the HP Z8 Fury G5, it offers comparable scale-up capability but in a quieter chassis with a more thoughtful configurator. Versus the HP Z6 G5 A, it's two tiers up in price and ceiling. Versus the NVIDIA DGX Spark, it's a different class of machine entirely — the DGX Spark is a 128 GB unified-memory dev box, the Genesis II is a multi-GPU training/inference workstation. For buyers whose only goal is running large local LLMs, the DGX Spark is the more cost-effective answer; the Genesis II earns its premium when training, fine-tuning, or multi-application workstation duty are part of the picture.

HP Z6 G5 A

4.5

The HP Z6 G5 A is the mid-tier sweet spot in this lineup. Versus the HP Z8 Fury G5 (its flagship sibling), it's a smaller chassis built on AMD Threadripper Pro rather than the Z8's Xeon W9 platform, at a noticeably lower entry price; it trades the Z8's 4-GPU ceiling for a 3-GPU ceiling and a more desk-friendly footprint. Versus the Puget Genesis II, it offers similar build pedigree without Puget's bespoke configurator and handpicked components, at a meaningfully lower starting price. Versus the DGX Spark, it's a different class of machine: the HP Z6 G5 A is a multi-GPU general workstation, while the Spark is a single-purpose 128 GB unified-memory dev box. Pick the HP Z6 G5 A when you need both AI horsepower and traditional workstation workloads (rendering, simulation, multi-app productivity) on the same machine.

HP Z8 Fury G5

4.4

Similar to the Dell Precision 7960 Tower, the HP Z8 Fury G5 supports four-GPU configurations for extreme parallel processing, but it differentiates itself with a built-in handle and a design prioritizing easy serviceability. Versus its smaller sibling the HP Z6 G5 A, the Z8 Fury G5 is the right pick when you genuinely need 4 GPUs (versus 3) or the Xeon W9 platform's enterprise ECC and reliability features. Versus the Puget Genesis II, the Z8 Fury G5 brings HP's enterprise service network and parts availability, while Puget brings hand-tuned assembly and a more thoughtful configurator. Versus the Apple Mac Studio M3 Ultra, the Z8 Fury G5 is twice the size and triple the price for a 1-GPU build, but unlocks training-class workloads the Mac Studio cannot touch.

Apple Mac Studio M3 Ultra

4.3

The Apple Mac Studio M3 Ultra is the best Mac-ecosystem AI workstation and competitive on raw local-LLM throughput per dollar. Versus the DGX Spark ($4,699 / 128 GB), the base Mac Studio M3 Ultra ($3,999 / 96 GB) loses on memory ceiling but wins on memory bandwidth (819 vs 273 GB/s) — meaning faster decode tok/s on dense models that fit. Step up to a 256 GB or 512 GB Mac Studio config and you exceed the Spark's memory ceiling at higher bandwidth, at the cost of premium Apple memory pricing. Versus the multi-GPU PC workstations (Puget, HP Z6/Z8), the Mac Studio cannot match peak training throughput but is silent, half the size, and roughly half the price of an equivalent dual-GPU PC build.
