Verdict
The Best 5
Reviewed by Mike Hun · April 25, 2026

Best AI Workstations

Top 5 AI workstations reviewed and ranked — from the $4,699 NVIDIA DGX Spark for local-LLM developers to the $10K+ Puget Genesis II for enterprise training. Picks segmented by use case, model-size headroom, and budget.

Quick answer

Puget Systems Genesis II is our top pick for AI workstations, with an average score of 4.7/5 across 3 published reviews at about $10,569. Runner-up: NVIDIA DGX Spark (~$4,699).

At a glance

  • Puget Systems Genesis II · 3 sources · $10,569 · Best for enterprise: multi-GPU training and inference · Buy at pugetsystems.com
  • NVIDIA DGX Spark · 4 sources · $4,699 · Best for local-LLM developers: CUDA-native 128 GB dev box · Buy at nvidia.com
  • HP Z6 G5 A · 5 sources · $5,499 · Best mid-tier: Threadripper Pro multi-GPU under Z8 pricing · Buy at hp.com
  • HP Z8 Fury G5 · 4 sources · $7,995 · Best for 4-GPU training and inference: enterprise tier with HP support · Check price on Amazon
  • Apple Mac Studio M3 Ultra · 4 sources · $3,999 · Best for Mac: highest memory bandwidth in a desktop chassis · Check price on Amazon
Verdict is reader-supported. As an Amazon Associate we earn from qualifying purchases. Some links on this page are affiliate links — if you click through and buy, we may earn a small commission at no extra cost to you. Our ratings are sourced from independent publications, not sponsors.
Reviews aggregated from
TechRadar, PCMag, Puget Systems, Tom's Hardware, ServeTheHome, LMSYS, AnandTech

The full ranking

Puget Systems Genesis II
#1 · Top Score
Best for: enterprise multi-GPU training and inference
From 3 sources · $10,569 · as of Apr 25

The Puget Systems Genesis II is a highly customizable, professionally built workstation aimed at enterprise buyers who need bespoke configurations the major OEMs can't match. With AMD Threadripper Pro, up to 4x RTX 4090 (or RTX Ada workstation cards), 256 GB ECC DDR4, and Puget's hand-tuned assembly process, it delivers exceptional performance and build quality, with an optional quiet edition. For local LLM work specifically, a single RTX 4090 (24 GB VRAM at 1008 GB/s) cannot hold 70B Q4 entirely in VRAM and runs it with system-RAM offload at roughly 10–20 tokens/sec; a 4x RTX 4090 configuration provides 96 GB of VRAM in total, enough to hold 70B Q4 entirely on the GPUs at 30–40 tokens/sec via tensor parallelism. The steep entry price makes this a tool for buyers who can justify a $10K+ workstation. If you only need local LLMs, the DGX Spark below delivers comparable model-size headroom at less than half the price.

Strengths
  • Highly customizable with a wide assortment of mainstream and pro-channel components like Nvidia Ada workstation GPUs
  • Serious professional build quality with careful component selection and assembly
Watch-outs
  • Extremely expensive, with review units costing over $10,000 and configurations reaching nearly $61,000
  • RTX 4090 cards have no NVLink, so 4x RTX 4090 configurations miss out on the fast GPU-to-GPU link that would have helped tensor-parallel LLM inference
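
The 70B Q4 sizing claims in this review can be sanity-checked with simple arithmetic: weight footprint is roughly parameter count times bits per weight, divided by 8. The sketch below is only an approximation. The function names and the 20% reserve for KV cache and activations are illustrative assumptions, and real Q4 formats often use slightly more than 4 bits per weight.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, overhead_fraction: float = 0.2) -> bool:
    """True if weights plus a reserve for KV cache and activations fit.

    overhead_fraction is an illustrative assumption, not a measured value.
    """
    needed = weight_footprint_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= vram_gb


weights = weight_footprint_gb(70, 4)      # 70B parameters at 4 bits per weight
single_4090 = fits_in_vram(70, 4, 24)     # one RTX 4090 (24 GB)
quad_4090 = fits_in_vram(70, 4, 4 * 24)   # four RTX 4090s (96 GB in total)
print(f"{weights:.0f} GB of weights; fits 24 GB: {single_4090}; fits 96 GB: {quad_4090}")
```

Under these assumptions the weights alone come to about 35 GB, which is why a single 24 GB card needs system-RAM offload while a four-card build does not.
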
NVIDIA DGX Spark
#2
Best for: local-LLM developers who want a CUDA-native 128 GB dev box
From 4 sources · $4,699 · as of Apr 25

The NVIDIA DGX Spark is the productized version of Project DIGITS: a 150 mm cube housing the GB10 Grace Blackwell Superchip, 128 GB unified LPDDR5x, and the full CUDA AI stack out of the box. Tom's Hardware called it 'a well-rounded toolkit for local AI'; ServeTheHome called it 'must-have for AI developers'; LMSYS published the most thorough independent benchmarks. The 128 GB unified-memory ceiling is the headline feature: it loads models that would otherwise need a $30K+ multi-GPU rig. The catch is bandwidth-limited decode. LMSYS measured Llama-3.1 70B FP8 at 2.7 tokens/sec single-batch, while GPT-OSS 120B (MoE, ~5.1B active parameters per token) hits ~14.5 tokens/sec per ServeTheHome. It is best understood as a CUDA-native development box for buyers who need to iterate on big-model code without renting cloud GPUs.

Strengths
  • 128 GB unified LPDDR5x memory — fits 70B FP8 / 120B Q4 / 405B with two clustered units
  • Full CUDA + NVIDIA AI stack preinstalled; the most polished local-AI dev box on the market
Watch-outs
  • 273 GB/s LPDDR5x bandwidth caps decode tok/s on dense large models — 70B FP8 measures ~2.7 tok/s on a single unit
  • Linux-only, no Windows or gaming use; specialist hardware for AI developers
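
The bandwidth limit on decode speed can be illustrated with a back-of-the-envelope ceiling: if every weight must be read from memory once per generated token, tokens per second cannot exceed memory bandwidth divided by the bytes read per token. The sketch below applies that to the 273 GB/s and 2.7 tok/s figures quoted above. The function name is hypothetical, and the model deliberately ignores KV-cache traffic and compute time.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_billion: float,
                         bytes_per_weight: float) -> float:
    """Upper bound on single-batch decode speed for a bandwidth-bound model.

    Assumes every active weight is read from memory once per generated token
    and ignores KV-cache traffic and compute time.
    """
    gb_read_per_token = active_params_billion * bytes_per_weight
    return bandwidth_gb_s / gb_read_per_token


SPARK_BANDWIDTH_GB_S = 273     # LPDDR5x bandwidth quoted in this review
MEASURED_70B_FP8_TOK_S = 2.7   # LMSYS figure quoted in this review

# Dense 70B model at FP8, i.e. 1 byte per weight.
ceiling = decode_ceiling_tok_s(SPARK_BANDWIDTH_GB_S, 70, 1.0)
efficiency = MEASURED_70B_FP8_TOK_S / ceiling
print(f"ceiling {ceiling:.1f} tok/s; measured result is {efficiency:.0%} of it")
```

With these inputs the ceiling is about 3.9 tok/s, so the measured 2.7 tok/s is already close to what the memory system allows. A mixture-of-experts model reads only its active parameters per token, which is why it decodes much faster on the same hardware.
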
HP Z6 G5 A
#3
Best for: mid-tier buyers who want Threadripper Pro multi-GPU under Z8 pricing
From 5 sources · $5,499 · as of Apr 25

The HP Z6 G5 A is the smallest Threadripper Pro OEM workstation on the market and the rational mid-tier pick under HP's flagship Z8 Fury G5. Reviewers across PCMag, AnandTech, StorageReview, Phoronix, and DEVELOP3D consistently praised its build quality, toolless serviceability, and 96-core CPU ceiling; StorageReview gave it their 'highest recommendation for a high-end tower workstation.' For local-LLM use, configurations with 1–3 RTX 6000 Ada GPUs (48 GB VRAM each at ~960 GB/s) should deliver roughly 25–40 tokens/sec on Llama-3-70B Q4 on a single GPU, and substantially more with multi-GPU tensor parallelism. None of the published professional reviews ran formal Llama-3 70B Q4 benchmarks, so these LLM figures are estimates based on typical single-GPU results, not measurements taken on the HP Z6 itself.

Strengths
  • Smallest Threadripper Pro OEM tower on the market — compact 4U chassis with built-in handle
  • AMD Ryzen Threadripper Pro 7000 WX-Series scales from 12 to 96 cores at the same chassis price floor
Watch-outs
  • 95°C all-core CPU thermals reported under sustained load (StorageReview)
  • Pricing scales steeply — 96-core configs push $18,000+
HP Z8 Fury G5
#4
Best for: 4-GPU training and inference at the enterprise tier, with HP support
From 4 sources · $7,995 · as of Apr 25

The HP Z8 Fury G5 is HP's flagship workstation: a formidable, highly scalable tower designed for demanding media, VFX, and AI professionals who need a four-GPU ceiling. Built around Intel's Xeon W9-3495X (56 cores), 128 GB DDR5 ECC, and up to four NVIDIA RTX A6000 cards, it is a credible local-LLM training and inference rig at the upper end. The configured price varies enormously: a 1-GPU base build lands around $7,995, a 2-GPU build around $14,000, and a fully loaded 4x RTX A6000 configuration pushes well past $25,000. The listed price reflects a typical 1-GPU configured build; readers planning a four-GPU AI build should expect to roughly triple that figure.

Strengths
  • Supports up to a four-GPU configuration for extreme parallel AI inference and tensor-parallel training
  • Features an easily accessible design with a built-in handle for serviceability
Watch-outs
  • Scaling up configurations becomes prohibitively expensive — 4x A6000 builds push $25,000+
  • Enormous tower chassis requires significant floor or desk space
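
One way to read the Z8 Fury G5 price tiers is as system cost per GB of total VRAM. The sketch below uses the approximate prices quoted in this review and 48 GB per RTX A6000 (consistent with the 192 GB four-card figure in the spec table). It assumes the 1-GPU and 2-GPU builds use the same card, which the review does not state, and the 4-GPU price is a floor, so that result is a lower bound.

```python
A6000_VRAM_GB = 48  # per card; four cards give the 192 GB total in the spec table

# (GPU count, approximate configured price in USD) as quoted in the review.
# Assumption: every tier uses RTX A6000 cards. The 4-GPU price is a floor.
configs = [(1, 7_995), (2, 14_000), (4, 25_000)]


def usd_per_vram_gb(gpu_count: int, price_usd: float) -> float:
    """Configured system price divided by total VRAM across all cards."""
    return price_usd / (gpu_count * A6000_VRAM_GB)


cost_table = {gpus: round(usd_per_vram_gb(gpus, price), 2) for gpus, price in configs}
for gpus, cost in cost_table.items():
    print(f"{gpus} GPU(s): {gpus * A6000_VRAM_GB} GB VRAM, about ${cost}/GB")
```

On these numbers the cost per GB of VRAM falls as GPUs are added, because the chassis, CPU, and RAM are paid for once, even though the total price roughly triples.
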
Apple Mac Studio M3 Ultra
#5
Best for: Mac users who want the highest memory bandwidth in a desktop chassis
From 4 sources · $3,999 · as of Apr 25

The Apple Mac Studio with the M3 Ultra chip offers the highest unified-memory bandwidth of any machine in this guide. The base configuration (96 GB unified memory, 60-core GPU) starts at $3,999 and scales up to 512 GB unified memory, enough to hold a 405B-parameter Q4 model on a single desktop. Memory bandwidth of 819 GB/s is roughly three times that of a Mac mini M4 Pro and gives the Mac Studio the fastest single-user 70B Q4 inference of any machine in this guide that doesn't have a discrete pro GPU. Reviewers across PCMag, TechRadar, Tom's Guide, and Macworld praised its compactness, silent operation, and raw performance in creative workflows; the trade-off is the closed Apple ecosystem (MLX/Metal only, no CUDA) and zero hardware upgradability after purchase. For local-LLM developers who can live within the Mac toolchain and need a 256+ GB unified memory ceiling, this is the most cost-effective path under $10,000.

Strengths
  • Up to 512 GB unified memory at 819 GB/s, the highest unified-memory bandwidth in this guide (discrete GPU VRAM on the other picks is faster but far smaller)
  • Compact and stylish desktop chassis (3.7 x 7.7 x 7.7 inches) with silent operation
Watch-outs
  • Internal components like GPU and storage are not upgradable
  • High price for the 256/512 GB unified-memory configs that unlock 405B-class models

Spec comparison

5 products
| Spec | Puget Systems Genesis II | NVIDIA DGX Spark | HP Z6 G5 A | HP Z8 Fury G5 | Apple Mac Studio M3 Ultra |
| --- | --- | --- | --- | --- | --- |
| CPU | AMD Threadripper Pro 5975WX (32-core) | 20-core Arm (10x Cortex-X925 + 10x Cortex-A725) | AMD Ryzen Threadripper Pro 7000 WX-Series (12–96 cores) | Intel Xeon W9-3495X (56-core) | Apple M3 Ultra (up to 32-core) |
| GPU | Up to 4x Nvidia RTX 4090 (96 GB pooled VRAM) | NVIDIA GB10 Grace Blackwell Superchip | Up to 3x dual-height pro GPUs (RTX A6000, RTX 6000 Ada) | Up to 4x Nvidia RTX A6000 (192 GB pooled VRAM) | Integrated up to 80-core Apple GPU |
| RAM | 256 GB DDR4-3200 ECC | 128 GB unified LPDDR5x | Up to 1 TB DDR5-5600 ECC (8 channels) | 128 GB DDR5 ECC (configurable to 2 TB) | 96–512 GB unified memory |
| Storage | 4 TB Sabrent Rocket 4 Plus NVMe | 4 TB NVMe M.2 (user-replaceable, self-encrypting) | HP Z Turbo NVMe (multiple M.2 + bays) | NVMe SSD (configurable, multiple bays) | Up to 16 TB SSD |
| Memory Bandwidth | ~76 GB/s system; ~1008 GB/s per RTX 4090 VRAM | 273 GB/s | ~358 GB/s system; ~960 GB/s per RTX 6000 Ada VRAM | ~307 GB/s system; ~768 GB/s per RTX A6000 VRAM | 819 GB/s |
| Form Factor | Full tower (Fractal Define 7) | Compact desktop (150 mm cube, 1.2 kg) | Compact 4U tower (169 x 465 x 445 mm, built-in handle) | Full tower | Compact desktop |

Frequently asked questions

What is the best AI workstation?
Puget Systems Genesis II is our top pick for AI workstations, with an average rating of 4.7/5 from 3 published reviews. It is a highly customizable, professionally built workstation aimed at enterprise buyers who need bespoke configurations the major OEMs can't match. The steep entry price (about $10,569) makes it a tool for buyers who can justify a $10K+ workstation; if you only need local LLMs, the NVIDIA DGX Spark delivers comparable model-size headroom at less than half the price.
Is there a cheaper alternative worth considering?
Apple Mac Studio M3 Ultra (around $3,999) rates 4.3/5 in our analysis and is the lowest-priced pick in this guide. The base configuration ships with 96 GB of unified memory at 819 GB/s and scales up to 512 GB, enough to hold a 405B-parameter Q4 model on a single desktop. The trade-off is the closed Apple ecosystem (MLX/Metal only, no CUDA) and zero hardware upgradability after purchase.
How does Verdict rank these products?
Every rating on Verdict is the numerical average of scores published by independent review sites, YouTube reviewers, and Reddit buyer reports. No editor adjusts the order — the ranking is whatever the source data produces. See our methodology page for the full process.
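
The averaging described above can be sketched in a few lines. This is a minimal illustration only: the product names and per-source scores are hypothetical placeholders, not Verdict's actual data, and the real pipeline may normalize or weight scores differently.

```python
from statistics import mean

# Hypothetical per-source scores out of 5. NOT the actual review data.
scores_by_product = {
    "Workstation A": [4.5, 5.0, 4.6],
    "Workstation B": [4.0, 4.5, 4.4],
    "Workstation C": [4.8, 4.2],
}


def rank_by_average(scores: dict[str, list[float]]) -> list[tuple[str, float, int]]:
    """Return (name, average score, source count) tuples, highest average first."""
    rows = [(name, mean(vals), len(vals)) for name, vals in scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranking = rank_by_average(scores_by_product)
for position, (name, avg, n) in enumerate(ranking, start=1):
    print(f"#{position} {name}: {avg:.1f}/5 from {n} sources")
```

Because the order is a plain sort on the average, a product with fewer sources can outrank one with more, which is why each entry also shows its source count.
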
When was this guide last updated?
This guide was last re-checked in April 2026. We re-run our research pipeline for each category on a rolling basis so prices and rankings reflect current market reality.
