One sensible card at each price point, from “pack the most cards per watt” to “96 GB flagship.” Slot width is colour-coded: single-slot vs two-slot. Full reasoning in the tables below.
“Blower” does not mean “single-slot.” The single number that decides how many cards fit per motherboard is slot width (cooler thickness). Here are the three shapes you'll meet, drawn isometrically at the same scale.
One rear slot. The density target: RTX A4000, RTX 4000 Ada, RTX 2000 Ada, the GALAX 无双 blowers. Fits in every adjacent slot — pack 4–8 on a board.
Two rear slots. Almost everything fast lives here — A6000, RTX 6000 Ada, RTX PRO 6000, L40S/H100, and 2-slot blower mods. A stock 3090/4090 is actually 3-slot; you need a Turbo/blower variant to hit two.
Short card for small chassis (e.g. NVIDIA L4, RTX 4000 SFF Ada). Watch out: many “SFF” pro cards are still two slots wide — short ≠ thin.
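As a rough illustration of why slot width dominates card count, the sketch below divides the rear slot positions by cooler width. The figure of 7 slot positions is an assumption (a typical full-ATX case), and `cards_that_fit` is a hypothetical helper. It ignores where the PCIe connectors actually sit on a given board, so treat the result as an upper bound.

```python
def cards_that_fit(slot_positions: int, card_width_slots: int) -> int:
    """Upper bound on card count from rear-bracket width alone."""
    if slot_positions < 0 or card_width_slots < 1:
        raise ValueError("need slot_positions >= 0 and card_width_slots >= 1")
    return slot_positions // card_width_slots


# Assumption: a full-ATX case with 7 rear expansion slot positions.
ATX_SLOT_POSITIONS = 7

for label, width in (
    ("single-slot", 1),
    ("two-slot", 2),
    ("stock 3-slot consumer", 3),
):
    count = cards_that_fit(ATX_SLOT_POSITIONS, width)
    print(f"{label}: up to {count} cards")
```

Real boards rarely wire every slot position, so the practical count is usually lower than this bound.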
Only genuinely one-slot-wide cards here — nothing two-slot is mixed in. The field is small: outside AMD's weaker W7500/W7600, every single-slot card that meets or beats the A4000 is an NVIDIA pro card or a GALAX 无双 blower.
① You can buy the exact single-slot RTX 4060 Ti, but only as a China grey-market import at a scalped premium. GALAX's “无双 / Unparalleled MAX” was a Dec-2023 China-domestic launch that never reached Western retail. Today it surfaces on Taobao in-stock (现货) listings at around ¥4480–6800 (~$620–940), not as a normal Amazon/Newegg/eBay SKU, and it has been superseded. (eBay “4060 Ti Metal Master MAX” cards are the multi-fan models, not this one.)
② The single-slot blower you can actually order is its successor: the GALAX RTX 5060 Ti “无双 MAX”. 20 mm blower, 16 GB GDDR7, 180 W, launched at ¥4999 (~$545–690), Oct 2025. JD SKU 100270366512 may now be off-shelf (已下柜) → route via Taobao or an eBay importer (~$900).
③ For a clean, in-warranty single-slot upgrade: the NVIDIA RTX 4000 Ada. True single-slot, 20 GB ECC, 130 W, ~2× the A4000's BF16 plus FP8, sold through normal pro channels (~$1250–1600). Catch: 360 GB/s bandwidth is lower than the A4000's 448.
The BF16/FP8 lens: your A4000 has no FP8 at all (Ampere) and only ~77 TF dense BF16 (its “153” is with sparsity). Every Ada/Blackwell card here adds FP8; the 5060 Ti adds the only FP4. Dense-BF16 order: 4000 Ada ≫ 5060 Ti > 4060 Ti ≈ 2000 Ada > A4000. There is no factory single-slot RTX 4070.
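The dense-versus-sparse distinction is simple arithmetic: NVIDIA's headline tensor figures with structured sparsity are twice the dense rate. A minimal sketch, using the A4000's quoted 153 figure from above (`dense_from_sparse` is a hypothetical helper name):

```python
def dense_from_sparse(sparse_tflops: float) -> float:
    """Convert a with-sparsity tensor figure to its dense equivalent.

    Assumes the headline number uses 2:4 structured sparsity, which
    doubles peak throughput relative to dense math.
    """
    return sparse_tflops / 2.0


a4000_dense = dense_from_sparse(153.0)
print(f"A4000 dense BF16: ~{a4000_dense:.1f} TFLOPS")
```

Apply the same halving to any spec-sheet figure marked "with sparsity" before comparing cards.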
| Card | VRAM | BF16 tensor | FP8 | Bandwidth | TDP | Length | Price | Buyable |
|---|---|---|---|---|---|---|---|---|
Every link below appeared in real eBay/search results; none are invented. eBay item pages sit behind bot-protection, so prices marked “not pinned” should be confirmed on-page. ⚠️ Reject any card labelled Metal Master / 金属大师, EX / 星曜, 1-Click OC 2x, X2W/X3W or thicker than 20 mm — those are the multi-fan dual-slot cards sharing the name.
Bottom line: the 5060 Ti single-slot is the one you can order today; the 4060 Ti single-slot is grey/used China-only with no clean Western listing. Full list in LISTINGS.md.
Stacking note: all single-slot blowers exhaust out the back, but packed shoulder-to-shoulder they still starve each other. The 70–130 W cards (2000 Ada, 4000 Ada) tolerate dense packing far better than the 165–180 W GALAX cards. Budget real front-to-back airflow.
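To put numbers on the stacking note, the sketch below totals board power for a row of identical cards. The TDPs are the ones quoted in this guide. The 165 W figure for the 4060 Ti blower is inferred from the "165–180 W" range, and the 4-card count is just an example.

```python
# TDPs in watts, taken from the figures quoted in this guide.
TDP_WATTS = {
    "RTX 2000 Ada": 70,
    "RTX 4000 Ada": 130,
    "GALAX 4060 Ti single-slot": 165,  # inferred from the 165-180 W range
    "GALAX 5060 Ti single-slot": 180,
}


def stack_power_watts(card: str, count: int) -> int:
    """Total heat the chassis airflow has to remove for `count` cards."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return TDP_WATTS[card] * count


CARD_COUNT = 4  # example stack size, not a recommendation

for card in TDP_WATTS:
    total = stack_power_watts(card, CARD_COUNT)
    print(f"{CARD_COUNT} x {card}: {total} W")
```

Four 180 W blowers dump more than 2.5 times the heat of four 70 W cards into the same slot space, which is why the lower-TDP cards tolerate dense packing better.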
Accept two slots and the cheap-VRAM world opens up. This table spans used consumer 3090/4090s (cheapest $/GB), clean workstation cards, grey-market mods, and passive datacenter accelerators. Filter by class; sort by $/GB for value or by any precision column. Consumer cards are usually 3-slot stock — note the blower-mod caveat.
Cheapest real VRAM: a used RTX 3090 (24 GB, ~$750) or a modded RTX 3080 20 GB blower. Both are Ampere, so there is no FP8 (the same gap as your A4000), but the 3090 works out to ~$31/GB and its 24 GB beats every single-slot 16 GB card. Get a 2-slot Turbo/blower variant to stack.
Best $/compute consumer: RTX 4090 24 GB (~$1.7k) → modded 4090 48 GB (~$4.3k). Ada FP8, ~1 TB/s. The 48 GB blower mod is the standout $/VRAM/compute play if you accept grey-market risk.
Best clean dense-build card: RTX PRO 6000 Blackwell Max-Q. 96 GB ECC GDDR7, FP4/FP8, active dual-slot at 300 W (vs the 600 W Workstation Edition) — far saner for multi-card thermals.
VRAM king (server-only): H200 NVL — 141 GB HBM3e, 4.8 TB/s, but passive and ~$26–37k. Best 48 GB clean pick: RTX 6000 Ada or L40S. Cheapest 48 GB: used A6000 (~$3–4.6k, NVLink, no FP8).
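The $/GB ranking can be reproduced from the approximate prices quoted in the picks above. This is a sketch only: where a price range is given it uses the midpoint, which is an assumption, and the `Card` class is a hypothetical container rather than anything from the listings files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    name: str
    vram_gb: int
    price_low_usd: float
    price_high_usd: float

    @property
    def usd_per_gb(self) -> float:
        # Assumption: use the midpoint when the guide gives a price range.
        midpoint = (self.price_low_usd + self.price_high_usd) / 2
        return midpoint / self.vram_gb


# Approximate prices as quoted in the picks above.
CARDS = [
    Card("RTX 3090 (used)", 24, 750, 750),
    Card("RTX 4090", 24, 1700, 1700),
    Card("RTX 4090 48 GB mod", 48, 4300, 4300),
    Card("RTX A6000 (used)", 48, 3000, 4600),
    Card("H200 NVL", 141, 26000, 37000),
]

# Cheapest VRAM first.
ranked = sorted(CARDS, key=lambda card: card.usd_per_gb)

for card in ranked:
    print(f"{card.name:<20} ${card.usd_per_gb:8.2f}/GB")
```

This ranks on VRAM cost alone. It says nothing about FP8 support, bandwidth, or grey-market risk, which the precision columns and the notes above cover.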
| Card | Class | VRAM | $ / GB | FP32 | BF16 | FP8 | Bandwidth | TDP | Price | Buyable |
|---|---|---|---|---|---|---|---|---|---|---|
nvidia-smi, memory tests, sustained-load thermals, return terms, and photos.
Out of category: L4 is low-profile single-slot, not full-height two-slot. B200 is SXM6/HGX, not a buyable PCIe card. Full list with trust notes in LISTINGS_TWO_SLOT_BLOWERS.md.