🔧
Mac Playbook
⏱ 5 min

System Monitoring (asitop)

Monitor GPU, CPU, ANE, and memory on Apple Silicon

Replaces DGX Spark: DGX Dashboard
tools · monitoring

Basic idea

Apple Silicon combines the CPU, GPU, Neural Engine (ANE), and memory on a single die in a unified memory architecture. Unlike a desktop with an NVIDIA GPU and its own dedicated VRAM, your Mac's GPU and CPU share the same RAM pool. GPU memory pressure therefore competes directly with system RAM: running inference on a large language model can push your Mac into swap even if you have 32GB of RAM.

asitop wraps macOS's built-in powermetrics utility, which reads hardware performance counters through private system frameworks in real time. It surfaces the same data Activity Monitor draws on, but in a terminal-friendly format. On an NVIDIA machine, you'd run nvidia-smi. On Apple Silicon, you run asitop.
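Getting it running is two commands; a minimal sketch, assuming `pip` points at a Python 3.9+ environment:

```shell
# Install asitop from PyPI
pip install asitop

# asitop needs root because it shells out to powermetrics for the counters
sudo asitop
```

Leave it running in a second terminal while you launch your ML workload, and watch the gauges move.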

What you'll accomplish

asitop running in your terminal showing live: GPU utilization %, memory bandwidth (GB/s), ANE power draw, CPU cluster utilization (efficiency + performance cores), and unified memory pressure. That is everything you need to understand ML workload performance and diagnose bottlenecks.

What to know before starting

• Unified memory architecture: The CPU, GPU, and ANE all access the same physical RAM. When you run a 7B LLM, the model weights live in RAM and the GPU reads them on every inference pass. Memory bandwidth (GB/s), not VRAM capacity, is the key bottleneck.
• Memory bandwidth vs. capacity: A 70B model at 4-bit quantization is roughly 40GB of weights, so you need that much RAM just to load it. But the performance bottleneck during inference is bandwidth: how fast the GPU can read the weights from RAM. M3 Max provides 400 GB/s; M3 Pro provides 150 GB/s.
• Memory pressure: macOS compresses RAM, and eventually swaps to the SSD, when physical RAM fills up. The pressure gauge (green/yellow/red) reflects this. Yellow means compression is active; red means swapping to disk, with severe performance degradation.
• Apple Neural Engine (ANE): A dedicated ML accelerator on the chip, optimized for Core ML models. PyTorch (MPS) and MLX do NOT use the ANE; they use the GPU. The ANE is used by on-device Siri, autocorrect, and Core ML apps, so seeing high ANE% during LLM inference is unexpected.
• CPU cluster topology: M-series chips have two CPU clusters: efficiency (E) cores for background tasks and performance (P) cores for compute. ML frameworks should be running on the P cores.
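To see why bandwidth is the number that matters: during autoregressive decoding, every generated token reads all of the weights once, so bandwidth divided by model size gives a hard ceiling on tokens per second. A back-of-envelope sketch, using the 40GB 4-bit 70B example from above:

```shell
# Decode-speed ceiling: tokens/s <= memory bandwidth / model size,
# because each generated token streams every weight from RAM once.
WEIGHTS_GB=40   # ~70B params at 4-bit quantization, as above
for BW in 400 150; do   # M3 Max and M3 Pro bandwidth in GB/s
  awk -v bw="$BW" -v w="$WEIGHTS_GB" \
    'BEGIN { printf "%d GB/s / %d GB = ~%.1f tokens/s ceiling\n", bw, w, bw / w }'
done
```

Real throughput lands below this ceiling (compute and scheduling overhead), but the ratio explains why the 400 GB/s chip out-decodes the 150 GB/s chip on the same model by roughly the bandwidth ratio.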

Prerequisites

• macOS 12.0+, Apple Silicon (M1 or later)
• Python 3.9+
• `sudo` access (asitop runs powermetrics, which requires root)
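You can also cross-check the numbers asitop reports against macOS built-ins; a quick sketch, assuming Apple Silicon and macOS 12+ (where the `hw.perflevel*` sysctl keys exist):

```shell
# CPU cluster split: perflevel0 = performance (P) cores, perflevel1 = efficiency (E) cores
sysctl hw.perflevel0.logicalcpu hw.perflevel1.logicalcpu

# Memory pressure and swap: nonzero "used" in swapusage means you've been in the red
memory_pressure | tail -n 1
sysctl vm.swapusage
```

Neither command needs sudo, so they're handy for a quick look when you don't want the full TUI.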

Time & risk

Duration: 5 minutes
Risk level: None; read-only hardware monitoring, no system changes