Appliance · on-premises

Intelligence
behind your firewall.

A pre-configured cluster of Mac Studios and Mac minis shipped to your facility. We ship it, we enroll it, we keep it current. You get unlimited inference for every internal tool — with zero data ever leaving your network.

Chompute appliance rack with active inference nodes
2.56 TB · pooled unified memory · 16-node cluster
2.6 kW · peak draw · 8 Studio + 8 mini nodes
< 4 wk · capex breakeven · vs metered cloud APIs
Zero egress to public cloud · all inference local
Who it's for

Teams who can't send their data anywhere — but still want the best models.

Enterprise IT and engineering leadership, running regulated workloads or guarding IP that never leaves the network.

Enterprise IT / CIO

Procurement-grade deployment.

  • Flat capex — no per-token metering, ever
  • HIPAA, SOC 2 Type II, and FedRAMP-ready posture
  • Apple Business Manager zero-touch enrollment
  • GitOps-managed runtimes, models, patches
VP Engineering

Architecture that actually holds up.

  • Pool 64 GB to 256 GB per node over Thunderbolt 5 RDMA
  • Run 7B → 1T-parameter models on one cluster
  • OpenAI-compatible gateway (drop-in for SDKs)
  • Observability: KV-cache hit rate, memory, queue depth
Hardware

Apple silicon. Memory where GPUs have none.

Unified Memory Architecture puts up to 256 GB of high-bandwidth memory next to the Neural Engine, so one appliance runs workloads that would normally take four enterprise GPUs.

Mac Studio class node

Runs deep reasoning, planning, and trillion-parameter MoE workloads. Cluster 4+ to serve frontier-size models over RDMA.

256 GB memory · 76 GPU cores · 215 W peak · Role: Heavy
Mac mini class node

Handles routine extraction, structured output, and tool-use calls at high throughput. Stacks densely — no datacenter HVAC required.

64 GB memory · 70–100 tok/s · 110 W peak · Role: Fast
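As a rough sizing sketch (an illustration, not a published Chompute spec): quantized model weights occupy roughly params × bits ÷ 8 bytes, so a trillion-parameter model at 4-bit precision needs about 500 GB of weights, which fits inside four pooled 256 GB Studio-class nodes, with KV cache and activations taking additional headroom.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: int = 4) -> float:
    """Approximate weight memory in GB for a quantized model: params * bits / 8."""
    return params_billions * bits_per_weight / 8

# A 1T-parameter model at 4-bit precision:
print(weight_footprint_gb(1000))   # 500.0 GB of weights
# Four pooled 256 GB Studio-class nodes:
print(4 * 256)                     # 1024 GB pooled
```

The same rule of thumb explains why a single 64 GB mini-class node comfortably serves 7B-class models at 8-bit while the pooled cluster is reserved for frontier-size work.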
How we ship

From purchase order to usable inference.

Our fleet management heritage means the appliance shows up ready. Your IT team never touches a terminal.

01

We spec your fleet

Tell us your workload mix (planning vs. extraction vs. vision). We size the cluster — usually 8–32 nodes.

02

Devices ship to your site

Pre-racked and labeled. Your team plugs in power and network. No image to flash, no firmware to chase.

03

Zero-touch enrollment

On first boot, Apple Business Manager authenticates hardware identifiers and joins the Chompute control plane.

04

Runtimes and models pull

Containerized MLX, vLLM, and LM Studio runtimes download locally. Models stream in over your WAN.

05

Gateway comes online

Your developers point their tools at your internal Chompute endpoint. OpenAI-compatible from minute one.
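Because the gateway speaks the OpenAI wire protocol, pointing existing tools at it is a URL change. A minimal sketch using only the Python standard library; the gateway URL, token, and model name are illustrative placeholders, not real Chompute values:

```python
import json
from urllib import request

# Illustrative internal endpoint -- replace with your gateway's URL.
GATEWAY = "https://chompute.internal.example/v1/chat/completions"

# The gateway accepts the standard OpenAI chat-completions payload:
payload = {
    "model": "local-model",  # placeholder model name
    "messages": [{"role": "user", "content": "Summarize this incident report."}],
}
req = request.Request(
    GATEWAY,
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer internal-token",  # placeholder credential
    },
)
# request.urlopen(req) would return the familiar OpenAI-style JSON response,
# served entirely from inside your network.
```

Official OpenAI SDKs work the same way: set their base URL option to the internal endpoint and leave application code untouched.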

06

We keep it fresh

GitOps-driven continuous sync pushes model updates, security patches, and routing policies. Failover is automatic.

Reference spec

The Chompute Rack

Compute: 8× Mac Studio class nodes + 8× Mac mini class nodes
Pooled memory: 2.56 TB unified memory
Interconnect: Thunderbolt fabric
Max model size: 7B to 1T open-weight models
Peak draw: 2.6 kW
Footprint: Rack or shelf deployment
Noise: Network-closet friendly profile
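The pooled-memory figure follows directly from the node mix above: eight 256 GB Studio-class nodes plus eight 64 GB mini-class nodes.

```python
# 8 Studio-class nodes at 256 GB + 8 mini-class nodes at 64 GB
studio_gb = 8 * 256        # 2048 GB
mini_gb = 8 * 64           # 512 GB
pooled_gb = studio_gb + mini_gb
print(pooled_gb)           # 2560 GB = 2.56 TB
```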
Illustrative TCO
$76K+/yr: metered API baseline + usage overages

Fixed capex: owned capacity · no token overages

Cluster-16 · Apple silicon fleet · online
Thunderbolt fabric · 2.56 TB pooled memory · 120 tok/s
Nodes: 8× Mac mini "fast" (M4-01 to M4-08) · 8× Mac Studio "heavy" (STU-01 to STU-08)
Privacy

Hardware-level sovereignty.

Put inference capacity where the most sensitive work already lives. Chompute keeps the operational surface compatible with the tooling your teams already use while giving enterprises a real local-first path.

On-prem only

Keep sensitive prompts, source code, and records inside the customer-controlled environment.

HIPAA and SOC 2 ready

Built for enterprise diligence instead of “demo first, policy later” rollouts.

Remote attestation

Know which devices are enrolled, ready, and serving inside the fleet.

PII redaction

Add gateway-level controls before prompts move through agent workflows.

Verticals

Built for environments where control matters.

Engineering and IT

Code assistants, incident automation, CI review, internal knowledge agents.

Healthcare

Local document intelligence and care operations where PHI boundaries matter.

Industrial

Inference near facilities, telemetry, and operations teams that cannot depend on fragile cloud paths.

Marketing

Always-on creative and merchandising agents without runaway usage anxiety.

Tell us your workload

Bring us the agent loop that keeps growing.

We will help map it to appliance capacity, endpoint capacity, or a practical path that starts hosted and moves on-prem when the business case is clear.