Models tested: Qwen2.5-7B · LoRA Fine-Tuning · Hermes Reasoning Traces · Token Efficiency · Energy / Token · next: H100 · next: A100

Private beta · multi-tenant GPU billing audit

Know what every tenant's GPU usage really costs you.
Down to the job.

NemulAI is a billing audit for multi-tenant GPU providers. A lightweight, read-only agent attributes real cost — compute + energy — to every job and rolls it up by tenant, so you can see which accounts you're billing below cost and exactly what to charge instead.

billing.csv + usage.log → [ NemulAI ] → who's underpriced

Per-tenant billing audit

Measured cost vs. invoiced · last 30 days · ROCm/NVML telemetry

2 below cost

Tenant	GPU-hrs	Cost	Billed	Margin
acme-research	4,200	$12,600	$11,300	-12%
lumina-systems	8,300	$24,900	$30,600	+19%
nova-ai	6,100	$18,300	$19,200	+5%
deepforge	1,480	$4,440	$3,900	-14%
pixel-labs	2,950	$8,850	$12,400	+29%

2 of 5 accounts billed below measured cost.
That gap is margin you're giving away every month.

$1,840/mo at risk

≈ $22,080/yr in leaked margin

Cost = real measured energy + compute at your rate. Repricing to break-even closes the gap.
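
The arithmetic behind the sample table can be sketched in a few lines. This is an illustration, not NemulAI's implementation: the function names are hypothetical, and margin is assumed to be computed as a share of the billed amount, which is the definition that reproduces the percentages in the table and the ≈ $22,080/yr figure.

```python
# Sample figures copied from the table: (tenant, measured cost, billed),
# in dollars over the last 30 days.
TENANTS = [
    ("acme-research", 12_600, 11_300),
    ("lumina-systems", 24_900, 30_600),
    ("nova-ai", 18_300, 19_200),
    ("deepforge", 4_440, 3_900),
    ("pixel-labs", 8_850, 12_400),
]


def margin_pct(cost, billed):
    """Margin as a percentage of the billed amount (assumed definition)."""
    return 100 * (billed - cost) / billed


def audit(tenants):
    """Return below-cost tenants (worst margin first) and the monthly shortfall."""
    below = [
        (name, margin_pct(cost, billed), cost - billed)
        for name, cost, billed in tenants
        if billed < cost
    ]
    below.sort(key=lambda row: row[1])  # most negative margin first
    monthly_gap = sum(gap for _, _, gap in below)
    return below, monthly_gap


below, monthly_gap = audit(TENANTS)
for name, pct, gap in below:
    print(f"{name}: {pct:+.0f}% margin, ${gap:,} under cost")
print(f"${monthly_gap:,}/mo at risk, ${monthly_gap * 12:,}/yr")
```

Repricing the two flagged accounts to break-even would bring the monthly shortfall to zero; anything above break-even is margin.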

A sample multi-tenant fleet: every GPU-second priced at measured cost and rolled up by tenant. Two accounts are billed below cost — the audit surfaces them first. See the full sample report →

Read-only telemetry

Open source agent

2-minute install

No workload changes

Built for neoclouds, GPU resellers & internal platform teams that bill tenants for shared GPU — anyone who needs to defend margin and prove their true cost of goods.

The Problem

You can't prove your GPU COGS

Shared nodes, fractional GPUs (MIG), and idle time make per-tenant cost a guess. You bill from a rate card and hope it clears cost — with no way to prove what any account actually consumed.

The Old Way

Invoices and spreadsheets

Aggregate cloud bills and scheduler logs that never map GPU-seconds to dollars per tenant. You discover the underpriced accounts in the quarterly margin review — if at all.

NemulAI

A per-tenant billing audit

A read-only agent ties every GPU-second and its real energy cost to a job → tenant, flags the accounts billed below cost, and tells you the break-even rate to charge.

What NemulAI surfaces

Hover any card to see how deep it reads your data. Green means a healthy margin; red means money is leaking or a customer is underpriced.

GPU Billing Audit

-12%

Find who is underpriced.

3 files in → customer margin vs. GPU-seconds consumed, with the accounts billed below cost flagged first.

acct margin −12%

Cost Attribution

+4%

Tie every dollar to a job.

Per run → which model, on which GPU, for how long, at what energy. Nothing lands in the bill unattributed.

100% attributed

Repricing Suggestions

+18%

Charge the right amount.

Per customer → measured break-even $/GPU-hr and a suggested list price to hit your target margin.

to +18% margin

Waste Detection

-23%

Catch the idle burn.

Per machine → idle %, dollars leaking, and the run that should have been scheduled there instead.

$ leaking −23%

Budget alerts via email and webhook. Carbon & EU AI Act reporting available for teams that need it.
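
As a rough sketch of what a repricing suggestion involves: break-even is measured cost divided by GPU-hours, and a list price for a target margin follows from solving (price − break-even) / price = target. The function names are hypothetical and the 18% target is only the figure shown on the card; the input is the acme-research row from the sample table.

```python
def break_even_rate(measured_cost, gpu_hours):
    """Measured cost per GPU-hour: the rate at which margin is exactly zero."""
    return measured_cost / gpu_hours


def suggested_list_price(break_even, target_margin):
    """Price per GPU-hour such that (price - break_even) / price == target_margin."""
    if not 0 <= target_margin < 1:
        raise ValueError("target_margin must be in [0, 1)")
    return break_even / (1 - target_margin)


# acme-research sample row: $12,600 measured cost over 4,200 GPU-hours.
rate = break_even_rate(12_600, 4_200)
price = suggested_list_price(rate, 0.18)
print(f"break-even ${rate:.2f}/GPU-hr, list ${price:.2f}/GPU-hr for an 18% margin")
```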

20–35%

of GPU-hours typically sit idle or underused on the fleets we've measured — capacity you paid for but never billed a tenant. That's the margin NemulAI recovers first.

How much margin are your idle GPUs eating?

Set your blended cost and fleet size. NemulAI replaces this estimate with real, measured cost per tenant — and shows exactly which accounts and machines the leak comes from.

Blended GPU rate ($/GPU-hr) · GPUs in your fleet · Idle / underused hours: 30%

Money leaking on idle GPUs

$8,760

per month

of $29,200/mo total GPU spend. NemulAI shows you exactly which tenants and machines this is.

Estimate only — NemulAI prices your real, measured energy draw at the rate you set above.
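
The estimate is simple enough to sketch. The page does not show the calculator's rate and fleet-size inputs, so the $2.00/GPU-hr and 20 GPUs below are assumptions chosen because, with 730 hours per month (also an assumption), they reproduce the $29,200 and $8,760 figures; any rate and fleet size whose product is 40 would do the same.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def idle_leak(rate_per_gpu_hr, gpu_count, idle_fraction):
    """Return (total monthly GPU spend, portion spent on idle or underused hours)."""
    total = rate_per_gpu_hr * gpu_count * HOURS_PER_MONTH
    return total, total * idle_fraction


# Hypothetical inputs: $2.00/GPU-hr blended rate, 20 GPUs, 30% idle.
total, leak = idle_leak(2.00, 20, 0.30)
print(f"${leak:,.0f} per month of ${total:,.0f}/mo total GPU spend")
```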

What a 2-week pilot looks like

Lightweight, read-only, and reversible. You only need telemetry — not permission to touch production workloads.

Day 0

Install

pip install the read-only agent on your boxes and run it under systemd, in Docker, or as a K8s DaemonSet. No workload changes.

Day 0

Tag

Set two env vars so cost rolls up by tenant and model. Scheduler metadata (SLURM, K8s, MIG slices) is picked up automatically.

Days 1–14

Watch

The agent attributes every job's real cost to a tenant and flags accounts running below cost as it happens.

Day 14

Audit

You get per-tenant cost vs. what you billed, the accounts you're losing money on, idle waste in dollars, and a repricing shortlist.

Up and running in 2 minutes

Install the read-only agent

Set your API key

Tag your jobs (optional)

See per-job cost + waste

When you're ready

Audit first. Reprice when you're ready.

Start with a read-only audit. Once you trust the numbers, NemulAI hands you the break-even rate and a suggested list price per tenant — and flags drift as new accounts slip below cost. You decide what to bill; you stay in control the whole way.

Why the cost number is real

Most tools estimate cost from cloud invoices. NemulAI measures real energy draw at the hardware — so every tenant's bill traces back to physics, not a rate card. Two real MI300X runs of the same fine-tune, measured to the joule:

Qwen2.5-7B-Instruct · LoRA · 2 configs

AMD MI300X · 4-bit NF4 · same 245,760 tokens · ROCm telemetry

2 runs · MI300X

Baseline

Most efficient

bs=2 · ga=4

0.36 J

Energy / token

Total energy: 0.0243 kWh
Avg power: 630.5 W
Duration: 138.5 s
Energy cost: $0.0024
Train loss: 1.05

Small batch

bs=1 · ga=8

0.46 J

Energy / token

Total energy: 0.0316 kWh
Avg power: 637.2 W
Duration: 178.9 s
Energy cost: $0.0032
Train loss: 2.11

Same model, same 245,760 tokens, but bs=2 used 23% less energy (0.36 vs 0.46 J/token) and finished 22% faster. Which config ran is the difference between the two bills.

Direct summary fields · energy @ $0.10/kWh · measured 2026-05-07.
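
The per-token figures follow directly from the totals on the two cards. A short check, using only the published numbers (1 kWh = 3,600,000 J); the variable and function names are illustrative:

```python
JOULES_PER_KWH = 3_600_000
TOKENS = 245_760
PRICE_PER_KWH = 0.10  # energy price stated for this comparison

# Total energy per run in kWh, as reported on the two cards.
RUNS = {"bs=2 ga=4": 0.0243, "bs=1 ga=8": 0.0316}


def joules_per_token(kwh, tokens):
    """Convert a run's total energy to joules and divide by tokens processed."""
    return kwh * JOULES_PER_KWH / tokens


for name, kwh in RUNS.items():
    jpt = joules_per_token(kwh, TOKENS)
    print(f"{name}: {jpt:.2f} J/token, energy cost ${kwh * PRICE_PER_KWH:.4f}")

saving = 1 - RUNS["bs=2 ga=4"] / RUNS["bs=1 ga=8"]
print(f"bs=2 used {saving:.0%} less energy")
```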

Architecture

Agent (Open Source)

NVML probe, WAL buffer, batched upload. Runs as systemd, Docker, or K8s DaemonSet. Read-only by default.

API

Ingest, per-job + per-tenant cost attribution, below-cost detection, repricing, chargeback.

Dashboard

Per-tenant cost & margin, chargeback exports, below-cost alerts, weekly owner reports.
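
The page does not describe how the agent turns power readings into an energy figure. One common approach, sketched here as an assumption rather than as NemulAI's method, is to integrate timestamped power samples with the trapezoidal rule and price the result per kWh. The sample values are made up for illustration.

```python
def energy_joules(samples):
    """Trapezoidal integration of (timestamp_s, power_w) samples into joules."""
    samples = sorted(samples)
    return sum(
        (t1 - t0) * (p0 + p1) / 2
        for (t0, p0), (t1, p1) in zip(samples, samples[1:])
    )


def energy_cost(joules, price_per_kwh):
    """Price an energy total, converting joules to kWh first."""
    return joules / 3_600_000 * price_per_kwh


# Hypothetical one-second power samples for a single job, in watts.
samples = [(0.0, 600.0), (1.0, 640.0), (2.0, 650.0), (3.0, 630.0)]
joules = energy_joules(samples)
print(f"{joules:.0f} J, ${energy_cost(joules, 0.10):.6f} at $0.10/kWh")
```

Summing such per-job totals by tenant gives the energy side of the per-tenant cost shown in the dashboard.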

How it stays safe

You're handing us telemetry, not control. Everything below is true on day one.

Is the agent really read-only?

Yes. By default it only reads NVML counters and uploads metrics. It never changes your workloads unless you explicitly opt into power-cap autopilot — which has an observation window and automatic rollback.

Does it touch my code or training data?

No. It collects GPU telemetry (utilization, power, memory) and job metadata you tag. Your model weights, datasets, and source never leave your machines.

Can I self-host?

Yes. The agent is open source — audit it, and point it at your own endpoint. The collector will never be paywalled.

How long until I see value?

Install is ~2 minutes. You'll see live per-job cost immediately and a full per-tenant billing audit within a 1–2 week pilot.

GPUs tested: AMD MI300X · ROCm Telemetry · Live Wattage · Joules / Token · Cost Attribution · next: H100 · next: A100 · next: RTX 3090

Upload your billing + usage files. See who's underpriced.

Drop in a billing export and a usage log. NemulAI attributes every GPU-second, flags the accounts you're losing money on, and suggests where to reprice.