L Singh

Sydney

Arts and Entertainment, Media

As seen in: Medium, Wiley Online Library, Hindustan Times, Flipboard, Royal Society of Chemistry, The BRAG, Tone Deaf, Economic and Political Weekly, ACI Materials Journal, tmrw and



Articles

Allocated but Idle: The Network and Storage Stack Behind Every AI Workload on Kubernetes

Jun 04, 2026 | By L Singh | DevOps Blog

Your pod is Running. Your GPU is allocated. DCGM still says it is idle most of the time. If you followed the first post and got the NVIDIA stack installed, your training pod will now schedule. The driver works. The container toolkit works. The device plugin advertises nvidia.com/gpu. The scheduler finds a node. The pod requests its GPU and gets it. Everything in the NVIDIA Kubernetes stack lights up green.


Green Dashboards: Production Monitoring and Logging for GPU Workloads on Kubernetes

May 29, 2026 | By L Singh | DevOps Blog

Something just broke. Your dashboards are green. The customers know first. If you followed the first three posts, your cluster can schedule GPUs (part one), feed them properly with network and storage (part two), and grow and shrink to match demand (part three). The infrastructure is in place. Workloads run. The cluster scales itself. And one day inference p99 latency goes from 200ms to 8 seconds.


From Zero to 256 GPUs: Deploying and Autoscaling AI Workloads on Kubernetes

May 29, 2026 | By L Singh | DevOps Blog

Your code is in Git. Your cluster has zero GPU nodes. Your training job is waiting. If you followed the first two posts (post1 and post2), you have a Kubernetes cluster that can schedule GPUs and feed them properly. The whole stack is sitting there, ready. But nobody has actually run anything on it yet, and you have not decided how they will. That is part three.


See all 570 licensed articles (573 Total)
