
The Observability Stack Every LLM-Powered Go Service Needs

By Mundhraumang · Published March 6, 2026 · 9 min read · Source: Level Up Coding

Everyone is building AI-powered services. Almost nobody is thinking about what happens when they break.

It's three weeks after shipping. Everything looks fine on the surface — the service is running, requests are being handled, users are getting responses.

Then it isn’t fine anymore.

Latency has quietly crept from 400ms to 6 seconds. Token costs doubled. Some users are getting responses that are subtly wrong — no error, no stack trace, just wrong. And you’re staring at logs that tell you absolutely nothing useful.

Welcome to the LLM observability gap. It’s the part of AI engineering that nobody talks about at conferences, but every team hits eventually.

LLM Services Break Differently. Your Observability Needs to Match

Here’s the thing about traditional microservices: they fail loudly. Database down, 500 thrown, alert fires, engineer wakes up, problem fixed. The feedback loop is tight.

LLM services don't work that way. They fail in ways that look like success:

- Latency creeps from hundreds of milliseconds to several seconds, with no errors logged.
- Token costs double while every request still returns 200.
- Responses become subtly wrong: no exception, no stack trace, just bad output.

A service with zero errors in the logs can be completely broken from a user's perspective. That's not a hypothetical — it's what happens when you treat an LLM service like a CRUD API.

The fix isn't complicated. But it does require five specific things.

1. Distributed Tracing — Stop Guessing Where Time Goes

A 6-second response time is a useless data point without context. Is it your preprocessing? The LLM provider? Your database fetch for context? The network round-trip?

Distributed tracing breaks that number apart. And for LLM services specifically, the split between “time in your code” vs “time waiting for the provider” is the most important thing to know. One you can fix. The other you can’t — but you can at least stop blaming your own code for it.

GoFr instruments every handler and database call with OpenTelemetry automatically. You add spans around the parts that matter most — the LLM call:

func generateResponse(ctx *gofr.Context, prompt string) (string, error) {
	span := ctx.Trace("llm-provider-call")
	defer span.End()

	span.SetAttributes(
		attribute.Int("prompt.token_estimate", estimateTokens(prompt)),
		attribute.String("llm.model", "gpt-4o"),
		attribute.String("llm.endpoint", "chat-completions"),
	)

	response, err := callLLMProvider(ctx, prompt)
	if err != nil {
		span.RecordError(err)
		return "", err
	}

	span.SetAttributes(
		attribute.Int("response.tokens", response.TokensUsed),
		attribute.Float64("response.latency_ms", response.LatencyMS),
	)

	return response.Text, nil
}

Now your trace dashboard shows the full picture: GoFr's automatic spans for your HTTP handler and DB calls, plus your custom span for the LLM provider. You know immediately whether the problem is yours or theirs. That distinction alone is worth the few extra lines of code.

2. Structured Logging — Logs You Can Actually Use at 2 AM

Unstructured logs are a development convenience. In production, they’re a liability.

For LLM services, structured logging does something beyond just debugging: it gives you a queryable history of what your service said and why. That matters when a user reports “the AI gave me wrong information” and you need to reproduce the exact prompt and response from 3 days ago.

GoFr’s logger writes structured JSON by default — request ID, timestamp, service context are already there. You add the fields that matter for your LLM service:

func handleChatRequest(ctx *gofr.Context) (interface{}, error) {
	var req ChatRequest
	if err := ctx.Bind(&req); err != nil {
		return nil, err
	}

	ctx.Logger.Info("llm request received",
		"user_id", req.UserID,
		"model", req.Model,
		"prompt_len", len(req.Prompt),
	)

	response, err := generateResponse(ctx, req.Prompt)
	if err != nil {
		ctx.Logger.Error("llm request failed",
			"error", err.Error(),
			"model", req.Model,
		)
		return nil, err
	}

	ctx.Logger.Info("llm request completed",
		"tokens_used", response.TokensUsed,
		"latency_ms", response.LatencyMS,
		"model", req.Model,
	)

	return response, nil
}

Now “show me all requests where tokens_used > 2000” is a single filter. “Show me every failed request for user X this week” is instant. The audit trail for wrong outputs exists. Without structure, that investigation is 45 minutes of grep and guesswork.

3. Metrics — The Numbers That Actually Tell You If You’re Healthy

Traces answer “what happened in this request.” Metrics answer “what’s been happening across all requests for the past week.” Both matter.

For a regular API, CPU, memory, and request rate are usually enough. For an LLM service, those metrics tell you almost nothing. Here’s what actually matters:

func main() {
	a := gofr.New()

	a.Metrics().NewCounter("llm_tokens_total", "Total tokens consumed")

	a.Metrics().NewHistogram("llm_duration_seconds", "LLM provider call latency",
		0.1, 0.5, 1, 2, 5, 10, 30)

	a.Metrics().NewCounter("llm_errors_total", "LLM errors by type")

	// ... register handlers and run

	a.Run()
}

GoFr exposes /metrics in Prometheus format automatically. These counters plug straight in. Wire them to Grafana and you have a live view of cost, latency distribution, and error breakdown — the three numbers that actually tell you whether your LLM service is healthy.

4. Health Checks — Degraded Is Not the Same as Dead

A standard health check answers: is the process running? For an LLM service, that’s the least interesting question.

The questions that actually matter are: Is the provider responding within acceptable latency? Are we approaching rate limits? Are downstream dependencies healthy?

GoFr auto-exposes /.well-known/health and checks all registered dependencies automatically. For your LLM provider, you register it as an external HTTP service — GoFr then health-checks it on a schedule and includes it in the health response:

func main() {
	app := gofr.New()

	// Register the LLM provider as an external HTTP service.
	// GoFr automatically calls the HealthEndpoint periodically
	// and includes its status in /.well-known/health.
	app.AddHTTPService("llm-provider", "https://api.openai.com",
		&service.HealthConfig{
			HealthEndpoint: "v1/models", // endpoint GoFr pings to check provider health
		},
		&service.CircuitBreakerConfig{
			Threshold: 4, // open circuit after 4 consecutive failures
			Interval:  1 * time.Second,
		},
	)

	app.POST("/v1/chat", handleChatRequest)
	app.Run()
}

func handleChatRequest(ctx *gofr.Context) (interface{}, error) {
	// Retrieve the registered LLM service client from context.
	llmSvc := ctx.GetHTTPService("llm-provider")

	// GoFr automatically logs, traces, and applies circuit breaking to this call.
	resp, err := llmSvc.Post(ctx, "v1/chat/completions", nil, requestBody)
	if err != nil {
		// If the circuit is open, this returns immediately — no 30-second hang.
		return nil, err
	}
	defer resp.Body.Close()

	// ... parse and return response
}

This is the right way to think about it — the LLM provider is a downstream HTTP service, not an internal component. GoFr treats it exactly that way. The health check, circuit breaker, tracing, and retry logic all come from AddHTTPService. You get the /.well-known/health distinction between provider-down and service-down for free.

The CircuitBreakerConfig is also the right home for your circuit breaker. After 4 failures within the interval, GoFr opens the circuit — subsequent calls fail fast instead of waiting for a timeout. GoFr even publishes an app_http_circuit_breaker_state metric automatically (0 = closed, 1 = open), which you can alert on directly in Grafana.

5. Circuit Breakers — When to Stop Trying

This is the one resilience pattern most LLM services skip. And it's the one that saves you when a provider has an outage.

Without a circuit breaker, here’s what happens when your LLM provider goes down: every request waits for the full timeout (often 30 seconds), then fails. Thread pools fill up. Memory spikes. Other endpoints that have nothing to do with the LLM start failing too. One provider outage takes down your entire service.

In GoFr, the circuit breaker lives exactly where it should — on the external HTTP service registration. There’s no separate implementation needed:

app.AddHTTPService("llm-provider", "https://api.openai.com",
	&service.CircuitBreakerConfig{
		Threshold: 4,               // open after 4 consecutive failures
		Interval:  1 * time.Second, // within this window
	},
)

After 4 failures within the interval, GoFr opens the circuit. Requests to the LLM provider fail immediately — no 30-second hang, no thread pool exhaustion. GoFr automatically publishes app_http_circuit_breaker_state (0 = closed, 1 = open) to your metrics endpoint. Set a Grafana alert on that metric and you'll know the moment a provider starts struggling — before your users tell you.

The circuit periodically retries the provider to detect recovery. When it closes, traffic resumes automatically. You don’t write any of this logic — it’s configuration.

The Pitfalls Table

Most LLM service problems fall into a predictable set of categories. Here’s what to watch for and what each one tells you:

Table of common LLM service production failures: symptoms, root causes, and what to check — including latency creep, silent wrong outputs, unexpected cost spikes, and circuit breaker gaps.

If you’re debugging an LLM service problem right now, start with this table. The column that matters most is “What to Check” — because you can only check it if you built the observability stack first.

What GoFr Gives You vs. What You Build

To be concrete about the division of work:

Table showing the division of observability responsibilities between GoFr (automatic) and the developer (custom additions) for an LLM service — covering tracing, logging, health checks, metrics, and circuit breakers.

The “You add” column is maybe 100 lines of code across your entire service. The “GoFr (automatic)” column is what would take you days to wire up from scratch — and would probably get skipped the first time because there’s always a feature to ship instead.

Closing Thoughts

The engineers who’ve been burned by LLM services in production all say the same thing: they wish they’d built the observability before they built the feature.

It’s not that the problems are unforeseeable — token cost creep, tail latency, provider outages, silent wrong outputs. These are all known failure modes. The issue is that without visibility, you’re discovering them through user complaints instead of dashboards.

GoFr doesn’t prevent LLM failures. Nothing does — the nondeterminism is the point. But it puts the traces, logs, metrics, and health checks in place from the moment you call gofr.New(). Your job is to add the 100 lines of domain context that make them meaningful for an AI service.

Build the stack first. Ship the feature second. The 2 AM alert you avoid will be worth it.

GoFr is an open-source Go microservices framework built for production. Explore it at gofr.dev and give it a ⭐ on GitHub.



