
AI Observability: Take Back Control and Manage Your Infrastructure Costs

Have you just deployed your own AI agent? Can you explain how it works... and how much it costs?

Alexis Cangelosi

May 12, 2026

Like many companies, you’ve decided to integrate AI into your processes: you pay cloud bills, various subscriptions, and perhaps the vendor of your own AI solution. But if someone asked you tomorrow what you are actually consuming (computing power, energy, data traffic, and so on), would you know how to answer?

For most organizations, the honest answer is no.

Rapid adoption… but poorly managed

With the rapid rise of artificial intelligence, organizations have rushed to adopt it: for its great potential, under competitive pressure, and sometimes simply so as not to “miss the boat”, often at the expense of understanding how it works.

The next challenge for companies will therefore be twofold: seeing what is actually running in their AI infrastructure, and choosing how it runs.

Together, these can radically change the relationship an organization has with its AI strategy.

You can only manage what you measure

Faced with the spectacular figures announced by hyperscalers for their AI “gigafactories”, the real question is not a data center’s theoretical computing capacity but “how many FLOPS are actually used to produce a useful result?”
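One rough way to put a number on that question for inference is sketched below. It assumes the common rule of thumb that a dense transformer spends about 2 × N floating-point operations per generated token (N being the parameter count) and compares the resulting throughput with the hardware’s peak. The function name and the model size, token rate, and peak figure are hypothetical placeholders, not measurements.

```python
def flops_utilization(n_params: float, tokens_per_s: float, peak_flops: float) -> float:
    """Fraction of peak compute doing useful model work during inference.

    Assumes ~2 FLOPs per parameter per generated token for a dense
    transformer forward pass (attention overhead is ignored).
    """
    useful_flops_per_s = 2 * n_params * tokens_per_s
    return useful_flops_per_s / peak_flops

# Hypothetical figures: a 7B-parameter model serving 500 tokens/s
# on an accelerator with a nominal peak of 300 TFLOPS.
util = flops_utilization(n_params=7e9, tokens_per_s=500, peak_flops=300e12)
print(f"{util:.1%}")  # → 2.3%
```

A low figure is not automatically waste, since token-by-token generation is often limited by memory bandwidth rather than raw compute, but tracking it over time shows whether paid-for capacity is turning into results.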

And here the picture is often less flattering, with significant waste: models running continuously to serve intermittent needs, GPU resources that remain allocated hours after a job has finished, pipelines consuming memory for features no one uses… That is why it is crucial to audit your infrastructure accurately, before it becomes oversized and costs skyrocket.
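As an illustration of that kind of audit, the sketch below flags GPUs that stay allocated while doing almost nothing. It assumes you already collect one sample per minute per GPU (for example by polling `nvidia-smi` or NVIDIA DCGM); the `GpuSample` record, the 5% utilization threshold, and the 60-minute cut-off are hypothetical choices, not a standard.

```python
from dataclasses import dataclass

@dataclass
class GpuSample:
    gpu_id: str
    minute: int          # sample time; assumes exactly one sample per minute
    allocated: bool      # True while a job or reservation holds the GPU
    utilization: float   # GPU utilization in percent (0-100)

def idle_allocations(samples, util_threshold=5.0, min_idle_minutes=60):
    """Return {gpu_id: longest allocated-but-idle streak in minutes}
    for GPUs whose streak reaches min_idle_minutes."""
    by_gpu = {}
    for s in sorted(samples, key=lambda s: (s.gpu_id, s.minute)):
        by_gpu.setdefault(s.gpu_id, []).append(s)

    flagged = {}
    for gpu_id, series in by_gpu.items():
        longest = current = 0
        for s in series:
            if s.allocated and s.utilization < util_threshold:
                current += 1  # one more minute reserved but idle
                longest = max(longest, current)
            else:
                current = 0   # GPU busy or released: reset the streak
        if longest >= min_idle_minutes:
            flagged[gpu_id] = longest
    return flagged

# gpu0: busy for 30 min, then stays allocated but idle for 90 min.
# gpu1: busy for the whole 120 min.
samples = (
    [GpuSample("gpu0", m, True, 85.0) for m in range(30)]
    + [GpuSample("gpu0", m, True, 0.0) for m in range(30, 120)]
    + [GpuSample("gpu1", m, True, 70.0) for m in range(120)]
)
print(idle_allocations(samples))  # → {'gpu0': 90}
```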

Observability: Real-World Behavior Takes Over from Theory

For a long time, deploying infrastructure was a largely theoretical exercise: we would list an application’s dependencies, build a Docker image accordingly, and assume that the result was more or less optimized.

Then came tools that monitor what actually happens at runtime… The verdict: between 60 and 70% of the packaged content was useless. Never called, never used, yet shipped with every deployment and sent across the network each time.
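The measurement behind such a verdict boils down to comparing what an image contains with what is actually touched at runtime. A minimal sketch, assuming you already have the image’s file inventory with sizes and the set of paths opened during a representative run (collected, for instance, with a tracer such as strace); the function name and all paths and sizes below are hypothetical.

```python
def unused_share(image_files: dict[str, int], accessed: set[str]) -> tuple[int, float]:
    """Return (unused_bytes, unused_fraction) for an image.

    image_files maps each file path in the image to its size in bytes;
    accessed is the set of paths actually opened at runtime.
    """
    total = sum(image_files.values())
    unused = sum(size for path, size in image_files.items() if path not in accessed)
    return unused, (unused / total if total else 0.0)

# Hypothetical image inventory and runtime trace.
image_files = {
    "/app/main.py": 10_000,
    "/opt/model.bin": 400_000,
    "/usr/lib/libfoo.so": 500_000,
    "/usr/share/doc/README": 90_000,
}
accessed = {"/app/main.py", "/opt/model.bin"}
unused_bytes, fraction = unused_share(image_files, accessed)
print(unused_bytes, f"{fraction:.0%}")  # → 590000 59%
```

A file unused during one trace may still be needed on a rarer code path, so the result is a list of candidates to review rather than files to delete blindly.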

AI observability rests on the same principle and often leads to the same surprises! It is now essential to look at what is actually happening in production (endpoint request frequency, user-facing latency, the model’s load on compute nodes) instead of assuming the model behaves as expected because its benchmarks were good.
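To make “look at what is happening in production” concrete, here is a minimal sketch that turns a request log into per-endpoint request rates and latency percentiles. It assumes the log is a list of `(timestamp_s, endpoint, latency_ms)` tuples covering a known time window; the log format, the endpoint name, and the helper names are hypothetical, and the percentile uses the simple nearest-rank method.

```python
import math
from collections import defaultdict

def percentile(values, p):
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def endpoint_stats(requests, window_s):
    """requests: iterable of (timestamp_s, endpoint, latency_ms) tuples
    collected over a window of window_s seconds."""
    latencies = defaultdict(list)
    for _ts, endpoint, latency_ms in requests:
        latencies[endpoint].append(latency_ms)
    return {
        endpoint: {
            "req_per_s": len(lat) / window_s,
            "p50_ms": percentile(lat, 50),
            "p95_ms": percentile(lat, 95),
        }
        for endpoint, lat in latencies.items()
    }

# Hypothetical one-minute window: 100 requests with latencies of 1..100 ms.
log = [(i * 0.6, "/v1/chat", float(i + 1)) for i in range(100)]
stats = endpoint_stats(log, window_s=60)
print(stats["/v1/chat"]["p50_ms"], stats["/v1/chat"]["p95_ms"])  # → 50.0 95.0
```

Tail percentiles such as p95 are usually more telling than averages, because a few slow requests are what users notice.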

Supplier dashboards: Are they really useful?

Today, many cloud providers offer their own monitoring tools. However, you and your provider may not prioritize the same things. Why not build your own monitoring layer? Many tools now make this approach accessible by exposing the state of an operating system as a database you can query.

You ask a specific question and get a specific answer, without going through a proprietary interface. Applied to AI, this lets you build precise dashboards tailored to your needs: consumption by model, memory usage by pipeline, correlation between workload and energy footprint. You focus on the information you chose because it is truly useful to you!
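Osquery is one well-known tool that exposes operating-system state as SQL tables. The sketch below shows the same “ask a question, get an answer” idea with Python’s built-in `sqlite3`: a hypothetical `metrics` table that your own collectors would fill, queried for consumption by model. The schema, model names, and figures are illustrative assumptions, not output from any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE metrics (
        hour INTEGER, model TEXT, pipeline TEXT,
        gpu_seconds REAL, mem_mb REAL, energy_wh REAL, requests INTEGER)"""
)
# Hypothetical hourly samples exported by your own collectors.
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (0, "chat-large", "support", 120.0, 30000, 55.0, 400),
        (0, "invoice-small", "finance", 15.0, 4000, 4.0, 380),
        (1, "chat-large", "support", 110.0, 29500, 50.0, 350),
        (1, "invoice-small", "finance", 14.0, 4100, 3.5, 360),
    ],
)

# "Consumption by model": total GPU time, peak memory, energy, energy per request.
DASHBOARD_QUERY = """
SELECT model,
       SUM(gpu_seconds)               AS gpu_s,
       MAX(mem_mb)                    AS peak_mem_mb,
       SUM(energy_wh)                 AS total_energy_wh,
       SUM(energy_wh) / SUM(requests) AS wh_per_request
FROM metrics
GROUP BY model
ORDER BY total_energy_wh DESC
"""
for row in conn.execute(DASHBOARD_QUERY):
    print(row)
```

Swapping `GROUP BY model` for `GROUP BY pipeline` gives the memory-by-pipeline view with no new tooling.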

Go beyond observation, take action

You’ve set up your own dashboard and realized that some models are too expensive, or that sensitive data is being sent to infrastructures over which you have little control. So what now?

Sovereignty, meaning the technical ability to decide where models run, what data they are trained on, and what infrastructure they use, will be your best ally in turning observability into a real action plan!

Decentralize to regain control of data

Today, the dominant model for cloud AI is centralized: you send your data to a third-party infrastructure, it does the work, and you get the results back. The process is simple, but on closer inspection it raises problems of regulatory compliance, privacy, and the energy consumed by moving large volumes of data.

The solution? Stop moving the data, and move the training instead.

Each site (a regional data center, a subsidiary, an industrial partner) trains the model locally and shares only updates to the model’s parameters with the global network, never the raw data. The aggregated result is a model that has learned from all sources without any of them having to expose their content.
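This approach is usually called federated learning, and its best-known aggregation rule is federated averaging (FedAvg): each site trains from the current shared model, and a coordinator averages the resulting parameters, weighted by each site’s sample count. The toy sketch below illustrates that loop in pure Python on a one-variable linear model; the three sites, their data, the learning rate, and the number of rounds are all hypothetical. Each site returns its trained `(w, b)`, which carries the same information as the parameter update since the coordinator knows the starting point.

```python
import random

def local_train(params, data, lr=0.05, epochs=5):
    """Plain SGD on a 1-D linear model y = w*x + b, run on one site's data."""
    w, b = params
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def federated_average(updates):
    """FedAvg: average site parameters, weighted by each site's sample count."""
    total = sum(n for _, n in updates)
    w = sum(p[0] * n for p, n in updates) / total
    b = sum(p[1] * n for p, n in updates) / total
    return w, b

# Three hypothetical sites with private data drawn from y = 3x + 1 + noise.
rng = random.Random(0)
sites = []
for n in (40, 60, 80):
    xs = [rng.uniform(0, 2) for _ in range(n)]
    sites.append([(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs])

global_params = (0.0, 0.0)
for _round in range(20):
    # Each site starts from the shared model and trains locally;
    # only (w, b) and the sample count are sent back, never the samples.
    updates = [(local_train(global_params, data), len(data)) for data in sites]
    global_params = federated_average(updates)

print(tuple(round(v, 1) for v in global_params))
```

A production setup would also need secure aggregation and handling of sites that drop out; this sketch only shows that `federated_average` receives parameters and counts, never data.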

No gigafactory. No overpriced proprietary infrastructure. The data stays where it is, which makes regulatory compliance easier to demonstrate. And the energy spent moving data drops, since raw data no longer has to travel in bulk between sites.

Local inference: A luxury reserved for large teams?

Just two years ago, running a serious language model on your own servers required dedicated GPU clusters, specialized teams, and a substantial infrastructure budget.

That barrier has fallen. Frameworks like GGML (the tensor library behind llama.cpp) have made it possible to run models with billions of parameters on standard hardware, including ordinary CPUs and consumer-grade GPUs.

The technique behind this is called quantization: storing the model’s weights with 8 or 4 bits instead of 32 cuts the memory they require by a factor of 4 or 8 respectively. There is some loss of quality, but it is often negligible for common business use cases. What used to cost several euros per hour on a cloud instance can now run on an existing on-premises server. The result: lower cloud costs, better latency, and no data leaving your premises!
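The arithmetic, and the basic idea of quantization, fit in a few lines. The sketch below computes weight memory for a hypothetical 7-billion-parameter model (weights only; activations and cache are excluded) and applies a simplified symmetric “absmax” quantization with a single scale per tensor. This is a swapped-in textbook scheme, not GGML’s actual format: GGML quantizes weights in small blocks with per-block scales, so real file sizes are somewhat larger than the raw figures.

```python
def weight_memory_gb(n_params: float, bits: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits / 8 / 1e9

def quantize_absmax(weights, bits=8):
    """Symmetric absmax quantization: map floats to signed integers in
    [-(2**(bits-1) - 1), 2**(bits-1) - 1] using one scale factor."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax or 1.0  # guard all-zero input
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

for bits in (32, 16, 8, 4):
    print(f"7B model, {bits:>2}-bit weights: {weight_memory_gb(7e9, bits):.1f} GB")
# prints 28.0, 14.0, 7.0 and 3.5 GB for 32-, 16-, 8- and 4-bit weights

weights = [0.12, -0.5, 0.33, 0.01, -0.27]
q, scale = quantize_absmax(weights, bits=4)
restored = dequantize(q, scale)  # each value is off by at most scale / 2
```

Many models are distributed in 16-bit precision, in which case the savings relative to the published weights are a factor of 2 (8-bit) to 4 (4-bit).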

The Pitfall of Oversizing

The concept is simple to understand, but often harder to put into practice. Large general-purpose models are trained to answer anything. This universality comes at a direct cost in size, inference time, and energy consumption. If your actual need is to analyze contracts in your industry, extract data from invoices, or answer your teams’ questions about your internal documentation, you don’t need that level of universality.

What you need is a model tailored to your domain: a smaller model, fine-tuned on your own business data, which will often be more accurate and faster, and will consume only the energy the task requires.

There’s no need to use a sledgehammer to crack a nut.
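In practice, right-sizing means measuring candidates on your own task and picking the cheapest one that clears your quality bar. A minimal sketch, assuming you have already run each candidate on an in-house evaluation set and measured its serving cost; the model names, scores, and costs below are invented for illustration and are not benchmark results.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    params_b: float         # parameters, in billions
    accuracy: float         # score on YOUR evaluation set (0-1)
    cost_per_1k_req: float  # measured serving cost, any currency

def right_size(candidates, min_accuracy):
    """Cheapest model that meets the accuracy bar on your own task."""
    eligible = [c for c in candidates if c.accuracy >= min_accuracy]
    return min(eligible, key=lambda c: c.cost_per_1k_req, default=None)

# Hypothetical evaluation results on an invoice-extraction test set.
candidates = [
    Candidate("general-70b", 70, 0.91, 4.20),
    Candidate("general-8b", 8, 0.84, 0.60),
    Candidate("invoices-3b-finetuned", 3, 0.93, 0.25),
]
choice = right_size(candidates, min_accuracy=0.90)
print(choice.name if choice else "no model meets the bar")
# → invoices-3b-finetuned
```

Filtering on quality first and only then minimizing cost keeps the choice tied to the business requirement rather than to model size.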

Conclusion

Observing what is going on and deciding where things are heading are two skills that may seem defensive, as if the goal were merely to avoid the worst. The opposite is true. An organization that has its own AI telemetry, can freely choose its infrastructure, and calibrates its models to actual use is in a position to manage and optimize its AI, and above all to avoid vendor lock-in (being able to switch providers without disruption) while keeping costs under control.


Want to deploy a cost-effective AI solution without vendor lock-in? Contact our experts.


Craft AI is a French pioneer in industrial and responsible AI.
Our mission is to make AI accessible to all businesses. We simplify putting AI into production and guarantee a trustworthy solution that is explainable, secure, and frugal. Our platform empowers clients to scale from a proof-of-concept in just a few weeks while retaining complete control of their data and infrastructure.
