AI Glossary, Part 3: Uses, Challenges, and Ethics of Artificial Intelligence
AI is about technology and about use cases that are transforming entire professions, but also, and above all, about ethical issues!
Have you just deployed your own AI agent? Can you explain how it works... and how much it costs?

Like many companies, you’ve decided to integrate AI into your processes; you pay cloud bills, as well as various subscriptions, and perhaps even the vendor of your own AI solution. But if someone asked you tomorrow what you’re actually consuming (in terms of computing power, energy, data traffic, etc.), would you know how to answer?
For most organizations, the honest answer is no.
With the rapid rise of artificial intelligence, organizations have rushed to adopt it: because of its great potential, because of competitive pressure, and sometimes simply to “not miss the boat”, often at the expense of understanding how it works.
The next challenge for companies will therefore be the ability to see what is actually running in their AI infrastructures and the ability to choose how it runs.
Together, these can radically change the relationship an organization has with its AI strategy.
Faced with the spectacular figures for the AI “gigafactories” announced by hyperscalers, the real question isn’t the data center’s theoretical computing capacity, but “how many FLOPS are actually used to produce a useful result?”
And here, the picture is often less flattering, with significant waste: models running continuously to meet intermittent needs, GPU resources that remain allocated hours after a job has finished, pipelines consuming memory for features that no one uses… It is therefore crucial to be able to audit your infrastructure accurately. Otherwise it ends up oversized, and costs skyrocket!
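As a rough illustration of that kind of audit, here is a minimal Python sketch that compares the GPU hours a job reserved with the hours it actually spent computing. The `GpuAllocation` record, its fields, and the sample figures are all hypothetical; in practice the numbers would come from your scheduler or monitoring stack.

```python
from dataclasses import dataclass


@dataclass
class GpuAllocation:
    """One GPU reservation; all fields are hypothetical, for illustration."""
    job: str
    allocated_hours: float  # how long the GPU was reserved
    busy_hours: float       # how long it was actually computing


def utilization(allocations):
    """Return the share of allocated GPU hours that did real work."""
    allocated = sum(a.allocated_hours for a in allocations)
    busy = sum(a.busy_hours for a in allocations)
    return busy / allocated if allocated else 0.0


def wasted_hours(allocations):
    """Return the GPU hours that were reserved but idle, per job."""
    return {a.job: a.allocated_hours - a.busy_hours for a in allocations}


# Made-up sample data, not measurements.
sample = [
    GpuAllocation("nightly-training", allocated_hours=10.0, busy_hours=4.0),
    GpuAllocation("chatbot-inference", allocated_hours=24.0, busy_hours=6.0),
]

print(f"Utilization: {utilization(sample):.0%}")
print(wasted_hours(sample))
```

Even a crude ratio like this one is often enough to spot which workloads deserve a closer look first.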
For a long time, deploying infrastructure was a much more theoretical process: we would list an application’s dependencies, build a Docker image accordingly, and assume that the result was more or less optimized.
Then came tools that allowed us to monitor what actually happens at runtime… The verdict: between 60% and 70% of what the image contained was useless. Never called, never used, but shipped with every deployment, weighing on the network each time.
AI observability is based on the same principle and often leads to the same surprises! It is now essential to actually look at what is happening in production (frequency of endpoint requests, user latency, impact of the model’s load on compute nodes) instead of simply assuming that the model is behaving as expected because the benchmarks were good.
Today, many cloud providers offer their own monitoring tools. However, you and your cloud provider may not necessarily prioritize the same aspects. Why not develop your own monitoring layer? Many tools now make this approach accessible by treating the state of an operating system as a queryable database.
You ask a specific question, you get a specific answer, without going through a proprietary interface. Applied to AI, this allows you to build precise dashboards tailored to your needs: consumption by model, memory usage by pipeline, correlation between workload and energy footprint. This way, you can focus on the information you’ve chosen and that’s truly useful to you!
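To make the idea of "ask a specific question, get a specific answer" concrete, here is a small Python sketch that loads telemetry samples into an in-memory SQLite database and queries energy consumption per model. The table layout, model names, and values are assumptions invented for this example; they do not describe any particular monitoring tool.

```python
import sqlite3

# Hypothetical telemetry samples: (model, pipeline, memory_mb, energy_wh).
SAMPLES = [
    ("contract-analyzer", "ingest", 512, 12.0),
    ("contract-analyzer", "inference", 2048, 30.0),
    ("invoice-extractor", "inference", 1024, 8.0),
]


def build_telemetry_db(samples):
    """Load the samples into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE telemetry ("
        "model TEXT, pipeline TEXT, memory_mb INTEGER, energy_wh REAL)"
    )
    conn.executemany("INSERT INTO telemetry VALUES (?, ?, ?, ?)", samples)
    return conn


def energy_by_model(conn):
    """Ask one specific question: total energy per model, highest first."""
    rows = conn.execute(
        "SELECT model, SUM(energy_wh) AS total_wh "
        "FROM telemetry GROUP BY model ORDER BY total_wh DESC"
    )
    return rows.fetchall()


conn = build_telemetry_db(SAMPLES)
report = energy_by_model(conn)
for model, total_wh in report:
    print(f"{model}: {total_wh} Wh")
```

Changing the question (memory by pipeline, for instance) only means changing the SQL, not the tooling around it.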
You’ve set up your own dashboard and realized that some models are too expensive, or that sensitive data is being sent to infrastructures over which you have little control. So what now?
Sovereignty (the technical ability to decide where models run, what data they are trained on, and what infrastructure they use) will be your best ally in implementing a real action plan and making observability truly useful!
Today, the dominant model for cloud AI is centralized: you send your data to a third-party infrastructure, it does the work, and you get the results. It’s a simple process, but one that can prove problematic when examined in detail in terms of regulations, privacy, and the energy consumption associated with the massive transfer of data.
The solution? Stop moving the data, and move the training instead.
This approach is known as federated learning. Each site (a regional data center, a subsidiary, an industrial partner) trains the model locally and shares only updates to the model’s parameters with the global network, never the raw data. The aggregated result is a model that has learned from all sources without any of them having to expose their content.
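The mechanism can be sketched in a few lines of Python. This is a deliberately tiny illustration using weighted federated averaging on a one-parameter linear model, not a production setup: the function names, the learning rate, and the data are all made up, and real deployments add secure aggregation, communication, and failure handling.

```python
def local_update(weights, site_data, learning_rate=0.1):
    """One gradient step of a 1-D linear model y = w * x on a site's data.

    Only the updated weight leaves the site; site_data never does.
    """
    w = weights[0]
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in site_data) / len(site_data)
    return [w - learning_rate * grad]


def federated_average(site_weights, site_sizes):
    """Average per-site weights, weighted by each site's sample count."""
    total = sum(site_sizes)
    n_params = len(site_weights[0])
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]


def train(sites, rounds=50):
    """Run several rounds: local training at each site, then aggregation."""
    global_weights = [0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in sites]
        sizes = [len(data) for data in sites]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Made-up data: every site holds private points drawn from y = 3x.
sites = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5)],
    [(1.5, 4.5), (1.0, 3.0), (2.0, 6.0)],
]
weights = train(sites)
print(weights)  # the learned slope should be close to 3
```

Note that `train` only ever sees the weights returned by each site: the `(x, y)` pairs are read inside `local_update` and nowhere else.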
No gigafactory. No overpriced proprietary infrastructure. The data stays where it is. Regulatory compliance becomes easier to achieve, since raw data never leaves its site. And overall energy consumption can decrease, since massive data transfer between sites is eliminated.
Just two years ago, running a serious language model on your own servers required dedicated GPU clusters, specialized teams, and a substantial infrastructure budget.
That barrier has been broken. Frameworks like GGML have made it possible to run models with billions of parameters on standard hardware (including ordinary CPUs and consumer-grade GPUs).
The technique behind this is called quantization: by reducing the precision of the model’s weights from 32 bits to 8 or 4 bits, the memory required to store them is reduced by a factor of 4 or 8, respectively. There is some loss of quality, but it is often negligible for common business use cases. What used to cost several euros per hour on a cloud instance can now run on an existing on-premises server. The result: lower cloud costs, improved latency, and no more data in transit!
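The arithmetic is easy to check. The Python sketch below computes the memory needed for the weights of a hypothetical 7-billion-parameter model at 32, 8, and 4 bits, then runs a simple symmetric 8-bit quantization on a few made-up weights to show the precision loss. This is a simplified scheme for illustration only; it is not the format GGML actually uses, which works block by block.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Memory needed to store the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(q_weights, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in q_weights]


# A hypothetical 7-billion-parameter model at three precisions.
sizes = {bits: weight_memory_gb(7e9, bits) for bits in (32, 8, 4)}
for bits, gb in sizes.items():
    print(f"{bits:>2} bits: {gb:.1f} GB")

# Round trip on made-up weights to see the precision loss.
original = [0.12, -0.5, 0.33, 1.0]
q, scale = quantize_int8(original)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(original, restored))
print(f"largest round-trip error: {max_error:.4f}")
```

The weights go from roughly 28 GB at 32 bits to 7 GB at 8 bits and 3.5 GB at 4 bits, which is what moves such a model from a GPU cluster to a single ordinary server. These figures cover the weights only; activations and runtime overhead come on top.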
Matching the model to the need is a simple concept to understand, but often more complicated to put into practice. Large general-purpose models are trained to answer anything. This universality comes at a direct cost in terms of size, inference time, and energy consumption. If your actual need is to analyze contracts in your industry, extract data from invoices, or answer your teams’ questions about your internal documentation, you don’t need that level of universality.
You need a smaller model, tailored to your domain and fine-tuned on your own business data, which will (in most cases) be more accurate on those tasks, faster, and consume only the energy necessary.
There’s no need to “use a bazooka to kill flies.”
Observing what’s going on and deciding where things are heading are two skills that may seem defensive (as if the goal were merely to avoid the worst). The opposite is true. An organization that has its own AI telemetry, that can freely choose its infrastructure, and that calibrates its models to its actual use is in a position to manage and optimize its AI and, above all, to avoid vendor lock-in (being able to switch providers without disruption), while keeping costs under control.
Want to deploy your own cost-effective AI solution, without vendor lock-in? Contact our experts.