Build an LLM-powered semantic recommendation engine

Learn how to easily build a recommender system based on semantic similarity with LLMs and Vector Databases!


Matteo Lhommeau
Data Scientist





Recommendation engines have quickly taken over the digital world, and you have certainly already encountered many of them: Amazon, Google Search, Netflix, Tinder…

Whether you have been recommended products by marketplaces, videos by streaming services or personalized content by your social networks, behind the scenes different types of recommender systems are running. A recommender system is a tool that allows a user to easily navigate a large space of content and find items of interest. The key challenge: how do you surface the most relevant items for a specific user among thousands or millions of possibilities, each with a large number of characteristics?

The goal is to map this content to relevant users in the best way possible, i.e. to find the most relevant associations. There are many ways of building a recommendation engine, the two big categories being content-based and relational-based (collaborative) approaches. Each has different perks and they are often combined, but here we will focus on content-based recommender systems. The main idea is to map a content to a user based on the proximity between them. Proximity can naively be viewed as the distance between words, but this approach often returns poor results because different words can have similar meanings (e.g. aircraft and plane) that such a simple comparison does not capture. Meaning, or semantics, should be considered, and that is where LLMs join the party: they improve the measure of proximity and therefore the relevance of the recommended content. Different methods exist to capture as much underlying meaning as possible. For example, topic modeling retrieves the most prominent subjects in a textual corpus; LDA (Latent Dirichlet Allocation) is a famous application of it. However, it usually fails to capture semantic relationships or to handle multiple languages.

How can we leverage more recent techniques such as LLMs to better extract value from semantic and syntactic structures? In other words, how can we build a performant recommender system using LLM techniques? Here we will use text embeddings (at the paragraph level, beyond the word level), available in Python through the Sentence Transformers models from Hugging Face. Let’s see how to implement that!

The Recommender Workflow

The main idea is to use text embeddings, a technique at the heart of LLMs, to turn a text into a vector based on rich semantic analysis. Once we have done that on both the content side (potentially thousands or millions of documents) and the user’s data, we can compare these vectors with a similarity measure such as cosine similarity. The content with the highest cosine similarity score is the most likely to be a good recommendation for the user.
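To make the comparison concrete, here is a minimal cosine-similarity computation on toy vectors (the names and the 3 dimensions are illustrative; real sentence embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" for a user and two candidate contents.
user = np.array([0.2, 0.8, 0.1])
content_a = np.array([0.25, 0.75, 0.05])  # points in nearly the same direction
content_b = np.array([0.9, 0.05, 0.4])    # points elsewhere

scores = {"a": cosine_similarity(user, content_a),
          "b": cosine_similarity(user, content_b)}
best = max(scores, key=scores.get)  # content "a" is the better recommendation
```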

Setting up the engine involves different steps:

🗃️ Collect data

To build a recommendation engine, we need both content data and user data. We might think that content metadata would be enough, but since we are building a recommender system based on semantic similarity, we have to make sure we have at least a textual description of each item. A good start is therefore to collect content metadata (title, language, source…) along with descriptions. On the user side, we need to collect enough information to characterize their interests: the user’s centers of interest, content they have already consumed, classes they have already followed (if we are recommending to a student, for example). After collecting the user’s data, we turn it into text.
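A simple way to turn user data into text is to concatenate it into one descriptive paragraph that a sentence-embedding model can encode. The record fields and wording below are assumptions for illustration:

```python
# Hypothetical user record; the field names are illustrative assumptions.
user = {
    "interests": ["machine learning", "aviation"],
    "consumed_titles": ["Intro to neural networks", "How planes fly"],
}

def user_to_text(user: dict) -> str:
    """Concatenate a user's interests and history into a single paragraph."""
    parts = []
    if user.get("interests"):
        parts.append("Interested in " + ", ".join(user["interests"]) + ".")
    if user.get("consumed_titles"):
        parts.append("Recently read: " + "; ".join(user["consumed_titles"]) + ".")
    return " ".join(parts)

profile_text = user_to_text(user)
```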

🔍 Infer embeddings

As we said previously, embeddings are a way to represent complex, high-dimensional semantic objects or items as vectors in a lower-dimensional space, enabling Machine Learning models to work with and learn from the data more effectively while preserving the inherent relationships between items. We need to do this on both sides: content data and user data. But choosing the appropriate model is not an easy task due to multiple constraints, such as handling different languages, supporting whole sentences, and relevance to the semantic domain.

Hugging Face offers pre-trained sentence-embedding models that are relatively easy to implement and take little time to load and run. One recommended choice could be a Siamese multilingual sentence transformer, SBERT, available on Hugging Face. SBERT (2019) reduces the computational cost of a classic BERT or RoBERTa and allows fast inference while keeping good accuracy. We can implement it in a few lines:

from sentence_transformers import SentenceTransformer

# Any multilingual SBERT checkpoint works; this model name is one common choice.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
embeddings = model.encode(text)

Once we have the embeddings, we can compute the cosine similarity. However, wouldn’t it be smarter not to run the model on every single content item every time a new user asks for a recommendation? It would save a lot of time, especially with a large content database.

Let’s explore an efficient way of storing contents’ data and their associated embeddings.

Vector database made easy with pgvector

In order to store embeddings and get high-performance functionality, we will use a vector database. This kind of database keeps all the properties of more traditional ones, but is specifically designed to be efficient with vector data and related functions. It optimizes storage and unlocks querying capabilities for vectors such as embeddings. For instance, instead of retrieving the vector data and then computing the cosine similarity, we can compute it directly inside the query thanks to pgvector. This makes real-time analysis of complex data possible without keeping embeddings in memory (which is impossible for thousands of embeddings or more).

Let’s use a PostgreSQL database to implement it. We only have to add the vector extension (pgvector) to a classic PostgreSQL database when we connect to it:

-- Make sure you have pgvector installed first
CREATE EXTENSION IF NOT EXISTS vector;
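With the extension enabled, the content embeddings can live in an ordinary table with a `vector` column. The sketch below is illustrative: the table and column names are assumptions, and the dimension (384) matches MiniLM-style sentence models; adjust it to your embedding model.

```sql
-- Hypothetical contents table; names and the 384-dim size are assumptions.
CREATE TABLE contents (
    id BIGSERIAL PRIMARY KEY,
    title TEXT,
    description TEXT,
    embeddings VECTOR(384)
);

-- Store each content's precomputed embedding alongside its metadata.
INSERT INTO contents (title, description, embeddings)
VALUES ('How planes fly', 'An introduction to aerodynamics.',
        '[0.12, -0.03, 0.41]');  -- truncated here for readability
```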

The main advantage of using pgvector is that you don’t need to fetch the embeddings and compare the user’s embedding with the contents’ ones in application code: you do it directly when querying the database, as mentioned above. In fact, you specify the distance measure you want to use directly in the query. For example, L2 distance corresponds to the operator <->, negative inner product to <#>, and cosine distance to <=>. Then, we can do as follows to retrieve the top ten contents closest to the user embedding, i.e. with the highest cosine similarity:

SELECT *
FROM contents
ORDER BY embeddings <=> :user_embedding
LIMIT 10;

Now you get the best matches for your user, ready to integrate into your front-end applications.


To conclude, based on an advanced semantic similarity approach, one can implement a highly relevant recommendation system. As long as you have enough textual data, performant pre-trained models like SBERT are easily available to infer embeddings. The whole point of the workflow is to store them efficiently in order to reuse them as much as possible and avoid recomputation. That is why we introduced the vector database with pgvector, which provides an easy and efficient storage method while adding querying capabilities (such as distance-based retrieval).

If we want to build a more robust recommendation engine, we can define different metrics (for example, the proportion of recommended content that the user has actually interacted with) that serve as important indicators of the proper operation of our model, and monitor them.
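As an illustration, the metric mentioned above is often called precision@k: the share of the top-k recommended items the user actually interacted with. A generic sketch, not tied to any particular library:

```python
def precision_at_k(recommended: list, interacted: set, k: int = 10) -> float:
    """Fraction of the top-k recommendations the user actually interacted with."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in interacted)
    return hits / len(top_k)

# Toy example: 2 of the 4 recommended items were consumed by the user.
score = precision_at_k(["a", "b", "c", "d"], {"b", "d", "z"}, k=4)
```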

Learn more about drift monitoring and the actual implementation of a drift monitoring pipeline here.
