The Impact of LoRA on the Emergence of Clusters in Transformers

This article is an introduction to the paper written by Hugo Koubbi, Matthieu Boussard and Louis Hernandez from the Craft AI R&D team.

07/03/2024

Hugo Koubbi
R&D Researcher

The LoRA algorithm is arguably the most widespread fine-tuning method for LLMs. LoRA reduces the number of trainable parameters by adding low-rank updates to the attention weight matrices. In “The Impact of LoRA on the Emergence of Clusters in Transformers”, written with Matthieu Boussard and Louis Hernandez, we study the impact of LoRA fine-tuning on the dynamics of the tokens.

Following the framework proposed by Sander et al., we adopt a simplified Transformer architecture. In line with the neural-ODE viewpoint, we treat the layer index as a continuous time variable and the tokens as an interacting particle system.
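
To make this picture concrete, here is a minimal sketch of such an interacting particle system. It is only an illustration with arbitrary choices (identity weights, Euler step, renormalisation of the tokens onto the unit sphere), not the exact scheme used in the paper: each token x_i moves with a velocity given by a softmax-weighted average of V x_j over the other tokens.

```python
import numpy as np

def attention_dynamics(X0, Q, K, V, T=10.0, dt=0.1):
    """Euler integration of the self-attention particle system
        dx_i/dt = sum_j softmax_j(<Q x_i, K x_j>) V x_j,
    with tokens renormalised onto the unit sphere after each step."""
    X = X0 / np.linalg.norm(X0, axis=1, keepdims=True)
    trajectory = [X.copy()]
    for _ in range(int(T / dt)):
        logits = (X @ Q.T) @ (X @ K.T).T               # logits[i, j] = <Q x_i, K x_j>
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        A = np.exp(logits)
        A /= A.sum(axis=1, keepdims=True)              # row-wise softmax
        X = X + dt * (A @ X @ V.T)                     # Euler step
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # back onto the sphere
        trajectory.append(X.copy())
    return np.array(trajectory)

# Toy run: n tokens (particles) in dimension d with identity weights.
rng = np.random.default_rng(0)
n, d = 16, 3
X0 = rng.standard_normal((n, d))
Q, K, V = np.eye(d), np.eye(d), np.eye(d)
traj = attention_dynamics(X0, Q, K, V)                 # shape (n_steps + 1, n, d)
```

With these identity weights the particles progressively attract each other, which is exactly the clustering behaviour discussed next.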

In “The emergence of clusters in self-attention dynamics” and “A mathematical perspective on Transformers”, Geshkovski et al. showed that these dynamics asymptotically lead to the formation of token clusters. In this study, we investigate how LoRA fine-tuning affects cluster formation in Transformers.

Figures from the “The emergence of clusters in self-attention dynamics” GitHub repository

We establish short-term stability results with respect to the attention matrix parameters. Our findings reveal that even when the long-term dynamics of tokens with the same initial conditions diverge, the two trajectories remain close on short time scales.

If only the value matrix V is changed, the tokens of the modified trajectory initially follow the original dynamics closely before diverging towards a new clustering (original dynamics on the left, LoRA dynamics on the right).
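
Continuing the sketch above, a LoRA-style update of V alone amounts to replacing V with V + BA for a low-rank product BA. Comparing the two trajectories makes both the short-term stability and the later divergence visible (the rank and scale below are illustrative choices of ours):

```python
# Rank-1 LoRA-style update of the value matrix only: V_lora = V + B @ A_lora.
r = 1
B = 0.5 * rng.standard_normal((d, r))
A_lora = 0.5 * rng.standard_normal((r, d))
V_lora = V + B @ A_lora

traj_base = attention_dynamics(X0, Q, K, V)
traj_lora = attention_dynamics(X0, Q, K, V_lora)

# Distance between the two trajectories as a function of "time" (depth).
gap = np.linalg.norm(traj_base - traj_lora, axis=(1, 2))
print(gap[:3])    # close to zero at early times ...
print(gap[-3:])   # ... larger once the two clusterings have diverged
```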

We also establish a theoretical upper bound on the time of formation of the second cluster, and we numerically investigate the optimality of this bound.
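
One crude way to probe such a formation time numerically, reusing the LoRA trajectory from the previous snippet, is to group tokens whose pairwise distance falls below a threshold and record when the number of groups first reaches two. This proxy is only loosely related to the quantity bounded in the paper and is shown here purely for illustration.

```python
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist

def count_clusters(X, eps=0.2):
    """Number of connected components of the graph linking tokens closer than eps."""
    adjacency = csr_matrix(cdist(X, X) < eps)
    n_components, _ = connected_components(adjacency, directed=False)
    return n_components

counts = np.array([count_clusters(X_t) for X_t in traj_lora])
# First integration step at which exactly two clusters remain, if it happens.
second_cluster_step = int(np.argmax(counts == 2)) if (counts == 2).any() else None
```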

Motivated by empirical and theoretical evidence that the $QK^{T}$ attention matrices are biased towards low-rank matrices, we also study the clustering behaviour in this low-rank regime and highlight new clustering structures.
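
In the sketch above, this low-rank regime can be reproduced by making the query and key maps rank one, so that the effective attention matrix has rank 1 (again a toy configuration of our own choosing):

```python
# Rank-1 query/key maps: the effective attention matrix Q1.T @ K1 has rank 1.
Q1 = rng.standard_normal((1, d))
K1 = rng.standard_normal((1, d))
traj_rank1 = attention_dynamics(X0, Q1, K1, V)
```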

We have also shown, in a simplified framework, how LoRA can be used to create new clustering structures (LoRA dynamics on the left, original dynamics on the right).

We also trained the architecture considered in the paper on modular addition, and then fine-tuned it on modular subtraction. Illustrating our results, the deeper the model, the faster the accuracy decreases during fine-tuning.
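
A minimal sketch of how such a pair of datasets can be generated (the modulus and the input format below are our own choices, not necessarily those used in the experiments): pretrain on (a, b) ↦ (a + b) mod p, then fine-tune on (a, b) ↦ (a − b) mod p.

```python
import itertools
import numpy as np

p = 97  # modulus, chosen here for illustration
pairs = np.array(list(itertools.product(range(p), repeat=2)))   # all (a, b) inputs
add_targets = (pairs[:, 0] + pairs[:, 1]) % p                   # pre-training task
sub_targets = (pairs[:, 0] - pairs[:, 1]) % p                   # fine-tuning task

# Each (a, b) pair can be fed to the model as a two-token sequence,
# the target being the single token add_targets[i] or sub_targets[i].
```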

Even though our hypotheses are rather simplified, the clustering phenomenon also seems to appear in LLMs such as Llama 2 7B.

Find the full paper "The Impact of LoRA on the Emergence of Clusters in Transformers" by clicking here.
