Information gain ratio correction: Improving prediction with more balanced decision tree splits

26/02/2018

R&D

This paper presents an improvement of the information gain function used in many decision tree Machine Learning algorithms. It was published on arXiv.org.

Abstract

Information gain ratio

Decision tree algorithms use a gain function to select the best split during the tree's induction. This function is crucial to obtaining trees with high predictive accuracy. Some gain functions suffer from a bias when comparing splits of different arities. Quinlan proposed a gain ratio in C4.5's information gain function to fix this bias. In this paper, we present an updated version of the gain ratio that performs better, as it attempts to fix the gain ratio's bias for unbalanced trees and for some splits with low predictive interest.
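
To make the quantities mentioned in the abstract concrete, here is a minimal sketch of the standard (uncorrected) C4.5 gain ratio, i.e. the information gain divided by the split information, on a toy label set. The helper names and the toy data are illustrative assumptions, not taken from the paper, and the corrected gain ratio introduced in the paper is not reproduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, partitions):
    """Information gain of splitting `labels` into the given partitions."""
    total = len(labels)
    weighted = sum(len(p) / total * entropy(p) for p in partitions)
    return entropy(labels) - weighted

def split_info(labels, partitions):
    """Intrinsic information of the split itself (the arity penalty)."""
    total = len(labels)
    return -sum((len(p) / total) * math.log2(len(p) / total)
                for p in partitions if p)

def gain_ratio(labels, partitions):
    """Quinlan's C4.5 gain ratio: information gain divided by split info."""
    si = split_info(labels, partitions)
    return information_gain(labels, partitions) / si if si > 0 else 0.0

# Both splits below separate the classes perfectly (raw gain = 1 bit),
# but the gain ratio penalises the six-way split through its larger split info.
labels = ["yes", "yes", "yes", "no", "no", "no"]
binary_split = [["yes", "yes", "yes"], ["no", "no", "no"]]
wide_split = [["yes"], ["yes"], ["yes"], ["no"], ["no"], ["no"]]
print(gain_ratio(labels, binary_split))  # 1.0
print(gain_ratio(labels, wide_split))    # ~0.39
```

The toy example shows the bias the gain ratio was designed to correct: both candidate splits have the same raw information gain, but the high-arity split is penalised by its larger split information, so the binary split is preferred.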

You can also download our paper on GitHub HERE.
