Information gain ratio correction: Improving prediction with more balanced decision tree splits

26/02/2018

R&D


This paper presents an improvement of the information gain function used in many decision tree machine learning algorithms. It was published on arXiv.org.

Abstract

Information gain ratio

Decision tree algorithms use a gain function to select the best split during the tree's induction. This function is crucial for obtaining trees with high predictive accuracy. Some gain functions suffer from a bias when comparing splits of different arities. Quinlan proposed the gain ratio, used in C4.5's information gain function, to correct this bias. In this paper, we present an updated version of the gain ratio that performs better, as it addresses the gain ratio's remaining bias toward unbalanced trees and toward some splits with low predictive interest.
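For illustration only, here is a minimal Python sketch of the standard information gain and C4.5 gain ratio computations that the paper builds on (it does not implement the corrected ratio introduced in the paper); the function names and the toy labels are assumptions made for this example.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(parent_labels, child_label_groups):
    """Information gain of a split: parent entropy minus weighted child entropy."""
    n = len(parent_labels)
    weighted_child_entropy = sum(
        len(child) / n * entropy(child) for child in child_label_groups
    )
    return entropy(parent_labels) - weighted_child_entropy

def gain_ratio(parent_labels, child_label_groups):
    """C4.5 gain ratio: information gain normalized by the split information,
    which penalizes splits with many small branches."""
    n = len(parent_labels)
    fractions = np.array([len(child) / n for child in child_label_groups])
    split_info = -np.sum(fractions * np.log2(fractions))
    if split_info == 0:
        return 0.0
    return information_gain(parent_labels, child_label_groups) / split_info

# Toy comparison: a binary split vs. an eight-way split of the same 8 labels.
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
binary_split = [parent[:4], parent[4:]]              # two pure branches
many_way_split = [parent[i:i + 1] for i in range(8)]  # eight singleton branches

print(information_gain(parent, binary_split), gain_ratio(parent, binary_split))
print(information_gain(parent, many_way_split), gain_ratio(parent, many_way_split))
```

On this toy example both splits yield the same information gain (1 bit), but the eight-way split's gain ratio falls to 1/3 because its split information is log2(8) = 3 bits: this is the arity bias that Quinlan's ratio corrects and that the paper's updated ratio refines further.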

You can also download our paper on GitHub HERE.

