How to get rich as a taxi driver with craft ai?


Yrieix Leprince
Oct 05, 2018 | Tutorial, Mobility

craft ai aims to bring cognitive automation to everyone. Today we focus on how a New York City taxi driver can increase their chances of finding clients.

Using data analysis with craft ai tools, we will predict, with an associated confidence level, the number of clients in a given zone at a given time. Our driver will therefore know which zone to head for to find more clients!

In a real business use case, real-time predictions would be easy: we would simply need to retrain our models regularly on the newest data to keep their performance up to date. For the purposes of this demonstration, we only worked on historical data.

1. Find and explore historical data

The Taxi and Limousine Commission (TLC) webpage provides yellow cab trip records from 2009 to 2017. For this work, we restricted our analysis to the most recent year, 2017.

Ready?! It's time to explore this data yourself! The chart below lets you visualize the 2017 data zone by zone (you can zoom in and out on the line chart). As you play with it, you will notice that some areas show both a daily and a weekly periodicity (e.g., Sunnyside).

[Interactive map and line chart of the 2017 data. Selected taxi zone: Woodside]

2. Let's create an army of craft Agents

An **agent** is an independent module that stores the context history of a single taxi zone. Looking at the evolution of that context, it uses a decision tree to learn the number of rides per hour.

Now that you have explored the data by clicking on the map and sliding along the line chart, you know that each taxi zone is different from the others. In some areas yellow cabs are very busy (e.g., JFK Airport), while others remain calm (e.g., Charleston/Tottenville). Because every taxi_zone is different, we designed and fitted one custom prediction model (i.e., a craft agent) for each of them. During training, each agent receives the historical records of a single taxi_zone. Once trained, it can predict how many clients to expect in its own zone at a given timestamp.
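
As a rough illustration of these per-zone datasets, the sketch below groups raw pickups into hourly ride counts, one series per zone. The record layout (a pickup timestamp and a zone id) and the function name `hourly_counts_by_zone` are assumptions made for this example; this is not the starter kit's actual code.

```python
from collections import defaultdict
from datetime import datetime


def hourly_counts_by_zone(pickups):
    """Group (pickup_time, zone_id) records into per-zone hourly ride counts.

    Returns {zone_id: {hour_start: trip_counter}}; each inner dict is the
    kind of history a single agent would be trained on.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for pickup_time, zone_id in pickups:
        # Truncate the timestamp to the start of its hour.
        hour_start = pickup_time.replace(minute=0, second=0, microsecond=0)
        counts[zone_id][hour_start] += 1
    return {zone: dict(series) for zone, series in counts.items()}


# Hypothetical sample: three pickups in zone 132, one in zone 79.
sample = [
    (datetime(2017, 12, 4, 0, 5), 132),
    (datetime(2017, 12, 4, 0, 40), 132),
    (datetime(2017, 12, 4, 1, 15), 132),
    (datetime(2017, 12, 4, 0, 30), 79),
]
per_zone = hourly_counts_by_zone(sample)
```

Hours without any pickup do not appear in the result; a real pipeline would probably fill them with 0 so that quiet hours are also learned.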

Each Agent receives the same data structure, so the configuration is identical for all of them:

{
  // Object that initializes the agent by declaring the types of the input and output features
  context: {
    // Number of rides observed in the `taxi_zone` for the given hour. It is the variable to learn and predict.
    trip_counter: {
      type: 'continuous'
    },
    // Hour of the day.
    // This information is not present in DataFrame columns but craft ai will extract it by itself from the index.
    time: {
      type: 'time_of_day'
    },
    // Day of the week.
    // It corresponds to the second periodicity observed in the data exploration phase and is self-generated by craft ai.
    day_of_week: {
      type: 'day_of_week'
    },
    // Timezone of the records, used to interpret the timestamps locally.
    timezone: {
      type: 'timezone'
    }
  },
  // Feature to predict; it must match a property declared in `context`.
  output: ['trip_counter'],
  // Period considered to look for historical data. Here 31536000
  // is one year expressed in seconds. It means that agents
  // won't look at older records.
  learning_period: 365 * 24 * 3600, // equal to 31536000
  // Maximum depth of each agent's decision tree.
  tree_max_depth: 5
}
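
To make the generated features more concrete, here is a sketch, in Python, of how an hour of the day and a day of the week can be derived from a timestamp and a UTC offset. The helper name `time_features` is hypothetical, and the conventions used (hour as a float in [0, 24), Monday as 0) are assumptions for this example rather than a description of craft ai internals.

```python
from datetime import datetime, timedelta, timezone


def time_features(timestamp, utc_offset_hours):
    """Return (time_of_day, day_of_week) for a Unix timestamp at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.fromtimestamp(timestamp, tz)
    time_of_day = local.hour + local.minute / 60 + local.second / 3600
    day_of_week = local.weekday()  # Monday is 0, Sunday is 6
    return time_of_day, day_of_week


# One year in seconds, the value used for `learning_period`.
ONE_YEAR_SECONDS = 365 * 24 * 3600

# 2017-12-04 05:00 UTC is 2017-12-04 00:00 at UTC-5 (New York in winter).
ts = int(datetime(2017, 12, 4, 5, 0, tzinfo=timezone.utc).timestamp())
features = time_features(ts, -5)
```

A fixed offset keeps the example self-contained; handling daylight saving time would require a proper timezone database.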

For more details on Agents creation, please refer to the documentation.

3. Predictions analysis

3.1 Prediction on new data

To score our models, we train them on historical records from 2017-01-01 00:00 to 2017-12-03 23:00. Each agent then predicts the client inflow from 2017-12-04 00:00 to 2017-12-17 23:00 in its own taxi zone. We can then compare the predicted number of rides with the rides that actually took place.

The following maps show the R² score of the predictions by zone, while the graph below lets you compare craft ai predictions (dashed line) with reality (solid line):
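
The time-based split described above can be sketched as follows. The function name `split_by_time` and the ride counts are made up for illustration; only the cut-off date comes from the text.

```python
from datetime import datetime


def split_by_time(series, test_start):
    """Split {hour_start: count} into training and test parts at test_start."""
    train = {t: c for t, c in series.items() if t < test_start}
    test = {t: c for t, c in series.items() if t >= test_start}
    return train, test


# First hour of the evaluation period.
TEST_START = datetime(2017, 12, 4, 0)

# Hypothetical hourly ride counts for one taxi zone.
series = {
    datetime(2017, 12, 3, 22): 14,
    datetime(2017, 12, 3, 23): 9,
    datetime(2017, 12, 4, 0): 7,
    datetime(2017, 12, 4, 1): 4,
}
train, test = split_by_time(series, TEST_START)
```

Splitting on time rather than at random keeps every test hour strictly after the training data, which mirrors how the agents would be used in practice.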

[Interactive maps and line chart: predicted versus actual rides. Selected taxi zone: Woodside]

Our predictions are quite good in zones where client demand is high (e.g., JFK Airport), but perform poorly in areas with little inflow (e.g., Bloomfield/Emerson Hill). This is because an agent needs a certain amount of data to capture the overall behaviour of its taxi zone.
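
For reference, the R² score (coefficient of determination) used to rate each zone can be computed as in the minimal sketch below. The sample values are invented; the formula is the standard 1 - SS_res / SS_tot.

```python
def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    # Squared error of the predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared deviation of the actual values from their mean.
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when the actual values are constant")
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted ride counts for four hours.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
score = r2_score(actual, predicted)
```

A score of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean. When the actual counts barely vary, as in very quiet zones, even small absolute errors weigh heavily on the score.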

3.2 Observe decision rules learned by craft ai

To learn from historical data, craft ai builds decision trees tuned to work with time series. This is the training process. When it is time to predict a value, each trained agent follows the rules of its own decision tree. You can now explore the decision rules that led to the results you have just seen. Decision trees are the core engine of the craft ai solution!

[Interactive decision tree viewer. Selected taxi zone: Woodside]

As you may have noticed, the hue of the leaves indicates the prediction confidence level: the more intense the color, the more confident the model is in its prediction.
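
To give an idea of what "following the rules" of a tree means, here is a simplified sketch of a prediction walk. The tree layout, thresholds, predicted values and confidence figures are all hypothetical; craft ai's real tree format and split rules differ.

```python
def predict(node, context):
    """Walk a binary decision tree until a leaf is reached."""
    while "leaf" not in node:
        feature, threshold = node["feature"], node["threshold"]
        # Go to the "below" branch when the feature value is under the threshold.
        node = node["below"] if context[feature] < threshold else node["above"]
    return node["leaf"]


# Hypothetical tree with made-up values, for illustration only.
TREE = {
    "feature": "time",
    "threshold": 6.0,
    "below": {"leaf": {"trip_counter": 3.0, "confidence": 0.8}},
    "above": {
        "feature": "day_of_week",
        "threshold": 5,
        "below": {"leaf": {"trip_counter": 25.0, "confidence": 0.9}},
        "above": {"leaf": {"trip_counter": 12.0, "confidence": 0.6}},
    },
}

weekday_evening = predict(TREE, {"time": 18.0, "day_of_week": 2})
```

Each leaf carries both a predicted value and a confidence, which is the information the leaf colors convey in the viewer.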

4. Time to be rich!

Now that you understand how craft ai works, you can drive straight to the best taxi zone to pick up clients! Just select your working hours below:

Select a date and an hour:

    <label for="rich_date">Date:</label>
    <input type="date" id="rich_date" name="rich_date" min="2017-12-04" max="2017-12-17" value="2017-12-04"/>
    <label for="rich_time">Time:</label>
    <input type="time" id="rich_time" name="rich_time" min="00:00" max="23:00" step="3600" value="00:00" required />
    <div class='nyc_container'>
        <div class='nyc_containee'>
            <button id='rich_predict'>Predict</button>
        </div>
        <div class='nyc_containee'>
            <h4 class='figure-title'><span id='best_taxi_zone'></span></h4>
            <div id="map_rich"></div>
        </div>
    </div>

You can observe that taxi drivers should head toward JFK Airport (taxi_zone 132) on 2017-12-04 at 00:00, or toward East Village (taxi_zone 79) on 2017-12-09 at 21:00.
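
Picking the best zone boils down to asking every agent for its prediction at the chosen hour and keeping the highest one. The sketch below assumes the per-zone predictions are already available as a dictionary; the function name `best_zone` and the predicted values are hypothetical.

```python
def best_zone(predictions):
    """Return the zone id with the highest predicted ride count."""
    if not predictions:
        raise ValueError("no predictions available")
    return max(predictions, key=predictions.get)


# Hypothetical predicted ride counts per taxi_zone for one timestamp.
predictions = {132: 48.0, 79: 31.5, 260: 2.0}
zone = best_zone(predictions)
```

A more careful ranking could also take each prediction's confidence, or the driving distance to the zone, into account.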

5. Conclusion

craft agents perform well in areas with high taxi demand, because they can learn citizens' habits. In areas where demand is sparse, predictions cannot capture taxi demand with high precision.

craft ai's models provide accurate predictions with manageable complexity in a minimum of time. In this project, they allowed us to extract and model NYC citizens' taxi demand with good precision in high-demand zones, making it easy for taxi drivers to target the right areas!

Taxi drivers can now stop wandering the city streets with empty seats!

Note: All the code made by craft ai is available on github/craft-ai/craft-ai-starterkit-jupyter.