Inside the Mind of a Caterpillar Data Scientist

December 5, 2022

Digital discoveries have made it easier to communicate, do our work, and access information. Can you imagine doing your job or running a household without access to computers or smart devices?

Caterpillar customers are no different. They need convenient access to information that can help them keep their Cat® assets running with maximum productivity and minimum downtime. Using real-time, 24/7 data and the Internet of Things (IoT) helps them do just that.

Cat Digital Analytics Director Daniel Reaume leads a team of data scientists who capture and analyze the voices of Caterpillar products. They use these discoveries to make Caterpillar equipment more reliable, customers more satisfied, and the company stronger.

Dan explains how Caterpillar uses machine and engine data to support customer success. 

Q: How does Caterpillar equipment talk?

Dan: Our equipment talks to us in many ways, and our data collection processes are built on sound business models. There are three main components of equipment data collection:

  1. Sensors that collect the data – examples are temperature sensors, fuel-burn sensors, GPS sensors, and others.
  2. A “box” that collects, integrates, and performs calculations on the data, and prepares to send it to Caterpillar.
  3. Data transmission using a variety of communication channels and networks.
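
The three components above can be pictured as a small pipeline. The sketch below is an illustration only and does not describe Caterpillar's actual software: the sensor names, the asset ID, the choice of summary statistics, and the JSON payload format are all assumptions made for the example.

```python
import json
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    sensor: str       # hypothetical sensor name, e.g. "coolant_temp_c"
    timestamp: float  # seconds since the start of the sample window
    value: float


def summarize(readings):
    """Step 2 (the "box"): group readings by sensor and compute simple statistics."""
    by_sensor = {}
    for r in readings:
        by_sensor.setdefault(r.sensor, []).append(r.value)
    return {
        name: {"count": len(vals), "min": min(vals), "max": max(vals), "mean": mean(vals)}
        for name, vals in by_sensor.items()
    }


def build_payload(asset_id, readings):
    """Step 3: serialize the summary so any communication channel can carry it."""
    return json.dumps({"asset_id": asset_id, "summary": summarize(readings)}, sort_keys=True)


# Step 1: made-up sensor readings standing in for real hardware.
readings = [
    Reading("coolant_temp_c", 0.0, 88.0),
    Reading("coolant_temp_c", 1.0, 90.0),
    Reading("fuel_burn_lph", 0.0, 42.0),
]
payload = build_payload("ASSET-001", readings)
print(payload)
```

Summarizing on the machine before sending is one possible design choice; it trades detail for a smaller message.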

Q: Can all Caterpillar machines and engines transmit data?

Dan: Most new Cat® machines and engines, and much of our older fleet, have the potential to collect and transmit data. Our company currently has the world’s largest connected fleet with more than 1.2M connected assets in the field. For quite some time, our machines have been sold as “connectivity enabled.” And even older equipment can often be retrofitted for connectivity. 

But the number and types of sensors on our machines and engines vary by asset type and model. For example, our smallest machines and engines generally have fewer, less complex sensors than our largest mining machines. 




Q: What does the data tell us?

Dan: We classify machine and engine data into two broad buckets. The first group is like warning lights on a car dashboard; the second is more like actual readouts of a car’s current tire pressure or battery voltage.

The first category might include fuel burn, GPS data, and fault codes. Advanced analytics allows us to use simplified data – such as fault codes – to help predict maintenance issues, degraded performance, etc. But the actionability and accuracy of predictions may not be as great as with richer data sources.

The second type of data is richer and more complex. It usually involves samples of sensor readings taken once per second, or more often. Right now, we collect this data mostly from mining and larger construction equipment. 

Once we collect data, we can identify patterns. For example, the data might tell us that when the operator applies the brakes, the pressure doesn’t recover as quickly as expected. In that case, we would recommend an inspection to see if there’s a leak in the system. If that turns out to be the case, a customer can get the repair taken care of before it becomes a more significant problem.  
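
The brake example can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the analytics Caterpillar runs: the pressure values, the target pressure, the expected recovery time, and the function names are all hypothetical, and samples are assumed to arrive once per second.

```python
def recovery_time(samples, target, sample_period_s=1.0):
    """Seconds from the lowest pressure sample until pressure is back at or
    above `target`. Returns None if it never recovers within the window."""
    if not samples:
        return None
    # Index of the lowest reading, taken here as the moment of the brake application.
    low_index = min(range(len(samples)), key=samples.__getitem__)
    for i in range(low_index, len(samples)):
        if samples[i] >= target:
            return (i - low_index) * sample_period_s
    return None


def needs_inspection(samples, target, expected_s):
    """Flag the machine when pressure recovers too slowly or not at all."""
    t = recovery_time(samples, target)
    return t is None or t > expected_s


# Made-up pressure traces sampled once per second (arbitrary units).
healthy = [100, 80, 70, 85, 98, 100]
slow = [100, 80, 70, 74, 78, 82, 86, 90, 94, 98]

print(needs_inspection(healthy, target=95, expected_s=3))  # False
print(needs_inspection(slow, target=95, expected_s=3))     # True
```

A check like this depends on the richer, once-per-second data described above; a fault code alone would not reveal how long the recovery took.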

Using fault code data alone, we might be able to warn of low pressure – but maybe not as soon as with richer data – and we might not pinpoint the site of a leak as accurately. Note that we don’t rely on fault codes alone but on patterns and trends of fault codes. For example, a single low-level alert may not be concerning, but it might be more so if we detect a pattern of repeated fault codes, possibly of several types.
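
To make the idea of fault code patterns concrete, here is a minimal sketch that ignores a lone alert but flags repeated or varied codes inside a time window. The fault code names, the one-hour window, and the thresholds are hypothetical values chosen for illustration, not Caterpillar's rules.

```python
from collections import Counter


def fault_pattern(events, window_s, now):
    """Count each fault code seen within the last `window_s` seconds.

    `events` is a list of (timestamp_seconds, code) pairs."""
    recent = [code for ts, code in events if now - window_s <= ts <= now]
    return Counter(recent)


def is_concerning(events, window_s, now, min_repeats=3, min_types=2):
    """True when one code repeats often, or several code types pile up."""
    counts = fault_pattern(events, window_s, now)
    repeated = any(n >= min_repeats for n in counts.values())
    varied = len(counts) >= min_types and sum(counts.values()) >= min_repeats
    return repeated or varied


# Made-up event logs: (seconds, hypothetical fault code).
single_alert = [(100, "LOW_BRAKE_PRESSURE")]
repeated_alerts = [
    (100, "LOW_BRAKE_PRESSURE"),
    (900, "LOW_BRAKE_PRESSURE"),
    (1800, "HYD_TEMP_HIGH"),
    (3000, "LOW_BRAKE_PRESSURE"),
]

print(is_concerning(single_alert, window_s=3600, now=3600))     # False
print(is_concerning(repeated_alerts, window_s=3600, now=3600))  # True
```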

Regardless of data complexity, identifying patterns like this helps us detect conditions that might not be otherwise apparent. Even low-cost, limited data like fault codes might enable us to do condition monitoring and differentiate between a healthy machine and one with an impending issue. 

Q: Interesting. What is condition monitoring?

Dan: Dealers use condition monitoring to track asset health and contact customers when something seems amiss in their operation. Because of their vast experience with proprietary equipment and the industry, no one is better positioned than our dealers to monitor the health of our customers’ assets – even mixed fleets.

Q: What’s next on the data horizon? 

Dan: I think of data as the new DNA. At Caterpillar, we’re using innovative, sophisticated techniques to examine, explore, slice, dice, and recombine data in new ways to fix problems. It’s exciting and constantly evolving – things never stay still.

There may never have been a more exciting time to work at Caterpillar. With some of the industry’s brightest minds on board and our leadership committed to investing in game-changing technology, our future looks very bright. We can’t wait to see what tomorrow brings.

Explore digital and analytics career opportunities and see how our teams are building on our legacy as a global leader and innovating for the next generation.


Dan and his team at the Business of Data Awards celebration.
