You may have heard the term “Digital Twin” being bandied about at some point over the past few years. A digital twin is an electronic counterpart to something that exists in the real world. We can interrogate the digital doppelganger, run analyses, and make predictions about the real-world entity that would be difficult or costly to make against the real thing. For this blog post, I’ll describe two digital twin examples and then discuss data storage and retrieval requirements for a modern digital twin project.
High Flying Twins
As a modern air traveler, you’ve likely flown on a plane that has a digital twin of a number of its components. Today’s jet engines and their myriad parts are outfitted with sensors that collect massive amounts of data, which is sent from the engine to another location where it is married up with other information about that engine. The result is a unique twin for each individual jet engine. As the engine goes through its day-to-day life, its twin is run through the same paces. So how is this twin constructed? Broadly speaking, we need two types of data: first, a description of the physical structure of the engine, and second, measurements of the conditions that structure is exposed to over time.
In the case of something as complex as a jet engine, when we talk about structure, we’re likely talking about a hierarchy of models, engineering specifications, and other metadata. These could be physics-based models, surrogate models, or, more likely, a combination of both. It’s also important to remember that the structure itself will change over time. Maintenance will be performed and parts will be replaced, sometimes with equivalent components, sometimes with upgrades. Regardless, this data must be collected to maintain an accurate picture of the structure of the twin over time.
A modern jet engine is outfitted with hundreds of sensors
The measurement data is everything we collect about the life of that real-world entity. In many cases this means very large amounts of time series data, which is then used as input to the many models that make up the digital twin. You can think of it as hitting the record button on the engine while it’s in flight, then playing that same flight scenario back against the digital twin. As the scenario plays, the model is perturbed in a fashion similar to the real entity. In the case of a jet engine, measurement data could include acceleration, outside air temperature, humidity, vibration, RPMs, etc.
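As a minimal sketch of this record-and-replay idea, the snippet below replays a logged flight through a toy twin model. Everything here is illustrative: the `EngineTwin` class, its degradation formula, and the telemetry values are hypothetical stand-ins, not a real engine model.

```python
from dataclasses import dataclass

@dataclass
class EngineTwin:
    """Hypothetical, highly simplified twin: accumulates a wear index
    as recorded operating conditions are replayed through it."""
    wear: float = 0.0

    def step(self, rpm: float, outside_temp_c: float, vibration_g: float) -> None:
        # Toy degradation formula -- a real twin would combine
        # physics-based and surrogate models here.
        self.wear += (rpm / 10_000) * (1 + vibration_g) * max(1.0, 40 - outside_temp_c) * 1e-6

# Recorded flight telemetry: (RPM, outside air temp in C, vibration in g) per tick.
flight_log = [
    (9_500, -40.0, 0.2),
    (9_800, -42.0, 0.3),
    (7_200, 15.0, 0.1),
]

twin = EngineTwin()
for rpm, temp, vib in flight_log:
    twin.step(rpm, temp, vib)

print(f"accumulated wear index: {twin.wear:.6f}")
```

The key point is the shape of the loop, not the formula: recorded measurements drive the model through the same history the physical engine experienced.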
Optimizing the Oil Field
In the first example, we “twinned” a single (albeit complex) part of an airplane. There’s no reason to stop there; in fact, there is great benefit in twinning the whole system. Let’s take the modern oilfield as an example. Here we seek to capture the configuration and operation of an entire macroscopic system. Some of the O&G equipment is complex enough to warrant its own twin (compressors come readily to mind), but we’ll focus on the macro view. Our twin in this case is a vast array of wellsites, each containing some number of wells, compressors, separators, tank batteries, and pipelines. Measurement data streams in, recording point-in-time measurements of flows, temperatures, and tank levels.
With a twin in place, let’s look at some examples of what we can do with it.
Moving from Preventive to Predictive Maintenance
Maintenance is usually performed on a schedule. That schedule is likely based on fleet-wide statistics, and it may be appropriate for a substantial portion of your fleet of airplanes or compressors, but likely not all of them. For one non-trivial subset of the fleet, maintenance is not being performed early enough; for another, it is happening too early. Both cases cost you extra money, through unnecessary maintenance operations or unscheduled downtime. The engines on the daily flight in and out of Riyadh, Saudi Arabia probably suck in a lot more sand than my local Oklahoma City to Dallas/Fort Worth commuter. In the worst case, safety also comes into play. With a digital twin, we no longer have to rely on a set schedule to determine when our engine needs to be looked at. Instead, the unique conditions our engine went through have produced a model that mimics the real-world engine, which we can use to determine when maintenance should be performed.
Improved Root Cause Analysis
If something does break down, the digital twin can be used to quickly get to the root of the problem. The comprehensive view the twin provides lets us look back through history to determine whether the environment the component operated in, its parts and maintenance history, or both conspired to cause the issue.
Inform Future Designs
The telemetry data collected for a fleet of digital twins provides a huge opportunity for post hoc analysis of engineering decisions and designs. This data can be used to inform future iterations of the product, or, if the twin operates at the macro-system level like the digital oilfield, to run the whole system more optimally. What-if scenario analysis also becomes easier and more accurate with a digital twin: before costly changes are made to the real thing, the same changes can be made on the twin, and many hypotheses can be tested before any final decisions are made.
Database Components of a Digital Twin Project
There is no digital twin without data. Below, we illustrate a high-level data architecture for a digital twin platform, shaped by the following requirements:
- Storage for ever-growing amounts of high-frequency time series data.
- Low-latency queries against specific time ranges, retrieval of large ranges, and aggregations.
- We must be able to store the structure of our twin. This structure will likely be a hierarchy of components, some subset of which have models.
- The structure must be versioned, so that for any given point in time, we can know the exact makeup of the system we are twinning.
- We must be able to quickly gather statistics at the macroscopic scale.
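To make the versioning requirement concrete, here is a minimal sketch of a versioned bill of materials: each structural fact carries a validity interval, so the twin’s exact makeup can be reconstructed “as of” any point in time. The component IDs, part numbers, and dates are hypothetical, and a real system would store this in a database rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ComponentVersion:
    component_id: str
    part_number: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None means the part is still installed

# Illustrative history: a fan blade replaced with an upgraded part in mid-2022.
history = [
    ComponentVersion("fan-blade-3", "PN-1001", datetime(2020, 1, 1), datetime(2022, 6, 1)),
    ComponentVersion("fan-blade-3", "PN-1001-B", datetime(2022, 6, 1), None),
]

def structure_as_of(history: list, at: datetime) -> list:
    """Return the component versions that were installed at time `at`."""
    return [
        v for v in history
        if v.valid_from <= at and (v.valid_to is None or at < v.valid_to)
    ]

print([v.part_number for v in structure_as_of(history, datetime(2021, 3, 1))])  # ['PN-1001']
print([v.part_number for v in structure_as_of(history, datetime(2023, 3, 1))])  # ['PN-1001-B']
```

The same valid-from/valid-to pattern extends to any structural fact about the twin, whether it lives in a relational table or as properties on graph edges.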
Data Sources and Target Databases
Building a digital twin is a data consolidation effort. A comprehensive view will require sensor data, engineering data about how the object or system is put together, maintenance history, etc. This data is likely spread around the enterprise in various databases and file systems.
Property Graph Database
Mapping these requirements to data stores we find that we have a strong graph use case when it comes to the structural portion of the twin. Whether it be a single complex component or an entire network of equipment, the property graph modeling paradigm lines up nicely with the hierarchical and networked nature of real world systems.
The nodes of this graph can then reference the various models that are in turn used to run simulations. This provides a powerful metadata structure decorating and organizing all of the many digital artifacts that go into the twin. The models, CAD files, and other file artifacts referenced in the graph can be stored in an object store.
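As a rough sketch of this structural metadata, the snippet below models an engine hierarchy as a property graph, using plain dicts and tuples in place of a real graph database. All node names, labels, relationship types, and the object store URI are illustrative inventions.

```python
# Nodes carry properties; the "Model" node points at an artifact in an object store.
nodes = {
    "engine-42":   {"label": "Engine",    "serial": "SN-042"},
    "compressor":  {"label": "Module",    "stage_count": 10},
    "blade-row-1": {"label": "Component", "material": "Ti-6Al-4V"},
    "fem-model-7": {"label": "Model",     "uri": "s3://models/fem-7.dat"},
}

# Directed edges: (source, relationship type, destination).
edges = [
    ("engine-42", "HAS_MODULE", "compressor"),
    ("compressor", "HAS_COMPONENT", "blade-row-1"),
    ("blade-row-1", "SIMULATED_BY", "fem-model-7"),
]

def neighbors(node_id: str, rel: str) -> list:
    """Follow outgoing edges of a given relationship type."""
    return [dst for src, r, dst in edges if src == node_id and r == rel]

# Walk the hierarchy: engine -> module -> component -> model artifact.
for module in neighbors("engine-42", "HAS_MODULE"):
    for comp in neighbors(module, "HAS_COMPONENT"):
        for model in neighbors(comp, "SIMULATED_BY"):
            print(comp, "->", nodes[model]["uri"])
```

In a real property graph database this traversal would be a declarative query (a Cypher or Gremlin pattern match) rather than nested loops, but the data shape is the same: typed, property-laden nodes connected by typed relationships.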
Time Series Storage
Time is critically important in a digital twin. Sensors generate large quantities of readings from various pieces of the twin. Time series data is usually worthy of special treatment because it piles up at such a high rate, straining your storage and ingest capacity. Time series databases are one of the fastest-growing segments of the database market. There are a host of time series-only options, but the old standby RDBMS can still get you pretty far. Relative newcomer TimescaleDB stuck with the relational approach and built its time series-focused offering on top of PostgreSQL. It supports very high ingest rates and SQL query capabilities with added time series-specific functionality. At this point, TimescaleDB has not released its horizontal scale-out option, so if you need to support even higher ingest volumes or better HA options, you can evaluate the various multi-master options out there, including Apache Cassandra, DataStax Enterprise, ScyllaDB, and Google Cloud Bigtable.
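The workhorse query in this layer is bucketed aggregation: rolling high-frequency readings up into fixed-width time buckets. Here is a plain-Python sketch of the idea (in TimescaleDB you would express this as a SQL `GROUP BY` over its `time_bucket` function rather than computing it in application code); the sample readings are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Illustrative raw readings: (timestamp, temperature).
readings = [
    (datetime(2024, 1, 1, 0, 0, 12), 81.2),
    (datetime(2024, 1, 1, 0, 0, 48), 82.0),
    (datetime(2024, 1, 1, 0, 1, 30), 79.5),
]

def bucket_avg(readings: list, width: timedelta) -> dict:
    """Average readings into fixed-width time buckets, keyed by bucket start."""
    buckets = defaultdict(list)
    for ts, value in readings:
        # Truncate each timestamp down to the start of its bucket.
        offset = (ts - datetime.min) % width
        buckets[ts - offset].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

for start, avg in bucket_avg(readings, timedelta(minutes=1)).items():
    print(start, round(avg, 2))
```

A time series database performs this same truncate-and-aggregate work close to the storage layer, over billions of rows, which is exactly why it earns special treatment in the architecture.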
Reporting and Analytics
No matter the domain or specific use case, traditional RDBMS-style OLAP reporting is still a critical part of most systems we work with. The current spate of property graph databases struggle with these query patterns, so I would recommend projecting portions of your time series and graph data into an RDBMS, or, if data volumes warrant it, into one of the newer generation of analytical RDBMSes. This opens up a plethora of tooling options for building dashboards, reports, and KPIs. Macro views of the digital twin fleet benefit from the slice-and-dice capabilities an RDBMS provides, while detailed exploration can be handled by the property graph and time series databases. A well-designed UI will abstract away these particular storage choices and let the end user seamlessly zoom in and out on the digital twin data without being aware of the low-level implementation details.
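To illustrate what that projection looks like, the sketch below flattens per-twin detail into rows and rolls them up for a fleet-level view, the kind of group-by an analytical RDBMS would run for a dashboard. The fields, twin IDs, and numbers are entirely hypothetical.

```python
# Flat rows projected from the graph and time series stores; a real
# pipeline would write these into an analytical database, not a list.
twin_stats = [
    {"twin_id": "engine-42", "region": "MEA", "flight_hours": 3120, "alerts": 4},
    {"twin_id": "engine-17", "region": "NA",  "flight_hours": 1890, "alerts": 1},
    {"twin_id": "engine-99", "region": "MEA", "flight_hours": 2650, "alerts": 7},
]

def rollup(rows: list, key: str) -> dict:
    """Group-by style aggregate: total flight hours and alerts per key."""
    out = {}
    for r in rows:
        agg = out.setdefault(r[key], {"flight_hours": 0, "alerts": 0})
        agg["flight_hours"] += r["flight_hours"]
        agg["alerts"] += r["alerts"]
    return out

print(rollup(twin_stats, "region"))
```

The macro view answers “which region’s fleet is generating the most alerts?”; drilling into any one twin then hands off to the graph and time series stores for the detail.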
The digital twin idea isn’t a new one, but it’s gaining popularity as more manufacturers digitize their equipment and recognize the cost-saving potential. Maintaining a 360° view of a fleet of equipment may seem a daunting data management task, but its difficulty can be lessened by selecting the right tools.