The Digital Twin Data Center

Digital Twins are innovative virtual replicas of real-world assets and their surroundings. They offer numerous benefits across various sectors, including construction. However, many users lack insight into how a Digital Twin is actually created. In this series, we aim to bridge that gap by providing insights into the creation of a Digital Twin. Whereas the previous article covered the requirements of a Digital Twin, this article dives into the Digital Twin data center and Digital Twin data models. Get inspired now!

Technical Platform 

The technical platform is the core on which the Digital Twin runs. It consists of software components that handle tasks such as data storage, data processing, and data management. These components are the building blocks of the Digital Twin infrastructure:

  • Hosting and Deployment: The foundation of the Digital Twin infrastructure is a cloud platform powered by Google Kubernetes Engine (GKE). The cloud environment allocates resources dynamically, scaling with the usage of the Digital Twin. In addition, Docker is used to package the software, ensuring that all the individual technical components work together.
  • Security: Ensuring the security and confidentiality of sensitive information and data is essential within the technical platform. Robust security features are integrated into the platform to safeguard against cyber threats and unauthorized access. User roles are customizable, so different sections of the Digital Twin infrastructure can be made accessible only to certain people. This allows us to comply with ISO standards for digital security.
  • Open source: Open-source software makes its source code publicly available, enabling inspection, modification, and enhancement as needed. The Sogelink Digital Twin platform builds on open-source software, allowing it to be adapted to the specific requirements of a Digital Twin. Besides open-source software, we also use other sources for data collection, such as data archives and sensor data; read more about this in our article about Digital Twin requirements.
  • Open standards: Open standards are essential for the efficient operation of a Digital Twin. For example, standards for data formats, metadata, data catalogs, and data exchange ensure consistency and compatibility across different systems. The Digital Twin infrastructure is built upon open standards, facilitating seamless data processing and usage. Many geodata standards are maintained by the Open Geospatial Consortium (OGC).

Digital Twin Data Storage 

Digital Twins are all about data, and this data needs to be stored. There are many different types of databases, each with its own advantages. The selection of a database depends on factors such as data formats and intended usage. We will dive deeper into this:

  • Object Storage: Data, such as 3D tiles, can be stored as files, images, or text-based formats like CSV or JSON. These data objects are housed within cloud storage buckets, offering flexibility in scaling up or down.  
  • Relational data: Structured in a table-like format with columns and rows, relational databases are utilized for storing relational data. PostgreSQL stands out as an open-source database system designed for this purpose. This system offers extensions to optimize the storage for various data types, including PostGIS for geospatial data, h3 for hexagonal grids, and pgvector for AI applications. 
  • Real-time data: Real-time data, continuously collected and processed by sensors, plays an important role in the creation of Digital Twins. It offers live insights into the assets themselves or into their surroundings, such as the amount of traffic or the current weather. Hydra is a PostgreSQL extension that stores time series data efficiently in columnar form for fast analytical queries. Time series data can also be stored in Parquet files for easy exporting while strongly reducing storage usage.
  • Vector data: Here, vector data refers to data converted into numerical vectors. pgvector is a PostgreSQL extension for storing such data. Widely employed in fields such as Natural Language Processing, vector databases are optimized for storing and querying these vectors; see the sketch after this list.
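
As a minimal sketch of how these storage types fit together, the Python snippet below creates a table that mixes relational, geospatial, and vector columns, runs a pgvector similarity query, and exports a query result to Parquet. The connection string, table, and values are hypothetical placeholders; it assumes a PostgreSQL instance with the PostGIS and pgvector extensions available, plus the psycopg2, pandas, and pyarrow packages.

```python
# A hypothetical sketch, not Sogelink's actual schema: relational,
# geospatial, and vector storage side by side in PostgreSQL.
import pandas as pd
import psycopg2

conn = psycopg2.connect("dbname=twin user=postgres")  # placeholder DSN
cur = conn.cursor()

# Enable the extensions mentioned above (requires sufficient privileges).
cur.execute("CREATE EXTENSION IF NOT EXISTS postgis;")
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")

# A table mixing relational, geospatial (PostGIS), and vector (pgvector) columns.
cur.execute("""
    CREATE TABLE IF NOT EXISTS assets (
        id        serial PRIMARY KEY,
        name      text,
        geom      geometry(Point, 4326),  -- PostGIS geometry
        embedding vector(3)               -- pgvector embedding (toy size)
    );
""")
cur.execute("""
    INSERT INTO assets (name, geom, embedding) VALUES
    ('bridge', ST_SetSRID(ST_MakePoint(4.89, 52.37), 4326), '[0.1, 0.9, 0.2]');
""")
conn.commit()

# Nearest-neighbour search on the embedding column (Euclidean distance).
cur.execute("""
    SELECT name FROM assets
    ORDER BY embedding <-> '[0.1, 0.8, 0.3]'
    LIMIT 5;
""")
print(cur.fetchall())

# Export a result set to Parquet for compact archiving (needs pyarrow).
pd.read_sql("SELECT id, name FROM assets", conn).to_parquet("assets.parquet")
```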

Digital Twin Data Management

As the amount of data increases, maintaining a clear overview becomes increasingly challenging. Digital Twin data management is about keeping track of data quality, traceability, and other metadata. By applying the FAIR principles (Findable, Accessible, Interoperable, Reusable), we ensure that data remains well-organized and usable.

Datasets within the Digital Twin are systematically cataloged along with their metadata. Metadata are important descriptors of data, providing information such as licenses, quality, and structure. Metadata play an important role in data tracking and dataset sharing.
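
To make this concrete, the snippet below sketches what a catalog entry could look like. The field names loosely follow common geospatial metadata elements, in the spirit of the ISO 19115 standard mentioned in the FAQ below; the values are illustrative, not an actual Sogelink dataset.

```python
# A hypothetical metadata record for the data catalog; field names loosely
# follow common geospatial metadata elements, and all values are invented.
dataset_metadata = {
    "title": "3D buildings",                     # Findable: a clear name
    "identifier": "datasets/3d-buildings-v1",    # Findable: a stable ID
    "license": "CC BY 4.0",                      # Reusable: terms of use
    "format": "3D Tiles",                        # Interoperable: open standard
    "crs": "EPSG:4326",                          # coordinate reference system
    "lineage": "Fused from point clouds and building footprints",
    "quality": {"positional_accuracy_m": 0.5},   # traceable data quality
    "access_url": "https://example.com/3d-buildings",  # Accessible: placeholder URL
}
```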

Digital Twin Tools 

With the technical platform ready and a lot of data gathered, it is time to shape the data to the needs of the Digital Twin. This can be done with various data-driven Digital Twin tools. A data-driven approach is all about getting as much value from the data as possible. This can be achieved through various methods, including combining existing datasets to generate new data and applying modeling and machine learning techniques to generate new insights. We will dive deeper into these methods:

  • Data fusion merges data from different sources. The goal is to combine the data into a new dataset with a higher value. Data fusion can be performed in different ways. For instance, consider our 3D buildings dataset, where point cloud data are combined with building footprints to automatically generate 3D building models (see the first sketch after this list).
  • AI and machine learning (ML) are gaining importance in the world of Geo-IT for analyzing large amounts of data. ML, a subset of Artificial Intelligence (AI), enables computers to learn from data without being explicitly programmed, making predictions based on learned patterns. ML algorithms can, for instance, detect objects in images or forecast future events based on historical data. One application is identifying windows from aerial imagery and incorporating this information into a building dataset, which is interesting for a Digital Twin for the energy sector: since energy escapes more easily through windows, we can use this to estimate the energy consumption of a house (see the second sketch after this list).
  • Predictive Modelling allows us to simulate and predict the behavior of a system. Models are essentially calculations that can be used to test different scenarios under different conditions. For instance, models can predict the stability of infrastructural assets under different scenarios. In the context of smart cities, AI models can forecast streams of people in public transport and traffic, which are important insights for efficient urban planning.
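
As a minimal illustration of data fusion, the first sketch below combines a point cloud with a 2D building footprint to estimate a building height. The coordinates and points are invented and the shapely package is assumed; this is a toy version of the idea, not our production pipeline.

```python
# Toy data fusion: derive a building height from point-cloud points
# that fall inside a 2D building footprint. All values are invented.
from shapely.geometry import Point, Polygon

footprint = Polygon([(0, 0), (10, 0), (10, 8), (0, 8)])  # hypothetical footprint

# (x, y, z) points from a point cloud; z is height above ground in metres.
point_cloud = [(2, 3, 9.8), (5, 5, 10.1), (8, 2, 9.9), (20, 20, 3.0)]

# Keep only points inside the footprint, then take a robust height
# estimate (here: the median of the roof points).
heights = sorted(z for x, y, z in point_cloud if footprint.contains(Point(x, y)))
height = heights[len(heights) // 2]
print(f"Estimated building height: {height:.1f} m")  # -> 9.9 m
```

The second sketch is a toy version of the machine-learning example: a linear model relating detected window area to the energy consumption of a house. The numbers are invented for illustration, and scikit-learn is assumed.

```python
# Toy ML example: relate window area (e.g. detected from aerial imagery)
# to historical energy consumption; all numbers are invented.
from sklearn.linear_model import LinearRegression

window_area_m2 = [[2.0], [4.5], [6.0], [8.5]]  # features per house
energy_kwh = [2100, 2600, 2950, 3500]          # historical consumption
model = LinearRegression().fit(window_area_m2, energy_kwh)
print(model.predict([[5.0]]))  # estimate for a house with 5 m² of windows
```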

Data Processing Pipelines

Within the Digital Twin infrastructure, data flows between storage locations and processing tools. This transfer process is commonly known as ETL: Extract, Transform, Load. Our aim is to make sure that ETL pipelines operate smoothly and efficiently. We will dive deeper into some examples of Digital Twin data processing:

  • Georeferencing: A huge number of coordinate reference systems exist. Aligning all data to the same coordinate reference system (i.e. georeferencing) is required to make sure that all the Digital Twin data can be used together. GDAL, a robust open-source library for geospatial data processing, is essential for converting data between coordinate systems and geographic formats (see the sketch after this list).
  • 3D Data conversion: To facilitate visualization, data is converted into 3D formats. PostGIS is a handy tool for constructing 3D geometries within a PostgreSQL database. Sogelink Research uses custom tools, like pg2b3dm and i3dm.export, to directly convert 3D data from PostgreSQL databases into standardized 3D formats. 
  • Automated ETL process: ETL pipelines can be complex because the required input and output differ per data processing tool. Automating the ETL process helps prevent mistakes and keeps the data up-to-date. Mage is a tool for automating complete ETL pipelines: it extracts data from a source, transforms it, and loads it into a database. Additionally, it can schedule pipelines at regular intervals to keep the data fresh.
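
As a minimal sketch of the georeferencing step, the snippet below uses GDAL's Python bindings to reproject a raster and a single coordinate pair from the Dutch RD New system (EPSG:28992) to WGS 84 (EPSG:4326). The file names are placeholders, not real project files.

```python
# Coordinate conversion with GDAL's Python bindings; file names are placeholders.
from osgeo import gdal, osr

gdal.UseExceptions()

# Reproject a whole raster to WGS 84 in one call.
gdal.Warp("output_wgs84.tif", "input_rd.tif", dstSRS="EPSG:4326")

# The same idea for a single coordinate pair, via an OSR transformation.
source = osr.SpatialReference()
source.ImportFromEPSG(28992)  # Dutch RD New
target = osr.SpatialReference()
target.ImportFromEPSG(4326)   # WGS 84
transform = osr.CoordinateTransformation(source, target)
print(transform.TransformPoint(155000.0, 463000.0))  # RD point near Amersfoort
```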

Digital Twin APIs and Digital Twin Services

Now that we can store, process, and manage our data, this data doesn’t automatically integrate into the Digital Twin. The final step in the data core is serving the data to make it accessible for users. This is achieved through APIs: software components that facilitate communication and data exchange between applications. We are happy to share more about these Digital Twin APIs and Digital Twin services:

  • Web API: Data processing tools can be integrated into a web framework to enable HTTP-based querying. With this, modeling and simulation algorithms become accessible directly through the web, meaning end-users can run simulations in real time (see the sketches after this list).
  • Database API: Databases can also be queried directly via the web. APIs like PostGraphile, pg_featureserv and Observable’s database proxy enable communication with PostgreSQL to facilitate on-demand data retrieval from the database.  
  • OGC Web Map Services: OGC Web Map Services (WMS) and Web Map Tiled Services (WMTS) are standard protocols established by the Open Geospatial Consortium (OGC) for sharing 2D map layers online. These services are commonly used in GIS systems to request and receive map images from remote servers. GeoServer is an open-source tool used for publishing geospatial data as web services. 
  • Tiled Map Services: Alternatives like Cloud Optimized GeoTIFF (COG), MBTiles, and PMTiles serve map layers in a scalable and cost-effective manner without requiring server-side management, offering efficient solutions for serving map layers. 
  • OGC SensorThings API: The OGC SensorThings API is a standard for providing access to IoT sensor data. Developed by the Open Geospatial Consortium (OGC), this API provides an open and unified method to connect IoT devices and applications over the web. FROST-Server, an open-source implementation of the SensorThings API, facilitates the integration of real-time sensor data for the Sogelink Digital Twin platform. 
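
As a minimal sketch of such a web API, the first snippet below exposes a toy simulation over HTTP. It assumes the FastAPI package, and traffic_forecast is a hypothetical stand-in for a real modeling component.

```python
# A toy web API exposing a simulation over HTTP; assumes FastAPI is installed.
from fastapi import FastAPI

app = FastAPI()

def traffic_forecast(hour: int) -> float:
    """Hypothetical stand-in for a real model: traffic peaks at rush hour."""
    return max(0.0, 100.0 - abs(hour - 8) * 12.0)

@app.get("/forecast/{hour}")
def forecast(hour: int):
    # End-users can trigger the simulation directly via an HTTP request.
    return {"hour": hour, "vehicles_per_minute": traffic_forecast(hour)}

# Run with: uvicorn main:app --reload, then request GET /forecast/8
```

In the same spirit, reading sensor data from a SensorThings endpoint comes down to plain HTTP requests. The base URL below is a placeholder, while /v1.1/Things is the standard resource path defined by the OGC SensorThings API.

```python
# Querying a SensorThings endpoint; the base URL is a placeholder.
import requests

base_url = "https://example.com/FROST-Server/v1.1"  # hypothetical server
response = requests.get(f"{base_url}/Things", params={"$top": 5}, timeout=10)
for thing in response.json().get("value", []):
    print(thing["name"])  # each Thing represents a connected device or asset
```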

Key takeaways  

  • The Digital Twin’s core, its technical platform, manages data storage, processing, and management. It is the backbone of the Digital Twin, guaranteeing efficient and secure data handling for optimal performance.
  • Managing data in the Digital Twin is important as the volume grows. Adhering to FAIR principles (Findable, Accessible, Interoperable, Reusable) ensures organized and usable data.  
  • Data fusion combines available data sources for new insights, like merging point cloud and building footprint data for 3D models. Machine learning analyzes large datasets, identifying objects in images or forecasting events. Predictive modeling simulates system behavior, aiding in infrastructure stability prediction and urban planning in smart cities. 
  • In the Digital Twin infrastructure, data flows seamlessly between storage and processing tools through ETL (Extract, Transform, Load) processes.  
  • Data accessibility is important, achieved through APIs facilitating communication between applications.  

Frequently asked questions

What is the Digital Twin Data center? The Digital Twin Data center is the core of the Digital Twin, responsible for managing data storage, processing, and serving within a technical platform. 

What are the key components of the technical platform? The technical platform includes hosting and deployment on a cloud environment, robust security features, and the utilization of open-source software and open standards.

How is data stored within the Digital Twin Data center? Data storage encompasses various types such as object storage for files, relational databases like PostgreSQL, columnar storage for real-time data, and vector databases optimized for AI algorithms. 

How is data managed within the Digital Twin Data center? Data management adheres to FAIR principles (Findable, Accessible, Interoperable, Reusable), systematically cataloging datasets with metadata, adhering to standards like ISO-19115 for geospatial metadata, and leveraging AI for enhanced dataset searches. 

What analytical tools are employed in the Digital Twin Data center? Analytical tools include data fusion for combining datasets, AI and machine learning for data analysis, and predictive modeling for simulating system behaviors under different conditions.

Visit our Digital Twin FAQ page for more frequently asked questions about Digital Twins. You can also find more information here about the Digital Twin concept as a whole.

Contact us

Contact us now to learn more about our Digital Twin solutions.
