This article explains the design of the data management system used in digital twin construction.

As the wave of digitalization sweeps the world, data has become one of the most valuable assets of an enterprise. Effective data asset management not only helps enterprises tap into data value and optimize operational decisions, but also enhances their competitiveness in the market. To meet enterprises' growing and increasingly complex data management needs, this system was developed. It integrates an advanced data lake architecture and supporting tools to build a complete, efficient data asset management system spanning data collection, processing, storage, and application.


Data service technology - external data acquisition

This system has strong, extensive compatibility and shows excellent adaptability in the industrial Internet of Things. It can connect seamlessly with the various complex devices on site, such as PLCs (programmable logic controllers), DCSs (distributed control systems), intelligent modules, intelligent instruments, boards, and inverters. It supports multiple communication links such as COM, TCP, UDP, GPRS, programming ports, and USB, and can maintain a stable connection regardless of the transmission method the device uses. At the same time, it covers many standard protocols such as OPC, Modbus, BACnet, LonWorks, IEC101, IEC104, and DNP, ensuring compatibility with equipment from different industries and brands. For non-standard equipment, data collection can be achieved through customized development to meet the acquisition needs of an enterprise's special equipment. After collection, the data is transmitted over WebSocket or HTTP, parsed efficiently, and stored securely, and the three-dimensional scene is deeply integrated with real-time and historical data. For example, in a smart factory scenario, the current operating data of equipment is displayed in real time so that staff can grasp equipment status at any time. In addition, threshold alarms can be set on real-time data: when equipment data exceeds the normal range, the system immediately raises an alarm, helping staff discover hidden dangers at the first moment, take measures quickly, and effectively reduce production risk.
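To make the threshold-alarm idea concrete, below is a minimal sketch of consuming real-time device data over WebSocket and flagging out-of-range values. It assumes the gateway pushes JSON points over a WebSocket endpoint; the URL, field names, and thresholds are illustrative, not the system's actual interface.

```python
# Minimal sketch: consume real-time device data over WebSocket and raise
# threshold alarms. The endpoint URL, message fields, and thresholds are
# illustrative assumptions, not the described system's actual API.
import asyncio
import json

import websockets  # pip install websockets

# Hypothetical alarm thresholds per measured point
THRESHOLDS = {"motor_temperature_c": 85.0, "line_pressure_kpa": 620.0}

async def watch_device_stream(url: str) -> None:
    async with websockets.connect(url) as ws:
        async for raw in ws:
            point = json.loads(raw)               # e.g. {"tag": "...", "value": 91.2}
            limit = THRESHOLDS.get(point["tag"])
            if limit is not None and point["value"] > limit:
                # In the real system this would trigger the platform's alarm service
                print(f"ALARM: {point['tag']} = {point['value']} exceeds {limit}")

if __name__ == "__main__":
    asyncio.run(watch_device_stream("ws://gateway.example.local/realtime"))
```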


For video surveillance data access, precise docking with the on-site surveillance cameras is achieved. First, each camera is modeled at a 1:1 scale to accurately restore its appearance; then, according to the point positions on the detailed drawings, the camera models are accurately placed in the three-dimensional scene. By matching each location in the scene with a camera number, quick positioning is realized: when staff need to view the monitoring picture of a specific area, they only need to click the corresponding camera number in the three-dimensional scene to locate the camera quickly. The system has a built-in VLC component that can quickly parse and smoothly play the real-time video stream once it obtains the stream address (RTMP/RTSP). Users can flexibly switch between cameras in the video list, or directly click a camera model or nameplate in the scene to play its real-time stream, achieving all-round, around-the-clock, full-coverage online video monitoring of the site.
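As an illustration of how clicking a camera in the scene can resolve to a playable stream, here is a minimal sketch using python-vlc. The camera registry, coordinates, and RTSP URLs are assumptions for the example, and python-vlc stands in for the system's built-in VLC component.

```python
# Minimal sketch: map camera numbers placed in the 3D scene to their
# RTSP/RTMP stream addresses and play a stream with python-vlc.
# Camera IDs, coordinates, and URLs are illustrative assumptions.
import vlc  # pip install python-vlc (requires a local VLC installation)

CAMERA_REGISTRY = {
    "CAM-03": {"position": (12.5, 4.0, 3.2), "stream": "rtsp://192.0.2.10/live/cam03"},
    "CAM-07": {"position": (48.1, 9.5, 3.2), "stream": "rtsp://192.0.2.10/live/cam07"},
}

def play_camera(camera_id: str) -> vlc.MediaPlayer:
    """Called when the user clicks a camera model or nameplate in the scene."""
    stream_url = CAMERA_REGISTRY[camera_id]["stream"]
    player = vlc.MediaPlayer(stream_url)  # VLC parses RTSP/RTMP natively
    player.play()
    return player

if __name__ == "__main__":
    play_camera("CAM-07")
```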


Data Service Technology - Workbench Function

The workbench provides users with a convenient entrance to each functional module of the data center, like an integrated operation hub. Users can quickly access multiple core modules such as data source management, data development, project management, data factory, workflow monitoring, data cleaning, data mapping rule management, data service, service monitoring, subscription authorization, subscription review, etc. without tedious navigation and search. This greatly improves user operation efficiency, reduces the time wasted due to frequent page switching, and allows users to focus on data management.


The workbench comprehensively displays the business development of the platform and provides users with clear business insights through intuitive statistics. These cover the number of metadata tables collected, for example real-time statistics on how many metadata tables have been successfully collected in an enterprise's data integration project, so users can track collection progress; the number of service catalogs established, clearly presenting how many catalogs have been built for business departments so that business personnel can quickly find the services they need; the number of API data services, showing how many API services the platform provides externally and indicating its service output capability; the number of service subscription users, reflecting how many users are interested in and have subscribed to platform services; the number of service subscriptions, reflecting service popularity; the number of monitored workflows, helping operation and maintenance personnel track workflow running states; the number of workflow projects, clearly presenting how many workflow projects are in progress; the number of configured databases, so that database administrators can understand their status; and the total number of workflow execution tasks, giving project leaders a clear picture of the overall workflow task volume.


The system conducts in-depth analysis of workflow tasks and API data services. For workflow tasks, counting the total number of execution tasks reveals how busy the workflows are and how efficiently they run; for example, when the total number of tasks keeps growing and execution times become too long, the workflow configuration may need to be optimized. For API data services, the open APIs are classified and counted, for instance into data query and data update categories, helping developers understand how frequently different types of APIs are used so they can be optimized and maintained in a targeted manner.


The system supports entry of the data source connection information of each department of the organization and performs strict testing during entry to ensure that the data source is available. Taking data integration across multiple departments within an enterprise as an example, the data sources of departments such as marketing and finance are diverse, and the system supports table structure collection and management for databases including SQL Server, PostgreSQL, MySQL, Oracle, and Hive. Through built-in data source adapters and configuration information, stable connections can be established with all kinds of data sources. For example, a multinational company may use MySQL for its domestic business and Oracle for its overseas business; the system can be configured to connect stably to both, laying a solid foundation for subsequent data mapping and synchronization. It supports batch data integration and allows the field-level mapping relationship between the source table and the destination table to be customized to meet data integration needs in different business scenarios.
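A minimal sketch of validating a department's data source at entry time is shown below, using SQLAlchemy as a stand-in for the system's built-in data source adapters; the connection strings and department names are illustrative.

```python
# Minimal sketch: register a department's data source and verify connectivity
# before saving it. Connection strings are illustrative assumptions; the actual
# system uses its own built-in data source adapters.
from sqlalchemy import create_engine  # pip install sqlalchemy

# Hypothetical connection strings for two departments
DATA_SOURCES = {
    "marketing_mysql": "mysql+pymysql://user:pass@10.0.0.5:3306/marketing",
    "finance_oracle": "oracle+cx_oracle://user:pass@10.0.0.9:1521/FIN",
}

def test_data_source(name: str) -> bool:
    """Try to open a connection; success is treated as a passing availability test."""
    engine = create_engine(DATA_SOURCES[name], pool_pre_ping=True)
    try:
        with engine.connect():
            pass  # successfully opening a connection validates the data source
        return True
    except Exception as exc:  # report any connection failure
        print(f"Data source {name} failed validation: {exc}")
        return False

if __name__ == "__main__":
    print(test_data_source("marketing_mysql"))
```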


Data service technology - data mapping rule management

The data mapping rule management module is responsible for maintaining the mapping resources used to synchronize data from source databases. This process is like building a data bridge: it submits the fused metadata structure to the resource data fusion system, providing a reliable basis for subsequent data synchronization. For example, in an enterprise's digital transformation project, data in the old system must be migrated to the new data warehouse; this module analyzes and fuses the metadata of the source database and the target data warehouse to establish an accurate mapping relationship. It supports configuration of data mapping rules, so users can flexibly set the field correspondence between the source table and the target table according to business needs; batch configuration, so large numbers of data tables with similar mapping rules can be configured at one time, greatly improving efficiency; deletion of data mapping rules, so that rules which no longer apply after business changes can be removed in time; and table query, which makes it easy to find and view the configured mapping relationships. The mapping relationship between the source business system's data tables and the mid-platform data warehouse can be flexibly configured. Whether the task is batch mapping of multiple tables, single-table mapping, or streaming-table and ordinary-table mapping for streaming data, it can be handled easily to meet enterprises' complex data integration and management needs.
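The sketch below shows one plausible way to express a field-level mapping rule between a source table and a target warehouse table and to render it as a batch synchronization statement; the table names, fields, and rule format are assumptions, not the module's actual schema.

```python
# Minimal sketch: a field-level mapping rule between a source business table
# and a target warehouse table. Table names, fields, and transform hints are
# illustrative assumptions about how such a rule could be expressed.
mapping_rule = {
    "source_table": "crm.customer_info",
    "target_table": "dw.dim_customer",
    "sync_mode": "batch",            # or "stream" for streaming tables
    "field_mappings": [
        {"source": "cust_id",    "target": "customer_id"},
        {"source": "cust_name",  "target": "customer_name"},
        {"source": "created_at", "target": "create_time"},
    ],
}

def build_insert_select(rule: dict) -> str:
    """Render the mapping rule as an INSERT ... SELECT statement for batch sync."""
    targets = ", ".join(m["target"] for m in rule["field_mappings"])
    sources = ", ".join(m["source"] for m in rule["field_mappings"])
    return (f"INSERT INTO {rule['target_table']} ({targets}) "
            f"SELECT {sources} FROM {rule['source_table']}")

print(build_insert_select(mapping_rule))
```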

Data service technology - fast data collection management

The system provides efficient real-time data collection functions to meet enterprises' strict requirements for data timeliness. It supports creating collection tasks: users can quickly create new data collection tasks according to business needs; for example, an e-commerce company that wants to monitor commodity sales data in real time can add collection tasks for the sales database at any time. It supports editing collection tasks: when business rules change and the collection frequency or collected fields need to be adjusted, existing tasks can be edited easily. It supports deleting collection tasks: tasks that are no longer needed can be cleaned up in time to release system resources. It supports querying collection tasks: users can check a task's execution status, collection progress, and other information at any time. It also supports two modes, incremental collection and full collection: incremental collection suits scenarios with small data changes and only collects new or modified data, reducing data transmission and processing, while full collection suits data initialization or scenarios where the data must be fully refreshed.
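The difference between full and incremental collection can be illustrated with the following sketch, which builds the extraction query from a watermark column; the column name, table, and query style are illustrative assumptions rather than the platform's actual mechanism.

```python
# Minimal sketch contrasting full and incremental collection. The watermark
# column name and table are illustrative; the real platform drives this
# through configured collection tasks rather than hand-written SQL.
from datetime import datetime

def build_collection_query(table: str, mode: str,
                           last_watermark: datetime | None = None) -> str:
    if mode == "full":
        # Full collection: pull every row, used for initialization or a full refresh
        return f"SELECT * FROM {table}"
    if mode == "incremental":
        # Incremental collection: only rows added or modified since the last run
        assert last_watermark is not None, "incremental mode needs a watermark"
        return (f"SELECT * FROM {table} "
                f"WHERE updated_at > '{last_watermark.isoformat()}'")
    raise ValueError(f"unknown collection mode: {mode}")

print(build_collection_query("sales_orders", "incremental",
                             last_watermark=datetime(2024, 5, 1)))
```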


The system comprehensively manages collection tasks to ensure that data collection runs stably. This includes adding collection tasks, where details such as task name, collection cycle, and data source can be set at creation; enabling collection tasks, so that once configuration is complete the task can be started to collect data; disabling collection tasks, so that collection can be paused at any time; editing collection tasks, so that task parameters can be modified; deleting collection tasks, so that unused tasks can be removed completely; viewing collection logs, which record details of task execution such as collection time, amount of collected data, and whether errors occurred; and monitoring task status logs, to grasp the running status of tasks in real time and discover and resolve problems promptly. The established data mapping relationships can be scheduled in a unified way, ensuring efficient data collection and avoiding data conflicts or delayed collection.
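A minimal sketch of a collection task record with the lifecycle operations described above (create, enable, disable, log) follows; the field names and states are illustrative assumptions rather than the platform's actual task model.

```python
# Minimal sketch: a collection task record with a create/enable/disable
# lifecycle and a simple execution log. Fields and states are illustrative.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class CollectionTask:
    name: str
    data_source: str
    schedule_cron: str                  # collection cycle, e.g. "*/5 * * * *"
    enabled: bool = False
    logs: list[str] = field(default_factory=list)

    def enable(self) -> None:
        self.enabled = True
        self.logs.append(f"{datetime.now().isoformat()} task enabled")

    def disable(self) -> None:
        self.enabled = False
        self.logs.append(f"{datetime.now().isoformat()} task disabled")

task = CollectionTask("sales_orders_sync", "marketing_mysql", "*/5 * * * *")
task.enable()
print(task.logs)
```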


Data Service Engine

As the core of data processing, the data service engine is like the brain of the entire data asset management system, realizing data integration, analysis, visualization, and storage, and providing all-round support for the flexible application of data. For data integration, it can combine data from different data sources to break down data silos; for data analysis, it provides powerful analysis tools and algorithms to help users mine valuable information from massive data; for data visualization, it presents data in an easy-to-understand form through intuitive charts and graphics; for data storage, it adopts an efficient storage architecture to ensure the secure storage of and fast access to data.


The engine provides a detailed metadata overview to give users a comprehensive, clear understanding of data assets. This includes the total number of metadata items in the system; technical metadata statistics, covering technical attributes such as data type, data format, and storage location; business metadata statistics, covering the business meaning and business rules of the data; the distribution of technical metadata across the entire network, showing how metadata is distributed across different technical fields; detailed views of technical metadata and in-depth analysis of business metadata; and metadata proportion statistics, calculating the share of each metadata type in the whole. It supports both real-time and periodic metadata collection, and users can set the collection method flexibly according to business needs: real-time collection for business data with high timeliness requirements, periodic collection for data that changes relatively slowly. Users can configure data source parameters and scheduled collection tasks and use the built-in collection adapters to automate collection, greatly reducing manual work; the historical execution of collection tasks can also be viewed, which is convenient for tracing tasks and troubleshooting problems. It supports independent metadata model management based on the CWM metadata standard, managing and storing business metadata and technical metadata in a unified way to ensure consistency and standardization. It supports multiple types of data models, such as relational and non-relational, to meet modeling needs in different business scenarios, and the business attributes of the metamodel can be customized so that the metadata model better matches the enterprise's actual business. Collected metadata can be viewed, modified, organized into custom hierarchical directories, and backed up, and analyzed through full-chain relationship viewing, lineage viewing, and impact analysis, together with metadata query and full-chain analysis. Through these functions, users can understand the full history of their metadata and manage and utilize data assets better.
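As one example of how technical metadata might be collected from a relational source, the sketch below reads table and column structure from information_schema using SQLAlchemy; the connection URL and schema name are illustrative, and in practice the platform's own collection adapters perform this step.

```python
# Minimal sketch: collect technical metadata (table and column structure) from
# a relational source using information_schema. Connection details are
# illustrative assumptions.
from sqlalchemy import create_engine, text

def collect_table_metadata(conn_url: str, schema: str) -> list[dict]:
    engine = create_engine(conn_url)
    query = text("""
        SELECT table_name, column_name, data_type, is_nullable
        FROM information_schema.columns
        WHERE table_schema = :schema
        ORDER BY table_name, ordinal_position
    """)
    with engine.connect() as conn:
        rows = conn.execute(query, {"schema": schema}).mappings().all()
    return [dict(r) for r in rows]

# Example: metadata = collect_table_metadata("mysql+pymysql://user:pass@host/db", "sales")
```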


Data resource aggregation

The platform classifies and integrates the data it requires, designs the database, and standardizes storage according to unified data standards. This process is like sorting a library's scattered books and shelving them, greatly improving the efficiency of data use. In practice, the data is first carefully classified, for example into customer data, product data, and sales data; then, according to the unified data standards, it is cleaned, converted, and processed to ensure consistency and accuracy; finally, the processed data is stored in the database according to the designed database structure.
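A minimal cleaning-and-standardization sketch using pandas is shown below; the column names, rules, and target table are illustrative assumptions about what "cleaned, converted and processed" could look like for customer data.

```python
# Minimal sketch: clean and standardize a batch of customer records before
# loading them into the designed database structure. Column names, rules, and
# the target table are illustrative assumptions.
import pandas as pd  # pip install pandas

def standardize_customers(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw.copy()
    df["customer_name"] = df["customer_name"].str.strip()          # remove stray whitespace
    df["phone"] = df["phone"].str.replace(r"\D", "", regex=True)   # keep digits only
    df = df.drop_duplicates(subset=["customer_id"])                # one row per customer
    df["created_at"] = pd.to_datetime(df["created_at"], errors="coerce")
    return df.dropna(subset=["customer_id", "created_at"])         # discard unusable rows

# cleaned = standardize_customers(pd.read_csv("customer_extract.csv"))
# cleaned.to_sql("dim_customer", engine, if_exists="append", index=False)
```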


The platform provides a convenient resource directory query function and supports fuzzy matching on keywords such as directory name and directory code, so users can quickly find the standards they need. For example, when users need to find a certain industry standard, they only need to enter the relevant keywords in the search box and the system quickly locates the matching standards. National standards, industry standards, local standards, standard codes, creation time, creator, directory description, and other information can all be queried, providing users with comprehensive standard details. Standard sets can be generated by directly citing standard metadata, citing metadata, batch import, customization, and other methods. For example, when formulating data standards within an enterprise, existing national standard elements can be referenced directly to quickly generate a standard set that meets the enterprise's needs, or the enterprise's own metadata can be referenced and then adjusted and optimized into a standard set. Standard elements are managed comprehensively: adding new standard elements when the enterprise has new business needs or standards are updated; publishing confirmed standard elements for internal use; deactivating standard elements that no longer apply; restoring them when they are needed again; deleting expired or useless standard elements completely; exporting standard elements so they can be shared with other systems or departments; querying standard elements to quickly find and view their details; and editing standard elements to modify and improve their attributes. Multi-source standard files can be imported, and, with reference to national standard management platform specifications, national standards, industry standards, local standards, and enterprise standards are managed uniformly as the classification basis for data standards. Standard sets can be reviewed singly or in batches, with the review status, process, and personnel recorded to ensure the quality and compliance of the standard sets. Custom standard rules are also supported: when configuring data standards, standard elements can be applied with verification methods including numeric value domains, data dictionaries, and regular expressions to ensure data accuracy and consistency.
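To illustrate the three verification methods mentioned (numeric value domains, data dictionaries, regular expressions), here is a minimal sketch of applying standard rules to field values; the concrete rules are assumptions for the example.

```python
# Minimal sketch: apply the three kinds of verification the text mentions
# (numeric value domain, data dictionary, regular expression) to a field value.
# The concrete rules below are illustrative assumptions.
import re

STANDARD_RULES = {
    "order_amount": {"type": "range", "min": 0, "max": 1_000_000},
    "gender_code":  {"type": "dictionary", "allowed": {"M", "F", "U"}},
    "phone_number": {"type": "regex", "pattern": r"^1\d{10}$"},
}

def validate(field: str, value) -> bool:
    rule = STANDARD_RULES[field]
    if rule["type"] == "range":
        return rule["min"] <= float(value) <= rule["max"]
    if rule["type"] == "dictionary":
        return value in rule["allowed"]
    if rule["type"] == "regex":
        return re.fullmatch(rule["pattern"], str(value)) is not None
    return False

print(validate("phone_number", "13800138000"))  # True
```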


Data quality inspection

The system comprehensively supports inspection of basic data quality, geometric accuracy, graphic quality, attribute accuracy, logical consistency, and integrity, and provides scientific data quality inspection methods. Basic data quality inspection mainly checks fundamental attributes such as the accuracy, completeness, and consistency of the data; geometric accuracy inspection targets scenarios involving spatial data, ensuring the accuracy of the data's geometric shape and position; graphic quality inspection focuses on the clarity and completeness of graphic data; attribute accuracy inspection ensures that attribute values are correct; logical consistency inspection checks whether the logical relationships between data are correct; and integrity inspection ensures that no data is missing.


It has a rich rule template library that provides powerful tool support for data quality inspection. There are 21 built-in technical quality inspection rule templates, such as non-empty rules, which ensure that data fields must not contain empty values; duplicate rules, which detect whether duplicate records exist in the data; and data output timeliness rules, which check whether data is generated on time. There are 37 statistical quality inspection rule templates, covering table row count (whether the number of rows in a table meets expectations), table size (whether the storage size of a data table is reasonable), and variance fluctuation and volatility checks (judging the stability of data by analyzing its variance fluctuation and volatility). It also supports custom business quality inspection rule templates: users can flexibly set inspection rules according to their own business needs; for example, the financial industry can customize risk assessment inspection rules for transaction data. It supports AI quality inspection rule templates: the algorithm platform builds an algorithm model from the quality inspection requirements, and the quality platform calls the model to perform inspection, raising the intelligence level of quality inspection. It supports drag-and-drop configuration of quality inspection schemes: users can select multiple inspection components, such as numeric range, non-duplicate, and non-empty rules, according to data characteristics and business needs, and easily build an inspection process. WHERE query conditions or partition expressions can be customized to inspect specific data or incremental data; for example, an e-commerce company can inspect the quality of sales data within a specific time period. Automatic repair rules can also be configured so that the system repairs data quality problems automatically when they are found. The execution of data quality inspection tasks and overall quality can be monitored, with support for viewing audit results, quality details, and problem data details, as well as batch repair and data rectification. Visual quality reports can be generated from the quality audit rules, analyzing data quality across dimensions such as completeness, standardization, accuracy, relevance, uniqueness, consistency, and timeliness; reports can be exported to PDF and produced daily, weekly, monthly, quarterly, and annually, so users can track and evaluate data quality over the long term.
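As a concrete illustration of the non-empty and duplicate rule templates, the sketch below runs both checks as SQL against a target table via SQLAlchemy; the table, key column, and connection URL are illustrative assumptions.

```python
# Minimal sketch of two built-in-style quality checks (non-empty and duplicate)
# expressed as SQL and run against a target table. Table and column names are
# illustrative assumptions.
from sqlalchemy import create_engine, text

def run_quality_checks(conn_url: str, table: str, key_column: str) -> dict:
    engine = create_engine(conn_url)
    checks = {
        # Non-empty rule: the key field must not contain NULLs
        "null_keys": f"SELECT COUNT(*) FROM {table} WHERE {key_column} IS NULL",
        # Duplicate rule: the key field must be unique
        "duplicate_keys": (f"SELECT COUNT(*) FROM (SELECT {key_column} FROM {table} "
                           f"GROUP BY {key_column} HAVING COUNT(*) > 1) d"),
    }
    results = {}
    with engine.connect() as conn:
        for name, sql in checks.items():
            results[name] = conn.execute(text(sql)).scalar_one()
    return results  # non-zero counts indicate quality problems

# print(run_quality_checks("postgresql+psycopg2://user:pass@host/dw",
#                          "dim_customer", "customer_id"))
```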


Data asset application management

The system supports processing of vector data, image data, elevation model data, geographic entity data, three-dimensional model data, and three-dimensional scene data, and the construction of a spatiotemporal database from them. In geographic information system (GIS) applications, vector data is cleaned, converted, and optimized with professional algorithms and tools, then stored in the database according to the spatiotemporal database design, making subsequent spatial analysis and query convenient. Image data is enhanced and corrected, and index structures such as image pyramids are built to achieve efficient storage and fast access. Elevation model data is processed into a digital elevation model (DEM) and stored in the database. Geographic entity data is integrated into the spatiotemporal database through semantic annotation and association. Three-dimensional model data and three-dimensional scene data are stored in a dedicated database after optimization and format conversion, providing data support for applications such as virtual reality (VR) and augmented reality (AR).


In terms of master data modeling, the system supports creating master data models in three ways: standard import, import of an existing model, and manual creation. For example, during digital transformation an enterprise can directly import an industry-standard master data model framework and adjust and improve it; it can import its own existing master data model for optimization and upgrading; or it can manually create a new master data model based on its business characteristics and needs. Model changes are version-controlled so that every change is recorded and traceable. Mainstream modeling tool functions are supported, including model design, through an intuitive graphical interface that makes it easy to design the structure of the master data model; generation of change SQL statements, automatically producing the corresponding SQL from model changes to facilitate database updates; reverse engineering of database data, extracting data structures from existing databases to generate a master data model; and version management, for managing and comparing different versions of the master data model. In terms of master data integration, four integration methods are provided for different scenarios: registration, integration, coexistence, and centralization. For example, in an enterprise merger or acquisition, the integration or centralization method can be used to deeply merge the acquired company's master data with the enterprise's own; for data sharing with partners, the registration or coexistence method can be used. Batch integration and message-based integration are both supported. Mapping relationships can be configured between production data and master data and between original data and reference data, mapping original master data from different sources to the core master data. In the master data merging process, the system supports cross-source data integration, removal of duplicate data, data similarity matching based on NLP algorithms, and determination of master data field values through a multi-domain trust recommendation framework, and provides two matching modes: similarity and equality. Master data and reference data can be maintained comprehensively, including creation and maintenance of flat-table data and tree-structured data, with multiple operation methods such as sub-table lookup drop-downs, drop-down tree menus, drop-down multi-value display, and true/false value display. Newly created and updated master data can be distributed to the target business systems in real time or on a schedule to ensure data consistency, completeness, and accuracy, for example distributing customer master data to sales, customer service, and other business systems in a timely manner so that all departments use consistent customer data.
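To illustrate the similarity matching mode used in master data merging, here is a minimal sketch that matches records from two sources on name similarity; difflib stands in for the NLP-based matching described, and the threshold and sample records are illustrative.

```python
# Minimal sketch: merge master data records from two sources using a simple
# similarity match on customer names. difflib stands in for the NLP-based
# matching the text describes; thresholds and fields are illustrative.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def match_records(source_a: list[dict], source_b: list[dict],
                  threshold: float = 0.85) -> list[tuple[dict, dict, float]]:
    """Return candidate pairs whose names are similar enough to be the same entity."""
    matches = []
    for rec_a in source_a:
        for rec_b in source_b:
            score = similarity(rec_a["name"], rec_b["name"])
            if score >= threshold:
                matches.append((rec_a, rec_b, score))
    return matches

crm = [{"id": "C001", "name": "Acme Manufacturing Co., Ltd."}]
erp = [{"id": "E917", "name": "ACME Manufacturing Co Ltd"}]
print(match_records(crm, erp))
```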


Database Management

The system has the ability to manage three-dimensional entity libraries, indicator libraries, and model libraries, supports the configuration of Hive, MySQL, SQL Server, PostgreSQL, Oracle, TDengine, Kingbase, and other types of data warehouses, and maintains them through an interface. Users can easily complete data warehouse configuration, creation, deletion, and other operations through a simple, intuitive graphical interface. For example, in an enterprise's data warehouse construction project, data administrators can configure and maintain the required types of data warehouses through this interface according to business needs.
