1. Overview

In privatized IoT platform projects, many existing devices cannot be connected to the IoT platform directly because of heterogeneous buses, multiple Ethernet standards, and diverse protocols. Integrating such a large volume of data is difficult, and both the platform side and the device side face a great deal of costly, customized development. This makes it hard to persuade customers or equipment manufacturers to adapt and connect their existing equipment, so devices cannot be connected to the IoT platform and the platform cannot manage all of the enterprise's equipment data in a unified way. The data collection systems that already exist inside enterprises are mostly "chimney-style": each manufacturer's system connects only to its own equipment, so data silos are a prominent problem. Each "chimney" uses a different data format, and the customized collection code cannot be reused, which is time-consuming, labor-intensive, and hard to sustain across multiple projects at the same time. Besides device data, there is also a need to collect business data, yet traditional IoT systems can only collect device data and cannot integrate business data.

2. Technology Selection

Digital integration technology extracts data from different systems, cleans and transforms it, and loads it into the final target system, thereby connecting the various business silos, achieving data interconnection and interoperability, and helping enterprises with digital transformation. Because most data processing in IoT scenarios must happen in real time, the implementation needs real-time processing capabilities. Real-time computing is also called stream computing and is represented by big data technologies such as Storm, Spark Streaming, and Flink. Computing engines have kept evolving: from the first generation, Hadoop MapReduce, to the second generation, Spark, to the third generation, Flink; from batch processing, to micro-batching, to true streaming computation. Apache Flink is an open-source stream processing framework for distributed, high-performance, highly available data stream applications. It can process both bounded and unbounded data streams. Unbounded data streams are truly streaming data, so Flink supports stream computing. Flink can be deployed in a variety of cluster environments and can quickly process data of any size. The Flink framework has powerful streaming ETL capabilities, which it provides through a rich set of operators.

2.1 Source Operator

A Flink program adds data sources with StreamExecutionEnvironment.addSource(source). Flink already ships with a number of implemented source functions. You can also write a non-parallel custom source by implementing SourceFunction, or a parallel one by implementing the ParallelSourceFunction interface or extending RichParallelSourceFunction. There are four main types of sources for Flink stream processing: collection-based sources, file-based sources, socket-based sources, and custom sources. Digital integration mainly uses custom Source operators to implement rich data extraction functions, as in the sketch below.
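A minimal sketch of a custom non-parallel source follows; the counter payload, class names, and job name are illustrative assumptions, not part of the original project.

```scala
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.scala._

// A minimal custom non-parallel source: emits an increasing counter once per second.
class CounterSource extends SourceFunction[Long] {
  @volatile private var running = true

  override def run(ctx: SourceFunction.SourceContext[Long]): Unit = {
    var i = 0L
    while (running) {
      ctx.collect(i)   // emit one element downstream
      i += 1
      Thread.sleep(1000)
    }
  }

  override def cancel(): Unit = { running = false }
}

object CustomSourceSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[Long] = env.addSource(new CounterSource)
    stream.print()
    env.execute("custom-source-sketch")
  }
}
```

A parallel source would be written the same way by implementing ParallelSourceFunction or extending RichParallelSourceFunction.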
2.2 Transform Operator

① map: converts each element in the DataStream into another element, for example multiplying each element by 5: dataStream.map { x => x * 5 }

② flatMap: takes one element and produces zero, one, or more elements, for example splitting a sentence into words: dataStream.flatMap { str => str.split(" ") }

③ filter: evaluates a Boolean function for each element and keeps the elements for which the function returns true, for example removing zero values: dataStream.filter { x => x != 0 }

Flink of course has many other transformation operators, such as keyBy, reduce, and aggregations. Through this rich set of transformation operators, Flink can perform data cleaning and conversion.

2.3 Sink Operator

Flink's sink operators support writing data to local files, local collections, and HDFS. They also support sinking to Kafka, MySQL, Redis, and custom sinks. The cleaned and transformed data is written into the target system through a custom Sink operator.

3. Digital Integration Implementation

The implementation process is as follows.

The first step is to abstractly define the basic control classes. On top of Flink, digital integration abstractly defines three categories of basic functional controls, and each category can be specialized into concrete sub-class controls for different functions: data source controls, data processing controls, and data output controls.

Data source control: the Source operator is abstracted into a data source control class responsible for extracting data, and a corresponding configuration specification is formulated. To use it, you only need to write a configuration that follows the specification; the system creates a concrete instance from the configuration file to perform the data extraction.

Data processing control: according to the different basic functional requirements, the Transform operator is abstracted into a data processing control class with its own configuration specification. To use it, you only need to write a configuration that follows the specification; the system creates the corresponding instance from the configuration to perform the data processing.

Data output control: the Sink operator is abstracted into a data output control class with its own configuration specification. To use it, you only need to write a configuration that follows the specification; the system creates an instance from the configuration to output the data.

At the same time, the system explicitly defines the data format that flows between Flink operators as the internal flow format, and each basic functional control outputs data in that format according to its configuration.

The second step is to formulate concrete configuration specifications for the abstractly defined basic functional controls. After these two steps of specification definition, a given kind of processing only needs its basic functional control specification to be defined once within the same system.

For example, a Kafka consumer requires configuration such as the Kafka cluster address, consumer group, topic, partition key, and consumer starting position. A Kafka consumer control only needs to be defined and implemented once, and can then be reused throughout the system. Each time a data processing workflow is configured, the Kafka consumer control class is reused, and the Kafka consumer needed by that workflow is instantiated from the cluster address, topic, and other settings supplied by the newly configured source system. Development therefore changes from writing Kafka consumer code countless times to implementing the Kafka consumer control code once, which greatly reduces development time and cost. A rough sketch of such a control is shown below.
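As a hypothetical illustration only, the sketch below models the configuration of a Kafka consumer source control as a case class and instantiates a Flink Kafka consumer from it. The field names, topic, addresses, and connector choice (FlinkKafkaConsumer from flink-connector-kafka) are assumptions, not the project's actual specification.

```scala
import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

// Hypothetical configuration model for a "Kafka consumer" data source control.
case class KafkaSourceControlConfig(
  bootstrapServers: String,
  groupId: String,
  topic: String,
  startFromLatest: Boolean = true
)

object KafkaSourceControl {
  // Instantiate a concrete Flink source from the control configuration.
  def build(cfg: KafkaSourceControlConfig): FlinkKafkaConsumer[String] = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", cfg.bootstrapServers)
    props.setProperty("group.id", cfg.groupId)
    val consumer = new FlinkKafkaConsumer[String](cfg.topic, new SimpleStringSchema(), props)
    if (cfg.startFromLatest) consumer.setStartFromLatest() else consumer.setStartFromEarliest()
    consumer
  }
}

// Usage: every workflow reuses the same control; only the configuration changes.
object KafkaControlSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val cfg = KafkaSourceControlConfig("kafka:9092", "integration-group", "device-status")
    env.addSource(KafkaSourceControl.build(cfg)).print()
    env.execute("kafka-source-control-sketch")
  }
}
```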
The third step is to implement the basic functional controls, such as HTTP request, Kafka producer, data traversal, conditional loop, data mapping, and MySQL write controls, and then assemble the corresponding configuration execution scripts according to the logical order in which the basic functions run, orchestrating a complete Flink stream processing pipeline that performs data integration between different systems.

For example, in one privatized project, the status of smart door locks held in the equipment manufacturer's cloud platform had to be synchronized to the company's own cloud platform so that the locks could be controlled there. Because the smart door lock protocol is incompatible with the data collection protocol of the company's own IoT platform, the locks cannot be connected directly. Instead, the manufacturer's cloud platform pushes the door lock status information, and the company's IoT platform provides an interface that receives the pushed data, completing the synchronization of door lock status. In this case, an HTTP listener control exposing an HTTP POST interface is implemented with a custom Flink Source operator to receive the data pushed by the manufacturer's cloud platform. An IF-branch control compares the received door lock status with the status stored in the company's own cloud platform, keyed by door lock ID, and passes only the records whose status has changed on to the downstream Sink operator. A custom Sink operator then uploads the data to the company's own cloud platform, completing the cross-platform update of smart door lock status.

The fourth step is to generate a directed acyclic graph (DAG) from the established execution logic and submit it to Flink for execution. The details are as follows. The basic functional controls are placed at the vertices of the DAG. The whole graph contains exactly one data source control, to which no other basic functional control may send data; there can be multiple data output controls and data processing controls, corresponding to multiple branches of processing logic. The direction of data transmission forms the edges of the DAG, connecting and organizing the different logical orders in which data moves across systems. This produces a complete data transmission and processing pipeline, which is implemented in full and submitted to Flink for execution, realizing the complete digital integration function of data extraction, transformation, and output. A simplified end-to-end sketch of the door lock example is given below.
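Putting the door lock example together, the following is a simplified, self-contained sketch under stated assumptions: the event fields, the endpoint URL, and the use of fromElements as a stand-in for the real HTTP-listener source control are all illustrative, not the project's actual implementation.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
import org.apache.flink.streaming.api.scala._

// Simplified door lock status event; the real payload format is project-specific.
case class LockStatus(lockId: String, status: String, lastKnownStatus: String)

// Custom Sink control: POSTs changed status records to the receiving interface
// of the company's own cloud platform (endpoint URL is an assumption).
class HttpPostSink(endpoint: String) extends RichSinkFunction[LockStatus] {
  override def invoke(value: LockStatus): Unit = {
    val body = s"""{"lockId":"${value.lockId}","status":"${value.status}"}"""
    val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
    try {
      conn.setRequestMethod("POST")
      conn.setDoOutput(true)
      conn.setRequestProperty("Content-Type", "application/json")
      val out = conn.getOutputStream
      out.write(body.getBytes(StandardCharsets.UTF_8))
      out.close()
      conn.getResponseCode // force the request; production code would check and retry
    } finally conn.disconnect()
  }
}

object DoorLockSyncSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Stand-in for the custom HTTP-listener Source control that receives
    // pushes from the device manufacturer's cloud platform.
    val pushed: DataStream[LockStatus] = env.fromElements(
      LockStatus("lock-001", "LOCKED", "UNLOCKED"),
      LockStatus("lock-002", "LOCKED", "LOCKED")
    )

    pushed
      .filter(s => s.status != s.lastKnownStatus) // IF-branch control: keep changed statuses only
      .addSink(new HttpPostSink("https://own-platform.example.com/api/lock-status"))

    env.execute("door-lock-status-sync-sketch")
  }
}
```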
4. Conclusion

Finally, let us summarize the implementation of digital integration capabilities based on Flink. Thanks to Flink's rich ETL and data integration capabilities and the simple exchange of data between operators, we abstract Flink's three types of operators into three categories of basic functional controls that implement the different stages of data processing. According to the functional requirements, Source operators extract data from multiple data sources such as message queues, APIs, and databases; the rich Transform operators implement data cleaning, filtering, and conversion; finally, Sink operators deliver data in the target format to the target systems through channels such as message queues, databases, and APIs. In summary, digital integration based on Flink is achievable, functionally rich, and extensible.