1. Overview

In privatized IoT platform projects, many existing devices cannot be connected to the IoT platform directly because of heterogeneous buses, multiple Ethernet standards, and diverse protocols. Integrating such a large volume of data is difficult, and both the platform side and the device side face a great deal of costly, customized development. This makes it hard to persuade customers or equipment manufacturers to adapt and connect their existing equipment, so devices cannot be connected to the IoT platform and the platform cannot manage all of the enterprise's equipment data in a unified way. The data collection systems that already exist inside enterprises are mostly "chimney-style": each manufacturer's system connects only to its own equipment, so data silos are a prominent problem. Each "chimney" uses a different data format, and the customized collection code cannot be reused, which is time-consuming, labor-intensive, and hard to sustain across multiple projects at the same time. Besides device data, there is also a need to collect business data, yet traditional IoT systems can only collect device data and cannot integrate business data.

2. Technology Selection

Digital integration technology extracts data from different systems, cleans and transforms it, and loads it into the final target system, thereby connecting the various business silos, achieving data interconnection and interoperability, and helping enterprises with digital transformation. Because most data processing in IoT scenarios must happen in real time, the implementation needs real-time processing capabilities. Real-time computing is also called stream computing and is represented by big data technologies such as Storm, Spark Streaming, and Flink. Computing engines have kept evolving: from the first generation, Hadoop MapReduce, to the second generation, Spark, to the third generation, Flink; from batch processing, to micro-batching, to true streaming computation. Apache Flink is an open-source stream processing framework for distributed, high-performance, highly available data stream applications. It can process both bounded and unbounded data streams. Unbounded data streams are truly streaming data, so Flink supports stream computing. Flink can be deployed in a variety of cluster environments and can quickly process data of any size. The Flink framework has powerful streaming ETL capabilities, which it provides through a rich set of operators.

2.1 Source Operator

A Flink program adds data sources with StreamExecutionEnvironment.addSource(source). Flink already ships with a number of implemented source functions. You can also write a non-parallel custom source by implementing SourceFunction, or a parallel one by implementing the ParallelSourceFunction interface or extending RichParallelSourceFunction. There are four main types of sources for Flink stream processing: collection-based sources, file-based sources, socket-based sources, and custom sources. Digital integration mainly uses custom Source operators to implement rich data extraction functions, as in the sketch below.
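A minimal sketch of a custom non-parallel source follows; the counter payload, class names, and job name are illustrative assumptions, not part of the original project.

```scala
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.scala._

// A minimal custom non-parallel source: emits an increasing counter once per second.
class CounterSource extends SourceFunction[Long] {
  @volatile private var running = true

  override def run(ctx: SourceFunction.SourceContext[Long]): Unit = {
    var i = 0L
    while (running) {
      ctx.collect(i)   // emit one element downstream
      i += 1
      Thread.sleep(1000)
    }
  }

  override def cancel(): Unit = { running = false }
}

object CustomSourceSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val stream: DataStream[Long] = env.addSource(new CounterSource)
    stream.print()
    env.execute("custom-source-sketch")
  }
}
```

A parallel source would be written the same way by implementing ParallelSourceFunction or extending RichParallelSourceFunction.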
2.2 Transform Operator

① map: converts each element in the DataStream into another element, for example multiplying each element by 5: dataStream.map { x => x * 5 }

② flatMap: takes one element and produces zero, one, or more elements, for example splitting a sentence into words: dataStream.flatMap { str => str.split(" ") }

③ filter: evaluates a Boolean function for each element and keeps the elements for which the function returns true, for example removing zero values: dataStream.filter { x => x != 0 }

Flink of course has many other transformation operators, such as keyBy, reduce, and aggregations. Through this rich set of transformation operators, Flink can perform data cleaning and conversion.

2.3 Sink Operator

Flink's sink operators support writing data to local files, local collections, and HDFS. They also support sinking to Kafka, MySQL, Redis, and custom sinks. The cleaned and transformed data is written into the target system through a custom Sink operator.

3. Digital Integration Implementation

The implementation process is as follows.

The first step is to abstractly define the basic control classes. On top of Flink, digital integration abstractly defines three categories of basic functional controls, and each category can be specialized into concrete sub-class controls for different functions: data source controls, data processing controls, and data output controls.

Data source control: the Source operator is abstracted into a data source control class responsible for extracting data, and a corresponding configuration specification is formulated. To use it, you only need to write a configuration that follows the specification; the system creates a concrete instance from the configuration file to perform the data extraction.

Data processing control: according to the different basic functional requirements, the Transform operator is abstracted into a data processing control class with its own configuration specification. To use it, you only need to write a configuration that follows the specification; the system creates the corresponding instance from the configuration to perform the data processing.

Data output control: the Sink operator is abstracted into a data output control class with its own configuration specification. To use it, you only need to write a configuration that follows the specification; the system creates an instance from the configuration to output the data.

At the same time, the system explicitly defines the data format that flows between Flink operators as the internal flow format, and each basic functional control outputs data in that format according to its configuration.

The second step is to formulate concrete configuration specifications for the abstractly defined basic functional controls. After these two steps of specification definition, a given kind of processing only needs its basic functional control specification to be defined once within the same system.

For example, a Kafka consumer requires configuration such as the Kafka cluster address, consumer group, topic, partition key, and consumer starting position. A Kafka consumer control only needs to be defined and implemented once, and can then be reused throughout the system. Each time a data processing workflow is configured, the Kafka consumer control class is reused, and the Kafka consumer needed by that workflow is instantiated from the cluster address, topic, and other settings supplied by the newly configured source system. Development therefore changes from writing Kafka consumer code countless times to implementing the Kafka consumer control code once, which greatly reduces development time and cost. A rough sketch of such a control is shown below.
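As a hypothetical illustration only, the sketch below models the configuration of a Kafka consumer source control as a case class and instantiates a Flink Kafka consumer from it. The field names, topic, addresses, and connector choice (FlinkKafkaConsumer from flink-connector-kafka) are assumptions, not the project's actual specification.

```scala
import java.util.Properties
import org.apache.flink.api.common.serialization.SimpleStringSchema
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer

// Hypothetical configuration model for a "Kafka consumer" data source control.
case class KafkaSourceControlConfig(
  bootstrapServers: String,
  groupId: String,
  topic: String,
  startFromLatest: Boolean = true
)

object KafkaSourceControl {
  // Instantiate a concrete Flink source from the control configuration.
  def build(cfg: KafkaSourceControlConfig): FlinkKafkaConsumer[String] = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", cfg.bootstrapServers)
    props.setProperty("group.id", cfg.groupId)
    val consumer = new FlinkKafkaConsumer[String](cfg.topic, new SimpleStringSchema(), props)
    if (cfg.startFromLatest) consumer.setStartFromLatest() else consumer.setStartFromEarliest()
    consumer
  }
}

// Usage: every workflow reuses the same control; only the configuration changes.
object KafkaControlSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val cfg = KafkaSourceControlConfig("kafka:9092", "integration-group", "device-status")
    env.addSource(KafkaSourceControl.build(cfg)).print()
    env.execute("kafka-source-control-sketch")
  }
}
```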
The third step is to implement the basic functional controls, such as HTTP request, Kafka producer, data traversal, conditional loop, data mapping, and MySQL write controls, and then assemble the corresponding configuration execution scripts according to the logical order in which the basic functions run, orchestrating a complete Flink stream processing pipeline that performs data integration between different systems.

For example, in one privatized project, the status of smart door locks held in the equipment manufacturer's cloud platform had to be synchronized to the company's own cloud platform so that the locks could be controlled there. Because the smart door lock protocol is incompatible with the data collection protocol of the company's own IoT platform, the locks cannot be connected directly. Instead, the manufacturer's cloud platform pushes the door lock status information, and the company's IoT platform provides an interface that receives the pushed data, completing the synchronization of door lock status. In this case, an HTTP listener control exposing an HTTP POST interface is implemented with a custom Flink Source operator to receive the data pushed by the manufacturer's cloud platform. An IF-branch control compares the received door lock status with the status stored in the company's own cloud platform, keyed by door lock ID, and passes only the records whose status has changed on to the downstream Sink operator. A custom Sink operator then uploads the data to the company's own cloud platform, completing the cross-platform update of smart door lock status.

The fourth step is to generate a directed acyclic graph (DAG) from the established execution logic and submit it to Flink for execution. The details are as follows. The basic functional controls are placed at the vertices of the DAG. The whole graph contains exactly one data source control, to which no other basic functional control may send data; there can be multiple data output controls and data processing controls, corresponding to multiple branches of processing logic. The direction of data transmission forms the edges of the DAG, connecting and organizing the different logical orders in which data moves across systems. This produces a complete data transmission and processing pipeline, which is implemented in full and submitted to Flink for execution, realizing the complete digital integration function of data extraction, transformation, and output. A simplified end-to-end sketch of the door lock example is given below.
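Putting the door lock example together, the following is a simplified, self-contained sketch under stated assumptions: the event fields, the endpoint URL, and the use of fromElements as a stand-in for the real HTTP-listener source control are all illustrative, not the project's actual implementation.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction
import org.apache.flink.streaming.api.scala._

// Simplified door lock status event; the real payload format is project-specific.
case class LockStatus(lockId: String, status: String, lastKnownStatus: String)

// Custom Sink control: POSTs changed status records to the receiving interface
// of the company's own cloud platform (endpoint URL is an assumption).
class HttpPostSink(endpoint: String) extends RichSinkFunction[LockStatus] {
  override def invoke(value: LockStatus): Unit = {
    val body = s"""{"lockId":"${value.lockId}","status":"${value.status}"}"""
    val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
    try {
      conn.setRequestMethod("POST")
      conn.setDoOutput(true)
      conn.setRequestProperty("Content-Type", "application/json")
      val out = conn.getOutputStream
      out.write(body.getBytes(StandardCharsets.UTF_8))
      out.close()
      conn.getResponseCode // force the request; production code would check and retry
    } finally conn.disconnect()
  }
}

object DoorLockSyncSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Stand-in for the custom HTTP-listener Source control that receives
    // pushes from the device manufacturer's cloud platform.
    val pushed: DataStream[LockStatus] = env.fromElements(
      LockStatus("lock-001", "LOCKED", "UNLOCKED"),
      LockStatus("lock-002", "LOCKED", "LOCKED")
    )

    pushed
      .filter(s => s.status != s.lastKnownStatus) // IF-branch control: keep changed statuses only
      .addSink(new HttpPostSink("https://own-platform.example.com/api/lock-status"))

    env.execute("door-lock-status-sync-sketch")
  }
}
```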
4. Conclusion

Finally, let us summarize the implementation of digital integration capabilities based on Flink. Thanks to Flink's rich ETL and data integration capabilities and the simple exchange of data between operators, we abstract Flink's three types of operators into three categories of basic functional controls that implement the different stages of data processing. According to the functional requirements, Source operators extract data from multiple data sources such as message queues, APIs, and databases; the rich Transform operators implement data cleaning, filtering, and conversion; finally, Sink operators deliver data in the target format to the target systems through channels such as message queues, databases, and APIs. In summary, digital integration based on Flink is achievable, functionally rich, and extensible.