1. Flink Overview

1.1 Basic Introduction

Flink's main features include unified batch and stream processing, precise state management, event-time support, and exactly-once state consistency guarantees. Flink can run on a variety of resource management frameworks, including YARN, Mesos, and Kubernetes, and it also supports standalone deployment on bare-metal clusters. With the high-availability option enabled, it has no single point of failure. There are two concepts to explain here: unbounded data streams, which have a start but no defined end and must be processed continuously as events arrive, and bounded data streams, which have a defined start and end and can be fully ingested before processing begins, which is batch processing.
1.2 Application Scenarios

Data Driven

Event-driven applications do not need to query a remote database; local data access gives them higher throughput and lower latency. Taking anti-fraud as an example, a data-driven application writes the processing rule model with the DataStream API and submits the whole logic to the Flink engine. When events or data flow in, the corresponding rule model is triggered; once a rule's conditions are met, the application processes the event quickly and notifies the business application.

Data Analytics

Compared with batch analysis, streaming analysis eliminates the need for periodic data imports and queries, so the latency from event to indicator is lower. In addition, batch queries must deal with the artificial data boundaries caused by periodic imports and bounded input, while streaming queries do not have this problem. Flink supports both continuous streaming analysis and batch analysis well, processing and analyzing data in real time. It is widely used in scenarios such as real-time dashboards and real-time reports.

Data Pipeline

Compared with periodic ETL jobs, a continuous data pipeline significantly reduces the latency of moving data to its destination. For example, upstream streaming ETL can perform real-time data cleaning or enrichment, while a real-time data warehouse is built downstream to keep data queries timely, forming an efficient data query link. This scenario is very common in media stream recommendation and search engines.

2. Environment Deployment

2.1 Installation Package Management
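The download and unpack steps aren't shown above; a minimal sketch, assuming the Flink 1.7.0 binary release (matching the Maven dependencies below, with the exact archive name an assumption) and an install path of /opt/flink-1.7.0:

```shell
# Download the Flink 1.7.0 binary release (Scala 2.11 build) from the Apache archive
wget https://archive.apache.org/dist/flink/flink-1.7.0/flink-1.7.0-bin-scala_2.11.tgz

# Unpack and move to the assumed install path
tar -zxvf flink-1.7.0-bin-scala_2.11.tgz
mv flink-1.7.0 /opt/flink-1.7.0
```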
2.2 Cluster Configuration

Management node:
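A minimal sketch of conf/flink-conf.yaml, assuming the JobManager runs on hop01 (the host referenced later in this article); the port and slot values shown are the defaults:

```yaml
# conf/flink-conf.yaml (identical on every node)
jobmanager.rpc.address: hop01     # management (JobManager) host
jobmanager.rpc.port: 6123         # default JobManager RPC port
taskmanager.numberOfTaskSlots: 1  # slots per TaskManager, fixed at startup
```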
Distributed nodes:
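In Flink 1.7, the distributed (TaskManager) hosts are listed in conf/slaves, one per line; the hostnames below are hypothetical, following the hop01 naming used later:

```text
# conf/slaves — one TaskManager host per line
hop02
hop03
```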
The two configuration files are synchronized to all cluster nodes.

2.3 Start and Stop
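As a minimal sketch, assuming the install path /opt/flink-1.7.0 used above, the standalone cluster is started and stopped from the management node:

```shell
# Start the standalone cluster (JobManager plus all TaskManagers listed in conf/slaves)
/opt/flink-1.7.0/bin/start-cluster.sh

# Stop the whole cluster
/opt/flink-1.7.0/bin/stop-cluster.sh
```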
Startup log:
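The exact log lines vary by environment; as an illustrative check (for Flink 1.7's standalone mode), jps can confirm that the cluster processes started:

```shell
# List JVM processes on each node; expect StandaloneSessionClusterEntrypoint
# on the management node and TaskManagerRunner on the worker nodes
jps
```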
2.4 Web Interface

Visit the JobManager web UI, which serves on port 8081 by default: http://hop01:8081

3. Development Entry Case

3.1 Data Script

Distribute a data script to each node:
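A minimal sketch of such a script, producing the comma-separated word file that the WordCount program below reads (the sample words are placeholders):

```shell
#!/bin/bash
# Write comma-separated test words to the path the WordCount job reads
mkdir -p /var/flink/test
cat > /var/flink/test/word.txt << 'EOF'
hello,flink
hello,stream
hello,batch
EOF
```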
3.2 Introducing Basic Dependencies

Here is a basic case written in Java.

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-java</artifactId>
        <version>1.7.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-streaming-java_2.11</artifactId>
        <version>1.7.0</version>
    </dependency>
</dependencies>
```

3.3 Read File Data

Here, the data in the file is read directly, and the program counts how many times each word appears.

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCount {

    public static void main(String[] args) throws Exception {
        // Read file data
        readFile();
    }

    public static void readFile() throws Exception {
        // 1. Create the batch execution environment
        ExecutionEnvironment environment = ExecutionEnvironment.getExecutionEnvironment();
        // 2. Read the data file
        String filePath = "/var/flink/test/word.txt";
        DataSet<String> inputFile = environment.readTextFile(filePath);
        // 3. Group by word (tuple field 0) and sum the counts (tuple field 1)
        DataSet<Tuple2<String, Integer>> wordDataSet = inputFile
                .flatMap(new WordFlatMapFunction())
                .groupBy(0)
                .sum(1);
        // 4. Print the processing results
        wordDataSet.print();
    }

    // Splits each comma-separated line into (word, 1) tuples
    static class WordFlatMapFunction implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String input, Collector<Tuple2<String, Integer>> collector) {
            String[] wordArr = input.split(",");
            for (String word : wordArr) {
                collector.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
```

3.4 Read Port Data

Create a port listener on the hop01 host and simulate sending some data to the port:
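A common way to do this (my assumption; any TCP source would do) is netcat, listening on the port 5566 that the program below connects to:

```shell
# On hop01: listen on TCP port 5566; lines typed here are sent to the Flink job
nc -lk 5566
```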
Use the Flink program to read and analyze the data content of the port:

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {

    public static void main(String[] args) throws Exception {
        // Read port data
        readPort();
    }

    public static void readPort() throws Exception {
        // 1. Create the streaming execution environment
        StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
        // 2. Read from the socket data port
        DataStreamSource<String> inputStream = environment.socketTextStream("hop01", 5566);
        // 3. Split each line into (word, 1) tuples, key by word, and sum the counts
        SingleOutputStreamOperator<Tuple2<String, Integer>> resultDataStream = inputStream
                .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                    @Override
                    public void flatMap(String input, Collector<Tuple2<String, Integer>> collector) {
                        String[] wordArr = input.split(",");
                        for (String word : wordArr) {
                            collector.collect(new Tuple2<>(word, 1));
                        }
                    }
                })
                .keyBy(0)
                .sum(1);
        // 4. Print the analysis results
        resultDataStream.print();
        // 5. Start the streaming job
        environment.execute();
    }
}
```

4. Operation Mechanism

4.1 Flink Client

The client prepares the dataflow program and sends it to the JobManager node. Depending on the specific needs, the client can then either disconnect immediately or stay connected and wait for the job's results.

4.2 JobManager

A Flink cluster starts one JobManager node and at least one TaskManager node. After the JobManager receives a job submitted by the client, it coordinates the job and dispatches its tasks to specific TaskManager nodes for execution. The TaskManager nodes report heartbeats and processing status back to the JobManager.

4.3 TaskManager

A slot is the smallest unit of resource scheduling in a TaskManager, and the number of slots is fixed at startup. Each slot can run one task, receiving tasks deployed by the JobManager node and performing the actual analysis and processing.

5. Source Code Address

GitHub address: https://github.com/cicadasmile/big-data-parent

GitEE address: https://gitee.com/cicadasmile/big-data-parent

The above is a brief discussion of building a Flink real-time computing cluster and its operation mechanism.