The first issue of Aiti Tribe: Spark offline analysis dimensions

【51CTO.com original article】 Activity description: Aiti Tribe is a service community that offers core developers in-depth technical exchange, solutions to development needs, and resource sharing. Through this community, we invite industry experts to work through development problems one-on-one and clear obstacles in the development process, giving developers professional and efficient answers.

Topic keywords: big data, Spark, data analysis, data portraits

Tribe lineup: Xu Tao, director of big data at Longzhu Live; Wang Jin, co-founder of Shuguo Technology

Target audience: junior development engineers, data analysts, operations and maintenance engineers

How to participate: Join the 51CTO developer QQ exchange group 370892523. If you have any technical questions, ask in the group or send them to the group owner.

Event Details:


Nanjing-Shi Guojun-Java: Are there any recommended resources for learning Spark?

Xu Tao: I recommend studying the official Spark documentation. Books on Spark may not keep up with the pace of Spark releases.

Beijing-robingao-Java: When using Spark for offline analysis, how do you analyze Nginx logs? What specific dimensions do you look at?

Xu Tao: For offline analysis, I recommend Hive + MapReduce, which is more stable than Spark. Nginx logs, however, are mostly used for traffic monitoring and operations alerting, both of which are time-sensitive, so Spark Streaming is a good fit for them. Metrics: number of online users, number of user visits, traffic usage, interface errors, number of slow queries, and server status. Dimensions: by site and by module. You can also do some lightweight user behavior analysis, such as user access paths.
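The metrics and dimensions mentioned in this answer can be sketched as a small aggregation. This is a plain-Python illustration rather than a Hive or Spark job, and everything specific in it is an assumption: the log format (a custom `log_format` that puts `$host` first and `$request_time` last), the 1-second slow-request threshold, the use of the first path segment as the "module", and the use of distinct client IPs as a rough stand-in for users.

```python
import re
from collections import Counter, defaultdict

# Assumed log_format (hypothetical, not taken from the discussion):
#   $host $remote_addr [$time_local] "$request" $status $body_bytes_sent $request_time
LINE_RE = re.compile(
    r'(?P<site>\S+) (?P<ip>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) (?P<rt>[\d.]+)'
)

SLOW_SECONDS = 1.0  # assumed threshold for a "slow" request


def module_of(path):
    """Treat the first path segment as the module, e.g. /live/room/1 -> live."""
    parts = path.split("?")[0].strip("/").split("/")
    return parts[0] or "root"


def aggregate(lines):
    """Compute per-(site, module) metrics from raw access-log lines."""
    stats = defaultdict(Counter)
    visitors = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        key = (m["site"], module_of(m["path"]))
        s = stats[key]
        s["visits"] += 1
        s["bytes"] += int(m["bytes"])
        if int(m["status"]) >= 500:
            s["errors"] += 1
        if float(m["rt"]) >= SLOW_SECONDS:
            s["slow"] += 1
        visitors[key].add(m["ip"])  # distinct IPs as a rough proxy for users
    return {
        key: {
            "visits": s["visits"],
            "users": len(visitors[key]),
            "bytes": s["bytes"],
            "errors": s["errors"],
            "slow": s["slow"],
        }
        for key, s in stats.items()
    }


SAMPLE = [
    'www.example.com 10.0.0.1 [26/Dec/2016:10:00:00 +0800] "GET /live/room/1 HTTP/1.1" 200 512 0.120',
    'www.example.com 10.0.0.2 [26/Dec/2016:10:00:01 +0800] "GET /live/room/2 HTTP/1.1" 502 0 1.500',
    'www.example.com 10.0.0.1 [26/Dec/2016:10:00:02 +0800] "GET /api/user?id=7 HTTP/1.1" 200 128 0.030',
]

if __name__ == "__main__":
    for key, metrics in sorted(aggregate(SAMPLE).items()):
        print(key, metrics)
```

The same grouping key, (site, module), would carry over to a Hive `GROUP BY` for the offline case or to a windowed aggregation in a streaming job; only the execution engine changes.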

Beijing-robingao-Java: Do you have any experience to share on customer profiling? Please be as specific as possible.

Xu Tao: Building a user portrait means "labeling" users. Labels can be divided into static labels and dynamic labels. Static labels are attributes that are rarely updated or almost never change, such as a user's personal information. Dynamic labels describe user behavior, such as a user's favorite categories on a live streaming platform. Labels are derived from user behavior logs and transaction data. Some websites and apps hold only a small amount of personal information about their users, but they can still collect a large volume of behavior logs, and from those logs we can predict a user's gender, age group, city type, job type, and so on through cluster analysis. Some of the labels more specific to live streaming platforms include favorite anchors, habitual online time periods, and sign-in users.
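As a rough illustration of the static/dynamic split described in this answer, the sketch below derives a few dynamic labels from behavior logs and attaches them to static profile data. The record layout, field names, sample values, and time-period boundaries are all hypothetical, and the clustering step used to predict attributes such as gender or age group is not shown.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical behavior-log records: (user_id, ISO timestamp, event, category)
LOGS = [
    ("u1", "2016-12-01T21:05:00", "watch", "gaming"),
    ("u1", "2016-12-02T22:10:00", "watch", "gaming"),
    ("u1", "2016-12-03T21:40:00", "watch", "music"),
    ("u1", "2016-12-03T21:45:00", "sign_in", None),
    ("u2", "2016-12-01T08:15:00", "watch", "music"),
]

# Static labels come from profile data and rarely change (hypothetical values).
PROFILES = {"u1": {"city": "Nanjing"}, "u2": {"city": "Beijing"}}


def time_period(hour):
    """Bucket an hour of day into a coarse period (assumed boundaries)."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def top(counter):
    """Return the most frequent key, or None if nothing was counted."""
    return counter.most_common(1)[0][0] if counter else None


def build_portraits(profiles, logs):
    """Combine static profile labels with dynamic labels derived from logs."""
    categories = defaultdict(Counter)
    periods = defaultdict(Counter)
    signed_in = set()
    for user, ts, event, category in logs:
        periods[user][time_period(datetime.fromisoformat(ts).hour)] += 1
        if event == "watch" and category:
            categories[user][category] += 1
        elif event == "sign_in":
            signed_in.add(user)
    return {
        user: {
            "static": dict(static),
            "dynamic": {
                "favorite_category": top(categories[user]),
                "usual_period": top(periods[user]),
                "signs_in": user in signed_in,
            },
        }
        for user, static in profiles.items()
    }


if __name__ == "__main__":
    for user, portrait in build_portraits(PROFILES, LOGS).items():
        print(user, portrait)
```

Dynamic labels like these would need to be recomputed on a schedule as new logs arrive, whereas the static part changes only when the profile itself is edited.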

Nanjing-Shi Guojun-Java: If you want to submit multiple SQL statements to a Spark cluster at the same time, can you do it without using spark-submit?

Xu Tao: I recommend submitting them through the spark-sql client.

Chongqing-Xiaobao-Android: I would like to know about streaming media use cases on Android, such as video and voice streaming.

Xu Tao: This topic is too broad. Use cases related to live streaming include live playback, microphone connection, and HTML5 live streaming players.

Guangzhou-Zhao Hui-Big Data: What is the value of multi-source data fusion in big data?

Wang Jin: If big data is not integrated across multiple sources, the value of the data is very limited and the core value of big data cannot be realized. The value of multi-source data integration shows most clearly in industries such as finance, e-commerce, and insurance.

Zhuhai-Xiaoyuan-Java: Does 51CTO have any special topics related to big data?

51CTO: Yes, you can subscribe to the Big Data Journal. To subscribe, go to Homepage, then Personal Homepage, and click Subscriptions. Examples of topics: new trends in big data; everything is in big data; high-end interviews on the journey into the world of big data; how small teams can master big data.

Zhuhai-Xiaoyuan-Java: Are there any security-related topics?

51CTO: Security topics include: HPE Security, the data bodyguard behind "Kung Fu Panda"; a look at the US network paralysis incident and what it means for Internet of Things security; a special report on the 2016 National Cyber Security Awareness Week; a special report on the 11th (ISC)2 Asia Pacific Information Security Summit; and why prevention is still the best way to avoid ransomware attacks.

Beijing-Yang Kai-Network Engineer: I want to learn about cloud computing.

51CTO: You can refer to this article: re:Invent 2016: AWS's five cloud computing superpowers.

Nanjing-Xiaopang-Android: What is the relationship between cloud computing and big data?

51CTO: Features of cloud computing: by dynamically scheduling computing, network, and storage resources and deploying applications rapidly, virtualization technology improves the utilization of IT equipment, which saves resources, improves efficiency, centralizes management, enables information sharing, and reduces fiscal expenditure. Cloud computing platforms mainly host application systems, store massive amounts of data, and provide services for e-government, social management, public services, and similar areas. Characteristics of big data: using distributed computing architectures such as Hadoop and tools such as ETL, massive data is extracted from the cloud computing platform, and cross-departmental and cross-industry analysis, modeling, and verification are carried out against defined goals. The results of big data analysis are then published through the cloud computing platform to serve the relevant organizations and to support leadership decision-making.

Still have questions about these answers? You are welcome to join the 51CTO developer QQ exchange group 370892523 for discussion.

Next event: December 26

Keywords: mobile, Android, Internet of Things, front end

[51CTO original article, please indicate the original author and source as 51CTO.com when reprinting on partner sites]
