How to become a data analyst in seven weeks: Data visualization and building BI step by step

How to become a data analyst in seven weeks: Data visualization and building BI step by step

The report that data analysts use more often is BI.

BI stands for Business Intelligence. In traditional enterprises, it is a complete solution. Effectively integrate enterprise data and quickly create reports for decision making. Involving modules such as data warehouse, ETL, OLAP, and authority control.

Today we will only focus on the data analysis process and use Power BI to create a data analyst dashboard report. For better learning and practice, we will still use the data in "How to Become a Data Analyst in Seven Weeks 03: Teach You Excel Practical Skills Step by Step" for operation. This is a simple piece of work.

If you don’t have the data yet, please send the keyword "practice data" in the conversation page of the public account : qinlu to obtain the download address.

Power BI is available for download on Microsoft's official website (note that it doesn't matter if you register or not). You only need to download the Desktop version, which is about 120MB in size. Only the Win version is available for now.

If the CAB file is damaged during the download process, it is probably because there is an error in the software. You can change the browser to download, or you can download the Traditional Chinese version. Anyway, I kept reporting errors.

To make it easier for everyone to be lazy, you can send the keyword "BI download" in the conversation page of the public account: qinlu to obtain the official website download address.

What is Power BI

BI tools have two main uses. One is to use BI to create automated reports. Data-related work involves dealing with large amounts of data every day, and the data needs to be sorted and summarized, which is a huge workload. This part of the work can be completed by BI automation, from data organization, modeling to downloading.

Another way is to use its visualization function for analysis. The advantage of BI is that it provides richer visualization functions than Excel, is easy to use, and looks beautiful. If it takes you two hours to draw a chart every day, BI can shorten the time by half.

As an enterprise-level application, BI can be connected to the company database to create enterprise-level reports. This involves data architecture, so I won’t go into detail.

Power BI is from Microsoft. If you are familiar with Excel, you should know that Power Query, Power Pivot, Power View and Power Map launched by Microsoft are four very powerful plug-ins for Excel. Power BI is Microsoft's way of launching them as a collection.

Power Query is a plug-in for data extraction, integration, and search. It is biased towards the creation of data models rather than the use of cells.

Power Pivot is an advanced application of PivotTable, which can perform a large number of scientific calculations using DAX. In terms of performance, it is two orders of magnitude faster than Excel functions, and processing millions of numbers is no problem.

Power View is an advanced application of charts, which realizes functions such as filtering, linkage, and dragging.

Power Map is a visual map.

If you are proficient in the above four plug-ins, you can also implement some BI on Excel. After all, Excel is a tool that everyone in the enterprise has, and it has the advantage of being lightweight compared to BI, although data analysts need to master more tools.

Steps of BI

There are many BI tools on the market, such as Tableau, QlikView, BDP, etc., each with its own focus and price. However, the operation process is similar and can be roughly divided into five steps: data source reading, data cleaning, data association, chart making, and dashboard integration. Once you are familiar with one, it is not difficult to learn the other.

Because the BI I use for work is privately deployed to the server and directly connected to the production environment, it is inconvenient to demonstrate. That’s why I used Power BI for the demonstration, but I can’t say I’m very proficient in it.

Data source reading

We open Power BI and it will ask us to log in, just ignore it.

The interface is similar to Office software. The top is the operation tool item, and the left column is the navigation bar.

The left navigation bar of Power BI corresponds to three modules: Dashboard, Report, and Dataset. A dashboard or report requires data to operate, so we first read the dataset.

Click Get Information (weird translation) on the toolbar.

Power BI supports a variety of rich data sources (most BI on the market support them, but the reading methods are slightly different). In addition to Excel and CSV files, it also supports Access, SQL databases, Hadoop/HDFS, Spark, third-party APIs, etc.

This is a beginner tutorial. Just connect to CSV and choose to load the practice data DataAnalyst.

Here you can edit the data, skip it and choose to load.

Automatically jump to the data report page. The data report (Report) is the process of data regularization and cleaning.

Do you still remember the data cleaning demonstrated in the practical chapter? Previously we experienced the process of Excel function cleaning. This time, BI needs to be used for cleaning again. Data cleaning is the most annoying and time-consuming task for analysts, no doubt about it.

Data cleaning

Power BI has an advanced feature called DAX (Data Analysis Expressions ), which is the formula language used throughout Power BI.

DAX is similar to Excel functions (most third-party BI functions are close to Excel), so it is very friendly to novices. If you are already familiar with Excel functions, you will be able to get started quickly. Basically, the function names are the same. If you are not familiar with them, you can refer to the documentation provided by the official website.

Let's first clean the salary salery in the report. Just like the actual process, we need to split it into two new columns and calculate the average.

Click on the new data row of the model item. At this time, the newly added column has no content. All we need to do is generate two columns with salery.

DAX is needed here. Just use it as a function, but Excel uses cell-level references, while any references, calculations, summaries, etc. in DAX are based on columns.

='Table Name'[ColumnName]

This is the simplest reference. Table Name is the name of our report. The csv I loaded is called DataAnalyst, so the report is called DataAnalyst. ColumnName is the column we need to reference, and its name is salary. The formula below is an example.

If the table name contains spaces, quotes are required; if not, they are not required. If it is a cross-table reference, TableName is required, otherwise only ColumnName is required. DAX supports auto-fill, which allows quick input through fuzzy input + enter.

I said it is similar to Excel, so the Excel addition, subtraction, multiplication and division habits can be directly applied to it.

='Table Name'[ColumnName1]+'Table Name'[ColumnName2]*3

Continuing with the cleaning step, we find the string position where k is located.

=search("k", DataAnalyst[salary],1)

Use the left function to intercept the lower limit of the salary.

=left(DataAnalyst[salary],search(“k”, DataAnalyst[salary],1)-1)

Done. The data row is renamed to bottomSalery. Next is the salary cap topSalery. An error was reported when using "-" to intercept it.

=search(“-“, DataAnalyst[salary],1)

After checking, I found that there were strings such as "more than 10K". DAX cannot find "-", so it needs to return a value expressed when an error occurs. Because the salary cap cannot be determined for descriptions above 10k, we limit the returned value to bottomSalery.

Keep in mind here that DAX is less error tolerant than Excel, and as long as one row in DAX returns an Error, the entire column will be an Error. We need to use the Iferror function to ensure fault tolerance.

The calculation of topSalary is given here, which is rather complicated.

 topSalery = IFERROR(mid(DataAnalyst[salary],SEARCH("-",DataAnalyst[salary],1)+1,LEN(DataAnalyst[salary])-SEARCH("-",DataAnalyst[salary],1)-1),DataAnalyst[bottomSalery])

Then create a new column and use (DataAnalyst[bottomSalery]+DataAnalyst[topSalery])/2 to calculate the average salary for the position.

After reading this, do you think the DAX formula is very long? Beginners can add more auxiliary columns to perform calculations.

Excel has a convenient column splitting function, so does Power BI have it? The answer is yes, right click on the column and select Edit Query option.

I'm still complaining about the translation here. Splitting rows of data is the familiar Split Column function. Select Custom and use “-” to complete the column split (the original data will be split, so it is recommended to copy a column first).

As mentioned in the practical section, our Beijing data has duplicate values, so we can delete the duplicates through the unique identifier of the position, positionId. Right click to remove duplicates.

Let's look at other features of query editing.

Group By can be thought of as a PivotTable. You can select multiple fields to group by. Perform operations such as summing and counting the results.

If it is a large amount of data such as orders, user behavior , user information, etc., it is generally calculated in groups. Different grouping fields will generate different dimensions. For example, the city, years of work experience, and educational background in the example are all dimensions and the basis of the chart. If enough dimensions are generated, we can use the dimensions to form a data model, which is the concept of OLAP.

In addition, you can also use filtering to directly filter data. We selected the data containing data analysis and analysis. Eliminate distracting positions such as big data engineers.

Multi-condition complex logic filtering is supported here.

At this point, we have completed the cleaning process in the practical section. I have simplified it this time. The above steps can be restored and undone through the apply steps on the right. There will be no such column as bottomSalery.

Then select Close and Apply on the toolbar and the report data will be updated. The final data is more than 2,300 lines.
Through data query and report DAX formulas, we can complete the steps of data cleaning and regularization. The main ideas are: remove duplicate values, filter target data, clean dirty data, and convert data format.

Data Association

We use a lot of data in our work, and it is impossible to rely on just one table to cover everything. In Excel, we often use the Vlookup function to link and summarize multiple tables. Power BI uses dragging and dropping related data, which is more convenient. Generally, it is associated first and then cleaned.

Because my data only has one table, there is no need for association. Take the screenshot of the official website as an example.

It's very simple. Use drag and drop to associate the manufacturerId of the Product and the manufacturerId of the Manufacturer. We can think of it as a vlookup reference, or as a SQL Join.

The analysis involves many complex factors, and the data related to these factors will not stay quietly in one table, but in different tables, so data association is needed.

Data association will become clearer after learning SQL, which is one of the core concepts of SQL.

chart

Enter the chart design stage and click the first item in the sidebar. The advantage of BI over Excel is that it can be designed and generated simply by dragging and dropping.

Click on any chart and a graphic will be automatically generated on the canvas. To switch chart types, just click on another one. We drag the city and average salary to the column under Visual Effects, and it will automatically generate a chart. Different charts require different dimensions and axes. Please follow the prompts.

There are design options under visual effects that can adjust the chart to make it more beautiful. I won’t introduce them in detail here.

Click Add Visual Effect to continue adding charts to the canvas. Most BIs support linkage. In simple terms, linkage means clicking on a dimension element on a chart, and other data will change accordingly according to this dimension.

The above picture is a good example of linkage. When you click on Beijing in the city dimension, other charts change, and the average salary changes from 14.23 to 15.23. Academic qualifications have become a prominent feature, showing that PhDs in Beijing earn much higher salaries than the average.

Chart linkage brings better data insights, and the combination and selection of different data dimensions improves analysts' decision-making capabilities. Of course one of my favorite features is the time saving.

Through continuous combination of charts, the data analyst's own analysis canvas can be generated. This canvas is called Dashboard. Of course, whether the chart looks good or not depends on the analyst’s design taste (I can’t teach you this, haha).

If there are too many dimensions, you can insert a cross filter in the visual options to add filtering functions, which is often used for refined analysis, such as the time dimension. Click the button in the upper right corner of the chart to export data and export a csv data of the chart.

We can also publish it online for collaboration between colleagues and companies, or browse data on mobile phones. Of course, you need to register an account here, it depends on your wishes.

There are more features for you to learn. This is the end of the Power BI beginner tutorial. I have listed the commonly used functions. I wonder if you have leaped from the Excel chart level to a new stage. You can use the data to make chart reports yourself as a stepping stone to the analyst industry. If you still have questions, use the official website documents to learn. BI as a field is worthy of in-depth study by data analysts.

Here are some additional points:

example

Due to time constraints, I did not explain more about style design content. You can go to the official website to download examples, including original data exercises. The main purpose is to learn other people's ideas on report preparation.

In addition, Power BI has relatively few charts, such as target charts and box plots. However, the official website has various charts for download. Just search for pbiviz, there is no Chinese version.

R Integration

Power BI has integrated the R language within it, yes, the R language in statistics. If you think the view function is not powerful enough, we can use R to draw charts and even do regression analysis with the help of R. Of course R is the content of the seventh week. Here we only take the screenshot of the official website as an example.

Data Update

A very important function of BI is data update, which is the basis of report automation and is usually associated with SQL. We use CSV and can only paste data into it for update, which is still a bit cumbersome and only semi-automated. This will be solved after learning SQL.

The author of this article @秦路 compiled and published by (Qinggua Media). Please indicate the author information and source when reprinting!

Product promotion services: APP promotion services Advertising

<<:  As an operations manager, what is your daily job?

>>:  How much does it cost to customize the Linyi wine and beverage mini program? Linyi wine and beverage applet customized price inquiry

Recommend

Arrange a tea party in Nanjing and find Xiao Wang

Nanjing tea drinking studio arrangement, Nanjing ...

User growth fission method!

“Create 50 communities in 48 hours at zero cost” ...

How does bidding SEM analyze competitors’ promotion data?

1. Analyze your competitors’ SEM advertising in t...

Ideas and techniques for creating an Internet celebrity store!

In the past few years, the streets were full of d...

Lei Zi Doudian no source 3.0 upgrade practical gameplay

Douyin course catalog: Lesson 1: Ideas and core es...

How can we reduce the uninstall rate of APP users?

There is a question that has been bothering App d...

How can new media operators quickly write high-conversion copy?

We all often have a perception that writing an ar...

My friend, you must have wasted more than half of your advertising budget...

A few days ago, I met a friend who worked on home...