Instead of standing by the abyss and envying the fish, it is better to go home and weave a net. We will take every step carefully and help you make your first word cloud with Python from scratch. You are welcome to try it!

Need

In the era of big data, you often see beautiful infographics in the media or on websites. For example, like this. How do you feel after seeing it? Do you want to make one yourself?

If your answer is yes, let's not delay: today we will make a word cloud chart from scratch, step by step. Of course, a basic word cloud is not as cool as the two infographics just shown. But that doesn't matter; a good start is half the battle. Once you have had a taste, you can upgrade your skills and set off down your own road to success.

There are many tutorials on the Internet that teach you how to make infographics. Many of them use special-purpose tools. These tools are good, convenient and powerful, but their functions are too specialized and their scope of application is limited. What we will try today is making a word cloud with Python, a general-purpose programming language.

Python is a very popular programming language nowadays. You can use it not only for data analysis and visualization, but also for building websites, crawling data, solving math problems, and writing scripts that let you be lazy...

Do you know Douban? It was originally written in Python.

In current rankings of programming language popularity, Python ranks fourth (of course, many people disagree, which is why there are so many different rankings, you know). But we should look at things as they develop. With the rise of data science, Python's popularity looks set to take off. It pays to get on the trend as early as possible.

If you have no programming experience, that's OK. Starting from scratch means I will teach you, step by step, how to install Python and complete the word cloud.
I hope you will not just read along, but try it yourself. By the time you finish, you will not only have made your first word cloud; it will also be your first useful piece of programming work.

Are you interested? Then let's get started.

Install

First, we need to install the Python runtime environment.

If you are using macOS, Python is already pre-installed on your system. However, we will need many extension packages, so it is best to install a Python distribution. You only install it once, and most of the functions you will need later come bundled with it. You don't have to install new packages piecemeal every time you need a new function.

There are many Python distributions, and here I recommend Anaconda. After more than 4 years of trying and comparing, I find that it is easier to install and that its selection and organization of extension packages are more sensible.

First download the Anaconda installer. The download link is as follows:

http://t.cn/RyWsyHV

Scroll down the web page to find the download section. Select the appropriate version for your operating system. Because my system is macOS, the website recommends the macOS version to me directly. If you are using Windows or Linux, please switch to the corresponding tab.

Whichever operating system you use, note the two buttons on the right, which correspond to the Python 2.X and 3.X versions.

Some people must be wondering: since there is a new version, why should I use the old one?

In fact, the two versions of Python will coexist until 2020. Python's developers really want everyone to upgrade to 3.X. Unfortunately, fewer extension packages are compatible with 3.X than with 2.X, especially packages related to data science. So if you are a beginner, I suggest you download version 2.X (currently 2.7), so that you run into fewer problems later on.
It will not be too late to migrate to version 3.X once you are proficient. Believe me, you will adapt to the new version quickly by then.

After downloading, just run the installer. The installation may take a while, depending on the speed of your computer. Be patient; you only have to do this once.

Once that is installed, install a "modern" browser. If you are using macOS, the built-in Safari is fine. Other options include Firefox and Google Chrome. Please install one of these browsers and set it as your system default browser.

OK, now open the command line. On macOS and Linux, open a terminal. On Windows, open "Start" - "Accessories" - "Command Prompt".

Type the following commands:

mkdir demo
cd demo

OK, now you have a dedicated directory called demo. Go to Finder in macOS or My Computer in Windows, find this directory and open it.

Back in the terminal, macOS or Linux users should type the following command:

pip install wordcloud

macOS will prompt you to install the Xcode command line tools first. You can accept the default settings step by step. But please make sure you do this over WiFi. If you use 4G data, it will cost you a lot.

If you are using Windows, getting the word cloud package is a little more troublesome. You need to download the file wordcloud-1.3.1-cp27-cp27m-win32.whl and drag it into your demo directory. The download link is as follows:

http://t.cn/RJ6Emm4

On the command line, first execute:

pip install wheel

Then execute:

pip install wordcloud-1.3.1-cp27-cp27m-win32.whl

OK, at last the whole Python runtime environment we need is installed. Please follow the steps above carefully and make sure each one completes successfully. If any step is missed, the program will report an error when you run it later.

Data

The object of word cloud analysis is text. Theoretically, the text can be in any language.
English, Chinese, French, Arabic... For simplicity, we will use English text as an example. You can find any English article on the Internet to analyze. I particularly like the British series "Yes, Minister", so I looked up its introduction on Wikipedia. I copied the main text and saved it as a text file called yes-minister.txt.

Move this file into our working directory, demo.

OK, the text data is ready. Let's enter the magical world of programming!

Code

On the command line, execute:

jupyter notebook

The browser will open automatically and display the following interface.

This is the fruit of our earlier work: the installed runtime environment. We haven't written any programs yet, and the directory contains only the text file we just created.

Open this file and browse its contents.

Go back to the main page of Jupyter Notebook. Click the New button to create a new notebook. Under Notebooks, select the Python 2 option.

You will be prompted to enter a name for the notebook. You can name your code file anything you want, but I recommend a meaningful name so you can find it easily in the future. Since we are going to try a word cloud, let's call it wordcloud.

A blank notebook will then appear for us to use. Enter the following three lines in the only code cell on the page. Please type them exactly as in the sample code; the number of spaces must match. Pay special attention to the third line, which starts with 4 spaces (or 1 Tab). When you are done, press Shift+Enter to execute.

filename = "yes-minister.txt"
with open(filename) as f:
    mytext = f.read()

Nothing is displayed. That is expected: there is no output action here. The program just opens your yes-minister.txt text file, reads its contents, and stores them in a variable called mytext.

Next, let's check what mytext contains.
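If the text you saved contains characters beyond plain ASCII (curly quotes copied from Wikipedia, for example), the file-reading step above can be made a little more defensive. The following is only a minimal sketch, not part of the original notebook: read_text is a hypothetical helper name, and it assumes the file was saved as UTF-8.

```python
# -*- coding: utf-8 -*-
import io


def read_text(path, encoding="utf-8"):
    """Return the whole file at `path` as one text string.

    Assumption: the file was saved in the given encoding (UTF-8 by default).
    """
    # io.open is available in both Python 2 and Python 3; naming the
    # encoding explicitly means the result does not depend on the
    # system default.
    with io.open(path, encoding=encoding) as f:
        return f.read()
```

In the notebook, you could then write mytext = read_text("yes-minister.txt") in place of the three lines above.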
To display the content of mytext, enter the following statement. As before, you have to press Shift+Enter for the system to actually execute it.

mytext

In the steps that follow, don't forget to confirm execution in the same way.

The displayed results are shown in the figure below.

Well, it seems that the text stored in the mytext variable is indeed the text we picked up from the Internet. So far, everything is working fine.

Next we import the word cloud package and use the text stored in mytext to create a word cloud.

from wordcloud import WordCloud
wordcloud = WordCloud().generate(mytext)

The program may give you a warning at this point. Don't worry. A warning does not stop the program from running normally.

At this point, the word cloud analysis is already complete. You read that right. The core of making a word cloud takes only these two lines, and the first one merely asks an extension package for outside help. But the program does not show us anything.

Where is the word cloud? I've been working for so long and there is nothing to see. Are you kidding me?!

Don't get worked up. Enter the following 4 lines, and it's time to witness the miracle.

%pylab inline
import matplotlib.pyplot as plt
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")

The result is shown in the figure:

Try not to get too excited. You can right-click on the word cloud image and use "Save Image As" to export it.

From this word cloud we can see how frequently different words and phrases occur. High-frequency words appear in a noticeably larger font and in eye-catching colors. It is worth mentioning that the most prominent word, Hacker, does not refer to computer hackers, but to one of the show's protagonists, Jim Hacker, who later becomes Prime Minister.

I have also shared the ipynb file containing the complete code of the program. You can download it from the following link:

http://t.cn/RKQvFBM

I hope it all goes smoothly for you.
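If you would like to see the numbers behind the font sizes, you can count the words yourself. The following is only a rough sketch using Python's standard library, not how the wordcloud package works internally: top_words is a hypothetical helper, and STOPWORDS is a tiny made-up list for illustration, so the counts may differ from what the package computes.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list for illustration only; the
# wordcloud package uses its own, much longer list.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def top_words(text, n=10):
    """Return the n most frequent non-stopword words in `text`."""
    # Lowercase the text, then pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    # Count every word that is not in the stopword list.
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


sample = "Hacker is the minister. Sir Humphrey advises Hacker, and Hacker listens."
print(top_words(sample, 3))
```

In the sample sentence, "hacker" comes out on top with a count of 3. In the notebook, top_words(mytext, 10) would list the ten most frequent words of your own text.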
Are you satisfied with the word cloud you generated? If not, don't worry: you can explore the more advanced features of the wordcloud package. Try it and see if you can make a word cloud like this.

Discuss

Now that you have learned this method, what kind of word cloud did you make? Besides the method introduced in this article, what other convenient ways do you know to make word clouds or other infographics? You are welcome to leave a message and share with everyone. Let's discuss together.

This article is reproduced from Leifeng.com. The author is Wang Shuyi, and it was originally published on the WeChat public account Yushuzhilan (nkwangshuyi).