Careful analysis of mmap: what it is, why it is used, and how to use it

mmap basic concepts

mmap is a method of memory-mapping files: it maps a file or other object into the address space of a process, establishing a one-to-one correspondence between a range of the file on disk and a range of virtual addresses in the process. Once the mapping exists, the process can read and write this memory through ordinary pointers, and the system automatically writes dirty pages back to the file on disk, so file operations are completed without calling system calls such as read and write. Conversely, changes that the kernel makes to this region are directly visible in user space, which makes it possible for different processes to share a file. As shown in the following figure:

As the figure above shows, the virtual address space of a process consists of multiple virtual memory areas. A virtual memory area is a homogeneous interval in the process's virtual address space, that is, a contiguous address range with the same characteristics. The text segment (code segment), initialized data segment, BSS segment, heap, stack, and memory mappings shown in the figure are each independent virtual memory areas. The address space that serves memory mappings lies in the free space between the heap and the stack.

The Linux kernel uses the vm_area_struct structure to represent an independent virtual memory area. Because virtual memory areas of different kinds have different functions and internal mechanisms, a process uses multiple vm_area_struct structures to represent its different types of virtual memory areas. The vm_area_struct structures are linked in a list or a tree so that they can be looked up quickly, as shown in the following figure:

The vm_area_struct structure contains the start and end addresses of the area and other related information. It also contains a vm_ops pointer, which leads to the set of operations (function pointers) that can be applied to this area. In this way, any information needed for any operation on a virtual memory area can be obtained from its vm_area_struct. What the mmap function does is create a new vm_area_struct structure and connect it to the physical disk address of the file. See the next section for the specific steps.
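The basic idea described above, changing a file through a pointer instead of through write, can be sketched in a few lines of C. This is a minimal sketch under assumptions, not a definitive implementation: the helper name map_and_modify and the scratch file under /tmp are made up for illustration, and a POSIX system is assumed.

```c
#define _DEFAULT_SOURCE  /* glibc: expose mkstemp/pread under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a store made through the mapping is
 * visible to a conventional pread() on the same file, -1 otherwise. */
int map_and_modify(void)
{
    char path[] = "/tmp/mmap_demo_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);            /* the file lives on until fd and mapping are gone */

    const char text[] = "hello mmap";
    size_t len = sizeof text - 1;
    if (write(fd, text, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* MAP_SHARED: stores to the mapping are carried through to the file. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    p[0] = 'H';                          /* plain memory store, no write() call */
    int rc = msync(p, len, MS_SYNC);     /* make sure the file has the change */

    char buf[16] = {0};
    ssize_t n = pread(fd, buf, len, 0);  /* read the file the conventional way */

    munmap(p, len);
    close(fd);
    return (rc == 0 && n == (ssize_t)len &&
            strcmp(buf, "Hello mmap") == 0) ? 0 : -1;
}
```

Called from a main function, for example as assert(map_and_modify() == 0), the helper succeeds when the byte changed through the pointer shows up in the data returned by pread.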

mmap memory mapping principle

The implementation process of mmap memory mapping can be generally divided into three stages:

(I) The process starts the mapping process and creates a virtual mapping area for the mapping in the virtual address space

1. The process calls the library function mmap in user space. The prototype is: void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

2. In the virtual address space of the current process, find a free contiguous range of virtual addresses that meets the requirements

3. Allocate a vm_area_struct structure for this virtual area, and then initialize each field of this structure

4. Insert the newly created virtual area structure (vm_area_struct) into the virtual address area list or tree of the process

(II) The kernel-space mmap function (different from the user-space function) is called to establish the one-to-one mapping between the file's physical address and the process's virtual addresses

5. After the new virtual address area has been allocated for the mapping, the file descriptor to be mapped is looked up in the process's file descriptor table, which leads to the file structure (struct file) of that file in the kernel's set of open files. Each file structure maintains the information related to that open file.

6. Through the file structure, the kernel reaches the file's file_operations table and calls the kernel function mmap, whose prototype is int mmap(struct file *filp, struct vm_area_struct *vma) and which is different from the user-space library function.

7. The kernel mmap function locates the file disk physical address through the virtual file system inode module.

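8. The page table is set up, for example through the remap_pfn_range function, which establishes the mapping between the file address and the virtual address area. At this point no data in main memory is associated with these virtual addresses. (remap_pfn_range is what a device driver's mmap typically calls; for a regular file the handler usually only installs the area's vm_ops and leaves the page-table entries to be filled in on page faults.)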

(III) The process accesses the mapped space, which triggers a page fault, and the file content is copied into physical memory (main memory).

Note: The first two stages only create the virtual interval and complete the address mapping; no file data is copied into main memory. The file is actually read only when the process first reads or writes the mapping.

9. A read or write by the process accesses the mapped address in the virtual address space. The page-table lookup finds that the address is not backed by a physical page: only the address mapping has been established and the data on disk has not yet been copied into memory, so a page fault is raised.

10. After a series of judgments are made for the page fault exception and it is determined that there is no illegal operation, the kernel initiates the paging request process.

11. The paging process first looks for the required page in the swap cache. If it is not found, the nopage function (the fault handler of vm_ops in later kernels) is called to load the missing page from disk into main memory.

12. After that, the process can read or write this piece of main memory. If the write operation changes its content, the system will automatically write back the dirty page to the corresponding disk address after a certain period of time, thus completing the process of writing to the file.

Note: The modified dirty pages are not updated back to the file immediately, but there is a delay. You can call msync() to force synchronization so that the written content can be saved to the file immediately.
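The lazy loading described in stage (III) can be observed from user space by counting page faults. The sketch below is only an illustration under assumptions: count_faults and faults_so_far are hypothetical names, getrusage is used as the fault counter, and the exact numbers depend on the kernel (a kernel may, for example, map several neighbouring pages in one fault), so no particular output is claimed.

```c
#define _DEFAULT_SOURCE  /* glibc: expose mkstemp/ftruncate under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Page faults (minor + major) taken by this process so far, or -1. */
static long faults_so_far(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt + ru.ru_majflt;
}

/* Hypothetical helper: maps `pages` pages of a scratch file, touches each
 * page once, and reports how many faults happened during mmap() itself and
 * during the touch loop. Returns 0 on success, -1 on error. */
int count_faults(size_t pages, long *during_mmap, long *during_touch)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || pages == 0)
        return -1;
    size_t len = pages * (size_t)page;

    char path[] = "/tmp/mmap_fault_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)len) != 0) {     /* give the file `pages` pages */
        close(fd);
        return -1;
    }

    long f0 = faults_so_far();
    volatile char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;
    long f1 = faults_so_far();

    for (size_t i = 0; i < pages; i++)
        p[i * (size_t)page] = 1;              /* first access to each page */
    long f2 = faults_so_far();

    munmap((void *)p, len);
    if (f0 < 0 || f1 < 0 || f2 < 0)
        return -1;
    *during_mmap = f1 - f0;
    *during_touch = f2 - f1;
    return 0;
}
```

If the description above holds on the system at hand, during_touch should come out clearly larger than during_mmap, because the mmap call only sets up the mapping and the pages are brought in when they are first touched.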

The difference between mmap and regular file operations

For those who don't know much about the Linux file system, please refer to my previous blog post "File Reading and Writing Process from the Kernel File System". Let's first briefly review the function calling process in conventional file system operations (calling read/fread and other similar functions):

1. The process initiates a file read request.

2. The kernel looks up the process's file descriptor table to locate the file's entry in the kernel's set of open files, and from it finds the inode of the file.

3. The inode searches the address_space to see if the requested file page is already cached in the page cache. If it is, the content of this file page is directly returned.

4. If it does not exist, locate the file disk address through the inode and copy the data from the disk to the page cache. Then initiate the page read process again and send the data in the page cache to the user process.

In summary, conventional file operations use the page cache to improve read and write efficiency and to reduce disk accesses. When a file is read, the file page is first copied from disk into the page cache. Because the page cache lives in kernel space and cannot be addressed directly by the user process, the page must then be copied again from the page cache into the user-space buffer. The process therefore obtains the file content only after two data copies. The same holds for writes: the kernel cannot use the user buffer directly, so the data is first copied into kernel-space memory and then written back to disk (delayed write-back), which again takes two copies.

When a file is operated on through mmap, creating the new virtual memory area and establishing the mapping between the file's disk address and that area involve no copying of file data. When the data is accessed later and is not yet in memory, a page fault is raised, and through the established mapping the data is brought from disk into a page of memory that the process addresses directly, so a single copy is enough for the process to use it.

In short, conventional file operations need two data copies: from disk to the page cache, and from the page cache to the user's buffer. With mmap, only one copy is needed: from disk into memory that the user process can access directly. Put simply, the key point of mmap is that user space and kernel space work on the same data, which removes the copy between the two spaces. In this respect mmap is more efficient.
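To put the two access styles side by side, the sketch below sums the bytes of the same file once with read into a local buffer and once through a mapping. It is an illustration under assumptions (the function names and the 10000-byte scratch file are made up), it checks only that both paths see the same data, and it does not measure performance.

```c
#define _DEFAULT_SOURCE  /* glibc: expose mkstemp under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Conventional path: each read() copies file data into buf. */
long sum_with_read(int fd)
{
    unsigned char buf[4096];
    long sum = 0;
    ssize_t n;
    if (lseek(fd, 0, SEEK_SET) < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];
    return n < 0 ? -1 : sum;
}

/* Mapped path: the loop reads the mapped pages directly, no user buffer. */
long sum_with_mmap(int fd)
{
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0)
        return -1;
    size_t len = (size_t)st.st_size;
    const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    munmap((void *)p, len);
    return sum;
}

/* Hypothetical driver: returns 0 if both paths produce the same sum. */
int compare_read_and_mmap(void)
{
    char path[] = "/tmp/mmap_cmp_XXXXXX";     /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    unsigned char block[10000];
    for (size_t i = 0; i < sizeof block; i++)
        block[i] = (unsigned char)(i % 251);
    if (write(fd, block, sizeof block) != (ssize_t)sizeof block) {
        close(fd);
        return -1;
    }

    long a = sum_with_read(fd);
    long b = sum_with_mmap(fd);
    close(fd);
    return (a >= 0 && a == b) ? 0 : -1;
}
```

The only structural difference between the two functions is the buffer: sum_with_read needs one to receive the copied data, while sum_with_mmap works on the mapped pages themselves.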

Summary of mmap advantages

From the above discussion, we can see that mmap has the following advantages:

1. File reads avoid the extra copy between the page cache and a user buffer: fewer data copies are made, memory reads and writes replace I/O reads and writes, and file reading becomes more efficient.

2. An efficient interaction mode between user space and kernel space is realized. The modification operations of the two spaces can be directly reflected in the mapped area, so that they can be captured by the other space in time.

3. It provides a way for processes to share memory and communicate with each other. Whether they are parent and child or unrelated, processes can map the same file into their own user space, or share the same anonymous mapping. By changing the contents of the mapped area, they achieve inter-process communication and inter-process sharing.

At the same time, if process A and process B both map area C, when A reads C for the first time, the file page is copied from the disk to the memory through a page fault; but when B reads the same page of C again, although a page fault exception will also occur, there is no need to copy the file from the disk, and the file data already stored in the memory can be used directly.

4. It can be used to achieve efficient large-scale data transmission. Insufficient memory space is one of the aspects that restrict big data operations. The solution is often to use hard disk space to assist operations and make up for the lack of memory. However, this will further cause a large number of file I/O operations, greatly affecting efficiency. This problem can be well solved by mmap mapping. In other words, whenever disk space is needed instead of memory, mmap can play its role.
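Point 3 above, sharing between a parent and a child process, can be sketched with an anonymous shared mapping. This is a minimal sketch under assumptions: share_with_child is a hypothetical helper, the value passed in is assumed to be non-negative so that -1 can signal an error, and MAP_ANONYMOUS is assumed to be available (it is on Linux and the BSDs).

```c
#define _DEFAULT_SOURCE  /* glibc: expose MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child stores `value` (assumed >= 0) into a page
 * shared with the parent; returns what the parent then reads, or -1. */
int share_with_child(int value)
{
    /* No file behind the mapping, so fd is -1 and offset is 0. */
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {              /* child: write into the shared page */
        *shared = value;
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {   /* wait so the store has happened */
        munmap(shared, sizeof *shared);
        return -1;
    }
    int seen = *shared;          /* parent observes the child's store */
    munmap(shared, sizeof *shared);
    return seen;
}
```

With MAP_PRIVATE in place of MAP_SHARED the child would write to its own copy of the page and the parent would still read 0, which is why the sharing flag matters here.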

mmap related functions

Function prototype

void *mmap(void *start, size_t length, int prot, int flags, int fd, off_t offset);

Return value

On success, mmap() returns a pointer to the mapped area. On failure, mmap() returns MAP_FAILED, which is (void *)-1, and errno is set to indicate the error. Commonly documented values include EACCES, EAGAIN, EBADF, EINVAL, ENFILE, ENODEV, and ENOMEM.

Parameters

start: the starting address of the mapping area

length: the length of the mapping area

prot: the desired memory protection, which must not conflict with the mode in which the file was opened. It is either PROT_NONE or the bitwise OR of one or more of PROT_READ, PROT_WRITE, and PROT_EXEC

flags: specifies the type of the mapped object, mapping options, and whether the mapped pages can be shared. Its value is a combination of flag bits such as MAP_SHARED, MAP_PRIVATE, MAP_FIXED, and MAP_ANONYMOUS; exactly one of MAP_SHARED and MAP_PRIVATE must be given

fd: A valid file descriptor. If MAP_ANONYMOUS is set, its value should be -1 for compatibility reasons.

offset: the offset within the mapped object at which the mapping starts; it must be a multiple of the page size
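The return convention and the parameter rules above can be checked with a small probe. The sketch below is an illustration under assumptions: open_scratch and mmap_errno are hypothetical helpers, and the specific errno values mentioned afterwards are what Linux documents for these two mistakes.

```c
#define _DEFAULT_SOURCE  /* glibc: expose mkstemp under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: opens an empty, already unlinked scratch file. */
int open_scratch(void)
{
    char path[] = "/tmp/mmap_args_XXXXXX";    /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);
    return fd;
}

/* Hypothetical helper: tries to map one page of fd at `offset` and returns
 * the errno of a failed call, or 0 if the call succeeded. */
int mmap_errno(int fd, off_t offset)
{
    long page = sysconf(_SC_PAGESIZE);
    errno = 0;
    void *p = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED)         /* compare with MAP_FAILED, never with NULL */
        return errno;
    munmap(p, (size_t)page);
    return 0;
}
```

On Linux, mmap_errno(-1, 0) is expected to report EBADF because the mapping is not anonymous and the descriptor is invalid, and mmap_errno(fd, 1) on a valid descriptor is expected to report EINVAL because the offset is not page-aligned.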

Related functions

int munmap(void *addr, size_t len)

On success, munmap() returns 0. On failure, munmap() returns -1 and sets errno, in the same way as mmap().

This call releases a mapping relationship in the process address space. addr is the address returned when mmap() is called, and len is the size of the mapping area.

When the mapping relationship is released, access to the original mapping address will cause a segmentation fault.

int msync(void *addr, size_t len, int flags)

Generally speaking, changes that a process makes to shared content in the mapped space are not written to the disk file immediately; the write-back often happens only later, for example after munmap() is called.

The contents of the file on disk can be made consistent with the contents of the shared memory area by calling msync().
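One detail worth a sketch is that msync expects a page-aligned address, so flushing a small range in the middle of a mapping means rounding its start down to a page boundary. The following is a minimal sketch under assumptions: sync_range and sync_demo are hypothetical helpers and the scratch file under /tmp is made up for illustration.

```c
#define _DEFAULT_SOURCE  /* glibc: expose mkstemp/pread under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Flush bytes [off, off + n) of a mapping that starts at base.
 * msync() wants a page-aligned address, so the start is rounded down. */
int sync_range(char *base, size_t off, size_t n)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t start = off - off % page;
    return msync(base + start, (off - start) + n, MS_SYNC);
}

/* Hypothetical driver: writes a short string into the second page of a
 * three-page mapping, flushes just that range, and reads it back. */
int sync_demo(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 3 * page;

    char path[] = "/tmp/mmap_sync_XXXXXX";    /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    char *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return -1;
    }

    const char msg[] = "flushed";
    size_t off = page + 100;                  /* not page-aligned on purpose */
    memcpy(base + off, msg, sizeof msg);      /* copies the trailing '\0' too */
    int rc = sync_range(base, off, sizeof msg);

    char buf[sizeof msg] = {0};
    ssize_t n = pread(fd, buf, sizeof buf, (off_t)off);

    if (munmap(base, len) != 0)               /* release the mapping */
        rc = -1;
    close(fd);
    return (rc == 0 && n == (ssize_t)sizeof buf &&
            memcmp(buf, msg, sizeof msg) == 0) ? 0 : -1;
}
```

MS_SYNC makes the call wait until the write-back has completed; MS_ASYNC would only schedule it.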

mmap usage details

1. A key point when using mmap is that mappings are made in whole pages of the system page size (commonly 4 KB; sysconf(_SC_PAGESIZE) reports it). The offset must be a multiple of the page size, and a length that is not a multiple is rounded up to the next page boundary. The reason is that the smallest unit of memory management is the page, and the mapping between the process virtual address space and memory is also done page by page, so the mapping that mmap establishes from disk to the virtual address space must be in pages as well.

2. The kernel tracks the size of the underlying object (file) behind the mapping, and the process can legally access the bytes that lie both within the current file size and within the mapped area. In other words, if the file keeps growing, the process can legally access the data as long as it is within the mapped area, regardless of the size the file had when the mapping was established. For a concrete example, see Case 3.

3. After the mapping is established, it remains valid even if the file is closed, because the mapping refers to the file's data on disk rather than to the file descriptor. In addition, the effective address range that can be used for inter-process communication is not strictly limited by the size of the mapped file, because mapping is done by page.
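Points 1 and 3 can be tried out directly. The sketch below is only an illustration: round_up_to_page and read_after_close are hypothetical helpers, and the scratch file under /tmp is an assumption.

```c
#define _DEFAULT_SOURCE  /* glibc: expose mkstemp under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Size of the area that actually gets mapped for a request of len bytes. */
size_t round_up_to_page(size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (len + page - 1) / page * page;
}

/* Hypothetical helper: returns 0 if the mapping is still readable after
 * the file descriptor has been closed, -1 otherwise. */
int read_after_close(void)
{
    const char text[] = "still mapped";

    char path[] = "/tmp/mmap_close_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (write(fd, text, sizeof text) != (ssize_t)sizeof text) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, sizeof text, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                   /* the descriptor is gone, the mapping stays */
    if (p == MAP_FAILED)
        return -1;

    int same = memcmp(p, text, sizeof text) == 0;
    munmap(p, round_up_to_page(sizeof text));
    return same ? 0 : -1;
}
```

With 4096-byte pages, round_up_to_page(5000) yields 8192, which is the figure used in Case 1 below.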

Based on the above knowledge, let's look at the specific situation if the size is not an integer multiple of the page:

Case 1: The size of a file is 5000 bytes. The mmap function starts from the beginning of a file and maps 5000 bytes into virtual memory.

Analysis: Because a physical page is 4096 bytes, the virtual address area of the process must be a whole number of pages even though the mapped file is only 5000 bytes. Therefore, after mmap returns, 8192 bytes are actually mapped into the virtual memory area, and bytes 5000 to 8191 are filled with zeros. The correspondence after mapping is shown in the following figure:

At this point:

(1) Reading or writing the first 5000 bytes (0-4999) operates on the file content.

(2) Reading bytes 5000 to 8191 returns zeros. Writing bytes 5000 to 8191 does not cause an error, but the written content is not written to the original file.

(3) Reading or writing beyond byte 8191 raises a SIGSEGV error.
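Points (1) and (2) of Case 1 can be checked with the sketch below. It is an illustration under assumptions: tail_of_last_page is a hypothetical helper, and the file length is written as one page plus 904 bytes so that it equals 5000 when the page size is 4096 and still ends inside the second page on systems with larger pages. Point (3) is deliberately not probed, since whatever lies beyond the mapping depends on the rest of the address space.

```c
#define _DEFAULT_SOURCE  /* glibc: expose mkstemp under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if the tail of the last page reads as
 * zeros and a store there does not change the file size, -1 otherwise. */
int tail_of_last_page(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t file_len = page + 904;        /* 5000 bytes with 4096-byte pages */
    size_t map_len = 2 * page;           /* what the kernel really maps */

    char path[] = "/tmp/mmap_case1_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    char *content = malloc(file_len);
    if (content == NULL) {
        close(fd);
        return -1;
    }
    memset(content, 'A', file_len);
    ssize_t written = write(fd, content, file_len);
    free(content);
    if (written != (ssize_t)file_len) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, file_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    int ok = (p[0] == 'A' && p[file_len - 1] == 'A');   /* file content */
    for (size_t i = file_len; i < map_len; i++)
        if (p[i] != 0)
            ok = 0;                      /* the tail must read as zeros */

    p[file_len] = 'X';                   /* legal store past end of file */
    if (msync(p, map_len, MS_SYNC) != 0)
        ok = 0;

    struct stat st;
    if (fstat(fd, &st) != 0 || (size_t)st.st_size != file_len)
        ok = 0;                          /* the file did not grow */

    munmap(p, map_len);
    close(fd);
    return ok ? 0 : -1;
}
```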

Case 2: The size of a file is 5000 bytes. The mmap function starts from the beginning of the file and maps 15000 bytes into virtual memory, that is, the mapped size exceeds the size of the original file.

Analysis: Since the file size is 5000 bytes, it occupies two physical pages, just as in case 1. Both pages can legally be read and written, but anything written beyond byte 4999 is not reflected in the original file. The program asked for 15000 bytes, which the kernel rounds up to four pages (16384 bytes), yet the file covers only the first two, so the mapped pages from byte 8192 onward cannot be read or written and accessing them raises an exception. As shown in the following figure:

At this point:

(1) The process can read/write the first 5000 bytes (0-4999) of the mapping normally, and the changes made by the write operation will be reflected in the original file after a certain period of time.

(2) For bytes 5000 to 8191, the process can read and write without error. However, the content is 0 before writing, and it is not reflected in the file after writing.

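(3) For bytes 8192 to 14999, the process cannot read or write them; the access raises a SIGBUS error. The same applies up to byte 16383, because the mapping length is rounded up to a whole page.

(4) For bytes beyond the mapped area, the process cannot read or write them; the access raises a SIGSEGV error.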
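The SIGBUS behaviour of Case 2 can be observed without crashing the program by catching the signal and jumping back out of the faulting access. The sketch below is an illustration under assumptions: the helper names are made up, sigsetjmp and siglongjmp are used as the recovery mechanism, and the behaviour described is what Linux does for file-backed mappings. Addresses beyond the mapping are not probed, because another mapping may happen to live there.

```c
#define _DEFAULT_SOURCE  /* glibc: expose mkstemp/sigsetjmp under strict -std modes */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf jump_target;
static volatile sig_atomic_t caught;

/* Record the signal and jump back out of the faulting access. */
static void on_fault(int sig)
{
    caught = sig;
    siglongjmp(jump_target, 1);
}

/* Returns the signal raised by reading p[offset], or 0 if the read works. */
static int probe(const volatile char *p, size_t offset)
{
    caught = 0;
    if (sigsetjmp(jump_target, 1) == 0) {  /* 1: restore signal mask on jump */
        char c = p[offset];                /* may raise SIGBUS or SIGSEGV */
        (void)c;
    }
    return caught;
}

/* Hypothetical helper: maps four pages over a file that ends inside its
 * second page. Returns the signal seen when reading the third page, or -1
 * on error or if the second page was unexpectedly unreadable. */
int beyond_eof_signal(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    char path[] = "/tmp/mmap_case2_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)(page + 904)) != 0) {  /* 5000 with 4096 pages */
        close(fd);
        return -1;
    }

    size_t map_len = 4 * page;
    char *p = mmap(NULL, map_len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    struct sigaction sa, old_bus, old_segv;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    int inside = probe(p, page + 1000);   /* past EOF, same page as EOF */
    int past = probe(p, 2 * page);        /* first page with no file data */

    sigaction(SIGBUS, &old_bus, NULL);    /* restore previous handlers */
    sigaction(SIGSEGV, &old_segv, NULL);
    munmap(p, map_len);
    return inside == 0 ? past : -1;
}
```

On Linux the function is expected to return SIGBUS: the read at page + 1000 succeeds because that page still contains part of the file, while the third page has no file data behind it.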

Case 3: The initial size of a file is 0. mmap is used to map 1000 * 4K, that is, 1000 physical pages or roughly 4 MB of space. mmap returns the pointer ptr.

Analysis: If the file is read or written at the beginning of the mapping, since the file size is 0 and there is no legal physical page corresponding to it, a SIGBUS error will be returned, just like in case 2.

However, if the file size is increased before each read or write through ptr, then operations through ptr within the file size are legal. For example, if the file is extended by 4096 bytes, ptr can operate on the range from ptr to (char *)ptr + 4095. As long as the file grows within the 1000 physical pages of the mapping, ptr can operate on a range of the same size.

In this way, it is convenient to expand the file space at any time and write files at any time without wasting space.
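The grow-then-write pattern of Case 3 might look like the sketch below. It is a minimal sketch under assumptions: grow_then_write is a hypothetical helper, ftruncate is the call chosen here to extend the file, and the scratch file under /tmp is made up for illustration.

```c
#define _DEFAULT_SOURCE  /* glibc: expose mkstemp/pread under strict -std modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps 1000 pages over an empty file, then grows the
 * file one page at a time before writing. Returns 0 on success. */
int grow_then_write(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t map_len = 1000 * page;

    char path[] = "/tmp/mmap_case3_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);                   /* file size is 0 here */
    if (fd < 0)
        return -1;
    unlink(path);

    char *ptr = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ptr == MAP_FAILED) {
        close(fd);
        return -1;
    }
    /* Touching ptr[0] at this point would raise SIGBUS: the file is empty. */

    int ok = 1;
    if (ftruncate(fd, (off_t)page) != 0)          /* grow to one page */
        ok = 0;
    if (ok)
        memset(ptr, 'A', page);                   /* legal: within file size */

    if (ok && ftruncate(fd, (off_t)(2 * page)) != 0)   /* grow to two pages */
        ok = 0;
    if (ok)
        memset(ptr + page, 'B', page);

    if (ok && msync(ptr, 2 * page, MS_SYNC) != 0)
        ok = 0;

    char first = 0, last = 0;
    struct stat st;
    if (ok && (pread(fd, &first, 1, 0) != 1 ||
               pread(fd, &last, 1, (off_t)(2 * page - 1)) != 1 ||
               fstat(fd, &st) != 0 ||
               (size_t)st.st_size != 2 * page))
        ok = 0;

    munmap(ptr, map_len);
    close(fd);
    return (ok && first == 'A' && last == 'B') ? 0 : -1;
}
```

The mapping is created once for the maximum size, and only the file length changes afterwards, so no remapping is needed as the file grows.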
