Preface

Well, here is the piece you asked for. Along with this article, I have also released two libraries, CTPersistance and CTJSBridge. If you run into any problems when using them, please open an issue, send a PR, or leave a comment; I will reply to every one of them.

The persistence solution is a topic worth discussing, whether on the server side or the client side. On the server side in particular, the quality of the persistence solution often affects the performance of the product to some extent. On the client side, however, only a small number of business requirements involve persistence, and in most cases the performance requirements are not especially stringent. So when I design a persistence solution for mobile, I care more about the maintainability and scalability of the solution, and only do performance tuning on that basis. In this article, performance tuning is not given a separate section; it is interspersed throughout the subsections, so keep an eye out for it if you are interested.

The impact of the persistence solution on the overall App architecture is similar to that of the network layer solution, and it is usually the main culprit behind high coupling in a project. As always, I am a practitioner of de-modeling. In the process of de-modeling the persistence layer, I introduced the design of Virtual Record, which is also described in detail in this article.

This article mainly discusses the following points:

- Determine the persistence solution based on needs
- Isolation between the persistence layer and the business layer
- How the persistence layer and the business layer interact
- Data migration solution
- Data synchronization solution

In addition, for database storage I wrote the CTPersistance library, which currently covers most persistence layer requirements and also serves as an example of my Virtual Record design idea. The library can be pulled in directly via CocoaPods, and I hope you will raise issues when using it. Here is the CTPersistance Class Reference.

Determine the persistence solution based on needs

When there is a need for persistence, we have many options: NSUserDefault, Keychain, files, and countless database-based sub-options. So when a persistence need arises, the first thing to decide is which means of persistence to use.

NSUserDefault

Generally speaking, small amounts of data with weak business relevance can go into NSUserDefault, while larger data sets and data with strong business relevance are not suitable for it. Another thing I want to complain about is that the Tmall App did not actually have a designed data persistence layer; persistence in Tmall was very chaotic. I have seen some business lines put most of their business data into NSUserDefault. When I read the code I was shocked... When asked why they did this, they said it was convenient to write. Damn it...

Keychain

Keychain is a storage mechanism with reversible encryption provided by Apple, widely used for storing all kinds of passwords. In addition, because data in the Keychain is retained after the App is uninstalled (as long as the system is not reinstalled) and can be synchronized through iCloud, it is common to store the user's unique identifier string there. So sensitive small pieces of data that need to be encrypted and synced to iCloud are generally placed in the Keychain.
File Storage

File storage includes Plist, archive, Stream and other methods. Generally, structured data or data that needs to be easy to query is persisted as a Plist. Archive is suitable for storing large amounts of infrequently used data, or data that you want to turn directly into objects after reading. Because Archive serializes objects together with their object relationships, decoding takes a lot of time when reading the data; whether the decoding step is decompression or objectification depends on the specific implementation. Stream is plain file storage, generally used for things like images; it suits data that is used frequently but is not very large.

Database storage

For database storage there are more options. Apple ships Core Data, and of course there are countless alternatives in the industry; besides Core Data, FMDB is the most commonly used in the iOS field. A database solution mainly exists to make create, read, update and delete convenient. When the data has states and categories, a database is the better choice, and when those states and categories are strongly business-related, a database solution should definitely be used, because you cannot reasonably traverse files through the file system to pick out the data belonging to a certain state or category; the cost of doing so is too high. Of course, particularly large pieces of data are not suitable for storing directly in the database. For example, images or articles generally have only a file name in the database, and that file name points to the actual image or article file. If you really need full-text indexing, it is recommended to expose an API and let the server do it.

In general, NSUserDefault, Keychain, files and the other persistence options are simple and basic. Just know when to use each one, and don't write things at random the way Tmall did. They also have no complex derivative requirements; if you really wanted to write an article about them, it would be nothing more than how to store and how to read, which you can find with a quick Google search, so I won't waste my time on it. Since most of the complex derivative requirements are met with database-based persistence solutions, the focus of this article is on the design and implementation of database-related architecture. If there are questions I haven't covered in the article, you can ask in the comments and I will answer them one by one or add the missing content to the article.

Isolation to be aware of when implementing the persistence layer

When designing the persistence layer architecture, we need to focus on the following aspects of isolation:

- Isolation of the persistence layer and the business layer
- Database read-write isolation
- Isolation required by multi-threaded control
- Isolation of data representation and data manipulation

1. Isolation of persistence layer and business layer

About Model

Before discussing data processing in the persistence layer, I think this issue needs a complete analysis. In the article on View layer design, I described the fat model and thin model design ideas and told you that I prefer the fat model. In the article on network layer design, I used the idea of de-modeling to design the data interaction between APIManager and the business layer.

These two seemingly contradictory ideas about the model are in fact not contradictory in the persistence layer solution I am about to propose; they cooperate with each other. In the network layer article I only gave the idea and practice of de-modeling without much explanation, because explaining it touches on a wide range of topics and that article did not feel like the right place. Since the persistence layer involves both the fat model and de-modeling, I think it is better to explain the topic here.

When I talk with architects in various fields, I find that people more or less mix up the concepts of Model and Model Layer, which often leads to two people discussing different things: when I talk about Model, the other person talks about Model Layer; fine, I switch to Model Layer, and he switches back to Model, and the discussion goes nowhere. As an architect, if you do not distinguish these two concepts, it will definitely hurt the quality of the architecture you design. If we call the Model a Data Model and put it side by side with the Model Layer, the concepts are easy to tell apart.

Data Model

The term Data Model refers to the problem domain of modeling business data, and to the representation of that data model in code. The two complement each other: the modeling scheme for business data and the characteristics of the business itself ultimately determine how the data is represented. When operating on a batch of data, your data modeling scheme is essentially the logical entity you abstract after refining the business problem. When implementing the business, you can choose different representations for this logical entity: byte streams (TCP packets, etc.), string streams (JSON, XML, etc.), and object streams. Object streams can be further divided into general data objects (NSDictionary, etc.) and business data objects (HomeCellModel, etc.).

These are all forms a Data Model can take. Conventionally, when we talk about modeling, we refer only to the business data objects in the object stream. De-modeling, by contrast, means representing data with the more general data objects and not giving business data objects priority in the design. The general data objects here can, to some extent, be understood as generic containers.

Model Layer

The problem domain of the Model Layer is how to perform CRUD (Create, Read, Update, Delete) and related business processing on the data. Generally speaking, a Model Layer built on the thin model idea goes little further than CRUD. A fat model also concerns itself with providing services beyond CRUD to the upper layers that need the data, and with giving them corresponding solutions: caching, data synchronization, weak business processing, and so on.

My Tendency

I prefer de-modeled designs. In the network layer I designed reformer to achieve de-modeling; in the persistence layer I designed Virtual Record to achieve it. A concrete model is a practice that easily introduces coupling, whereas weakening the model concept as much as possible leaves enough room for introducing and connecting business. At the same time, de-modeling makes it possible to distinguish strong business from weak business, which is crucial for future code migration and maintenance.

Many architectures are poorly designed precisely because the architect did not realize the importance of distinguishing strong and weak business, so the architecture decays quickly and becomes increasingly hard to maintain. So the isolation between the persistence layer and the business layer is achieved through the isolation of strong and weak business. It is exactly because of its de-modeled design that Virtual Record achieves this isolation, and with it a balance between the persistence layer and the business layer that is both isolated and interactive. I will analyze the concrete design of Virtual Record later.

2. Database read-write isolation

In website architecture, database read-write separation is mainly about improving response speed. In iOS application architecture, read-write isolation in the persistence layer is mainly about improving the maintainability of the code. This is one of the points where the two fields require architects to focus on different things.

The read-write isolation discussed here does not mean isolating read operations from write operations. Rather, it is defined by a boundary: all data models outside the boundary are not writable or modifiable, or modifying their attributes has no effect on the data in the database; data inside the boundary is writable and modifiable. Generally, the boundary we draw at design time coincides with the boundary between the persistence layer and the business layer. That is, once the business layer obtains data from the persistence layer, it cannot write or modify it, or its writes and modifications to that data model have no effect on the database file; only operations in the persistence layer can affect the database file.

In Core Data, the persistence solution Apple provides, the read and write paths are not isolated at all: query results are handed out directly as NSManagedObject. So if a business engineer accidentally changes an attribute, the NSManagedObjectContext will persist that change the next time it saves. Also, when we need to apply AOP slicing to all create, read, update and delete operations, doing so on top of the Core Data stack is very complicated. Overall, I think Core Data is over-designed for most needs. When I designed the persistence layer of the Anjuke chat module I used Core Data, and for read-write isolation I converted all the NSManagedObjects it handed out into plain objects. On top of that, because the chat-record business is quite complex, meeting the requirements with Core Data forced us to introduce a lot of hacks. This reduced the maintainability of the persistence layer to a certain extent and steepened the learning curve for the engineers who later took over the module, which is not good.

As for the Tmall client, when I arrived the Tmall App had essentially no persistence layer at all, which was quite chaotic; each business line had to work its own magic to meet its persistence needs, and it was hard to push a unified persistence layer solution. For project maintenance, especially cross-business cooperation, it was basically no different from a car-crash scene. I have since resigned from Tmall.
If there are readers from Alibaba who want a promotion, want to raise their profile, and want that 3.75 rating, you can consider building a unified persistence layer solution for Tmall.

Read-write isolation also makes it easier to add AOP pointcuts, because write operations to the database are confined to a fixed place, so it is easy to put the slices in the right spot when adding AOP. You will see this applied later when we discuss the data synchronization solution.

3. Isolation caused by multithreading

Core Data

Core Data requires that in a multi-threaded scenario you create an NSManagedObjectContext for the asynchronous work, set its concurrency type to NSPrivateQueueConcurrencyType, and set that context's parentContext to the context on the main thread. This is much easier than doing multi-threading with raw SQLite. Note, however, that when handing an NSManagedObject across threads you cannot pass the object pointer directly; you must pass the NSManagedObjectID. This is the isolation of object transfer in a multi-threaded environment, and it needs attention when designing the architecture.

SQLite

Plain SQLite actually supports multi-threading directly. The SQLite library provides three threading modes: Single Thread, Multi Thread, and Serialized. Single Thread mode is not thread-safe and provides no synchronization mechanism. Multi Thread mode requires that a database connection not be shared across threads, and otherwise has no special restrictions. Serialized mode, as the name implies, executes all operations on a serial queue; apart from slower responses, it imposes essentially no restrictions on the caller. In most cases, SQLite's default mode is Serialized.

Judging from how Core Data behaves in multi-threaded scenarios, I believe Core Data uses Multi Thread mode when it uses SQLite as its store. In Multi Thread mode SQLite uses a read-write lock, and it locks the entire database, not a table or a row; this is something architects need to keep in mind. If the requirements on response speed are very high, it is recommended to open an auxiliary database: write a large write task to the auxiliary database first, then split it into several small write tasks that are written into the main database bit by bit, and delete the auxiliary database when done.

In practice, though, the read and write volume a local App needs for persistence is generally not large, and as long as a few points are taken care of it will not affect the user experience. So compared with Multi Thread mode, I think Serialized mode is the more cost-effective choice: the code is easy to write and maintain and the performance loss is small. Sacrificing code maintainability to save a few tens of milliseconds is not worth it.

Realm

I haven't had time to study Realm in detail, so I can't say much about it.

4. Isolation of data expression and data manipulation

This is the point that is most easily overlooked, yet whether data expression and data manipulation are properly isolated directly affects the scalability of the whole program. For a long time we have been used to Active Record-style data operation and expression, such as this:
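The original article showed code here. A minimal sketch of what such Active Record-style usage typically looks like; the class and method names below are purely illustrative and do not come from any specific library:

```objectivec
// Hypothetical Active Record-style usage: the User object both maps the
// "User" table and carries one row's data, so it can save itself.
User *user = [[User alloc] init];
user.name = @"Casa";
user.age  = @26;
[user save];   // performs an INSERT into the User table
```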
Or this:
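Again a hypothetical sketch, with illustrative names: the same class also acts as the entry point for queries against the whole table.

```objectivec
// Hypothetical Active Record-style queries: class-level finders return
// instances of the same class, which can then modify or delete themselves.
NSArray *adults = [User findWithCondition:@"age > 18"];
User *casa = [User findFirstByName:@"Casa"];
[casa deleteRecord];   // performs a DELETE for that row
```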
Simply put, let an object map a table in the database, and then operations on this object are equivalent to operations on this table and the data expressed by this object. One bad thing here is that this Record is both a mapping of the data table in the database and a mapping of a certain data in this table. I have seen many frameworks (not limited to iOS, including Python, PHP, etc.) mix the two together. If data operations and data expressions are organized in this inappropriate way, it will be difficult to distinguish between strong and weak businesses in the practice of fat models, causing great difficulties. I think the practice of using thin models itself has shortcomings. I have already talked about the details in the opening, so I won’t go into details here. The biggest difficulty caused by the inability to distinguish between strong and weak businesses lies in code reuse and migration, because the high coupling of strong businesses in the persistence layer to the View layer business is inevitable, while weak businesses, relatively speaking, are only coupled to the lower layer and not to the upper layer. When we do code migration or reuse, we often hope to reuse weak businesses rather than strong businesses. If the strong and weak businesses are inseparable at this time, code reuse is out of the question and migration becomes doubly difficult. In addition, mixing data operations and data expressions together will lead to the following problems: objectively, data can be expressed in a variety of ways in the view layer business, which may be a view or a separate object. If the data object mapping the database table is used to map the data, this diversity will be limited, and an additional layer of conversion will have to be performed every time the data is used in the actual coding. I think the reason for this bad practice is that the mapping of objects to data tables and the mapping of objects to data expressions are very similar, especially when expressing columns, they are almost identical. The key point to distinguish between mapping to data tables and mapping to data here is: the operation starting point of this mapping object is internal or external operation relative to the data table. If it is an internal operation, then the scope of this operation is limited to the current data table, and it is more appropriate to map these operations to the data table model. If it is an external operation, other data tables may be involved when executing these operations, so these operations should not be mapped to the data table object. Therefore, in actual operation, I encapsulate objects for operations based on data tables, and then encapsulate objects for data records. The operations in the data table are all ordinary addition, deletion, modification and query operations for records, which are weak business logic. Data records are just a way of expressing data, and these operations are best delivered to objects in charge of strong business in the data layer to execute. I will continue to talk about the details below. How the persistence layer and the business layer interact At this point, we have to talk about CTPersistance and Virtual Record. I will use them to explain the interaction between the persistence layer and the business layer.
Let me explain the diagram first: the persistence layer has a DataCenter that is specifically responsible for connecting to a View layer module or business, and the two interact through Records. The DataCenter exposes a business-friendly interface to the upper layer, which is generally strong business: for example, returning the data that matches the user's filter conditions. Inside that interface the DataCenter schedules the various Tables, runs a series of business logic, and finally produces a record object that it delivers to the View layer business.

To complete the tasks handed down by the View layer, the DataCenter has to do data assembly and cross-table data operations. Data assembly differs depending on the View layer's requirements, so it is strong business. Cross-table data operations are essentially combinations of single-table operations; the DataCenter is responsible for scheduling those single-table operations to obtain the basic data it needs for assembly. The single-table operations themselves are weak business and are handled by the Table mapping objects. A Table object generates the corresponding SQL statement through a QueryCommand, hands it to the database engine to run the query, and then delivers the result back to the DataCenter.

DataCenter and Virtual Record

Before talking about Virtual Record we first have to talk about the DataCenter. The DataCenter is really a business object: it is the glue between the persistence layer and the business layer in the whole App. It exposes a business-friendly interface to the business layer, then completes the strong business logic by scheduling the weak business logic and data records of the persistence layer, and delivers the results to the business layer. Because the DataCenter sits between the business layer and the persistence layer, the carrier on which it executes business logic must be understandable by both sides. CTPersistanceTable encapsulates the weak business logic and is called by the DataCenter to operate on data. Virtual Record is that data carrier which both the business layer and the persistence layer can understand.

Virtual Record is not actually an object; it is just a protocol, which is why it is virtual. As long as an object implements Virtual Record, the persistence layer can operate on it directly as a Record, so it is also a Record. Put together, it is a Virtual Record. The implementer of Virtual Record can therefore be any object, usually a business layer object; and since in the business layer the common way of expressing data is a View, the implementer of Virtual Record will generally be a View object.

Let's review the traditional data operation flow: data is retrieved from the database, turned into a model object, and the model is then thrown out to the Controller, which converts it into something the View can display before the subsequent operations are performed. Virtual Record follows similar steps; the only difference is that the whole process requires no intermediate object to express the data. The different ways of expressing the data are handled by the individual Virtual Record implementers, and none of that code needs to live in the Controller. This is why it is a de-modeled design. If this data-conversion logic ever needs to be reused, you can reuse the Virtual Record directly, which is very convenient.
The key to making good use of Virtual Record is that the interface provided by DataCenter is business-friendly enough and has sufficient business context. Therefore, DataCenter is generally held by Controller, so if there is only one DataCenter for the entire App, this is not a good thing. I have seen many Apps whose persistence layer is a global singleton, and all persistent businesses use this singleton, which is a very painful approach. DataCenter also needs to be highly differentiated according to the business. Each large business must provide a DataCenter, and then hang it under the relevant Controller and hand it over to the Controller for scheduling. For example, it is differentiated into SettingsDataCenter, ChatRoomDataCenter, ProfileDataCenter, etc. In addition, it should be noted that there should be no business overlap between several DataCenters. If the business of a DataCenter is really large, then split it into several small businesses. If a single small business is very large, then split it into various Categories. For specific practices, you can refer to the practice of CTPersistanceTable and CTPersistanceQueryCommand in my framework. In this way, if you want to migrate a strong business involving the persistence layer, you only need to migrate DataCenter. If you want to migrate a weak business, you only need to migrate CTPersistanceTable. Actual scenario Assume that the business layer has collected the user's filtering conditions at this time:
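The original code sample is not reproduced here; a minimal, hypothetical sketch of what such collected filter conditions might look like (the keys are illustrative):

```objectivec
// Hypothetical filter conditions gathered from the UI of a house-search page.
NSDictionary *filterConditions = @{
    @"city":      @"Shanghai",
    @"priceMin":  @2000000,
    @"priceMax":  @3000000,
    @"roomCount": @2
};
```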
Then ViewController calls the interface provided by DataCenter to the business layer to obtain data for direct display:
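A hedged sketch of that call; the DataCenter name and method are assumptions for illustration, not a fixed API:

```objectivec
// The ViewController asks its DataCenter for display-ready data.
// The returned objects already implement Virtual Record, so they can be
// shown directly without an intermediate data model.
self.houseList = [self.houseDataCenter fetchHouseListWithFilterConditions:filterConditions];
[self.tableView reloadData];
```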
In fact, what needs to be done in the View layer has already ended here. Now let's look back at how DataCenter implements this business:
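A hypothetical sketch of such a DataCenter method. The strong business (turning filter conditions into a query) lives here, while the weak business (the actual single-table query) is delegated to a CTPersistanceTable instance; the table-query call shown is an assumption, and the real CTPersistance API may differ.

```objectivec
// Hypothetical DataCenter implementation (strong business).
- (NSArray *)fetchHouseListWithFilterConditions:(NSDictionary *)conditions
{
    // Assemble the WHERE clause from the business-level filter conditions.
    // (String formatting is used here only for brevity; real code should
    // bind parameters instead of interpolating values.)
    NSString *whereCondition =
        [NSString stringWithFormat:@"city = '%@' AND price BETWEEN %@ AND %@ AND room_count = %@",
         conditions[@"city"], conditions[@"priceMin"],
         conditions[@"priceMax"], conditions[@"roomCount"]];

    NSError *error = nil;
    // Weak business: a single-table query performed by the Table object.
    // The method name below is illustrative.
    NSArray *records = [self.houseTable findAllWithWhereCondition:whereCondition
                                                            error:&error];
    return records;
}
```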
Basically, the process is as above. Generally speaking, poorly designed persistence layers are poor precisely because the architect did not separate strong business from weak business through something like DataCenter and Table. The DataCenter and Table objects are designed mainly to make code migration easy: if you want to migrate a strong business, take DataCenter and Table together; if you only want to migrate a weak business, take Table alone.

In addition, I want to emphasize one concept through the code: distinguish between Table and Record. This was shown in the architecture diagram I drew earlier but not stressed above. In fact, many architects do not distinguish Table from Record when designing a persistence framework; yes, the frameworks I am referring to include Core Data and FMDB, and this is not limited to iOS: CodeIgniter, ThinkPHP, Yii, Flask and others do not make this distinction either. (A side complaint: I said above that Core Data is over-designed; in fact it is not designed where it should be, and piles up design where it should not be...)

That was a brief introduction to the design of Virtual Record. Next, let's discuss how the interaction works in different scenarios. The most familiar scenario is this: a data object is assembled through various pieces of logic and then handed to the persistence layer for processing. I call this the one-to-one interaction scenario. Its implementation is entirely traditional, exactly as you would imagine, and it appears in the CTPersistance test cases, so I won't say more here. Since there is a one-to-one scenario, there will naturally also be many-to-one and one-to-many scenarios. Below I describe, one by one, how Virtual Record takes advantage of being virtual to handle these different interaction scenarios.

In a many-to-one scenario, how does the business layer interact with the persistence layer?

There are actually two ways to understand the many-to-one scenario. The first is that the data of one record is composed of data from multiple Views. For example, a user table contains all of a user's information; some Views hold only the user's nickname and avatar, while some objects hold only the user ID and token, yet all of this data lives in the single user table. This is the scenario where the data of multiple objects together make up one complete Record. The second understanding is this: a ViewA object contains all the information of a Record, and a ViewB object also contains all the information of a Record; this is the scenario where multiple different objects each express the same Record. At the same time, the interaction goes in two directions: storage and retrieval.

The solution for both understandings is the same. The implementer of Virtual Record aggregates record data by implementing a merge operation, which realizes the storage direction. Through the same merge operation, any Virtual Record implementer can hand its own data over to other, different objects for expression, which realizes the retrieval direction. The concrete implementation is explained below.

How to perform storage operations in a many-to-one scenario?

Virtual Record provides the method - (NSObject *)mergeRecord:(NSObject *)record shouldOverride:(BOOL)shouldOverride;. As the name implies, it merges one record with another. When shouldOverride is NO, a nil value on one side is filled in with the non-nil value from the other side; where both sides already hold values, shouldOverride decides whether the parameter record's data is allowed to overwrite the receiver's own data. When shouldOverride is YES, the parameter record's values always overwrite the existing ones, even when they are nil. The method returns the merged object, which makes chained calls easy. Here is a code sample:
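The original sample is not reproduced exactly; below is a minimal sketch of a many-to-one storage flow. The view classes and the table's insert call are illustrative assumptions, only mergeRecord:shouldOverride: comes from the description above.

```objectivec
// a, b and c are hypothetical business-layer views, each implementing the
// Virtual Record protocol and each carrying part of one user record.
TTUserAvatarView *a = [[TTUserAvatarView alloc] init];   // nickname + avatar
TTUserTokenView  *b = [[TTUserTokenView alloc] init];    // userID + token
TTUserExtraView  *c = [[TTUserExtraView alloc] init];    // remaining fields

// ... each view collects its own values from the UI ...

// Merge b's and c's data into a. Chaining is possible because the method
// returns the merged object; sequential calls are shown for clarity.
[a mergeRecord:b shouldOverride:YES];
[a mergeRecord:c shouldOverride:YES];

// Hand the complete record to the persistence layer; the table API shown
// here is illustrative only.
[self.userTable insertRecord:a error:NULL];
```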
I won’t write the logic of collecting the values of a, b, and c. The basic idea is to obtain complete data by merging different record objects. Since it is a virtual record, the specific implementation is determined by each View. View is the object that knows its own properties best, so it has sufficient and necessary conditions to retrieve and merge its own data related to the persistence layer. Then this data-collecting code is distributed to each View object accordingly, and the Controller can be very clean, and the overall maintainability is improved. If the traditional method is used, there will be a lot of codes for gathering data scattered in ViewController or DataCenter, and a long section of code for merging will appear when writing, which is very ugly and difficult to maintain. How to perform the fetch operation in a many-to-one scenario? In fact, this statement is not appropriate, because no matter how the Virtual Record is implemented or who the object is, as long as the data is retrieved from the database, the data can be guaranteed to be complete. A more accurate statement here is how to deliver the data to different objects after it is retrieved. In fact, the mergeRecord method mentioned above is still used to handle this.
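A sketch of that retrieval direction, again with illustrative names; only mergeRecord:shouldOverride: is taken from the text above:

```objectivec
// a is a record fetched through the DataCenter and already carries the
// complete data; b and c are two different views that each want to express
// part of it.
NSObject *a = [self.userDataCenter fetchCurrentUserRecord];   // hypothetical call
[b mergeRecord:a shouldOverride:YES];   // b now shows nickname + avatar
[c mergeRecord:a shouldOverride:YES];   // c now shows userID + token
```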
In this way, the data carried by a can easily be handed over to b and c. The code reads well and is easy to write and maintain.

In a one-to-many scenario, how does the business layer interact with the persistence layer?

There are also two ways to understand the one-to-many scenario: one is that an object contains data from multiple tables; the other is that one object is used to display data from multiple tables. (The relevant code has actually appeared earlier in the article; this section just spells the scenario out.) At first glance the two look the same, so let me point out that the former emphasizes containment: the object is a melting pot composed of data from multiple tables. Take a user list as an example. Suppose the database has several user-related tables, usually because a single table had too many columns and was vertically sliced to improve maintainability and query performance. As a side note, in practice vertical slicing is mostly done by business scenario, splitting the user's data into different tables according to which business each piece belongs to; the result of vertical slicing is that one table with many columns becomes several tables with fewer columns. Even though the database has been vertically sliced, some scenes still need to show the complete data, such as the user detail page. So the user detail View may contain data from the user basic info table (user name, user ID, user token, etc.) and the user detail info table (email address, phone number, etc.). That is what "one object contains data from multiple tables" means.

The latter emphasizes presentation. For example, the database has three tables: second-hand houses, new houses, and rental listings, stored in three separate tables. This is in fact a kind of horizontal splitting. Horizontal splitting is also a database optimization technique; the difference from vertical slicing is that horizontal splitting keeps each record intact, and its result is that one table with a huge number of rows is divided into several tables with fewer rows. In other words, the three types of listings could all live in one table, but the data volume would be too large and the database response would slow down, so the data is split into three tables by listing type. Horizontal splitting can also be done by ID, for example deciding which table a record goes into based on the ID modulo some divisor. That approach is fairly common because it is easy to scale: when the tables get too big, you simply change the divisor to a larger number. Splitting by type, as in this example, also works, but it is not as convenient to scale. Back to the point: when displaying these three tables, the interfaces differ only slightly by type, so the same View is used to display all three kinds of data. That is what "one object displays data from multiple tables" means.

How does an object that contains data from multiple tables read and write?

For the fetch operation, it is exactly the same as the many-to-one fetch above: just use the merge operation.

For the storage operation, the Virtual Record implementer is required to implement - (NSDictionary *)dictionaryRepresentationWithColumnInfo:(NSDictionary *)columnInfo tableName:(NSString *)tableName;. This method lets the implementer return the appropriate data based on the columnInfo and tableName passed in, so that only the content relevant to this particular write is handed to the persistence layer. The code example is like this:
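A minimal sketch of what an implementation might look like on a user detail view; the table names, column names and properties are assumptions for illustration, only the method signature comes from the text above.

```objectivec
// Hypothetical user detail view implementing the Virtual Record protocol.
// Depending on which table is being written, it returns only the columns
// that table cares about.
- (NSDictionary *)dictionaryRepresentationWithColumnInfo:(NSDictionary *)columnInfo
                                               tableName:(NSString *)tableName
{
    NSMutableDictionary *result = [NSMutableDictionary dictionary];
    if ([tableName isEqualToString:@"UserBasicInfo"]) {
        result[@"user_id"]   = self.userID;
        result[@"user_name"] = self.userName;
        result[@"token"]     = self.token;
    } else if ([tableName isEqualToString:@"UserDetailInfo"]) {
        result[@"user_id"] = self.userID;
        result[@"email"]   = self.email;
        result[@"phone"]   = self.phoneNumber;
    }
    return result;
}
```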
From the read and write cases above you can see that with Virtual Record the amount of code drops considerably: the messy code that used to piece conditions together is distributed into the individual Virtual Record implementations, so maintenance becomes much easier. With the traditional approach, a large chunk of logic has to be written before every access, and if code migration is ever involved, that whole chunk has to be migrated too, which is very painful.

How does an object used to display data from multiple tables read and write?

In this case the storage operation is the same as above: store directly, and the Virtual Record implementer assembles the data based on the information about the target table provided by the persistence layer. The sample code is exactly the same as the storage example in the previous section, so I won't copy and paste it. The fetch operation is different, but since the object is of a single kind (this being one-to-many), the code is also very simple:
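A sketch of that fetch; the view class and the table fetch call are illustrative assumptions:

```objectivec
// a, b and c are instances of the same hypothetical View class; itemATable,
// itemBTable and itemCTable are CTPersistanceTable instances for the three
// house tables. The find call shown here is illustrative.
HouseListingView *a = [[HouseListingView alloc] init];
HouseListingView *b = [[HouseListingView alloc] init];
HouseListingView *c = [[HouseListingView alloc] init];

[a mergeRecord:[itemATable findWithPrimaryKey:@1 error:NULL] shouldOverride:YES];
[b mergeRecord:[itemBTable findWithPrimaryKey:@1 error:NULL] shouldOverride:YES];
[c mergeRecord:[itemCTable findWithPrimaryKey:@1 error:NULL] shouldOverride:YES];
```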
Here a, b and c are instances of the same View class, while itemATable, itemBTable and itemCTable are tables of different types. This shows how one kind of object is used to display different kinds of data. With the traditional approach you would have to write a lot of adaptation code here; with Virtual Record that code is absorbed by the respective implementers, and you no longer need to care about adaptation logic when executing the data logic.

Many-to-many scenarios?

Many-to-many scenarios are just permutations and combinations of the one-to-many and many-to-one scenarios above, and the implementation techniques are exactly the same, so I won't spend more words on them here.

Summary of interaction schemes

When designing the interaction scheme, the architect should distinguish strong business from weak business, split the traditional data model into Table and Record, implement the strong business in DataCenter and the weak business in Table. Because DataCenter is tied to strong business, in actual coding the business engineers are responsible for creating DataCenters, exposing business-friendly methods to the business layer, and then operating Tables inside the DataCenter to fulfill what the business layer asks for.

The benefits of distinguishing strong and weak business and splitting Table from Record are:

- Splitting by business reduces coupling, making code migration and maintenance very convenient
- Separating the data-processing logic from the form in which data is expressed gives the code very good scalability
- Read and write are isolated, avoiding bugs accidentally introduced by the business layer
- It lays the foundation for practicing the Virtual Record design idea, and thus for a more flexible, business-friendly architecture

Any architecture that does not distinguish strong business from weak business is, frankly, just the architect messing around.

When interacting with the business layer, the Record is designed according to the Virtual Record idea: concrete business objects implement Virtual Record and act as the data medium between the DataCenter and the business layer, instead of a traditional data model. The benefits of using Virtual Record are:

- Encapsulating data adaptation and conversion logic in each concrete Record implementation makes the code more abstract and concise, with less code pollution
- When migrating, you only need to migrate the Virtual Record-related methods, which are very easy to split out
- Business engineers gain a great deal of flexibility when implementing business logic, without losing maintainability

This part also touched on horizontal splitting and vertical slicing. I originally planned a section dedicated to database performance optimization, but in a mobile App the range of database performance techniques is nowhere near as rich as on the server; many impressive technologies and parameter-tuning tricks simply cannot be applied even if you want to. In most cases the data-splitting approaches described here are the most effective measures, so there is not much to write about performance optimization as such. Once you understand the splitting techniques and the scenarios they suit, optimizing for your own business is enough: with the Time Profiler in Instruments, plus the facilities SQLite itself provides, you can find where the slowness is and tune it.
But if I wrote all of that out, the article would turn into a tool tutorial, which feels too low-level and not very exciting to write; if you are interested, look it up in the user manuals.

Database version migration plan

Generally speaking, an App with a persistence layer will also need version migration. When a user with an old version of the App updates it, and the table structure of the database needs to change, or the data itself needs to be updated in batches, a version migration mechanism is needed to perform these operations. The migration mechanism also has to handle cross-version migration, so there is basically only one mainstream solution: establish database version nodes and run through them one by one when migrating. Data migration is actually fairly simple to implement; it comes down to doing the following:

- Record the changes for each database version against the application version, and encapsulate those changes into objects
- Record the current database version so it can be compared against the migration records
- Perform the migration when the database is opened, and provide some fallback options if the migration fails

On the data side, if you need to add a new table that did not originally exist in the database, CTPersistance creates it automatically the first time the table is used, so business engineers do not need to do anything extra; just use it. Handling this here also saves a step in database version migration.

CTPersistance also provides a Migrator. Business engineers can write their own Migrator for a given database; it must derive from CTPersistanceMigrator and conform to its migrator protocol. You just provide a dictionary of migration steps and an array recording the order of versions. Then write the class name of your derived Migrator and the name of the corresponding database into CTPersistanceConfiguration.plist. When the database is initialized, CTPersistance looks up the Migrator according to the plist configuration and runs the version migration logic.
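As a rough sketch only: the method names, and the use of plain SQL strings as step values, are assumptions based on the description above, not necessarily CTPersistance's exact API.

```objectivec
// Hypothetical Migrator: lists the database versions in order and maps each
// version to the migration step that brings the database up to it.
@interface MyDatabaseMigrator : CTPersistanceMigrator
@end

@implementation MyDatabaseMigrator

- (NSArray *)migrationVersionList
{
    // versions in the order they should be applied
    return @[@"1.0", @"1.1", @"2.0"];
}

- (NSDictionary *)migrationStep
{
    // each step keyed by the version it migrates the database to;
    // the SQL shown is illustrative
    return @{
        @"1.1": @"ALTER TABLE user ADD COLUMN nickname TEXT;",
        @"2.0": @"CREATE TABLE user_detail (identifier TEXT PRIMARY KEY, email TEXT);",
    };
}

@end
```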
One thing to watch during version migration is performance. We generally do not run migrations on the main thread, so blocking the UI is not really a concern. What is worth emphasizing is that SQLite is a very fault-tolerant engine: every SQL statement it executes is wrapped in an implicit transaction. When the SQL for a particular version is especially large, it is recommended to open an explicit transaction in the migration step and wrap all the related SQL inside it; SQLite will then execute those statements much faster. There doesn't seem to be anything else to emphasize; if I have missed something, bring it up in the comments.

Data synchronization scheme

There are roughly two kinds of data synchronization schemes: one-way data synchronization and two-way data synchronization. Next I will discuss the design of each.

One-way data synchronization

One-way data synchronization means that operations on local data are only pushed up to the server; the client never actively pulls synchronization operations from the server. For example, in an instant messaging application, after a device sends a message it has to wait for the server's reply to know whether the send succeeded, whether a recall succeeded, or whether a deletion succeeded. The records in the database then change state as those operations succeed. If the user later continues on another device, the new device only pulls old data such as chat history; there is no need to delete or modify that old data, so the new device never asks the server for synchronization operations. That is why it is called one-way data synchronization.

In general there is no need for a background job doing periodic updates. If an operation goes unconfirmed by the server for a long time, the client can consider it failed; failed operations are usually shown on the UI, the user picks which ones to retry, and the requests are re-initiated. When WeChat fails to send a message, a red circle with an exclamation mark appears in front of the message; the message is resent only when the user taps that exclamation mark, and there is no background job behind it. So after refining the requirements we find that one-way data synchronization only needs to synchronize the state of the data.

How to meet the requirements of one-way data synchronization

Add an identifier

The purpose of the identifier is mainly to solve the mismatch between the client's primary keys and the server's primary keys. Since this is one-way synchronization, the only producer of data is the current device, so the identifier should be generated by the device. When the device initiates a synchronization request it carries the identifiers, and when the server finishes and returns, it carries those identifiers back; the client then updates the state of the local data according to the identifiers the server returned. The identifier is generally a UUID string.

Add isDirty

isDirty marks data insertions and modifications. When data is newly created or updated locally, isDirty is set to YES until the server's confirmation comes back. When the confirmation packet arrives, the data is looked up using the identifier carried in the packet and isDirty is set back to NO. That completes the synchronization of that piece of data.

That is only the simple case, though. There is a more extreme situation: between the time the request is sent and the reply is received, the user modifies the data again. Under the logic so far, when the reply arrives, isDirty of the modified data would be set to NO and the new modification would never reach the server. The simple way to handle this is to forbid modifying the data in the UI between sending the request and receiving the reply. If you want to handle it more carefully and allow the user to modify data while a synchronization request is in flight, you need an extra DirtyList table in the database to record these operations. The table needs at least two fields: identifier and primaryKey. Each operation gets its own identifier, so every new modification gets a new identifier. When synchronizing, the record is found in the original data table by primaryKey and handed to the server together with the identifier; when the server's confirmation packet comes back, the identifier is taken out and the operation record is deleted. This table can also serve several data tables at once, but then it needs an extra tablename field so the right data can be found when a synchronization request is initiated.

Add isDeleted

When there is a data synchronization requirement, deletion cannot simply be a physical delete; it has to be a logical delete first. A logical delete means marking the record's isDeleted as YES in the database; only when the server's acknowledgement packet comes back is the record actually deleted. The difference between isDeleted and isDirty lies in what happens after the confirmation packet arrives: if the data pointed to by the returned identifier is marked isDeleted, it must actually be deleted; if it is newly inserted or updated data, only its state needs to change. Insertion and update are handled the same way after the confirmation packet, so isDirty alone is enough to distinguish them. In short, the distinction is based on what must happen once the confirmation arrives; both flags are needed and neither can be dropped.

Add a dependencyIdentifier to the request packet

Many data synchronization schemes I have seen do not provide a dependencyIdentifier, which leads to this problem: suppose two synchronization requests are sent, A first and then B, but B arrives first and A arrives later. If A's batch of operations inserts an object and B's batch happens to delete that same object, then because the arrival order is scrambled, the object can never be deleted. This is quite likely on mobile: the network environment keeps changing, and packets sent earlier arriving later is relatively common. So each request packet needs to carry one identifier from the previous request, and one is enough; generally we pick the identifier of the last operation in the previous request, so that it characterizes that whole request.

The server also needs to record the identifiers from, say, the last 100 request packets. The number 100 is just an arbitrary figure pulled out of thin air; I think 100 is roughly enough, since the dependency a client sends will normally not reach back beyond the last 100 packets. When the server receives a synchronization request packet, it first checks whether the dependencyIdentifier has already been recorded. If it has, the operations in the packet are executed; if not, the packet is set aside and executed only once the condition is met. This solves the problem.

The reason we use an identifier rather than a timestamp as the dependency is that a timestamp can only be based on the time the client sent the packet, and the clocks of different devices may not match exactly; they can be off by seconds or milliseconds. Moreover, if two devices initiate synchronization requests at the same moment, the two packets carry the same time. Suppose A1 and B1 are requests sent by device 1, and A2 and B2 are requests sent by device 2. If time is used to distinguish them, then after A1 arrives, B2 might be executed immediately even though A2 has not yet reached the server. Granted, this is also an extreme case. Using time does make the implementation easier: the server only needs to record one time, and any request whose dependency time is greater than it must wait. But to keep bugs to a minimum, I think the dependency should be based on identifiers rather than time; it is a sounder basis and does not actually add much implementation complexity.

Summary of the one-way data synchronization scheme

- Add the identifier, isDirty and isDeleted fields. If operations on the data are still allowed while a request is in flight, put the identifier and primaryKey into a separate table.
- Generate an identifier for each piece of data when it is created. Every operation on the data then updates isDirty or isDeleted, and a request is initiated carrying the identifier and the operation instruction to tell the server what to do. In the more elaborate scheme, a new identifier is generated for every modification and the request carries the related data to the server.
- The server performs the operations based on the identifier and the other data in the request packet, and replies to the client to confirm once it is done.
- When the server's confirmation packet arrives, find the corresponding record by the identifier the server returned (sometimes together with a tablename, depending on your implementation). If it was a delete, delete the data for real; if it was an insert or update, set isDirty to NO. If there is an extra table recording update operations, delete the operation record for that identifier.

Things to note

When a table is used to record update operations, it is quite likely that the same data is updated several times within a short period. So before synchronizing, it is best to merge the update operations on the same data to save the server's computing resources. Of course, if your server is extremely powerful, it doesn't matter.

Two-way data synchronization

Two-way data synchronization is more common in note-taking and schedule applications. A device not only pushes its own synchronization information up to the server, it also actively asks the server for synchronization information, hence "two-way". For example: a device produces data for a while, then the user switches to another device and modifies the old historical data there. When the user returns to the original device, that device has to actively ask the server whether the old data was modified; if so, those operations have to be downloaded and applied locally.

Two-way data synchronization is more complex to implement than one-way, and sometimes there are real-time requirements such as collaborative editing. Since the scheme itself is complex, it also has to remain approachable for business engineers (which mostly depends on the conscience of your architect), so implementing a two-way synchronization scheme is quite interesting and challenging.
How to meet the requirements of two-way data synchronization

Encapsulate operation objects

One-way synchronization touched on this a little, but since its requirements are not complicated, you only need to tell the server which data it is and what you want done with it; there is no need to encapsulate the operation. In two-way synchronization you also have to parse operations on data coming down from the server, so the two sides have to agree on a protocol, and encapsulating that protocol gives you the operation object. The protocol should include:

- A unique identifier for the operation
- A unique identifier for the data
- The type of operation
- The specific data (mainly used for Insert and Update)
- The dependency identifier of the operation
- The timestamp at which the user performed the operation

Let me explain the significance of these six items:

1. Unique identifier for the operation. Same role as in the one-way scheme: when the server's confirmation packet arrives, the local application can find the corresponding operation and run the confirmation logic.

2. Unique identifier for the data. When running the confirmation logic for a specific operation, the object itself has to be processed; whether it is an update or a delete, it must be reflected in the local database, and this identifier is what finds the corresponding data.

3. Type of operation. The operation types are Delete, Update and Insert. Different types require different actions on the local database, so the type is needed to tell them apart.

4. The specific data. Insert and Update operations need the concrete data to go with them. Sometimes this is not a single record; sometimes it is batch data, for example marking all tasks before October 1 as completed. So how the concrete data is expressed also needs to be agreed in the protocol: when it is treated as the content of a single record and when it is treated as a batch update can be defined according to the actual business.

5. Dependency identifier of the operation. Like the dependency identifier discussed above, this prevents the extreme case where a packet sent later arrives earlier than a packet sent before it.

6. The timestamp of the operation. Because multiple devices are involved and old data can also be updated, conflicts are possible to some degree. Operations synchronized down from the server are stored in a new table, the pending-operations table. While those operations are executed, they are compared against the operations in the pending-synchronization table; if two operations target the same data and conflict, the timestamps decide how to execute them. Another option is to surface the conflict in the UI and let the user decide.

Add a pending-operations table and a pending-synchronization table

This was already touched on in the previous item. The list of synchronization operations pulled down from the server goes into the pending-operations table; if the server needs to be informed after an operation completes, telling it is equivalent to running the one-way synchronization scheme. During execution, these operations must also be matched against the pending-synchronization table to check for conflicts. If there is no conflict, execution continues; if there is a conflict, either execute according to the timestamps or let the user decide. When pulling the list of pending operations, the identifier of the last executed operation must also be sent to the server so that the server can return the right data.

The pending-synchronization table plays much the same role as in the one-way scheme: it handles the case where the user performs operations while a request is in flight, and it also makes conflict resolution easier. Before initiating a synchronization request, first check whether there is a pending-operations list; once those operations have been executed, their records can be deleted, and the local pending-synchronization data is then handed to the server; once that synchronization completes, those records can be deleted as well. So under normal circumstances, conflicts only occur between pending operations and pending synchronization operations. Some conflicts exist only in theory, for example when the pending operations are fetched relatively late and conflict with operations that have already been synchronized; we really have no solution for that extreme case, so we just have to live with it. It is a fairly extreme situation anyway, and the probability of it occurring is not high.

When to pull the pending-operations list from the server

- Before pushing local data to the server for synchronization, pull the pending-operations list first, execute it, and then upload the local data to be synchronized
- Refresh every time the relevant page is entered, to see whether there are new operations
- If the real-time requirement is high, either have the client poll on a background thread, or have the server push pending operations over a long-lived connection
- I can't think of other cases for now; it depends on the requirements

Summary of the two-way data synchronization scheme

- Design a synchronization protocol for interacting with the server, and use it to drive the local execution of synchronization operations
- Add a pending-operations table and a pending-synchronization table to record the operations yet to be executed and yet to be synchronized

Things to note

I have also seen solutions that simply throw SQL at the other side for synchronization, but I don't recommend this. It is better to separate the operations from the data and refine them into a protocol; otherwise you end up parsing SQL when detecting conflicts, and if a bug shows up in that implementation you also have to worry about front-end compatibility, the cost of rebuilding the mechanism, and so on. Saving a bit of effort now at the cost of much more trouble later is really not worth it.

Summary

This article mainly discussed how to design a persistence layer solution, along with data migration and data synchronization schemes. It emphasized the kinds of isolation that should be considered when designing a persistence solution, proposed and explained the Virtual Record design idea, and then covered the points to consider when designing a data migration solution.
In the data synchronization section, the one-way and two-way schemes were discussed separately; the concrete implementation still has to be weighed against your specific business needs. I hope this content proves valuable for the problems you run into in your own work. If anything is unclear, feel free to discuss it in the comments.

As for dynamic deployment, even today there is no particularly good dynamic deployment solution in the iOS field. I think the most reliable one is actually the Hybrid approach of H5 plus Native; in my view, React Native still carries more restrictions than Hybrid. For the Hybrid approach I also provide the CTJSBridge library to meet this need. The dynamic deployment article was actually written a long time ago. The reason it was never published is that at the time I felt there was no silver bullet for dynamic deployment of iOS Apps, and there were problems I had not thought through; I have since confirmed that the problems I saw back then really have no solution. I have always felt that what I wrote does not stand on its own as a separate article, so I am leaving it here with this one as a reference for you.