Swift provides a high-performance, Unicode-compliant String implementation as part of the standard library. In Swift 2, the String type no longer conforms to the CollectionType protocol. Previously, String was a collection of Character values, similar to an array. Now, String provides a collection of characters through its characters property.

Why the change? Although it seems natural to model a string as a collection of characters, the String type behaves quite differently from collection types like Array, Set, and Dictionary. This has always been the case, but with the addition of protocol extensions in Swift 2, these differences made some fundamental changes necessary.

More Than the Sum of Its Parts

When you add an element to a collection, you expect the collection to contain that element. That is, when you append a value to an array, the array then contains that value. The same applies to Dictionary and Set. However, when you append a combining mark character to a string, the contents of the string itself change.

For example, the string cafe contains four characters: c, a, f, and e.
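A minimal sketch of this starting point, assuming a Swift 2 toolchain where a string's characters are reached through the characters property (the variable names are illustrative):

```swift
// Build the string from four separate Character values.
let letters: [Character] = ["c", "a", "f", "e"]
let string: String = String(letters)

print(letters.count)            // 4
print(string)                   // cafe
print(string.characters.count)  // 4
```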
If you append a COMBINING ACUTE ACCENT (U+0301) to the end of the string, the string still has four characters, but the last character is now é.
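The effect can be sketched as follows, again assuming the Swift 2 characters view. The accent is written as a Unicode escape so that it is unambiguous in source code:

```swift
var string = "cafe"
print(string.characters.count)  // 4

// COMBINING ACUTE ACCENT (U+0301) as a single Character value.
let acuteAccent: Character = "\u{0301}"
string.append(acuteAccent)

// The accent merges into the preceding "e" instead of adding a fifth character.
print(string.characters.count)  // 4
print(string.characters.last!)  // é
```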
The characters property of the string contains neither the original lowercase e nor the combining acute accent that was just appended. Instead, the string now contains a single lowercase e with an acute accent, é.
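A short check of that claim, under the same Swift 2 assumption (the name view is illustrative):

```swift
var string = "cafe"
string.append("\u{0301}" as Character)  // COMBINING ACUTE ACCENT

let view = string.characters
print(view.contains("e"))         // false: the plain "e" is gone
print(view.contains("\u{0301}"))  // false: the bare accent is not an element either
print(view.contains("\u{E9}"))    // true: the combined é is what the view holds
```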
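If strings were treated like any other collection, this result would be as surprising as adding UIColor.redColor() and UIColor.greenColor() to a set and then having the set report that it contains UIColor.yellowColor().

Judged by the Contents of Its Characters

Another difference between strings and collections is the way they handle equality. Collections compare element by element: two arrays are equal only if they have the same count and equal elements at each index, and two sets are equal only if each contains exactly the elements of the other.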
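As a point of comparison, here is a small sketch of how ordinary collections decide equality. The array names are illustrative and the code uses only standard library types:

```swift
let a: [Character] = ["c", "a", "f", "e"]
let b: [Character] = ["c", "a", "f", "e"]
let reversed: [Character] = ["e", "f", "a", "c"]

// Arrays compare element by element, in order.
print(a == b)         // true
print(a == reversed)  // false

// Sets ignore order but still compare individual elements.
print(Set(a) == Set(reversed))  // true
```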
However, String equality is based on canonical equivalence. Two strings are canonically equivalent if they have the same linguistic meaning and appearance, even if they are composed of different Unicode scalars behind the scenes.

Consider the Korean writing system, which consists of 24 letters, or Jamo, representing individual consonants and vowels. When written out, these letters are combined into characters for each syllable. For example, the character 가 ([ga]) is composed of the letters ᄀ ([g]) and ᅡ ([a]). In Swift, strings are considered equal regardless of whether they are constructed from decomposed or precomposed character sequences.

This behavior again differs from that of Swift's collection types. It would be as surprising as an array of two separate values being considered equal to an array containing a single combined value.

Depends on Your Point of View

Strings are not collections. However, they do provide a number of views that conform to the CollectionType protocol:

- characters is a collection of Character values, or extended grapheme clusters.
- unicodeScalars is a collection of Unicode scalar values.
- utf8 is a collection of UTF-8 code units.
- utf16 is a collection of UTF-16 code units.

Consider the previous example of the word "café", made up of the decomposed characters [c, a, f, e] and [´]. Each view presents this string differently.

The characters property segments the text into extended grapheme clusters, which approximate the characters a user perceives (c, a, f, and é in this case). Because the string must step through each of its positions (each position is called a code point) to determine character boundaries, accessing this property takes linear O(n) time. When processing strings that contain human-readable text, prefer high-level, locale-sensitive Unicode algorithms, such as those behind the localizedStandardCompare(_:) method and the localizedLowercaseString property, over character-by-character processing.

The unicodeScalars property exposes the scalar values stored in the string. If the original string had been created with the precomposed character é instead of the decomposed e + ´, the difference would be reflected in the unicodeScalars view. Use this API when you are performing low-level manipulation of character data.

The utf8 and utf16 properties expose the code units of the UTF-8 and UTF-16 representations, respectively. These values correspond to the actual bytes written to a file when the string is translated to and from a particular encoding. UTF-8 code units are used by many POSIX string processing APIs, while UTF-16 code units are used throughout Cocoa and Cocoa Touch to represent string lengths and offsets.

For more information about characters and strings in Swift, see The Swift Programming Language and The Swift Standard Library Reference.
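To make the views concrete, here is a hedged sketch that builds "café" in both its decomposed and precomposed forms and prints what the lower-level views contain. The constant names are illustrative. The two strings compare equal, yet their scalars and code units differ:

```swift
let decomposed = "cafe\u{0301}"  // e followed by COMBINING ACUTE ACCENT
let precomposed = "caf\u{E9}"    // LATIN SMALL LETTER E WITH ACUTE

// Canonical equivalence: the strings are equal despite different contents.
print(decomposed == precomposed)         // true

// Unicode scalars: the decomposed form stores one more scalar.
print(decomposed.unicodeScalars.count)   // 5
print(precomposed.unicodeScalars.count)  // 4

// UTF-8 code units (bytes).
print(Array(decomposed.utf8))    // [99, 97, 102, 101, 204, 129]
print(Array(precomposed.utf8))   // [99, 97, 102, 195, 169]

// UTF-16 code units.
print(Array(decomposed.utf16))   // [99, 97, 102, 101, 769]
print(Array(precomposed.utf16))  // [99, 97, 102, 233]

// The Korean example: two Jamo versus one precomposed syllable.
let hangulDecomposed = "\u{1100}\u{1161}"
let hangulPrecomposed = "\u{AC00}"
print(hangulDecomposed == hangulPrecomposed)  // true
```

The byte counts are also what you would expect to see on disk: six bytes for the decomposed form and five for the precomposed form when written as UTF-8.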