Understanding the changes to string design in Swift 2.0


Swift provides a high-performance, Unicode-compliant String implementation as part of the standard library. In Swift 2, the String type no longer conforms to the CollectionType protocol. Previously, String was itself a collection of Character values, similar to an array. Now, String exposes a collection of characters through its characters property.

Why the change? While it seems natural to model a string as a collection of characters, the String type behaves very differently from real collection types like Array, Set, and Dictionary. This has always been the case, but with the addition of protocol extensions in Swift 2, these differences necessitated some fundamental changes.

More Than the Sum of Its Parts

When you add an element to a collection, you want the collection to contain that element. That is, when you add a value to an array, the array contains that value. The same applies to Dictionary and Set. However, when you append a combining mark character to a string, the contents of the string itself are changed.

For example, the string "cafe" contains four characters: c, a, f, and e:

    var letters: [Character] = ["c", "a", "f", "e"]
    var string: String = String(letters)

    print(letters.count) // 4
    print(string) // cafe
    print(string.characters.count) // 4

If you append the COMBINING ACUTE ACCENT character (U+0301) to the end of the string, the string still has four characters, but the last character is now é:

    let acuteAccent: Character = "\u{0301}" // COMBINING ACUTE ACCENT (U+0301)

    string.append(acuteAccent)
    print(string.characters.count) // 4
    print(string.characters.last!) // é

The characters property of the string contains neither the original lowercase e nor the combining acute accent (U+0301) that was just appended. Instead, the string now contains a single lowercase e with an acute accent, é:

    string.characters.contains("e") // false
    string.characters.contains("\u{0301}") // false
    string.characters.contains("é") // true

This can be surprising if you try to treat strings like any other collection type. It would be as if you added UIColor.redColor() and UIColor.greenColor() to a set, and the set then reported that it contains UIColor.yellowColor().
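The two original pieces have not disappeared, though; they have only stopped being separate characters. As a minimal sketch (the names word, acute, and plainE are illustrative and not part of the original example), appending the same accent through the string's unicodeScalars view shows that the underlying scalar sequence grows by one and still holds both the plain e and U+0301:

```swift
// Illustrative sketch: inspect the Unicode scalars behind "cafe" + U+0301.
var word = "cafe"
let scalarCountBefore = word.unicodeScalars.count

// The same code points as above, but typed as scalars instead of Characters.
let acute: UnicodeScalar = "\u{0301}" // COMBINING ACUTE ACCENT
let plainE: UnicodeScalar = "e"

// Mutating the scalar view mutates the string itself.
word.unicodeScalars.append(acute)

let scalarCountAfter = word.unicodeScalars.count
let stillHasPlainE = word.unicodeScalars.contains(plainE)
let hasAccentScalar = word.unicodeScalars.contains(acute)

print(scalarCountBefore, scalarCountAfter) // 4 5
print(stillHasPlainE, hasAccentScalar)     // true true
```

Whether an appended element "survives" therefore depends on which level of the string you look at, which is exactly why String no longer presents itself as a single collection.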

Judged by the Contents of Their Characters

Another difference between strings and collections is the way they handle equality.

  • Two arrays are equal only if they have the same number of elements and the elements at each corresponding index position are also equal.

  • Two sets are equal only if they have the same number of elements and if the first set contains the same elements as the second set.

  • Two dictionaries are equal only if they have the same key-value pairs.

However, equality of the String type is based on canonical equivalence. Two strings are canonically equivalent if they have the same linguistic meaning and appearance, even if they are composed of different Unicode scalars behind the scenes.

Consider the Korean writing system, whose 24 letters, or Jamo, represent individual consonants and vowels. When written, these letters are combined into one character per syllable. For example, the character 가 ([ga]) is made up of the letters ᄀ ([g]) and ᅡ ([a]). In Swift, strings are considered equal regardless of whether they are built from decomposed or precomposed character sequences.

This behavior again differs from the collection types in Swift. It would be as surprising as an array holding two separate values being considered equal to an array holding a single, combined value.
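The Hangul case can be checked directly. The following is a minimal sketch, with illustrative constant names, that builds the syllable 가 once as the single precomposed scalar U+AC00 and once as the Jamo pair U+1100 + U+1161, then compares both the strings and their raw scalar values:

```swift
// Illustrative sketch: canonical equivalence versus element-wise equality.
let precomposed = "\u{AC00}"         // 가 as one precomposed scalar
let decomposed = "\u{1100}\u{1161}"  // ᄀ followed by ᅡ

// String equality uses canonical equivalence.
let stringsEqual = (precomposed == decomposed)

// Arrays of the raw scalar values are compared element by element.
let precomposedScalars = precomposed.unicodeScalars.map { $0.value }
let decomposedScalars = decomposed.unicodeScalars.map { $0.value }
let scalarsEqual = (precomposedScalars == decomposedScalars)

print(stringsEqual) // true
print(scalarsEqual) // false
```

The strings compare equal while the arrays do not, because array equality requires the same number of elements in the same order.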

Depends on Your Point of View

Strings are not collections. However, they do provide a number of views that conform to the CollectionType protocol:

  • characters is a collection of Character values, or extended grapheme clusters.

  • unicodeScalars is a collection of Unicode scalar values.

  • utf8 is a collection of UTF-8 code units.

  • utf16 is a collection of UTF-16 code units.

Let’s return to the earlier example of the word “café”, built from the characters [c, a, f, e] followed by the combining acute accent (U+0301). Here is what each of the string’s views contains:

The characters property segments the text into extended grapheme clusters, which approximate the characters that the user sees (c, a, f, and é in this case). Because the string must walk every position (called a code point) to determine character boundaries, accessing this property has linear, O(n), time complexity. When processing strings that contain human-readable text, prefer high-level, locale-sensitive Unicode algorithms, such as those behind the localizedStandardCompare(_:) method and the localizedLowercaseString property, over processing the text character by character.

The unicodeScalars property exposes the scalar values stored in the string. If the original string had been created with the precomposed character é instead of the decomposed e + U+0301, the unicodeScalars view would reflect that difference. Use this API when you are performing low-level manipulation of character data.

The utf8 and utf16 properties provide the code units of their respective encodings. These values correspond to the actual bytes written to a file when the string is converted to or from a particular encoding.

UTF-8 code units are used by many POSIX string processing APIs, while UTF-16 code units are used throughout Cocoa and Cocoa Touch to represent string lengths and offsets.
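To make the differences between the views concrete, here is a minimal sketch that counts the elements of each view for the decomposed and the precomposed spelling of “café”. The helper viewCounts and the constant names are hypothetical. The sketch also assumes Swift 4 or later, where String is a collection of characters again and count is called on the string directly; under Swift 2 the equivalent expression would be s.characters.count.

```swift
// Illustrative sketch, assuming Swift 4 or later (see the note above).
let decomposedCafe = "cafe\u{0301}" // e followed by COMBINING ACUTE ACCENT
let precomposedCafe = "caf\u{E9}"   // single scalar U+00E9

// Hypothetical helper: element counts in the order
// [characters, unicodeScalars, utf8, utf16].
func viewCounts(_ s: String) -> [Int] {
    return [s.count, s.unicodeScalars.count, s.utf8.count, s.utf16.count]
}

let decomposedCounts = viewCounts(decomposedCafe)
let precomposedCounts = viewCounts(precomposedCafe)

print(decomposedCounts)  // [4, 5, 6, 5]
print(precomposedCounts) // [4, 4, 5, 4]
```

Both spellings have four characters, yet they differ in every lower-level view, so a length or offset is only meaningful relative to the view it was measured in.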

For more information about characters and strings in Swift, see The Swift Programming Language and The Swift Standard Library Reference.
