Conceptually, Java strings are sequences of Unicode characters. For example, the string "Java\u2122" consists of the five Unicode characters J, a, v, a, and ™. Java does not have a built-in string type. Instead, the standard Java library contains a predefined class called, naturally enough, String. Each quoted string is an instance of the String class:
String e = ""; // an empty string
String greeting = "Hello";
Code Points and Code Units
Java strings are implemented as sequences of char values. As we discussed on page 41, the char data type is a code unit for representing Unicode code points in the UTF-16 encoding. The most commonly used Unicode characters can be represented with a single code unit. The supplementary characters require a pair of code units.
The length method yields the number of code units required for a given string in the UTF-16 encoding. For example:
String greeting = "Hello";
int n = greeting.length(); // is 5.
To get the true length, that is, the number of code points, call:
int cpCount = greeting.codePointCount(0, greeting.length());
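The difference between the two counts only shows up once a string contains a supplementary character. The following is a minimal sketch, assuming a hypothetical class LengthDemo and using U+1D56B (written as its surrogate pair) as the sample supplementary character:

```java
public class LengthDemo {
    // Number of UTF-16 code units in s; this is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        // The first character is one code point stored as two code units.
        String sentence = "\uD835\uDD6B is the set of integers";

        // For "Hello" both counts agree.
        System.out.println(codeUnits(greeting) + " code units, "
                + codePoints(greeting) + " code points");
        // Here the code unit count is one larger than the code point count.
        System.out.println(codeUnits(sentence) + " code units, "
                + codePoints(sentence) + " code points");
    }
}
```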
The call s.charAt(n) returns the code unit at position n, where n is between 0 and s.length() - 1. For example:
char first = greeting.charAt(0); // first is 'H'
char last = greeting.charAt(4); // last is 'o'
To get at the ith code point, use the statements:
int index = greeting.offsetByCodePoints(0, i);
int cp = greeting.codePointAt(index);
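Wrapped in a helper, the two statements can be tried on a string that mixes ordinary and supplementary characters. This is only a sketch; the class name NthCodePoint and the method codePointAtOrdinal are hypothetical names chosen for illustration:

```java
public class NthCodePoint {
    // Returns the i-th code point of s, counting code points from 0.
    static int codePointAtOrdinal(String s, int i) {
        int index = s.offsetByCodePoints(0, i); // code unit index of the i-th code point
        return s.codePointAt(index);
    }

    public static void main(String[] args) {
        // 'a', then U+1D56B as a surrogate pair, then 'b': 3 code points, 4 code units.
        String s = "a\uD835\uDD6Bb";
        for (int i = 0; i < s.codePointCount(0, s.length()); i++) {
            System.out.println(Integer.toHexString(codePointAtOrdinal(s, i)));
        }
    }
}
```

Note that s.charAt(2) in this example would yield only the second half of the surrogate pair, not the character b.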
NOTE
Java counts the code units in strings in a peculiar fashion: the first code unit in a string has position 0. This convention originated in C, where there was a technical reason for counting positions starting at 0. That reason has long gone away, and only the nuisance remains. However, so many programmers are used to this convention that the Java designers decided to keep it.
Why are we making a fuss about code units? Consider the sentence
𝕫 is the set of integers
The character 𝕫 (U+1D56B) requires two code units in the UTF-16 encoding. Calling
char ch = sentence.charAt(1)
doesn't return a space but the second code unit of 𝕫. To avoid this problem, you should not use the char type. It is too low-level.
If your code traverses a string, and you want to look at each code point in turn, use these statements:
int cp = sentence.codePointAt(i);
if (Character.isSupplementaryCodePoint(cp)) i += 2;
else i++;
The codePointAt method returns a complete code point only when it is called at the first code unit of a supplementary character; called at the second code unit, it returns just that code unit. To move backwards, step back one position, check whether you landed on the second half of a supplementary character, and step back once more if you did:
i--;
if (i > 0 && Character.isLowSurrogate(sentence.charAt(i))) i--;
int cp = sentence.codePointAt(i);
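Both directions can be turned into complete loops. The sketch below assumes a hypothetical class CodePointTraversal and well-formed strings (every low surrogate is preceded by its high surrogate). The backward loop inspects the code unit with Character.isLowSurrogate before calling codePointAt, because codePointAt yields the full code point only at the first code unit of a pair.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointTraversal {
    // Collects the code points of s from first to last.
    static List<Integer> forward(String s) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            result.add(cp);
            if (Character.isSupplementaryCodePoint(cp)) i += 2;
            else i++;
        }
        return result;
    }

    // Collects the code points of s from last to first.
    static List<Integer> backward(String s) {
        List<Integer> result = new ArrayList<>();
        int i = s.length();
        while (i > 0) {
            i--;
            // If we landed on the second half of a pair, move to its first half.
            if (i > 0 && Character.isLowSurrogate(s.charAt(i))) i--;
            result.add(s.codePointAt(i));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "a\uD835\uDD6Bb"; // 'a', U+1D56B, 'b'
        System.out.println(forward(s));
        System.out.println(backward(s));
    }
}
```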
Substrings
You extract a substring from a larger string with the substring method of the String class. For example,
String greeting = "Hello";
String s = greeting.substring(0, 3);
creates a string consisting of the characters "Hel".
The second parameter of substring is the position of the first code unit that you do not want to copy. In our case, we want to copy the code units in positions 0, 1, and 2 (from position 0 to position 2 inclusive). As substring counts it, this means from position 0 inclusive to position 3 exclusive.
There is one advantage to the way substring works: computing the number of code units in the substring is easy. The string s.substring(a, b) always has b - a code units. For example, the substring "Hel" has 3 - 0 = 3 code units.
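The half-open convention is easy to check in a few lines. This is a small sketch with a hypothetical class SubstringDemo; the helper slice merely forwards to substring so that the length rule can be tested for several index pairs:

```java
public class SubstringDemo {
    // Copies the code units at positions a (inclusive) through b (exclusive).
    static String slice(String s, int a, int b) {
        return s.substring(a, b);
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        System.out.println(slice(greeting, 0, 3)); // prints Hel
        System.out.println(slice(greeting, 1, 4)); // prints ell
        // The length of the result is always b - a.
        System.out.println(slice(greeting, 1, 4).length() == 4 - 1); // prints true
    }
}
```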
String Editing
The String class provides no methods that let you change a character in an existing string. If you want to turn greeting into "Help!", you cannot directly change the last positions of greeting into 'p' and '!'. If you are a C programmer, this will make you feel pretty helpless. How are you going to modify the string? In Java, it is quite easy: concatenate the substring that you want to keep with the characters that you want to replace.
greeting = greeting.substring(0, 3) + "p!";
This assignment changes the current value of the greeting variable to "Help!".
Because you cannot change the individual characters in a Java string, the documentation refers to the objects of the String class as being immutable. Just as the number 3 is always 3, the string "Hello" will always contain the code unit sequence describing the characters H, e, l, l, o. You cannot change these values. You can, however, as you just saw, change the contents of the string variable greeting and make it refer to a different string, just as you can make a numeric variable currently holding the value 3 hold the value 4.
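To see that only the variable changes and not the string object, keep a second reference to the original string. The following sketch assumes a hypothetical class ImmutabilityDemo with a helper replaceTail that performs the substring-and-concatenate edit:

```java
public class ImmutabilityDemo {
    // Builds a new string from the first 'keep' code units of s followed by tail.
    static String replaceTail(String s, int keep, String tail) {
        return s.substring(0, keep) + tail;
    }

    public static void main(String[] args) {
        String greeting = "Hello";
        String original = greeting;               // second reference to the same string

        greeting = replaceTail(greeting, 3, "p!"); // greeting now refers to a new string

        System.out.println(greeting);  // prints Help!
        System.out.println(original);  // prints Hello, the old string is untouched
    }
}
```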
Isn't that a lot less efficient? It would seem simpler to change the code units than to build up a whole new string from scratch. Well, yes and no. Indeed, it isn't efficient to generate a new string that holds the concatenation of "Hel" and "p!". But immutable strings have one great advantage: the compiler can arrange that strings are shared.
To understand how this works, think of the various strings as sitting in a common pool. String variables then point to locations in the pool. If you copy a string variable, both the original and the copy share the same characters. Overall, the designers of Java decided that the efficiency of sharing outweighs the inefficiency of string editing by extracting substrings and concatenating.
Look at your own programs; we suspect that most of the time, you don't change strings, you just compare them. Of course, in some cases, direct manipulation of strings is more efficient. (One example is assembling strings from individual characters that come from a file or the keyboard.) For these situations, Java provides a separate StringBuilder class that we describe in Chapter 12. If you are not concerned with the efficiency of string handling, you can ignore StringBuilder and just use String.
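As a brief preview, assembling a string from individual characters with StringBuilder might look like the sketch below. The class name BuilderDemo and the helper assemble are hypothetical; only the append and toString methods of StringBuilder are used.

```java
public class BuilderDemo {
    // Assembles a string from individual characters without creating
    // a new String object for every character that is added.
    static String assemble(char[] chars) {
        StringBuilder builder = new StringBuilder();
        for (char c : chars) {
            builder.append(c);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        char[] input = { 'H', 'e', 'l', 'p', '!' };
        System.out.println(assemble(input)); // prints Help!
    }
}
```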
C++ NOTE
C programmers generally are bewildered when they see Java strings for the first time because they think of strings as arrays of characters:
char greeting[] = "Hello";
That is the wrong analogy: a Java string is roughly analogous to a char* pointer:
char* greeting = "Hello";
When you replace greeting with another string, the Java code does roughly the following:
char* temp = malloc(6);
strncpy(temp, greeting, 3);
strncpy(temp + 3, "p!", 3);
greeting = temp;
Sure, now greeting points to the string "Help!". And even the most hardened C programmer must admit that the Java syntax is more pleasant than a sequence of strncpy calls. But what if we make another assignment to greeting?
greeting = "Howdy";
Don't we have a memory leak? After all, the original string was allocated on the heap. Fortunately, Java does automatic garbage collection. If a block of memory is no longer needed, it will eventually be recycled.
If you are a C++ programmer and use the string class defined by ANSI C++, you will be much more comfortable with the Java String type. C++ string objects also perform automatic allocation and deallocation of memory. The memory management is performed explicitly by constructors, assignment operators, and destructors. However, C++ strings are mutable: you can modify individual characters in a string.
Concatenation
Java, like most programming languages, allows you to use the + sign to join (concatenate) two strings.
String expletive = "Expletive";
String PG13 = "deleted";
String message = expletive + PG13;
The above code sets the variable message to the string "Expletivedeleted". (Note the lack of a space between the words: the + sign joins two strings in the order received, exactly as they are given.)
When you concatenate a string with a value that is not a string, the latter is converted to a string. (As you will see in Chapter 5, every Java object can be converted to a string.) For example:
int age = 13;
String rating = "PG" + age;
sets rating to the string "PG13".
This feature is commonly used in output statements. For example,
System.out.println("The answer is " + answer);
is perfectly acceptable and will print what one would want (and with the correct spacing because of the space after the word is).
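The conversion rule can be tried with a few different operand types. This is a sketch with a hypothetical class ConcatDemo; the helpers rating and describe are illustrative names, and the variable answer from the statement above is given an arbitrary value:

```java
public class ConcatDemo {
    // An int operand is converted to its decimal string form.
    static String rating(int age) {
        return "PG" + age;
    }

    // Any object operand is converted to a string before it is appended.
    static String describe(Object answer) {
        return "The answer is " + answer;
    }

    public static void main(String[] args) {
        System.out.println(rating(13));      // prints PG13
        System.out.println(describe(42));    // prints The answer is 42
        System.out.println(describe(true));  // prints The answer is true
    }
}
```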
Testing Strings for Equality
To test whether two strings are equal, use the equals method. The expression
s.equals(t)
returns true if the strings s and t are equal, false otherwise. Note that s and t can be string variables or string constants. For example, the expression
"Hello".equals(greeting)
is perfectly legal. To test whether two strings are identical except for the upper/lowercase letter distinction, use the equalsIgnoreCase method.
"Hello".equalsIgnoreCase("hello")
Do not use the == operator to test whether two strings are equal! It only determines whether or not the strings are stored in the same location. Sure, if strings are in the same location, they must be equal. But it is entirely possible to store multiple copies of identical strings in different places.
String greeting = "Hello"; //initialize greeting to a string
if (greeting == "Hello") . . .
// probably true
if (greeting.substring(0, 3) == "Hel") . . .
// probably false
If the virtual machine always arranged for equal strings to be shared, then you could use the == operator for testing equality. But only string constants are shared, not strings that are the result of operations like + or substring. Therefore, never use == to compare strings, lest you end up with a program with the worst kind of bug: an intermittent one that seems to occur randomly.
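The difference between the two tests can be made visible by forcing a second copy of a string into existence. The sketch below assumes a hypothetical class EqualityDemo and uses new String("Hello"), which always creates a distinct object, as a stand-in for strings produced at run time:

```java
public class EqualityDemo {
    // Compares the contents of the two strings.
    static boolean sameContents(String a, String b) {
        return a.equals(b);
    }

    // Compares the locations of the two strings, which is what == does.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String constant = "Hello";
        String copy = new String("Hello"); // equal contents, different object

        System.out.println(sameContents(constant, copy)); // prints true
        System.out.println(sameObject(constant, copy));   // prints false
        System.out.println("hello".equalsIgnoreCase(copy)); // prints true
    }
}
```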
C++ NOTE
If you are used to the C++ string class, you have to be particularly careful about equality testing. The C++ string class does overload the == operator to test for equality of the string contents. It is perhaps unfortunate that Java goes out of its way to give strings the same "look and feel" as numeric values but then makes strings behave like pointers for equality testing. The language designers could have redefined == for strings, just as they made a special arrangement for +. Oh well, every language has its share of inconsistencies.
C programmers never use == to compare strings but use strcmp instead. The Java method compareTo is the exact analog to strcmp. You can use
if (greeting.compareTo("Hello") == 0) . . .
but it seems clearer to use equals instead.
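Like strcmp, compareTo reports an ordering and not just equality. A short sketch, assuming a hypothetical class CompareDemo whose helper order reduces the result to -1, 0, or 1 (the exact nonzero values returned by compareTo should not be relied on):

```java
public class CompareDemo {
    // Returns -1, 0, or 1 depending on whether a sorts before, equal to, or after b.
    static int order(String a, String b) {
        return Integer.signum(a.compareTo(b));
    }

    public static void main(String[] args) {
        System.out.println(order("Hello", "Hello"));  // prints 0
        System.out.println(order("Apple", "Banana")); // prints -1
        System.out.println(order("Howdy", "Hello"));  // prints 1
    }
}
```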
The String class in Java contains more than 50 methods. A surprisingly large number of them are sufficiently useful so that we can imagine using them frequently. The following API note summarizes the ones we found most useful.
NOTE
You will find these API notes throughout the book to help you understand the Java Application Programming Interface (API). Each API note starts with the name of a class such as java.lang.String—the significance of the so-called package name java.lang is explained in Chapter 4. The class name is followed by the names, explanations, and parameter descriptions of one or more methods.
We typically do not list all methods of a particular class but instead select those that are most commonly used, and describe them in a concise form. For a full listing, consult the on-line documentation.
We also list the version number in which a particular class was introduced. If a method has been added later, it has a separate version number.
java.lang.String 1.0
- char charAt(int index)
returns the code unit at the specified location. You probably don't want to call this method unless you are interested in low-level code units.
- int codePointAt(int index) 5.0
returns the code point that starts at the specified location. If the location holds the second code unit of a supplementary character, only that code unit is returned.
- int offsetByCodePoints(int startIndex, int cpCount) 5.0
returns the index of the code point that is cpCount code points away from the code point at startIndex.
- int compareTo(String other)
returns a negative value if the string comes before other in dictionary order, a positive value if the string comes after other in dictionary order, or 0 if the strings are equal.
- boolean endsWith(String suffix)
returns true if the string ends with suffix.
- boolean equals(Object other)
returns true if the string equals other.
- boolean equalsIgnoreCase(String other)
returns true if the string equals other, except for upper/lowercase distinction.
- int indexOf(String str)
- int indexOf(String str, int fromIndex)
- int indexOf(int cp)
- int indexOf(int cp, int fromIndex)
return the start of the first substring equal to the string str or the code point cp, starting at index 0 or at fromIndex, or -1 if str does not occur in this string.
- int lastIndexOf(String str)
- int lastIndexOf(String str, int fromIndex)
- int lastIndexOf(int cp)
- int lastIndexOf(int cp, int fromIndex)
return the start of the last substring equal to the string str or the code point cp, starting at the end of the string or at fromIndex.
- int length()
returns the length of the string.
- int codePointCount(int startIndex, int endIndex) 5.0
returns the number of code points between startIndex and endIndex - 1. Unpaired surrogates are counted as code points.
- String replace(CharSequence oldString, CharSequence newString)
returns a new string that is obtained by replacing all substrings matching oldString in the string with the string newString. You can supply String or StringBuilder objects for the CharSequence parameters.
- boolean startsWith(String prefix)
returns true if the string begins with prefix.
- String substring(int beginIndex)
- String substring(int beginIndex, int endIndex)
return a new string consisting of all code units from beginIndex until the end of the string or until endIndex - 1.
- String toLowerCase()
returns a new string containing all characters in the original string, with uppercase characters converted to lower case.
- String toUpperCase()
returns a new string containing all characters in the original string, with lowercase characters converted to upper case.
- String trim()
returns a new string by eliminating all leading and trailing spaces in the original string.
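Several of the methods listed above can be combined in a few lines. The following is a minimal sketch; the class ApiTour and its helpers normalize and swap are hypothetical names used only for illustration:

```java
public class ApiTour {
    // Strips surrounding spaces and converts the text to lower case.
    static String normalize(String raw) {
        return raw.trim().toLowerCase();
    }

    // Replaces every occurrence of oldText with newText.
    static String swap(String s, String oldText, String newText) {
        return s.replace(oldText, newText);
    }

    public static void main(String[] args) {
        String s = "Hello, World";
        System.out.println(s.indexOf("World"));          // prints 7
        System.out.println(s.lastIndexOf("o"));          // prints 8
        System.out.println(s.startsWith("Hello"));       // prints true
        System.out.println(s.endsWith("World"));         // prints true
        System.out.println(swap(s, "World", "Java"));    // prints Hello, Java
        System.out.println(normalize("  Hello, World  ")); // prints hello, world
        System.out.println(s.toUpperCase());             // prints HELLO, WORLD
    }
}
```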
Reading the On-Line API Documentation
As you just saw, the String class has lots of methods. Furthermore, there are thousands of classes in the standard libraries, with many more methods. It is plainly impossible to remember all useful classes and methods. Therefore, it is essential that you become familiar with the on-line API documentation that lets you look up all classes and methods in the standard library. The API documentation is part of the JDK. It is in HTML format. Point your web browser to the docs/api/index.html file of your JDK installation. You will see a screen like that in Figure 3-2.
Figure 3-2. The three panes of the API documentation
The screen is organized into three frames. A small frame on the top left shows all available packages. Below it, a larger frame lists all classes. Click on any class name, and the API documentation for the class is displayed in the large frame to the right (see Figure 3-3). For example, to get more information on the methods of the String class, scroll the second frame until you see the String link, then click on it.
Figure 3-3. Class description for the String class
Then scroll the frame on the right until you reach a summary of all methods, sorted in alphabetical order (see Figure 3-4). Click on any method name for a detailed description of that method (see Figure 3-5). For example, if you click on the compareToIgnoreCase link, you get the description of the compareToIgnoreCase method.
Figure 3-4. Method summary of the String class
Figure 3-5. Detailed description of a String method
TIP
Bookmark the docs/api/index.html page in your browser right now.
Source:
http://x-spirit.spaces.live.com/Blog/cns!CC0B04AE126337C0!330.entry