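This is a piece of light reading written by a programmer for programmers. By "light" I mean that it lets you pick up, fairly painlessly, some concepts that used to be unclear, much like leveling up in an RPG. Two questions prompted me to put it together: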
With "Save As" in Windows Notepad you can convert a file among GBK, Unicode, Unicode big endian, and UTF-8. They are all txt files, so how does Windows tell which encoding a file uses?
I noticed long ago that txt files saved as Unicode, Unicode big endian, and UTF-8 start with a few extra bytes: FF FE (Unicode), FE FF (Unicode big endian), and EF BB BF (UTF-8). But what standard are these markers based on?
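After looking through the relevant material I finally cleared up these questions, and learned some details of Unicode along the way. I wrote it all up as an article for anyone who has had similar doubts. I have tried to keep it easy to follow, but it does assume you know what a byte is and what hexadecimal is.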
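Big endian and little endian are two different ways a CPU handles multi-byte numbers. For example, the Unicode code of the character "汉" is 6C49. When it is written to a file, should 6C come first, or 49? If 6C is written first, that is big endian. If 49 is written first, that is little endian.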
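As a small illustration of the two byte orders, here is a sketch in Python (my choice of language, not something the article prescribes) that writes the code 6C49 both ways. The variable names are only for illustration.

```python
# The Unicode code of "汉", as given in the text.
code_point = 0x6C49

# Big endian: the high byte 6C comes first.
big = code_point.to_bytes(2, byteorder="big")

# Little endian: the low byte 49 comes first.
little = code_point.to_bytes(2, byteorder="little")

print(big.hex())     # 6c49
print(little.hex())  # 496c
```

The number is the same in both cases. Only the order of the bytes on disk or on the wire differs.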
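The word "endian" comes from Gulliver's Travels. The civil war in Lilliput started over whether an egg should be cracked open at the big end (Big-Endian) or at the little end (Little-Endian). The dispute led to six rebellions, in which one emperor lost his life and another lost his crown.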
In Chinese, "endian" is usually translated as 字节序 (byte order), and big endian and little endian are called 大尾 and 小尾.
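Characters must be encoded before a computer can process them. The default encoding a computer uses is its internal code. Early computers used 7-bit ASCII. To handle Chinese characters, programmers designed GB2312 for Simplified Chinese and Big5 for Traditional Chinese.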
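GB2312 (1980) contains 7445 characters in total: 6763 Chinese characters and 682 other symbols. The internal codes of the Chinese character area have a high byte from B0 to F7 and a low byte from A1 to FE, which gives 72*94=6768 code positions. Five of these, D7FA-D7FE, are unassigned.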
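GB2312 covers too few Chinese characters. GBK 1.0, the 1995 Chinese character extension specification, contains 21886 symbols, divided into a Chinese character area and a graphic symbol area. The Chinese character area holds 21003 characters. GB18030, published in 2000, is the official national standard that replaces GBK 1.0. It contains 27484 Chinese characters, along with the scripts of the major ethnic minorities such as Tibetan, Mongolian, and Uyghur. PC platforms are now required to support GB18030, while embedded products are exempt for the time being. That is why mobile phones and MP3 players generally support only GB2312.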
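From ASCII through GB2312 and GBK to GB18030, these encodings are backward compatible: the same character always has the same code in each of them, and the later standards simply support more characters. English and Chinese text can be handled uniformly in these encodings. A Chinese code is recognized by the fact that the top bit of its high byte is not 0. In programmers' terms, GB2312, GBK, and GB18030 all belong to the double-byte character sets (DBCS), although GB18030 also defines four-byte codes.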
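Some Chinese editions of Windows still use GBK as the default internal code and can be upgraded to GB18030 with the GB18030 upgrade package. Ordinary users, however, rarely need the characters that GB18030 adds over GBK, so we usually still use GBK to refer to the internal code of Chinese Windows.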
A few more details:
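The GB2312 standard itself is written in terms of qu-wei codes (区位码). To convert a qu-wei code to the internal code, add A0 to both the high byte and the low byte.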
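The A0 offset is easy to try out. The sketch below assumes a qu-wei code is given as two numbers, each in the range 1 to 94. The helper names `quwei_to_internal` and `internal_to_quwei` are made up for this example, and Python's built-in `gb2312` codec is used only as a cross-check.

```python
def quwei_to_internal(qu, wei):
    """Convert a qu-wei pair (each 1..94) to the two-byte internal code."""
    if not (1 <= qu <= 94 and 1 <= wei <= 94):
        raise ValueError("qu and wei must each be in 1..94")
    # Add A0 to the high byte and to the low byte.
    return bytes([qu + 0xA0, wei + 0xA0])


def internal_to_quwei(code):
    """Convert a two-byte internal code back to its (qu, wei) pair."""
    high, low = code
    return high - 0xA0, low - 0xA0


# "\u6c49" is the character "汉"; the codec reports its internal code.
han = "\u6c49".encode("gb2312")
print(han.hex())                 # baba
print(internal_to_quwei(han))    # (26, 26)
print(quwei_to_internal(16, 1).hex())  # b0a1, the first code of the Chinese character area
```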
In DBCS, GB internal codes are always stored big endian, that is, high byte first.
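Both bytes of a GB2312 code have their top bit set to 1, but only 128*128=16384 code positions satisfy that condition. So in GBK and GB18030 the top bit of the low byte is not necessarily 1. This does not affect the parsing of a DBCS character stream: while reading the stream, whenever you meet a byte whose top bit is 1, you can take that byte and the one after it together as one double-byte code, regardless of the top bit of the low byte.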
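The parsing rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any real decoder is written: the function name `split_dbcs` is hypothetical, it handles only one-byte and two-byte codes (so the four-byte codes of GB18030 are out of scope), and it does not validate the bytes it reads.

```python
def split_dbcs(data):
    """Split a DBCS byte stream into one-byte and two-byte code units."""
    units = []
    i = 0
    while i < len(data):
        if data[i] & 0x80:
            # Top bit is 1: this byte and the next form one double-byte code,
            # whatever the top bit of the second byte is.
            units.append(data[i:i + 2])
            i += 2
        else:
            # Top bit is 0: a single-byte (ASCII) code.
            units.append(data[i:i + 1])
            i += 1
    return units


# BA BA is a GB2312 code; 81 40 is a GBK-range code whose low byte has top bit 0.
print(split_dbcs(b"a\xba\xbab"))    # [b'a', b'\xba\xba', b'b']
print(split_dbcs(b"A\x81\x40B"))    # [b'A', b'\x81@', b'B']
```

In the second call the low byte 40 is also the ASCII code of "@", yet it is still consumed as part of the double-byte code because the byte before it had its top bit set.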
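As mentioned above, the encodings from ASCII through GB2312 and GBK to GB18030 are backward compatible. Unicode, however, is compatible only with ASCII (more precisely, with ISO-8859-1) and not with the GB codes. For example, the Unicode code of "汉" is 6C49, while its GB code is BABA.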
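Unicode is also a character encoding, but one designed by international organizations to hold the writing systems of all the world's languages. Its formal name, as used in ISO 10646, is "Universal Multiple-Octet Coded Character Set", or UCS for short. UCS can also be read as an abbreviation of "Unicode Character Set".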
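According to Wikipedia (http://zh.wikipedia.org/wiki/), two organizations historically tried to design Unicode independently: the International Organization for Standardization (ISO) and an association of software manufacturers (unicode.org). ISO developed the ISO 10646 project, and the Unicode Consortium developed the Unicode project.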
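Around 1991 both sides recognized that the world did not need two incompatible character sets. They began to merge their work and to cooperate on a single code table. Starting with Unicode 2.0, the Unicode project has used the same repertoire and code assignments as ISO 10646-1.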
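Both projects still exist and publish their standards independently. At the time of writing, the latest version from the Unicode Consortium is Unicode 4.1.0 (2005), and the latest ISO standard is 10646-3:2003.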
UCS specifies how to represent the various scripts with multiple bytes. How these codes are transmitted is specified by the UTF (UCS Transformation Format) specifications. Common ones include UTF-8, UTF-7, and UTF-16.
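RFC2781 and RFC3629 from the IETF describe the UTF-16 and UTF-8 encodings in the usual RFC style: clear, brisk, and still rigorous. I can never remember that IETF stands for Internet Engineering Task Force, but the RFCs that the IETF maintains are the foundation of every specification on the Internet.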
UCS comes in two forms: UCS-2 and UCS-4. As the names suggest, UCS-2 encodes with two bytes and UCS-4 with four bytes (only 31 bits are actually used; the top bit must be 0). Let's play a few simple number games:
UCS-2 has 2^16=65536 code positions, and UCS-4 has 2^31=2147483648 code positions.
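UCS-4 is divided into 2^7=128 groups according to the highest byte, whose top bit is 0. Each group is divided into 256 planes according to the second-highest byte. Each plane is divided into 256 rows according to the third byte, and each row contains 256 cells. Naturally, the cells in one row differ only in the last byte; the other bytes are the same.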
Plane 0 of group 0 is called the Basic Multilingual Plane, or BMP. Put another way, the code positions in UCS-4 whose two high bytes are 0 make up the BMP.
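Drop the two leading zero bytes from the BMP of UCS-4 and you get UCS-2. Put two zero bytes in front of the two bytes of UCS-2 and you get the BMP of UCS-4. At the time of writing, the UCS-4 specification is described as having no characters assigned outside the BMP yet. (Later versions of the standard do assign characters outside the BMP.)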
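The group, plane, row, and cell of a code are just its four bytes, which a few shifts make visible. The sketch below is only an illustration of the arithmetic above; all four function names are invented for it.

```python
def ucs4_position(code):
    """Return (group, plane, row, cell) for a 31-bit UCS-4 code."""
    if not 0 <= code < 2**31:
        raise ValueError("UCS-4 codes use 31 bits; the top bit must be 0")
    group = (code >> 24) & 0x7F   # highest byte, 128 possible groups
    plane = (code >> 16) & 0xFF   # second-highest byte, 256 planes per group
    row = (code >> 8) & 0xFF      # third byte, 256 rows per plane
    cell = code & 0xFF            # last byte, 256 cells per row
    return group, plane, row, cell


def is_bmp(code):
    """True when the two high bytes are 0, i.e. group 0, plane 0."""
    return code >> 16 == 0


def ucs2_to_ucs4(two_bytes):
    """Prepend two zero bytes to a big-endian UCS-2 code."""
    return b"\x00\x00" + two_bytes


def ucs4_bmp_to_ucs2(four_bytes):
    """Drop the two leading zero bytes of a big-endian UCS-4 BMP code."""
    if four_bytes[:2] != b"\x00\x00":
        raise ValueError("not in the BMP")
    return four_bytes[2:]


print(ucs4_position(0x6C49))                  # (0, 0, 108, 73)
print(is_bmp(0x6C49), is_bmp(0x10000))        # True False
print(ucs2_to_ucs4(b"\x6c\x49").hex())        # 00006c49
```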
UTF-8 encodes UCS in units of 8 bits. The mapping from UCS-2 to UTF-8 is as follows:
UCS-2 code (hex) | UTF-8 byte stream (binary) |
0000 - 007F | 0xxxxxxx |
0080 - 07FF | 110xxxxx 10xxxxxx |
0800 - FFFF | 1110xxxx 10xxxxxx 10xxxxxx |
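For example, the Unicode code of "汉" is 6C49. Since 6C49 lies between 0800 and FFFF, it must use the three-byte template: 1110xxxx 10xxxxxx 10xxxxxx. Written in binary, 6C49 is 0110 110001 001001. Substituting these bits, in order, for the x's in the template gives 11100110 10110001 10001001, that is, E6 B1 89.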
You can use Notepad to check that this encoding is correct.
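The table and the worked example can also be checked in code. What follows is a minimal sketch that fills in the three templates by hand for codes in the UCS-2 range. The function name `ucs2_to_utf8` is made up for this example, and the function deliberately ignores details that the table does not cover, such as rejecting the surrogate range.

```python
def ucs2_to_utf8(code):
    """Encode one UCS-2 code (0000-FFFF) with the templates from the table."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("this sketch only covers 0000-FFFF")
    if code <= 0x7F:
        # 0xxxxxxx
        return bytes([code])
    if code <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code >> 12),
        0x80 | ((code >> 6) & 0x3F),
        0x80 | (code & 0x3F),
    ])


print(ucs2_to_utf8(0x6C49).hex())   # e6b189
# Cross-check against Python's built-in encoder ("\u6c49" is "汉").
print(ucs2_to_utf8(0x6C49) == "\u6c49".encode("utf-8"))   # True
```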
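UTF-16 encodes UCS in units of 16 bits. For UCS codes below 0x10000, the UTF-16 encoding is simply the 16-bit unsigned integer equal to the UCS code. For UCS codes of 0x10000 and above, an algorithm is defined. However, since the codes in practical use, UCS-2 or the BMP of UCS-4, are necessarily below 0x10000, for now UTF-16 and UCS-2 can be regarded as essentially the same. The difference is that UCS-2 is only an encoding scheme, whereas UTF-16 is used for actual transmission, so the question of byte order has to be considered.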
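The text does not spell out the algorithm for codes of 0x10000 and above. As I understand RFC2781, it subtracts 0x10000 and splits the remaining 20 bits across two 16-bit units, usually called a surrogate pair. The sketch below follows that reading. The function name `utf16_units` is hypothetical, and the range check is my own addition.

```python
def utf16_units(code):
    """Return the list of 16-bit UTF-16 code units for one UCS code."""
    if code < 0x10000:
        # Below 0x10000 the UTF-16 unit is the code itself.
        return [code]
    if code > 0x10FFFF:
        raise ValueError("UTF-16 cannot represent codes above 0x10FFFF")
    v = code - 0x10000                 # 20 bits remain
    high = 0xD800 | (v >> 10)          # upper 10 bits
    low = 0xDC00 | (v & 0x3FF)         # lower 10 bits
    return [high, low]


print([hex(u) for u in utf16_units(0x6C49)])    # ['0x6c49']
print([hex(u) for u in utf16_units(0x10000)])   # ['0xd800', '0xdc00']
print([hex(u) for u in utf16_units(0x1D11E)])   # ['0xd834', '0xdd1e']
```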
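UTF-8 uses a single byte as its code unit, so it has no byte order problem. UTF-16 uses two bytes as its code unit, so before interpreting UTF-16 text you must first find out the byte order of each code unit. For example, the Unicode code of "奎" is 594E, and the Unicode code of "乙" is 4E59. If we receive the UTF-16 byte stream "594E", is it "奎" or "乙"?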
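The ambiguity is easy to reproduce. This short sketch decodes the same two bytes with Python's `utf-16-be` and `utf-16-le` codecs, which assume a fixed byte order and do not look for any marker.

```python
data = bytes([0x59, 0x4E])   # the byte stream "594E" from the text

as_big = data.decode("utf-16-be")      # read as 594E
as_little = data.decode("utf-16-le")   # read as 4E59

print(hex(ord(as_big)))      # 0x594e, the code of "奎"
print(hex(ord(as_little)))   # 0x4e59, the code of "乙"
```

Both decodings succeed, so nothing in the bytes themselves tells the receiver which character was meant.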
The method the Unicode specification recommends for marking byte order is the BOM. This BOM is not the "Bill Of Material" kind but the Byte Order Mark. It is a slightly clever idea:
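UCS contains a character called "ZERO WIDTH NO-BREAK SPACE", whose code is FEFF. FFFE, on the other hand, is not a character in UCS, so it should never appear in actual transmission. The UCS specification suggests that we send the character "ZERO WIDTH NO-BREAK SPACE" before sending the byte stream.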
That way, if the receiver gets FEFF, the byte stream is Big-Endian; if it gets FFFE, the byte stream is Little-Endian. This is why the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.
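UTF-8 does not need a BOM to indicate byte order, but it can use the BOM to indicate the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF (you can verify this with the encoding method described earlier). So if the receiver gets a byte stream that begins with EF BB BF, it knows the stream is UTF-8.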
This BOM is exactly what Windows uses to mark the encoding of a text file.
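To make the idea concrete, here is a rough sketch of BOM sniffing based on the three byte sequences mentioned in this article. It is not Notepad's actual implementation, which I have not seen. The name `sniff_bom` and the choice to return `None` when no BOM is present are assumptions made for the example.

```python
# The three markers described in the text, with Python codec names.
BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),      # EF BB BF
    (b"\xff\xfe", "utf-16-le"),      # FF FE, "Unicode" in Notepad
    (b"\xfe\xff", "utf-16-be"),      # FE FF, "Unicode big endian"
]


def sniff_bom(data):
    """Return (codec name, BOM length), or (None, 0) when no BOM is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


sample = b"\xff\xfe" + "\u6c49".encode("utf-16-le")   # BOM followed by "汉"
codec, skip = sniff_bom(sample)
print(codec, skip)                                  # utf-16-le 2
print(hex(ord(sample[skip:].decode(codec))))        # 0x6c49
```

A file without a BOM gives `(None, 0)` here, and the caller has to fall back on something else, for example the system's default internal code.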
The main reference for this article is "Short overview of ISO-IEC 10646 and Unicode" (http://www.nada.kth.se/i18n/ucs/unicode-iso10646-oview.html).
I also found two other references that looked good, but since my original questions had already been answered, I did not read them.
I hope some readers find this useful.
Original URL: http://dev.csdn.net/develop/article/69/article/69/69883.shtm