fxzeng


UTF-8 and Unicode [reposted from http://www.cppblog.com/zuroc/archive/2006/02/15/3269.html ]

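I noticed that someone had written code for converting between UTF-8 and Unicode, so I am adding a few notes here in the hope that they are useful.
Below are the bit widths used by some common encoding forms.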

Examples of fixed-width encoding forms:

Type                     | Each character encoded as | Notes
-------------------------+---------------------------+----------------------------------------------------------
7-bit                    | a single 7-bit quantity   | example: ISO 646
8-bit G0/G1              | a single 8-bit quantity   | with constraints on use of C0 and C1 spaces
8-bit                    | a single 8-bit quantity   | with no constraints on use of C1 space
8-bit EBCDIC             | a single 8-bit quantity   | with the EBCDIC conventions rather than ASCII conventions
16-bit (UCS-2)           | a single 16-bit quantity  | within a code space of 0..FFFF
32-bit (UCS-4)           | a single 32-bit quantity  | within a code space of 0..7FFFFFFF
32-bit (UTF-32)          | a single 32-bit quantity  | within a code space of 0..10FFFF
16-bit DBCS process code | a single 16-bit quantity  | example: UNIX widechar implementations of Asian CCS's
32-bit DBCS process code | a single 32-bit quantity  | example: UNIX widechar implementations of Asian CCS's
DBCS Host                | two 8-bit quantities      | following IBM host conventions
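
The code space limits listed for UCS-2, UCS-4 and UTF-32 can be written down as simple range checks. The following is a minimal sketch; the constant and function names are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>

// Upper bounds of the code spaces named in the table of fixed-width forms.
constexpr std::uint32_t kUcs2Max  = 0xFFFF;      // 16-bit (UCS-2):  0..FFFF
constexpr std::uint32_t kUtf32Max = 0x10FFFF;    // 32-bit (UTF-32): 0..10FFFF
constexpr std::uint32_t kUcs4Max  = 0x7FFFFFFF;  // 32-bit (UCS-4):  0..7FFFFFFF

// Hypothetical helpers: each returns true when the value lies inside
// the corresponding code space.
constexpr bool fits_ucs2(std::uint32_t value)  { return value <= kUcs2Max; }
constexpr bool fits_utf32(std::uint32_t value) { return value <= kUtf32Max; }
constexpr bool fits_ucs4(std::uint32_t value)  { return value <= kUcs4Max; }
```

For example, U+4E2D fits in all three code spaces, while U+1F600 lies outside UCS-2 and needs one of the 32-bit forms.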

Examples of variable-width encoding forms:

Name   | Characters are encoded as                                  | Notes
-------+------------------------------------------------------------+----------------------------
UTF-8  | a mix of one to four 8-bit code units in Unicode,          | used only with Unicode/10646
       | and one to six code units in 10646                         |
UTF-16 | a mix of one to two 16-bit code units                      | used only with Unicode/10646
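
To make the "one to four code units" rule concrete, here is a minimal sketch of a UTF-8 encoder for a single code point, together with a helper that reports how many 16-bit units UTF-16 would need. The names `encode_utf8` and `utf16_units` are hypothetical, and the sketch covers only the Unicode range 0..10FFFF, not the five- and six-unit forms once allowed by ISO 10646.

```cpp
#include <cstdint>
#include <string>

// Encode one code point as UTF-8 (one to four 8-bit code units).
// Returns an empty string for surrogates and for values above 10FFFF.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {                                   // 1 unit:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {                           // 2 units: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {                          // 3 units: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;   // surrogates are not encodable
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {                        // 4 units: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Number of 16-bit code units UTF-16 needs for the same code point:
// 1 inside 0..FFFF, 2 (a surrogate pair) above it, 0 if the value is invalid.
int utf16_units(std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return cp <= 0xFFFF ? 1 : 2;
}
```

With this sketch, U+4E2D becomes the three bytes E4 B8 AD and takes one UTF-16 unit, while U+1F600 becomes the four bytes F0 9F 98 80 and takes two UTF-16 units.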

Boost provides a UTF-8 Codecvt Facet that converts between UTF-8 and UCS-4 (32-bit Unicode).
It is used as follows:

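  //...
  // Needs <algorithm>, <fstream>, <iterator>, <locale>, <vector>
  // and the Boost utf8_codecvt_facet header.

  // My encoding type.
  // Note: wchar_t is 32 bits wide on many Unix-like platforms but only
  // 16 bits on Windows, so it can hold UCS-4 only on the former.
  typedef wchar_t ucs4_t;

  // The template argument follows the Boost documentation; in Boost releases
  // where utf8_codecvt_facet is a non-template class, drop <ucs4_t>.
  std::locale old_locale;
  std::locale utf8_locale(old_locale,new utf8_codecvt_facet<ucs4_t>);

  // Set a new global locale
  std::locale::global(utf8_locale);

  // Write: convert UCS-4 to UTF-8.
  // ucs4_data is assumed to be a container of ucs4_t defined in the elided code.
  {
    std::wofstream ofs("data.ucd");
    ofs.imbue(utf8_locale);
    std::copy(ucs4_data.begin(),ucs4_data.end(),
          std::ostream_iterator<ucs4_t,ucs4_t>(ofs));
  }

  // Read: load the UTF-8 file and convert it back to UCS-4.
  // operator>> skips whitespace, so whitespace characters are not restored.
  std::vector<ucs4_t> from_file;
  {
    std::wifstream ifs("data.ucd");
    ifs.imbue(utf8_locale);
    ucs4_t item = 0;
    while (ifs >> item) from_file.push_back(item);
  }
  //...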
For details on the UTF-8 Codecvt Facet, see

http://www.boost.org/libs/serialization/doc/codecvt.html
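
To show roughly what happens on the read side when UTF-8 is turned back into 32-bit values, here is a standalone sketch of a decoder. It is not Boost's implementation: `decode_utf8` is a hypothetical name, it handles only the one-to-four-unit forms, stops at the first malformed sequence, and does not reject overlong encodings or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into 32-bit code points.
// Decoding stops at the first invalid lead byte, bad continuation byte,
// or truncated sequence; everything decoded before that point is returned.
std::vector<std::uint32_t> decode_utf8(const std::string& in) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else break;                          // not a valid lead byte

        if (i + extra >= in.size()) break;   // sequence is truncated

        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // must be 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);     // append the 6 payload bits
        }
        if (!ok) break;

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Feeding it the bytes 41 E4 B8 AD F0 9F 98 80 yields the three code points U+0041, U+4E2D and U+1F600.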

Posted on 2006-09-10 13:12, Fxzeng's space
