User's Guide
   Part III. Using SQL Anywhere
     Chapter 22. Database Collations
      Support for multibyte character sets

SQL Anywhere supports several multibyte character sets, including the following:

Multibyte character set   Description
SJIS                      Japanese Shift-JIS Encoding
EUC_JAPAN                 Japanese EUC JIS X 0208-1990 and JIS X 0212-1990 Encoding
EUC_CHINA                 Chinese GB 2312-80 Encoding
EUC_TAIWAN                Taiwanese Big 5 Encoding
EUC_KOREA                 Korean KS C 5601-1992 Encoding
UTF8                      UTF8 is a variable-byte encoding of 4-byte Unicode (UCS-4). SQL Anywhere supports UTF8 characters up to 4 bytes in length, while UTF8 may use up to 6 bytes.

This section describes how SQL Anywhere handles multibyte character sets. The description applies to the supported collations and to any custom collations.


Variable length character sets

SQL Anywhere supports variable-length character sets. In these sets, some characters are represented by one byte and others by more than one, up to a maximum of four bytes. The value of the first byte of a character indicates how many bytes the character occupies, and also whether the character is a space character, a digit, or an alphabetic (alpha) character. SQL Anywhere does not support fixed-length multibyte character sets such as 2-byte UNICODE or 4-byte UNICODE.
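UTF8 is one encoding where the first byte determines the length of the character. The following Python sketch illustrates the idea only; the function name utf8_char_length is hypothetical and is not part of SQL Anywhere. It rejects first bytes that would start a sequence longer than four bytes, matching the four-byte maximum described above.

```python
def utf8_char_length(first_byte: int) -> int:
    """Return the length in bytes of a UTF-8 character, judged from its first byte.

    Illustrative helper only (hypothetical name, not a SQL Anywhere API).
    """
    if first_byte < 0x80:
        return 1  # 0xxxxxxx: single-byte character
    if 0xC0 <= first_byte <= 0xDF:
        return 2  # 110xxxxx: first byte of a two-byte character
    if 0xE0 <= first_byte <= 0xEF:
        return 3  # 1110xxxx: first byte of a three-byte character
    if 0xF0 <= first_byte <= 0xF7:
        return 4  # 11110xxx: first byte of a four-byte character
    # 0x80-0xBF are follow bytes; 0xF8 and above would start 5- or 6-byte forms.
    raise ValueError(f"not a valid first byte for a 1- to 4-byte character: {first_byte:#04x}")
```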

Example

As an example, characters in the Shift-JIS character set are either one or two bytes long. If the hex value of the first byte is in the range 81-9F or E0-EF (decimal 129-159 or 224-239), the character is a two-byte character and the subsequent byte (called a follow byte) completes it. If the first byte is outside these ranges, the character is a single-byte character and the next byte is the first byte of the following character.
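The first-byte rule for Shift-JIS can be sketched in Python as follows. This is a minimal illustration that uses only the ranges given in the text; the function names sjis_char_length and split_sjis are hypothetical and do not correspond to any SQL Anywhere interface.

```python
from typing import List


def sjis_char_length(first_byte: int) -> int:
    """Return 2 if the first byte starts a two-byte character, otherwise 1."""
    if 0x81 <= first_byte <= 0x9F or 0xE0 <= first_byte <= 0xEF:
        return 2
    return 1


def split_sjis(data: bytes) -> List[bytes]:
    """Split a Shift-JIS byte string into characters using only first-byte values."""
    chars = []
    i = 0
    while i < len(data):
        n = sjis_char_length(data[i])
        # A truncated final character yields a shorter slice rather than an error.
        chars.append(data[i:i + n])
        i += n
    return chars
```

For instance, passing the Shift-JIS encoding of a string that mixes ASCII letters and kanji to split_sjis returns one single-byte item for each ASCII letter and one two-byte item for each kanji.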

The properties of any Shift-JIS character can also be read from its first byte. Characters with a first byte in the hex range 09 to 0D, or with a first byte of 20, are space characters. Those with a first byte in the ranges 41 to 5A, 61 to 7A, 81 to 9F, or E0 to EF are alpha characters (letters), and those with a first byte in the range 30 to 39 are digits.
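The classification by first byte can be expressed as a small lookup. The Python sketch below uses the hex ranges listed above; the function name sjis_char_class and the "other" label for unlisted bytes are assumptions made for illustration.

```python
def sjis_char_class(first_byte: int) -> str:
    """Classify a Shift-JIS character as space, digit, alpha, or other by its first byte."""
    b = first_byte
    if 0x09 <= b <= 0x0D or b == 0x20:
        return "space"
    if 0x30 <= b <= 0x39:
        return "digit"
    if (0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A
            or 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF):
        return "alpha"
    # Bytes not covered by the ranges in the text (assumed label).
    return "other"
```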

When building custom collations, you can specify which ranges of first-byte values signify single-byte characters and which signify double-byte (or longer) characters, and which ranges specify space, alpha, and digit characters. However, every first byte with a value less than decimal 40 (hex 28) must be a single-byte character, and no follow byte may have a value less than decimal 40. This restriction is satisfied by all known current encodings.
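The restriction on custom collations can be checked mechanically. The following Python sketch assumes a hypothetical representation in which byte ranges are inclusive (low, high) pairs; it does not reflect the actual SQL Anywhere collation file format, and the name check_custom_ranges is invented for illustration. The limit is taken as decimal 40 (hex 28), as stated above.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (low, high) byte values; assumed representation

LIMIT = 40  # decimal 40 (hex 28), the threshold stated in the text


def check_custom_ranges(multibyte_first: List[Range], follow: List[Range]) -> List[str]:
    """Return a list of problems; an empty list means the restriction is satisfied."""
    problems = []
    for low, high in multibyte_first:
        if low < LIMIT:
            problems.append(
                f"first-byte range {low:02X}-{high:02X} marks values below {LIMIT} as multibyte")
    for low, high in follow:
        if low < LIMIT:
            problems.append(
                f"follow-byte range {low:02X}-{high:02X} includes values below {LIMIT}")
    return problems
```

As a usage example, the Shift-JIS first-byte ranges from the text, [(0x81, 0x9F), (0xE0, 0xEF)], together with the commonly documented Shift-JIS follow-byte ranges [(0x40, 0x7E), (0x80, 0xFC)], produce an empty problem list.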


First-byte collation orderings

A sorting order for characters in a multibyte character set can be specified only for the first byte. Characters that have the same first byte are sorted according to the hexadecimal values of their follow bytes. If two characters are identical up to the length of the shorter one, the longer character is greater than the shorter.
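The ordering rule can be modelled as a two-part sort key: the collation rank of the first byte, then the raw follow bytes. The Python sketch below is an illustration under assumptions; the rank table, the swap of two first-byte values, and the name collation_key are all hypothetical. Python compares bytes objects byte by byte and treats a shorter prefix as smaller, which matches the rule that the longer character is greater.

```python
from typing import List, Tuple


def collation_key(char: bytes, rank: List[int]) -> Tuple[int, bytes]:
    """Sort key: rank of the first byte, then follow bytes compared by value."""
    return (rank[char[0]], char[1:])


# Default ordering: each first byte ranks at its own value.
rank = list(range(256))

# Hypothetical custom ordering: make first byte 0x96 sort before first byte 0x93.
rank[0x93], rank[0x96] = rank[0x96], rank[0x93]

chars = [b"\x93\xfa", b"\x96\x7b", b"A", b"\x93\x40"]
ordered = sorted(chars, key=lambda c: collation_key(c, rank))
# ordered is [b"A", b"\x96\x7b", b"\x93\x40", b"\x93\xfa"]
```

Only the first byte is looked up in the rank table, so the two characters that share the first byte 0x93 stay in follow-byte order regardless of how the first bytes are ranked.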
