Character strings
A character string is a sequence of code units. The length of the string is the number of code units in the sequence. If the length is zero, the value is called the empty string, which should not be confused with the null value.
Fixed-length character string (CHAR)
All values in a fixed-length string column have the same length, which is determined by the length attribute of the column. The length attribute must be in the range 1 - 255, inclusive, unless the string unit is CODEUNITS32, which has a range of 1 - 63, inclusive.
Varying-length character strings
There are two types of varying-length character strings:
- VARCHAR: A VARCHAR value can be up to 32,672 bytes long. If the string unit is CODEUNITS32, the length can be up to 8,168 string units.
- CLOB: A character large object (CLOB) value can be up to 2 gigabytes minus 1 byte (2,147,483,647 bytes) long or, if the string unit is CODEUNITS32, up to 536,870,911 string units. A CLOB is used to store large SBCS or mixed (SBCS and MBCS) character-based data (such as documents written with a single character set) and, therefore, has an SBCS or mixed code page that is associated with it.
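The CODEUNITS32 maximums quoted above are consistent with dividing each byte limit by 4, which is the largest number of bytes that UTF-8 needs to encode a single code point. The Python sketch below only reproduces that arithmetic. The 4-bytes-per-unit explanation is an assumption inferred from the numbers, not something this section states, and all names in the sketch are illustrative.

```python
# Byte limits quoted in this section for each character string type.
BYTE_LIMITS = {
    "CHAR": 255,
    "VARCHAR": 32_672,
    "CLOB": 2_147_483_647,
}

# Assumption: one CODEUNITS32 string unit may need up to 4 bytes of storage,
# the maximum size of a UTF-8 encoded code point.
MAX_BYTES_PER_UNIT = 4


def codeunits32_limit(byte_limit: int) -> int:
    """Return the largest whole number of 4-byte units that fits in byte_limit."""
    return byte_limit // MAX_BYTES_PER_UNIT


CODEUNITS32_LIMITS = {name: codeunits32_limit(n) for name, n in BYTE_LIMITS.items()}

for name, limit in CODEUNITS32_LIMITS.items():
    print(f"{name}: {BYTE_LIMITS[name]:,} bytes -> {limit:,} CODEUNITS32")
```

Under that assumption the computed values match the documented limits of 63, 8,168, and 536,870,911 string units.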
Special restrictions apply to expressions that result in a CLOB data type, and to structured type columns; such expressions and columns are not permitted in:
- A SELECT list that is preceded by the DISTINCT clause
- A GROUP BY clause
- An ORDER BY clause
- A subselect of a set operator other than UNION ALL
- A basic, quantified, BETWEEN, or IN predicate
- An aggregate function
- VARGRAPHIC, TRANSLATE, and datetime scalar functions
- The pattern operand in a LIKE predicate, or the search string operand in a POSSTR function
- The string representation of a datetime value
Functions in the SYSFUN schema that take a VARCHAR argument do not accept VARCHAR values longer than 4,000 bytes. However, many of these functions also have an alternative signature that accepts a CLOB(1M). For these functions, the user can explicitly cast a VARCHAR string longer than 4,000 bytes to a CLOB and then cast the result back to a VARCHAR of the required length.
NUL-terminated character strings that are found in C are handled differently, depending on the standards level of the precompile option.
Example
String units specification for character strings
The unit of length for the character string data type is OCTETS or CODEUNITS32. The unit of length defines the counting method that is used to determine the length of the data.
OCTETS
Indicates that the units for the length attribute are bytes. This unit of length applies to all character string data types in a non-Unicode database. In a Unicode database, OCTETS can be explicitly specified or determined based on an environment setting.
CODEUNITS32
Indicates that the units for the length attribute are Unicode UTF-32 code units, which approximate counting in characters. This unit of length does not affect the underlying code page of the data type. The actual length of a data value is determined by counting the UTF-32 code units as if the data were converted to UTF-32. A string unit of CODEUNITS32 can be used only in a Unicode database for a Unicode character string data type. CODEUNITS32 can be explicitly specified or determined based on an environment setting.
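To make the difference between the counting methods concrete, the following Python sketch counts a few sample strings in bytes, UTF-16 code units, and UTF-32 code units. It is an illustration outside the database, not the database's own implementation: the function name `length_in_units` is hypothetical, and OCTETS is assumed to mean UTF-8 bytes, as it would for Unicode data stored in UTF-8.

```python
def length_in_units(text: str, unit: str) -> int:
    """Count the length of text in the given string unit.

    Assumption: OCTETS is measured on the UTF-8 encoding of the text.
    """
    if unit == "OCTETS":
        return len(text.encode("utf-8"))
    if unit == "CODEUNITS16":
        # Each UTF-16 code unit occupies 2 bytes.
        return len(text.encode("utf-16-be")) // 2
    if unit == "CODEUNITS32":
        # Each UTF-32 code unit occupies 4 bytes.
        return len(text.encode("utf-32-be")) // 4
    raise ValueError(f"unknown string unit: {unit}")


SAMPLES = {
    "ASCII letter a": "a",
    "u with diaeresis (U+00FC)": "\u00fc",
    "grinning face (U+1F600)": "\U0001F600",
    "e + combining acute (U+0065 U+0301)": "e\u0301",
}

for label, text in SAMPLES.items():
    counts = {u: length_in_units(text, u)
              for u in ("OCTETS", "CODEUNITS16", "CODEUNITS32")}
    print(label, counts)
```

The last sample shows why CODEUNITS32 only approximates a character count: a base letter followed by a combining mark displays as one character but is counted as two UTF-32 code units.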
For example, assume that NAME, a VARCHAR(128) column that is encoded in Unicode UTF-8, contains the value 'Jürgen'. The following two queries, which count the length of the string in CODEUNITS16 or CODEUNITS32, return the same value (6).
SELECT CHARACTER_LENGTH(NAME,CODEUNITS16) FROM T1
WHERE NAME = 'Jürgen'
SELECT CHARACTER_LENGTH(NAME,CODEUNITS32) FROM T1
WHERE NAME = 'Jürgen'
The next query, which counts the length of the string in OCTETS, returns the value 7.
SELECT CHARACTER_LENGTH(NAME,OCTETS) FROM T1
WHERE NAME = 'Jürgen'
These values represent the length of the string that is expressed in the specified string unit. The following table shows the UTF-8, UTF-16BE (big-endian), and UTF-32BE (big-endian) representations of the name 'Jürgen':
Format Representation of the name 'Jürgen'
-------- --------------------------------------
UTF-8 X'4AC3BC7267656E'
UTF-16BE X'004A00FC007200670065006E'
UTF-32BE X'0000004A000000FC0000007200000067000000650000006E'
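The representations in the table can be reproduced outside the database. The Python sketch below encodes the same name in each format and derives the length in code units from the byte count. The helper names are illustrative only, and the sketch assumes that 'ü' is the single precomposed code point U+00FC.

```python
# The name from the example; \u00fc is the precomposed letter u with diaeresis.
NAME = "J\u00fcrgen"

# Format label -> (Python codec name, bytes per code unit).
FORMATS = {
    "UTF-8": ("utf-8", 1),
    "UTF-16BE": ("utf-16-be", 2),
    "UTF-32BE": ("utf-32-be", 4),
}


def hex_literal(text: str, codec: str) -> str:
    """Return the encoded bytes written as an X'...' hexadecimal literal."""
    return "X'" + text.encode(codec).hex().upper() + "'"


def code_unit_count(text: str, codec: str, unit_size: int) -> int:
    """Return the number of code units of unit_size bytes in the encoding."""
    return len(text.encode(codec)) // unit_size


for label, (codec, unit_size) in FORMATS.items():
    print(label, hex_literal(NAME, codec), code_unit_count(NAME, codec, unit_size))
```

The UTF-8 form has 7 code units (bytes) because 'ü' takes two bytes, while the UTF-16 and UTF-32 forms each have 6 code units, which matches the results of the three queries.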
Character | ASCII Value | String Length | Character Code | Character Representation |
---|---|---|---|---|
M | 77 | 1 | U+004D | Letter M |
a | 97 | 1 | U+0061 | Letter a |
n | 110 | 1 | U+006E | Letter n |
x | 120 | 1 | U+0078 | Letter x |