Greenplum Database Supported Character Sets

Character set support in Greenplum Database lets you store text in a variety of character sets, including single-byte character sets such as the ISO 8859 series and multibyte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code. Clients can use all supported character sets transparently, but a few are not supported for use within the server (that is, as a server-side encoding). The default character set is selected when you initialize your Greenplum Database array using gpinitsystem. It can be overridden when you create a database, so you can have multiple databases, each with a different character set.
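
As a rough illustration of overriding the default when a database is created, the following Python sketch builds a CREATE DATABASE statement with an explicit ENCODING clause and rejects the character sets that the table below marks as not usable on the server. The helper name create_database_sql and the database names are hypothetical and exist only for this example. Whether the statement succeeds on a given system also depends on settings this sketch does not check.

```python
import re

# Character sets marked "No" in the Server? column of the table in this
# document: usable by clients, but not as a database (server-side) encoding.
CLIENT_ONLY = {"BIG5", "GB18030", "GBK", "SJIS", "SQL_ASCII", "UHC"}


def create_database_sql(dbname, encoding):
    """Return a CREATE DATABASE statement that sets a per-database encoding.

    Hypothetical helper for illustration; it only builds the SQL text.
    """
    enc = encoding.upper()
    if not re.fullmatch(r"[A-Z][A-Z0-9_]*", enc):
        raise ValueError(f"malformed encoding name: {encoding!r}")
    if enc in CLIENT_ONLY:
        raise ValueError(f"{enc} is not supported as a server-side encoding")
    # Keep the example simple: accept only plain lowercase identifiers.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*", dbname):
        raise ValueError(f"unsupported database name: {dbname!r}")
    return f"CREATE DATABASE {dbname} ENCODING '{enc}';"


if __name__ == "__main__":
    print(create_database_sql("sales_kr", "EUC_KR"))
    print(create_database_sql("sales_utf8", "utf8"))
```

Running the module prints one statement per database; each could then be passed to a client such as psql.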

Name           Description                 Language                        Server?  Bytes/Char  Aliases
-------------  --------------------------  ------------------------------  -------  ----------  -------------------------------------
BIG5           Big Five                    Traditional Chinese             No       1-2         WIN950, Windows950
EUC_CN         Extended UNIX Code-CN       Simplified Chinese              Yes      1-3
EUC_JP         Extended UNIX Code-JP       Japanese                        Yes      1-3
EUC_KR         Extended UNIX Code-KR       Korean                          Yes      1-3
EUC_TW         Extended UNIX Code-TW       Traditional Chinese, Taiwanese  Yes      1-3
GB18030        National Standard           Chinese                         No       1-2
GBK            Extended National Standard  Simplified Chinese              No       1-2         WIN936, Windows936
ISO_8859_5     ISO 8859-5, ECMA 113        Latin/Cyrillic                  Yes      1
ISO_8859_6     ISO 8859-6, ECMA 114        Latin/Arabic                    Yes      1
ISO_8859_7     ISO 8859-7, ECMA 118        Latin/Greek                     Yes      1
ISO_8859_8     ISO 8859-8, ECMA 121        Latin/Hebrew                    Yes      1
JOHAB          JOHAB                       Korean (Hangul)                 Yes      1-3
KOI8           KOI8-R(U)                   Cyrillic                        Yes      1           KOI8R
LATIN1         ISO 8859-1, ECMA 94         Western European                Yes      1           ISO88591
LATIN2         ISO 8859-2, ECMA 94         Central European                Yes      1           ISO88592
LATIN3         ISO 8859-3, ECMA 94         South European                  Yes      1           ISO88593
LATIN4         ISO 8859-4, ECMA 94         North European                  Yes      1           ISO88594
LATIN5         ISO 8859-9, ECMA 128        Turkish                         Yes      1           ISO88599
LATIN6         ISO 8859-10, ECMA 144       Nordic                          Yes      1           ISO885910
LATIN7         ISO 8859-13                 Baltic                          Yes      1           ISO885913
LATIN8         ISO 8859-14                 Celtic                          Yes      1           ISO885914
LATIN9         ISO 8859-15                 LATIN1 with Euro and accents    Yes      1           ISO885915
LATIN10        ISO 8859-16, ASRO SR 14111  Romanian                        Yes      1           ISO885916
MULE_INTERNAL  Mule internal code          Multilingual Emacs              Yes      1-4
SJIS           Shift JIS                   Japanese                        No       1-2         Mskanji, ShiftJIS, WIN932, Windows932
SQL_ASCII      unspecified (see note 2)    any                             No       1
UHC            Unified Hangul Code         Korean                          No       1-2         WIN949, Windows949
UTF8           Unicode, 8-bit              all                             Yes      1-4         Unicode
WIN866         Windows CP866               Cyrillic                        Yes      1           ALT
WIN874         Windows CP874               Thai                            Yes      1
WIN1250        Windows CP1250              Central European                Yes      1
WIN1251        Windows CP1251              Cyrillic                        Yes      1           WIN
WIN1252        Windows CP1252              Western European                Yes      1
WIN1253        Windows CP1253              Greek                           Yes      1
WIN1254        Windows CP1254              Turkish                         Yes      1
WIN1255        Windows CP1255              Hebrew                          Yes      1
WIN1256        Windows CP1256              Arabic                          Yes      1
WIN1257        Windows CP1257              Baltic                          Yes      1
WIN1258        Windows CP1258              Vietnamese                      Yes      1           ABC, TCVN, TCVN5712, VSCII
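
To make the Bytes/Char column more concrete, the sketch below encodes a few sample characters and counts the resulting bytes. It uses Python's built-in codecs as stand-ins for the character sets in the table. The mapping from table names to Python codec names is an assumption of this example, and Python's codecs are not guaranteed to match the Greenplum Database conversions in every detail.

```python
# (table name, assumed Python codec, sample character, description)
SAMPLES = [
    ("LATIN1", "latin_1", "\u00e9", "e with acute accent"),
    ("ISO_8859_5", "iso8859_5", "\u0416", "Cyrillic capital Zhe"),
    ("EUC_JP", "euc_jp", "\u65e5", "CJK ideograph 'sun/day'"),
    ("EUC_KR", "euc_kr", "\ud55c", "Hangul syllable 'han'"),
    ("BIG5", "big5", "\u4e2d", "CJK ideograph 'middle'"),
    ("SJIS", "shift_jis", "\u65e5", "CJK ideograph 'sun/day'"),
    ("UTF8", "utf_8", "A", "ASCII letter A"),
    ("UTF8", "utf_8", "\u65e5", "CJK ideograph 'sun/day'"),
    ("UTF8", "utf_8", "\U0001f600", "emoji outside the BMP"),
]


def encoded_length(codec, char):
    """Return how many bytes one character occupies in the given codec."""
    return len(char.encode(codec))


if __name__ == "__main__":
    for name, codec, char, description in SAMPLES:
        size = encoded_length(codec, char)
        # Print only ASCII so the output does not depend on the terminal encoding.
        print(f"{name:<12} U+{ord(char):04X} {description:<26} {size} byte(s)")
```

The single-byte sets always report 1, while the multibyte sets report values within the ranges listed in the table, depending on the character.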

1. Not all APIs support all the listed character sets. For example, the JDBC driver does not support MULE_INTERNAL, LATIN6, LATIN8, or LATIN10.

2. The SQL_ASCII setting behaves considerably differently from the other settings. Byte values 0-127 are interpreted according to the ASCII standard, while byte values 128-255 are taken as uninterpreted characters. If you are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting as a client encoding. SQL_ASCII is not supported as a server encoding.
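
The risk described in note 2 can be sketched in Python: a byte in the range 128-255 has no meaning under ASCII, and the character it stands for depends entirely on which character set is assumed. The helper names below are hypothetical, and the Python codecs are used only as stand-ins for the corresponding character sets in the table.

```python
def is_pure_ascii(raw):
    """True if every byte is in 0-127, the range ASCII defines."""
    return all(byte < 128 for byte in raw)


def interpretations(raw, codecs=("latin_1", "iso8859_5", "koi8_r")):
    """Decode the same bytes under several single-byte character sets."""
    return {codec: raw.decode(codec) for codec in codecs}


if __name__ == "__main__":
    high_byte = b"\xe9"  # one byte in the uninterpreted 128-255 range

    print("pure ASCII:", is_pure_ascii(high_byte))

    # ASCII itself cannot decode the byte at all.
    try:
        high_byte.decode("ascii")
    except UnicodeDecodeError:
        print("0xE9 is not valid ASCII")

    # The same byte maps to a different character in each character set.
    for codec, text in interpretations(high_byte).items():
        print(f"{codec:<10} U+{ord(text):04X}")
```

Because nothing records which interpretation was intended, non-ASCII bytes passed through without conversion can be read back as different characters by a client that assumes another character set.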