
Encoding & Languages

Supported Data Coding Schemes

Data Coding
0x00 GSM 7-bit encoding
0x01 US-ASCII
0x03 ISO-8859-1 (Latin-1), delivered as GSM-7
0x04 Binary⁵
0x08 UCS-2 / UTF-16BE
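
As a rough illustration of how these values relate to text encodings, the sketch below maps three of the data coding values to Python standard-library codecs. The names `CODECS` and `encode_payload` are hypothetical helpers, not part of any gateway API. GSM 7-bit (0x00) has no standard-library codec and binary (0x04) is raw octets, so both are deliberately left out.

```python
# Assumed mapping from data coding values to Python standard-library codecs.
# 0x00 (GSM 7-bit) and 0x04 (binary) are intentionally absent: Python ships
# no GSM codec, and binary payloads are not text.
CODECS = {
    0x01: "ascii",      # US-ASCII
    0x03: "latin-1",    # ISO-8859-1
    0x08: "utf-16-be",  # UCS-2 / UTF-16BE
}


def encode_payload(text: str, data_coding: int) -> bytes:
    """Encode text with the codec assumed for the given data coding value."""
    try:
        codec = CODECS[data_coding]
    except KeyError:
        raise ValueError(
            f"no standard codec for data coding 0x{data_coding:02X}"
        ) from None
    return text.encode(codec)


print(encode_payload("Hi", 0x01).hex())  # 4869
print(encode_payload("é", 0x03).hex())   # e9
print(encode_payload("Hi", 0x08).hex())  # 00480069
```

Encoding raises `UnicodeEncodeError` when the text contains a character the chosen codec cannot represent, which is one simple way to detect that a wider data coding is needed.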

GSM 7-bit encoding

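The standard and default encoding for GSM messages. In the table below, the column header gives the high bits and the row label gives the low four bits of each character code; for example, A is 0x40 + 0x01 = 0x41.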

0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70
0x00 @ Δ SP¹ 0 ¡ P ¿ p
0x01 £ _ ! 1 A Q a q
0x02 $ Φ " 2 B R b r
0x03 ¥ Γ # 3 C S c s
0x04 è Λ ¤ 4 D T d t
0x05 é Ω % 5 E U e u
0x06 ù Π & 6 F V f v
0x07 ì Ψ ' 7 G W g w
0x08 ò Σ ( 8 H X h x
0x09 Ç Θ ) 9 I Y i y
0x0A LF² Ξ * : J Z j z
0x0B Ø ESC⁴ + ; K Ä k ä
0x0C ø Æ , < L Ö l ö
0x0D CR³ æ - = M Ñ m ñ
0x0E Å ß . > N Ü n ü
0x0F å É / ? O § o à

GSM 7-bit is the most common encoding for SMS text. Its default alphabet maps 128 characters: the basic Latin letters and digits, common punctuation, a number of accented Latin letters, and some Greek capitals.

Septet & Octet

The 7-bit binary representation of a character is called a septet, and the 8-bit binary representation is called an octet.

The process of filling septets (7-bit characters) into octets (8-bit bytes) is called packing. The reverse process, extracting septets from the packed data, is called unpacking.

Example GSM 7-bit encoding

HelloWorld Example Septet Table

Character Hex Septets
H 0x48 1001000
e 0x65 1100101
l 0x6C 1101100
l 0x6C 1101100
o 0x6F 1101111
W 0x57 1010111
o 0x6F 1101111
r 0x72 1110010
l 0x6C 1101100
d 0x64 1100100
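
The septet table above can be reproduced by looking up each character's position in the default alphabet. The following is a minimal sketch under that assumption; `GSM7_BASIC` and `to_septets` are illustrative names, and the string is simply the character table from this page written out column by column.

```python
# The 128 characters of the default alphabet, indexed by their 7-bit code.
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅå"                    # 0x00-0x0F
    "Δ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"                   # 0x10-0x1F (0x1B is ESC)
    " !\"#¤%&'()*+,-./"                     # 0x20-0x2F
    "0123456789:;<=>?"                      # 0x30-0x3F
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"      # 0x40-0x5F
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"      # 0x60-0x7F
)


def to_septets(text: str) -> list[int]:
    """Map each character to its 7-bit code, or raise if it has none."""
    codes = []
    for ch in text:
        code = GSM7_BASIC.find(ch)
        if code < 0:
            raise ValueError(f"{ch!r} is not in the GSM 7-bit default alphabet")
        codes.append(code)
    return codes


for ch, code in zip("HelloWorld", to_septets("HelloWorld")):
    print(ch, f"0x{code:02X}", f"{code:07b}")
# The first line printed is: H 0x48 1001000
```

A `ValueError` from `to_septets` is a signal that the text cannot be sent with this table alone and needs another encoding such as UCS-2.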

HelloWorld Example Packing

Character Hex Septets Octets
H 0x48 1001000 11001000
e 0x65 1100101 00110010
l 0x6C 1101100 10011011
l 0x6C 1101100 11111101
o 0x6F 1101111 10111110
W 0x57 1010111 10111110
o 0x6F 1101111 11100101
r 0x72 1110010 01101100
l 0x6C 1101100 00110010
d 0x64 1100100
Note that the final octet holds only the six remaining bits of the last septet, so its two high bits are padded with zeros.
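
Packing and unpacking can be sketched with a small bit accumulator. This is an illustrative implementation, not the gateway's own code; `pack_septets` and `unpack_septets` are hypothetical names. Septets are written least significant bit first, so each octet is filled from its low bits upward.

```python
def pack_septets(septets):
    """Pack 7-bit values into octets, least significant bits first."""
    acc = 0     # bit accumulator
    nbits = 0   # number of valid bits currently held in acc
    out = bytearray()
    for s in septets:
        acc |= (s & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # unused high bits stay zero (padding)
    return bytes(out)


def unpack_septets(data, count):
    """Extract `count` septets from packed octets.

    The count is needed because seven padding bits in the last octet
    would otherwise be read as an extra septet 0x00 ('@').
    """
    acc = 0
    nbits = 0
    out = []
    for b in data:
        acc |= b << nbits
        nbits += 8
        while nbits >= 7 and len(out) < count:
            out.append(acc & 0x7F)
            acc >>= 7
            nbits -= 7
    return out


# For these letters the GSM 7-bit code equals the ASCII code.
septets = [ord(c) for c in "HelloWorld"]
packed = pack_septets(septets)
print(packed.hex())  # c8329bfdbebee56c32
print(unpack_septets(packed, len(septets)) == septets)  # True
```

Ten septets (70 bits) occupy nine octets. The last octet, 0x32, carries the six remaining bits of `d` with two zero bits above them.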

UCS-2 Encoding

This encoding supports a much wider range of characters and languages. UCS-2 can represent the most commonly used Latin and Eastern characters, at the cost of two octets per character. A single SMS using this encoding can carry at most 70 characters (140 octets).
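
The size limit can be checked before submission. The sketch below assumes the payload is produced with Python's `utf-16-be` codec; `encode_ucs2` and `fits_single_sms` are illustrative names. It counts octets rather than characters because a character outside the Basic Multilingual Plane, such as an emoji, is encoded as a surrogate pair and takes four octets.

```python
MAX_OCTETS = 140  # payload size of a single message, per the limit above


def encode_ucs2(text: str) -> bytes:
    """Encode text as big-endian 16-bit code units."""
    return text.encode("utf-16-be")


def fits_single_sms(text: str) -> bool:
    """True if the encoded text fits into one message."""
    return len(encode_ucs2(text)) <= MAX_OCTETS


print(encode_ucs2("Привет").hex())  # 041f04400438043204350442
print(fits_single_sms("я" * 70))    # True
print(fits_single_sms("я" * 71))    # False
```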

Locking Shift Character Set Support

Turkish language

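The User Data Header (UDH) contains the information element 0x25 0x01 0x01: identifier 0x25 (national language locking shift), length 0x01, language 0x01 (Turkish).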
0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70
0x00 @ Δ SP¹ 0 İ P ç p
0x01 £ _ ! 1 A Q a q
0x02 $ Φ " 2 B R b r
0x03 ¥ Γ # 3 C S c s
0x04 € Λ ¤ 4 D T d t
0x05 é Ω % 5 E U e u
0x06 ù Π & 6 F V f v
0x07 ı Ψ ' 7 G W g w
0x08 ò Σ ( 8 H X h x
0x09 Ç Θ ) 9 I Y i y
0x0A LF² Ξ * : J Z j z
0x0B Ğ ESC⁴ + ; K Ä k ä
0x0C ğ Ş , < L Ö l ö
0x0D CR³ ş - = M Ñ m ñ
0x0E Å ß . > N Ü n ü
0x0F å É / ? O § o à
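
To show how the three octets of the locking shift element fit into a complete header, here is a minimal sketch that builds a User Data Header from information elements. The helper names `information_element` and `build_udh` are hypothetical, and how the header is attached to a submission depends on the interface in use, which this sketch does not cover.

```python
NATIONAL_LOCKING_SHIFT = 0x25  # information element identifier
TURKISH = 0x01                 # language identifier


def information_element(iei: int, data: bytes) -> bytes:
    """One element: identifier, data length, then the data itself."""
    return bytes([iei, len(data)]) + data


def build_udh(*elements: bytes) -> bytes:
    """Prefix the concatenated elements with the header length octet."""
    body = b"".join(elements)
    return bytes([len(body)]) + body


udh = build_udh(information_element(NATIONAL_LOCKING_SHIFT, bytes([TURKISH])))
print(udh.hex())  # 03250101
```

The leading 0x03 is the length of the header that follows, so the full header occupies four octets of the message.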

  1. SP is a Space character. 

  2. LF is a Line Feed \n control. 

  3. CR is a Carriage Return \r control. 

  4. ESC is an Escape ^[ control. 

  5. Contact your account manager to enable binary submissions.