SyntaxStudy
PHP Multibyte String Handling
PHP Intermediate 8 min read

Multibyte String Handling

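Standard string functions such as strlen() and substr() operate on bytes, not characters. In UTF-8, non-ASCII characters (accented letters, CJK, emoji) take two to four bytes each, so byte-based functions miscount them and can cut one in half. The mbstring extension provides character-aware equivalents: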

  • mb_strlen() — character count.
  • mb_strtolower() / mb_strtoupper() — Unicode-aware case conversion.
  • mb_substr() — character-based slicing.
  • mb_strpos() — character-based search.
Example
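<?php
$str = 'Héllo Wörld'; // 11 characters, 13 bytes in UTF-8 (é and ö take 2 bytes each)

echo strlen($str), PHP_EOL;             // 13 (bytes)
echo mb_strlen($str, 'UTF-8'), PHP_EOL; // 11 (characters)

echo mb_strtoupper($str, 'UTF-8'), PHP_EOL; // HÉLLO WÖRLD
echo mb_strtolower($str, 'UTF-8'), PHP_EOL; // héllo wörld

echo mb_substr($str, 0, 5, 'UTF-8'), PHP_EOL; // Héllo

// mb_strpos() returns a character offset, or false when there is no match.
// Compare with !== false, because a match at offset 0 is also falsy.
$pos = mb_strpos($str, 'Wö', 0, 'UTF-8');
if ($pos !== false) {
    echo $pos, PHP_EOL; // 6
}

// Convert encoding: the arguments are (string, to, from)
$latin = mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8');
echo strlen($latin), PHP_EOL; // 11 (one byte per character in ISO-8859-1)

// Set the internal encoding once at bootstrap
mb_internal_encoding('UTF-8');
// After this, the encoding argument can be omitted
echo mb_strlen($str), PHP_EOL; // 11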
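
To see why the byte-based functions are risky on UTF-8 text, the following minimal sketch slices the same string with substr() and mb_substr() and checks each result with mb_check_encoding(). It assumes the mbstring extension is loaded and the file is saved as UTF-8; the variable names are only illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$str = 'Héllo Wörld';

// substr() slices bytes: two bytes cover 'H' plus only the first byte of 'é'.
$byteSlice = substr($str, 0, 2);

// mb_substr() slices characters: 'H' and the complete 'é'.
$charSlice = mb_substr($str, 0, 2, 'UTF-8');

var_dump(mb_check_encoding($byteSlice, 'UTF-8')); // bool(false): truncated sequence
var_dump(mb_check_encoding($charSlice, 'UTF-8')); // bool(true)
var_dump(strlen($charSlice));                     // int(3): two characters, three bytes
```

The byte slice ends in the middle of a two-byte sequence, so it is no longer valid UTF-8 and may show up as a replacement character or break later processing such as JSON encoding.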
Pro Tip

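Call mb_internal_encoding('UTF-8') once in your application bootstrap so you do not need to pass the encoding argument to every mb_* call. The mbstring.internal_encoding php.ini setting has been deprecated since PHP 5.6; set default_charset = UTF-8 instead, which is also the default value and the encoding mbstring falls back to.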
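
As a rough illustration of that bootstrap step, the sketch below checks that mbstring is loaded, sets the internal encoding, and then defines a small helper that relies on it. The truncate_chars() function and its parameters are hypothetical names invented for this example, not part of PHP or mbstring.

```php
<?php
// Hypothetical bootstrap snippet: fail early if mbstring is missing.
if (!extension_loaded('mbstring')) {
    throw new RuntimeException('The mbstring extension is required.');
}

// Set the internal encoding once; later mb_* calls can omit the argument.
mb_internal_encoding('UTF-8');

// Hypothetical helper: shorten text to at most $max characters (not bytes),
// appending $suffix only when something was cut off.
function truncate_chars(string $text, int $max, string $suffix = '...'): string
{
    if (mb_strlen($text) <= $max) {
        return $text;
    }
    return mb_substr($text, 0, $max) . $suffix;
}

echo truncate_chars('Héllo Wörld', 5), PHP_EOL; // Héllo...
echo mb_internal_encoding(), PHP_EOL;           // UTF-8
```

Library code that cannot control the bootstrap may still prefer to pass 'UTF-8' explicitly, since the internal encoding is a global setting that other code can change.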