Find multi-byte characters in a table

Friday, May 21st, 2010

Multi-byte characters can cause quite a few problems for the unsuspecting DBA or webmaster.

Most of the time, the first step toward fixing the problem is finding out which database records contain multi-byte (for example, UTF-8 encoded) characters.
Scanning the records manually is not an option.

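Try one of the following queries to find strings with multi-byte characters in a database table. Both compare a string's length in characters with its length in bytes: in Oracle, LENGTH counts characters and LENGTHB counts bytes; in MySQL, LENGTH counts bytes and CHAR_LENGTH counts characters.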

Oracle:
SELECT c FROM t WHERE LENGTH(c) < LENGTHB(c);

MySQL:
SELECT c FROM t WHERE LENGTH(c) != CHAR_LENGTH(c);
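
If the rows have already been pulled into an application, the same check (byte length greater than character length) can be done on the client side. Here is a minimal Python sketch of the idea. It assumes the strings are meant to be stored as UTF-8; the function name has_multibyte and the sample rows are made up for illustration and are not part of any database API.

```python
def has_multibyte(text: str, encoding: str = "utf-8") -> bool:
    """Return True if at least one character needs more than one byte."""
    # Same comparison as the SQL queries: bytes versus characters.
    return len(text.encode(encoding)) > len(text)


# Hypothetical sample data standing in for the values of column c.
rows = ["plain ascii", "caf\u00e9", "na\u00efve", "\u65e5\u672c\u8a9e"]

# Keep only the strings that contain multi-byte characters.
multibyte_rows = [r for r in rows if has_multibyte(r)]
print(multibyte_rows)
```

This is only handy for spot checks or small exports. For a whole table, the queries above are the better choice because the filtering happens inside the database.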