MySql: cleaning user input before storing the data

This article describes a single MySql function to sanitize strings.
The provided options are:

  • replace multiple spaces to a single space
  • remove Unix and Windows linebreaks and replace them with a space
  • remove tabs and replace them for a single space
  • remove multiple backslashes
  • completely remove htmlentities
  • replace htmlentities with the corresponding character
  • remove all HTML tags

All code is available in a zip file.

Where do you stop verifying and cleaning your data?
The question of where your data verification and cleaning should be done is too application-specific to have one right answer.
Only relying on client-side scripting is definitely the wrong answer.
Verifying the data for SQL injection, valid numbers and dates and formats should definitely be done on the server side before it is send to your database.
Cleaning user input on the server side should ideally also be done before the data is sent to the database.
As a DBA, the question rises if your can really rely on it? And if so, what about cleaning migrated data or cleaning up legacy data from a predecessor?

You can use this MySql function to clean up strings before inserting the string in a table.
Because this is a single function that can be used in different scenarios it has a lot of parameters to customize the output.
Feel free to remove some functionality from this function to limit the number of parameters.
Alternatively you can use a wrapper function to call this function without parameters. e.g.:

DELIMITER //
CREATE FUNCTION mycleanup (s TEXT)
RETURNS TEXT DETERMINISTIC
RETURN str_clean(s , FALSE , TRUE , FALSE , TRUE , TRUE , FALSE , FALSE);
//
DELIMITER ;
SELECT mycleanup(@text);

Install the function in your database:
The functionality to html-UnEncode a string is done by calling another function. The credits for this function goes to Charlie Armor.
You can find it here: http://forums.mysql.com/read.php?98,246527,246527 and I have included it in the zip file because my code relies on it.
Unzip the file and on the mysql prompt:

mysql>source fn_str_html_unencode.sql;
mysql>source fn_str_clean.sql;

This assumes both files are in the same directory where you have started the mysql client.

Using the function:
The function is called str_clean() and has eight parameters:

p_text              i         string : the text that needs to be cleaned
p_remove_multiple_blanks      Boolean: replace multiple spaces to a single space
p_remove_linebreaks           Boolean: remove Unix and Windows linebreaks and replace them with a space
p_remove_tabs                 Boolean: remove tabs and replace them for a single space
p_remove_multiple_backslashes Boolean: remove multiple backslashes
p_remove_htmlentities         Boolean: completely remove htmlentities
p_html_unencode               Boolean: replace htmlentities with the corresponding character
p_remove_html                 Boolean: remove all HTML tags

This is also documented in the code.

Now let’s clean up a string:

mysql&gt;SET @text := ' This  &amp;text \\\\contains double spaces         and tabs as well as <a href="www.google.com">tags</a> and
linebreaks. ';

mysql&gt;SELECT @text as originalText
 , str_clean(@text , FALSE , FALSE , FALSE , FALSE , FALSE , FALSE , FALSE) as onlyTrim
 , str_clean(@text , FALSE , TRUE  , FALSE , FALSE , FALSE , FALSE , FALSE) as removeLineBreaks
 , str_clean(@text , FALSE , TRUE  , TRUE  , FALSE , FALSE , FALSE , FALSE) as removeTabs
 , str_clean(@text , TRUE  , TRUE  , TRUE  , FALSE , FALSE , FALSE , FALSE) as removeMultiBlanks
 , str_clean(@text , TRUE  , TRUE  , TRUE  , TRUE  , FALSE , FALSE , FALSE) as removeMultipleBackSlashes
 , str_clean(@text , TRUE  , TRUE  , TRUE  , TRUE  , TRUE  , FALSE , FALSE) as removeHtmlEntities
 , str_clean(@text , TRUE  , TRUE  , TRUE  , TRUE  , FALSE , TRUE  , FALSE) as htmlUnEncode
 , str_clean(@text , TRUE  , TRUE  , TRUE  , TRUE  , TRUE  , TRUE  , TRUE)  as removeHtml
\G
*************************** 1. row ***************************
             originalText:  This  &amp;text \\contains double spaces         and tabs   as well as <a href="www.google.com">tags</a> and
linebreaks.
                 onlyTrim: This  &amp;text \\contains double spaces         and tabs    as well as <a href="www.google.com">tags</a> and
linebreaks.
         removeLineBreaks: This  &amp;text \\contains double spaces         and tabs    as well as <a href="www.google.com">tags</a> and linebreaks.
               removeTabs: This  &amp;text \\contains double spaces         and tabs as well as <a href="www.google.com">tags</a> and linebreaks.
        removeMultiBlanks: This &amp;text \\contains double spaces and tabs as well as <a href="www.google.com">tags</a> and linebreaks.
removeMultipleBackSlashes: This &amp;text \contains double spaces and tabs as well as <a href="www.google.com">tags</a> and linebreaks.
       removeHtmlEntities: This text \contains double spaces and tabs as well as <a href="www.google.com">tags</a> and linebreaks.
             htmlUnEncode: This &amp;text \contains double spaces and tabs as well as <a href="www.google.com">tags</a> and linebreaks.
               removeHtml: This text \contains double spaces and tabs as well as tags and linebreaks.
1 row in set (0.00 sec)

Note: The html is messing up the ampersand on line with the htmlUnencode.

You can download the functions in this zip file: str_clean_2012-06-02.zip

What kind of data cleanup do you use in your database application?

Related posts

Tags: , , ,

3 Responses to “MySql: cleaning user input before storing the data”

  1. Hi Will,

    I think you should have a look at the MySql documentation on how to use the commandline to connect to MySql: http://dev.mysql.com/doc/refman/5.6/en/mysql.html

    In the example you connect to the mysql schema. This schema is for internal use of MySql only. I would recommend that you do not install user functions in this schema.
    First create a new schema/database like this:
    mysql>CREATE DATABASE `myDatabase`;
    And then change your login start using this schema instead of the mysql schema or change to this schema from the mysql commandline:
    mysql>use myDatabase;

    From this new schema you can install these functions. It is possible to run functions from another schema, but you have to set the access rights to these functions correct.
    You can find out what the warnings are about if you run the command:
    mysql>show warnings;

    /Ronald.

  2. Will says:

    Ive never done this before and cant find much help on doing it around the internet.

    I assume you connect to your mysql from the command line.. and attach to the database called “mysql” which has the “func” table? And then you can use this function from any other database within your mysql instance?

    This is what I did (below) and Im worried that there are warnings?

    Any help much appreciated.

    C:\>mysql -u MYROOTACCOUNT -p mysql
    Enter password: MYROOTPASSWORD

    mysql> source fn_str_html_unencode.sql;
    Query OK, 0 rows affected (0.00 sec)
    Query OK, 0 rows affected, 1 warning (0.00 sec)
    Query OK, 0 rows affected (0.02 sec)
    Query OK, 0 rows affected (0.00 sec)

    mysql> source fn_str_clean.sql;
    Query OK, 0 rows affected (0.00 sec)
    Query OK, 0 rows affected, 1 warning (0.00 sec)
    Query OK, 0 rows affected (0.00 sec)
    Query OK, 0 rows affected (0.00 sec)

  3. Jerry says:

    I was looking for cleaning blogs and found yours =). I guess everyone has to clean sometime…even programmers. Ah well it was a nice break from mother blogs and such. I never read a blog without leaving a comment so here is yours =)

    Jerry.

Leave a Reply