This article describes a single MySql function to sanitize strings.
The provided options are:
- replace multiple spaces to a single space
- remove Unix and Windows linebreaks and replace them with a space
- remove tabs and replace them for a single space
- remove multiple backslashes
- completely remove htmlentities
- replace htmlentities with the corresponding character
- remove all HTML tags
All code is available in a zip file.
Where do you stop verifying and cleaning your data?
The question of where your data verification and cleaning should be done is too application-specific to have one right answer.
Only relying on client-side scripting is definitely the wrong answer.
Verifying the data for SQL injection, valid numbers and dates and formats should definitely be done on the server side before it is send to your database.
Cleaning user input on the server side should ideally also be done before the data is sent to the database.
As a DBA, the question rises if your can really rely on it? And if so, what about cleaning migrated data or cleaning up legacy data from a predecessor?
You can use this MySql function to clean up strings before inserting the string in a table.
Because this is a single function that can be used in different scenarios it has a lot of parameters to customize the output.
Feel free to remove some functionality from this function to limit the number of parameters.
Alternatively you can use a wrapper function to call this function without parameters. e.g.:
CREATE FUNCTION mycleanup (s TEXT)
RETURNS TEXT DETERMINISTIC
RETURN str_clean(s , FALSE , TRUE , FALSE , TRUE , TRUE , FALSE , FALSE);
//
DELIMITER ;
SELECT mycleanup(@text);
Install the function in your database:
The functionality to html-UnEncode a string is done by calling another function. The credits for this function goes to Charlie Armor.
You can find it here: http://forums.mysql.com/read.php?98,246527,246527 and I have included it in the zip file because my code relies on it.
Unzip the file and on the mysql prompt:
mysql>source fn_str_clean.sql;
This assumes both files are in the same directory where you have started the mysql client.
Using the function:
The function is called str_clean() and has eight parameters:
p_text i string : the text that needs to be cleaned p_remove_multiple_blanks Boolean: replace multiple spaces to a single space p_remove_linebreaks Boolean: remove Unix and Windows linebreaks and replace them with a space p_remove_tabs Boolean: remove tabs and replace them for a single space p_remove_multiple_backslashes Boolean: remove multiple backslashes p_remove_htmlentities Boolean: completely remove htmlentities p_html_unencode Boolean: replace htmlentities with the corresponding character p_remove_html Boolean: remove all HTML tags
This is also documented in the code.
Now let’s clean up a string:
linebreaks. ';
mysql>SELECT @text as originalText
, str_clean(@text , FALSE , FALSE , FALSE , FALSE , FALSE , FALSE , FALSE) as onlyTrim
, str_clean(@text , FALSE , TRUE , FALSE , FALSE , FALSE , FALSE , FALSE) as removeLineBreaks
, str_clean(@text , FALSE , TRUE , TRUE , FALSE , FALSE , FALSE , FALSE) as removeTabs
, str_clean(@text , TRUE , TRUE , TRUE , FALSE , FALSE , FALSE , FALSE) as removeMultiBlanks
, str_clean(@text , TRUE , TRUE , TRUE , TRUE , FALSE , FALSE , FALSE) as removeMultipleBackSlashes
, str_clean(@text , TRUE , TRUE , TRUE , TRUE , TRUE , FALSE , FALSE) as removeHtmlEntities
, str_clean(@text , TRUE , TRUE , TRUE , TRUE , FALSE , TRUE , FALSE) as htmlUnEncode
, str_clean(@text , TRUE , TRUE , TRUE , TRUE , TRUE , TRUE , TRUE) as removeHtml
\G
*************************** 1. row ***************************
originalText: This &text \\contains double spaces and tabs as well as <a href="www.google.com">tags</a> and
linebreaks.
onlyTrim: This &text \\contains double spaces and tabs as well as <a href="www.google.com">tags</a> and
linebreaks.
removeLineBreaks: This &text \\contains double spaces and tabs as well as <a href="www.google.com">tags</a> and linebreaks.
removeTabs: This &text \\contains double spaces and tabs as well as <a href="www.google.com">tags</a> and linebreaks.
removeMultiBlanks: This &text \\contains double spaces and tabs as well as <a href="www.google.com">tags</a> and linebreaks.
removeMultipleBackSlashes: This &text \contains double spaces and tabs as well as <a href="www.google.com">tags</a> and linebreaks.
removeHtmlEntities: This text \contains double spaces and tabs as well as <a href="www.google.com">tags</a> and linebreaks.
htmlUnEncode: This &text \contains double spaces and tabs as well as <a href="www.google.com">tags</a> and linebreaks.
removeHtml: This text \contains double spaces and tabs as well as tags and linebreaks.
1 row in set (0.00 sec)
Note: The html is messing up the ampersand on line with the htmlUnencode.
You can download the functions in this zip file: str_clean_2012-06-02.zip
What kind of data cleanup do you use in your database application?
Related posts
Tags: cleanup data, download code, functions, MySQL
Hi Will,
I think you should have a look at the MySql documentation on how to use the commandline to connect to MySql: http://dev.mysql.com/doc/refman/5.6/en/mysql.html
In the example you connect to the mysql schema. This schema is for internal use of MySql only. I would recommend that you do not install user functions in this schema.
First create a new schema/database like this:
mysql>CREATE DATABASE `myDatabase`;
And then change your login start using this schema instead of the mysql schema or change to this schema from the mysql commandline:
mysql>use myDatabase;
From this new schema you can install these functions. It is possible to run functions from another schema, but you have to set the access rights to these functions correct.
You can find out what the warnings are about if you run the command:
mysql>show warnings;
/Ronald.
Ive never done this before and cant find much help on doing it around the internet.
I assume you connect to your mysql from the command line.. and attach to the database called “mysql” which has the “func” table? And then you can use this function from any other database within your mysql instance?
This is what I did (below) and Im worried that there are warnings?
Any help much appreciated.
C:\>mysql -u MYROOTACCOUNT -p mysql
Enter password: MYROOTPASSWORD
mysql> source fn_str_html_unencode.sql;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.02 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> source fn_str_clean.sql;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected, 1 warning (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
I was looking for cleaning blogs and found yours =). I guess everyone has to clean sometime…even programmers. Ah well it was a nice break from mother blogs and such. I never read a blog without leaving a comment so here is yours =)
Jerry.