Howto generate meaningful test data using a MySQL function

You can use this MySQL function to generate names, (e-mail)addresses, phone numbers, urls, bit values, colors, IP address, etc..
As usual, the code is provided in a zipfile and the code is fully documented.

When creating random test data it is important that the data matches your domain values. Just a random string generated with a MD5 hash does not resemble real data and cannot be used for testing. There are some tools that you can buy to generate meaningful test data, but you can do this with a simple function in your database, free of charge ;-)

It is as simple as this:
MYSQL>SELECT str_random('[http|https]://www.c{4}c(15).[com|co.uk|org|net]') AS url;

How does it work?
This functionality makes use of masks, or a pattern, to generate the random values. During the generation process the masking characters will be replaced with a random character matching the mask values.
A mask is any of these characters: c, C, A, d, D, b, X, *
Besides the mask you can give directives on how to interpret the masks: nn times, random times, fixed text or an option list.

This is the full description of the masks and directives:

MASKS:
c returns lower-case character [a-z]
C returns upper-case character [A-Z]
A returns either upper or lower-case character [a-z A-Z]
d returns a digit [0-9]
D returns a digit without a zero [1-9]
b returns a bit [0-1]
X returns hexadecimal character [0-F]
* returns characters, decimals and special characters [a-z A-Z 0-9 !?-_@$#]

DIRECTIVES
“text” : text is taken literally
{nn} : repeat the last mask nn times
(nn) : repeat random, but max nn times
[item|item] : pick a random item from this list, items are separated by a pipe symbol
All other characters are taken literally

How to install?
In the zip file are two functions:
fn_str_random.sql and fn_str_random_character.sql
Install first fn_str_random_character.sql and then fn_str_random.sql because function str_random() is dependent on function str_random_character().

str_random() loops through the pattern,handles the directives and collects the whole string. We will use only str_random() in the following examples.
str_random_character() is a helper function for str_random(), this function generates a single random character based on the mask.

You can input a mask of 200 characters and the output is limited to 2000 characters. You can change these values if you feel the need for it.

You can call the function in any INSERT INTO statement like:
INSERT INTO t1(c1) VALUES(str_random('add your mask here'));
Adding it in a loop in a stored procedure or execute the SQL statement in a loop in your (PHP) application will generate lot’s of test data for your application.

Examples:
These examples only show the functionality of the function, not inserting the generated values in a table.

The first example is a simple one, we want to generate a Dutch zip code like “1234 AB”.
We can do this is two ways:

MYSQL>SELECT str_random('dddd CC') AS DutchZip1,  str_random('d{4} C{2}') AS DutchZip2;
+-----------+-----------+
| DutchZip1 | DutchZip2 |
+-----------+-----------+
| 8868 GS   | 8911 DH   |
+-----------+-----------+
1 row in set (0.00 sec)

d is a digit[0-9] and C is an upper-case character[A-Z]
In the second we used a directive to repeat the digit exactly four times and the upper-case character exactly two times.
The space in between the masks is taken literally because it is not a known mask or a directive.

Now what if we have the need for not an exact amount of characters, for example we want to generate a password of at least 6 characters?

MYSQL>SELECT str_random('*{6}*(5)') AS pass;
+-----------+
| pass      |
+-----------+
| 0rTM?rY6P |
+-----------+
1 row in set (0.00 sec)

The star * returns all alpha numeric, numeric and some special characters.
We want at least 6 characters so we use *{6} and a random number on top of that.
For a random number we use (nn)

Now what if we have the need of a * in our test data? The * is a known mask so we have to tell the function to take it literally.
We can do that by surrounding it in double quotes like this:

MYSQL>SELECT str_random('"***"d{2}"***"') AS test;
+----------+
| test     |
+----------+
| ***95*** |
+----------+
1 row in set (0.00 sec)

Another useful feature is the possibility to add a list of possible values. This might be useful for example when generating an url.
You do not want the top domain name completely random but a value like:”.com” or “.org”

MYSQL>SELECT str_random('[http|https]://www.c{4}c(15).[com|co.uk|org|net]') AS url;
+----------------------------+
| url                        |
+----------------------------+
| http://www.rdokmknmt.co.uk |
+----------------------------+
1 row in set (0.00 sec)

The list is surrounded by square brackets and the items in the list are separated by the pipe symbol.

Here are some other useful patterns you can use:

MYSQL>SELECT str_random('c{3}c(5)[.|_]c{8}c(8)@[google|yahoo|live|mail]".com"') AS email1,
       str_random('c{3}c(5)[.|_]c{8}c(8)@c{4}c(3).[com|de|co.uk|fr|org|net]') AS email2;
+-----------------------------+--------------------------------+
| email1                      | email2                         |
+-----------------------------+--------------------------------+
| jfvqgj.adngpgnuir@yahoo.com | uvucb_uxcxcwzjhwidsch@fdars.fr |
+-----------------------------+--------------------------------+
1 row in set (0.00 sec)

MYSQL>SELECT str_random('#X{6}') AS htmlcolor, str_random('"("d{3},d{3},d{3}")"') AS RGBcolor ;
+-----------+---------------+
| htmlcolor | RGBcolor      |
+-----------+---------------+
| #7E4A88   | (055,246,985) |
+-----------+---------------+
1 row in set (0.00 sec)

MYSQL>SELECT str_random('Cc{3}c(4)') AS firstname ,str_random('Cc{5}c(6)') AS lastname
      ,str_random('Cc{5}[street|lane|road|park] d{1}d(2)') AS adres
      ,str_random('Cc{5}c(6)') AS city ,str_random('d{5}-d{4}') AS zip
      ,str_random('[0011|+1|+61|007] d{8}') AS phone
      ,str_random('[m|f]') AS sex;
+-----------+-----------+----------------+--------+------------+--------------+------+
| firstname | lastname  | adres          | city   | zip        | phone        | sex  |
+-----------+-----------+----------------+--------+------------+--------------+------+
| Svczwp    | Jazytysax | Lpobqastreet 9 | Kdmbwd | 09572-9971 | +61 23223828 | m    |
+-----------+-----------+----------------+--------+------------+--------------+------+
1 row in set (0.00 sec)

MYSQL>SELECT str_random('b') AS flag
      ,str_random('b{8}') AS Byte
      ,str_random('X{1}X(8)') AS Hex
      ,str_random('d{1}d(10) "kb"') AS filesize
      ,str_random('d{1}d(3).d{1}d(3).d{1}d(3).d{1}d(3)') AS IPaddress;
+------+----------+------+------------+---------------+
| flag | Byte     | Hex  | filesize   | IPaddress     |
+------+----------+------+------------+---------------+
| 1    | 01101111 | F1B8 | 5416832 kb | 20.37.702.821 |
+------+----------+------+------------+---------------+
1 row in set (0.00 sec)

You can download the code in the zip file here: str_random_2012-06-21.zip.
Please let me know if you like it, if you find any bugs or you have a request for added functionality or even if you have added new functionality yourself.

Related posts

Tags: , , , , ,

2 Responses to “Howto generate meaningful test data using a MySQL function”

  1. TooMeeK says:

    Hello, just want let You know that this post was useful:)
    Look @ http://toomeek.waw.pl/blog/?p=2677
    MemSQL doesn’t support procedures/functions used in these scripts (maybe it will some day..)

  2. Boyd says:

    Thanks a lot for publishing these functions! Very useful!

Leave a Reply