Posts Tagged ‘random’

Create random test data using real data from your MySql database.

Wednesday, November 28th, 2012

Mix and match random data with real data from your database.

Recently I showed a function that could create test data based on a mask or pattern.
See: http://moinne.com/blog/ronald/mysql/howto-generate-meaningful-test-data-using-a-mysql-function
This works fine for phone numbers, zip-codes etc. but it is not very useful for things like names of persons, products you sell, countries- and cities, etc. etc.
However it is very simple to mix the output of the str_random() function with data that is already available in your database.

Consider this example, we want to generate a random e-mail address only using the str_random() function:
MYSQL>SELECT str_random('c{3}c(5)[.|_]c{8}c(8)@[google|yahoo|live|mail]".com"') AS email;

This function can give results like this:
edbsotd.afdyittnge@google.com
smhwnzi_pwodzpzbwef@live.com
hpgz_gvkmgtbyitups@yahoo.com

We have generated valid email addresses, but they do not look like real email addresses.
We could do better if we could use real names in the test data. The following example gets this data from the table “persons”.
I ask only for a random female name, a random last name and optionally we add a number after the last name:

MYSQL>SELECT CONCAT((SELECT firstname FROM persons WHERE gender = 'f' AND RAND()<(SELECT ((1/COUNT(*))*10) FROM persons) LIMIT 1)
             ,str_random('[.|-|_]')
             ,(SELECT lastname FROM persons WHERE RAND()<(SELECT ((1/COUNT(*))*10) FROM persons) LIMIT 1)
             ,str_random('d(4)@[google|yahoo|live|mail]".com"')
            ) AS email;

The results are like this:
Emily_Kendall55@google.com
Angela-Lim86@mail.com
Melanie.Kirby@mail.com

This is much better, as we can use real names combined with different formatting and different mail providers.
I have added the "persons" table in the zip file for your convenience.

Fortunately, the "persons" table is small, only a thousand rows. I do not recommend selecting random rows from very large tables, this can be very slow.
If you need to select random rows for your test data from a very large table, check out this link: http://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/
This will show different techniques to do this as efficient as possible.

In the next example we create some random user information: firstname, lastname, birthdate, email, city, address and country.
- The firstname and lastname should be reflected in the emailadres.
- The city name should belong in the country.

The country information is from geonames.org. You can download the information here: http://download.geonames.org/export/dump/countryInfo.txt
I have used a simplified version and have added it in the zip file as a table.

I have used the city information from geonames.org also. You can download this text file here: http://download.geonames.org/export/dump/cities1000.zip
I have added a new auto increment column so it is more efficient to select random rows from it.
You can create a table and import the data like this:

CREATE TABLE `cities` (
  `auto_id` int(11) unsigned AUTO_INCREMENT NOT NULL,
  `id` int(11) unsigned NOT NULL,
  `name` varchar(200) NOT NULL DEFAULT '',
  `ascii_name` varchar(200) NOT NULL DEFAULT '',
  `alternate_names` varchar(2000) NOT NULL DEFAULT '',
  `lat` double(11,7) NOT NULL DEFAULT '0.0000000',
  `lon` double(11,7) NOT NULL DEFAULT '0.0000000',
  `feature_class` char(1) DEFAULT NULL,
  `feature_code` varchar(10) DEFAULT NULL,
  `country_code` char(2) DEFAULT NULL,
  `country_code2` varchar(60) DEFAULT NULL,
  `admin1_code` varchar(20) DEFAULT '',
  `admin2_code` varchar(80) DEFAULT '',
  `admin3_code` varchar(20) DEFAULT '',
  `admin4_code` varchar(20) DEFAULT '',
  `population` bigint(11) DEFAULT '0',
  `elevation` int(11) DEFAULT '0',
  `dem` int(11) DEFAULT '0',
  `timezone` varchar(40) DEFAULT NULL,
  `mod_date` date DEFAULT '0000-00-00',
  PRIMARY KEY (`auto_id`),
  KEY `country_code_name_idx` (`country_code`,`name`),
  KEY `fk_cities_countries` (`country_code`),
  CONSTRAINT `fk_cities_countries` FOREIGN KEY (`country_code`) REFERENCES `countries` (`iso_3166_2`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

LOAD DATA LOCAL INFILE '/path_to_file/cities1000.txt' REPLACE INTO TABLE cities FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' (id,name,ascii_name,alternate_names,lat,lon,feature_class,feature_code,country_code,country_code2, admin1_code,admin2_code,admin3_code,admin4_code,population,elevation,dem,timezone,mod_date) ;
ANALYZE TABLE cities;

Let's create a table for the test data:

CREATE TABLE IF NOT EXISTS testdata (
  `firstname`        VARCHAR(200) NOT NULL,
  `lastname`         VARCHAR(200) NOT NULL,
  `email`            VARCHAR(200) NOT NULL,
  `birthday`         DATE NOT NULL,
  `city`             VARCHAR(200) NOT NULL,
  `address`          VARCHAR(200) NOT NULL,
  `country`          VARCHAR(200) NOT NULL
);

And insert a row:

SELECT firstname INTO @firstname FROM persons WHERE RAND()<(SELECT ((1/COUNT(*))*10) FROM persons) ORDER BY RAND() LIMIT 1;
SELECT lastname INTO @lastname FROM persons WHERE RAND()select * from testdata;
SELECT DISTINCT co.iso_3166_2 INTO @countrycode FROM countries co INNER JOIN cities ci ON (co.iso_3166_2 = ci.country_code) ORDER BY RAND() LIMIT 1;
SELECT `name` INTO @countryname FROM countries WHERE iso_3166_2 = @countrycode;
SELECT `ascii_name` INTO @city FROM (SELECT `ascii_name` FROM cities WHERE country_code = @countrycode ORDER BY RAND()) res LIMIT 1;
INSERT INTO testdata(firstname,lastname,email,birthday,city,address,country)
VALUES(@firstname
      ,@lastname
      ,CONCAT(@firstname,str_random('[.|-|_]'),@lastname,str_random('d(4)@[google|yahoo|live|mail]".com"'))
      ,str_random_date('1920-01-01','1999-12-31','%Y-%m-%d')
      ,@city
      ,str_random('Cc{5}[street|lane|road|park] d{1}d(2)')
      ,@countryname
      );

After running this code a couple of times the table is filled like this:

MYSQL>select * from testdata;
+------------+-----------+------------------------------+------------+---------------------+-----------------+-----------------------------+
| firstname  | lastname  | email                        | birthday   | city                | address         | country                     |
+------------+-----------+------------------------------+------------+---------------------+-----------------+-----------------------------+
| Phyllis    | Lin       | Phyllis_Lin@mail.com         | 1940-09-04 | Ad Dakhla           | Qittopstreet 1  | Western Sahara              |
| Carmen     | Robinson  | Carmen-Robinson879@live.com  | 1985-04-14 | Meedhoo             | Vtgybmstreet 41 | Maldives                    |
| Danielle   | Sutton    | Danielle_Sutton9@live.com    | 1989-03-15 | Schellenberg        | Dbvyisroad 0    | Liechtenstein               |
| Jeremy     | Sellers   | Jeremy-Sellers58@live.com    | 1951-07-08 | Lucea               | Geaoswstreet 3  | Jamaica                     |
| Lisa       | MacDonald | Lisa.MacDonald@google.com    | 1929-09-05 | Marigot             | Fqpaippark 10   | Saint Martin                |
| Courtney   | Newton    | Courtney.Newton5@yahoo.com   | 1943-01-17 | La Condamine        | Irmkoplane 6    | Monaco                      |
| Rebecca    | Mathews   | Rebecca_Mathews037@live.com  | 1994-09-01 | Las Acequias        | Ucdjmhpark 0    | Argentina                   |
| Marshall   | Kearney   | Marshall_Kearney3@mail.com   | 1921-01-29 | Kirakira            | Qbmfrsroad 6    | Solomon Islands             |
| Christian  | Pollock   | Christian-Pollock65@live.com | 1953-03-06 | Laayoune / El Aaiun | Hctktmstreet 66 | Western Sahara              |
| Jamie      | Rosenberg | Jamie-Rosenberg8@google.com  | 1952-03-06 | Hadrut              | Nidsbfpark 03   | Azerbaijan                  |
| Jose       | Morton    | Jose-Morton34@yahoo.com      | 1987-06-21 | Ilorin              | Vrvcawstreet 23 | Nigeria                     |
| Marguerite | Melton    | Marguerite.Melton@mail.com   | 1948-04-17 | Port-aux-Francais   | Iqfdbylane 48   | French Southern Territories |
| Debbie     | Langston  | Debbie.Langston414@mail.com  | 1974-02-12 | Ugurchin            | Zyroszpark 9    | Bulgaria                    |
+------------+-----------+------------------------------+------------+---------------------+-----------------+-----------------------------+
13 rows in set (0.00 sec)

The birthday is generated with a small function that generates a random date between a date range. You can format the output with the third parameter.
E.g.:

MYSQL>SELECT str_random_date('1920-03-12','2012-01-31','%Y-%m-%d') AS RandomMysqlDate
     , str_random_date('1920-03-12','2012-01-31','%Y-%m-%d:%T') AS RandomDateWithTime
     , str_random_date('2012-01-01','2012-12-31','%u') AS RandomWeeknumberIn2012
     , str_random_date('2012-02-01','2012-03-01','%e') AS RandomDayInFebruary
\G
*************************** 1. row ***************************
       RandomMysqlDate: 1992-09-17
    RandomDateWithTime: 1943-03-10:14:28:02
RandomWeeknumberIn2012: 46
   RandomDayInFebruary: 20
1 row in set (0.00 sec)

You can download the persons and countries tables in this zip file: countries_persons_tables_2012-11-28.zip
You can download the str_random_date function in this zip file: str_random_date_2012-07-29.zip
You can download the functions for generating random data and see examples of generating useful test data here: http://moinne.com/blog/ronald/mysql/howto-generate-meaningful-test-data-using-a-mysql-function

Have fun!

MySql Lorum Ipsum generator

Saturday, July 7th, 2012

A short MySql function to generate a Lorum Ipsum text.
You can download the code in the zip file below.

The function has three parameters:
p_max_words Number: the maximum amount of words, if no min_words are provided this will be the exact amount of words in the result. Default = 50
p_min_words Number: the minimum amount of words in the result, By providing the parameter, you provide a range. Default = 0
p_start_with_lipsum Boolean: if “1″ the string will start with ‘Lorum ipsum dolor sit amet.’, Default = 0

Create exactly 3 words:

mysql>SELECT str_random_lipsum(3,NULL,NULL);
+--------------------------------+
| str_random_lipsum(3,NULL,NULL) |
+--------------------------------+
| Class facilisi dictum.         |
+--------------------------------+
1 row in set (0.00 sec)

Create exactly 20 words and let it start with “Lorum ipsum dolor sit amet.”

mysql>SELECT str_random_lipsum(20,NULL,1);
+----------------------------------------------------------------------------------------------------------------------------------------+
| str_random_lipsum(20,NULL,1)                                                                                                           |
+----------------------------------------------------------------------------------------------------------------------------------------+
| Lorem ipsum dolor sit amet. Penatibus eleifend cum amet sagittis sem per vestibulum sociosqu. Sem etiam vehicula urna scelerisque dui. |
+----------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Create max 30 words, but minimal 10, let it start with “Lorum ipsum dolor sit amet.”

mysql>SELECT str_random_lipsum(30,10,1);
+----------------------------------------------------------------------------------------------------+
| str_random_lipsum(30,10,1)                                                                         |
+----------------------------------------------------------------------------------------------------+
| Lorem ipsum dolor sit amet. Senectus erat sed est urna sollicitudin litora pulvinar felis. Varius. |
+----------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

You can download the function in this zip file: str_random_lipsum_2012-07-07.zip

Howto generate meaningful test data using a MySQL function

Thursday, June 21st, 2012

You can use this MySQL function to generate names, (e-mail)addresses, phone numbers, urls, bit values, colors, IP address, etc..
As usual, the code is provided in a zipfile and the code is fully documented.

When creating random test data it is important that the data matches your domain values. Just a random string generated with a MD5 hash does not resemble real data and cannot be used for testing. There are some tools that you can buy to generate meaningful test data, but you can do this with a simple function in your database, free of charge ;-)

It is as simple as this:
MYSQL>SELECT str_random('[http|https]://www.c{4}c(15).[com|co.uk|org|net]') AS url;

How does it work?
This functionality makes use of masks, or a pattern, to generate the random values. During the generation process the masking characters will be replaced with a random character matching the mask values.
A mask is any of these characters: c, C, A, d, D, b, X, *
Besides the mask you can give directives on how to interpret the masks: nn times, random times, fixed text or an option list.

This is the full description of the masks and directives:

MASKS:
c returns lower-case character [a-z]
C returns upper-case character [A-Z]
A returns either upper or lower-case character [a-z A-Z]
d returns a digit [0-9]
D returns a digit without a zero [1-9]
b returns a bit [0-1]
X returns hexadecimal character [0-F]
* returns characters, decimals and special characters [a-z A-Z 0-9 !?-_@$#]

DIRECTIVES
“text” : text is taken literally
{nn} : repeat the last mask nn times
(nn) : repeat random, but max nn times
[item|item] : pick a random item from this list, items are separated by a pipe symbol
All other characters are taken literally

How to install?
In the zip file are two functions:
fn_str_random.sql and fn_str_random_character.sql
Install first fn_str_random_character.sql and then fn_str_random.sql because function str_random() is dependent on function str_random_character().

str_random() loops through the pattern,handles the directives and collects the whole string. We will use only str_random() in the following examples.
str_random_character() is a helper function for str_random(), this function generates a single random character based on the mask.

You can input a mask of 200 characters and the output is limited to 2000 characters. You can change these values if you feel the need for it.

You can call the function in any INSERT INTO statement like:
INSERT INTO t1(c1) VALUES(str_random('add your mask here'));
Adding it in a loop in a stored procedure or execute the SQL statement in a loop in your (PHP) application will generate lot’s of test data for your application.

Examples:
These examples only show the functionality of the function, not inserting the generated values in a table.

The first example is a simple one, we want to generate a Dutch zip code like “1234 AB”.
We can do this is two ways:

MYSQL>SELECT str_random('dddd CC') AS DutchZip1,  str_random('d{4} C{2}') AS DutchZip2;
+-----------+-----------+
| DutchZip1 | DutchZip2 |
+-----------+-----------+
| 8868 GS   | 8911 DH   |
+-----------+-----------+
1 row in set (0.00 sec)

d is a digit[0-9] and C is an upper-case character[A-Z]
In the second we used a directive to repeat the digit exactly four times and the upper-case character exactly two times.
The space in between the masks is taken literally because it is not a known mask or a directive.

Now what if we have the need for not an exact amount of characters, for example we want to generate a password of at least 6 characters?

MYSQL>SELECT str_random('*{6}*(5)') AS pass;
+-----------+
| pass      |
+-----------+
| 0rTM?rY6P |
+-----------+
1 row in set (0.00 sec)

The star * returns all alpha numeric, numeric and some special characters.
We want at least 6 characters so we use *{6} and a random number on top of that.
For a random number we use (nn)

Now what if we have the need of a * in our test data? The * is a known mask so we have to tell the function to take it literally.
We can do that by surrounding it in double quotes like this:

MYSQL>SELECT str_random('"***"d{2}"***"') AS test;
+----------+
| test     |
+----------+
| ***95*** |
+----------+
1 row in set (0.00 sec)

Another useful feature is the possibility to add a list of possible values. This might be useful for example when generating an url.
You do not want the top domain name completely random but a value like:”.com” or “.org”

MYSQL>SELECT str_random('[http|https]://www.c{4}c(15).[com|co.uk|org|net]') AS url;
+----------------------------+
| url                        |
+----------------------------+
| http://www.rdokmknmt.co.uk |
+----------------------------+
1 row in set (0.00 sec)

The list is surrounded by square brackets and the items in the list are separated by the pipe symbol.

Here are some other useful patterns you can use:

MYSQL>SELECT str_random('c{3}c(5)[.|_]c{8}c(8)@[google|yahoo|live|mail]".com"') AS email1,
       str_random('c{3}c(5)[.|_]c{8}c(8)@c{4}c(3).[com|de|co.uk|fr|org|net]') AS email2;
+-----------------------------+--------------------------------+
| email1                      | email2                         |
+-----------------------------+--------------------------------+
| jfvqgj.adngpgnuir@yahoo.com | uvucb_uxcxcwzjhwidsch@fdars.fr |
+-----------------------------+--------------------------------+
1 row in set (0.00 sec)

MYSQL>SELECT str_random('#X{6}') AS htmlcolor, str_random('"("d{3},d{3},d{3}")"') AS RGBcolor ;
+-----------+---------------+
| htmlcolor | RGBcolor      |
+-----------+---------------+
| #7E4A88   | (055,246,985) |
+-----------+---------------+
1 row in set (0.00 sec)

MYSQL>SELECT str_random('Cc{3}c(4)') AS firstname ,str_random('Cc{5}c(6)') AS lastname
      ,str_random('Cc{5}[street|lane|road|park] d{1}d(2)') AS adres
      ,str_random('Cc{5}c(6)') AS city ,str_random('d{5}-d{4}') AS zip
      ,str_random('[0011|+1|+61|007] d{8}') AS phone
      ,str_random('[m|f]') AS sex;
+-----------+-----------+----------------+--------+------------+--------------+------+
| firstname | lastname  | adres          | city   | zip        | phone        | sex  |
+-----------+-----------+----------------+--------+------------+--------------+------+
| Svczwp    | Jazytysax | Lpobqastreet 9 | Kdmbwd | 09572-9971 | +61 23223828 | m    |
+-----------+-----------+----------------+--------+------------+--------------+------+
1 row in set (0.00 sec)

MYSQL>SELECT str_random('b') AS flag
      ,str_random('b{8}') AS Byte
      ,str_random('X{1}X(8)') AS Hex
      ,str_random('d{1}d(10) "kb"') AS filesize
      ,str_random('d{1}d(3).d{1}d(3).d{1}d(3).d{1}d(3)') AS IPaddress;
+------+----------+------+------------+---------------+
| flag | Byte     | Hex  | filesize   | IPaddress     |
+------+----------+------+------------+---------------+
| 1    | 01101111 | F1B8 | 5416832 kb | 20.37.702.821 |
+------+----------+------+------------+---------------+
1 row in set (0.00 sec)

You can download the code in the zip file here: str_random_2012-06-21.zip.
Please let me know if you like it, if you find any bugs or you have a request for added functionality or even if you have added new functionality yourself.