Create random test data using real data from your MySql database.

Mix and match random data with real data from your database.

Recently I showed a function that could create test data based on a mask or pattern.
See: http://moinne.com/blog/ronald/mysql/howto-generate-meaningful-test-data-using-a-mysql-function
This works fine for phone numbers, zip-codes etc. but it is not very useful for things like names of persons, products you sell, countries- and cities, etc. etc.
However it is very simple to mix the output of the str_random() function with data that is already available in your database.

Consider this example, we want to generate a random e-mail address only using the str_random() function:
MYSQL>SELECT str_random('c{3}c(5)[.|_]c{8}c(8)@[google|yahoo|live|mail]".com"') AS email;

This function can give results like this:
edbsotd.afdyittnge@google.com
smhwnzi_pwodzpzbwef@live.com
hpgz_gvkmgtbyitups@yahoo.com

We have generated valid email addresses, but they do not look like real email addresses.
We could do better if we could use real names in the test data. The following example gets this data from the table “persons”.
I ask only for a random female name, a random last name and optionally we add a number after the last name:

MYSQL>SELECT CONCAT((SELECT firstname FROM persons WHERE gender = 'f' AND RAND()<(SELECT ((1/COUNT(*))*10) FROM persons) LIMIT 1)
             ,str_random('[.|-|_]')
             ,(SELECT lastname FROM persons WHERE RAND()<(SELECT ((1/COUNT(*))*10) FROM persons) LIMIT 1)
             ,str_random('d(4)@[google|yahoo|live|mail]".com"')
            ) AS email;

The results are like this:
Emily_Kendall55@google.com
Angela-Lim86@mail.com
Melanie.Kirby@mail.com

This is much better, as we can use real names combined with different formatting and different mail providers.
I have added the "persons" table in the zip file for your convenience.

Fortunately, the "persons" table is small, only a thousand rows. I do not recommend selecting random rows from very large tables, this can be very slow.
If you need to select random rows for your test data from a very large table, check out this link: http://www.warpconduit.net/2011/03/23/selecting-a-random-record-using-mysql-benchmark-results/
This will show different techniques to do this as efficient as possible.

In the next example we create some random user information: firstname, lastname, birthdate, email, city, address and country.
- The firstname and lastname should be reflected in the emailadres.
- The city name should belong in the country.

The country information is from geonames.org. You can download the information here: http://download.geonames.org/export/dump/countryInfo.txt
I have used a simplified version and have added it in the zip file as a table.

I have used the city information from geonames.org also. You can download this text file here: http://download.geonames.org/export/dump/cities1000.zip
I have added a new auto increment column so it is more efficient to select random rows from it.
You can create a table and import the data like this:

CREATE TABLE `cities` (
  `auto_id` int(11) unsigned AUTO_INCREMENT NOT NULL,
  `id` int(11) unsigned NOT NULL,
  `name` varchar(200) NOT NULL DEFAULT '',
  `ascii_name` varchar(200) NOT NULL DEFAULT '',
  `alternate_names` varchar(2000) NOT NULL DEFAULT '',
  `lat` double(11,7) NOT NULL DEFAULT '0.0000000',
  `lon` double(11,7) NOT NULL DEFAULT '0.0000000',
  `feature_class` char(1) DEFAULT NULL,
  `feature_code` varchar(10) DEFAULT NULL,
  `country_code` char(2) DEFAULT NULL,
  `country_code2` varchar(60) DEFAULT NULL,
  `admin1_code` varchar(20) DEFAULT '',
  `admin2_code` varchar(80) DEFAULT '',
  `admin3_code` varchar(20) DEFAULT '',
  `admin4_code` varchar(20) DEFAULT '',
  `population` bigint(11) DEFAULT '0',
  `elevation` int(11) DEFAULT '0',
  `dem` int(11) DEFAULT '0',
  `timezone` varchar(40) DEFAULT NULL,
  `mod_date` date DEFAULT '0000-00-00',
  PRIMARY KEY (`auto_id`),
  KEY `country_code_name_idx` (`country_code`,`name`),
  KEY `fk_cities_countries` (`country_code`),
  CONSTRAINT `fk_cities_countries` FOREIGN KEY (`country_code`) REFERENCES `countries` (`iso_3166_2`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

LOAD DATA LOCAL INFILE '/path_to_file/cities1000.txt' REPLACE INTO TABLE cities FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' (id,name,ascii_name,alternate_names,lat,lon,feature_class,feature_code,country_code,country_code2, admin1_code,admin2_code,admin3_code,admin4_code,population,elevation,dem,timezone,mod_date) ;
ANALYZE TABLE cities;

Let's create a table for the test data:

CREATE TABLE IF NOT EXISTS testdata (
  `firstname`        VARCHAR(200) NOT NULL,
  `lastname`         VARCHAR(200) NOT NULL,
  `email`            VARCHAR(200) NOT NULL,
  `birthday`         DATE NOT NULL,
  `city`             VARCHAR(200) NOT NULL,
  `address`          VARCHAR(200) NOT NULL,
  `country`          VARCHAR(200) NOT NULL
);

And insert a row:

SELECT firstname INTO @firstname FROM persons WHERE RAND()<(SELECT ((1/COUNT(*))*10) FROM persons) ORDER BY RAND() LIMIT 1;
SELECT lastname INTO @lastname FROM persons WHERE RAND()select * from testdata;
SELECT DISTINCT co.iso_3166_2 INTO @countrycode FROM countries co INNER JOIN cities ci ON (co.iso_3166_2 = ci.country_code) ORDER BY RAND() LIMIT 1;
SELECT `name` INTO @countryname FROM countries WHERE iso_3166_2 = @countrycode;
SELECT `ascii_name` INTO @city FROM (SELECT `ascii_name` FROM cities WHERE country_code = @countrycode ORDER BY RAND()) res LIMIT 1;
INSERT INTO testdata(firstname,lastname,email,birthday,city,address,country)
VALUES(@firstname
      ,@lastname
      ,CONCAT(@firstname,str_random('[.|-|_]'),@lastname,str_random('d(4)@[google|yahoo|live|mail]".com"'))
      ,str_random_date('1920-01-01','1999-12-31','%Y-%m-%d')
      ,@city
      ,str_random('Cc{5}[street|lane|road|park] d{1}d(2)')
      ,@countryname
      );

After running this code a couple of times the table is filled like this:

MYSQL>select * from testdata;
+------------+-----------+------------------------------+------------+---------------------+-----------------+-----------------------------+
| firstname  | lastname  | email                        | birthday   | city                | address         | country                     |
+------------+-----------+------------------------------+------------+---------------------+-----------------+-----------------------------+
| Phyllis    | Lin       | Phyllis_Lin@mail.com         | 1940-09-04 | Ad Dakhla           | Qittopstreet 1  | Western Sahara              |
| Carmen     | Robinson  | Carmen-Robinson879@live.com  | 1985-04-14 | Meedhoo             | Vtgybmstreet 41 | Maldives                    |
| Danielle   | Sutton    | Danielle_Sutton9@live.com    | 1989-03-15 | Schellenberg        | Dbvyisroad 0    | Liechtenstein               |
| Jeremy     | Sellers   | Jeremy-Sellers58@live.com    | 1951-07-08 | Lucea               | Geaoswstreet 3  | Jamaica                     |
| Lisa       | MacDonald | Lisa.MacDonald@google.com    | 1929-09-05 | Marigot             | Fqpaippark 10   | Saint Martin                |
| Courtney   | Newton    | Courtney.Newton5@yahoo.com   | 1943-01-17 | La Condamine        | Irmkoplane 6    | Monaco                      |
| Rebecca    | Mathews   | Rebecca_Mathews037@live.com  | 1994-09-01 | Las Acequias        | Ucdjmhpark 0    | Argentina                   |
| Marshall   | Kearney   | Marshall_Kearney3@mail.com   | 1921-01-29 | Kirakira            | Qbmfrsroad 6    | Solomon Islands             |
| Christian  | Pollock   | Christian-Pollock65@live.com | 1953-03-06 | Laayoune / El Aaiun | Hctktmstreet 66 | Western Sahara              |
| Jamie      | Rosenberg | Jamie-Rosenberg8@google.com  | 1952-03-06 | Hadrut              | Nidsbfpark 03   | Azerbaijan                  |
| Jose       | Morton    | Jose-Morton34@yahoo.com      | 1987-06-21 | Ilorin              | Vrvcawstreet 23 | Nigeria                     |
| Marguerite | Melton    | Marguerite.Melton@mail.com   | 1948-04-17 | Port-aux-Francais   | Iqfdbylane 48   | French Southern Territories |
| Debbie     | Langston  | Debbie.Langston414@mail.com  | 1974-02-12 | Ugurchin            | Zyroszpark 9    | Bulgaria                    |
+------------+-----------+------------------------------+------------+---------------------+-----------------+-----------------------------+
13 rows in set (0.00 sec)

The birthday is generated with a small function that generates a random date between a date range. You can format the output with the third parameter.
E.g.:

MYSQL>SELECT str_random_date('1920-03-12','2012-01-31','%Y-%m-%d') AS RandomMysqlDate
     , str_random_date('1920-03-12','2012-01-31','%Y-%m-%d:%T') AS RandomDateWithTime
     , str_random_date('2012-01-01','2012-12-31','%u') AS RandomWeeknumberIn2012
     , str_random_date('2012-02-01','2012-03-01','%e') AS RandomDayInFebruary
\G
*************************** 1. row ***************************
       RandomMysqlDate: 1992-09-17
    RandomDateWithTime: 1943-03-10:14:28:02
RandomWeeknumberIn2012: 46
   RandomDayInFebruary: 20
1 row in set (0.00 sec)

You can download the persons and countries tables in this zip file: countries_persons_tables_2012-11-28.zip
You can download the str_random_date function in this zip file: str_random_date_2012-07-29.zip
You can download the functions for generating random data and see examples of generating useful test data here: http://moinne.com/blog/ronald/mysql/howto-generate-meaningful-test-data-using-a-mysql-function

Have fun!

Related posts

Tags: , , ,

One Response to “Create random test data using real data from your MySql database.”

  1. melinda says:

    Helped great deal with de-sentitizing our data for our health care organization!

Leave a Reply