Mysql: creating a link to your glossary while fetching text for a webpage

This MySql function will generate a link to your glossary when you are fetching text for a web page.
All code is provided in a zip file.

The requirements:
I needed to add a new functionality for an existing application. I needed to add a link to the glossary page whenever a text contains a word that is in the glossary. The requirements where to avoid a lot of database calls, it needs to be fast and it needed to be implemented in existing application code.

There are three other requirements that are related to adding a link to the glossary:
1: Only link the first occurrence of a word. If a word occurs multiple times in a text, you only add a link to the first occurrence of the word.
2: Keywords are case sensitive. E.g.: the word “August” does not always means a month; “an august person”. Or an apple is different than the company with the same name but always written with an upper case A: “Apple”
3: A keyword can actually be multiple words like: “primary key” or “board of directors”.

Install the code:
First create a glossary table (The script is in the zip file):

mysql>CREATE TABLE glossary(
    glossary_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT
   ,`name` VARCHAR(100) BINARY NOT NULL
   ,description VARCHAR(4000) NOT NULL
   ,PRIMARY KEY (`glossary_id`)
   ,UNIQUE glossary_name_udx (`name`)
   ) ENGINE = 'InnoDB'
   DEFAULT CHARACTER SET = utf8
   COLLATE = utf8_bin
   ;

A glossary table usually does not has a lot of rows, I have to see the first glossary table with more then 1000 rows. A smallint is enough.
The name must be unique.

Set-up test data:
Populate the glossary table with a 1000+ rows so we can test the performance on a full table. The first 1000 are just random data and the last 5 rows holds real definitions used for our test. Find the script to populate the table in the zip file and execute it:

mysql>source populate_glossary.sql;

Analyze the table:

mysql>ANALYZE TABLE glossary;
+------------------+---------+----------+----------+
| Table            | Op      | Msg_type | Msg_text |
+------------------+---------+----------+----------+
| schemas.glossary | analyze | status   | OK       |
+------------------+---------+----------+----------+
1 row in set (0.00 sec)

Add the function:

mysql>source fn_glossary_link_words.sql;

The function we just created has two parameters:
p_text String: The text that contains the keywords that must be replaced
p_format String: A string that replaces the keyword. This string can contain three place-holders: {id} , {name} , {description}
An example of such a format is:’<a href="/glossary.php?id={id}">{name}</a>’

Test the function:
As an example of how to execute the function let’s first set up a test table that contains the text that we want to show on a webpage:

mysql>CREATE table t1(id INT,c1 VARCHAR(400));
mysql>INSERT INTO t1 VALUES (1,'This text has twice the string: "table" and twice the string "column". Only the first occurrence of the string is marked and the following strings not: table column. This text also has a glossary word that consist of two words like "primary key". Also note that the lower-case string "mysql" is not marked because in the glossary the string is using upper and lower-case like this: MySQL.');

Now select the text from column “c1″ and add the links to the glossary

mysql>SELECT str_add_glossary_words(c1,'<a href="/glossary/{id}">{name}</a>') AS textWithGlossaryLinks
        FROM t1
       WHERE id = 1;

This is the result:
This text has twice the string: "<a href="/glossary/1002">table</a>" and twice the string "<a href="/glossary/1005">column</a>". Only the first occurrence of the string is marked and the following strings not: table column. This text also has a glossary word that consist of two words like "<a href="/glossary/1001">primary key</a>". Also note that the lower-case string "mysql" is not marked because in the glossary the string is using upper and lower-case like this: <a href="/glossary/1004">MySQL</a>.

This is how the result looks on a webpage ( the links do not work, because the glossary page does not exist):
This text has twice the string: “table” and twice the string “column“. Only the first occurrence of the string is marked and the following strings not: table column. This text also has a glossary word that consist of two words like “primary key“. Also note that the lower-case string “mysql” is not marked because in the glossary the string is using upper and lower-case like this: MySQL.

Compare it with the original text:
This text has twice the string: “table” and twice the string “column”. Only the first occurrence of the string is marked and the following strings not: table column. This text also has a glossary word that consist of two words like “primary key”. Also note that the lower-case string “mysql” is not marked because in the glossary the string is using upper and lower-case like this: MySQL.

Conclusion:
It is easy to apply this function to an existing application, you only have to add the function to some of your database calls and leave the rest of the application unchanged. There are no extra database calls and replacing the text is as fast in your application code as in the database.
You can download all code in this zip file: glossary_2012-07-07.zip

Related posts

Tags: , , ,

2 Responses to “Mysql: creating a link to your glossary while fetching text for a webpage”

  1. Hi Ilya

    Thanks for your comment and thanks for noticing the full table scan in the cursor.
    In the original application I do not have the full table scan because I retrieve only the name in my cursor and the optimizer can use the unique index on the name for this. I added the id and description in this function for demonstration purposes only and thereby forcing the database in a full table scan. Adding a new index would solve the problem.

    The function can to be altered the specific needs of your application. If you don not need an id or description to show the glossary item, you can remove them from the cursor and remove the sub string replace for them too.

    /Ronald

  2. Ilya says:

    The performance is awful. Full table scan X 3 times substring search. Hope you have some extra CPUs and memory to fit glossary table in memory.

Leave a Reply