Naturally, using regular expressions for SELECT queries to check if certain text strings or text patterns are residing somewhere in large chunks of data is the most resource-intensive option and thus your last option. However it’s somehow unavoidable to practice regular expressions in the SQL queries for complicated patterns. For example, word boundaries are a common reason why you want regular expressions in SQL:
SELECT * FROM articles WHERE content REGEXP '[[:<:]]MySQL efficiency[[:>:]]'
This query searches for any entry in table articles that has a ‘MySQL efficiency’ phrase in the content field. In a table as large as 100,000 rows, it’d definitely take more than 0.5 seconds which is a rather outrageous length and would very probably take down the server if large volumes of searches flood in.
So, let’s consider the LIKE clause:
SELECT * FROM articles WHERE content LIKE '%MySQL efficiency%'
Much better, the query time is instantly reduced to 0.01 seconds or so. The problem is % is simply not the right symbol for a word boundary because it matches:
- theMySQL efficiency-
In this case of ‘MySQL efficiency’, there won’t be much of a problem because the chance of ‘theMySQL efficiency’ to appear is rather slim. However, consider the case of ‘War’, you can’t use LIKE clause for it because it’d also match ‘Edward’ and so forth. You get the idea, using LIKE alone in this manner is incorrect.
The solution
Eventually, we’ve got something good to talk about. This solution can address both of the problems, that is, to combine LIKE and REGEXP / RLIKE together in the query.
SELECT * FROM articles WHERE content LIKE '%MySQL efficiency%' AND content REGEXP '[[:<:]]MySQL efficiency[[:>:]]'
This way, not only the query time is reduced to approximately 0.025 seconds or less because of the LIKE clause, but the phrase can also be actually matched to the real thing thanks to the REGEXP clause.
Probably in that MySQL will first process the LIKE clause and then pass the filtered results to be further processed by the REGEXP clause. Though the REGEXP clause is much more resource-intensive, as there’s a lot less to be processed after being helped by the LIKE clause, the query time is considerably reduced.
Thats’s a great post – I have been running select statement for years with %LIKE and never thought of incorporating it with regex.
This might really improve hand built search engines.
It would be great to know, how do you check your query time and loads for these mysql statements ?
@Tzvi
phpMyAdmin gives query time every time you execute something. That’s what I use to test different combinations of SQL codes. 🙂
Hi,
How can I make it search for example like this:
If the text that should be searched within is “nokia phone in good condition” and if someone searches for “good condition” or “nokia phone”, it will show the text as matching result, but if someone searches for “nokia good” it won’t show it as a matching result. This is because it searches for connected words, like “nokia phone”, “in good”, “good condition” etc. So, how can I break the “MySQL efficiency” string to search for words “nokia” and “condition” within the text “nokia phone in good contition” and if there are those two words in the text, no matter if they are connected one after another, or one is at the begining of the text, and another is in the middle of the text, the text “nokia phone in good condition” is shown as a matching result, because it contains those two words, “nokia” and “condition”?
Thanks in advance.
P.S. The quotes are put just to distinguish the words from the describing text, because I can not use BOLD here, so all the searches I mentioned above are without quotes.
Tnx, really reeeeeeeeeealllly great solution