The idea of counting the number of matches in a field that a regexp has, is a very intesting method to sort sets, for example, a web search result set, when you need the most relevant sources first (for example, containing the searched word more times).
I researched a bit and unfortunatelly, MySQL is far behind PostgreSQL, despite the fact that Oracle bought it (yes, at first I though it could improve a lot with Oracle's experience, but reality is that Oracle is not going to put another free competitor against its own), so I will write this howto for PostgreSQL with an example.
PostgreSQL has a very clever function regexp_matches that will show all substrings which matches a pattern and we will use that to count matches and sort result.
For example, suppose we have this table in database:
DATABASE TABLE AND DATA | |||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
If we try to search by keyword key, we could want to appear in relevance order:
Expected result | Actual result | |
---|---|---|
id | 2 | 1 |
id | 3 | 2 |
id | 1 | 3 |
To achieve this sorting, we need to use regexp_matches correctly along with our result set, like this way:
That will result in a set like Expected result in previous table.
But, that was only an example!! How about a real usage?
With a little modification of previous sentence, it could be applied to sort a real search query, just substitute bold text with your pattern and your subquery/data set.
The only requeriment to do that is: Your subquery must have id and htmltext between its results! (and of course, replace 'key' with
So assuming we have a function called our_expensive_lookup (with the lookup parameter) that do a very expensive lookup by some keyword, we could apply this sorting to our returned results by:
SQL Example Sorting Sentence SELECT id, COUNT(*) AS patternmatch
FROM (
SELECT id,regexp_matches(htmltext, 'key', 'g')
FROM CONTENT
)
AS alias
GROUP BY id ORDER BY patternmatch DESC;
That will result in a set like Expected result in previous table.
But, that was only an example!! How about a real usage?
With a little modification of previous sentence, it could be applied to sort a real search query, just substitute bold text with your pattern and your subquery/data set.
The only requeriment to do that is: Your subquery must have id and htmltext between its results! (and of course, replace 'key' with
So assuming we have a function called our_expensive_lookup (with the lookup parameter) that do a very expensive lookup by some keyword, we could apply this sorting to our returned results by:
Of course, it can be combined with prepared statements, stored procedures, and so on! The hardest part is done, so just use your imagination!
SQL Real Usage Sorting Sentence SELECT id, COUNT(*) AS patternmatch
FROM (
SELECT id,regexp_matches(htmltext, 'key', 'g')
FROM (
SELECT id, htmltext
FROM our_expensive_lookup('key')
) AS lookup
) AS alias
GROUP BY id ORDER BY patternmatch DESC;
No comments:
Post a Comment