Today I would like to share a common issue of removing duplicates in
comma separated ordered list (example by LISTAGG) using SQL regular
expressions
Syntax of REGEXP_REPLACE:
REGEXP_REPLACE(src, regExMatch, regExReplace) :-
This function does following till end of src string
1. Find matching string in src as per given 'regExMatch'
2. Replace matching string in step 1, with replace string, 'regExReplace'
Solution: REGEXP_REPLACE(ordered List, '([^,]*)(,\1)+($|,)' ,'\1\3')
Important point : Source string or list must be ordered for this solution.
Understanding the solution with example:
select
REGEXP_REPLACE(
'English,English,English,Hindi,Hindi,Kannada,Kannada,Kannada,Tamil,Tamil,Telugu,Telugu','([^,]*)(,\1)+($|,)'
,'\1\3')
from dual
1. Understanding regExMatch expression: '([^,]*)(,\1)+($|,)'
There are three groups in above expression
group 1: ([^,]*) : match all or no characters till comma
match Result: English
group 2: (,\1)+ : \1 - stands for first group which is Kannada, so it becomes (,English)+, meaning match one or more occurrences of ',English'.
match Result: ,English,English
group 3: ($|,) : $ stands for 'end of string', | stands or. So it says match either end of string or a comma
match Result: , (this is comma which is following group2 match: ,English,English)
2. Understanding regExReplace expression '\1\3':
\1\3 represent the group number which are used in regExMatch expression.
replace Result: English,
Reference:
SELECT OBJID, MERKMAL,
REGEXP_REPLACE(
LISTAGG(ddv.VALUE, ',') WITHIN GROUP (ORDER BY ddv.VALUE)
,'([^,]*)(,\1)+($|,)'
,'\1\3')
VALUE
FROM TT_V_RSOBJ_DOKUMENTE_DETAIL ddv
WHERE SCHEMA='RSOBJ_DOKUMENTE'
and MERKMAL in ('LAENDERAUSWAHL','SPRACHAUSWAHL','RESTRICTION','PROCESSING')
GROUP BY OBJID, MERKMAL;