Thursday, December 8, 2016

SQL - remove duplicates in LISTAGG result

Today I would like to share a common issue of removing duplicates in comma separated ordered list (example by LISTAGG) using SQL regular expressions
Syntax of REGEXP_REPLACE:
REGEXP_REPLACE(src, regExMatch, regExReplace) :-
This function does following till end of src string
1. Find matching string in src as per given 'regExMatch'
2. Replace matching string in step 1,  with replace string, 'regExReplace'

Solution: REGEXP_REPLACE(ordered List, '([^,]*)(,\1)+($|,)'   ,'\1\3')
Important point Idea: Source string or list must be ordered for this solution.
Understanding the solution with example:
select
REGEXP_REPLACE( 'English,English,English,Hindi,Hindi,Kannada,Kannada,Kannada,Tamil,Tamil,Telugu,Telugu','([^,]*)(,\1)+($|,)'   ,'\1\3')
from dual
1. Understanding regExMatch expression: '([^,]*)(,\1)+($|,)'
There are three groups in above expression
group 1: ([^,]*) : match all or no characters till comma   
match Result: English
group 2: (,\1)+ : \1 - stands for first group which is Kannada, so it becomes (,English)+, meaning match one or more occurrences of  ',English'.
match Result: ,English,English
group 3: ($|,) : $ stands for 'end of string', | stands or. So it says match either end of string or a comma
match Result: ,    (this is comma which is following group2 match: ,English,English)
2. Understanding regExReplace expression '\1\3':
\1\3 represent the group number which are used in regExMatch expression.
replace Result: English,
Reference:
SELECT OBJID, MERKMAL,
           REGEXP_REPLACE(
                  LISTAGG(ddv.VALUE, ',') WITHIN GROUP (ORDER BY ddv.VALUE)
                  ,'([^,]*)(,\1)+($|,)'
                  ,'\1\3')

                   VALUE
FROM TT_V_RSOBJ_DOKUMENTE_DETAIL ddv
WHERE SCHEMA='RSOBJ_DOKUMENTE'
and MERKMAL in ('LAENDERAUSWAHL','SPRACHAUSWAHL','RESTRICTION','PROCESSING')
GROUP BY OBJID, MERKMAL;

3 comments:

  1. Hi - quick question. can you help with a use case where the original values are scattered and I still need to calculate unique values' list. e.g.
    SOURCE: ENGLISH,TAMIL,ENGLISH,ENGLISH,TAMIL,KANNADA,KANNADA,...
    TARGET: ENGLISH,TAMIL,KANNADA

    ReplyDelete
  2. BTW, the original post you had helped me get out of a bind!

    ReplyDelete
  3. THANK U SO MUCH! GREAT SOLUTION !

    ReplyDelete