ODC Appreciation Day: Using PIVOT for Unknown Dimensions

This post goes to #ThanksODC (formerly known as #ThanksOTN) which is initiated by Tim Hall (aka OracleBase).
But first, story time of this. I prepared this bunch of code (continue reading below please), because of being stubborn of being performance enthusiast. While I was working as a consultant, I was first invited to increase performance of an ETL which is already running on production and was running at least 12 hours per run (monthly). We did our best with the features of the ETL tool, paralleled queries, and managed to reduce it to 8-9 hours. But, “it-was-not-enough”!

After we have been assigned with the project, we begun to analyse the complete architecture and started to rewrite everything, with of course new requests and enhancements. I started to model the complete architecture and at the end, came to the problem subject area.

To cut the long story short, there is a dimension table, KPIs, which has 63 rows but can be changed, added or deleted as needed (and this looks so normal), other dimension is “products” with 50 million records, but each product wanted to be queried as KPI values (63) in the columns, not rows. Some of the KPI values are calculated with the formulation of other KPI values (i.e. KPI C = KPI A – KPI B – of course not that simple). When the fact records (metrics) are inserted into a single table (thin but long), record count is changed from 1.2 billion to 1.5 billion depending on the transaction day (not all the products had the KPI values, but 1.2B is a good number to be processed). So, I decided to experiment pivot function to transpose 1.2 billion records to 50 million of rows with 64 columns (first column is the product_id, column names will be as KPI1_Name, KPI2_Name and so on). And everybody insisted on inserting 50 million of rows then update the rest of 63 columns. #ChallengeModeOn

But, this was not the first issue, I faced with technical problems using PIVOT function – described below -, because dimensions should be fix (both in value and count). So, it will be a problem when delivering the project, because when a new KPI will be calculated, a new ETL should be developed to update the 64th KPI. Likewise, when a name of the KPI needs to be changed, logical & physical model should be updated, tables should be altered, ETL should be revised, etc. Means, lots of work that can’t be explained to business users.

Besides all of above, I had only one single day to develop the pivot (that was my challenge) and need to face many technical issues.

Technically;
I first concatenated the KPI values with LISTAGG function, but I couldn’t concatenate output to end of the SQL statement. I needed to prepare the SQL statement as a dynamic query. So I used a cursor to build the KPI list and add it to the dynamic query. Here’s the code, then the conclusion.

CREATE OR REPLACE PROCEDURE DM.KPI_PIVOT AS
CURR_KPI VARCHAR2(25);
MY_LIST VARCHAR2(3000);
FINAL_LIST VARCHAR2(3000);
IS_ALIVE NUMBER(1);
ESCUEL VARCHAR2(4000);
CURSOR MY_KPIS IS
SELECT PK.KPI_CODE FROM DWH.POL_KPI PK ORDER BY PK.KPI_ID;

BEGIN

MY_LIST := ”;
CURR_KPI := ”;
OPEN MY_KPIs;
LOOP
FETCH MY_KPIs INTO CURR_KPI;
EXIT WHEN MY_KPIs%NOTFOUND;
MY_LIST := MY_LIST || ”” || CURR_KPI || ”’ as ‘ || CURR_KPI || ‘,’;
END LOOP;
CLOSE MY_KPIs;
FINAL_LIST := substr(MY_LIST, 0, length(MY_LIST)-1);

— This is to see what the final list looks like
DBMS_OUTPUT.PUT_LINE (‘Final: ‘ || FINAL_LIST);

SELECT COUNT(*) INTO IS_ALIVE FROM ALL_TABLES WHERE OWNER = ‘HIST’ AND TABLE_NAME = ‘STG_DM_PRODUCT_KPI_01’;
IF IS_ALIVE = 1 THEN
execute immediate(‘DROP TABLE HIST.STG_DM_PRODUCT_KPI_01 PURGE’);
dbms_output.put_line(‘Staging table HIST.STG_DM_PRODUCT_KPI_01 is dropped!’);
END IF;

ESCUEL := ‘
CREATE TABLE HIST.STG_DM_PRODUCT_KPI_01 AS
SELECT * FROM
(SELECT
P.PRODUCT_ID,
KPI_MASTER.KPI_CODE,
KPI.KPI_VALUE
FROM DWH.P_KPI_PRODUCT KPI,
DWH.P_KPI KPI_MASTER,
DWH.P_PRODUCT P
where KPI_MASTER.KPI_ID = KPI.KPI_ID
and KPI.POLICE_ID = P.POLICE_ID
)
PIVOT
(SUM(KPI_DEGER)
FOR (KPI_KOD) IN (‘ || FINAL_LIST || ‘))
ORDER BY PRODUCT_ID’;

dbms_output.put_line (ESCUEL);
EXECUTE IMMEDIATE ESCUEL;

END;

Results :
- It took approximately 1 hour to build up the “long and thin” staging table (1.2 billion of rows) and 42 minutes to perform pivoting, developed and tested in less than a day. Eventually, happy project manager, happy developer, happy database administrator, happy architect and besides happy customer and happy me.! 😊

Conclusion :
- Be stubborn.
- Optimisation never ends, but there’s a peak point, need to stop there.
- Beware of update.! ➙ Like I always say in my presentations.

Cheers,
Gurcan

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Gurcan Orhan's ODI and DWH Blog

Some words about Oracle Data Integrator and Data Warehousing.

ODC Appreciation Day: Using PIVOT for Unknown Dimensions

Leave a comment Cancel reply

Share this blog :

Related

Leave a comment Cancel reply