ODI Handling DQ

If you have some SQL scripts to send output to some business users, especially for data quality issues or other information, how would you handle this. In case you do not have a data quality tool for that.

Step 1 : Prepare your data quality scripts.
Step 2 : Put those scripts into files.
Step 3 : Prepare PL/SQL or ODI code to read contents of those files. Send this script to your database, insert output to an ERROR table.
Step 4 : Read ERROR table by ERROR_CODE and loop it from beginning to end.
Step 5 : Send e-mail for each script, attach the appropriate file and mention how many rows are generated in this ERROR_CODE.
Step 6 : Continue with Step 4.

Error table could be something like this;

CREATE TABLE MY_ERROR_TABLE
(
DATETIME DATE,
TRX_ID INTEGER,
ERROR_CODE NUMBER(2),
ERROR_DESC VARCHAR2(150 BYTE)
)
TABLESPACE MY_TBS
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
MONITORING;

I prefer ODI methods instead of PL/SQL to insert into this table. Since script could be bigger than VARCHAR2(4000), I use CLOB datatype;

CREATE TABLE MY_ERROR_SCRIPTS
(
ERROR_CODE NUMBER(2),
ERROR_DESC VARCHAR2(150 BYTE),
ERROR_SCRIPT_DWH CLOB,
ERROR_SCRIPT_OLTP CLOB
)
LOB (ERROR_SCRIPT_DWH) STORE AS (
TABLESPACE MY_TBS
ENABLE STORAGE IN ROW
CHUNK 32768
RETENTION
NOCACHE
NOLOGGING
STORAGE (
INITIAL 160K
NEXT 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT
CELL_FLASH_CACHE DEFAULT
))
LOB (ERROR_SCRIPT_OLTP) STORE AS (
TABLESPACE MY_TBS
ENABLE STORAGE IN ROW
CHUNK 32768
RETENTION
NOCACHE
NOLOGGING
STORAGE (
INITIAL 160K
NEXT 1M
MINEXTENTS 1
MAXEXTENTS UNLIMITED
PCTINCREASE 0
BUFFER_POOL DEFAULT
FLASH_CACHE DEFAULT
CELL_FLASH_CACHE DEFAULT
))
TABLESPACE MY_TBS
RESULT_CACHE (MODE DEFAULT)
LOGGING
NOCOMPRESS
NOCACHE
NOPARALLEL
MONITORING;

Here is how your package look like;

ODI Handling DQ

PRC_MY_ERROR is a ODI procedure which inserts DQ issues to ERROR table.

  • Delete todays records;
    • delete from <%=odiRef.getSchemaName(“DB.MY_SCHEMA”, “D”)%>.MY_ERROR_TABLE where DATETIME = TO_DATE(‘SYSDATE’, ‘YYYYMMDD’)
  • Execute all Error Scripts; (since I am changing my own codes, below codes must be rewritten to your environment)
    • DECLARE
      TYPE TransactionRec IS RECORD (transaction_id integer);
      TYPE TransactionSet IS TABLE OF TransactionRec;
      ContractSet TransactionSet;CURSOR C1 IS
      select ERROR_CODE, ERROR_DESC, ERROR_SCRIPT_DWH from <%=odiRef.getSchemaName(“DB.MY_SCHEMA”, “D”)%>.ERROR_SCRIPTS;BEGINFOR C1_REC IN C1 LOOPEXECUTE IMMEDIATE to_char(C1_REC.ERROR_SCRIPT_DWH) BULK COLLECT INTO ContractSet;IF ContractSet.FIRST IS NOT NULL THEN
      FOR i IN ContractSet.FIRST..ContractSet.LAST
      LOOP
      INSERT INTO <%=odiRef.getSchemaName(“DB.MY_SCHEMA”, “D”)%>.MY_ERROR_TABLE (DATETIME, TRANSACTION_ID, ERROR_CODE, ERROR_DESC) VALUES
      (TO_DATE(‘#SYSDATE’, ‘YYYYMMDD’), ContractSet(i). transaction_id, C1_REC.ERROR_CODE, C1_REC.ERROR_DESC);
      COMMIT;
      END LOOP;
      END IF;
      END LOOP;
      END;

You have successfully inserted your data quality filtered records to MY_ERROR_TABLE. Now it is time to send them to related people in your organization.

  • V_MY_ERROR_CODE : You assign first value (probably 1) to this variable.
  • V_MY_ERROR_CODE : Evaluate this variable whether it reached last error.
  • V_MY_ERROR_COUNT : Refresh this variable if there is any error counted under variable #V_MY_ERROR_CODE.
    • SELECT COUNT(1)
      FROM <%=odiRef.getSchemaName(“DB.MY_SCHEMA”, “D”)%>.MY_ERROR_TABLE
      WHERE DATETIME = TO_DATE(SYSDATE, ‘YYYYMMDD’) AND
      ERROR_CODE = #V_MY_ERROR_CODE
  • V_MY_ERROR_COUNT : Evaluate if the count variable is bigger than 0 (whether to send a mail)
  • V_MY_ERROR_DESC : If previous step is true, prepare mail body.
  • OdiSendMail : Send the mail.
    • Mail Server : #V_MAIL_SERVER_IP
    • From : This is static, user that you are sending mail.
    • To : #V_ERROR_MAIL_TO (need to be refreshed in the beginning of your ETL or current package)
    • CC : #V_ERROR_MAIL_CC (need to be refreshed in the beginning of your ETL or current package)
    • BCC : #V_ERROR_MAIL_BCC (need to be refreshed in the beginning of your ETL or current package)
    • Subject : There are #V_MY_ERROR_COUNT errors exist in your system (Error Code = #V_MY_ERROR_CODE)
    • Attachment : /data/my_errors/MY_Error_#V_MY_ERROR_CODE.txt (will represent as /data/my_errors/MY_Error_1.txt initially, then will attach regarding file in the loop, every step will attach its own script file)
    • Message Body : #V_MY_ERROR_DESC

That’s it. You have sent how many errors, which script are executed to receive this amount of errors to your operation team, analyst team and business users.

Have fun!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s