Loading multiple files with ODI

If your source is file and you need to load multiple files into your database you need to do more than something. If you want to load your files using ODI interfaces you need to make some customization within ODI. There are some utilities or commands to load files into databases such as Sybase uses BCP, Oracle uses SQL Loader, MS SQL Server uses Bulk Insert. Some of them support multiple files, some of them support transformations. But there is a bit problem if name of the file is generated by source system, that means you don’t know the names of the files.

I want to explain a methodology loading multiple files into Oracle using external tables, which I think the best and easiest way.

First the environmental stuff and some prerequisites;

1. A Knowledge Module to load files using file name as a ODI Variable.
2. Some operating system folders for loading, rejected and log files.
3. A table named ETL_FILE_LOG to hold the file name in ODI Variable and to log in database.
4. 4 ODI Procedure getting file list, work with processed files, work with rejected files and rename files.
5. An ODI Interface to make transformations and loading.
6. 2 ODI Variables to manage file_name and file_id.
7. An ODI Package to manage everything.

Let’s start to do some work.

1. Knowledge Module

This is a simple Knowledge Module named “IKM SQL Control Append (Direct Load,HINT)” which is based on ODI standard Knowledge Module name IKM SQL Control Append. The additions are since we are loading into Oracle database it will be useful to use Oracle HINTS. “INSERT HINT” option and “SELECT HINT” options are added and placed into “Insert new rows” command of Knowledge Module.

This customized Knowledge Module prepares an insert statement all on “Command on Target” tab. There is no working tables and nothing on source. Creates a simple insert statement like this.

 insert /*+ INSERT_HINT */ into
target_table_name
(column_list)
select /*+ SELECT_HINT */column_list
from source_tables
where conditions
<tablejoins>
<table_filters>
group by {if any}
having {if any}

2. Operating System Folders

You need a base folder for every kind of table. If you are loading invoice log files into table, it is better to open a /data/invoice_logs folder as a base folder for invoice log files. Lots of files with same template would be copied here. There may be another file copying mechanism or you can do it within ODI Procedure.

Usable folders are /data/invoice_logs/log and /data/invoice_logs/rejected. If you need to store your file after loading into table it is better to create a folder named /data/invoice_logs/processed folder as well.

3. ETL_FILE_LOG table

ETL_FILE_LOG is a table that hold below information.

COLUMN NAME PK NULL? DATA TYPE DEFAULT COMMENT
FILE_ID 1 N NUMBER (10)   The unique identification number of file.
FILE_NAME   N VARCHAR2 (50 Byte) The name of file to be processed.
FILE_GROUP N VARCHAR2 (20 Byte) Source system name or group.
FILE_SUB_GROUP Y VARCHAR2 (20 Byte) Detail group of file.
FILE_COPIED_FLAG Y NUMBER (1) 0 0:not copied, 1:copied successfully, 2:error in copy.
FILE_COPY_DATE Y DATE Date of file copied.
FILE_READ_FLAG Y NUMBER (1) 0 0:not read, 1:read successfully, 2:error in read.
FILE_READ_DATE Y DATE Date of file read.
FILE_PROCESSED_FLAG Y NUMBER (1) 0 Date of file processed.
FILE_PROCESSED_DATE Y DATE 0:not processed, 1:processed successfully, 2:error in process.

 

FILE_ID is generated from a Oracle sequence and increments each record by 1.

4. ODI Procedures

There are 4 ODI Procedures to use within file loading.

  • Get_File_List : Reads operating system base folder and inserts files into ETL_FILE_LOG table. It’s based on jyhton technology. Code is as follows;

import java.lang as lang

import java.sql as sql

import snpsftp

import java.lang.String

import os

import java.io.File as File

#db connection

driverSrc = ‘oracle.jdbc.driver.OracleDriver’

lang.Class.forName(driverSrc)

#Production Environment

urlSrc = ‘jdbc:oracle:thin:@<host>:<port>:<sid>’

#Development Environment

#urlSrc = ‘jdbc:oracle:thin:@<host>:<port>:<sid>’

userSrc = ‘ODI’

passwdSrc = ‘ODI’

ConSrc = sql.DriverManager.getConnection(urlSrc, userSrc, passwdSrc);

readDBLink = ConSrc.createStatement()

syslist  = os.listdir(‘<%=odiRef.getOption( “SOURCE_DIRECTORY” )%>’)

for sys in syslist:

str = java.lang.String(sys)

if str.length() > 8:

sqlDBLink = “select * from ODI.ETL_FILE_LOG where FILE_NAME = ‘” + sys + “‘”

rqteDBLink = readDBLink.executeQuery(sqlDBLink)

if not rqteDBLink.next():

sqlDBLink = “insert into ODI.ETL_FILE_LOG (FILE_ID, FILE_NAME, FILE_GROUP, FILE_SUB_GROUP, FILE_READ_FLAG, FILE_READ_DATE) values (ODI.SEQ_FILE_ID.NEXTVAL, ‘” + sys + “‘, ‘<source_system_name>’, ‘<file_type>’, ‘1’, SYSDATE)”

rqteDBLink = readDBLink.execute(sqlDBLink)

ConSrc.close()

SOURCE_DIRECTORY is a option for using different folders in same ODI procedure.

  • Rename File : Renames file to use in external table. Since Oracle external table structure is working on same file name, it is much more easier to rename loading file into same name. there are 2 steps in rename file ODI procedure. First, delete previous file (ignore errors), second rename next file with ODI Variable.
    • rm /data/odi_data/file_type/external_table.ext
    • mv /data/odi_data/file_type/#FILE_NAME /data/odi_data/file_type/external_table.ext

File name external_table.ext is the name of the external table file. In each loop I am deleting the loaded file and rename the next file – which I get the name from ODI Variable #FILE_NAME – to external_table.ext.

  • Update Processed File : I need to update ETL_FILE_LOG table that current file is successfully loaded into database. There are 3 steps in this ODI Procedure.
    • Step 1 : Updates ETL_FILE_LOG table for the ODI Variable #FILE_ID. Works in Oracle technology with ODI schema.

Note that table operations use ODI Variable #File_ID and operating system operations use ODI Variable #File_Name.

UPDATE ODI.ETL_FILE_LOG SET FILE_PROCESSED_FLAG = 1, FILE_PROCESSED_DATE = SYSDATE WHERE FILE_ID = #FILE_ID

    • Step 2 : Delete processed file. Deletes the successfully loaded file from operating system. Works in Operating System technology
      • rm /data/odi_data/file_type/external_table.ext
    • Step 3 : Delete processed file. Deletes the successfully loaded file from operating system. Works in Operating System technology. Deletes log file of successfully loaded file.
      • rm /data/odi_data/file_type/file_log.log
  • Update Rejected File : I need to update ETL_FILE_LOG table that current file is not successfully loaded into database. There are 3 steps in this ODI Procedure.
    • Step 1 : Updates ETL_FILE_LOG table for the ODI Variable #FILE_ID. Works in Oracle technology with ODI schema.

UPDATE ODI.ETL_FILE_LOG SET FILE_PROCESSED_FLAG = 2, FILE_PROCESSED_DATE = SYSDATE WHERE FILE_ID = #FILE_ID

    • Step 2 : Move data file (rejected). Moves rejected file to rejected folder in operating system. Works in Operating System technology. When moving renames the file to original name before loading.
      • mv /data/odi_data/file_type/external_table.ext /data/odi_data/file_type/rejected/# FILE_NAME
    • Step 3 : Delete processed file. Deletes the successfully loaded file from operating system. Works in Operating System technology. Deletes log file of successfully loaded file.
      • mv /data/odi_data/file_type/file_log.log /data/odi_data/file_type/rejected/# FILE_NAME.log

5. ODI Interface

ODI Interface is a simple interface that you can do anything within ODI. Souce table is the external table whose file is in Operating System under the folder /data/odi_data/file_type/external_table.ext. You need to prepare external table in Oracle and reverse in ODI Model. It is not different from any table or view in ODI point of view. Target datastore is the table where file would be loaded.

As mentioned above Knowledge Module named “IKM SQL Control Append (Direct Load,HINT)” is selected in flow. It would be faster if you type the options “SELECT HINT” and “INSERT HINT”.

6. ODI Variables

There are 2 variables that are described below.

  • File_ID : Numeric datatype with historize action and default value as “0”. Refreshment SQL statement is as follows.

SELECT NVL(MIN(FILE_ID), 0) FROM ODI.ETL_FILE_LOG

WHERE FILE_READ_FLAG = 1 AND FILE_PROCESSED_FLAG = 0 AND FILE_GROUP = ‘<source_system_name>’ AND FILE_SUB_GROUP = ‘<file_type>’

  • File_Name: Alphanumeric datatype with historize action and default value as “NULL”. Refreshment SQL statement is as follows.

SELECT FILE_NAME FROM ODI.ETL_FILE_LOG WHERE FILE_ID = #FILE_ID

7. ODI Package

This is the pain point of process. First I’ll share the diagram with you, then I’ll explain each step by the red numbers pointing them.


1. Get_File_List : This is an ODI Procedure that previously mentioned above. Reads desired (typed in option) operating system folder and inserts the files that are previously not loaded into ODI.ETL_FILE_LOG table.
2. FILE_ID : Refreshing variable that finds minimum FILE_ID column of ODI.ETL_FILE_LOG table that has not loaded previously.
3. FILE_ID : Evaluates ODI Variable named #FILE_ID if it is greater than “0”. Means if there is a file that would be loaded into database continue to step 4, else ODI Package ends.
4. FILE_NAME : Since ODI Interface is working with Oracle External Table and Oracle External Table’s source is the fixed name (external_table.ext) we need to know name of the file to process it in to further process.
5. RENAME_FILE : Renames operating system file to desired Oracle External Table’s source (external_table.ext). But for Step 7 and Step 9, this ODI Variable will be used to rollback file name.
6. INTERFACE : Loads external table (source) to database (target) with desired transformation and other things. If interface successfully works Step 7 is continues, if there is a problem with the interface (file maybe corrupted or template maybe broken or other problems while loading file into database) Step 9 is continued with NOT OK (ko) flow.
7. UPDATE_PROCESSED : It depends on flow what to do with processed file. In this example file in operating system is deleted and ODI.ETL_FILE_LOG table is updated (FILE_LOAD_FLAG = 1).
8. FILE_ID : Refreshes ODI Variable #FILE_ID again in order to continue with the flow or not. Continues with Step 3 to evaluate whether current #FILE_ID is greater than “0” or not. Flow continues if there is still file to process, ends if there is not any file to process.
9. UPDATE_REJECTED : If file is not loaded into database for any reason (Step 6), ODI.ETL_FILE_LOG table is updated FILE_LOAD_FLAG is set to 2. Current external table file name (external_table.ext) is renamed to original file name (#FILE_NAME) and log file is renamed to #FILE_NAME.log. All renamed files are copied into rejected folder in the base folder (/data/odi_data/rejected).
10. Determine_Error_Desc : Refreshes the variable #Determine_Error_Desc to get the error description of current session and prepare the body of mail to be sent in the next step.
11. OdiSendMail : Sends a mail to give information about why interface couldn’t run. Both ok and ko (if OdiSendMail works successfully or not) is connected to Step 8 in order not to cut the file loading process.

This is one of the approaches to load many files with one Interface, one package, 2 variables and 4 procedures. SQL Loader or other file2table processing can be used as well. Base of the approach is the same. I’ll write about using SQL Loader in a short time.

Hope it works fine with your environment…

If you have any further questions feel free to send me a mail.

gorhan@gmail.com

24 thoughts on “Loading multiple files with ODI

  1. Ellerine saglik, bu detayda bir calismayi hicbir yerde bulamazdim. Bir de gecmis gunleri hatirlamak adina hos oldu 🙂

    • Artık file’ları load ederken SQL Loader yerine External Table kullanmaya başladığımı farketmişsindir. Daha da hızlı load ediyor. Değişmeyen tek şey değişimdir. 🙂

  2. Hi Gurcan,

    I am new to ODI& I have a similar kind of requirement.

    I just wanted to know that while creating the Procedure Get_File_List(Jython Code) did you specified Source & Target Technology or what you have specified in Command on Source & Command on Target.

    Thanks In Advance,
    Nilkamal

    • Get_File_List is a jython code (technology : jython) running on target and reads from Operating System, inserts into target datastore (that you have already connected via jython previously).

      Best,
      Gurcan.

  3. Hi Gurcanorhan,

    Accidentally my Designer navigator went off the ODI. Even when I tried to open by using View–>Designer Navigator also it is showing nothing. I am in hurry, so can you please reply me with the solution as soon as possible.

    Thanks in advance.

    • Which version of ODI are you running?
      But fastest response could be since ODI is a repository based tool, you can uninstall/install your client application with same version of repository and agent.

      • Hi,
        First of all thank you for your reply. I am using ODI 11.1.1.6.0 version. I have uninstalled ODI and reinstalled and even also uninstalled whole middleware also. Please suggest me solution.

        Thanks.

  4. Hello Orhan,

    This is Srinivas . I am a developer working in Oracle ODI .

    I am stuck with a issue .

    I have a flat file ( The file has 5 different file formats as such storing each format in a different table ) .

    as such I created a package with 5 interfaces each interface source is same but the target is different as i had to load the data to a tabel based on format .

    Created a package with all the interfaces.

    I have 3 files as such I had to loop it to process all the three files .

    I could able to process all the files but I could not able to exit from loop .

    Here is what I am doing ..

    1)V_COUNTER ( TYPE SET VARIABLE , VALUE =1 )
    2) V_NAR_FILE ( TYPE REFRESH VARIABLE
    QUERY :

    SELECT FILEPATH
    FROM
    ( SELECT FILEPATH , ROWNUM COUNTER FROM .PRICENAR_FILELOCATION
    ) WHERE COUNTER = #V_COUNTER

    3) 5 INTERFACES

    4) V_COUNTER( TYPE SET VARIABLE , VALUE = INCREMENT 1 )
    5) V_MAX_COUNT ( TYPE EVALUATE VARIABLE 2) ——-3) ——-4)——->5) —–TRUE——> 2)

    • I would suggest a table (ETL_FILE_STATUS) and insert all the file names, groups and file flags (read, inserted, rejected, etc) that would be source of some variables. You can evaluate a variable just like FILE_COUNT which would count uninserted files from ETL_FILE_STATUS table of inserted_flag = 0. When this variable has the value of 0, evaluation will return false.
      Remember that grouping files is sounds like making file loads parallelism into groups. Group1 can have 10 files and Group2 can have have 8 files. Loop will continue 8 times for the Group2 and 10 times for Group1. I generally finalize 3 variables FILE_ID, FILE_NAME and FILE_COUNT for each group. FILE_ID is to update the ETL_FILE_STATUS table, FILE_COUNT is for whether continuing or stopping the loop, FILE_NAME for the file name that would be used in File 2 Table Knowledge Module.

      Hope this method helps you,
      Best,
      Gurcan.

  5. Hi Gurcan,
    I am having problem while executing this proc. Getting java.lang.nullpointer exception.Could you plz help me out

    import java.lang as lang

    import java.sql as sql

    import snpsftp

    import java.lang.String

    import os

    import java.io.File as File

    #db connection

    driverSrc = ‘oracle.jdbc.driver.OracleDriver’

    lang.Class.forName(driverSrc)

    #Production Environment

    #urlSrc = ‘jdbc:oracle:thin:@::’

    #Development Environment

    #urlSrc = ‘jdbc:oracle:thin:@::’

    userSrc = ‘common’

    passwdSrc = ‘common007’

    ConSrc = sql.DriverManager.getConnection(urlSrc, userSrc, passwdSrc);

    readDBLink = ConSrc.createStatement()

    syslist = os.listdir(”)

    for sys in syslist:

    str = java.lang.String(sys)

    if str.length() > 8:

    sqlDBLink = “select * from FILE_LOG where FILE_NAME = ‘” + sys + “‘”

    rqteDBLink = readDBLink.executeQuery(sqlDBLink)

    if not rqteDBLink.next():

    sqlDBLink = “insert into FILE_LOG ( FILE_NAME,SOURCE_PATH) values ( ‘” + sys + “‘, ‘/u01/interface/ERROR_FOLDER/’)”

    rqteDBLink = readDBLink.execute(sqlDBLink)

    ConSrc.close()

    • In your ConSrc, there’s a variable “urlSrc” which you didn’t define, and there’s a driverSrc which you didn’t use.

  6. Hi Gurcanorhan, I am getting below error::
    org.apache.bsf.BSFException: exception from Jython: Traceback (innermost last):
    (no code object) at line 0
    File “”, line 21
    str = java.lang.String(sys)
    ^
    SyntaxError: invalid syntax

    My code is as below::

    import java.lang as lang
    import java.sql as sql
    import snpsftp
    import java.lang.String
    import os
    import java.io.File as File
    #db connection
    driverSrc = ‘oracle.jdbc.driver.OracleDriver’
    lang.Class.forName(driverSrc)
    #Development Environment
    urlSrc = ‘jdbc:oracle:thin:@172.0.0.1:1521:oracle’
    userSrc = ‘common’
    passwdSrc = ‘common’
    ConSrc = sql.DriverManager.getConnection(urlSrc, userSrc, passwdSrc);

    readDBLink = ConSrc.createStatement()

    syslist = os.listdir(‘/interface/CDW/’)
    for sys in syslist:

    str = java.lang.String(sys)

    if str.length() > 8:

    sqlDBLink = select * from FILE_LOG_1 where FILE_NAME = ‘+sys +’

    rqteDBLink = readDBLink.executeQuery(sqlDBLink)

    if not rqteDBLink.next():

    sqlDBLink = insert into FILE_LOG_1 ( FILE_NAME,SOURCE_PATH) values ( ‘+sys+’, ‘/interface/CDW/’)

    rqteDBLink = readDBLink.execute(sqlDBLink)

    ConSrc.close()

  7. Hi Gurcanorhan,
    I am getting following error::
    org.apache.bsf.BSFException: exception from Jython: Traceback (innermost last):
    (no code object) at line 0
    File “”, line 21
    str = java.lang.String(sys)
    ^
    SyntaxError: invalid syntax

    My code is as below::

    import java.lang as lang
    import java.sql as sql
    import snpsftp
    import java.lang.String
    import os
    import java.io.File as File
    #db connection
    driverSrc = ‘oracle.jdbc.driver.OracleDriver’
    lang.Class.forName(driverSrc)
    #Development Environment
    urlSrc = ‘jdbc:oracle:thin:@172.0.0.1:1521:oracle’
    userSrc = ‘common’
    passwdSrc = ‘common’
    ConSrc = sql.DriverManager.getConnection(urlSrc, userSrc, passwdSrc);

    readDBLink = ConSrc.createStatement()

    syslist = os.listdir(‘/interface/CDW/’)
    for sys in syslist:

    str = java.lang.String(sys)

    if str.length() > 8:

    sqlDBLink = select * from FILE_LOG_1 where FILE_NAME = ‘+sys +’

    rqteDBLink = readDBLink.executeQuery(sqlDBLink)

    if not rqteDBLink.next():

    sqlDBLink = insert into FILE_LOG_1 ( FILE_NAME,SOURCE_PATH) values ( ‘+sys+’, ‘/interface/CDW/’)

    rqteDBLink = readDBLink.execute(sqlDBLink)

    ConSrc.close()

  8. Could you please modify the GET_FILE_LIST jython code in case the files are present on application server and not Database server.

    • It actually does not matter where the files are. What I am sharing is the methodology and algorithm, instead of actual job. You can copy & download the code and implement for your needs. What ‘Get File List’ does is simple, reads the contents (file names) of a defined folder and inserts into a table. It doesn’t matter whether files are in Application Server or Database Server. In my example, files were all in the place where agent was installed. You need to modify “syslist = os.listdir(‘‘)” to reach file system you are reading for (please don’t forget that SOURCE_DIRECTORY is an option where files are located).

      Cheers,
      Gurcan.

  9. I’m getting below error when i execute the code .

    org.apache.bsf.BSFException: exception from Jython:
    SyntaxError: (“mismatched input ‘str’ expecting INDENT”, (”, 37, 0, ‘str = java.lang.String(sys)\n’))

  10. Resolved the error . Thanks for the article , it invoked my interest in python.

    cheers !!!

  11. hii gurcan,

    can you please tell how you customized knowledge module??

    • You can use the standard Control Append Knowledge Module in File2Table approach. Since, this method is based on external table, you don’t have to deal with File2Table KMs. I customised KM to add variable hint and omit the gathering statistics to perform faster. No magic touch or no specific commands on KM.

Leave a reply to A123 Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.