CDC and DataStage. My Experience....

1 comment:

  1. IBM Change Data Capture previously known as DataMirror is a Data replication tool and the latest version comes with ETL capabilities as well. Main use of this tool is to move data from one database to another (within a database or between two databases).


    I would put my thoughts together here on CDC implemented on Windows environment on Oracle database. A mapping or unit created in CDC is usually known as "Subscription". this is a link between source and the destination. In windows environment when a subscription is setup; it creates triggers on database tables one for each type Insert, Update and Delete. Whenever there is a change in the data, change is pushed to a journal table which is a central table to contain the changes. There are number of columns in journal table to hold the pushed data. However there are two special columns which I would like to mention here those contains sequence values. One sequence which uniquely identify the record and also used to maintain the order of the change in which they have been pushed. At this point of time another sequence column hold value as 0. There is scheduled service which periodically checks the table for the newly arrived changes and fetch and executes them on the destination. As soon as the changes are applied they get marked with a new sequence number replacing the 0. This indicates that the change has been applied on the destination database.

    Creating and maintaining subscription is easy enough and can be done without any previous knowledge of CDC. To setup a subscription you need to know which tables needs to be mapped connect the source database and select the schema followed by tables and opt option if you would like to map the pre-existing table in destination of would like create new table from scratch. If you choose later CDC will create tables for you. However my experience is that we should always design the tables first in the destination database and then map. Once the mapping is finished there is an option to mark the tables to get refreshed and then mark them to go on continuous mirror mode which means that the subscription will continuously be replication information from source to destination. There is other mode as well where you can just refresh the table instead of setting it for continuous mirror mode.

    There are few things I would like to mention as at the beginning I had lots of trouble because of abrupt end of subscriptions:

    1. If a running subscription is broken and needs a refresh then once all the tables belong to this subscription flagged as refresh; they all will be truncated.
    2. Due to triggers before truncating the table CDC will forcefully disable the triggers.
    3. Sometime after the refresh is over and subscription is in continuous mirroring mode it does not enable all the constraints and subscription abruptly ends as soon as change applied to the destination.

    Few more words of advice
    1. make sure you know the order of execution of other trigger on the table on which CDC has been applied.
    2. Regularly checks the size of the journal table which can grow big.
    3. Configure notification functionality in case subscription goes down.
    More about CDC and its connectivity in DataStage 8.5 in next part of the blog….

    ReplyDelete