CHAPTER 16

Running an initial data load

Overview of initial data load methods

You can use Oracle GoldenGate to:

● Perform a standalone batch load to populate database tables for migration or other purposes.

● Load data into database tables as part of an initial synchronization run in preparation for change synchronization with Oracle GoldenGate.

 

The initial load can be performed from an active source database. Users and applications can access and update data while the load is running. You can perform initial load from a quiesced source database if you delay access to the source tables until the target load is completed.

 

Supported load methods

You can use Oracle GoldenGate to load data in any of the following ways:

● “Loading data with a database utility” on page 217. The utility performs the initial load.

● “Loading data from file to Replicat” on page 218. Extract writes records to an extract file and Replicat applies them to the target tables. This is the slowest initial-load method.

● “Loading data from file to database utility” on page 222. Extract writes records to extract files in external ASCII format. The files are used as data files for input into target tables by a bulk load utility. Replicat creates the run and control files.

● “Loading data with an Oracle GoldenGate direct load” on page 227. Extract communicates with Replicat directly across TCP/IP without using a Collector process or files. Replicat applies the data through the database engine.

● “Loading data with a direct bulk load to SQL*Loader” on page 231. Extract extracts records in external ASCII format and delivers them directly to Replicat, which delivers them to Oracle’s SQL*Loader bulk-load utility. This is the fastest method of loading Oracle data with Oracle GoldenGate.

 

Running an initial data load

Using parallel processing in an initial load

For all initial load methods except those performed with a database utility, you can load large databases more quickly by using parallel Oracle GoldenGate processes.

 

To use parallel processing

·         Follow the directions in this chapter for creating an initial-load Extract and an initialload Replicat for each set of parallel processes that you want to use.

 

·         With the TABLE and MAP parameters, specify a different set of tables for each pair of Extract-Replicat processes, or you can use the SQLPREDICATE option of TABLE to partition the rows of large tables among the different Extract processes.

 

Prerequisites for initial load

 

Disable DDL processing

Before executing an initial load, disable DDL extraction and replication. DDL processing

is controlled by the DDL parameter in the Extract and Replicat parameter files.

 

Prepare the target tables

The following are suggestions that can make the load go faster and help you to avoid errors.

● Data: Make certain that the target tables are empty. Otherwise, there may be duplicate-row errors or conflicts between existing rows and rows that are being loaded.

● Constraints: Disable foreign-key constraints and check constraints. Foreign-key constraints can cause errors, and check constraints can slow down the loading process. Constraints can be reactivated after the load concludes successfully.

● Indexes: Remove indexes from the target tables. Indexes are not necessary for inserts. They will slow down the loading process significantly. For each row that is inserted into a table, the database will update every index on that table. You can add back the indexes after the load is finished.

● Keys: To use the HANDLECOLLISIONS function to reconcile incremental data changes with the load, each target table must have a primary or unique key. If you cannot create a key through your application, use the KEYCOLS option of the TABLE and MAP parameters to specify columns as a substitute key for Oracle GoldenGate’s purposes. A key helps identify which row to process. If you cannot create keys, the source database must be quiesced for the load.

 

Configure the Manager process

On the source and target systems, configure and start a Manager process. One Manager can be used for the initial-load processes and the change-synchronization processes.

 

Create a data-definitions file

A data-definitions file is required if the source and target databases have dissimilar definitions. Oracle GoldenGate uses this file to convert the data to the format required by the target database.

 

Create change-synchronization groups

NOTE If the load is performed from a quiet source database and will not be followed by continuous change synchronization, you can omit these groups.

To prepare for the capture and replication of transactional changes during the initial load, create online Extract and Replicat groups. You will start these groups during the load procedure. See the instructions in this  documentation that are appropriate for the type of replication configuration that you will be using.

Do not start the Extract or Replicat groups until instructed to do so in the initial-load instructions. Change synchronization keeps track of transactional changes while the load is being applied, and then the target tables are reconciled with those changes.

NOTE The first time that Extract starts in a new Oracle GoldenGate configuration, any open transactions will be skipped. Only transactions that begin after Extract starts are captured.

If the source database will remain active during the initial load, include the HANDLECOLLISIONS parameter in the Replicat parameter file; otherwise do not use it.

HANDLECOLLISIONS accounts for collisions that occur during the overlap of time between the initial load and the ongoing change replication. It reconciles insert operations for which the row already exists, and it reconciles update and delete operations for which the row does not exist.

It can be used in these ways:

● globally for all tables in a parameter file

● as an on/off toggle for groups of tables

● within MAP statements to enable or disable the error handling for specific table pairs.

 

Sharing parameters between process groups

Some of the parameters that you use in a change-synchronization parameter file also are required in an initial-load Extract and initial-load Replicat parameter file. You can copy those parameters from one parameter file to another, or you can store them in a central file and use the OBEY parameter in each parameter file to retrieve them. Alternatively, you can create an Oracle GoldenGate macro for the shared parameters and then call the macro from each parameter file with the MACRO parameter.

 

Loading data with a database utility

To use a database copy utility to establish the target data, you start a changesynchronization Extract group to extract ongoing data changes while the database utility makes and applies a static copy of the data. When the copy is finished, you start the change-synchronization Replicat group to re-synchronize rows that were changed while the copy was being applied. From that point forward, both Extract and Replicat continue Running to maintain data synchronization. This method does not involve any special initial-load Extract or Replicat processes.

 

To load data with a database utility

·         On the source and target systems, run GGSCI and start the Manager process.

START MANAGER

·         On the source system, start change extraction.

START EXTRACT

Where: is the name of the Extract group.

·         On the source system, start making the copy.

·         Wait until the copy is finished and record the time of completion.

·         View the Replicat parameter file to make certain that the HANDLECOLLISIONS parameter is listed. If not, add the parameter with the EDIT PARAMS command.

VIEW PARAMS

EDIT PARAMS

Where: is the name of the Replicat group.

·         On the target system, start change replication.

START REPLICAT

Where: is the name of the Replicat group.

·         On the target system, issue the following command to verify the status of change replication.

INFO REPLICAT

·         Continue to issue the INFO REPLICAT command until you have verified that change replication has posted all of the change data that was generated during the initial load. Reference the time of completion that you recorded. For example, if the copy stopped at 12:05, make sure change replication has posted data up to that point.

·         On the target system, issue the following command to turn off the HANDLECOLLISIONS parameter and disable the initial-load error handling.

SEND REPLICAT , NOHANDLECOLLISIONS

·         On the target system, edit the Replicat parameter file to remove the HANDLECOLLISIONS parameter. This prevents HANDLECOLLISIONS from being enabled again the next time Replicat starts.

EDIT PARAMS

·         Save and close the parameter file.

 

From this point forward, Oracle GoldenGate continues to synchronize data changes.

 

Loading data from file to Replicat

To use Replicat to establish the target data, you use an initial-load Extract to extract source records from the source tables and write them to an extract file in canonical format. From the file, an initial-load Replicat loads the data using the database interface. During the load, the change-synchronization groups extract and replicate incremental changes, which are then reconciled with the results of the load. During the load, the records are applied to the target database one record at a time, so this method is considerably slower than any of the other initial load methods. This method permits data transformation to be done on either the source or target system.

 

To load data from file to Replicat

·         On the source and target systems, run GGSCI and start Manager.

START MANAGER

·         On the source system, issue the following command to create an initial-load Extract parameter file.

EDIT PARAMS

·         Enter the parameters listed below in the order shown, starting a new line for each parameter statement.

·          Save and close the parameter file.

 

Initial-load Extract parameters for loading data from file to Replicat

SOURCEISTABLE

Designates Extract as an initial load process extracting records directly from the source tables.

[SOURCEDB ,][USERID [, PASSWORD ]]

SOURCEDB specifies a data source name, if required in the connection information. Not required for Oracle.

USERID specifies database credentials, if required. For Oracle, you can include a host string, for example:

USERID ggs@ora1.ora, PASSWORD ggs123

Specifies database connection information. These parameters also allow for authentication at the operating-system level.

RMTHOST , MGRPORT

Specifies the target system and port where Manager is running.

RMTFILE ,[MAXFILES , MEGABYTES ]

is the fully qualified name of the file.

MAXFILES creates a series of files that are aged as needed. Use if the file could exceed the operating system’s file size limitations. MEGABYTES designates the size of each file.

Specifies the extract file to which the load data will be written. Oracle GoldenGate creates this file during the load. Checkpoints are not maintained with RMTFILE. Use the MAXFILES and MEGABYTES options to control the size.

TABLE .

;

is the schema name.

is the name of the table or a group of tables defined with wildcards. To exclude tables from a wildcard specification, use the TABLEEXCLUDE parameter.

Specifies a source table or tables for initial data extraction.

 

·         On the target system, issue the following command to create an initial-load Replicat parameter file.

EDIT PARAMS

·         Enter the parameters listed below in the order shown, starting a new line for each parameter statement.

 

Initial-load Replicat parameters for loading data from file to Replicat

SPECIALRUN

Implements the initial-load Replicat as a one-time run that does not use checkpoints.

END RUNTIME

Directs the initial-load Replicat to terminate when the load is finished.

TARGETDB ,][USERID [, PASSWORD ]]

TARGETDB specifies a data source name, if requiredin the connection information. Not required for Oracle. USERID specifies database credentials, if required. For Oracle, you can include a host string, for example:

USERID ggs@ora1.ora, PASSWORD ggs123

Specifies database connection information. These parameters also allow for authentication at the operating-system level.

EXTFILE | EXTTRAIL

is the fully qualified name of the file.

Use EXTTRAIL only if you used the MAXFILES option of the RMTFILE parameter in the Extract parameter file.

Specifies the extract file specified with the Extract parameter RMTFILE.

{SOURCEDEFS } | ASSUMETARGETDEFS

Use SOURCEDEFS if the source and target tables have different definitions. Specify the sourcedefinitions file generated by DEFGEN.

 Use ASSUMETARGETDEFS if the source and target tables have the same definitions.

Specifies how to interpret data definitions.

MAP .

,TARGET .
;

  is the schema name.

 

is the name of a table or a wildcard definition for multiple tables. To exclude tables from a wildcard specification, use the MAPEXCLUDE parameter. Specifies a relationship between a source and target table or tables.

·         Save and close the file.

·         On the source system, start change extraction.

START EXTRACT

·         From the directory where Oracle GoldenGate is installed on the source system, start the initial-load Extract.

$ //extract paramfile dirprm/.prm reportfile

Where: is the name of the initial-load Extract that you used when creating the parameter file, and is the fully qualified name of the Extract report file.

·         Verify the progress and results of the initial extraction by viewing the Extract report file using the operating system’s standard method for viewing files.

·         Wait until the initial extraction is finished.

·         On the target system, start the initial-load Replicat.

$ //replicat paramfile dirprm/.prm reportfile

Where: is the name of the initial-load Replicat that you used when creating the parameter file, and is the fully qualified name of the Replicat report file.

·         When the initial-load Replicat is finished running, verify the results by viewing the Replicat report file using the operating system’s standard method for viewing files.

·         On the target system, start change replication.

START REPLICAT

·         On the target system, issue the following command to verify the status of change replication.

INFO REPLICAT

·         Continue to issue the INFO REPLICAT command until you have verified that Replicat posted all of the change data that was generated during the initial load. For example, if the initial-load Extract stopped at 12:05, make sure Replicat posted data up to that point.

·         On the target system, issue the following command to turn off the HANDLECOLLISIONS parameter and disable the initial-load error handling.

SEND REPLICAT , NOHANDLECOLLISIONS

·         On the target system, edit the Replicat parameter file to remove the HANDLECOLLISIONS parameter. This prevents HANDLECOLLISIONS from being enabled again the next time Replicat starts.

EDIT PARAMS

·         Save and close the parameter file.

From this point forward, Oracle GoldenGate continues to synchronize data changes.

 

Loading data from file to database utility

To use a database bulk-load utility, you use an initial-load Extract to extract source records from the source tables and write them to an extract file in external ASCII format. The file can be read by Oracle’s SQL*Loader, Microsoft’s BCP, DTS, or SQL Server Integration Services (SSIS) utility, or IBM’s Load Utility (LOADUTIL). During the load, the changesynchronization groups extract and replicate incremental changes, which are then

reconciled with the results of the load. As part of the load procedure, Oracle GoldenGate uses the initial-load Replicat to create run and control files required by the database utility. Any data transformation must be performed by the initial-load Extract on the source system because the control files are generated dynamically and cannot be pre-configured with transformation rules.

 

To load data from file to database utility

·         Make certain to satisfy “Prerequisites for initial load” on page 215.

·         On the source and target systems, run GGSCI and start Manager.

START MANAGER

·         On the source system, issue the following command to create an initial-load Extract parameter file.

EDIT PARAMS

·         Enter the parameters listed below in the order shown, starting a new line for each parameter statement.

 

Initial-load Extract parameters for loading from file to database utility

SOURCEISTABLE

Designates Extract as an initial load process that extracts records directly from the source tables.

[SOURCEDB ,][USERID [, PASSWORD ]]

SOURCEDB specifies a data source name, if required in the connection information. Not required for Oracle.

USERID specifies database credentials, if required. For Oracle, you can include a host string, for example:

USERID ggs@ora1.ora, PASSWORD ggs123

Specifies database connection information. These parameters also allow for authentication at the operating-system level.

RMTHOST , MGRPORT [, PARAMS – E -d ]

-E converts ASCII to EBCDIC.

-d specifies the source definitions file.

Specifies the target system and port where Manager is running.

The PARAMS clause is necessary when loading with IBM’s Load Utility, because Oracle GoldenGate will need to refer to the source definitions file.

RMTFILE ,[MAXFILES , MEGABYTES ]

is the fully qualified name of the file

MAXFILES creates a series of files that are aged as needed. Use if the file could exceed the operating

system’s file size limitations.

MEGABYTES designates the size of each file. Specifies the extract file to which the load data will

be written. Oracle GoldenGate creates this file during the load. Checkpoints are not maintained with RMTFILE.

FORMATASCII, {BCP | SQLLOADER}

BCP is used for BCP, DTS, or SSIS.

SQLLOADER is used for Oracle SQL*Loader or IBM Load Utility.

Directs output to be formatted as ASCII text rather than the default canonical format.

TABLE .

;

is the schema name.

is the name of the table or a group of tables defined with wildcards. To exclude tables from a wildcard specification, use the TABLEEXCLUDE parameter. Specifies a source table or tables for initial data extraction.

·         Save and close the parameter file.

·         On the target system, issue the following command to create an initial-load Replicat parameter file.

EDIT PARAMS

·         Enter the parameters listed below in the order shown, starting a new line for each parameter statement.

 

Initial-load Replicat parameters for loading from file to database utility

GENLOADFILES