Thursday, August 31, 2017

Turboboost your Database Import!


Related image

Unfortunately there is no Turbo Boost button on the oracle import data pump utility (impdp) and for that matter the same would apply to the rest of the oracle suite of products and most technologies that we use today. This is why we have Oracle DBAs, Architects, Engineers and amazing problem solvers that can work to optimize and tune processes!

In my case I did an export of a large 12c  12.1.0.2 version test database that is using the application JD Edwards Enterprise One version 9.2 . The size of the database is about 2.6TB. The hardware is an Oracle engineered system the ODA X5-2. The database compute node has 1 single CPU socket with 24 cores.



I used the following export data pump parameters in my par file.

cluster=N
compression=ALL
directory=EXPORT_DP
dumpfile=export_jdeprod_schemas_%u.dmp
filesize=20G
FLASHBACK_time=systimestamp
logfile=export_dump_jdeprod_schemas_log.log
parallel=8
schemas=PRODCTL,PRODDTA
content=ALL

The export data pump process took only a short duration of 33 minutes and the total size of the compressed export dump files was 67GB.

I was requested by my client to refresh the two schemas I exported PRODCTL and PRODDTA. I simply dropped and recreated the users from the user creation DDL scripts I had saved prior. This method is the cleanest way I know to do a refresh of a full schema which is the process I normally use.

I then proceeded to create the import parameter file as follows:

userid='/ as sysdba'
Directory=EXPORT_DP
Dumpfile=export_jdeprod_schemas_%U.dmp
Parallel=10
Logfile=impdp_jdeprod_schemas_copy1.log
schemas=proddta:prodctl

Looks straightforward right? Please note the parallel 10 setting which I thought would help on the overall import process. The import process actually ran for a whopping 32 hours and 15 minutes which is basically 1 day and 8.25 hours! This import duration was completely unacceptable especially after doing an export that only took about 33 minutes to run. As I monitored the import process I quickly realized that the majority of the time the import spent on was the index and constraint builds. 

I knew from experience all indexes are normally created in single threaded mode using the default attribute setting of parallel 1. Note in most cases the parallel attribute is not altered for indexes to something other than 1 otherwise if it was something greater when the export was done then the index would definitely be created with the same parallel degree on the import process. Next I worked on creating an SQL DDL script file for the indexes and constraints. My plan would be to run the import process without creating the indexes and constraints, it would just create and load the tables only. The following is the command to run to create a SQL DDL script file for the indexes and constraints.

$ impdp sqlfile=index_constraint_script.sql include=INDEX,CONSTRAINT


Once the above command completes then you will need to modify the index_constraint_script.sql file and replace all occurrences of parallel 1 to something higher such as parallel 12. In my case all indexes had a parallel degree of 1.

The following modified import parameter file is what I used. This new parameter file does not create indexes or constraints, it just creates the tables and load the data into it. Note the parameter TRANSFORM=DISABLE_ARCHIVE_LOGGING:Y, it is quite helpful and what this parameter does is reduce the overall redo logging operations by creating the tables in no logging mode. Keep in mind this parameter does not help if the database is already in force logging mode. I also suggest to set the logtime parameter to be set to all so you can see the timestamp as the import log is written to.

-- Import parameters for Tables only
userid='/ as sysdba'
Directory=EXPORT_DP
Dumpfile=export_jdeprod_schemas_%U.dmp
Parallel=10
Logfile=impdp_jdeprod_schemas_copy1.log
TRANSFORM=DISABLE_ARCHIVE_LOGGING:Y
logtime=all
schemas=proddta:prodctl
EXCLUDE=INDEX,CONSTRAINT

Before I reran the import again for a second time I dropped both of the schemas from the database and recreated them. This time when I ran the import process it completed in about 1 hour. The import only created the tables and loaded the data into the tables since I am excluding the index and constraint creation.

Next you may then invoke the SQL script file that was just created and also ensure you create a spool file then set timing on to see the elapsed time for each statement in the log file. I would suggest running the script in nohup mode as follows:

$ nohup sqlplus '/ as sysdba' @index_constraint_script.sql &

The script to create the indexes in parallel mode for each index and to also create the constraints took about 10 hours so the whole process to do the import and index/constraint creation took about 11 hours. Overall we went from 32.25 hours down to 11 hours which is about a 66% decrease change of improvement! The total duration improved by more than half and it was nice to see this. Please note another tip you can also split the index script into parts and run them in parallel for even a faster execution of the index builds.

Some additional items I did from a database tuning perspective to help were the following based on AWR report findings from the prior import run.
  • Increased the redo log size from 1G to 3GB and created 5 groups instead of 3. This was done to reduce the log file sync waits.
  • Increased the PGA_AGGREGATE_LIMIT from 2GB to 24GB to help reduce overall sorting operations when building the indexes.
  • Increased the PGA_AGGREGATE_TARGET from 2GB to 12GB to help reduce overall sorting operations when building the indexes.
  • Increased the SGA_TARGET from 4GB to 12GB to help reduce the I/O physical reads from disk.
UPDATE Oct 10, 2017 -

I have an additional update that can be used to help optimize the process even further. I was able to optimize the constraint creation process in parallel as well and shave off about by another 7 hours! So now the entire import process went from about 32.25 hours down to about 4 hours of duration this is about a 88% decrease change in improvement now! I like that.

Here is what you need to do to optimize the constraint creation process.


  • Separate the file index_constraint_script.sql into two files one for the indexes and one for the constraints.
  • Enable a parallel degree on all tables in the tables you need to import into such as a degree of 12.
  • Then again split the constraints script into two files again.
  • The first of the constraint file should only have the following command to create the constraint with the novalidate clause. This will create the constraint instantly, an example is below.
ALTER TABLE "OWNER"."TABLE_NAME" ADD CONSTRAINT "PK1" PRIMARY KEY ("COL1", "COL2") USING INDEX "OWNER"."IX1"  ENABLE novalidate;
  • Enable parallel DDL in the same session prior to running the following step.
ALTER SESSION ENABLE PARALLEL DDL;
  • The second constraint file should have the following syntax. This will allow the constraint to be  created in parallel by checking for the parallel degree on the table/index.
alter table OWNER.TABLE_NAME enable constraint PK1;

Also I was told by a colleague that there is an oracle patch which supposedly is supposed to help doing the index and constraint creation in parallel. You may reference the following My Oracle Support notes below. I can only confirm the patch helps to parallelize the index creation but not the constraint creation.


Why Isn't Datapump Creating Multiple Indexes in Parallel? (Doc ID 402511.1)

Patch 22273229: IMPDP DOES NOT USE PARALLEL LOAD FOR PRIMARY KEY


I hope my experience helps others optimize their import process. A wishlist item I have for a future release of oracle would be to allow to set in the import parameter file the parallel degree for the index creation. This would be a major help for a faster import!