A method to create multiple integrate jobs in the same channel for parallelism

Viewing 1 post (of 1 total)
  • Author
    Posts
  • #8718
    Herman Verheul
    Keymaster

    When parallel integrate jobs are needed in the same channel, for example to divide the load among multiple integrate jobs, the following example shows how to set this up.

    The example below is a channel that captures from an Oracle database and integrates into Amazon S3 buckets. The channel consists of one capture job for all tables in an Oracle schema and four parallel integrate jobs that send the data to the bucket.

    In the attached example, one integrate job (location s3aw1) integrates all tables except those that have /TableProperties /Absent; those tables are integrated by a different integrate location.
    A second integrate job (location s3aw2) integrates only one table, and two further integrate jobs (locations s3aw3 and s3aw4) each integrate two tables.
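
To make the division of tables concrete, here is a minimal Python sketch of the same layout. It is not HVR configuration and does not call any HVR API; it only models the idea that s3aw1 acts as the catch-all location while the other three locations take explicitly listed tables. The location names come from the example above; the table names and the function name are hypothetical.

```python
# Illustration only: models which integrate location handles which table.
# Location names (s3aw1..s3aw4) are from the post; table names are made up.

DEFAULT_LOCATION = "s3aw1"  # integrates every table not claimed elsewhere

EXPLICIT_ASSIGNMENTS = {
    "s3aw2": ["big_table"],           # one table
    "s3aw3": ["table_a", "table_b"],  # two tables
    "s3aw4": ["table_c", "table_d"],  # two tables
}


def assign_locations(all_tables):
    """Return a dict mapping each location to the tables it integrates."""
    # Invert the explicit assignments into table -> location.
    table_to_location = {
        table: location
        for location, tables in EXPLICIT_ASSIGNMENTS.items()
        for table in tables
    }
    plan = {DEFAULT_LOCATION: []}
    for location in EXPLICIT_ASSIGNMENTS:
        plan[location] = []
    for table in all_tables:
        # Tables without an explicit assignment fall through to the default.
        plan[table_to_location.get(table, DEFAULT_LOCATION)].append(table)
    return plan


schema_tables = ["big_table", "table_a", "table_b", "table_c", "table_d",
                 "misc_1", "misc_2"]
plan = assign_locations(schema_tables)
```

In this model every table ends up in exactly one location, which is the property the channel layout needs: no table is integrated twice and none is skipped.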

    Additionally, when integrating into Amazon S3 buckets, small files should be avoided because of their performance impact. The integrate jobs that handle multiple tables therefore have the action /Integrate /OrderByTable, which ensures data is sorted in such a way that only a single target file per source table is created in an integrate cycle. If the data were not sorted, many small target files would be created per source table. To minimize the number of integrate cycles, CycleByteLimit is set to 0, which means all transaction files created by capture are processed in one cycle instead of in the default 10 MB chunks.
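
The effect of sorting and of the cycle size on the number of target files can be shown with a small Python sketch. This is a simplified model, not HVR's actual implementation: it assumes a writer that starts a new target file whenever the source table of the next change differs from the previous one, and that one cycle handles at most the byte limit. The table names, the 95 MB backlog, and the function names are hypothetical; the 10 MB default and the meaning of 0 are taken from the description above, with 1 MB assumed to be 1024 * 1024 bytes.

```python
import math
from itertools import groupby

MB = 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def count_target_files(changes):
    """Count files if a new file starts each time the table name changes."""
    return sum(1 for _ in groupby(changes, key=lambda change: change[0]))


def count_cycles(backlog_bytes, cycle_byte_limit):
    """Number of integrate cycles for a backlog; a limit of 0 means one cycle."""
    if cycle_byte_limit == 0:
        return 1
    return math.ceil(backlog_bytes / cycle_byte_limit)


# Changes in capture order: (table, sequence number), interleaved by table.
changes = [("orders", 1), ("customers", 2), ("orders", 3),
           ("customers", 4), ("orders", 5)]

files_unsorted = count_target_files(changes)
# sorted() is stable, so the order of changes within a table is preserved.
files_sorted = count_target_files(sorted(changes, key=lambda c: c[0]))

cycles_default = count_cycles(95 * MB, 10 * MB)
cycles_unlimited = count_cycles(95 * MB, 0)

print(files_unsorted, files_sorted)      # 5 2
print(cycles_default, cycles_unlimited)  # 10 1
```

Under these assumptions the two settings multiply: each cycle produces one file per table when sorted, so fewer cycles means fewer and larger files per table.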

    The use of /TableProperties /Absent is supported in HVR version 5.0.4 and higher.


© 2020 HVR
