Reply To: Channel Count Best Practices

#15197
ggoodrich
Keymaster

Determining the proper number of channels/jobs to manage depends on several factors, such as:

  • Amount of transaction log generation (per hour) at each source location
  • Number of logical change operations (per second) to be integrated
  • Division of duties and ownership of data
  • Replication use case

The short answer is that fewer channels/jobs is generally better, but this may vary depending on your specific use case.

The capture job is typically extremely fast and can handle many gigabytes of redo per hour, so a single capture job is the most efficient choice whenever possible. Because capture does not store any of the changes locally, its disk I/O is practically nil. It also works in large blocks, which keeps it fast and efficient.

Integrate might need to be split into multiple jobs (locations) to spread the load across the database/file server for faster throughput. For example, a high volume of logical operations might require a few integrate jobs to meet your latency requirements. When splitting these operations into multiple integrate jobs, consider keeping similar data together for transactional consistency. Another factor to consider is the number of processing cores available on the integration server.
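As a rough illustration of the splitting logic described above, here is a minimal Python sketch. It is not an HVR API or command; the function name, the table names, the group labels, and the operations-per-second figures are all hypothetical. It assumes you have already decided which tables belong together for transactional consistency, and it spreads those groups across at most as many integrate jobs as you have cores, always placing the heaviest group on the least-loaded job.

```python
import heapq
from collections import defaultdict


def plan_integrate_jobs(tables, max_jobs):
    """Assign tables to integrate jobs (planning aid only, not an HVR call).

    tables:   list of (table_name, group_name, ops_per_sec) tuples, where
              group_name marks tables that must stay in the same job.
    max_jobs: upper bound on jobs, e.g. the integration server's core count.
    Returns a list of jobs, each a list of table names.
    """
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")

    # Collect tables and total load per group so a group is never split.
    members = defaultdict(list)
    load = defaultdict(float)
    for name, group, ops in tables:
        members[group].append(name)
        load[group] += ops

    n_jobs = min(max_jobs, len(members))
    jobs = [[] for _ in range(n_jobs)]
    heap = [(0.0, i) for i in range(n_jobs)]  # (current load, job index)
    heapq.heapify(heap)

    # Heaviest group first, each onto the currently least-loaded job.
    for group in sorted(members, key=lambda g: (-load[g], g)):
        current, idx = heapq.heappop(heap)
        jobs[idx].extend(sorted(members[group]))
        heapq.heappush(heap, (current + load[group], idx))
    return jobs


# Hypothetical workload: three groups spread over two integrate jobs.
example = plan_integrate_jobs(
    [
        ("orders", "sales", 500),
        ("order_lines", "sales", 900),
        ("customers", "crm", 200),
        ("invoices", "billing", 300),
    ],
    max_jobs=2,
)
```

In this made-up workload the two sales tables stay in one job because they share a group, and the two lighter groups share the other job. Real grouping decisions should come from your own transaction boundaries and measured change rates.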

If different groups of business users are using HVR for various replication projects, they may elect to maintain their own channels separately as they design and test their replication. They might also have different technical or business requirements that call for developing and deploying separate channels.

The scenario or use case might also affect the number of channels defined. For example, some replication requirements involve different schedules for when jobs should or should not be running. A geographically dispersed active-active use case might require integrate jobs to run continuously with near real-time latency, while a consolidated data warehousing use case might work better running in small batches. Lastly, one team might be constantly making changes to its channels and not want its hourly maintenance work to impact other teams.

So, to summarize, let’s look at an example where a company is using HVR for a couple of data replication projects. They have a large volume of data and need near-zero latency for most jobs, but not all. They also have a data mart project that requires jobs to be stopped during certain hours of the day. This company has defined two channels. The first channel has a single capture job but multiple integrate jobs to keep the latency low. Their integration server has eight cores, so they elected to split the data into eight target jobs. The second channel has a single capture job and a single integrate job. Because its tables have unique scheduling requirements (the jobs must be stopped for a few hours every day), they were split into a separate channel.
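The example company's layout can be written down as a small planning model. This is only a sketch in Python for reasoning about the design, not HVR configuration syntax. The class, the channel names, and the 01:00 to 04:00 stop window are assumptions made for illustration; the post only says the data mart jobs are stopped for a few hours each day.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional, Tuple


@dataclass(frozen=True)
class ChannelPlan:
    """Hypothetical planning record for one channel (not an HVR object)."""
    name: str
    capture_jobs: int
    integrate_jobs: int
    # Daily window during which jobs must be stopped: (start, end), end exclusive.
    stop_window: Optional[Tuple[time, time]] = None

    def should_run(self, now: time) -> bool:
        """Return True if the channel's jobs may run at the given time of day."""
        if self.stop_window is None:
            return True
        start, end = self.stop_window
        if start <= end:
            stopped = start <= now < end
        else:
            # The window wraps past midnight, e.g. 23:00 to 02:00.
            stopped = now >= start or now < end
        return not stopped


INTEGRATION_SERVER_CORES = 8  # from the example: eight cores, eight target jobs

plans = [
    # Channel 1: one capture job, integrate split across the cores for low latency.
    ChannelPlan("low_latency", capture_jobs=1, integrate_jobs=INTEGRATION_SERVER_CORES),
    # Channel 2: data mart tables with an assumed 01:00 to 04:00 stop window.
    ChannelPlan("data_mart", capture_jobs=1, integrate_jobs=1,
                stop_window=(time(1, 0), time(4, 0))),
]

total_jobs = sum(p.capture_jobs + p.integrate_jobs for p in plans)
```

Writing the plan down this way makes the reason for the second channel explicit: its only distinguishing attribute is the stop window, which the first channel must not inherit.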

 

As you can see, several factors contribute to the number of channels/jobs defined, ranging from throughput volumes to replication requirements and user needs.
