- August 3, 2018 at 4:35 pm #15011 Jared Torrence (Participant)
In our current HVR setup, nearly all of our Source Agents are sitting directly on the DB server (mostly Oracle DBs) running on Linux and our HUB is running on Linux. Our Target Agents also currently sit on the DB Server (Greenplum DB), but we are looking to stand up dedicated servers for our Target Agents and have the gpfdist communicate with Greenplum from our servers.
We have 5 common targets and were wondering whether it is best practice to have a dedicated server for each target, or whether it is possible to have multiple Agents sitting on a single server (possibly for our QA instances). Curious what the best route forward is?
- August 6, 2018 at 5:22 am #15012 Mark (Keymaster)
Great question. As is often the case with questions related to best practices, there are multiple considerations, and in the end you should decide what works best for your unique environment.
First of all remember it is important for your target agent to be close to the target database so that network communication between agent and target does not become a bottleneck. This means your agent should at least be in the same data center as the target database or when running in the cloud in the same availability zone.
A database like Greenplum is configured to use all system resources and should be balanced so that no particular resource becomes a bottleneck until the system maxes out compute power, i.e. becomes CPU-bound. With that in mind, it would arguably be best to avoid running the HVR agent on one of the active nodes in the Greenplum cluster because (1) the gpfdist utility, which HVR uses on the agent environment to get data into Greenplum as fast as possible, is a pretty resource-intensive program, and (2) even when running on the master node, access to the database may be compromised when too many resources are in use. We do commonly see Greenplum customers use the standby master node (which is idle unless in a failover scenario) to run the HVR agent, which can be a good choice to make best use of available (and typically powerful) hardware resources.
HVR agents get instructions from a hub to process data or perform queries, and are otherwise stateless. As a result you can use a single agent to serve multiple hubs. Each time a job starts (irrespective of which hub initiates the communication) a connection is established to the agent through the HVR remote listener.
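Because every job starts by connecting to the remote listener, a quick reachability check from the hub machine can be handy. The following is only a minimal sketch: it confirms that something accepts TCP connections on the given host and port, and says nothing about whether that something is actually an HVR listener. The function name is made up for illustration.

```python
import socket


def listener_accepts_connections(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This is a plain TCP check only; it does not speak the HVR protocol,
    so a True result just means some process is listening on that port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Running this against each agent host and listener port from the hub would give a rough first signal before digging into job logs.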
It is possible to install multiple agents on a single server. Just ensure different agents use different values for HVR_HOME and HVR_CONFIG, and every agent must be listening on a different port. Note that when setting up an environment like this it is crucial to always set correct values for HVR_HOME and HVR_CONFIG when making changes to the agent configuration (e.g. when upgrading the software).
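As an illustration of the rule above, here is a small sketch that checks a planned set of agent installations for shared HVR_HOME, HVR_CONFIG, or port values, and builds a per-agent environment so the right values are always set before touching one agent. The `AgentInstall` record, the function names, and any directory paths you feed it are hypothetical; only the two variable names come from HVR itself.

```python
import os
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentInstall:
    """One HVR agent installation on a shared server (hypothetical record)."""
    name: str
    hvr_home: str
    hvr_config: str
    port: int


def find_conflicts(agents):
    """Return a message for each HVR_HOME, HVR_CONFIG, or port used by 2+ agents."""
    checks = (
        ("HVR_HOME", lambda a: os.path.normpath(a.hvr_home)),
        ("HVR_CONFIG", lambda a: os.path.normpath(a.hvr_config)),
        ("port", lambda a: a.port),
    )
    conflicts = []
    for label, key in checks:
        seen = defaultdict(list)
        for agent in agents:
            seen[key(agent)].append(agent.name)
        for value, names in seen.items():
            if len(names) > 1:
                conflicts.append(f"{label} {value} is shared by {', '.join(names)}")
    return conflicts


def agent_env(agent, base_env=None):
    """Copy an environment and pin HVR_HOME and HVR_CONFIG to one agent."""
    env = dict(os.environ if base_env is None else base_env)
    env["HVR_HOME"] = agent.hvr_home
    env["HVR_CONFIG"] = agent.hvr_config
    return env
```

The dictionary returned by `agent_env` could be passed as the `env` argument of `subprocess.run` when scripting maintenance for one specific agent, which helps avoid accidentally upgrading or reconfiguring the wrong installation.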
Now regarding your question, here are some considerations:
- gpfdist can be quite resource intensive to serve the nodes in the Greenplum cluster directly. Running more than a few gpfdist processes concurrently on a single server may result in overloading a server. Of course resource consumption may be mitigated by staggered, scheduled jobs.
- Consolidating onto a single (bigger) server rather than multiple smaller servers provides more flexibility over time to direct resources to the jobs that need them most, e.g. production jobs.
- Having multiple agents enables flexibility in running different versions of the software. E.g. you may want to ensure a new version works well for your QA system before deploying it into production.
- Managing multiple agents on a single server is slightly more complex than having just one agent.
- Consider the use of a load balancer (e.g. in the cloud some of our customers use the AWS Elastic Load Balancer (ELB)) in front of an agent to be flexible in adjusting resources over time (both in size/configuration of the server and in number of servers). This consideration is independent of whether you (initially) decide to run one agent or multiple agents on a single server. Behind a load balancer you may end up with multiple servers that all run multiple identical agent configurations with HVR remote listeners on different ports (e.g. HVR 5.2 listener on port 4352 on all servers and HVR 5.3 listener on port 4353 on all servers).
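For the load balancer case in the last point, every server behind the balancer needs the same version-to-port layout, otherwise a connection may land on a server without the expected listener. A minimal sketch of such a consistency check follows; the server names and the dictionary shape are assumptions for illustration, and the ports are the example values mentioned above.

```python
def mismatched_servers(layouts, expected):
    """Return the sorted names of servers whose listener layout differs.

    layouts:  {server_name: {hvr_version: listener_port}} (hypothetical shape)
    expected: {hvr_version: listener_port} that every server should expose
    """
    return sorted(
        name for name, listeners in layouts.items() if listeners != expected
    )


# Example layout taken from the ports mentioned above.
expected = {"5.2": 4352, "5.3": 4353}
layouts = {
    "agent-a": {"5.2": 4352, "5.3": 4353},
    "agent-b": {"5.2": 4352},  # missing the 5.3 listener
    "agent-c": {"5.3": 4353, "5.2": 4352},
}
problem_servers = mismatched_servers(layouts, expected)
```

Any server reported here would need to be fixed or taken out of the balancer pool before jobs are routed to it.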
Hope this helps.
Mark.
- August 6, 2018 at 7:35 am #15013 Jared Torrence (Participant)
Thanks Mark. This is helpful. Was in line with my thinking that it’s possible but maintenance could become a bit more complex. My aim is to get fully dedicated servers for maintenance reasons and ensure we are comfortable with the load each can take as we continue to grow our channels.