- July 1, 2016 at 12:55 am #8835 Frank Knot (Participant)
Vectorwise Hadoop Edition (aka VectorH) supports table partitioning. Because this is the cluster edition of Vectorwise, running on multiple (Hadoop) nodes, partitioning tables so that work is parallelised is highly beneficial, and in practice required, for high performance.
The difficulty lies in how the partitioning is defined and in the amount of manual action it requires.
The syntax is:
create table tst_part (key1 int, key2 int, value int) with partition = (HASH on key1,key2 3 partitions)
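If you generate DDL from a script, the partition clause can be assembled programmatically. The sketch below only reproduces the syntax shown above; `partitioned_create_table` is a hypothetical helper, not part of HVR or VectorH, and it does not validate types or connect to a database.

```python
def partitioned_create_table(table, columns, hash_keys, n_partitions):
    """Build a CREATE TABLE statement with a VectorH HASH partition clause.

    Hypothetical helper for illustration: it only formats text in the
    syntax shown above and performs no validation against a database.
    """
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    # columns is a list of (name, type) pairs, e.g. [("key1", "int")]
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    keys = ",".join(hash_keys)
    return (f"create table {table} ({cols}) "
            f"with partition = (HASH on {keys} {n_partitions} partitions)")


print(partitioned_create_table(
    "tst_part",
    [("key1", "int"), ("key2", "int"), ("value", "int")],
    ["key1", "key2"],
    3,
))
```

With these arguments the output is the same statement as the example above.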
The recommended number of partitions for VectorH is:
- #rows < 100k: 25% * #nodes * #cores_per_node
- #rows < 100m (100 million): 50% * #nodes * #cores_per_node
- otherwise: #nodes * #cores_per_node
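The rule of thumb above can be written as a small function. This is a sketch: `recommended_partitions` is a hypothetical name, and rounding fractional results down with a minimum of one partition is an assumption, since the rule does not say how to round.

```python
def recommended_partitions(rows: int, nodes: int, cores_per_node: int) -> int:
    """Recommended VectorH partition count per the rule of thumb above.

    Assumption: fractional results are rounded down, with a floor of 1.
    """
    total_cores = nodes * cores_per_node
    if rows < 100_000:            # fewer than 100k rows: 25% of all cores
        n = total_cores // 4
    elif rows < 100_000_000:      # fewer than 100 million rows: 50% of all cores
        n = total_cores // 2
    else:                         # larger tables: one partition per core
        n = total_cores
    return max(1, n)


# Example cluster: 4 nodes with 16 cores each (64 cores in total)
print(recommended_partitions(50_000, 4, 16))       # 16
print(recommended_partitions(5_000_000, 4, 16))    # 32
print(recommended_partitions(500_000_000, 4, 16))  # 64
```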
You may want to use 50% of the maximum (#nodes * #cores_per_node) in all cases, to leave room for continuous loading and concurrent queries.
This is because Vectorwise/VectorH spawns a new thread per partition for each query, so the number of concurrent threads = number of partitions * number of concurrent queries. In VectorH this is mitigated because each query is distributed over the cluster nodes.
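To see why the thread count matters, the arithmetic can be sketched as below. The function names are hypothetical, and `threads_per_node` assumes threads are spread evenly over the nodes, which is a simplification of how VectorH distributes a query.

```python
def concurrent_threads(partitions: int, concurrent_queries: int) -> int:
    """Threads in flight when every query spawns one thread per partition."""
    return partitions * concurrent_queries


def threads_per_node(partitions: int, concurrent_queries: int, nodes: int) -> int:
    """Per-node load, assuming threads are spread evenly across nodes.

    The even spread is an assumption; the result is rounded up.
    """
    total = concurrent_threads(partitions, concurrent_queries)
    return (total + nodes - 1) // nodes


# 4 nodes x 16 cores, 10 concurrent queries
for parts in (64, 32):
    print(parts, concurrent_threads(parts, 10), threads_per_node(parts, 10, 4))
# 64 partitions: 640 threads in total, 160 per 16-core node
# 32 partitions: 320 threads in total, 80 per 16-core node
```

Even at 50% of the maximum, ten concurrent queries already oversubscribe each node several times over, which is the reason for staying conservative.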
As a consequence, HVR cannot determine the number of partitions automatically: the expected number of rows and concurrent queries for a newly created table are unknown when the table is created.
You can define an Environment action for the variable:
When this variable is set to n, HVR creates n partitions for each table it creates.