When does HVR use temp files, where do they go and how big are they?


HVR tries to ‘stream’ data, but in some situations it has to create temp files.

HVR’s temp files are created in $HVR_TMP; if that is not defined, they are located in $HVR_CONFIG/tmp/.
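For illustration, here is a minimal Python sketch of that lookup order (the function name is hypothetical; HVR is a native program and does not expose such an API):

```python
import os

def hvr_tmp_dir():
    """Resolve where HVR-style temp files would go (illustrative only)."""
    # $HVR_TMP wins if it is set and non-empty...
    tmp = os.environ.get("HVR_TMP")
    if tmp:
        return tmp
    # ...otherwise fall back to $HVR_CONFIG/tmp/.
    return os.path.join(os.environ["HVR_CONFIG"], "tmp")
```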

HVR uses temp files at various moments. The following is an [incomplete] list:

1. During log-based capture, if a transaction exceeds a limit (all limits are detailed below), the capture job spills it to disk on the capture machine.

2. Row-wise compare and refresh sometimes need to sort data outside the DBMS. If the data in a table exceeds a limit, it is spilled into multiple sort-merge files on the target machine.

3. During row-wise refresh, if the number of changes detected exceeds a limit, these changes are spilled to disk.

4. During row-wise verbose compare or refresh from the GUI, if the number of changes detected exceeds a limit, these changes are spilled to disk.

5. If a single LOB value (e.g. a long varchar or CLOB) exceeds a limit, its contents are spilled to disk.

6. Action Transform with parameter /Command directs the output of a command transform to a temp file and then reads it back.

7. When bulk-loading data into Greenplum or Redshift, HVR puts the data into a ‘staging’ file first.

    The size of these files depends on the data that HVR is processing (obviously).

Several of the cases above say that HVR spills to disk if a limit is exceeded.

For cases 1, 3, 4 & 5 this limit is controlled by the environment variable $HVR_TEMP_SPILL. It caps the number of bytes used by the temporary ‘blobs’ in HVR’s memory; a blob spills to disk once its size exceeds (X-T)/N, where X = $HVR_TEMP_SPILL, T = the total size of currently cached temp objects and N = the number of open cached objects. The default for $HVR_TEMP_SPILL is 64M (bytes). This means that, by default, a single blob will spill to disk if it is bigger than 32M.
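As a worked example of that formula (a sketch only; the value n=2 is an assumption picked so the arithmetic reproduces the 32M default mentioned above):

```python
def spill_threshold(x=64 * 1024 * 1024, t=0, n=2):
    """Bytes a temp blob may reach before spilling: (X - T) / N.

    x: $HVR_TEMP_SPILL (default 64M)
    t: total size of currently cached temp objects (assume none cached)
    n: number of open cached objects (assumed 2, matching the 32M default)
    """
    return (x - t) / n

print(spill_threshold() / (1024 * 1024))  # -> 32.0, the default per-blob limit
```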

For case 2 (sorting), the ‘spill limit’ is controlled by the environment variables $HVR_SORT_BYTES_LIMIT (default 512MB) and $HVR_SORT_ROW_LIMIT (default 10M rows). If the data exceeds either of these amounts then HVR’s sort will spill to disk, but it will never spill if the number of rows is below $HVR_SORT_ROW_MIN (default 128).
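A minimal sketch of that decision logic as just described (defaults hard-coded; this is illustrative, not HVR’s actual code):

```python
HVR_SORT_BYTES_LIMIT = 512 * 1024 * 1024  # default 512MB
HVR_SORT_ROW_LIMIT = 10_000_000           # default 10M rows
HVR_SORT_ROW_MIN = 128                    # default 128 rows

def sort_spills_to_disk(n_bytes: int, n_rows: int) -> bool:
    # Tiny inputs never spill, regardless of byte count.
    if n_rows < HVR_SORT_ROW_MIN:
        return False
    # Otherwise spill once either limit is exceeded.
    return n_bytes > HVR_SORT_BYTES_LIMIT or n_rows > HVR_SORT_ROW_LIMIT
```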

HVR’s native [uncompressed] format was originally quite inflated, because a varchar(1000) containing “hello world” is held as 1000 bytes, not as 11 bytes. So, for older HVR versions, for cases 1, 2 and 3 above (log-based capture transactions, sorting and row-wise changes), the temp files were sometimes much bigger than the original table data.
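To see the inflation concretely, compare the actual byte count of that value with its fixed-width representation:

```python
value = "hello world"
declared_width = 1000  # varchar(1000)

actual = len(value.encode("utf-8"))   # 11 bytes of real data
fixed = declared_width                # 1000 bytes in the uncompressed format
print(actual, fixed, fixed / actual)  # 11 1000 -> roughly 90x inflation
```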

To combat this ‘inflation’, HVR compresses most of its temp files since version 4.6.1:

In HVR 4.6.1, temp files for cases 1, 3, 4 & 5 are compressed unless $HVR_TMP_NO_COMPRESS=1 is set. Since HVR 4.6.2, they are compressed unless $HVR_TMP_COMPRESS_LEVEL=0 is set.

Before HVR 4.6.2, files for case 2 (sorting during row-wise compare and refresh) are compressed only if $HVR_SORT_COMPRESS=1 is set. Since 4.6.2, they are compressed unless $HVR_SORT_COMPRESS_LEVEL=0 is set.
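Those version-dependent rules can be summarized as follows (a sketch, assuming the HVR version is given as a tuple like (4, 6, 2); the helper functions are hypothetical, not part of HVR):

```python
import os

def blob_tmp_compressed(version):
    """Cases 1, 3, 4 & 5: capture transactions, row-wise changes, LOBs."""
    if version >= (4, 6, 2):
        return os.environ.get("HVR_TMP_COMPRESS_LEVEL") != "0"
    if version >= (4, 6, 1):
        return os.environ.get("HVR_TMP_NO_COMPRESS") != "1"
    return False  # older versions did not compress these temp files

def sort_tmp_compressed(version):
    """Case 2: sort-merge files during row-wise compare/refresh."""
    if version >= (4, 6, 2):
        return os.environ.get("HVR_SORT_COMPRESS_LEVEL") != "0"
    return os.environ.get("HVR_SORT_COMPRESS") == "1"  # opt-in before 4.6.2
```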
