Friday, March 25, 2022
Use AQUA with Amazon Redshift RA3.xlplus nodes


Amazon Redshift RA3 is the latest generation node type that lets you scale compute and storage for your data warehouses independently. The RA3 node family includes RA3.16xlarge, RA3.4xlarge, and RA3.xlplus nodes for large, medium, and small workloads, respectively. RA3.xlplus, the newest member of the RA3 node family, offers one third of the computing power of RA3.4xlarge at one third of the price. RA3.xlplus is the smallest node in the RA3 family, but it offers the same advanced functionality. It has been widely used in environments with light computing demand such as QA, data analytics for small teams, or processing smaller datasets.

In 2021, Amazon Redshift launched AQUA (Advanced Query Accelerator) for Amazon Redshift to boost performance of analytical queries that scan, filter, and aggregate large datasets. AQUA uses AWS-designed processors with the AWS Nitro chip adapted to speed up data encryption and compression, and custom analytical processors implemented in FPGAs to accelerate applications requiring text search of a very large dataset, such as marketing and personalization.

Customers have asked us to support AQUA for RA3.xlplus, and we recently launched AQUA for RA3.xlplus nodes. In this post, we continue to build on the post AQUA (Advanced Query Accelerator) – A Speed Boost for Your Amazon Redshift Queries and show that with AQUA support, RA3.xlplus provides the same benefit as the currently supported RA3 nodes in the following areas:

  • Automatically boosting certain types of queries
  • Reducing the impact on your Amazon Redshift cluster by offloading certain queries that scan, filter, and aggregate large datasets to AQUA

Test environment

To test AQUA for RA3.xlplus, we started by creating an RA3.xlplus cluster with the following details:

  • Amazon Redshift cluster – 2-node RA3.xlplus
  • Dataset – 3 TB TPC-DS, 3 TB TPC-H
  • Query set – Sample queries based on the TPC-H and TPC-DS workload

Sample queries

To test AQUA, we created six text search queries that scan, filter, and aggregate the lineitem table in the TPC-H dataset, which has 18 billion rows, with a WHERE clause predicate against the l_comment column.

The following table summarizes our table definition.

table | encoded | diststyle | sortkey1 | rows
lineitem | Y | KEY | l_shipdate | 18,000,048,306

We randomly generated a query set with queries of varying complexity. The queries are designed to measure scan rate, which is an area of focus for AQUA. Each query has a predicate with LIKE and OR. The number of LIKE or OR predicates gets progressively higher to simulate complex workloads.

For example, Query 1 has one OR predicate:

SELECT COUNT(l_orderkey)
FROM lineitem
WHERE (l_comment LIKE '%throughout%') OR (l_comment LIKE '%courageous,%');

In contrast, Query 4 has 50 OR predicates:

SELECT COUNT(l_orderkey)
  FROM lineitem
  WHERE (l_comment LIKE '%outsi%') OR
  (l_comment LIKE '%uthless%') OR
  (l_comment LIKE '%capades%') OR
  (l_comment LIKE '%horses%') OR
  (l_comment LIKE '%ornis%' AND l_comment LIKE '%phins?%') OR
  (l_comment LIKE '%affix%') OR
  (l_comment LIKE '%integrat%') OR
....
  (l_comment LIKE '%ithin%' AND l_comment LIKE '%quiet%') OR
  (l_comment LIKE '%taphs%') OR
  (l_comment LIKE '%dugouts%' AND l_comment LIKE '%ches%') OR
  (l_comment LIKE '%telets%' AND l_comment LIKE '%detect!%') OR
  (l_comment LIKE '%develop%') OR
  (l_comment LIKE '%promise!%') OR
  (l_comment LIKE '%was%') OR
  (l_comment LIKE '%accounts%') OR
  (l_comment LIKE '%idly%' AND l_comment LIKE '%deposits%') OR
  (l_comment LIKE '%combine!%' AND l_comment LIKE '%rely%') OR
  (l_comment LIKE '%ins%' AND l_comment LIKE '%makes use of!%') OR
  (l_comment LIKE '%epitaphs!%' AND l_comment LIKE '%breac%') OR
  (l_comment LIKE '%pliers%' AND l_comment LIKE '%phins%') OR
  (l_comment LIKE '%hogs%' AND l_comment LIKE '%sentiments%') OR
  (l_comment LIKE '%ctions%' AND l_comment LIKE '%daringly%') OR
  (l_comment LIKE '%ies%' AND l_comment LIKE '%esias%');

The following table summarizes the complexity of each query.

Query Number | Number of OR | Number of LIKE
Query 1 | 1 | 2
Query 2 | 5 | 7
Query 3 | 10 | 12
Query 4 | 50 | 66
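
Queries of this shape are easy to generate programmatically. The following Python sketch is an illustration only, not the script used for these tests: `like_clause`, `build_text_scan_query`, and `complexity` are hypothetical helpers. Patterns inside a group are joined with AND and the groups are joined with OR, mirroring the structure of Query 1 and Query 4.

```python
def like_clause(column, pattern):
    """Return a single LIKE predicate that matches `pattern` as a substring."""
    # Double any single quotes so the pattern is a valid SQL string literal.
    # LIKE wildcards (% and _) inside the pattern are not escaped here.
    escaped = pattern.replace("'", "''")
    return f"{column} LIKE '%{escaped}%'"


def build_text_scan_query(groups, table="lineitem", column="l_comment"):
    """Build a COUNT query whose WHERE clause ORs together groups of LIKEs.

    `groups` is a list of lists; patterns inside a group are ANDed,
    and the groups themselves are ORed.
    """
    predicates = [
        "(" + " AND ".join(like_clause(column, p) for p in group) + ")"
        for group in groups
    ]
    where = " OR\n  ".join(predicates)
    return f"SELECT COUNT(l_orderkey)\nFROM {table}\nWHERE {where};"


def complexity(groups):
    """Return (number of OR, number of LIKE) for a list of pattern groups."""
    return len(groups) - 1, sum(len(group) for group in groups)


# Two single-pattern groups reproduce the shape of Query 1.
query_1_groups = [["throughout"], ["courageous,"]]
print(build_text_scan_query(query_1_groups))
print(complexity(query_1_groups))
```

Adding a second pattern to a group, for example `["ornis", "phins?"]`, adds one LIKE without adding an OR, which is why the LIKE count grows faster than the OR count in the table.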

Scan performance improvement with AQUA

We ran the four queries sequentially without any other workload on the system. With AQUA, the performance improvements range from approximately 7 to 13 times faster, as summarized in the following table.

Query Number | Amazon Redshift with AQUA (seconds) | Amazon Redshift Only (seconds) | Improvement
Query 1 | 78.53 | 635.89 | 709.74%
Query 2 | 92.75 | 810.04 | 773.36%
Query 3 | 130.68 | 956.83 | 632.19%
Query 4 | 137.68 | 1950.9 | 1316.98%
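
The Improvement column is consistent with the runtime saved divided by the AQUA runtime. That formula is inferred from the numbers in the table rather than stated in the post, so treat the following Python sketch as a check under that assumption; `improvement_pct` is a hypothetical helper name.

```python
# Runtimes in seconds, copied from the table above: (with AQUA, Redshift only).
runtimes = {
    "Query 1": (78.53, 635.89),
    "Query 2": (92.75, 810.04),
    "Query 3": (130.68, 956.83),
    "Query 4": (137.68, 1950.9),
}


def improvement_pct(with_aqua, without_aqua):
    """Percentage improvement relative to the AQUA runtime (assumed formula)."""
    return (without_aqua - with_aqua) / with_aqua * 100


improvements = {
    name: round(improvement_pct(aqua, base), 2)
    for name, (aqua, base) in runtimes.items()
}

for name, pct in improvements.items():
    print(f"{name}: {pct:.2f}%")
```

Under this definition, an improvement of 709.74% means the Redshift-only run took a little over eight times as long as the AQUA run.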

AQUA impact on multiple workloads

In this environment, we simulated a multi-user workflow using TPC-DS queries on the Amazon Redshift cluster. We recorded query runtime for three scenarios:

  • Baseline – We measured the end-to-end runtime running all TPC-DS queries serially on the Amazon Redshift cluster. In this scenario, AQUA was off and no additional workload was run (a single user was on the cluster).
  • Baseline with additional workload – This was the same as the baseline scenario with an additional workload run in parallel. We simulated a user load by running text scan queries randomly selected from Query 1, Query 2, and Query 3. These queries have relatively short runtimes. We had two versions of this scenario:
    • AQUA turned off
    • AQUA turned on

From the results, we observed the following:

  • With AQUA turned on for all workloads, the impact of a text scan query on the baseline runtime was negligible.
  • Without AQUA, the baseline runtime was impacted by the additional workload created with text scan queries. In our case, overhead was about 31%.

Metric | Baseline | Baseline with additional workload (AQUA turned off) | Baseline with additional workload (AQUA turned on) | Improvement with AQUA
TPC-DS End-to-End Time | 3:43:35 | 4:54:50 | 3:44:36 | 31.27%
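
The end-to-end times can be compared in the same way after converting H:MM:SS to seconds. The Python sketch below is illustrative, and `to_seconds` and `pct_change` are hypothetical helpers. It assumes the reported 31.27% is the AQUA-off runtime measured relative to the AQUA-on runtime, which matches the figures in the table, and it also computes the overhead of each variation relative to the baseline.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def pct_change(new, reference):
    """Percentage by which `new` exceeds `reference`."""
    return (new - reference) / reference * 100


baseline = to_seconds("3:43:35")   # no additional workload, AQUA off
aqua_off = to_seconds("4:54:50")   # additional workload, AQUA off
aqua_on = to_seconds("3:44:36")    # additional workload, AQUA on

print(f"Improvement with AQUA: {pct_change(aqua_off, aqua_on):.2f}%")
print(f"Overhead without AQUA: {pct_change(aqua_off, baseline):.2f}%")
print(f"Overhead with AQUA:    {pct_change(aqua_on, baseline):.2f}%")
```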

Single-node RA3.xlplus support

AQUA also supports the recently launched Amazon Redshift single-node RA3.xlplus. In a single-node configuration, the resource is shared among all Amazon Redshift operations, which are traditionally handled separately by a leader node and compute nodes. A single-node configuration is typically used in a personal or small team setting for data exploration.

We ran the same set of queries as before using Query 1, Query 2, and Query 3. The results demonstrated that AQUA provides a similar level of acceleration for these queries in a single-node environment.

Query Number | Amazon Redshift with AQUA (seconds) | Amazon Redshift Only (seconds) | Improvement
Query 1 | 157.91 | 1,254.03 | 694.13%
Query 2 | 193.64 | 2,037.79 | 952.36%
Query 3 | 260.75 | 2,495.85 | 857.19%

Summary

In this post, we ran a set of simulated performance tests on the Amazon Redshift RA3.xlplus platform with AQUA. With AQUA on, RA3.xlplus provides the same benefit as previously supported platforms. It provides a query scan performance boost with AQUA-supported operators, which will grow over time. It can reduce the performance impact on your existing workflow by offloading the scan to AQUA.

We invite you to share your comments and use cases with the Amazon Redshift AQUA team.

For more information about how AQUA accelerates Amazon Redshift, see AQUA (Advanced Query Accelerator) for Amazon Redshift.

For more information about queries accelerated by AQUA, see When does Amazon Redshift use AQUA to run queries?


About the Authors

Quan Li is a Senior Database Engineer at Amazon Redshift. His focus is enabling customers to deliver maximum business value. Quan is passionate about optimizing high-performance analytical databases. During his spare time, he enjoys traveling and experiencing different types of cuisines with his family.

Steffen Rochel is a Sr. Software Development Manager at AWS. He is focused on data analytics acceleration. He has expertise in hardware-software design and operation of large-scale, high-performance distributed systems.
