Write Efficient Queries
Tips for writing efficient queries
When you write a query, you want to make sure it runs as efficiently as possible. Here are some tips to help you write efficient queries:
-
Limit the columns in the
SELECT
clause: Only request the columns you need, as it reduces the amount of data the query engine needs to process. -
Use the
LIMIT
clause: If you are only interested in a specific number of rows, use theLIMIT
clause to avoid processing more data than necessary. -
Filter early and use predicate pushdown: Apply filters as early as possible in the query to reduce the amount of data being processed. This takes advantage of predicate pushdown, which pushes filter conditions down to the storage layer, reducing the amount of data read from storage. For example, if you only need data from a specific date range, filter on the date column as early as possible.
-
Use
UNION ALL
instead ofUNION
: If you’re combining the results of multiple queries, useUNION ALL
instead ofUNION
to avoid the overhead of removing duplicate rows. -
Only order when necessary: Ordering results can be computationally expensive. If you don’t need ordered results, avoid using
ORDER BY
. -
Always use the actual data while filtering: Do not use functions on the filter columns: For example, if you want to filter on a date, do not use date_trunc(‘day’, block_timestamp) > ‘2022-01-01’. Instead, use block_timestamp > ‘2022-01-01’. The first example will not be able to use the min/max values of the block_time column to skip entire parquet files or row groups within files while scanning through a table, while the second example will. The same goes for other functions, such as substr, lower, upper etc.
-
Use
UNION ALL
instead ofOR
: If you need to combine the results of multiple queries, useUNION ALL
instead ofOR
to avoid the overhead of removing duplicate rows. For example, instead of usingSELECT * FROM table WHERE column = 'value1' OR column = 'value2'
, useSELECT * FROM table WHERE column = 'value1' UNION ALL SELECT * FROM table WHERE column = 'value2'
. -
User lower-case address directly in SQL: If you are filtering on an address, use the lower-case address directly in the SQL query. For example, instead of using
SELECT * FROM transactions WHERE from_address = lower('0x1234567890ABCDEF')
, useSELECT * FROM transactions WHERE from_address = '0x1234567890abcdef'
. This allows the query engine to use the min/max values of the from_address column to skip entire ORC files or row groups within files while scanning through a table.