Exadata Smart Scan

What is Smart Scan?

Smart Scan is one of the great feature in Oracle Exadata. With this technology storage send only required rows to database node from storage instead of entire Oracle Block. Multiple rows are stored in one Oracle Block but non-exadata system return entire block even only one rows is required. On the other hand, Exadata Storage returns only interested rows but not entire block.

How does Smart Scan work?

  • With Smart Scan technology, the database servers or compute nodes send additional query details to the storage cells via a protocol known as iDB (Intelligent Database Protocol)
  • Armed with this information, the storage cells can take over a large portion of the data-intensive query processing
  • Exadata storage cells can search the storage layer with this added intelligence about the query and send only the relevant bytes, not all the database blocks, to the database compute nodes
  • The Smart Scan sends a more concentrated set of rows and columns directly to the program Global Area(PGA) of the requesting process instead of the data buffers in the SGA

Smart Scan includes:

  • Full Table and Fast Full Index Scans: Scans are performed inside Exadata Storage Server, rather than transporting all the data to the database server.
  • Predicate filtering: Only the requested rows are returned to the database server, rather than all the rows in a table.
  • Column filtering: Only the requested columns are returned to the database server, rather than all the table columns.
  • Join filtering: Join processing using Bloom filters are offloaded to Exadata Storage Server.

How classic or non exadata database processes I/O or SQL request:

With conventional storage, all the database intelligence resides in the software on the database server. To illustrate how SQL processing is performed in this architecture, an example of a table scan is shown in the graphic on the screen:

  1. The client issues a SELECT statement with a predicate to filter a table and return only the
    rows of interest to the user.
  2. The database kernel maps this request to the file and extents containing the table.
  3. The database kernel issues the I/Os to read all the table blocks.
  4. All the blocks for the table being queried are read into memory.
  5. SQL processing is conducted against the data blocks searching for the rows that satisfy the predicate.
  6. The required rows are returned to the client.

As is often the case with large queries, the predicate filters out most of the rows in the table. Yet all the blocks from the table need to be read, transferred across the storage network, and copied into memory. Many more rows are read into memory than required to complete the requested SQL operation. This generates a large amount of unproductive I/O, which wastefully consumes resources and impacts application throughput and response time.

How exadata database processes I/O or SQL request by using Smart Scan:

Queries that perform table scans can be processed within Exadata cells and return only the required subset of data to the database server. Row filtering, column filtering, some join processing, and other functions can be performed within Exadata cells. Exadata Storage Server uses a special directread mechanism for Smart Scan processing.

The graphic on the screen illustrates how a table scan operates with Exadata cell storage:

  1. The client issues a SELECT statement to return some rows of interest.
  2. The database kernel determines that the data is stored on Exadata cells, so an iDB command representing the SQL command is constructed and sent to the Exadata cells.
  3. The Exadata Storage Server software scans the data blocks to extract the relevant rows and columns that satisfy the SQL command.
  4. Exadata cells return to the database instance iDB messages containing the requested rows and columns of data. These results are not block images, so they are not stored in the buffer cache.
  5. The database kernel consolidates the result sets from across all the Exadata cells. This is similar to how the results from a parallel query operation are consolidated.
  6. The rows are returned to the client.

Moving SQL processing off the database server frees server CPU cycles and eliminates a massive amount of unproductive I/O transfers. These resources are free to better service other requests. Queries run faster, and more of them can be processed.

Example:

SQL> create tablespace tbs_test
datafile '+DATA'
size 2048m;

User created.

SQL> create user smartscan identified by samartscan
default tablespace tbs_test
temporary tablespace temp
quota unlimited on tbs_test;  

Tablespace created.

SQL> grant dba to smartscan;

Grant succeeded.

Change below parameter otherwise will get – ORA-30009: Not enough memory for CONNECT BY operation

SQL> connect smartscan/samartscan
Connected.

SQL> alter session set workarea_size_policy=manual;

Session altered.

SQL> alter session set sort_area_size=1048576000;

Session altered.
SQL> create table sales
    nologging
    as
    select
    rownum as sales_id,
    'iPhone X' as product_name,
    mod(rownum,5) as channel_id,
    mod(rownum,500) as customer_id ,
    1200 as amount,
   trunc(sysdate - 10000 + mod(rownum,10000)) as order_date,
   trunc(sysdate - 9999 + mod(rownum,10000)) as ship_date
   from dual connect by level<=2e7;

Table created.

SQL> select sum(bytes)/1048576 SIZE_MB from user_segments where segment_name='SALES';

   SIZE_MB
----------
      992

SQL> select count(*) from SALES;

  COUNT(*)
----------
  20000000

Smart Scan Testing:

SQL> show parameter cell_offload_processing

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
cell_offload_processing              boolean     TRUE

SQL> select product_name,channel_id,sum(amount) from sales
where order_date>sysdate -30
group by product_name,channel_id; 

PRODUCT_ CHANNEL_ID SUM(AMOUNT)
-------- ---------- -----------
iPhone X          3    14400000
iPhone X          0    12000000
iPhone X          4    14400000
iPhone X          1    14400000
iPhone X          2    14400000

Elapsed: 00:00:02.28

Execution Plan is showing that Smart Scan using.

SQL> set linesize 180
SQL> set pagesize 100
SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  cz6csjz4kqq0r, child number 0
-------------------------------------
select product_name,channel_id,sum(amount) from sales where
order_date>sysdate -30 group by product_name,channel_id

Plan hash value: 2895541888

------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |       |       | 34909 (100)|          |
|   1 |  HASH GROUP BY             |       |  1393 | 62685 | 34909   (2)| 00:06:59 |
|*  2 |   TABLE ACCESS STORAGE FULL| SALES |  1393 | 62685 | 34908   (2)| 00:06:59 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("ORDER_DATE">SYSDATE@!-30)
       filter("ORDER_DATE">SYSDATE@!-30)

Note
-----
   - dynamic sampling used for this statement (level=2)


25 rows selected.

View Smart Scan Statistics of this session:

SQL> SELECT a.NAME, b.VALUE/1048576 MB
FROM v$sysstat a, v$mystat b
WHERE a.statistic# = b.statistic#
AND ( a.NAME IN
('physical read total bytes', 'physical write total bytes',
'cell IO uncompressed bytes')
OR a.NAME LIKE 'cell physical%');  

NAME                                                                     MB
---------------------------------------------------------------- ----------
physical read total bytes                                          2960.375
physical write total bytes                                       1093.59375
cell physical IO interconnect bytes                              1100.52454
cell physical IO bytes saved during optimized file creation               0
cell physical IO bytes saved during optimized RMAN file restore           0
cell physical IO bytes eligible for predicate offload            2955.30469
cell physical IO bytes saved by storage index                             0
cell physical IO bytes sent directly to DB node to balance CPU            0
cell physical IO interconnect bytes returned by smart scan       1.86048126
cell IO uncompressed bytes                                       2955.30469

10 rows selected.

Check same query execution time after disabling the Smart Scan. With Smart Scan enable, the query execution time was around 2.86 seconds but without smart scan the same query execution (below example) time is around 21.63 seconds which is 10 times slower than earlier (when smart scan was enabled).

SQL> conn smartscan/samartscan
Connected.

SQL> alter session set cell_offload_processing=false;

Session altered.

SQL> alter system flush buffer_cache;

System altered.

SQL> set timing on
SQL> select product_name,channel_id,sum(amount) from sales
where order_date>sysdate -30
group by product_name,channel_id;

PRODUCT_NAME         CHANNEL_ID SUM(AMOUNT)
-------------------- ---------- -----------
iPhone X                      3    14400000
iPhone X                      0    12000000
iPhone X                      4    14400000
iPhone X                      2    14400000
iPhone X                      1    14400000

Elapsed: 00:00:21.63

When cell_offload_processing parameter was true (above example) Smart Scan return only 1.86 MB to database sever. But without Smart Scan, it returns around 1970 MB to database server means around 2 GB data need to process by database to make final result set for user.

SQL> SELECT a.NAME, b.VALUE/1048576 MB
FROM v$sysstat a, v$mystat b
WHERE a.statistic# = b.statistic# AND ( a.NAME IN
('physical read total bytes', 'physical write total bytes', 'cell IO uncompressed bytes')
OR a.NAME LIKE 'cell physical%');

NAME                                                                     MB
---------------------------------------------------------------- ----------
physical read total bytes                                        1970.42969
physical write total bytes                                                0
cell physical IO interconnect bytes                              1970.42969
cell physical IO bytes saved during optimized file creation               0
cell physical IO bytes saved during optimized RMAN file restore           0
cell physical IO bytes eligible for predicate offload                     0
cell physical IO bytes saved by storage index                             0
cell physical IO bytes sent directly to DB node to balance CPU            0
cell physical IO interconnect bytes returned by smart scan                0
cell IO uncompressed bytes                                                0

10 rows selected.

When Smart Scan happens:

  • Full Table Scans
  • Full Index Scans
  • Direct-path reads
  • Direct-path reads are automatically used for parallel queries
  • Not used by default for serial scans of small tables Can be forced via _serial_direct_read=TRUE at either session or system level

How to enable smart scan:

  • Must be enabled at the database
    • CELL_OFFLOAD_PROCESSING value set  to TRUE (default)
  • At ASM Diskgroup level via attribute
    • ‘cell.smart_scan_capable’=’TRUE’

Note::: Smart Scan can be defined during ASM DiskGroup creation time with CREATE DISKGROUP or Later with changing the DISKGROUP Attribute. SQL> ALTER DISKGROUP SET ATTRIBUTE.