性能调优是一个对系统资源均衡取舍的过程
Kaya 发表于 os2ora.com
这个世界上没有绝对的真理。大家都认为对的东西,在某种场合却是不适宜的。
Oracle有一个很有用的扩展,叫做multi_table_insert, 它可以实现把一个子查询的结果根据条件把每件记录插入到一个或多个表中。例如下面的例子:
INSERT ALL INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date, sales_sun) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+1, sales_mon) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+2, sales_tue) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+3, sales_wed) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+4, sales_thu) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+5, sales_fri) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+6, sales_sat) SELECT product_id, customer_id, weekly_start_date, sales_sun, sales_mon, sales_tue, sales_wed, sales_thu, sales_fri, sales_sat FROM sales_input_table;
它实现把sales_input_table里的每条记录变化成7条记录,并插入到目标表sales中。稍微做下改写,让每条记录插入到两个表中,就是下面的例子:
INSERT ALL INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date, sales_sun) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+1, sales_mon) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+2, sales_tue) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+3, sales_wed) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+4, sales_thu) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+5, sales_fri) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+6, sales_sat) -- INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date, sales_sun) INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+1, sales_mon) INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+2, sales_tue) INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+3, sales_wed) INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+4, sales_thu) INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+5, sales_fri) INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+6, sales_sat) SELECT product_id, customer_id, weekly_start_date, sales_sun, sales_mon, sales_tue, sales_wed, sales_thu, sales_fri, sales_sat FROM sales_input_table;
和下面的实现比较,哪种方式性能好呢?
INSERT ALL INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date, sales_sun) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+1, sales_mon) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+2, sales_tue) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+3, sales_wed) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+4, sales_thu) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+5, sales_fri) INTO sales (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+6, sales_sat) SELECT product_id, customer_id, weekly_start_date, sales_sun, sales_mon, sales_tue, sales_wed, sales_thu, sales_fri, sales_sat FROM sales_input_table; -- INSERT ALL INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date, sales_sun) INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+1, sales_mon) INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+2, sales_tue) INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+3, sales_wed) INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+4, sales_thu) INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+5, sales_fri) INTO sales2 (prod_id, cust_id, time_id, amount) VALUES (product_id, customer_id, weekly_start_date+6, sales_sat) SELECT product_id, customer_id, weekly_start_date, sales_sun, sales_mon, sales_tue, sales_wed, sales_thu, sales_fri, sales_sat FROM sales_input_table;
或许大家都认为第一种方式会运行得更快。毕竟,第一种方式比第二种方式少了一遍对sales_input_table的扫描。
正确答案呢?It depends!
第二种方式不一定比第二种方式慢,相反地,第二种方式比第一种方式具有更好的“可扩展性”。
在第一种方式下,当表sales和sales2(可能还有sales3,sales4)是一个有着大量分区(例如1024个子分区)的分区表时,用并行方式执行这条语句将需要大量的内存。一个可能的后果就是系统内存会变得严重不足,最终可以引起OS频繁进行Memory的Swapping,导致整个OS失去响应。
对于第二种方式,从系统资源的角度看,这是一个CPU-Bound的过程,IO在这里并不是瓶颈。多一遍对sales_input_table的扫描其实并不会带来性能上的明显下降。
可以看出,这里我们需要对CPU, IO, Memory的资源利用情况进行综合的分析,才能最终确定一个最适宜的方式。并没有一个公式化的经验,证明multi_table_insert永远都是优于single table insert的。

最近评论