Wednesday 12 August 2015

Partitioning in Hive


Partitioning of data is often used for distributing load horizontally, moving data physically closer to its most frequent users, and other purposes. Hive has the notion of partitioned tables for producing performance benefits, and they can help organize data in a logical fashion, such as hierarchically.

Query- Create table Partition2 (id1 int, name string) 
Partitioned by (dept string)
Row format delimited
Fields terminated by '\t';

You can view the partitions of a table by using SHOW PARTITIONS command:

Query- Show partitions Partition2;

You can also view the number of partitions created in HDFS by typing 

“localhost.localdomain:50075” in a web browser.

Loading data into partitioned table:-

Loading the data into the partitioned table can be of static or dynamic depending upon the data you load into the table.

Static Partition:-

Static partition requires individual loads of data to be loaded into the table for each partitions.
Query -load data local inpath '/home/training/Desktop/part1.txt' into table Partition1 partition (dept="cse");

 Dynamic Partition:-

Loading data into dynamic partitions allows single insert to the partitioned table. The following properties should be enabled while performing the dynamic partitions

Query - set hive.exec.dynamic.partiton=true;
set hive.exec.dynamic.partiton.mode=nonstrict;
set hive.exec.max.dynamic.partiton.pernode=1000;
load data local inpath '/home/training/Desktop/part1.txt' into table Partition1 partition (dept);

To view the data from a particular partition:-

Query -Show partitions clus2 partition (dept="cse");

It is also possible to add a new partition to the existing tables.

Query - Alter table clus2 add if not exists partition (year=2015);

To drop a partition from a table:-

Query - Alter table clus2 drop partition (dept=”cse”);

No comments:

Post a Comment