Wednesday 12 August 2015

Views and Indexes in hive

Views:-

Views are a logical construct that can be used to simplify queries by either abstracting away complexities such as joins or sub-queries or by pre-filtered or restricting data.

Input file:-
1, San Jose, Men’s Clothing, 214.05, Amex
2, Fort Worth, Women’s Clothing, 153.57, Visa
3, San Diego, Music, 66.08, Cash
4, Pittsburgh, Pet Supplies, 493.51, Discover
5, New York, Consumer Electronics, 296.8, Cash
6, Corpus Christi, Toys, 25.38, Discover
7, Fort Worth, Toys, 213.88, Visa
8, Greensboro, DVDs, 290.82, MasterCard
9, Riverside, Consumer Electronics, 252.88, Cash
10,Reno,Crafts,88.25,Visa

The following query creates a managed table for which the view is to be applied -

Create table view_sample (id int, city string, particular string, amount float, cardtype string)
Row format delimited fields terminated by ','
Stored as textfile;

The following query loads data into table view_sample-

Load data local inpath 'views' overwrite into table view_sample;

The below query creates a view of all data for the amount greater than 100 -

Create view view1 as select * from view_sample where amount>100;

Now retrieving the rows from the created view-

Select * from view1;

Output of the view:-
1          San Jose           Men's Clothing            214.05 Amex
2          Fort Worth      Women's Clothing      153.57 Visa
4          Pittsburgh        Pet Supplies     493.51 Discover
5          New York         Consumer Electronics 296.8   Cash
7          Fort Worth      Toys     213.88 Visa
8          Greensboro     DVDs   290.82 MasterCard
9          Riverside         Consumer Electronics 252.88 Cash
Time taken: 19.958 seconds

Dropping a view:-

Drop view view1;

Index:-

Indexes are a critical for efficiently retrieving rows from tables in less time. The following query creates an index for a table column amount-

Create index index_amount on table view_sample (amount) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild;

WITH DEFERRED REBUILD portion of the command prevents the index from immediately being built.

Now retrieving the rows of a particular column for which the index is been built -

Select amount from view_sample;

Output of the indexed column:-
214.05
153.57
66.08
493.51
296.8
25.38
213.88
290.82
252.88
88.25
Time taken: 16.026 seconds

Dropping an index:-

Drop index index_amount on view_sample;

2 comments: