Hive UDF also called as Hive
streaming, is used when the Hive’s built in functions are insufficient. For
example, in the need to extract the ordering of elements in an array is to
write a custom Java UDF that would take a lot of time. So it is easy to create
a script in another languages like Python, Perl and feed it into a hive query
using the function TRANSFORM.
Input data:-
Userid movieid rating unixtime
6 86 3 883603013
7 32 4 891350932
8 338 4 879361873
Now create a hive table and
load the input data into it as shown below -
hive > Create table
udf_data (userid int, movieid int, rating int, unixtime string)
Row format
delimited fields terminated by ‘\t’
Stored as
textfile;
hive > Load data local inpath
‘/home/training/Desktop/udf1.txt’ into table udf_data;
Python
UDF (udf1.py):-
Adding the Python file to the hdfs:-
hive > Add
file udf1.py;
Resource added.
The following query returns
the weekday of the unixtime by processing the above python UDF functions-
hive > SELECT TRANSFORM (userid, movieid, rating,
unixtime) USING 'python udf1.py' AS (userid, movieid, rating, weekday) FROM
udf_data;
6 86 3 3
7 32 4 2
8 338 4 3
Time_Taken: 20.894 seconds