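When Hive's built-in functions cannot meet a business requirement, you can write a user-defined function (UDF).
Hive has three common kinds of user-defined function: UDF (one row in, one value out), UDAF (user-defined aggregate function, many rows in, one value out), and UDTF (user-defined table-generating function, one row in, multiple rows out).
Notes:
1. A UDF must return a value. The value may be null, but the return type cannot be void.
2. Hadoop types such as Text and LongWritable are recommended for parameters and return values.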
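Hive looks up a UDF's evaluate method by reflection, which is why the return-type rule in note 1 matters. The sketch below is a plain-Java sanity check, not Hive's actual resolver: the class EvaluateCheck and the stand-in classes GoodUdf and BadUdf are hypothetical names introduced only for illustration, and they deliberately avoid Hive dependencies so the check runs on its own.

```java
import java.lang.reflect.Method;

/** Hypothetical helper (not part of Hive) that checks note 1 by reflection. */
final class EvaluateCheck {

    /** Returns true if the class has at least one public method named "evaluate" that is not void. */
    static boolean hasUsableEvaluate(Class<?> udfClass) {
        for (Method m : udfClass.getMethods()) {
            if (m.getName().equals("evaluate") && m.getReturnType() != void.class) {
                return true;
            }
        }
        return false;
    }

    /** Stand-in for a well-formed UDF: returns a value, and null for null input. */
    public static class GoodUdf {
        public String evaluate(String name) {
            return name == null ? null : "Hello : " + name;
        }
    }

    /** Stand-in for a malformed UDF: evaluate is declared void. */
    public static class BadUdf {
        public void evaluate(String name) {
            // nothing to return, which breaks the rule in note 1
        }
    }

    public static void main(String[] args) {
        System.out.println(hasUsableEvaluate(GoodUdf.class)); // prints "true"
        System.out.println(hasUsableEvaluate(BadUdf.class));  // prints "false"
    }
}
```

A check like this could be run from a unit test against the real UDF class before the jar is built.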
1) Create a new Maven project and add the dependencies needed for UDF development to pom.xml, as follows:
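<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.1.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.1.0</version>
  </dependency>
</dependencies>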
2) Implement the custom UDF
package sunyong.hive;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

/**
 * @author sunyong
 * @date 2020/07/12
 * @description Input: xxx. Output: Hello : xxx
 */
public class HelloUDF extends UDF {

    public Text evaluate(Text name) {
        // Return null for null input instead of the string "Hello : null".
        if (name == null) {
            return null;
        }
        return new Text("Hello : " + name);
    }

    // Quick local check that does not need a running Hive.
    public static void main(String[] args) {
        HelloUDF udf = new HelloUDF();
        System.out.println(udf.evaluate(new Text("张三")));
    }
}
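3) Build the jar and upload it to the lib directory of the Hive installation on the Linux host.
4) Add the custom UDF to Hive by running the following command in the Hive command line:
add jar /opt/install/hive/lib/UDF.jar;
# Syntax
add jar <absolute path of the jar>;
5) Create the function. If temporary appears before function, the function is temporary and is dropped when the session ends:
create function sayHello as 'sunyong.hive.HelloUDF';
# Syntax
create [temporary] function sayHello as '<fully qualified class name (package.class)>';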
6) Use the function:
select sayHello(emp_name) from employee;
The result is as follows: [screenshot of the query output omitted]
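Steps 4 to 6 can also be scripted over JDBC, since hive-jdbc is already among the pom.xml dependencies. The following is a minimal sketch under stated assumptions: the class name HiveUdfSession, the HiveServer2 address jdbc:hive2://localhost:10000/default, and the user name hive are all hypothetical and not taken from the original setup, and the jar path must be readable from the HiveServer2 host.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that replays steps 4 to 6 through HiveServer2. */
final class HiveUdfSession {

    /** Builds the HiveQL for steps 4 and 5. JDBC statements carry no trailing semicolon. */
    static List<String> setupStatements(String jarPath, String functionName, String className) {
        return Arrays.asList(
                "add jar " + jarPath,
                "create temporary function " + functionName + " as '" + className + "'");
    }

    public static void main(String[] args) throws Exception {
        // Load the Hive JDBC driver explicitly; requires hive-jdbc on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // ASSUMPTION: HiveServer2 runs on localhost:10000 and the employee table is in "default".
        String url = "jdbc:hive2://localhost:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {
            // Steps 4 and 5: register the jar and create the temporary function.
            for (String sql : setupStatements(
                    "/opt/install/hive/lib/UDF.jar", "sayHello", "sunyong.hive.HelloUDF")) {
                stmt.execute(sql);
            }
            // Step 6: call the function and print each result row.
            try (ResultSet rs = stmt.executeQuery("select sayHello(emp_name) from employee")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```

Because the function is created as temporary here, it exists only for the lifetime of this JDBC session.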
Creating the function may fail with errors such as the following:
hive> create temporary function sayHello as "sunyong.hive.HelloUDF";
FAILED: Class HelloUDF not found
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask

hive> create function sayHello as 'sunyong.hive.HelloUDF';
Failed to register acid_demo.sayhello using class sunyong.hive.HelloUDF
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
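Solution:
# Install the zip package
yum install zip
# The following command deletes the signature files from the jar (the argument after -d is the name of your own jar)
zip -d UDF.jar 'META-INF/*.SF' 'META-INF/*.RSA' 'META-INF/*SF'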
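Where the zip tool is not available, the same cleanup can be approximated in Java with java.util.zip. This is a rough sketch rather than a drop-in replacement for zip -d: the class name JarSignatureStripper is hypothetical, and the matching rule simply mirrors the patterns in the command above (entries under META-INF/ whose names end in SF or .RSA).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper (not part of Hive): rewrites a jar without its signature files. */
final class JarSignatureStripper {

    /** Mirrors the zip -d patterns: entries under META-INF/ ending in "SF" or ".RSA". */
    static boolean isSignatureFile(String entryName) {
        return entryName.startsWith("META-INF/")
                && (entryName.endsWith("SF") || entryName.endsWith(".RSA"));
    }

    /** Copies all other entries to a temp file, replaces the jar, and returns the number removed. */
    static int strip(Path jar) throws IOException {
        Path tmp = Files.createTempFile("stripped-", ".jar");
        int removed = 0;
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(jar));
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(tmp))) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                if (isSignatureFile(entry.getName())) {
                    removed++;
                    continue; // skip the signature entry entirely
                }
                out.putNextEntry(new ZipEntry(entry.getName()));
                int n;
                while ((n = in.read(buffer)) != -1) { // read() ends at the current entry's end
                    out.write(buffer, 0, n);
                }
                out.closeEntry();
            }
        }
        Files.move(tmp, jar, StandardCopyOption.REPLACE_EXISTING);
        return removed;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarSignatureStripper <path-to-jar>");
            return;
        }
        int removed = strip(Paths.get(args[0]));
        System.out.println("Removed " + removed + " signature file(s)");
    }
}
```

The jar is rewritten in place, so it is worth keeping a copy of the original until the function registers successfully.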
Reposted from: http://tcjxi.baihongyu.com/