Hive - standard process after ingesting data into Hadoop
I am importing data from Oracle into Hadoop and want to keep the data in Hive.
What steps should be followed after ingesting the data into Hadoop?
How do I perform data cleaning or error checks on the ingested data?
1. What steps should be followed after ingesting data into Hadoop?
You don't need a separate two-step process (importing the data into Hadoop and then transferring it to Hive). As per the docs, you only need to add the --hive-import option to your import command.
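A minimal Oracle-to-Hive import might then look like the sketch below; the JDBC URL, credentials, and table name are placeholders, so substitute your own:

    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user \
      --password-file /user/etl/.oracle_password \
      --table EMP \
      --hive-import

With --hive-import, Sqoop creates the Hive table (if needed) and loads the imported files into it in a single run.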
Change the Hive table
The table name used in Hive is, by default, the same as that of the source table. You can control the output table name with the --hive-table option.
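For instance, to land the Oracle table EMP in a Hive table named staging_emp instead (both names are hypothetical):

    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user \
      --password-file /user/etl/.oracle_password \
      --table EMP \
      --hive-import \
      --hive-table staging_emp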
Overwrite the Hive table
If the Hive table already exists, you can specify the --hive-overwrite option to indicate that the existing table in Hive must be replaced.
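Combining the two options gives a re-runnable import that replaces the previous load each time (same placeholder names as above):

    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user \
      --password-file /user/etl/.oracle_password \
      --table EMP \
      --hive-import \
      --hive-table staging_emp \
      --hive-overwrite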
@Sachin mentioned the handling of NULL values in the data; you can check the docs for more details.
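In short, Hive denotes NULL as \N in text files, which differs from Sqoop's default, so the docs suggest setting the null representation explicitly on import, along these lines:

    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user \
      --password-file /user/etl/.oracle_password \
      --table EMP \
      --hive-import \
      --null-string '\\N' \
      --null-non-string '\\N'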
2. How do I perform data cleaning or error checks on the ingested data?
I assume that by "data cleaning" you mean cleaning the data in Hadoop.
After your data is imported into HDFS (or this step is omitted), Sqoop generates a Hive script containing a CREATE TABLE operation defining your columns using Hive's types, and a LOAD DATA INPATH statement to move the data files into Hive's warehouse directory. The data is thus moved into Hive, so no data is left behind in the temporary HDFS location.
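Conceptually, the generated script boils down to something like the following sketch; the table and column names are made up, and the actual column types are derived from the source table's metadata:

    -- roughly what the Sqoop-generated Hive script contains
    CREATE TABLE IF NOT EXISTS EMP (
      ID DOUBLE,          -- Oracle NUMBER columns map to Hive numeric types
      NAME STRING,
      HIRE_DATE STRING    -- dates are imported as strings by default
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;

    -- moves (not copies) the files out of the temporary HDFS directory
    LOAD DATA INPATH 'hdfs:///user/etl/EMP' INTO TABLE EMP;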
Some of the common issues are mentioned in the troubleshooting section of the docs. You can check there based on the errors you encounter.