Hive - standard process after ingesting data into Hadoop
I am importing data from Oracle into Hadoop, and I want to keep the data in Hive.
What steps should be followed after ingesting the data into Hadoop?
How do I perform data cleaning or error checks on the ingested data?
1. What steps should be followed after ingesting the data into Hadoop?
You don't need two separate steps (importing the data into Hadoop and then transferring it to Hive). As per the Sqoop docs, you just need to add --hive-import to your import command.
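A minimal sketch of such a one-step import, assuming a hypothetical Oracle table EMPLOYEES and placeholder connection details (the host, service name, and credentials are not from the original post):

    # Hypothetical connection details; adjust the host, service name, and credentials
    CONN="--connect jdbc:oracle:thin:@//dbhost:1521/ORCL --username scott --password-file /user/me/oracle.password"

    # One step: import the Oracle table and create/load the matching Hive table
    sqoop import $CONN --table EMPLOYEES --hive-import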
Change the Hive table name

By default, the table name used in Hive is the same as that of the source table. You can control the output table name with the --hive-table option.
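For illustration (reusing the hypothetical $CONN variable from the sketch above), this would land the Oracle table EMPLOYEES in a Hive table named staff:

    # --hive-table renames the target Hive table
    sqoop import $CONN --table EMPLOYEES --hive-import --hive-table staff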
Overwrite the Hive table

If the Hive table already exists, you can specify the --hive-overwrite option to indicate that the existing table in Hive must be replaced.
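A sketch of a re-runnable import that replaces the table's previous contents, again with the same hypothetical names:

    # --hive-overwrite makes the import re-runnable by replacing existing rows
    sqoop import $CONN --table EMPLOYEES --hive-import \
      --hive-table staff --hive-overwrite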
@Sachin mentioned the handling of NULL values in the data; you can check the docs for more details.
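As one illustration of that advice: \N is the representation Hive expects for NULL in text files, so the Sqoop user guide suggests flags along these lines (the values below are a common choice, not the only one):

    # Write database NULLs as \N so Hive reads them back as NULL
    sqoop import $CONN --table EMPLOYEES --hive-import \
      --null-string '\\N' --null-non-string '\\N'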
2. How do you perform data cleaning or error checks on the ingested data?
I assume by "data cleaning" you mean cleaning the data in Hadoop.
After the data is imported into HDFS (or this step is omitted), Sqoop generates a Hive script containing a CREATE TABLE operation that defines your columns using Hive's types, and a LOAD DATA INPATH statement to move the data files into Hive's warehouse directory. Once the data has been moved into Hive, no data is left in the temporary HDFS location.
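A quick sanity check after the import might look like this (the table name and paths are hypothetical; by default Sqoop stages imported files under the user's HDFS home directory before the Hive load):

    # Confirm the Hive table exists and holds rows
    hive -e 'DESCRIBE employees; SELECT COUNT(*) FROM employees;'

    # Confirm the temporary HDFS staging directory was cleaned up
    hdfs dfs -ls /user/me/EMPLOYEES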
Some of the common issues are mentioned in the troubleshooting docs; you can check them against the errors you encounter.