[Backlogmanager] [FIWARE-JIRA] (HELP-15764) [fiware-stackoverflow] How to match column between HDFS File(save through cygnus) to Hive table?

Fernando Lopez (JIRA) jira-help-desk at jira.fiware.org
Thu May 2 13:13:01 CEST 2019


     [ https://jira.fiware.org/browse/HELP-15764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fernando Lopez reassigned HELP-15764:
-------------------------------------

    Assignee: Fermín Galán

> [fiware-stackoverflow] How to match column between HDFS File(save through cygnus) to Hive table?
> ------------------------------------------------------------------------------------------------
>
>                 Key: HELP-15764
>                 URL: https://jira.fiware.org/browse/HELP-15764
>             Project: Help-Desk
>          Issue Type: Monitor
>          Components: FIWARE-TECH-HELP
>            Reporter: Backlog Manager
>            Assignee: Fermín Galán
>              Labels: fiware, fiware-cygnus, hdfs, hive
>
> Created question in FIWARE Q/A platform on 02-05-2019 at 10:05
> Please, ANSWER this question AT https://stackoverflow.com/questions/55948021/how-to-match-column-between-hdfs-filesave-through-cygnus-to-hive-table
> +Question:+
> How to match column between HDFS File(save through cygnus) to Hive table?
> +Description:+
> I want to save some data in Hive through fiware-cygnus,
> so I set the conf file as follows:
> cygnus-ngsi.sources = http-source
> cygnus-ngsi.sinks = hdfs-sink
> cygnus-ngsi.channels = hdfs-channel
> cygnus-ngsi.sources.http-source.channels = hdfs-channel
> cygnus-ngsi.sources.http-source.type = org.apache.flume.source.http.HTTPSource
> cygnus-ngsi.sources.http-source.port = 5050
> cygnus-ngsi.sources.http-source.handler = com.telefonica.iot.cygnus.handlers.NGSIRestHandler
> cygnus-ngsi.sources.http-source.handler.notification_target = /notify
> cygnus-ngsi.sources.http-source.handler.default_service = def_serv
> cygnus-ngsi.sources.http-source.handler.default_service_path = /def_servpath
> cygnus-ngsi.sources.http-source.handler.events_ttl = 2
> cygnus-ngsi.sources.http-source.interceptors = ts gi
> cygnus-ngsi.sources.http-source.interceptors.ts.type = timestamp
> cygnus-ngsi.sources.http-source.interceptors.gi.type = com.telefonica.iot.cygnus.interceptors.NGSIGroupingInterceptor$Builder
> cygnus-ngsi.sources.http-source.interceptors.gi.grouping_rules_conf_file = /Applications/apache-flume-1.4.0-bin/conf/grouping_rules.conf
> cygnus-ngsi.channels.hdfs-channel.type = memory
> cygnus-ngsi.channels.hdfs-channel.capacity = 1000
> cygnus-ngsi.channels.hdfs-channel.transactionCapacity = 100
> cygnus-ngsi.sinks.hdfs-sink.type = com.telefonica.iot.cygnus.sinks.NGSIHDFSSink
> cygnus-ngsi.sinks.hdfs-sink.channel = hdfs-channel
> cygnus-ngsi.sinks.hdfs-sink.enable_encoding = true
> cygnus-ngsi.sinks.hdfs-sink.enable_grouping = false
> cygnus-ngsi.sinks.hdfs-sink.enable_lowercase = false
> cygnus-ngsi.sinks.hdfs-sink.enable_name_mappings = false
> cygnus-ngsi.sinks.hdfs-sink.data_model = dm-by-entity
> cygnus-ngsi.sinks.hdfs-sink.file_format = json-row
> cygnus-ngsi.sinks.hdfs-sink.backend.impl = rest
> cygnus-ngsi.sinks.hdfs-sink.backend.max_conns = 500
> cygnus-ngsi.sinks.hdfs-sink.backend.max_conns_per_route = 100
> cygnus-ngsi.sinks.hdfs-sink.hdfs_host = [hdfs_host]
> cygnus-ngsi.sinks.hdfs-sink.hdfs_port = 50070
> cygnus-ngsi.sinks.hdfs-sink.hdfs_username = [username]
> cygnus-ngsi.sinks.hdfs-sink.hdfs_password = [password]
> cygnus-ngsi.sinks.hdfs-sink.oauth2_token = [token]
> cygnus-ngsi.sinks.hdfs-sink.service_as_namespace = false
> cygnus-ngsi.sinks.hdfs-sink.batch_size = 100
> cygnus-ngsi.sinks.hdfs-sink.batch_timeout = 30
> cygnus-ngsi.sinks.hdfs-sink.batch_ttl = 10
> cygnus-ngsi.sinks.hdfs-sink.batch_retry_intervals = 5000
> cygnus-ngsi.sinks.hdfs-sink.hive = true
> cygnus-ngsi.sinks.hdfs-sink.hive.server_version = 2
> cygnus-ngsi.sinks.hdfs-sink.hive.host = [hive_host]
> cygnus-ngsi.sinks.hdfs-sink.hive.port = 10000
> cygnus-ngsi.sinks.hdfs-sink.hive.db_type = default-db
> cygnus-ngsi.sinks.hdfs-sink.krb5_auth = false
> and I found the HDFS file had been created:
> {"recvx0054imex0054s":"1556776186","recvx0054ime":"2019-05-02T05:49:46.771Z","fiwarex0053ervicex0050ath":"/LightInfo","entityx0049d":"bedroom3_light","entityx0054ype":"Light","attrx004eame":"lux","attrx0054ype":"Float","attrx0056alue":"7","attrx004dd":[]}
> {"recvx0054imex0054s":"1556776186","recvx0054ime":"2019-05-02T05:49:46.771Z","fiwarex0053ervicex0050ath":"/LightInfo","entityx0049d":"bedroom3_light","entityx0054ype":"Light","attrx004eame":"status","attrx0054ype":"string","attrx0056alue":"off","attrx004dd":[]}
> {"recvx0054imex0054s":"1556776989","recvx0054ime":"2019-05-02T06:03:09.962Z","fiwarex0053ervicex0050ath":"/LightInfo","entityx0049d":"bedroom3_light","entityx0054ype":"Light","attrx004eame":"lux","attrx0054ype":"Float","attrx0056alue":"7","attrx004dd":[]}
> {"recvx0054imex0054s":"1556776989","recvx0054ime":"2019-05-02T06:03:09.962Z","fiwarex0053ervicex0050ath":"/LightInfo","entityx0049d":"bedroom3_light","entityx0054ype":"Light","attrx004eame":"status","attrx0054ype":"string","attrx0056alue":"off","attrx004dd":[]}
> However, in Hive all the data shows NULL values:
> hive> select * from ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row;
> chmod: changing permissions of 'hdfs://hadoop01:9000/tmp/hive/hive/ae8cd691-6e96-428d-8ece-02f6e6fbf527/hive_2019-05-02_16-37-44_821_8122085527913292696-1/-mr-10001': Permission denied: user=hive, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx-w----
> OK
> ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.recvx0054imex0054s ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.recvx0054ime       ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.fiwarex0053ervicex0050ath      ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.entityx0049d       ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.entityx0054ype     ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.attrx004eame   ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.attrx0054ype       ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.attrx0056alue ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row.attrx004dd
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
> NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
> Time taken: 0.26 seconds, Fetched: 4 row(s)
> The external table creation query in the Cygnus debug console message is:
> create external table 
>     if not exists default.ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row
>     (recvx0054imex0054s bigint,
>     recvx0054ime string,
>     fiwarex0053ervicex0050ath string,
>     entityx0049d string,
>     entityx0054ype string,
>     attrx004eame string,
>     attrx0054ype string,
>     attrx0056alue string,
>     attrx004dd array<struct<name:string,type:string,value:string>>) 
>     row format serde 'org.openx.data.jsonserde.JsonSerDe' with serdeproperties ("dots.in.keys" = "true") 
>     location '/user/ngsi/Light/LightInfo/bedroom_light_Light'
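> Note that this statement uses "if not exists": if the table already existed with an older or different definition, Hive would silently keep the old one, which would explain NULLs from this table while a freshly created table over the same location returns values. A hedged way to check what Hive actually stores (PyHive is an assumption here, as are the host and user placeholders):
>
> from pyhive import hive  # pip install 'pyhive[hive]'
>
> # Inspect the real schema, serde and location of the Cygnus-created table
> # and compare them with the DDL logged by Cygnus above.
> conn = hive.connect(host='hive_host', port=10000, username='username')
> cur = conn.cursor()
> cur.execute("describe formatted "
>             "ngsi_x004cightx002fx004cightx0049nfox002fbedroom_light_x004cight_row")
> for row in cur.fetchall():
>     print(row)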
> I guess the difference between the Hive table column names and the HDFS file attribute names is the problem.
> The Hive table column names use the encoded names, but the HDFS file attribute names use the non-encoded names (e.g. recvTimeTs vs. recvx0054imex0054s),
> so I created an external table and tested it:
> create external table
>     if not exists default.test01
>     (recvx0054imex0054s bigint,
>     recvx0054ime string,
>     fiwarex0053ervicex0050ath string,
>     entityx0049d string,
>     entityx0054ype string,
>     attrx004eame string,
>     attrx0054ype string,
>     attrx0056alue string,
>     attrx004dd array<struct<name:string,type:string,value:string>>)
>     row format serde 'org.openx.data.jsonserde.JsonSerDe' with serdeproperties ("dots.in.keys" = "true")
>     location '/user/ngsi/Light/LightInfo/bedroom_light_Light'
> As a result, it shows the right values:
> hive> select * from test01;
> chmod: changing permissions of 'hdfs://hadoop01:9000/tmp/hive/hive/bec8d67d-7d94-4cdc-a577-14d990ca342c/hive_2019-05-02_16-42-59_290_896718721577759467-1/-mr-10001': Permission denied: user=hive, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx-w----
> OK
> 1556776186      2019-05-02T05:49:46.771Z        /LightInfo      bedroom3_light  Light   lux     Float   7       []
> 1556776186      2019-05-02T05:49:46.771Z        /LightInfo      bedroom3_light  Light   status  string  off     []
> 1556776989      2019-05-02T06:03:09.962Z        /LightInfo      bedroom3_light  Light   lux     Float   7       []
> 1556776989      2019-05-02T06:03:09.962Z        /LightInfo      bedroom3_light  Light   status  string  off     []
> Time taken: 2.329 seconds, Fetched: 4 row(s)
> How can I match the Hive table column names to the HDFS file attribute names?
> Or is there another way to use Cygnus with Hive?
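> One hedged option for readable column names over the same files: the openx JsonSerDe used above supports per-column key mapping through serdeproperties, so a second external table can map plain names onto the encoded JSON keys. A sketch (the table name light_readable is invented, the "mapping.*" serde feature is assumed to be available in your build, and the connection details are placeholders):
>
> from pyhive import hive
>
> # Hedged sketch: readable Hive columns mapped onto Cygnus-encoded JSON keys.
> DDL = """
> create external table if not exists default.light_readable (
>     recvTimeTs bigint,
>     recvTime string,
>     fiwareServicePath string,
>     entityId string,
>     entityType string,
>     attrName string,
>     attrType string,
>     attrValue string,
>     attrMd array<struct<name:string,type:string,value:string>>)
> row format serde 'org.openx.data.jsonserde.JsonSerDe'
> with serdeproperties (
>     "mapping.recvtimets" = "recvx0054imex0054s",
>     "mapping.recvtime" = "recvx0054ime",
>     "mapping.fiwareservicepath" = "fiwarex0053ervicex0050ath",
>     "mapping.entityid" = "entityx0049d",
>     "mapping.entitytype" = "entityx0054ype",
>     "mapping.attrname" = "attrx004eame",
>     "mapping.attrtype" = "attrx0054ype",
>     "mapping.attrvalue" = "attrx0056alue",
>     "mapping.attrmd" = "attrx004dd")
> location '/user/ngsi/Light/LightInfo/bedroom_light_Light'
> """
>
> conn = hive.connect(host='hive_host', port=10000, username='username')
> conn.cursor().execute(DDL)
>
> After that, select * from default.light_readable should return the same rows as test01, but under the plain names.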



--
This message was sent by Atlassian JIRA
(v6.4.1#64016)

