Columnar storage is great when your input side is large and your output is a filtered subset: from big to little is great. It is not as beneficial when the input and the output are about the same size. Columnar is best for big-data use cases because the majority of analytical queries rely on aggregation-style analysis: big data is mostly aggregate operations, and applying MIN, MAX, SUM, or any other aggregate to a column is faster in a columnar format because the engine acts directly on that column. An aggregation applied to a particular set of columns is therefore many times faster than the same aggregation applied to row-based data. Typically a warehouse table has 50+ columns, since the data is kept in a normalized form, and the more columns a table has, the more advantageous columnar storage becomes.

Usecase comparison between ORC and Parquet

ACID transactions are only possible when using ORC as the file format. Although ORC supports ACID transactions, they are not designed to support OLTP requirements:

1. HDFS is a write-once file system and ORC is a write-once file format, so edits are implemented using base files and delta files in which insert, update, and delete operations are recorded.
2. "Not designed for OLTP" means that when a record is deleted or updated, the change is not immediately reflected in applications reading the data. ORC supports streaming ingest into Hive tables: streaming applications such as Flume or Storm can write data into Hive, transactions commit about once a minute, and a query sees either all of a transaction or none of it.

A related question about ORC and Spark type handling:

If I use the first script with Spark SQL and store the file as ORC with Snappy compression, it works. But when I take an existing table, alter the table with a new column using the Spark HiveContext, and save it as ORC with Snappy compression, I get the following error: ORC does not support type conversion from STRING to VARCHAR.

1. The following is the show create table testtable result (this table was created with Spark SQL):

CREATE TABLE `testtabletmp1`(
  `person_key` bigint,
  `pat_last` string,
  `pat_first` string,
  `pat_dob` timestamp,
  `pat_zip` string,
  `pat_gender` string,
  `pat_chksum1` bigint,
  `pat_chksum2` bigint,
  `dimcreatedgmt` timestamp,
  `pat_mi` string,
  `h_keychksum` string,
  `patmd5` string)
ROW FORMAT SERDE '.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT '.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT '.ql.io.orc.OrcOutputFormat'
LOCATION 'hdfs://hdp-cent7-01:8020/apps/hive/warehouse/datawarehouse.db/testtabledimtmp1'
TBLPROPERTIES (
  'orc.compress'='SNAPPY',
  'transient_lastDdlTime'='1469207216')

2. The original table, created when we imported the data from SQL Server using Sqoop:

CREATE TABLE `testtabledim`(
  `person_key` bigint,
  `pat_last` varchar(35),
  `pat_first` varchar(35),
  `pat_dob` timestamp,
  `pat_zip` char(5),
  `pat_gender` char(1),
  `pat_chksum1` bigint,
  `pat_chksum2` bigint,
  `dimcreatedgmt` timestamp,
  `pat_mi` char(1),
  `h_keychksum` string,
  `patmd5` string)
ROW FORMAT SERDE '.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT '.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT '.ql.io.orc.OrcOutputFormat'
LOCATION 'hdfs://hdp-cent7-01:8020/apps/hive/warehouse/datawarehouse.db/testtabledim'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='false',
  'last_modified_by'='hdfs',
  'last_modified_time'='1469026541',
  'numFiles'='1',
  'numRows'='-1',
  'orc.compress'='SNAPPY',
  'rawDataSize'='-1',
  'totalSize'='11144909',
  'transient_lastDdlTime'='1469026541')

What I observed:

1. If I store the ORC file with Snappy compression and use Hive to create the table using script 1, it works fine.
2. If I use the same ORC file but have Hive create the table using the second query, I still get the same error.

I noticed some columns are defined as VARCHAR(35) and I think those columns may be the issue. After I changed VARCHAR to STRING and CHAR to STRING, it worked fine. I am still investigating the best way to handle VARCHAR/CHAR types through a Spark DataFrame. Please let me know if you need more information.
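The columnar-vs-row trade-off discussed at the top of the post can be sketched in plain Python. This is a toy illustration with invented data, not Hive or ORC internals: an aggregate over a column-oriented layout reads exactly the one column it needs, while a row-oriented layout must walk every record.

```python
# Toy comparison of row-oriented vs column-oriented layouts.
# Records and field names are invented for this sketch.
rows = [
    {"person_key": 1, "pat_zip": "10001", "pat_chksum1": 11},
    {"person_key": 2, "pat_zip": "10002", "pat_chksum1": 22},
    {"person_key": 3, "pat_zip": "10003", "pat_chksum1": 33},
]

# Row layout: SUM(pat_chksum1) visits every record and extracts one field,
# paying for all the columns it never uses.
row_sum = sum(r["pat_chksum1"] for r in rows)

# Column layout: each column is stored contiguously, so the aggregate
# touches only one list and skips the other columns entirely -- the
# advantage grows with the 50+ columns a warehouse table typically has.
columns = {name: [r[name] for r in rows] for name in rows[0]}
col_sum = sum(columns["pat_chksum1"])

assert row_sum == col_sum == 66
```

Both layouts return the same answer; the columnar one simply reads far less data per aggregate, which is why the benefit scales with column count.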
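The base-file/delta-file mechanism mentioned above can be illustrated with a minimal sketch. This is pure Python with an invented record layout (real ORC ACID deltas also carry transaction ids and live in bucketed delta directories): the base stays write-once, and a read merges it with the recorded insert, update, and delete operations.

```python
# Sketch of merging a write-once base file with recorded delta operations.
# Record shapes and operation names are invented for this illustration.
base = {1: "alice", 2: "bob", 3: "carol"}   # immutable base rows (key -> value)

deltas = [                                   # edits recorded after the base was written
    ("insert", 4, "dave"),
    ("update", 2, "bobby"),
    ("delete", 3, None),
]

def read_view(base, deltas):
    """Reconstruct the current table state without rewriting the base."""
    view = dict(base)                        # the base file itself is never modified
    for op, key, value in deltas:
        if op in ("insert", "update"):
            view[key] = value
        elif op == "delete":
            view.pop(key, None)
    return view

print(read_view(base, deltas))  # {1: 'alice', 2: 'bobby', 4: 'dave'}
```

This is also why ORC's ACID support is not OLTP: the change only becomes visible when a reader merges the deltas, not at the moment each row is edited.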