This post walks through a set of problems encountered while using Kylin. Newcomers are often at a loss when they hit these issues, so this article summarizes their causes and fixes; hopefully it helps you resolve them too.
### SERVICE ###

# Kylin server mode, valid value [all, query, job]
kylin.server.mode=all

# Optional information for the owner of kylin platform, it can be your team's email
# Currently it will be attached to each kylin's htable attribute
kylin.owner=whoami@kylin.apache.org

# List of web servers in use, this enables one web server instance to sync up with other servers.
kylin.rest.servers=192.168.64.16:7070

# Display timezone on UI, format like [GMT+N or GMT-N]
kylin.rest.timezone=GMT+8

### SOURCE ###

# Hive client, valid value [cli, beeline]
kylin.hive.client=cli

# Parameters for beeline client, only necessary if hive client is beeline
#kylin.hive.beeline.params=-n root --hiveconf hive.security.authorization.sqlstd.confwhitelist.append='mapreduce.job.*|dfs.*' -u 'jdbc:hive2://localhost:10000'

kylin.hive.keep.flat.table=false

### STORAGE ###

# The metadata store in hbase
kylin.metadata.url=kylin_metadata@hbase

# The storage for final cube file in hbase
kylin.storage.url=hbase

# In milliseconds (2 days)
kylin.storage.cleanup.time.threshold=172800000

# Working folder in HDFS, make sure user has the right access to the hdfs directory
kylin.hdfs.working.dir=/kylin

# Compression codec for htable, valid value [none, snappy, lzo, gzip, lz4]
kylin.hbase.default.compression.codec=none

# HBase Cluster FileSystem, which serves hbase, format as hdfs://hbase-cluster:8020
# Leave empty if hbase is running on the same cluster as hive and mapreduce
kylin.hbase.cluster.fs=hdfs://master1:8020

# The cut size for hbase region, in GB.
kylin.hbase.region.cut=5

# The hfile size in GB; a smaller hfile gives the hfile-converting MR job more reducers and makes it faster.
# Set 0 to disable this optimization.
kylin.hbase.hfile.size.gb=2

kylin.hbase.region.count.min=1
kylin.hbase.region.count.max=500

### JOB ###

# Max job retry on error, default 0: no retry
kylin.job.retry=0

kylin.job.jar=$KYLIN_HOME/lib/kylin-job-1.5.4.jar
kylin.coprocessor.local.jar=$KYLIN_HOME/lib/kylin-coprocessor-1.5.4.jar

# If true, job engine will not assume that hadoop CLI resides on the same server as itself;
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password
# It should not be set to "true" unless you're NOT running kylin.sh on a hadoop client machine
# (thus the kylin instance has to ssh to another real hadoop client machine to execute hbase, hive, hadoop commands)
kylin.job.run.as.remote.cmd=false

# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.hostname=
kylin.job.remote.cli.port=22
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.username=
# Only necessary when kylin.job.run.as.remote.cmd=true
kylin.job.remote.cli.password=

# Used by test cases to prepare synthetic data for sample cube
kylin.job.remote.cli.working.dir=/tmp/kylin

# Max count of concurrent jobs running
kylin.job.concurrent.max.limit=10

# Time interval to check hadoop job status
kylin.job.yarn.app.rest.check.interval.seconds=10

# Hive database name for putting the intermediate flat tables
kylin.job.hive.database.for.intermediatetable=default

# The percentage of the sampling, default 100%
kylin.job.cubing.inmem.sampling.percent=100

# Whether to get job status from resource manager with kerberos authentication
kylin.job.status.with.kerberos=false

kylin.job.mapreduce.default.reduce.input.mb=500
kylin.job.mapreduce.max.reducer.number=500
kylin.job.mapreduce.mapper.input.rows=1000000
kylin.job.step.timeout=7200

### CUBE ###

# 'auto', 'inmem', 'layer' or 'random' for testing
kylin.cube.algorithm=auto
kylin.cube.algorithm.auto.threshold=8
kylin.cube.aggrgroup.max.combination=4096

kylin.dictionary.max.cardinality=5000000
kylin.table.snapshot.max_mb=300

### QUERY ###

kylin.query.scan.threshold=10000000
# 3G
kylin.query.mem.budget=3221225472
kylin.query.coprocessor.mem.gb=3
# Enable/disable ACL check for cube query
kylin.query.security.enabled=true
kylin.query.cache.enabled=true

### SECURITY ###

# Spring security profile, options: testing, ldap, saml
# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login
kylin.security.profile=testing

# Default roles and admin roles in LDAP, for ldap and saml
acl.defaultRole=ROLE_ANALYST,ROLE_MODELER
acl.adminRole=ROLE_ADMIN

# LDAP authentication configuration
ldap.server=ldap://ldap_server:389
ldap.username=
ldap.password=

# LDAP user account directory
ldap.user.searchBase=
ldap.user.searchPattern=
ldap.user.groupSearchBase=

# LDAP service account directory
ldap.service.searchBase=
ldap.service.searchPattern=
ldap.service.groupSearchBase=

## SAML configurations for SSO
# SAML IDP metadata file location
saml.metadata.file=classpath:sso_metadata.xml
saml.metadata.entityBaseURL=https://hostname/kylin
saml.context.scheme=https
saml.context.serverName=hostname
saml.context.serverPort=443
saml.context.contextPath=/kylin

### MAIL ###

# If true, will send email notification
mail.enabled=false
mail.host=
mail.username=
mail.password=
mail.sender=

### WEB ###

# Help info, format {name|displayName|link}, optional
kylin.web.help.length=4
kylin.web.help.0=start|Getting Started|
kylin.web.help.1=odbc|ODBC Driver|
kylin.web.help.2=tableau|Tableau Guide|
kylin.web.help.3=onboard|Cube Design Tutorial|

# Guide user how to build streaming cube
kylin.web.streaming.guide=http://kylin.apache.org/

# Hadoop url link, optional
kylin.web.hadoop=
# Job diagnostic url link, optional
kylin.web.diagnostic=
# Contact mail on web page, optional
kylin.web.contact_mail=

crossdomain.enable=true
./bin/find-hive-dependency.sh
This script checks whether the Hive environment is configured correctly; it fails complaining that the HCAT_HOME path cannot be found. Fix: export HCAT_HOME=$HIVE_HOME/hcatalog, then rerun the script.
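The fix above can be sketched as a short shell fragment. The HIVE_HOME value here is only an example path; substitute your own installation directory.

```shell
# Point HCAT_HOME at the hcatalog directory shipped inside Hive,
# then rerun the dependency script. HIVE_HOME is an example path.
HIVE_HOME=${HIVE_HOME:-/home/grid/hive}
export HCAT_HOME=$HIVE_HOME/hcatalog
echo "$HCAT_HOME"
# ./bin/find-hive-dependency.sh   # rerun after exporting
```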
Fix:
vi ./bin/kylin.sh
Two changes are needed in this script:
1. export KYLIN_HOME=/home/grid/kylin # change to an absolute path
2. export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX # add $hive_dependency to the path
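Together, the two edits look roughly like the excerpt below. This is a hypothetical reconstruction of the relevant lines of bin/kylin.sh: all paths are examples, and tomcat_root and hive_dependency are normally set elsewhere in the script (hive_dependency by find-hive-dependency.sh).

```shell
# Hypothetical excerpt of bin/kylin.sh after the two edits; paths are examples.
export KYLIN_HOME=/home/grid/kylin                  # 1) absolute path, not relative
tomcat_root=$KYLIN_HOME/tomcat
hive_dependency=/home/grid/hive/lib/hive-exec.jar   # normally set by find-hive-dependency.sh
# 2) $hive_dependency added so HBase's launcher puts the Hive jars on the classpath
export HBASE_CLASSPATH_PREFIX=${tomcat_root}/bin/bootstrap.jar:${tomcat_root}/bin/tomcat-juli.jar:${tomcat_root}/lib/*:$hive_dependency:$HBASE_CLASSPATH_PREFIX
echo "$HBASE_CLASSPATH_PREFIX"
```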
The official docs point at the fix: Kylin uses the Spring Security framework for user authentication, so you need to edit the sandbox,testing section of ${KYLIN_HOME}/tomcat/webapps/kylin/WEB-INF/classes/kylinSecurity.xml:
<beans profile="sandbox,testing">
    <scr:authentication-manager alias="authenticationManager">
        <scr:authentication-provider>
            <scr:user-service>
                ...
                <scr:user name="ADMIN" password="$2a$10$o3ktIWsGYxXNuUWQiYlZXOW5hWcqyNAFQsSSCSEWoC/BRVMAUjL32" authorities="ROLE_MODELER, ROLE_ANALYST, ROLE_ADMIN" />
                <scr:user name="xxx" password="xxx" authorities="ROLE_MODELER, ROLE_ANALYST, ROLE_ADMIN" />
                ...
The password must be hashed with Spring Security's BCrypt encoder:
<dependency>
    <groupId>org.springframework.security</groupId>
    <artifactId>spring-security-core</artifactId>
    <version>4.0.0.RELEASE</version>
</dependency>
import org.springframework.security.crypto.bcrypt.BCryptPasswordEncoder;
import org.springframework.security.crypto.password.PasswordEncoder;

String password = "123456";
PasswordEncoder encoder = new BCryptPasswordEncoder();
String encodedPassword = encoder.encode(password);
System.out.println(encodedPassword);
A baffling error: the root cause does not show up in kylin.log, so you have to check the Hive logs (configured via log4j; the default directory is /tmp/$user/). There the cause is revealed by the error message: "Error: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z"
It is a compression-format problem: by default Kylin does not use Hadoop's LZO compression but Snappy.
There are three possible fixes:
1. Redeploy using apache-kylin-1.5.2.1-HBase1.x-bin.tar.gz instead of apache-kylin-1.5.2.1-bin.tar.gz. Since I was running HBase 0.98, this option was out.
2. Switch to LZO compression. This is somewhat more involved; see http://kylin.apache.org/docs15/install/advance_settings.html
3. Disable compression for Hive and HBase (cube build time may increase; evaluate this for yourself). In conf/kylin.properties and conf/*.xml, grep for snappy, then remove all the snappy and compression settings.
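The audit step in option 3 can be sketched as below. To keep the snippet self-contained it greps a throwaway file; on a real install you would point grep at $KYLIN_HOME/conf instead.

```shell
# List every snappy/compression setting before deleting it.
# Demonstrated on a temporary file so the snippet runs anywhere.
demo=$(mktemp -d)
cat > "$demo/kylin_job_conf.xml" <<'EOF'
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
EOF
grep -rn -i -E 'snappy|compress' "$demo"
rm -r "$demo"
```

On a real deployment the equivalent would be `grep -rn -i -E 'snappy|compress' $KYLIN_HOME/conf`.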
Fix:
This issue cost far too much time. Everything I found online said it was a YARN port configuration problem, but it persisted after I modified yarn-site.xml. I then suspected the Hive metastore server, but changing that made no difference either.
In the end I had no choice but to switch to HBase 1.1.6, together with the matching Kylin build for HBase 1.x. That solved the problem.
Fix:
This problem haunted me for several days. I had already configured kylin.coprocessor.local.jar=/../kylin/lib/kylin-coprocessor-1.5.4.jar. The fix suggested online is to set hbase_dependency=<absolute path>/hbase-1.1.6/lib in the find-hbase-dependency.sh script, but that did not seem to help either.
It only worked after I completely deleted HBase's data on HDFS and restarted HBase. My guess is that something went wrong during cube creation; I will investigate further later.
With the experience gained from the count distinct problem, we found the following restrictions in Kylin SQL:
limit beg, end is not supported; only limit length is
union and union all are not supported
where exists subqueries are not supported
Kylin generates a lot of intermediate data on HDFS while building cubes. In addition, when we build/drop/merge cubes, some HBase tables may be left behind in HBase even though they are no longer queried, so this offline storage needs to be cleaned up periodically. The steps are:
1. Check which resources can be cleaned up. This step does not delete anything:
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete false
Time taken: 1.339 seconds
OK
kylin_intermediate_kylin_sales_cube_desc_2b8ea1a6_99f4_4045_b0f5_22372b9ffc60
kylin_intermediate_weibo_cube_0d26f9e5_0935_409a_9a6d_6c1d03773fbd
kylin_intermediate_weibo_cube_1d21fe49_990c_4a34_9267_e693421689f2
Time taken: 0.33 seconds, Fetched: 3 row(s)
------ Intermediate Hive Tables To Be Dropped ------
----------------------------------------------------
2016-10-12 15:24:25,881 INFO [main CubeManager:132]: Initializing CubeManager with config kylin_metadata@hbase
2016-10-12 15:24:25,897 INFO [main CubeManager:828]: Loading Cube from folder kylin_metadata(key='/cube')@kylin_metadata@hbase
2016-10-12 15:24:25,952 INFO [main CubeDescManager:91]: Initializing CubeDescManager with config kylin_metadata@hbase
2016-10-12 15:24:25,952 INFO [main CubeDescManager:197]: Reloading Cube Metadata from folder kylin_metadata(key='/cube_desc')@kylin_metadata@hbase
2016-10-12 15:24:26,035 DEBUG [main CubeDescManager:222]: Loaded 2 Cube(s)
2016-10-12 15:24:26,038 DEBUG [main CubeManager:870]: Reloaded new cube: userlog_cube with reference being CUBE[name=userlog_cube] having 1 segments:KYLIN_WEK77BKP6M
2016-10-12 15:24:26,040 DEBUG [main CubeManager:870]: Reloaded new cube: weibo_cube with reference being CUBE[name=weibo_cube] having 1 segments:KYLIN_5N8ZRC7Z1F
2016-10-12 15:24:26,040 INFO [main CubeManager:841]: Loaded 2 cubes, fail on 0 cubes
2016-10-12 15:24:26,218 INFO [main StorageCleanupJob:218]: Skip /kylin/kylin_metadata/kylin-779df736-75b0-4263-b045-6a49401b4516 from deletion list, as the path belongs to segment userlog_cube[19700101000000_20160930000000] of cube userlog_cube
2016-10-12 15:24:26,218 INFO [main StorageCleanupJob:218]: Skip /kylin/kylin_metadata/kylin-e9805d06-559a-4c15-ab1e-d6e947460093 from deletion list, as the path belongs to segment weibo_cube[19700101000000_20140430000000] of cube weibo_cube
--------------- HDFS Path To Be Deleted ---------------
/kylin/kylin_metadata/kylin-07e8f9b1-8dfc-4c57-8e5b-e9800392af0d
/kylin/kylin_metadata/kylin-0855f8ed-89a5-4676-a9bb-f8c301ead327
/kylin/kylin_metadata/kylin-0cdef491-d0b7-438d-ba54-091678cb463d
/kylin/kylin_metadata/kylin-121752c8-ab9d-434b-812f-73f766796436
/kylin/kylin_metadata/kylin-12b442a0-0c6d-43e7-830f-2f6e5826f23a
/kylin/kylin_metadata/kylin-5ba7affe-d584-4f6e-85b2-2588e31a985c
/kylin/kylin_metadata/kylin-5e1818bd-4644-4e8e-b332-b5bb59ff9677
/kylin/kylin_metadata/kylin-680f7549-48be-496a-82c5-084434bfee74
/kylin/kylin_metadata/kylin-707d1a65-392e-456f-97ea-d7d553b52950
/kylin/kylin_metadata/kylin-7520fc6e-8b76-43cc-9fb8-bfba969040da
/kylin/kylin_metadata/kylin-75e5b484-4594-4d31-83ce-729a6b3de1c2
/kylin/kylin_metadata/kylin-79535d79-cd36-4711-858c-d8fa28266f7f
/kylin/kylin_metadata/kylin-81eb9119-c806-4003-a6d6-fc43281a8c01
/kylin/kylin_metadata/kylin-839e80d8-d116-4061-80d6-379c85db7114
/kylin/kylin_metadata/kylin-843b185d-ed09-48c7-958c-1ee1e0e2cde5
/kylin/kylin_metadata/kylin-97c0cdc6-c53e-4115-995e-b90f4381d307
/kylin/kylin_metadata/kylin-998aa0aa-279c-44f0-8367-807b9110ae74
/kylin/kylin_metadata/kylin-ad2ad0c7-bee5-46f2-9fc3-e60b10941ffa
/kylin/kylin_metadata/kylin-b5939b9b-2a6e-4acb-aaf7-888a83113ad7
/kylin/kylin_metadata/kylin-b65b555d-90e5-4455-95ce-10b215b00482
/kylin/kylin_metadata/kylin-d5ac36b3-b021-4ac6-87ae-f3a38f90eb06
/kylin/kylin_metadata/kylin-e7a9b0d1-a788-4ddf-88f5-37671eaa7dc3
/kylin/kylin_metadata/kylin-f7094827-00f8-474b-9542-ea001797a148
-------------------------------------------------------
2016-10-12 15:24:26,475 INFO [main StorageCleanupJob:91]: Exclude table KYLIN_WEK77BKP6M from drop list, as it is newly created
2016-10-12 15:24:26,475 INFO [main StorageCleanupJob:102]: Exclude table KYLIN_5N8ZRC7Z1F from drop list, as the table belongs to cube weibo_cube with status READY
--------------- Tables To Be Dropped ---------------
----------------------------------------------------
2. As shown above, the output lists the tables and files in Hive/HDFS/HBase that can be deleted (tables that were created or queried recently are filtered out automatically). Review the output and confirm the listed items are really no longer needed. Once confirmed, run the command from step 1 again with "--delete false" changed to "--delete true" to perform the actual cleanup:
${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete true
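For periodic cleanup, the two invocations above can be wrapped in a small helper. This is only a sketch: the class name is taken from the steps in this section, and the KYLIN_HOME default is an example path.

```shell
# Build the dry-run and real cleanup command lines from one helper.
KYLIN_HOME=${KYLIN_HOME:-/home/grid/kylin}          # example default
CLEANUP_CLASS=org.apache.kylin.storage.hbase.util.StorageCleanupJob
cleanup_cmd() {  # $1 goes to --delete: false = list only, true = really delete
  echo "${KYLIN_HOME}/bin/kylin.sh ${CLEANUP_CLASS} --delete $1"
}
cleanup_cmd false   # step 1: list candidates, review the output
cleanup_cmd true    # step 2: after review, perform the deletion
```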