Deployment
- JRE 1.8+
- Apache Hadoop v2.X
- Apache Hive v0.14+
- Apache Drill v1.8+. You can visit http://drill.apache.org/docs/ to learn how to deploy and use it.
- Apache Zookeeper. We use v3.4.8; other compatible versions should also work.
- Apache Kafka v0.8.0+. Optional. Used for realtime ingestion.
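A quick way to sanity-check the versions above before deploying is a small shell helper. This is only an illustrative sketch; the helper name and the hard-coded version string are not part of IndexR:

```shell
# Minimal sketch: compare dotted version strings with sort -V (GNU coreutils).
# version_ge A B succeeds if version A >= version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example: check an installed Hadoop version against the v2.X requirement.
hadoop_version="2.7.3"   # in practice, parse the output of: hadoop version
if version_ge "$hadoop_version" "2.0.0"; then
  echo "Hadoop $hadoop_version meets the v2.X requirement"
fi
```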
Note - Please make sure you have all those requirements installed correctly before deploying IndexR.
- Copy the correct lib file in indexr-<version>/lib to /usr/local/lib/ on all cluster nodes, including those nodes where you may run Hive or indexr-tool scripts. e.g. on the Linux platform, you should use the libbhcompress.so file.
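The copy itself can be scripted. A minimal sketch, assuming hypothetical host names node1..node3 and SSH access; the leading echo keeps it a dry run, so replace it with the real scp for your cluster:

```shell
# Dry-run sketch of distributing the native lib to all cluster nodes.
# NODES and the destination user are placeholders for your environment.
NODES="node1 node2 node3"
LIB="indexr-<version>/lib/libbhcompress.so"   # substitute your actual version

for node in $NODES; do
  # drop the leading 'echo' to actually run the copy
  echo scp "$LIB" "root@$node:/usr/local/lib/"
done
```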
- Edit ${HADOOP_HOME}/etc/hadoop/mapred-site.xml, add /usr/local/lib to LD_LIBRARY_PATH in the mapred.child.env parameter. e.g.
<property>
<name>mapred.child.env</name>
<value>LD_LIBRARY_PATH=/usr/local/lib</value>
</property>
- Edit ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh, add /usr/local/lib to LD_LIBRARY_PATH. e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
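If you script this edit, it helps to make it idempotent so re-running deployment does not duplicate the line. A minimal sketch (the demo writes to a temp file; point HADOOP_ENV at your real ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh):

```shell
# Append the LD_LIBRARY_PATH export only if it is not already present.
HADOOP_ENV="${TMPDIR:-/tmp}/hadoop-env-demo.sh"   # demo path; use the real hadoop-env.sh
LINE='export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib'

touch "$HADOOP_ENV"
# -x matches the whole line, -F treats $LINE literally, so re-runs are no-ops
grep -qxF "$LINE" "$HADOOP_ENV" || echo "$LINE" >> "$HADOOP_ENV"
```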
- Copy IndexR Hive aux jars indexr-<version>/indexr-hive/aux/* to Hive's HIVE_AUX_JARS_PATH. HIVE_AUX_JARS_PATH can be set in ${HIVE_HOME}/conf/hive-env.sh. e.g.
cp -r indexr-<version>/indexr-hive/aux/* /usr/local/hive/aux/
- [Optional] Sometimes you will need to upload those Hive aux jars to HDFS at the same path. e.g.
hdfs dfs -put /usr/local/hive/aux/* /usr/local/hive/aux/
- Restart HiveServer2 if you are running it.
- Now you should be able to create an IndexR hive table via Hive console. e.g.
hive (default)> CREATE EXTERNAL TABLE IF NOT EXISTS test (
`date` int,
`d1` string,
`m1` int,
`m2` bigint,
`m3` float,
`m4` double
)
PARTITIONED BY (`dt` string)
ROW FORMAT SERDE 'io.indexr.hive.IndexRSerde'
STORED AS INPUTFORMAT 'io.indexr.hive.IndexRInputFormat'
OUTPUTFORMAT 'io.indexr.hive.IndexROutputFormat'
LOCATION '/indexr/segment/test'
;
hive (default)> insert into table test partition (dt=20160701) values(20160701,'mac',100,192444,1.55,-331.43555);
hive (default)> select * from test limit 10;
indexr-tool is a tool box to manage IndexR. It only needs to be deployed on one node, usually your management node.
- Copy indexr-<version>/indexr-tool to a path, like /usr/local/indexr-tool.
- Copy ${HADOOP_CONF}/core-site.xml and ${HADOOP_CONF}/hdfs-site.xml to the conf folder.
- Modify the configurations in the conf folder (env.sh, indexr.config.properties, log4j.xml). Especially the indexr.fs.connection setting in indexr.config.properties: make sure it is set to the same value as fs.defaultFS in core-site.xml.
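As a sketch, assuming your core-site.xml sets fs.defaultFS to hdfs://localhost:9000 (the host and port here are placeholders; substitute your own value), the relevant line in indexr.config.properties would look like:

```properties
# Must match fs.defaultFS in ${HADOOP_CONF}/core-site.xml
indexr.fs.connection=hdfs://localhost:9000
```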
- Copy all files in indexr-<version>/indexr-drill/* to the Drill installation home dir ${DRILL_HOME}/, for example /usr/local/drill. Do it on all Drill nodes in the cluster.
- Copy drill-indexr-storage-<version>.jar to ${DRILL_HOME}/jars/.
- Modify the configuration ${DRILL_HOME}/conf/indexr.config.properties. It should be kept in sync with indexr-tool and all Drillbit nodes.
- Copy ${HADOOP_CONF}/core-site.xml and ${HADOOP_CONF}/hdfs-site.xml to the ${DRILL_HOME}/conf folder if they do not exist there yet.
- Modify ${DRILL_HOME}/conf/drill-env.sh, add /usr/local/lib to LD_LIBRARY_PATH. e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib
- Synchronize ${DRILL_HOME}/conf to all Drillbit nodes and restart them.
- Go to the Drill Web Console, create a new storage plugin called indexr, input the following text, and click Create. You only need to do this once, on any one Drill Web Console in the cluster.
{
"type": "indexr",
"enabled": true
}
Note that the IndexR plugin can only create one storage plugin in a Drill cluster, so you should always use the name indexr.
- Now you can create an IndexR table and enjoy.
Create an IndexR table with indexr-tool
cd ${INDEXR-TOOL_HOME}
bin/tools.sh -cmd settb -t test -c test_schema.json
test_schema.json:
{
"schema":{
"columns":
[
{"name": "date", "dataType": "int"},
{"name": "d1", "dataType": "string"},
{"name": "m1", "dataType": "int"},
{"name": "m2", "dataType": "bigint"},
{"name": "m3", "dataType": "float"},
{"name": "m4", "dataType": "double"}
]
}
}
Run some queries via the Drill console
cd ${DRILL_HOME}
bin/drill-conf
0: jdbc:drill:> select * from indexr.test limit 10;
Note: We only support Spark 2.1.0+.
Simply copy all jars in indexr-<version>/indexr-spark/jars/* to ${SPARK_HOME}/jars/, and you are good to go.