The following figure
provides a logical view of using the
SAS/ACCESS Interface to Hadoop
to access a Hive Server. The Hive Server is shown running on the same
machine as the Hadoop NameNode.
Setting up a connection from SAS to a Hadoop Server is a two-stage process:
- Register the Hadoop Server.
- Register the Hadoop via Hive library.
This example shows the process for establishing a SAS connection to a Hive Server. For the SAS/ACCESS Interface to Hadoop to connect to the Hive Server, the machine that hosts the SAS Workspace Server must be configured with several JAR files. These JAR files are used to make a JDBC connection to the Hive Server. The following prerequisites have been satisfied:
- installation of the SAS/ACCESS Interface to Hadoop. For configuration information, see the Install Center at http://support.sas.com/documentation/installcenter/93 and use the operating system and SAS version to locate the appropriate SAS Foundation Configuration Guide.
- installation of the Hadoop JAR files required by SAS.
- setting the SAS_HADOOP_JAR_PATH environment variable.
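The third prerequisite can be illustrated with a short shell sketch. The JAR directory shown here is a hypothetical path; use the directory where the required Hadoop and Hive JAR files were actually collected. The variable can also be set with a -set option in the sasv9.cfg file.

```shell
# Point SAS at the directory that holds the Hadoop/Hive JAR files.
# The path below is a hypothetical example, not a required location.
export SAS_HADOOP_JAR_PATH=/opt/sas/hadoopjars

# Confirm the setting before starting the SAS Workspace Server.
echo "$SAS_HADOOP_JAR_PATH"
```

The environment variable must be visible to the process that starts the SAS Workspace Server, or the JDBC connection to the Hive Server fails at library assignment.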
This section describes
the steps that are used to access data in Hadoop as tables through
a Hive Server. SAS Data Integration Studio offers a series of transformations
that can be used to access the Hadoop Distributed File System (HDFS),
submit Pig code, and submit MapReduce jobs.
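Once the prerequisites are met, the Hive connection can also be exercised directly with a LIBNAME statement before or after the metadata registration. A minimal sketch follows; the server name, port, and credentials are hypothetical placeholders for your site's values.

```sas
/* Minimal sketch: assign a library to the Hive Server.
   Host name, user, and password are hypothetical examples. */
libname myhive hadoop server="hivenode.example.com" port=10000
        user=sasdemo password=XXXXXXXX;

/* List the Hive tables that are now visible through the library */
proc datasets lib=myhive;
run;
```

If the LIBNAME assignment fails, verify that SAS_HADOOP_JAR_PATH points to the required JAR files and that the Hive Server port is reachable from the SAS Workspace Server machine.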