What's New in SAS Data Integration Studio 4.5
Overview
The main enhancements
for SAS Data Integration Studio 4.5 include the following:
- Experimental High-Performance Analytics Components
- New Business Rules Transformation
- Support for Hadoop

Support for Hadoop
Hadoop is an open-source
software project that supports scalable, distributed computing. The
following transformations support the use of Hadoop Clusters in the
context of SAS Data Integration Studio jobs:
- The Hadoop Container transformation enables you to connect all of the sources and targets for the various steps in a single container step. The container step requires only one connection to the Hadoop Cluster in the context of a SAS Data Integration Studio job, and all of the steps that are included in the container are submitted through that connection.
- The Hadoop File Reader and Hadoop File Writer transformations support reading files from the Hadoop Cluster into SAS and writing files from SAS to the Hadoop Cluster in the context of a SAS Data Integration Studio job. An illustrative sketch of this kind of file movement follows this list.
- The Hive transformation supports submitting Hive code to the Hadoop Cluster in the context of a SAS Data Integration Studio job. Hive is a data warehouse system for Hadoop. You can easily summarize data, run ad hoc queries, and analyze large data sets that are stored in file systems that are compatible with Hadoop. Hive also enables you to project structure onto this data and query it by using an SQL-like language called HiveQL. An illustrative sketch of submitting HiveQL from SAS follows this list.
- The Map Reduce transformation supports submitting MapReduce code to the Hadoop Cluster in the context of a SAS Data Integration Studio job. Hadoop MapReduce enables you to write applications that reliably process vast amounts of data in parallel on large clusters. A MapReduce job splits the input data set into chunks that are processed in parallel by the map tasks. The outputs of the maps are sorted and then passed as input to the reduce tasks. The input and the output of the job are typically stored in a file system. An illustrative sketch of submitting a MapReduce job from SAS follows this list.
- The Pig transformation supports submitting Pig code to the Hadoop Cluster in the context of a SAS Data Integration Studio job. The transformation contains an enhanced, color-coded editor that is specific to the Pig Latin language. Pig Latin is a high-level language for expressing and evaluating data analysis programs. It supports substantial parallelization and can handle very large data sets. An illustrative sketch of submitting Pig Latin code from SAS follows this list.
- The Transfer From and Transfer To transformations support the transfer of data from and to the Hadoop Cluster in the context of a SAS Data Integration Studio job. The file-movement sketch that follows this list also applies to these transformations.
- The Hadoop Monitor items in the Tools menu enable you to run reports that monitor the performance of a Hadoop Cluster.
- The Hive Source Designer enables you to register tables in a Hive database. An illustrative sketch of accessing registered Hive tables from SAS code follows this list.
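The following sketches are illustrations only; they are not the code that these transformations generate, and the server names, credentials, paths, and table names in them are hypothetical. This first sketch moves files between the local file system and the Hadoop Cluster by using the HDFS statement of PROC HADOOP, and then reads a staged HDFS file into SAS with the FILENAME Hadoop access method.

   /* Minimal sketch: copy files between the local file system and      */
   /* HDFS, then read an HDFS file into a SAS data set. All paths,      */
   /* credentials, and the configuration file are hypothetical.         */
   filename hdcfg "/opt/sas/hadoop/core-site.xml";

   proc hadoop options=hdcfg username="sasdemo" password="XXXXXXXX";
      hdfs mkdir="/user/sasdemo/staging";
      hdfs copyfromlocal="/local/data/sales.csv"
           out="/user/sasdemo/staging/sales.csv";
      hdfs copytolocal="/user/sasdemo/results/summary.csv"
           out="/local/data/summary.csv";
   run;

   /* Read the staged HDFS file directly into SAS */
   filename hdsales hadoop "/user/sasdemo/staging/sales.csv"
            cfg="/opt/sas/hadoop/core-site.xml" user="sasdemo";

   data work.sales;
      infile hdsales dlm=',' dsd truncover;
      input region :$12. amount;
   run;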
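The next sketch submits HiveQL to the Hadoop Cluster by using the explicit SQL pass-through facility of SAS/ACCESS Interface to Hadoop. The server, user, and table names are hypothetical.

   /* Minimal sketch: submit HiveQL through explicit SQL pass-through.  */
   /* The server, user, and table names are hypothetical placeholders.  */
   proc sql;
      connect to hadoop (server="hive-node.example.com" port=10000 user="sasdemo");

      /* Summarize a large table on the cluster with HiveQL */
      execute (
         create table order_summary as
         select region, count(*) as order_count, sum(amount) as total_amount
         from orders
         group by region
      ) by hadoop;

      /* Return a small result set to SAS */
      select * from connection to hadoop
         (select region, total_amount from order_summary);

      disconnect from hadoop;
   quit;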
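The next sketch submits a MapReduce job with the MAPREDUCE statement of PROC HADOOP, which is roughly the kind of request that the Map Reduce transformation issues. The configuration file, JAR file, class names, and HDFS paths are hypothetical.

   /* Minimal sketch: run a MapReduce job with PROC HADOOP. The         */
   /* configuration file, JAR, class names, and paths are hypothetical. */
   filename hdcfg "/opt/sas/hadoop/core-site.xml";

   proc hadoop options=hdcfg username="sasdemo" password="XXXXXXXX" verbose;
      mapreduce input="/user/sasdemo/input"
                output="/user/sasdemo/output"
                jar="/local/jars/wordcount.jar"
                map="org.example.WordCountMapper"
                reduce="org.example.WordCountReducer"
                outputkey="org.apache.hadoop.io.Text"
                outputvalue="org.apache.hadoop.io.IntWritable";
   run;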
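The next sketch writes a small Pig Latin program to a temporary file and submits it with the PIG statement of PROC HADOOP. The paths and field names are hypothetical.

   /* Minimal sketch: submit Pig Latin code with PROC HADOOP.           */
   /* The paths and field names are hypothetical placeholders.          */
   filename hdcfg "/opt/sas/hadoop/core-site.xml";
   filename pigpgm temp;

   data _null_;
      file pigpgm;
      put "orders = LOAD '/user/sasdemo/orders.csv' USING PigStorage(',')";
      put "         AS (order_id:int, region:chararray, amount:double);";
      put "big = FILTER orders BY amount > 1000.0;";
      put "STORE big INTO '/user/sasdemo/big_orders';";
   run;

   proc hadoop options=hdcfg username="sasdemo" password="XXXXXXXX" verbose;
      pig code=pigpgm;
   run;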
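Finally, after Hive tables are registered with the Hive Source Designer, SAS code typically reaches them through the SAS/ACCESS LIBNAME engine for Hadoop, as in this sketch. The server, schema, and table names are hypothetical.

   /* Minimal sketch: assign a library to a Hive schema and read a      */
   /* table. The server, schema, and table names are hypothetical.      */
   libname myhive hadoop server="hive-node.example.com" port=10000
           schema=default user="sasdemo" password="XXXXXXXX";

   proc contents data=myhive.order_summary;
   run;

   proc print data=myhive.order_summary(obs=10);
   run;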
Experimental High-Performance Analytics Components
SAS® LASR™
Analytic Server is a direct-access, NoSQL, NoMDX server that is engineered
for maximum analytic performance through multithreading and distributed
computing. SAS Data Integration Studio provides the following experimental
High-Performance Analytics transformations for SAS LASR Analytic Servers:
- The SAS Data in HDFS Loader transformation is used to stage data into a Hadoop cluster.
- The SAS Data in HDFS Unloader transformation removes data that was previously staged in a Hadoop cluster.
- The SAS LASR Analytic Server Loader transformation loads data to a SAS LASR Analytic Server.
- The SAS LASR Analytic Server Unloader transformation unloads data that has previously been loaded into a SAS LASR Analytic Server.
Source Designer wizards
are used to register tables on the SAS Metadata Server. SAS Data Integration
Studio provides the following experimental Source Designers for High-Performance
Analytics tables:
- The SAS Data in HDFS Source Designer enables you to register SAS tables in a Hadoop Cluster.
- The SAS LASR Analytic Server Source Designer enables you to register SAS LASR Analytic tables.
For more information
about these experimental components, contact SAS Technical Support.
New Business Rules Transformation
The Business Rules transformation
enables you to use the business rule flow packages that are created in
SAS® Business Rules Manager in the context of a SAS Data Integration
Studio job. You can import business rule flows, specify flow versions,
map source table columns to required input columns, and set business
rule options.
The Business Rules transformation enables you to map your source data and output data into and out of the rules package. The rules are then applied to your data when the SAS Data Integration Studio job runs. When you run a job that includes a rules package, statistics are collected, such as the number of rules that were triggered and the number of valid and invalid data record values. You can use this information to further refine your data as it flows through your transformation logic.
For more information,
see Using a Business Rule Flow in a Job.
Other New Features
Here are some of the
most notable enhancements included in this release:
Support for SQL Server
user-defined functions (UDFs) enables you to import UDFs for models
registered through Model Manager for supported databases that include
DB2, Teradata, and Netezza. You can also import native UDFs from Oracle,
DB2, and Teradata. After the UDFs are imported, you can access them on the Functions tab of the Expression Builder window.
For more information, see Adding User-Defined Functions.
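As an illustration only, an imported database UDF can be referenced in SQL much like any built-in function. The following sketch assumes a hypothetical Teradata scoring UDF named score_customer; the server, credentials, and table names are also hypothetical.

   /* Minimal sketch: call a database UDF through explicit SQL          */
   /* pass-through. The UDF name score_customer, the server, and the    */
   /* table are hypothetical placeholders.                              */
   proc sql;
      connect to teradata (server="tdprod" user="sasdemo" password="XXXXXXXX");

      create table work.scored as
      select * from connection to teradata
         (select customer_id,
                 score_customer(income, age) as score
            from analytics.customers);

      disconnect from teradata;
   quit;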
Performance enhancements for the SCD Type 2 Loader transformation include the following:
- the ability to use character-based columns for change tracking
- an option to create an index on the permanent cross reference table
- an option to specify the SPD Server update technique
- an option to sort target table records before creating the temporary cross reference table
In the past, if you selected Tools > Options, clicked the Data Quality tab, and changed the DQ Setup Location, the new location could not be applied to data quality transformations in existing jobs. Now, if you change the global DQ Setup Location, you have the option to apply the new location to data quality transformations in existing jobs. To apply the global DQ Setup Location to a transformation, click the Reset DQ Setup Location button on the appropriate tab, such as the Standardization tab for the Apply Lookup Standardization transformation. The following data quality transformations support this option: Apply Lookup Standardization, Standardize with Definition, and Create Match Codes.
For more information,
see General Prerequisites for Data Quality Transformations.
A new Federation Server
Source Designer enables you to register data sources that are available
through a DataFlux® Federation Server. You can then access these
data sources in a SAS Data Integration Studio job.
The Data Validation transformation now supports direct lookup by using a hash object.
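For illustration, a direct lookup with a DATA step hash object typically resembles the following sketch. The data set and column names are hypothetical; this is not the exact code that the transformation generates.

   /* Minimal sketch: validate a column against a lookup table by using */
   /* a DATA step hash object. Data set and column names are            */
   /* hypothetical placeholders.                                        */
   data work.valid_rows work.error_rows;
      if _n_ = 1 then do;
         declare hash lk(dataset: "lookup.valid_regions");
         lk.defineKey("region");
         lk.defineDone();
      end;

      set work.source_orders;

      /* Rows whose region value exists in the lookup table are valid */
      if lk.check() = 0 then output work.valid_rows;
      else output work.error_rows;
   run;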
Copyright © SAS Institute Inc. All rights reserved.