
SAP BW USING IBM DB2 EEE

INTRODUCTION

Business processes no longer consist solely of isolated, single transactions but span many interfacing business transactions and processes. Virtually every company sits in the middle of a value chain: on the supply side it is itself a customer for the products and services it needs to do business, while at the same time it must integrate its own business processes and Customer Relationship Management applications with its own consumers.

The SAP Business Information Warehouse supports this complete business process integration and supports cross-application information from every source in the network. Everybody in the organization, depending on his or her role and authorization, can now get the complete information needed to do a better job. The software is delivered by SAP with pre-built, industry-specific content ready for access by the user.

SAP BW – IBM DB2 LAYERED TECHNOLOGY

By breaking down the SAP BW into logical layers, a unique concept has been created that allows the SAP BW to integrate seamlessly with the mySAP.com information portal (mySAP Workplace) and to present many logical views from a single unified environment, for both end-users and the SAP BW's IT administrators. As shown in Figure 1, all layers of the SAP BW interact with the Common Meta Data, a huge repository of 'data about the data' that has knowledge about every piece of data residing in the SAP BW. The meta data repository includes information about content identification, date and time of extraction, content mapping, default IT and vertical-industry end-user presentations, data transformation and load methods, and so on. This tight integration of the layers of the SAP BW with the meta data allows the Business Warehouse to present all of these logical views from a single unified environment.

ANALYZING THE LAYERS

The Data Extraction, Transformation and Load (ETL) Layer

Starting with the bottom layer and working up, the Data Extraction, Transformation and Load (ETL) layer manages all of the traditional functions that extract the data from the data source, convert it to an acceptable database format, and then load it into the information warehouse. Because SAP BW utilizes the common Meta Data store created with SAP R/3, this support layer can do the entire mapping, cleansing, and loading with minimum user intervention. The people responsible for maintaining the SAP BW can use user-friendly content transport wizards and the Administrative Workbench feature to control this layer and update the Common Meta Data store.

Because the ETL process is the part of the data warehouse process on which organizations building their own warehouses spend the majority of their time and money, having this information already available makes this an extremely powerful approach. Research reports indicate that companies often spend more than a full year solving this problem, and in some cases even abandon their attempts because they simply cannot resolve it. Oftentimes, the major problem with an information warehouse effort is not designing the data warehouse, but getting consistently clean data into the warehouse and loading it within a given timeframe. In many cases, organizations need to purchase separate software just for the ETL process, apart from data warehouse design and implementation software. With SAP BW, no additional software is needed; all functions, including a flow manager, are already built into the software. Pre-built, optionally customizable, solutions are included for several vertical industries, so enterprises can be ready to exploit their business warehouse in a couple of months rather than years.

The Data Modeling Layer

The Data Modeling layer exploits the many built-in functions of DB2. SAP has always known that database and business logic should be separated and that a smart generator is needed to interface with the database. This means that the business logic can ask for complex queries and, depending on the database engine used, the database interface can formulate the proper SQL query or queries to service the request. If a certain database does not support a function, the database interface can emulate it. As much as possible, SAP moves search requests into the database. This is a smart approach because it lets DB2 decide which data participates in the queries and process queries in parallel. That way, the queries execute much faster than if the data were transported into SAP BW and all the decisions made there. The data model is optimized for InfoCube processing and aggregation. If a table becomes too large, IBM DB2 allows the data to be partitioned. DB2 can then restrict processing to only those partitions needed for the results, and access to several partitions can go on in parallel. IBM DB2 comes with a set of tools to support partitioning (such as balancing partitions).
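
As a minimal sketch of this "move the search into the database" idea (all table and column names here are hypothetical), a request such as "total quantity per material for one plant" is not computed inside SAP BW; the database interface generates SQL so that DB2 itself performs the selection and aggregation:

    -- Hypothetical query generated by the database interface:
    -- filtering and aggregation run inside DB2, possibly in
    -- parallel, instead of shipping raw rows to SAP BW.
    SELECT f.material_id, SUM(f.quantity) AS total_qty
    FROM   sales_fact f
    WHERE  f.plant = '1000'
    GROUP  BY f.material_id;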

Information Analysis Layer

The Information Analysis Layer is the heart of the SAP BW. Here, the many analytic functions, such as Online Analytical Processing (OLAP), take place. With OLAP, data is presented to the end-user in a grid format. In general, the grid is presented in the form of a data 'cube' that shows the multidimensional interaction of database tables. Data that exists in only two dimensions can be represented as a single table; data that has multiple dimensions and viewpoints is more easily represented by a cube. The cubes are formed in response to the program's analysis of the data. The end-user can interact with the cubes by rotating dimensions across the rows and columns and by asking for more detail through adding dimensions to the report (drill-down). In this way, the end-user can run deep analysis on the data presented. With SAP BW, data cubes displaying the information that is commonly used in specific vertical industries are pre-built into the program. In other words, once the data is uploaded into the database, database programmers do not need to start from scratch to determine the data relationships needed to build the cubes; they can simply customize the cubes that have already been built. User requests for data in the pre-built cubes can be calculated and presented to the user in a minimum of time. When requests occur for data that has not been mapped into the cubes, this layer provides the ability to drill back to the Online Data Store (ODS) or, if necessary, to the raw data in SAP R/3. Reports requested by the end-user are built in this layer and then sent via the layers above to the end-user.
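
A drill-down can be pictured as the generated SQL gaining a dimension. This is a hedged sketch; the schema names are invented for illustration:

    -- Initial report: revenue by region.
    SELECT d.region, SUM(f.revenue)
    FROM   sales_fact f, region_dim d
    WHERE  f.region_id = d.region_id
    GROUP  BY d.region;

    -- Drill-down: the end-user adds the product dimension, so the
    -- query gains a join and an extra GROUP BY column.
    SELECT d.region, p.product_group, SUM(f.revenue)
    FROM   sales_fact f, region_dim d, product_dim p
    WHERE  f.region_id  = d.region_id
    AND    f.product_id = p.product_id
    GROUP  BY d.region, p.product_group;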

Information Presentation Layer

The Information Presentation Layer includes Web-based cockpits and built-in geographic information systems that allow data to be presented geographically, aggregated, and navigated. During navigation the end-user explores the Common Meta Data and chooses which data should be included in the next report. The result of separating the presentation layer from the data access and analysis layers is a highly organized architecture that enables future enhancements to be implemented very quickly. This method of presentation makes almost a 180-degree turn from earlier methods: it is uniquely designed to allow the incorporation of new types of data without needing to redesign the data system. For example, if the Wireless Application Protocol (WAP) really takes off, it would be this layer that would quickly support it.

Enterprise-Wide Information Deployment Layer

This top layer deals with the deployment of enterprise-wide information. It is primarily responsible for the smooth integration of the SAP BW into the existing SAP infrastructure. The four items listed in Figure 1 (mySAP.com, Reporting Agent, Web reporting, and knowledge management integration) indicate the end-user features with which the SAP BW seamlessly integrates. This means that the end-user has one common view on SAP (the portal called mySAP Workplace) no matter what function he or she is performing.

SAP BW ARCHITECTURE AND DATA MANAGEMENT


The main engine is the OLAP processor, which receives all incoming requests. The requests come from the Internet Transaction Server (ITS), which also deals with content management; Web reporting and other content-specific issues like channels and roles are handled here. The OLAP processor deals with the functional requirements. From its connection to the Common Meta Data, it knows which InfoCubes contain which data and which are remote. Using the Common Meta Data repository, it creates the proper query using advanced SQL that can be processed by IBM DB2. These queries typically do OLAP-type processing on the data, such as navigate, select, aggregate, drill down or filter. The OLAP processor can also combine data from multiple cubes into a single report.

The user drills down until the lowest possible aggregation level is reached. Unlike most OLAP tools, SAP BW can go deeper and collect data from the Online Data Store. It is even possible that the data requested is neither in a cube nor in the Online Data Store; in that case the Common Meta Data knows where the data can be located, and the SAP BW can extract it from the supplying systems (normally SAP R/3). The combination of OLAP processor and Common Meta Data is a very powerful tool. Typically, the end-user starts by navigating the meta data, exploring which data is available. Once the end-user chooses data to navigate, the OLAP processor opens the underlying cube(s) and selects the correct data.

The SAP BW has the ability to aggregate commonly requested data. The end-users do not know that these aggregations exist, but they do know they get a speedy response. The SAP BW administrator builds these aggregations, and because SAP BW keeps track of all requests, the system itself can advise the administrator about which data to aggregate. In addition to OLAP processing, SAP BW supports flat queries (single information queries whose results cannot be drilled down). The data for these kinds of queries normally comes from the Online Data Store and/or the supplying system (SAP R/3).

HOW IBM DB2 EXPLOITS THE SAP BW ARCHITECTURE

Processing queries from the SAP BW requires a very powerful database. The database must not only be powerful at solving queries; it should also provide extra value to the SAP BW. By decoupling function and database, SAP is able to exploit powerful functions in a specific database. In fact, IBM DB2 was built with data warehouse processing in mind.

Parallelism

The optimizer inside DB2 is capable of breaking a query into multiple parts. It then processes these pieces in parallel; this is one of the reasons IBM DB2 is able to achieve such high throughput. To the end-user and the SAP BW administrator, this process is completely transparent. This parallelism works for all kinds of database access like table scans, index scans, joins and aggregation. Enterprises seeking to exploit these features will need a hardware infrastructure that allows for parallelism (e.g. pSeries, xSeries, zSeries, sysplex or Sun Solaris). Database developers don’t have to do anything in the database interface or SQL queries to exploit this powerful feature of IBM DB2. Once the query is accepted by one of the nodes, the optimizer will talk to the other nodes and distribute the query transparently over the available processing power. The results are collected by the requesting node and passed back to SAP BW. IBM DB2 is even able to exploit this feature within just one node by using multiple processors available inside the node. This parallelism is also useful in supporting SAP BW’s maintenance process, including building the information cubes and maintaining the aggregations.

Powerful Parallelism Features in IBM DB2

Many of the features built into IBM DB2 support parallelism. For example, IBM DB2 EEE on Unix and Windows NT has Automatic Data Placement, a facility that takes a key from the data and assigns a partition for each record using a hashing algorithm. This means that there is no pressure on the SAP BW administrator to partition the data; it is done automatically by the system. Another powerful feature on this platform is automatic Skew Correction: IBM DB2 can correct the distribution over nodes on demand. Using this facility, all nodes hold approximately the same amount of data and the system stays properly balanced during parallel processing. Both of these features are unique in the industry and allow the SAP BW administrator to concentrate on the Business Warehouse itself instead of spending hours getting the database right. Because it was so difficult in the past to set up databases the proper way, parallelism in the database was not very popular. With the modern features built into IBM DB2, those days are gone.
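
A rough sketch of what Automatic Data Placement looks like in DDL follows; the table, tablespace and nodegroup names are hypothetical, and the syntax follows DB2 UDB EEE conventions of this era:

    -- Rows are assigned to partitions by hashing the partitioning
    -- key; the administrator never places data manually.
    CREATE TABLE sales_fact (
      order_id    INTEGER       NOT NULL,
      customer_id INTEGER       NOT NULL,
      revenue     DECIMAL(15,2)
    ) IN ts_facts
    PARTITIONING KEY (customer_id) USING HASHING;

    -- Skew correction on demand: rebalance the data evenly across
    -- the nodes of the nodegroup (issued from the command line).
    REDISTRIBUTE NODEGROUP ng_facts UNIFORM;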

Dynamic Bitmap Indexing and Hash joins

Many relational databases can only use one index at a time to solve a query. If a query has multiple selection criteria, then hopefully an index exists that can assist in selecting only the data requested by these multiple predicates. If not, the DBMS will select far too much data and filter it at the lowest (row) level. This requires DBAs to know in advance what the queries will be and to design the needed indexes accordingly. Queries, however, can be difficult to predict, and these types of indexed databases are not very suitable for ad-hoc query processing.

IBM DB2 has a very smart built-in facility that allows multiple indexes to be used in one query. First the optimizer selects all indexes that can solve one or more selection criteria. Then it accesses these indexes (preferably in parallel) and stores the results in a bitmap. Every bit in the bitmap represents a row; if the bit is 'on', the row qualifies according to that index. By combining the bitmaps using the AND/OR processing given in the query, the truly qualifying rows are detected and only these rows are accessed. No matter what kind of processor is used, AND/OR processing on a bit array is always extremely fast. If some of the predicates cannot be resolved by using indexes, they have to be resolved by accessing the data. The idea is that accessing the data is postponed as long as possible and filtering is done using (multiple) indexes; this keeps the I/O to a minimum. Some databases claim they have bitmaps, but most often these are single indexes that have been pre-built by the DBA with no idea of what the queries will be. IBM DB2 also comes with a tool called Index Optimizer that detects predicates that could not be solved using indexes. If this happens often, the Index Optimizer will advise the DBA to create additional indexes.
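
A small hedged example of the kind of query that benefits (all names invented): with two ordinary single-column indexes, DB2 can scan both, build a bitmap of qualifying rows per predicate, AND the bitmaps, and only then touch the table:

    CREATE INDEX ix_region ON sales_fact (region_id);
    CREATE INDEX ix_period ON sales_fact (fiscal_period);

    -- Both predicates can be resolved from the indexes; only rows
    -- whose bits survive the AND of the two bitmaps are fetched.
    SELECT SUM(revenue)
    FROM   sales_fact
    WHERE  region_id     = 7
    AND    fiscal_period = '2001-04';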

Hash Joins

IBM DB2 takes the bitmap index approach a step further with hash joins. Using this algorithm, DB2 can join a large fact table and several dimensions in parallel with a minimum number of I/O operations. As with dynamic bitmap multiple index access, the system dynamically builds bitmaps, this time to do a semi-join between each dimension and the foreign key index on the fact table. The size of the bitmaps is determined at run time, based on the amount of memory available to the query. The hash join first applies a local filter on the dimension tables and then redirects the qualifying rows to the other nodes so that parallel processing can be done on each partition owned by each database instance. Each node allocates a bitmap that fits into the available memory, and DB2 accesses the foreign key index for each dimension. The record IDs from these indexes are hashed into the bitmap. Because the number of qualifying rows should be relatively small compared to the fact table, the number of collisions (hashes to the same position) should be small. The same is done with the next dimension. After two indexes are bitmapped, the two bitmaps are ANDed and the result is used as input for the next dimension. After the last index is processed, the rows are retrieved from the fact table and the predicates are applied again, because if there was a hash collision there is a chance that a row does not actually qualify. It is not the speed of the engine but the amount of memory available to it that determines the effectiveness of this algorithm: the more memory each engine has, the better the process works.
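
The shape of a query that triggers this processing might look as follows (a sketch with invented names): the local filters on the small dimension tables are applied first, and the large fact table is only touched after the bitmaps have been ANDed:

    SELECT c.country, p.product_group, SUM(f.revenue)
    FROM   sales_fact f, customer_dim c, product_dim p
    WHERE  f.customer_id   = c.customer_id
    AND    f.product_id    = p.product_id
    AND    c.country       = 'DE'        -- local filter on dimension
    AND    p.product_group = 'TEXTILES'  -- local filter on dimension
    GROUP  BY c.country, p.product_group;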

Enhanced Star (snowflake) Joins

The traditional data warehouse approach of a star schema (one fact table and 'n' dimension tables) is no longer sufficient. That is why SAP BW uses an enhanced star, or snowflake, schema. The dimension tables in this schema are arranged in a star array to support all business requirements and enable very flexible reporting. The fact table in the middle can contain several hundred million rows and can scale easily to many gigabytes or even terabytes. Each dimension contains from a few rows to a million rows. The combination of the fact table and the first layer of dimensions is unique; this guarantees consistency everywhere, even when the master data is changed. The majority of queries touch just the fact table and the first layer of dimensions, but sometimes the end-user wants extra filters applied, or there may be a hierarchy present in the data. This is where the extra tables in the snowflake play a role. In this case, SAP BW will generate very complex queries (involving fifty or more tables is not uncommon). To solve such a query you need an extremely powerful DBMS like IBM DB2; only by applying techniques like bitmap indexing, hash joins, and query rewrite is it possible to resolve such a query in an acceptable timeframe. Many other DBMSs are simply not able to resolve such queries: they either fail or run at 100% of the CPU for hours or days.
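
As a hedged illustration of the snowflake shape (names invented), a dimension table can itself be joined to an outer hierarchy table, adding a further ring of tables around the star:

    -- The product dimension points to a hierarchy table; the filter
    -- sits two joins away from the fact table.
    SELECT h.category, SUM(f.revenue)
    FROM   sales_fact f, product_dim p, product_hier h
    WHERE  f.product_id = p.product_id
    AND    p.hier_id    = h.hier_id
    AND    h.category   = 'HOME TEXTILES'
    GROUP  BY h.category;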

Recursive SQL Using Common Table Expressions

IBM DB2 supports recursive SQL, and although this type of SQL is in the standard, DB2 is the only DBMS that currently supports it. Recursive SQL allows a single SQL statement to do a complete traversal of a tree structure. The most common example used for recursive SQL is the parts explosion in a single SQL statement, but it is capable of much more. Recursive SQL statements can find the best solution for a problem when there are many solutions to choose from, and can evaluate every solution that involves a cycle through data. For example, consider this problem: "Find me the cheapest flight between New York and Singapore allowing for two stops, where the total time is less than 24 hours." There are many solutions to this problem, especially if multiple airlines are allowed. SAP BW exploits recursive SQL using a common table expression. Common table expressions are sometimes referred to as inline views, although this is not their official name. A common table expression allows a new SQL expression inside an SQL statement, at the position where a table is expected. Common table expressions allow generators (like SAP BW) to isolate pieces of a query within a query without having to build a view for this.
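
The classic parts explosion mentioned above looks roughly like this in DB2 (a sketch over a hypothetical PARTLIST table; note that DB2 expresses recursion through a plain common table expression, without a special keyword):

    -- Traverse the whole bill-of-material tree below part '01'
    -- in a single statement.
    WITH rpl (part, subpart, quantity) AS (
      SELECT root.part, root.subpart, root.quantity
      FROM   partlist root
      WHERE  root.part = '01'
      UNION ALL
      SELECT parent.part, child.subpart,
             parent.quantity * child.quantity
      FROM   rpl parent, partlist child
      WHERE  parent.subpart = child.part
    )
    SELECT part, subpart, SUM(quantity) AS total_qty
    FROM   rpl
    GROUP  BY part, subpart;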

Query Rewrite

Because the SQL generated by SAP BW is not optimal (all generators suffer from this problem), it becomes the task of the DBMS to optimize the request in a smart way. IBM DB2 has built-in functions that rewrite complex queries before it evaluates the proper join sequences. These techniques are similar to those used by optimizing compilers. By analyzing the SQL statement given to DB2, parts of the SQL are rewritten to achieve the best performance regardless of the query's syntax. Of course, the result of the rewrite is exactly the same as the original request. This removes from SAP developers the burden of producing the best possible SQL and allows them to focus on their product instead. Currently, IBM DB2 has the most advanced optimizer available. IBM DB2's Starburst optimizer is capable of the following rewrites:

Predicate push-downs and pull-ups.

This means that predicates are moved to other parts of the query. This becomes extremely powerful when the query is based on views. When the optimizer combines the view text with the original statement, the predicates are moved to places where the result is the same but the execution is more cost effective. Especially in a parallel environment, it helps if a predicate can be given to a node so it can narrow down its search.
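
A hedged sketch of a push-down through a view (hypothetical names):

    CREATE VIEW v_sales AS
      SELECT d.region, f.revenue, f.fiscal_year
      FROM   sales_fact f, region_dim d
      WHERE  f.region_id = d.region_id;

    -- The user's predicate on the view ...
    SELECT SUM(revenue) FROM v_sales WHERE fiscal_year = 2001;
    -- ... is merged into the view text and pushed down, so the scan
    -- of sales_fact is restricted to fiscal_year = 2001 from the start.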

Predicate transitive closure.

This is a facility where predicates are applied to other parts of the query using standard Boolean algebra. IBM DB2 can do this for all join types, including outer joins. This is not an easy task for a human, and it is not easy to build into a code generator. By deriving more predicates, the next step of optimization becomes easier: there are more predicates to apply, so more data is filtered at a lower level in the database.
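
For example (hypothetical names), from the two predicates written by the generator, DB2 can derive a third one by transitivity:

    SELECT SUM(f.revenue)
    FROM   sales_fact f, region_dim d
    WHERE  f.region_id = d.region_id
    AND    f.region_id = 7;
    -- Derived predicate: d.region_id = 7, so the dimension table
    -- can be filtered locally before the join.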

De-correlation of correlated sub queries.

IBM DB2 applies a technique where the subquery is executed only once and thereafter used as input to the main query. Compare this with traditional optimization, where the subquery is evaluated again and again. The performance boost of this rewrite can be phenomenal, especially in a parallel environment where the subquery can be evaluated once and the result given to each node.
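
A sketch of the effect (invented schema): the correlated form conceptually re-runs the subquery per row, while the de-correlated form computes the aggregate once and joins:

    -- Correlated form, as a generator might produce it:
    SELECT e.emp_id, e.salary
    FROM   emp e
    WHERE  e.salary > (SELECT AVG(s.salary)
                       FROM   emp s
                       WHERE  s.dept = e.dept);

    -- De-correlated rewrite, evaluated once:
    SELECT e.emp_id, e.salary
    FROM   emp e,
           (SELECT dept, AVG(salary) AS avg_sal
            FROM   emp GROUP BY dept) a
    WHERE  e.dept   = a.dept
    AND    e.salary > a.avg_sal;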

Shared aggregation.

When the optimizer detects that multiple aggregation functions are going on at the same time (e.g. two SUM functions), it rewrites the statement. By rewriting the query with common table expressions, the aggregation functions are invoked only once, hence the name shared aggregation.
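
As a rough illustration (hypothetical names), a statement that requests the same SUM twice can be rewritten around a common table expression that computes it once:

    -- Written form: SUM(revenue) appears in both SELECT and HAVING.
    SELECT region, SUM(revenue)
    FROM   sales_fact
    GROUP  BY region
    HAVING SUM(revenue) > 100000;

    -- Conceptual rewrite: the aggregation is invoked once and shared.
    WITH agg (region, total) AS (
      SELECT region, SUM(revenue) FROM sales_fact GROUP BY region
    )
    SELECT region, total FROM agg WHERE total > 100000;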

Sub query to join transformation.

Non-correlated subqueries can be rewritten to become joins. This is a straightforward algebraic rewrite, and again the end result performs better.
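
For instance (invented names, and assuming customer_id is unique in the dimension table so the two forms return the same rows), an IN subquery can become a plain join:

    -- Subquery form:
    SELECT f.order_id
    FROM   sales_fact f
    WHERE  f.customer_id IN (SELECT c.customer_id
                             FROM   customer_dim c
                             WHERE  c.country = 'DE');

    -- Equivalent join form the optimizer can choose:
    SELECT f.order_id
    FROM   sales_fact f, customer_dim c
    WHERE  f.customer_id = c.customer_id
    AND    c.country     = 'DE';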

Redundant join elimination.

The optimizer rewrite function can detect that a query has redundant joins inside. Humans may not make this "mistake" that easily, but generators very often do. IBM DB2 detects such joins and eliminates them, whereas some other DBMSs would just execute them multiple times.

The general idea is that by spending a little more time analyzing the query, the execution time of the query is reduced, sometimes dramatically. Queries generated by SAP BW can be very complex and sometimes involve fifty or more tables. Only by having all of the above techniques is it possible to execute these complex queries, and it is because of these techniques that DB2 scores so well in the SAP benchmarks. There are other features in IBM DB2 that SAP BW does not yet exploit, including automatic summary tables (with which DB2 keeps track of aggregation functions), OLAP processing done by DB2 itself (the CUBE, ROLLUP and RANK functions), and new functions built into DB2's engine such as covariance and correlation. SAP BW and IBM DB2 make a powerful combination, both now and in the future.
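
As a taste of the not-yet-exploited OLAP functions mentioned above (hypothetical names), ROLLUP lets the engine compute subtotals and a grand total in one pass:

    -- One result set containing: revenue per (region, product_group),
    -- subtotals per region, and a grand total.
    SELECT region, product_group, SUM(revenue) AS revenue
    FROM   v_sales_detail
    GROUP  BY ROLLUP (region, product_group);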

OTHER FEATURES OF DB2

Like any other DBMS, DB2 comes with its own set of tools. The tools that apply most to the SAP BW environment are:

Control Center. A Java-based environment management tool that controls the complete DB2 environment, from ad-hoc tasks to automation and from performance measurement to utility scheduling.

Query Patroller. A proactive query management tool that allows DBAs to control all queries going on in the system. It has many functions, including cost analysis, user prioritization, load balancing, query status viewing, query cancellation, and charge-back accounting.

Query Governor. A tool that complements the Query Patroller. By setting thresholds on the system resources used, DBAs can change the priority of a query or even cancel it, preventing certain users from dominating the system.

BUSINESS BENEFITS

The choice of DBMS is an important one: it is the DBMS that will make or break the success of SAP BW. Enterprises require a high-performance, flexible, scalable and reliable DBMS that can deliver high ROI with a relatively low total cost of ownership. IBM DB2 provides the following:

Performance:

As discussed, there are many features built into DB2 that speed up this DBMS, and the benchmarks from SAP show that it works: DB2 is faster than any other DBMS. For SAP users, it is a notable advantage that all of these features work without having to do anything special or formulate different SQL (with the exception of recursive SQL, where the syntax really is different).

Scalability:

IBM DB2 comes in many flavors. In a parallel environment, queries become extremely fast, and by adding more hardware resources (more nodes, memory, etc.) the system automatically performs better. IBM has made sure that this facility works well on all of its platforms, from Windows NT to OS/390, AIX and OS/400. IBM DB2 is the only database offering a parallel feature for SAP BW.

Total Cost of Ownership (TCO)/ROI for a given performance:

Typically in a data warehouse environment, people complain that it is like a "black hole" that just absorbs money. Keep in mind that SAP BW comes with all the software needed, and SAP has excellent tools to assist the customer in determining the correct configuration for a given DBMS. The TCO of DB2 has always been better than that of other DBMSs. DBAs using SAP BW together with DB2 are exploiting leading-edge qualities of the market leaders in each of these fields. Because of this, total TCO will likely be lower, and ROI based on usability higher, for any given enterprise.

The relationship SAP has with IBM:

This relationship dictates the reliability and likely long-term availability of service and support. Because of the very good relationship between SAP and IBM, many new features go into DB2 that are very beneficial to SAP customers. Currently DB2 is SAP's number-one choice: SAP BW is developed for IBM DB2 first and then ported to other DBMSs. This combination ensures the best support from both IBM and SAP, as well as an edge in performance and feature availability.

A HETEROGENEOUS SYSTEM LANDSCAPE WITH RS/6000 AND NETFINITY

In the course of the installation of SAP R/3, a new IBM server, the IBM RS/6000 S70, was employed. Because IBM offers the complete server range for SAP, seamless integration into the existing installation was possible. The RS/6000 now functions as the database server on the basis of AIX, and the IBM Netfinity servers function as application servers on the basis of Windows NT. The complete workload is processed by the high-performing RS/6000 S70, which can handle the data growth without problems and offers the necessary security. The cost-effective, easy-to-administer Netfinity servers are the ideal complement to this heterogeneous system landscape, which supports approximately 600 users. The installation is complemented by three additional Netfinity servers for development, test, and archiving with ADSM.

Today, the tradition-conscious trading company is already considering e-commerce. The first multimedia step in this direction is its self-developed presentation system Deco DESIGNER, available on CD-ROM. Since 1996, JAB has marketed the multimedia Deco DESIGNER, an innovative point-of-sale instrument for visualizing decoration and furnishing materials. Deco DESIGNER opens new dimensions in consultation and sales in the area of home textiles: over 5000 different materials can be combined as desired in over 130 illustrations, and within seconds consumers can be shown a picture of their individual selections of different materials. More than 800 interior decorators are already working with Deco DESIGNER, using it to show their customers complete rooms in a new look from numerous decorating suggestions.

Both in the SAP R/3 system and in BW, the available interfaces allow for the problem-free acceptance and integration of data from external systems, which makes additional customizing unnecessary. DB2 manages the data load well. BW is deployed on two separate IBM Netfinity servers to reduce the load on the productive system, since the evaluation of the massive quantity of data requires a reliable database.

FUTURE UPGRADES

At this time, an upgrade from SAP R/2 to R/3 is underway. The SAP R/3 HR module has been used separately since 1998, and plans for the implementation of the SD, FI, CO, MM, WM and PP modules are in progress. SAP R/3 with the MM, WM and PP modules is already used in production in two operations. The upgrade to SAP R/3 is expected to provide the prerequisites for connecting foreign companies beyond German borders with the central office, and to optimize other processes. A further step is assuring targeted growth and the continuing use of modern information technology as a goal for the future.

