I recently picked up on a great blog post by Jonathan Haun, Consulting Manager at Decision First Technologies, in which he talked at some length about the data modelling aspects of SAP HANA. Click here to link through to it.
I commented back that I felt HANA projects needed to go through a data source discovery phase to identify the tables and relationships relevant to the project in hand, and that this phase is rarely discussed or written about.
Jonathan’s response was that such a process does take place, but that it is generally achieved through a combination of asking the users (and experts), using SAP EIM tools such as Data Services and Information Steward, and then hand-building an appropriate data model.
So this says to me that data modelling as a discovery/design process is needed (and done), but not in an automated manner using tools dedicated to the task.
Having developed such a tool and process (Safyr), which is used across a wide range of projects, we are keen to see it adopted on HANA projects to help ensure the accuracy and efficiency of the discovery phase.
In summary, my view is that early adopters of HANA saw it as a ‘faster data warehouse’ environment for SAP. Data would be pulled from the SAP transaction system into HANA, much like the process for any Data Warehouse or Data Mart, but with the advantage of blistering reporting speed.
But HANA is now being positioned as the RDBMS underpinning the SAP application itself. As such, there is less need to create a separate reporting environment for BI, because reporting can be done in place, taking advantage of HANA’s performance to allow reporting on the production system.
However, the data discovery task that I have discussed earlier becomes even more vital in this scenario.
When a Data Warehouse/Data Mart is being created, the development team do the ‘data discovery’ process as part of the design, working out which data needs to be extracted from the SAP system into a form suitable for BI.
When reporting can be done in place, however, the full range of SAP tables is at the ‘BI consumer’s’ disposal.
“Which of the 90,000+ tables in SAP do I need for my report or analytics?”
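To make the scale of that question concrete, here is a minimal Python sketch of the discovery problem. The table names and descriptions below are a tiny illustrative sample of the kind of metadata SAP holds in its data dictionary (tables such as DD02L/DD02T); the search helper is hypothetical, not part of any SAP or Safyr API.

```python
# Tiny illustrative sample of SAP data-dictionary metadata
# (table name -> short description). A real system has 90,000+ entries.
SAMPLE_DICTIONARY = {
    "VBAK": "Sales Document: Header Data",
    "VBAP": "Sales Document: Item Data",
    "KNA1": "General Data in Customer Master",
    "MARA": "General Material Data",
    "BKPF": "Accounting Document Header",
}

def find_candidate_tables(keyword, dictionary):
    """Return table names whose description mentions the keyword."""
    keyword = keyword.lower()
    return sorted(
        name for name, description in dictionary.items()
        if keyword in description.lower()
    )

print(find_candidate_tables("sales", SAMPLE_DICTIONARY))  # ['VBAK', 'VBAP']
```

A keyword search like this surfaces obvious candidates, but it says nothing about the relationships between those tables, and at the scale of a real SAP system the result set itself needs curation — which is exactly why a dedicated discovery phase (and tooling for it) matters.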
I would be interested to hear feedback from HANA project teams out there about their experiences, and about the scale and effectiveness of their HANA design phases.
Graham Simpson
Managing Director, Silwood Technology
See more information about using Safyr with HANA here
Depending on the solution you use to provision HANA and the source of your data, the metadata discovery process will vary. If an SAP application is the source, SAP has several prebuilt models to simplify the discovery process. If a non-SAP data source is used, you are on your own to develop the model. There are also three main tools used to obtain data: SLT is used for real-time replication; DXC is used to obtain data from SAP sources in batch; and finally, SAP Data Services is used for traditional ETL and acquisition of data from any source. In short, depending on the selected architecture and source system, the amount of metadata management and data modeling that is needed will vary.
HANA Live & Analytics on HANA Live answer exactly these questions and help speed up HANA projects…!
I think Graham has made some valid points. There are sets of pre-built content for HANA (HANA Live), and these cover a number of popular areas that users may want to work with. But outside these, metadata exploration will be required. Tools like SAP Data Services are great if you know which tables you need to work with, but if you don’t, they give very limited ‘discovery’ capability. That’s not a criticism of Data Services or other similar products; their job is data movement, and they do that very well. The ‘discovery’ exercise is often overlooked and can put a real shunt in a project if it’s not recognized.
Information Steward (an add-on to the Data Services platform) can also give a helping hand with the profiling of data and ongoing metadata management.
http://www54.sap.com/pc/tech/enterprise-information-management/software/data-integrity-steward/index.html