Perhaps it is time to start acknowledging the challenge of source data analysis?

“Garbage out, garbage in”

I know, I know, I have got that the wrong way round. Actually, I did it on purpose, to illustrate a problem with BI/Data Warehousing projects and, incidentally, with many other Information Management initiatives.

I recently read a very interesting and clear blog post by Boris Evelson of Forrester on the factors to consider when deploying a business intelligence platform, and on how “Initial business intelligence (BI) deployment efforts are often difficult to predict and may dwarf the investment you made in BI platform software.” The link is here.

Obviously there is huge variability in both the cost and the resources required to deliver a successful outcome, and this in turn depends on a large number of parameters, all of which are described really well in that post.

My purpose in this piece is not to rewrite Mr Evelson’s excellent document, but to focus on a specific area he refers to as “Source applications integration”, which I believe is often overlooked or ignored until it becomes critical, at which point it can significantly affect the overall cost and success of a BI project.


Struggling to find the right data quickly, especially in complex application systems, can delay the implementation of Data Warehouse and Business Intelligence projects and lead to inaccurate information being delivered.

The problem, of course, is that unless we can be absolutely confident we have found the right elements of the source system with which to populate the data warehouse, we cannot be confident the business is using the right information on which to base its decisions. Herein lies my “garbage out, garbage in” thought: no matter how clever your ETL tools are, or how good your data quality tool is, if you are working with the wrong files or tables, or with an incomplete set of tables, then the data you extract from the source will be of very little use, and the data warehouse or BI users may make ill-advised decisions based on incorrect data. A secondary challenge is the quality of that data, but it is more important to identify the appropriate sources first.

In most instances, of course, the source applications are relatively easy to understand. They are straightforward to reverse engineer, have good documentation, or are supported by tools which help identify the tables and columns needed for the DW/BI implementation, and so they often present no real problem to project teams.

Others, however, including those from SAP and Oracle (eBusiness Suite, JD Edwards, Siebel, PeopleSoft), which are the main systems of record for hundreds of thousands of organisations, are much more complex and require far more effort before the right tables can be found.

Part of the problem lies in the sheer scale of these applications: they all have many thousands of tables (an SAP system, for example, will have over 90,000 before customisations are applied) and a myriad network of inter-relationships, which makes traditional reverse engineering with modelling tools either impossible or totally impractical. Imagine a visualisation of a complete SAP system on a single screen. One of our customers called it a ‘death star visualisation’. It would also be impossible to navigate and search.

Add to this the sometimes idiosyncratic naming conventions, and the fact that most implementations have been quite heavily customised, and even specialists can struggle to find what they need quickly and easily.

Finally, the information about tables (the metadata) in these applications, and others like them, is stored in data dictionary tables rather than in the system catalogue. This further reduces the effectiveness of traditional tools designed to help you find the right tables, can significantly extend the time it takes to locate exactly what you are looking for, and introduces a distinct possibility that a good deal of ‘informed guesswork’ will be involved.
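To make that point concrete, here is a minimal sketch in Python of where the search actually has to happen. The connection details (pyodbc, the ‘sap_prod’ DSN, the read-only account) are placeholders of mine, and INFORMATION_SCHEMA is just the ANSI form of the system catalogue; DD02L and DD02T, however, are the real SAP dictionary tables holding table definitions and their descriptions.

```python
import pyodbc  # assumed: any Python DB-API driver for the underlying database would do

# Placeholder connection; a read-only account on the database
# underneath the SAP system is assumed.
conn = pyodbc.connect("DSN=sap_prod;UID=readonly;PWD=secret")
cur = conn.cursor()

# What a generic modelling tool sees: the system catalogue yields only
# raw technical names (MARA, MAKT, ...) with no business meaning attached.
cur.execute(
    "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_NAME LIKE 'MA%'"
)
print([row.TABLE_NAME for row in cur.fetchall()])

# The business descriptions live in SAP's own data dictionary tables:
# DD02L holds table definitions, DD02T their language-dependent texts.
cur.execute(
    """
    SELECT l.TABNAME, t.DDTEXT
    FROM DD02L l
    JOIN DD02T t
      ON t.TABNAME = l.TABNAME
     AND t.DDLANGUAGE = 'E'
    WHERE t.DDTEXT LIKE '%Material%'
    """
)
for tabname, ddtext in cur.fetchall():
    print(tabname, "-", ddtext)  # e.g. MARA - General Material Data
```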

So, coming back to ‘garbage out, garbage in’: without accurate knowledge of which tables hold the required information in the source systems, the mapping between source and target data warehouse via ETL or DI tools is less likely to meet business requirements. This potentially leads to overspend, re-work and lengthening delivery times, and at worst risks ill-informed decisions being based on wrong information.

Perhaps it is time to start taking the challenge of source data analysis, or ‘metadata discovery’ as we sometimes call it, more seriously before embarking on Data Warehouse or Business Intelligence projects?


Are you a Data Modeler or a Data Worker?

A few months ago I was at a meeting with a customer who was involved with a BI project that was pulling data from an SAP system into a Data Mart.

One of the people at the meeting was very experienced in the ETL tool the customer was using for this project. We were showing how our product Safyr can be used to locate tables and relationships in an SAP system in support of the ETL/BI task that faced the project. Part of the demo was to show a set of user-selected tables visualized in an Entity-Relationship diagram.

I’ve got most of the popular data modeling tools on my PC and I forget which one we used for this, but it was a graphical model of a set of SAP tables, showing ‘readable’ table and column names and the relationships between tables.

I also showed how the ‘technical’ names for tables and columns could be displayed. These names tend to be unintelligible to anyone not versed in SAP architecture (the Material Master, for example, is a table called MARA), but the BI expert became quite animated at this point: “That’s what I want!”, he said. “I’m not really interested in a data model, I just want to see the tables, columns and the joins between tables – that’s what I need for my ETL work.”

I was surprised that what I call a ‘logical model’, showing nice ‘English’ names for objects, was of no interest to him. “It’s just a pretty picture”, he said dismissively. (By the way, I’m paraphrasing here, and names and locations have been changed to protect the innocent.)
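For what it’s worth, the contrast between the two views is easy to sketch. The MARA and MAKT names below are genuine SAP examples; the little lookup structure is purely my own illustration, not how any particular tool stores it.

```python
# The same objects, labelled two ways: SAP's technical names (what the
# ETL developer asked for) and 'readable' names (what the logical model
# showed). The mapping entries are real SAP examples; the code is a sketch.
READABLE = {
    "MARA": "General Material Data",   # the 'Material Master' table
    "MARA.MATNR": "Material Number",
    "MAKT": "Material Descriptions",
    "MAKT.MAKTX": "Material Description",
}

def label(name: str, technical: bool = True) -> str:
    """Return either the technical name or its readable equivalent."""
    return name if technical else READABLE.get(name, name)

print(label("MARA"))                   # -> MARA
print(label("MARA", technical=False))  # -> General Material Data
```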

I was somewhat taken aback by this. As a dyed-in-the-wool data modeler, I’ve always thought that seeing a ‘readable’ model of tables and relationships was a lynchpin of any project that used data. (So that means any project then).

But here was a consumer of that same information, who could see value in the contents of the model, but was only interested in the model as a means to an end. The data model in this circumstance was like a sketch of a construction project. “This is what we need, now let’s get on and build it”. Once the work is done the sketch (or data model) is discarded, because it’s a working document, not a ‘deliverable’ in its own right.

So maybe we have two categories of ‘People who can benefit from Data Modeling’ – let’s call them ‘Data Modelers’ and ‘Data Workers’. Data Modelers typically work in Information Management roles and use the Data Modeling approach and Data Modeling tools to produce ‘artifacts’ that reflect key areas of the data across the Enterprise.


Finding the sources of valued data in your systems can be tough

Data Workers, on the other hand, are normally ‘at the coal face’, hewing out chunks of data with a range of ETL, Data Integration, BI and other tools. The success, or otherwise, of the Data Workers’ efforts depends on a good understanding of the data structures involved, and particularly on getting the joins between tables right.
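To give one concrete example of ‘getting the joins right’ (my example, not from the meeting): almost every SAP table is client-dependent, so the MANDT column is part of the key, and a join that omits it can return plausible-looking but wrong rows. A sketch, using the real MARA/MAKT pair:

```python
# Illustrative SQL for joining MARA (General Material Data) to MAKT
# (Material Descriptions). Their shared key is compound:
# MANDT (client) + MATNR (material number).

# MANDT omitted: rows from other clients can leak into the result.
WRONG_JOIN = """
    SELECT m.MATNR, t.MAKTX
    FROM MARA m
    JOIN MAKT t ON m.MATNR = t.MATNR
"""

# The full compound key, plus a single description language.
RIGHT_JOIN = """
    SELECT m.MATNR, t.MAKTX
    FROM MARA m
    JOIN MAKT t
      ON  m.MANDT = t.MANDT   -- client
      AND m.MATNR = t.MATNR   -- material number
    WHERE t.SPRAS = 'E'       -- English descriptions only
"""

print(RIGHT_JOIN)
```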

In my many years of involvement with the data modeling community, I’ve met a lot of people who can get very tied up in the pros and cons of different modeling methodologies and different modeling tools. Long discussions can take place over the meaning of Conceptual, Logical and Physical models.

What I’ve also seen in these times of squeezed budgets is that the business value of ‘Data Modeling’ in its purest sense can often be questioned and sometimes found wanting. It’s just my opinion, but helping the Data Worker to get to grips with the data they need, in terms they are happy with, is a laudable aim for any Data Modeler, and offers a demonstrable raison d’etre for many data modeling groups that may be struggling to survive in a post-recession world.

Nick Porter
Technical Director

Want to know more about Silwood Technology? – please click here
