I have managed quite a few data integration projects. These are projects defined by the development of software and business projects that help organizations move data between systems and better understand the data they have. While each one is different in data sources, and project owners, overall my approach remains the same. I adapt the tools, timelines, and specific tasks depending on the organization and systems involved. Today, I’m reviewing the framework and critical decisions I rely on.
At it’s most basic level, data implementation projects have 4 core phases: Discovery & Requirements, Consensus/Sign Off, Development & Handoff.
- Discovery & Requirements – This first phase is most critical. It is at this point that you truly determine all that you need to know to design a solution.
- What business problem are you trying to solve? This lays the groundwork for everything else. Without knowing this information, it would be difficult to solve the right problem, or determine the right metrics by which to measure your success.
- Where is the data to solve the business problem? Now that you know what problem you are trying to solve, you need to understand where the data resides. This might be 1 source system or integrating 5 source systems. Are the systems internal or external to the organization? Does it only reside in someone’s head? Make sure to document the owners, stakeholders & gatekeepers. Your design could vary significantly based on what you find.
- What is the data format? This information and subsequent conversation should drive additional requirements around software, security, encryption and data transformations (AKA business rules).
- How is the data accessed? This should actually help you answer how should the data be accessed. Choose the best tool for the job based on the requirements documented in the prior conversations.
- What needs to happen to the data? Sometimes a project is as simple as making data sourced from one system available in part, or in entirety to another system. Often times, it doesn’t align exactly and business rules must be applied before it can be leveraged by other systems.
- How often is the data needed? And what triggers the transfer? Does one system push the data? Does the other system pull it? Is this an infrequent process? Or something needed real-time? or is there a triggering event?
- Which software is best? It’s finally time to start thinking about the tools, languages, & frameworks, etc. Also, make sure to include where the code resides & how security policies impact integration.
- How does feedback work? At a minimum, you need to consider how errors & exceptions are handled. In more complex implementations, data will flow both ways.
- Consensus/Sign Off – Document everything you learned and decided in the Discovery & Requirements phase. Everything from the high level problem to the detailed technical decisions that were made. Please, please get sign off from all the relevant stakeholders.
- Development – Development & validation go hand in hand.
- How will your results be measured? Write your test plans before you begin development. These are based on all the decisions & requirements documented in phase 1. You also want to do periodic data quality checks with the business stakeholders throughout the development process. There are ALWAYS things you find while engaging the business stakeholders, using real data, that you would not find on your own.
- Handoff – This is not simply a matter of flipping a switch and transferring ownership. This phase includes documentation, knowledge transfer during transition of ownership, and end-user training (if applicable). If not already, make sure the project artifacts are complete, compiled and made available to the organization. Often times, data integration projects require a period of hypercare where developers work closely with support people, and both the project implementation team and the support team work closely with the end-users, to make sure there are no gaps in knowledge.
Ultimately, the goal of data integration projects is information. There is some set of data in one system that could be made more useful or help derive better insights if connected with data from other systems.
Let me know if you think I’m missing any critical decisions in the process.
A infographic of this methodology is available in the case studies section of the Digital Ambit site.