
Data Solution Design Patterns

Implementation and automation for a flexible data solution

Training with Roelant Vos


Register now!


"For a data warehouse, we do not have enough time."

… sounds familiar?

Automation and code generation enable faster, more flexible data solution implementation. Learn the revolutionary approach for a fully automated solution from Roelant Vos.

  • Implement a Persistent Staging Area
  • Apply hybrid modeling techniques, based on Data Vault
  • Define robust patterns for data logistics, suitable for code generation
  • Define a metadata model for automation, code generation, and virtualization
  • Apply DevOps, testing, orchestration, and control frameworks
  • Ensure that the delivered data meets the consumers’ expectations

This practical design and implementation training provides you with everything you need to build and maintain an automated data solution, from start to finish.


What can Data Solution Automation offer?


Working with data can be complex, and often the ‘right’ answer for the purpose is the result of a series of iterations where business subject matter experts (SMEs) and data professionals collaborate.

This is an iterative process by its very nature. Even with the best effort and available knowledge, the resulting data model will be subject to the progressive understanding that is inherent in working with data.

In other words, the data solution model is not something you can always get right in one go. In fact, it can take a long time for a model to stabilise, and in today's fast-paced environments this may never happen at all.

Choosing the right design patterns for your data solution helps maintain both the mindset and the capability for the solution to keep evolving with the business and the technology, and to reduce technical debt on an ongoing basis.

This mindset also enables some truly fascinating opportunities, such as the ability to maintain version control over the data model, the design metadata, and their relationship; to represent the entire data solution as it was at a certain point in time; or even to allow different data models for different business domains.

This idea, combined with the capability to automatically (re)deploy different structures and interpretations of data, as well as the data logistics to populate or deliver these, is what we call ‘Data Solution Virtualisation’.

The idea of an automated virtual data solution was conceived while working on improvements for generating Data Warehouse loading processes. It is, in a way, an evolution in ETL generation thinking. Combining Data Vault with a Persistent Staging Area (PSA) provides additional functionality because it allows the designer to refactor all, or parts, of the solution.

Being able to deliver a virtual data solution provides options. It does not mean you have to virtualise the entire solution; you can pick and choose which approach works best for a given scenario, and change technologies and models over time.

To allow ideas to grow, creators need an immediate connection to what they are creating. This means that, as a creator, you need to be able to see directly what the effect of your changes is on what you are working on.

This is what the virtual data solution, as a concept and mindset, intends to enable: a direct connection to data that supports any kind of exploration and encourages creativity in using it.

Thinking of Data Warehousing in terms of virtualisation is, in essence, about following the guiding principle of establishing a direct connection to data. It is about seeking simplification, and about continually removing barriers to delivering data and information. It is about enabling ideas to flourish, because data can be made available for any kind of discovery or assertion.

Virtual Data Warehousing is the ability to present data for consumption directly from a raw data store by leveraging data warehouse loading patterns, information models and architecture. In many data solutions, it is already considered a best practice to be able to ‘virtualise’ Data Marts in a similar way. The Virtual Data Warehouse takes this approach one step further by allowing the entire data solution to be refactored based on the original raw transactions.

This ability requires a Persistent Staging Area (PSA), also known as a Persistent Historized Data Store, where data is stored exactly as it was received, at the lowest level of detail. If data is retained this way, everything you do with your data can always be repeated at any time – deterministically. In the best implementations, the virtual data solution allows you to work at the level of simple metadata mappings, modelling, and the interpretation of "business logic", abstracting away the more technical details.
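
As a minimal sketch of this idea (the table and column names below are illustrative assumptions, not taken from the training material), a PSA table on SQL Server could look as follows. Because records are only ever added, never updated or deleted, a simple filter on the load timestamp is enough to reproduce any earlier state of the data, deterministically:

    -- Illustrative PSA table: every record is kept exactly as received,
    -- keyed by its source natural key and the moment it was loaded.
    CREATE TABLE psa.CustomerSource (
        CustomerCode    NVARCHAR(50)  NOT NULL, -- source natural key
        LoadDateTime    DATETIME2(7)  NOT NULL, -- when the record arrived
        SourceRowHash   CHAR(64)      NOT NULL, -- full-row hash for change detection
        CustomerName    NVARCHAR(200) NULL,     -- attributes, stored as received
        CustomerSegment NVARCHAR(50)  NULL,
        CONSTRAINT PK_CustomerSource PRIMARY KEY (CustomerCode, LoadDateTime)
    );

    -- Deterministic replay: reconstruct the data as it was known at any
    -- earlier point in time, simply by filtering on the load timestamp.
    DECLARE @PointInTime DATETIME2(7) = '2023-01-01';

    SELECT CustomerCode, CustomerName, CustomerSegment
    FROM (
        SELECT src.*,
               ROW_NUMBER() OVER (PARTITION BY CustomerCode
                                  ORDER BY LoadDateTime DESC) AS RowOrder
        FROM psa.CustomerSource AS src
        WHERE LoadDateTime <= @PointInTime
    ) AS History
    WHERE RowOrder = 1;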

A virtual data solution is not the same as data virtualisation. These two concepts are fundamentally different. Data virtualisation, by most definitions, is the provision of unified direct access to data across many ‘disparate’ data stores.

It is a way to access and combine data without having to physically move the data across environments. Data virtualisation does not, however, focus on loading patterns, data architecture, or modelling.

The virtual data solution, on the other hand, is a flexible and manageable approach towards solving data integration and time variance topics using data warehouse concepts, essentially providing a defined schema-on-read.

The Virtual Data Warehouse is enabled by virtue of combining the principles of data logistics generation, hybrid data warehouse modelling concepts and a Persistent Staging Area (PSA). It is a way to create a more direct connection to the data because changes made in the metadata and models can be immediately represented in the information delivery.
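
To make this concrete, a hedged sketch that continues the illustrative PSA example from earlier: a Data Vault style Satellite can be defined as a view over the PSA rather than as a physically loaded table. Since the timeline is computed at query time, regenerating this (typically generated) definition from updated metadata changes the delivered result immediately, without reloading any data:

    -- Illustrative virtual Satellite: change history is derived directly
    -- from the PSA at query time, so a regenerated definition takes
    -- effect immediately.
    CREATE VIEW dv.SAT_Customer AS
    SELECT
        HASHBYTES('SHA2_256', UPPER(CustomerCode)) AS CustomerHashKey,
        LoadDateTime                               AS EffectiveDateTime,
        LEAD(LoadDateTime, 1, '9999-12-31')
            OVER (PARTITION BY CustomerCode
                  ORDER BY LoadDateTime)           AS ExpiryDateTime,
        SourceRowHash,
        CustomerName,
        CustomerSegment
    FROM psa.CustomerSource;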

Persisting data in a more traditional Data Warehouse sense always remains an option, and may be required to deliver the intended performance. The deterministic nature of a Virtual Data Warehouse allows for dynamic switching between physical and virtual structures, depending on the requirements.

In many cases, this mix of physical and virtual objects in the Data Warehouse itself changes over time, as business focus shifts. A good approach is to ‘start virtual’, and persist where required.
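
As a small illustration of this switch (again using the assumed example objects sketched above), ‘persisting where required’ can be as simple as materialising the view into a table. Because the logic is deterministic, the physical copy can be dropped and rebuilt, or replaced by the view again, at any time:

    -- Persist the virtual Satellite as a physical table where performance
    -- requires it; the view remains the single definition of the logic.
    -- (Assumes a dv_persisted schema exists to hold persisted objects.)
    SELECT *
    INTO dv_persisted.SAT_Customer
    FROM dv.SAT_Customer;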


Download Brochure

Your Trainer

Roelant Vos has been active in Data Warehousing (DWH) and Business Intelligence (BI) for more than 20 years, and is well known as an expert in the Data Vault community.

For more than 10 years, he has been sharing his ideas, tips, and thoughts on his blog roelantvos.com.

Having worked as a software developer, consultant, trainer, and decision maker in the corporate world, Roelant has observed data management from distinctly different points of view.

The common theme has always been a passion for automation, code generation, reusable patterns, and model-driven design – the key to making data solutions manageable and flexible.

His focus is now on providing training, consultancy, and open-source software development to make the delivery of robust data solutions easier. As part of this, he initiated the Data-Solution-Automation-Engine on GitHub.

You want to ...

  • Learn what kind of solution architecture supports flexible data delivery that can evolve with the business
  • Fully understand the concepts behind the essential data loading patterns, what options can be considered, and how to implement these
  • Leverage generation techniques for data logistics (‘ETL’), to be able to spend more time on more value-adding work such as data modelling and improving data delivery
  • Work on a Do-It-Yourself (DIY) data solution framework, or have adopted a Data Warehouse Automation (DWA) product and seek a deeper understanding of the patterns and modelling approaches used
  • Get a full overview of all components that are necessary for a robust and manageable data solution

This course covers advanced modelling and implementation techniques, and applies to a wide range of data professionals including Data Warehouse professionals, data modellers, architects, and data engineers.

Prerequisites

  • Sufficient understanding of English (the course language is English)
  • Understanding of data engineering, for example Data Warehousing and ETL development
  • Knowledge of SQL (e.g. joining, window functions)
  • Some scripting / programming experience
  • Familiarity with data modeling techniques for data warehousing (e.g. Dimensional Modeling, Ensemble Logical Modelling techniques including Data Vault)

Is this for me?

By adopting hybrid / Ensemble Logical Model patterns (e.g. Data Vault) on top of a Persistent Staging Area (PSA) – a historised record of all original transactions – an unparalleled level of flexibility in implementing and maintaining a data solution can be achieved. The repetitive aspects of data preparation are reduced, and it becomes easier to adjust the solution to ever-changing business and technical requirements.

These patterns are seemingly straightforward – almost deceptively so.

But, in fact, every pattern requires far-reaching considerations at a technical and conceptual level to truly match the business expectations.

Data Vault modeling provides elegant features to manage complexity, but success still depends on correct modelling of the data and correct application of the patterns. Leveraging data logistics (‘ETL’) generation and virtualization techniques allows for a great degree of flexibility, because you can quickly refactor and test different modelling approaches to understand which one best fits your use case.

This enables you to spend more time on more value-adding work such as improving the data models and delivery of data.

This advanced training is relevant for anyone seeking to understand how to leverage ‘model-driven-design’ and ‘pattern-based code-generation’ techniques to accelerate development. The content applies to a wide range of data professionals including Data Warehouse specialists, data modellers and architects as well as data engineers and data integration developers.

Flexible design and implementation

The intent of the training is to cover the architecture and concepts for a flexible data solution, with a focus on ‘deep diving’ into the patterns and practical implementation techniques as quickly as possible.

To facilitate this, the training discusses the implementation of the main Data Vault modeling concepts including their various edge-cases and considerations. The mechanisms to deliver information for consumption by business users (i.e. ‘marts’) will also be covered, including details on how to produce the ‘right’ information by implementing business logic and managing multiple timelines for reporting (‘bitemporal’).

The training provides tools and configurations which you can use to start automating your own development – or understand the approaches used in commercial ‘off-the-shelf’ software so that these can be fully utilized.

Training content and schedule

Day 1

  • Pattern-based design
  • Data solution architecture
  • Data staging concepts
  • Modeling concepts
  • Introducing design metadata
  • Code generation

Day 2

  • Core Business Concept pattern
  • Natural Business Relationships pattern
  • Context pattern & historization
  • Control framework
  • Testing
  • Technical considerations
  • Orchestration, workflows, and parallelism
  • DevOps and versioning

Day 3

  • Temporality concepts
  • Data delivery for consumption
  • Application of business logic
  • Completing the solution

Download Course Module Overview

Practical content

The training also provides an opportunity to get ‘hands on’ with some of the frameworks that are necessary to deliver a robust, manageable, and flexible solution.

This is done through short exercises as part of the regular content, during training hours.

These exercises use the Microsoft stack (SQL Server, Windows), but the content, approach and templates apply equally to other environments.

By following the exercises, we will go through:

  • Setting up a new data automation environment
  • Defining source-to-target mappings
  • Generating data logistics code, and
  • Running and testing the solution in various ways

Needed Software

As part of the practical content, we use the following software:

We’ll configure these tools as part of the workshop.

Training available world-wide

This training is offered globally on a regular basis, and can be arranged in-house on request to meet specific company objectives. It is also possible to organise online coaching, spread across multiple sessions, following the outline of the training or focusing on specific content. Please have a look at the dates or contact us.

Dates & Prices

Coaching
  • Flexible coaching support and quality assurance for individuals or small teams
  • Contact: info@dwhpatterns.com
  • Price on request

In-house
  • Price on request

Registration





If you have any further questions, please contact us:

info@dwhpatterns.com


Copyright: Roelant Vos
