TREATING DATA
AS A PRODUCT
Table of Contents
1. The challenges of working in data in 2021
2. A guide to data team structures with examples
3. Breaking communication barriers with a universal language
4. Reducing data downtime with data observability
5. How data storytelling can make your insights more effective
CHAPTER 1
THE CHALLENGES
OF WORKING
IN DATA IN 2021
It’s an exciting time to be working in data.
The opportunity for data teams to make an
impact is huge. They have more tools at their
disposal, with exciting technology hitting the
market each day, than ever before.
Hiring in data, despite a slowdown in 2020, continues to be hot: new job titles like the Analytics Engineer are emerging, investment in data capabilities keeps growing, and the market is projected to be worth $103 billion by 2023.
Watch the webinar:
People and process in data with Alex Dean and Scott Breitenother
But it’s not all roses. Building and managing a modern data capability is a
tremendous challenge. Aside from the technology involved, it requires
working closely with technologists with a broad range of skill sets. It means
working with (i.e. winning over) internal teams and stakeholders and
educating them around the value of data. Sometimes it’s a challenge just to
get a seat at the table and make the case for data in the business.
And even with all the right things in place – the tools, the people and buy-in from the right stakeholders – there is only so much one team can achieve. Small data teams can find themselves stretched and under-resourced, overwhelmed by the demands of the wider company.
To get a better understanding of the challenges of working in data, we spoke
to some of our customers, asking about the key pain points in their everyday
working lives. In this chapter, you’ll find featured snippets and quotes from
conversations with our customers.
The answers below were not gathered in a formal or scientific way, nor is the
list of challenges exhaustive. Nonetheless, we hope they shed some light on
the experience of a data professional in 2021.
N.B. Our quotes below have been anonymized and paraphrased to protect the privacy of our interviewees.
Internal customers and responsibilities
During our conversations, the difficulties of working with internal customers
made up around 21% of the challenges mentioned.
When we think of ‘customers’ we might imagine people buying products
or services. But pretty much all the data professionals we spoke to listed
their primary ‘customers’ as internal colleagues. That could mean working
with finance, marketing, sales teams, operational teams, or sometimes even
the C-suite.
A crucial part of working with those teams was winning their trust. The data
professionals we talked to spoke about the need to win over these
colleagues, build strong relationships with them and serve them efficiently.
‘I want my customers [internally] to trust the data. I want them to
be able to get the answers they’re looking for from the data, and
get them faster, eventually through self-service. Marketing,
finance, business development all depend on us on a daily basis.’
In a sense, data teams have to act a bit like internal detectives – investigating
what their internal customers need and building a plan to deliver it.
‘I try to understand what my counterparts in finance, accounting,
operations and marketing are trying to accomplish – and then
look ahead to see if we have the technology to do what they
want to do.’
‘I often say “if we want to deliver what product wants in 9 months, we have to start today”.’
A breakdown of the main challenges facing data professionals, based on recent conversations with Snowplow customers – data leaders and data practitioners
Time is of the essence. There is pressure for data teams to deliver data
efficiently to their data consumers. Marketing and product teams don’t want
to wait around for data they need to execute use cases. Equally important is
getting the data at the right latency.
‘We are responsible for working with other engineering teams to
enhance and/or create data sets. We ingest the data, process it
and deliver it to where it needs to be’
‘I want to make sure we can deliver data at the speed our
teams need it’
‘Our job is to help the business to work with data in the most
efficient way possible’
But above all, data teams mentioned that the data needs to be clear and
coherent. It needs to be a single source of truth, so all teams can be on the
same page.
‘Our main customers are our business units. We provide support
to partners, marketing, sales, account managers – we make
sure they’re all speaking the same language and looking at the
same data’.
Tool evaluation
Evaluating and purchasing tools made up over 10% of the challenges
mentioned by our customers. With technology constantly evolving in the
data industry, perhaps this comes as no surprise.
A quick look at Indicative’s recent post on the ecosystem of modern data
infrastructure shows you that the challenge of choosing the right tools for the
data stack is no joke. Even with Indicative’s selection and helpful breakdown
of each tech category, it’s clear from their diagram that modern data teams
face a number of choices for how they build out their data capability.
A view of the modern data infrastructure ecosystem by Indicative
Our customers explained that tool evaluation (and selection) was an ongoing
challenge for them. They told us that picking the right tools demands
constant research, investigation, trialing and careful planning to ensure their
teams are well equipped.
‘[When I’m evaluating tools] I’m looking at what is our business
need, where are we going, how are we growing? – what projects
do we have on the table and are they staffed properly?’
‘A really big part of my job is “how do we not spend tons of money
on tools we’re not going to use?”’
When it comes to buying tools, our customers were clear that they preferred
to put in the research themselves before contacting a sales team. Not only
does this save them time in the long run, it also gives them the opportunity
to investigate not just pricing and features, but the ‘softer’ side of the tools –
e.g. is there a community? Are there other teams using this solution who I can
reach out to?
‘I do a lot of research. I like to know a lot about the product before
I call the sales person.’
People and communication challenges
Challenges around people and communication were brought up the most,
making up around 29% of the difficulties our data professionals mentioned.
Communication was actually a common thread in all of our conversations
with our customers. It boils down to the need to be able to educate and
inspire confidence in internal clients – demonstrating the value in data and
data processes.
And data professionals, despite the stereotypes, are good communicators.
One customer summed it up perfectly:
‘Working agile forces you to be better at working with people.
The stereotype of the engineer working alone with headphones
on is totally wrong’.
According to the customers we spoke to, technical ability isn’t enough. Data
professionals need to back it up with strong communication skills and the
ability to guide people towards a common goal.
‘You can have the greatest technical skills but if you are an
ineffective communicator or ineffective at having influence to
move a group of smart people forward toward a common goal –
it’s really hard to get the job done’
‘Communication is 50% of the work. Understanding one’s
audience – whether executive level or at the front-line level, I have
to be able to calibrate my message for the audience.’
That’s not to say that technical skills are not important. But for data leaders,
it’s also about amplifying your team’s abilities with appropriate team
structures and processes in place. Put another way, the best tools cannot be
leveraged without the right people.
‘Tech is still really important. Tools position you well, but you can
have all the chess pieces in place but if the people aren’t ready,
you’re going to have real challenges’
One of the biggest challenges for data teams is to prove their value, or the value of their work. Our customers mentioned the constant battle to win buy-in from their colleagues and stakeholders.
Sometimes it’s about showcasing a new, exciting way of working with data.
But often (and more challenging) it’s the case that data leaders need to
convince colleagues that an investment in a certain data project is worth the
time and resources. In both cases, communication and the ability to ‘sell’ the
value in data are key.
‘We run showcases for the business to show new builds, new
features, how they work and how they impact peoples’ jobs’.
‘It’s hard to get stakeholders excited about a large investment
just for accurate insights’
‘When you talk to a developer their initial thoughts [around
tracking] are “this is going to take a lot more time.” It’s our job to
convince them of the value of that investment.’
At other times, education around data is crucial. Working with data means
handling a vital business asset that should be treated as such, especially
when sensitive data and customer information (PII) is involved.
And while ‘self-service’ data is often the dream, data professionals are tasked
with coaching the rest of the organization on how they can find, understand
and work with their data. While some in the business (e.g. developers and
engineers) may already be data-proficient, that’s not always the case for
other stakeholders.
‘Self-serve is a huge challenge. We want lots of departments to be
able to access data and work with it. But it’s really hard to teach
everyone to be competent with data and use data safely.’
‘Building consensus can be challenging when there are so many
stakeholders involved with different levels of knowledge.’
Without education around data practices, resources can go to waste and teams can grow frustrated with their data product. As one customer found, when internal teams are ‘sold the dream’ by a packaged solution, it can go wrong in the longer term – something that could have been avoided if the data team had been involved earlier in the evaluation process.
‘I would see marketing go out and buy some tool, get sold on the pretty pictures, and then they’d have us make a bunch of changes to implement it, only to then not use it in the long run. This happened a few times.’
Working in data is a constant challenge
Between the continual demands of the organization, ongoing tool
evaluations, hiring, decision making and dealing with internal stakeholders –
data teams have it tough.
Thankfully the data industry is maturing. Each year there are more tools
available, more budding communities of helpful contributors and
examples of content to guide data teams towards success.
Sadly there’s no silver bullet to the challenges of working in data. But as
one customer summed it up, one key element might lie in hiring, educating
and equipping the right people, arguably the most important part of your
data capability.
‘Getting the ‘people bit’ right is actually the hardest part for
companies when it comes to data.’
Despite the challenges, data teams are still driving huge value in their
organizations, from empowering their colleagues with insights to turning
game-changing use cases into reality.
At Snowplow, we want to make it as easy as possible for data professionals to
manage and work with their behavioral data.
CHAPTER 2
A GUIDE TO
DATA TEAM
STRUCTURES
WITH EXAMPLES
Data team structures with examples from
Snowplow customers
The business world is witnessing the rise of the data team. When companies
first worked with data departments, it was in fragmented silos, with
marketing teams, business intelligence (BI) teams, data scientists, engineers
and analysts within product teams, each handling data individually. Since
then, data has become recognized as a valuable business asset, and the
dedicated ‘data team’ (or teams) has emerged. Given the rapidly growing, data-driven organizations we work with, we wanted to write the ‘how to build a data team’ guide – but after reviewing the options, we saw that only you can decide which structure will work best for you.
Data is now a full-time job for engineers, architects, analysts and data
scientists who are grouped into teams – tasked with collecting and managing
data and deriving value from it. But one size doesn’t fit all. Each data team is
as individual and multifaceted as the companies they support, and most are
in a state of flux, continually evolving alongside their business.
In search of what makes up an ‘ideal’ data team composition, and the
optimal way for data professionals to interact with other teams, we asked our
customers how they structure their data teams, and how they operationalize
data across the business.
Example data team structures
Each company has its own, individual data requirements and a unique
approach to organizing the data team. Examples of data team structures that
we see often among Snowplow customers include the centralized team, a
distributed model and a structure of multiple data teams.
• The centralized data team is arguably the most straightforward
team structure to implement and a go-to for companies taking
the first steps to become a data-informed organization.
This model can lead to a central data ‘platform’ that can serve
the rest of the business, enabling data professionals to work
towards their own key projects.
• The distributed model shares data resources with the rest of the
business by equipping other teams with individual data
professionals, sometimes with data ‘pods’ that might contain an
engineer and an analyst.
• Multiple data teams share data responsibilities such as data
engineering, data science and business intelligence. Choosing
multiple teams can be a robust solution for companies that handle
high-scale data operations, without wanting to ‘bloat’ a single
data team.
Tourlane offers customers hyper-personalized travel experiences, tailor-made
for them based on their individual interests by teams of experienced specialists.
Option 1: A ‘centralized’ data team
The centralized data team is a tried-and-tested team model that will allow
companies to deliver data with the least possible complexity. One advantage
of a central data team is that it can serve other teams while working towards
its own core business projects – it’s a flexible model that can adapt to the
changing needs of a growing business.
Perhaps it comes as no surprise that, among our customers, the centralized
data team was the most popular structural choice. Several of our customers
told us that the centralized model forms a basis for the data team to work on
long-term projects, while serving surrounding teams.
Some data teams, like at Tourlane, embrace the role of data ‘suppliers’ who
encourage inquiries from other teams for website or marketing-related data. For
Tourlane, the central data team is responsible for democratizing data insights.
“Our mindset across the company is to make data available to
everyone. We also hold internal training for team leads for
Metabase so they can get data themselves.” – Tourlane
Promoting a culture of self-serve data is also a core focus for Auto Trader’s
central data platform. Auto Trader has an experienced and capable team,
made up of data engineers, developers, analysts and data scientists, but they
also stress the importance of empowering other teams to help themselves
and preventing a bottleneck.
“Our teams are empowered to care about the analytics their
products are generating and the insights they want to drive.”
– Auto Trader
But Auto Trader’s data team is not one dimensional. By creating an agile
‘project team’, data engineers can get in the trenches alongside developers,
analysts, and data scientists to build product features together. In one such
project, Auto Trader’s data team is working on a cross-functional project to
enhance customer performance. For Auto Trader, as with many other data
teams, it is important to strike the right balance between making themselves
available to others while maintaining focus on core data projects.
A balancing act
The centralized data team is not without its challenges. As the first port of call
for any data-related queries from the rest of the business, it’s easy for a data
team to be pulled in so many different directions that it cannot focus on its
own tasks.
At Peak Labs, the data team faced exactly that challenge. Inundated with
demands from internal stakeholders, such as the product team, Peak’s data
team were so busy with requests that they were forced to compromise on
their own endeavors.
People are habitual creatures, and despite efforts to limit outside
distractions, employees from other teams simply got used to approaching
individuals in the data team. Those approaches meant the team was constantly context switching.
“We were all becoming less productive because of the context
switching we were having to do multiple times per day.
Sometimes per hour!” – Peak
To tackle the issue, Dr. Emma Walker, Lead Data Scientist at Peak, drew up
new communication rules around data. She set up public Slack channels for each team or project, and encouraged other teams to use those channels as their first point of contact, rather than messaging individuals.
She also established ‘office hours’, when a member of the team would host
an hour-long data clinic in the company kitchen for employees to ask any
data-related question. Questions ranged from finding user information and tracking new features to determining the success of a marketing campaign, and even GDPR.
Peak is a leading cognitive training subscription app built by
neuroscientists to help exercise mental skills such as memory,
mental agility, problem solving and language.
Peak’s proactive approach to data communication paid off. Now the team has
the headspace to focus on their long-term goals, while making data
accessible and approachable to other team members.
“By controlling the channels of communication, but making sure
that we have a daily presence in the office, we’ve almost entirely
eliminated the context switching that comes from questions over
Slack and have increased our ability to focus. It’s a win for
everyone.” – Peak
PEBMED is an app and web portal that supports doctors and healthcare
professionals to make clinical decisions with informative medical content.
Option 2: The ‘distributed’ data team
While a centralized team is limited in its ability to sit alongside others, the
distributed data team can work alongside existing business teams, such as
product and marketing. For some, the centralized data team is a stepping
stone on the journey to a distributed model, but the centralized and
decentralized models aren’t always mutually exclusive.
For PEBMED, decentralizing the data team was a process of incremental
steps. They first deployed a centralized team to build business-critical data
models, then augmented their product teams with two data analysts, before
moving to a system of distributed pods in 2020.
Animoto is a cloud-based, DIY video creation solution
that makes it easy to make impressive videos in minutes.
Option 2.5: The ‘hybrid’ data team
Taking a different approach, Animoto decided on a ‘hybrid’ between centralized and distributed structures. Describing their system as ‘semi-embedded’, Animoto has a central analytics team while at the same time equipping other teams with data ‘ambassadors’.
The ambassadors are responsible for data analytics within each team, as well
as coordinating a unified approach to data for the company overall. The
system works well, and means that Animoto has the best of both worlds
when it comes to the structure of their data function.
“On each team, we have somebody who is an ambassador. We’re
trying to democratize the data and we trust this person to be
more advanced and to help the other members of the team with
data analytics.” – Animoto
Omio (formerly GoEuro) is Europe’s leading online travel platform for booking
the fastest, cheapest and easiest journeys via train, bus and plane.
Option 3: Multiple ‘federated’ data teams
There are times when one data team just isn’t enough. As a company scales
and data volumes increase, it can be necessary to divide and conquer data
responsibilities to keep up with business demands.
At Omio, there is not just one data team, but three, divided by discipline.
Firstly, there is a large data engineering team that provides a central source
of business intelligence. Secondly, a smaller data team that supports
marketing, and finally a team dedicated to data science and insights – one of
the primary consumers of data provided by the data engineering team.
This ‘federated’ model allows Omio to operate a central data function while maintaining smaller contingents that serve other parts of the business. Each contingent can operate with a level of independence, without relying on one large central team that slows down operations.
Omio may eventually transition to a fully distributed structure where each
team has its own data engineers and analysts. As they continue to grow, Omio
is focused on expanding their data capabilities, without bloating an individual
team so much that it is no longer agile enough to meet business demands.
“It’s a problem of scale. We started with four people last year,
we’re not a large team, and the demands are increasing. But it
doesn’t make sense to keep adding more and more people to
this big, central group.” – Omio
How Snowplow helps data teams
(of all shapes and sizes)
Whether you already have a thriving data function or you’re looking to
expand your data team, here’s how Snowplow can help you power your data
journey to success.
• Unified data collection: Snowplow enables you to unify your
data collection strategy and establish a shared tracking
methodology across the business.
• Data quality you can trust: With complete, accurate data from
Snowplow, your data team has access to high-quality data they can
rely on.
• Empowered data consumers: Snowplow helps you to empower
your data consumers such as analysts and data scientists with clean,
well-structured data that’s ready for use.
• Freedom and flexibility: Snowplow gives you complete freedom
to collect and model your data on your terms, with no vendor lock-in
or prescribed rules. That means you can manage your data delivery
in a way that makes the most sense for your business.
CHAPTER 3
BREAKING
COMMUNICATION
BARRIERS WITH A
UNIVERSAL LANGUAGE
As companies increasingly invest in building out their data capability, it’s
important to keep the ultimate goal of collecting and analyzing data front of
mind: to take data-informed actions that drive business value.
However, as we discovered by talking with our customers, communication
barriers are the single biggest reason data doesn’t get actioned in the real world.
Several such barriers exist but let’s focus on two of the most important ones:
1 Front-end developers send non-uniform tracking, diluting the quality – and therefore the value – of the data and making it hard to consume
2 Data doesn’t get actioned because data consumers don’t know what the fields in the warehouse mean; this eventually leads to the organization losing trust in its insights
The status quo: unenforced event dictionaries
To explore why these two issues occur, it’s important to look at how most
companies implement tracking. Often, an unenforced event dictionary
created by the tracking designer is at the center of the tracking
implementation.
For this to work well, the creator/owner of the unenforced event dictionary
must clearly communicate the design intent – for example, that a search event with these properties should fire when search results are displayed, rather than when the search button is clicked. The design intent must be
made clear to both key stakeholders: the front-end developers and the
data consumers.
This approach does sometimes work, particularly when the dictionary owner
is invested in its long term success, perhaps as one of the data consumers.
However, the dictionary is often created by a specialist consultant and
ongoing ownership is unclear.
This results in long Slack threads with both sets of stakeholders asking what
rows in the sprawling event dictionary mean:
1 Devs can’t interpret the event dictionary, and their goals and incentives often don’t line up with ensuring tracking matches intent exactly; instead they are focused on getting “good enough” live on time.
2 Data consumers either can’t interpret the event dictionary or
aren’t sure if the values loading in the database match the data
dictionary intent.
The solution: a source of truth
in a universal language
Create one central source of truth – a ruleset for what data is allowed to load to the warehouse. This ruleset is created by a designer in a standard format (e.g. JSON schema) and can therefore be universally interpreted (human and machine readable) and maintained long after the designer’s departure.
Going back to the two sets of stakeholders:
1 A dev needs to set up tracking that conforms to the ruleset, because if they don’t, the data fails validation – and they can be held accountable by viewing the failed event logs.
2 This shifts the power to the consumers of the data, as they can collaborate to create the ruleset in a universal language (e.g. JSON schema). They then control the structure of the data in the warehouse (and other targets) and therefore have confidence in what the input to their models will look like. Furthermore, all new joiners to the data team know exactly what each field means.
No one needs to communicate design intent using their own tracking
conventions and no one is left to interpret this intent. As a result, no two
humans need to communicate directly – this breaks down communication
barriers to data being actioned.
What this could look like in practice
We can define what the events coming to your data warehouse look like
before they are even sent by writing a set of rules. For example, the ruleset for
a click event:
{
  "element_name": {
    "enum": [
      "share",
      "like",
      "submit_email",
      "rate",
      ...
      "close_popup"
    ],
    "description": "The name of the element that is clicked"
  },
  "value": {
    "type": ["string", "null"],
    "description": "What is the value associated with the click"
  },
  "element_location": {
    "type": ["string", "null"],
    "description": "Where on the screen is the button shown e.g. top, left"
  },
  "click_error_reason": {
    "type": ["string", "null"],
    "description": "If the click resulted in an error, what was the reason e.g. invalid character in text field"
  }
}
Prior to loading the data to your warehouse, each event is checked to see if it
conforms to the rules laid out. There are two ways of doing this:
• If you are using a 3rd party data collection vendor such as
GA – validate client side
• If you have 1st party data collection such as a home-built pipeline
or Snowplow – validate in the data collection pipeline, prior to
warehouse loading
Either method means the structure of data in the warehouse is controlled
strictly by those consuming it.
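In practice, the validation step can be sketched in a few lines. Below is a minimal, hand-rolled check against the illustrative click-event ruleset above; a real pipeline would use a full JSON Schema validator, so treat the function and ruleset names here as assumptions for illustration.

```python
# Minimal hand-rolled validator for the illustrative click-event ruleset.
# A production pipeline would use a full JSON Schema validator instead;
# this sketch only checks enums and nullable-string types.

CLICK_RULESET = {
    "element_name": {"enum": ["share", "like", "submit_email", "rate", "close_popup"]},
    "value": {"type": ["string", "null"]},
    "element_location": {"type": ["string", "null"]},
    "click_error_reason": {"type": ["string", "null"]},
}

def validate_event(event, ruleset=CLICK_RULESET):
    """Return a list of failure reasons; an empty list means the event is valid."""
    failures = []
    for prop, rule in ruleset.items():
        val = event.get(prop)
        if "enum" in rule and val not in rule["enum"]:
            failures.append(f"{prop}: {val!r} is not an allowed value")
        if "type" in rule:
            type_ok = (val is None and "null" in rule["type"]) or (
                isinstance(val, str) and "string" in rule["type"]
            )
            if not type_ok:
                failures.append(f"{prop}: {val!r} has the wrong type")
    return failures

good = {"element_name": "rate", "value": None,
        "element_location": None, "click_error_reason": "no_rating_selected"}
bad = {"element_name": "unknown_button", "value": 42,
       "element_location": None, "click_error_reason": None}

print(validate_event(good))  # []
print(validate_event(bad))   # two failure reasons
```

A good event loads to the warehouse, while a bad one is routed to the failed event logs along with reasons like the ones produced above.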
User ID, Platform, Timestamp and Event Name are a subset of properties sent automatically for every event; the remaining columns are custom properties of the click event.

User ID | Platform | Timestamp           | Event Name | Element_name | Value         | Element_location | Click_error_reason
Joe     | Web      | 2019-10-01 12:33:21 | Page_view  |              |               |                  |
Joe     | Web      | 2019-10-01 12:33:29 | Click      | submit_email | joe@email.com | homepage_footer  |
Joe     | iOS      | 2019-10-01 23:31:03 | Click      | rate         |               |                  | no_rating_selected
With this simple change to the setup – introducing an enforced ruleset – your front-end devs can finally QA your analytics in the same way as they would QA the rest of any build, by adding it to their integrated testing suite using something like the open-source tool Micro.
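Such a QA step might be sketched as follows: after driving the app through a tracked user flow against a local test pipeline, the suite asserts that no events failed validation. The shape of the counts summary here ({"good": n, "bad": n}) is an assumption for this sketch, not the actual API of any particular tool.

```python
# Sketch of an analytics QA step in an integrated test suite.
# `counts` stands for the good/bad event summary a local test pipeline
# reports after a tracked user flow has run; its exact shape here
# is an assumption for illustration.

def assert_no_bad_events(counts):
    """Fail the build if any event in this test run failed validation."""
    bad = counts.get("bad", 0)
    if bad > 0:
        raise AssertionError(
            f"{bad} event(s) failed validation; "
            "check the failed event logs before shipping."
        )

# A clean run passes silently...
assert_no_bad_events({"good": 12, "bad": 0})

# ...while a run with failed events blocks the code push.
try:
    assert_no_bad_events({"good": 10, "bad": 2})
except AssertionError as exc:
    print(exc)
```

Wired into CI, a check like this makes broken tracking fail the build just like any other regression.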
How Snowplow approaches enforced workflows
Validating data up front enforces workflows around the ruleset of definitions.
At Snowplow, we have done some thinking around these workflows.
Snowplow is a first-party data delivery platform that validates events in the
pipeline prior to loading to targets. Good events load to the warehouse (and
other targets) while bad events are stored for debugging and reprocessing.
Snowplow tracking can also be versioned – definitions can be updated
according to semantic versioning with all changes automatically manifesting
in the warehouse table structure.
Typical tracking workflow:
1 Collaborate in a tracking design workbook
2 Upload the rules (event and entity definitions) to the pipeline
3 Test tracking against these rules in a sandbox environment
4 Set up integrated tests to ensure each code push takes analytics
into account
5 Set up alerting for any spike in events failing validation
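Step 5 above might be sketched as a simple threshold check on the share of failed events; the 1% threshold and the count inputs are illustrative assumptions, not Snowplow defaults.

```python
# Sketch of step 5: alert when the share of events failing validation spikes.
# The 1% threshold and the count inputs are illustrative assumptions.

def failed_event_ratio(good_count, bad_count):
    """Fraction of recent events that failed validation."""
    total = good_count + bad_count
    return bad_count / total if total else 0.0

def should_alert(good_count, bad_count, threshold=0.01):
    """Alert when more than `threshold` of recent events failed validation."""
    return failed_event_ratio(good_count, bad_count) > threshold

print(should_alert(99_950, 50))     # False: 0.05% failing is within tolerance
print(should_alert(95_000, 5_000))  # True: a 5% failure rate warrants an alert
```

In practice the counts would come from the pipeline's failed event logs, aggregated over a sliding window, with the alert routed to a channel the data team actually watches.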
Summary
The case for enforced rulesets:
• Front-end devs don’t need to interpret an unenforced event
dictionary packed full of naming conventions
• Consumers of the raw data don’t need to guess what keys and values mean
• High-quality analytics in every code push, given the wealth of QA
tooling that exists when working with machine-readable rulesets
• Far less data cleaning required since data is validated up-front
CHAPTER 4
REDUCING DATA
DOWNTIME WITH
DATA OBSERVABILITY
Data downtime is a hot topic in data at the moment, and for obvious reasons.
The cost of data downtime – a term coined by Monte Carlo to refer to periods
where data is partial, erroneous, missing or otherwise inaccurate – can be
significant for companies who rely on behavioral data for decision making.
If making key strategic decisions based on inaccurate data or wasting
valuable time finding and diagnosing issues with data sounds commonplace,
then your company suffers from data downtime.
But how exactly does data downtime occur? And what can we do to eliminate it?
A real-life example of data downtime at Acme
Every Monday morning at 9am, a weekly strategy meeting takes place at Acme, with attendees dialling in from around the world. Ralph, the SVP of Commerce, runs through the numbers for the past week, and key decisions are made for the week and month ahead. The report includes data from multiple sources: from online and offline sales to payments, promotions and so on.
The report lands in Ralph’s inbox ahead of the meeting every Monday, giving him time to look through the data and prepare. However, this week there is a problem. Ralph believes the numbers look off; he was expecting much better performance last week. He sends an urgent email to the entire data team questioning the accuracy of the data and requesting that the issue be resolved as soon as possible.
The team frantically tries to find the problem. Was Ralph correct? Is the data
inaccurate, or is data missing? The search is made harder by the complex
data stack at Acme, with multiple sources, pipelines, data modelling jobs and
siloed teams all feeding this one business-critical report. Where is the issue
or the bottleneck?
It takes valuable time to find, root-cause and resolve the issue, and by the
time it is resolved, the weekly strategy meeting has already taken place.
Ralph has lost confidence in the data, and so has the rest of the global
Leadership team, who had to run blind in this week’s strategy meeting.
The spiralling cost of detecting data quality
issues too late
This kind of scenario is not uncommon, but what is most damaging is
how far downstream the data quality issue is detected, making it significantly
more costly to Acme. It is far better to spot an issue, debug and resolve it
at the point it occurs and as far upstream as possible – in order to minimise
data downtime.
You do not want Ralph, or any other data consumer, spotting your data
quality issues, or worse, using incomplete or inaccurate data to drive key
decisions. Towards the end of the graph, once the data is out and being used
by a plethora of data consumers, the damage is difficult to contain, at least
without eroding trust in the data.
The same applies for data being used in real time. If your product
recommendations engine isn’t using the freshest data, then your users are
going to be served outdated recommendations, negatively impacting the
user experience and harming your bottom line.
The need for data observability
The challenge outlined above is the exact problem that data observability
aims to fix. Data observability gives you transparency and control over the
health of your data pipeline, such that when an issue does occur you can
quickly understand:
1 Where is the problem?
2 Who needs to resolve it?
Knowing this information makes it possible to find and resolve issues far
more quickly and minimize data downtime.
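The two questions above can be answered in one pass if each pipeline stage carries an owner alongside its health signal. As a hedged sketch (the stage names, owners and 15-minute SLA below are invented for illustration, not a real Snowplow configuration), a simple freshness check might look like this:

```python
# Toy freshness check over a pipeline: a stale stage immediately tells
# you both where the problem is and who needs to resolve it.
# Stage names, owners and the 15-minute SLA are hypothetical.

FRESHNESS_SLA_SECONDS = 15 * 60

def stale_stages(stages, now):
    """Return (stage, owner, seconds_stale) for every stage breaching the SLA."""
    breaches = []
    for stage in stages:
        age = now - stage["last_event_ts"]
        if age > FRESHNESS_SLA_SECONDS:
            breaches.append((stage["name"], stage["owner"], age))
    return breaches

# Timestamps are plain seconds for the example.
now = 10_000
pipeline = [
    {"name": "collector", "owner": "platform-team",  "last_event_ts": 9_900},
    {"name": "enrich",    "owner": "platform-team",  "last_event_ts": 9_800},
    {"name": "loader",    "owner": "warehouse-team", "last_event_ts": 8_000},
]

print(stale_stages(pipeline, now))
# [('loader', 'warehouse-team', 2000)]
```

Routing each breach to the stage's owner means the right team is alerted before a consumer like Ralph ever sees stale numbers.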
But, how is data observability any different to monitoring?
The best way to describe the difference is that monitoring covers the ‘known
unknowns’, whereas observability covers the ‘unknown unknowns’.
MONITORING (known unknowns)
• Tells you when something is wrong
• Assumes you know what questions to ask

OBSERVABILITY (unknown unknowns)
• Doesn’t assume that something is wrong
• Assumes we don’t know what all the questions are to ask
To take one example: as a Data Engineer, I know that I need to monitor the
CPU usage of a microservice. But what is the complete landscape of things
that could go wrong that could impact the delivery of complete and
accurate data?
It is impossible to predict every issue that could arise, and this is where
observability steps in. Data observability assumes we don’t know what all the
questions are to ask, and instead gives us visibility of the things that really
matter so that when something does go wrong we can investigate and
resolve it quickly.
Our approach to observability at Snowplow
What Ralph and the rest of the business really care about is whether they can
trust the data. Is it complete and is it fresh? Crucially, data observability
should align technical metrics with business outcomes so that business and
engineering teams are talking the same language and moving in the same
direction.
Our approach to observability at Snowplow is to focus on two key metrics:
throughput and latency. These are emitted from each part of the pipeline, so
if a bottleneck occurs at any point it is far simpler to diagnose, allowing you
to take corrective action immediately.
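To illustrate why per-stage throughput makes bottlenecks simpler to diagnose, a sketch follows: compare each stage's throughput with the stage upstream of it and flag the largest relative drop. The stage names and event rates are invented for the example and do not reflect Snowplow's actual instrumentation.

```python
# Locate a throughput bottleneck by comparing each stage with its
# upstream neighbour. Stage names and event rates are hypothetical.

def find_bottleneck(throughput):
    """throughput: ordered list of (stage_name, events_per_minute),
    upstream first. Returns the stage with the largest relative drop
    versus its upstream neighbour, and the size of that drop."""
    worst_stage, worst_drop = None, 0.0
    for (up_name, up_rate), (down_name, down_rate) in zip(throughput, throughput[1:]):
        drop = (up_rate - down_rate) / up_rate if up_rate else 0.0
        if drop > worst_drop:
            worst_stage, worst_drop = down_name, drop
    return worst_stage, worst_drop

stages = [("collector", 10_000), ("enrich", 9_900), ("loader", 4_000)]
print(find_bottleneck(stages))  # flags 'loader', which drops ~60% of events
```

With latency tracked the same way per stage, the slowest or leakiest component stands out immediately instead of being inferred from a broken report downstream.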
Our plan is to make Snowplow the most observable behavioural data
pipeline available. We have already added observability to our BigQuery
Loader, and we’ll soon be launching it within our RDB Loader and Enrich
assets too.
CHAPTER 5
HOW DATA
STORYTELLING CAN
MAKE YOUR INSIGHTS
MORE EFFECTIVE
So, you have done it. Your trackers are in place, your data is clean, modeled
and easily available. All the information you need is at your fingertips.
But what about the business side of your company? What about your
colleagues in different departments and teams, who may be completely
unaware of where this data comes from and what it means? Ideally they
would be making data-driven decisions as well, with a solid understanding
of where the numbers you’re producing come from.
This is where good data storytelling comes in.
What is data storytelling?
Storytelling is one of the oldest forms of education. Humans
struggle to process too much complex information, but are great
at remembering and retelling stories.
Data storytelling is the practice of transforming data into easily
understandable insights. You can have the most advanced technologies at
your disposal, but those won’t provide any value until you can tell the story
behind the numbers you’re producing.
Having worked with many online retailers, especially those coming from a
more traditional brick-and-mortar background, I can tell you that most raw
numbers don’t mean anything to a lot of people. Concepts like conversion
rates and marketing attribution are hard to measure in a regular store, and
even harder to grasp for those unfamiliar with online spaces.
But the stories behind the numbers are absolutely universal.
For example, a 30% increase in conversion rates might sound good, but on its
own, it’s a useless number. Perhaps the number of purchases went down
rapidly and only your most loyal customers kept ordering, or maybe you had
a very successful marketing attribution project which increased your
conversion rate beyond the 15% you had hoped for. Both stories could come
from the same number, but they provide very different insights depending on
who you’re telling them to.
Storytelling is powerful. Even someone who has only worked in a physical
store will recognize the effect of customers intrigued by a new display or
good demo, as well as the strength of word-of-mouth advertising, whether
that is done through conversation or a social media share.
This practice extends to all forms of showing and explaining data, whether
you are presenting at a meeting, building dashboards or writing guides.
Whatever form the information is presented in, it will always benefit from a
narrative that takes the consumer on a journey from data to business
outcomes.
Why should I do it?
As mentioned earlier, humans remember and retell stories far better than
raw facts, and this is exactly what you want to happen with your data. You
want people to remember the important information, act on it and draw the
right conclusions that will help them in their roles. The business-critical
choices people make need to be rooted in the data you provide them with.
And this can only happen if they remember and understand what they have
been told.
Beyond that, stories can be retold. New coworkers, other teams and
customers can all be provided with this information by anyone who can retell
the story. And if that happens, it can actually relieve the data team of
repetitive work, so they can focus on more exciting things like enhancing the
data or driving powerful data use cases.
Lastly, providing data as a story means that data literacy is no longer
expected or required from every team. Teams which are affected by the data,
but not directly involved in collecting or consuming it can still benefit from a
strong narrative. This way they do not have to invest in additional resources
to build understanding, and can instead focus on their everyday work.
How do I tell a good (data) story?
Creating a good data story relies on three key aspects:
1 Know your audience
2 Build a compelling narrative
3 Make use of clear visuals
As with any presentation, knowing your audience is the best way to
communicate effectively. If you’re telling your story to a board of directors it
will look totally different from telling it to a team of customer service reps,
even if both stories rely on the same data set. So before you dive into creating
your narrative, it’s vital you empathize with the audience whom you’re
presenting to.
Ask yourself who they are, what knowledge they already have about the
subject and why they would be interested in what you’re about to tell them.
Emotions are strong drivers in humans, so consider which emotions you
want to play on. Are the numbers you’re showing something to be
celebrated, or are you spinning a cautionary tale?
With that in mind, also consider the context in which you’re telling your story.
Going back to the earlier example about conversion, just highlighting a
number or metric won’t really tell anyone anything.
Think about what background information your audience needs to
appreciate what you are telling them. Any good storyteller knows that JRR
Tolkien’s The Hobbit is a very entertaining story, but everything that happens
becomes much more powerful if you have any notion of the plot in the Lord
of the Rings. Obviously the suggestion is not to write an extensive trilogy, but
do remember to properly set the scene. The right context will set your
audience up for success.
Once you know your audience and you have a good understanding of their
needs you can focus on building a robust narrative. Like any good story there
will be a beginning, middle and end. In the beginning you want to set the
scene and make sure the audience has the right context. Give them the
information they need to understand the meat of the story.
Context is key
After an introduction we come to the core of the story. This is where the data
is revealed, which is another part worth thinking through in advance. Make
sure to reveal the data in an understandable order that the audience can
follow.
Your audience has to understand where something comes from, as well as
understand what all of this information is leading up to. Few people will
remember numbers out of context, but they will have a much better time
remembering them if they have the right context. And if they’re led through a
journey with a logical reveal at the end, they will be much more engaged with
the content as well.
Timing is also crucial. Give your audience the time to process the data and
information you’re sharing. Humans need time to process numbers, visuals and
complex information. Don’t bombard them with number after number. Give the
data context and allow people time to reach their own conclusions as well.
Wrap it up with the important parts
Finally, at the end, wrap everything up. The data has been given proper
context and has been revealed, and now you drive home the insights and
take-home messages. Keep this simple. The best endings are short, sweet,
and easy to remember.
So with the story planned out, the next step is to think about your visuals.
Visuals are a great way to support learning, remembering and understanding.
An important thing to remember is that visuals should add to your story, not
distract from it. Keep them clean and simple. Too many colors, details and
moving parts will only serve as a distraction. While your audience is trying to
figure out what they are looking at, they are not listening to your story and
therefore missing important context or information. So illustrate the
important parts and leave out anything else.
So when you know your audience, your narrative and your visuals you can
tell a data story that captivates.
By doing this you create the opportunity for anyone listening to learn, to
remember and to act on the important information you have to share.
People remember stories, not raw numbers. And the rewards of telling a
good story last a long time: it forms strong relationships, and others can
continue to share your story, giving you and your team more time to focus
on the next project.
snowplowanalytics.com