TREATING DATA
AS A PRODUCT
Table of Contents
1. The challenges of working in data in 2021
2. A guide to data team structures with examples
3. Breaking communication barriers with a universal language
4. Reducing data downtime with data observability
5. How data storytelling can make your insights more effective
CHAPTER 1
THE CHALLENGES
OF WORKING
IN DATA IN 2021
It’s an exciting time to be working in data.
The opportunity for data teams to make an
impact is huge. They have more tools at their
disposal, with exciting technology hitting the
market each day, than ever before.
Hiring in data, despite a slowdown in 2020, continues to be hot: new job titles like the Analytics Engineer are emerging, investment in data capabilities keeps growing, and the market is projected to be worth $103 billion by 2023.
Watch the webinar:
People and process in data with Alex Dean and Scott Breitenother
But it’s not all roses. Building and managing a modern data capability is a
tremendous challenge. Aside from the technology involved, it requires
working closely with technologists with a broad range of skill sets. It means
working with (i.e. winning over) internal teams and stakeholders and
educating them around the value of data. Sometimes it’s a challenge just to
get a seat at the table and make the case for data in the business.
And even with all the right things in place – the tools, the people and buy-in from the right stakeholders – there is only so much one team can achieve. Small data teams can find themselves stretched and under-resourced, overwhelmed by the demands of the wider company.
To get a better understanding of the challenges of working in data, we spoke
to some of our customers, asking about the key pain points in their everyday
working lives. In this chapter, you’ll find featured snippets and quotes from
conversations with our customers.
The answers below were not gathered in a formal or scientific way, nor is the
list of challenges exhaustive. Nonetheless, we hope they shed some light on
the experience of a data professional in 2021.
N.B. Our quotes below have been anonymized and paraphrased to protect the privacy of our interviewees.
Internal customers and responsibilities
During our conversations, the difficulties of working with internal customers
made up around 21% of the challenges mentioned.
When we think of ‘customers’ we might imagine people buying products
or services. But pretty much all the data professionals we spoke to listed
their primary ‘customers’ as internal colleagues. That could mean working
with finance, marketing, sales teams, operational teams, or sometimes even
the C-suite.
A crucial part of working with those teams was winning their trust. The data
professionals we talked to spoke about the need to win over these
colleagues, build strong relationships with them and serve them efficiently.
‘I want my customers [internally] to trust the data. I want them to
be able to get the answers they’re looking for from the data, and
get them faster, eventually through self-service. Marketing,
finance, business development all depend on us on a daily basis.’
In a sense, data teams have to act a bit like internal detectives – investigating
what their internal customers need and building a plan to deliver it.
‘I try to understand what my counterparts in finance, accounting,
operations and marketing are trying to accomplish – and then
look ahead to see if we have the technology to do what they
want to do.’
‘I often say “if we want to deliver what product wants in 9 months, we have to start today”.’
A breakdown of the main challenges facing data professionals, based on recent conversations with Snowplow customers – data leaders and data practitioners
Time is of the essence. There is pressure for data teams to deliver data
efficiently to their data consumers. Marketing and product teams don’t want
to wait around for data they need to execute use cases. Equally important is
getting the data at the right latency.
‘We are responsible for working with other engineering teams to
enhance and/or create data sets. We ingest the data, process it
and deliver it to where it needs to be’
‘I want to make sure we can deliver data at the speed our
teams need it’
‘Our job is to help the business to work with data in the most
efficient way possible’
But above all, data teams mentioned that the data needs to be clear and
coherent. It needs to be a single source of truth, so all teams can be on the
same page.
‘Our main customers are our business units. We provide support
to partners, marketing, sales, account managers – we make
sure they’re all speaking the same language and looking at the
same data’.
Tool evaluation
Evaluating and purchasing tools made up over 10% of the challenges
mentioned by our customers. With technology constantly evolving in the
data industry, perhaps this comes as no surprise.
A quick look at Indicative’s recent post on the ecosystem of modern data
infrastructure shows you that the challenge of choosing the right tools for the
data stack is no joke. Even with Indicative’s selection and helpful breakdown
of each tech category, it’s clear from their diagram that modern data teams
face a number of choices for how they build out their data capability.
A view of the modern data infrastructure ecosystem by Indicative
Our customers explained that tool evaluation (and selection) was an ongoing
challenge for them. They told us that picking the right tools demands
constant research, investigation, trialing and careful planning to ensure their
teams are well equipped.
‘[When I’m evaluating tools] I’m looking at what is our business
need, where are we going, how are we growing? – what projects
do we have on the table and are they staffed properly?’
‘A really big part of my job is “how do we not spend tons of money
on tools we’re not going to use?”’
When it comes to buying tools, our customers were clear that they preferred
to put in the research themselves before contacting a sales team. Not only
does this save them time in the long run, it also gives them the opportunity
to investigate not just pricing and features, but the ‘softer’ side of the tools –
e.g. is there a community? Are there other teams using this solution who I can
reach out to?
‘I do a lot of research. I like to know a lot about the product before
I call the sales person.’
People and communication challenges
Challenges around people and communication were brought up the most,
making up around 29% of the difficulties our data professionals mentioned.
Communication was actually a common thread in all of our conversations
with our customers. It boils down to the need to be able to educate and
inspire confidence in internal clients – demonstrating the value in data and
data processes.
And data professionals, despite the stereotypes, are good communicators.
One customer summed it up perfectly:
‘Working agile forces you to be better at working with people.
The stereotype of the engineer working alone with headphones
on is totally wrong’.
According to the customers we spoke to, technical ability isn’t enough. Data
professionals need to back it up with strong communication skills and the
ability to guide people towards a common goal.
‘You can have the greatest technical skills but if you are an
ineffective communicator or ineffective at having influence to
move a group of smart people forward toward a common goal –
it’s really hard to get the job done’
‘Communication is 50% of the work. Understanding one’s
audience – whether executive level or at the front-line level, I have
to be able to calibrate my message for the audience.’
That’s not to say that technical skills are not important. But for data leaders,
it’s also about amplifying your team’s abilities with appropriate team
structures and processes in place. Put another way, the best tools cannot be
leveraged without the right people.
‘Tech is still really important. Tools position you well, but you can
have all the chess pieces in place but if the people aren’t ready,
you’re going to have real challenges’
One of the biggest challenges for data teams is to prove their value, or the value of their work. Our customers mentioned the constant battle to win buy-in from their colleagues and stakeholders.
Sometimes it’s about showcasing a new, exciting way of working with data.
But often (and more challenging) it’s the case that data leaders need to
convince colleagues that an investment in a certain data project is worth the
time and resources. In both cases, communication and the ability to ‘sell’ the
value in data are key.
‘We run showcases for the business to show new builds, new
features, how they work and how they impact peoples’ jobs’.
‘It’s hard to get stakeholders excited about a large investment
just for accurate insights’
‘When you talk to a developer their initial thoughts [around
tracking] are “this is going to take a lot more time.” It’s our job to
convince them of the value of that investment.’
At other times, education around data is crucial. Working with data means
handling a vital business asset that should be treated as such, especially
when sensitive data and customer information (PII) is involved.
And while ‘self-service’ data is often the dream, data professionals are tasked
with coaching the rest of the organization on how they can find, understand
and work with their data. While some in the business (e.g. developers and
engineers) may already be data-proficient, that’s not always the case for
other stakeholders.
‘Self-serve is a huge challenge. We want lots of departments to be
able to access data and work with it. But it’s really hard to teach
everyone to be competent with data and use data safely.’
‘Building consensus can be challenging when there are so many
stakeholders involved with different levels of knowledge.’
Without education around data practices, resources can go to waste and teams can grow frustrated with their data product. As one customer found, when internal teams are ‘sold the dream’ by a packaged solution, it can go wrong in the longer term – something that could have been avoided if the data team had been involved earlier in the evaluation process.
‘I would see marketing go out and buy some tool, get sold on the pretty pictures, and then they’d have us make a bunch of changes to implement it, only to then not use it in the long run. This happened a few times.’
Working in data is a constant challenge
Between the continual demands of the organization, ongoing tool
evaluations, hiring, decision making and dealing with internal stakeholders –
data teams have it tough.
Thankfully the data industry is maturing. Each year there are more tools
available, more budding communities of helpful contributors and
examples of content to guide data teams towards success.
Sadly there’s no silver bullet to the challenges of working in data. But as
one customer summed it up, one key element might lie in hiring, educating
and equipping the right people, arguably the most important part of your
data capability.
‘Getting the ‘people bit’ right is actually the hardest part for
companies when it comes to data.’
Despite the challenges, data teams are still driving huge value in their
organizations, from empowering their colleagues with insights to turning
game-changing use cases into reality.
At Snowplow, we want to make it as easy as possible for data professionals to
manage and work with their behavioral data.
CHAPTER 2
A GUIDE TO
DATA TEAM
STRUCTURES
WITH EXAMPLES
Data team structures with examples from
Snowplow customers
The business world is witnessing the rise of the data team. When companies
first worked with data departments, it was in fragmented silos, with
marketing teams, business intelligence (BI) teams, data scientists, engineers
and analysts within product teams, each handling data individually. Since
then, data has become recognized as a valuable business asset, and the
dedicated ‘data team’ (or teams) has emerged. Given the rapidly growing, data-driven organizations we work with, we wanted to write the ‘how to build a data team’ guide – but after reviewing the options, we saw that only you can decide which structure will work best for you.
Data is now a full-time job for engineers, architects, analysts and data
scientists who are grouped into teams – tasked with collecting and managing
data and deriving value from it. But one size doesn’t fit all. Each data team is
as individual and multifaceted as the companies they support, and most are
in a state of flux, continually evolving alongside their business.
In search of what makes up an ‘ideal’ data team composition, and the
optimal way for data professionals to interact with other teams, we asked our
customers how they structure their data teams, and how they operationalize
data across the business.
Example data team structures
Each company has its own, individual data requirements and a unique
approach to organizing the data team. Examples of data team structures that
we see often among Snowplow customers include the centralized team, a
distributed model and a structure of multiple data teams.
• The centralized data team is arguably the most straightforward
team structure to implement and a go-to for companies taking
the first steps to become a data-informed organization.
This model can lead to a central data ‘platform’ that can serve
the rest of the business, enabling data professionals to work
towards their own key projects.
• The distributed model shares data resources with the rest of the
business by equipping other teams with individual data
professionals, sometimes with data ‘pods’ that might contain an
engineer and an analyst.
• Multiple data teams share data responsibilities such as data
engineering, data science and business intelligence. Choosing
multiple teams can be a robust solution for companies that handle
high-scale data operations, without wanting to ‘bloat’ a single
data team.
Tourlane offers customers hyper-personalized travel experiences, tailor-made
for them based on their individual interests by teams of experienced specialists.
Option 1: A ‘centralized’ data team
The centralized data team is a tried-and-tested team model that will allow
companies to deliver data with the least possible complexity. One advantage
of a central data team is that it can serve other teams while working towards
its own core business projects – it’s a flexible model that can adapt to the
changing needs of a growing business.
Perhaps it comes as no surprise that, among our customers, the centralized
data team was the most popular structural choice. Several of our customers
told us that the centralized model forms a basis for the data team to work on
long-term projects, while serving surrounding teams.
Some data teams, like at Tourlane, embrace the role of data ‘suppliers’ who
encourage inquiries from other teams for website or marketing-related data. For
Tourlane, the central data team is responsible for democratizing data insights.
“Our mindset across the company is to make data available to
everyone. We also hold internal training for team leads for
Metabase so they can get data themselves.” – Tourlane
Promoting a culture of self-serve data is also a core focus for Auto Trader’s
central data platform. Auto Trader has an experienced and capable team,
made up of data engineers, developers, analysts and data scientists, but they
also stress the importance of empowering other teams to help themselves
and preventing a bottleneck.
“Our teams are empowered to care about the analytics their
products are generating and the insights they want to drive.”
– Auto Trader
But Auto Trader’s data team is not one dimensional. By creating an agile
‘project team’, data engineers can get in the trenches alongside developers,
analysts, and data scientists to build product features together. In one such
project, Auto Trader’s data team is working on a cross-functional project to
enhance customer performance. For Auto Trader, as with many other data
teams, it is important to strike the right balance between making themselves
available to others while maintaining focus on core data projects.
A balancing act
The centralized data team is not without its challenges. As the first port of call
for any data-related queries from the rest of the business, it’s easy for a data
team to be pulled in so many different directions that it cannot focus on its
own tasks.
At Peak Labs, the data team faced exactly that challenge. Inundated with
demands from internal stakeholders, such as the product team, Peak’s data
team were so busy with requests that they were forced to compromise on
their own endeavors.
People are habitual creatures, and despite efforts to limit outside
distractions, employees from other teams simply got used to approaching
individuals in the data team. Those approaches meant the team was constantly context switching.
“We were all becoming less productive because of the context
switching we were having to do multiple times per day.
Sometimes per hour!” – Peak
To tackle the issue, Dr. Emma Walker, Lead Data Scientist at Peak, drew up
new communication rules around data. She set up public Slack channels for each team or project, and encouraged other teams to use those channels as their first point of contact, rather than messaging individuals.
She also established ‘office hours’, when a member of the team would host
an hour-long data clinic in the company kitchen for employees to ask any
data-related question. Questions ranged from finding user information and tracking new features to determining the success of a marketing campaign, and even GDPR.
Peak is a leading cognitive training subscription app built by
neuroscientists to help exercise mental skills such as memory,
mental agility, problem solving and language.
Peak’s proactive approach to data communication paid off. Now the team has
the headspace to focus on their long-term goals, while making data
accessible and approachable to other team members.
“By controlling the channels of communication, but making sure
that we have a daily presence in the office, we’ve almost entirely
eliminated the context switching that comes from questions over
Slack and have increased our ability to focus. It’s a win for
everyone.” – Peak
PEBMED is an app and web portal that supports doctors and healthcare
professionals to make clinical decisions with informative medical content.
Option 2: The ‘distributed’ data team
While a centralized team is limited in its ability to sit alongside others, the
distributed data team can work alongside existing business teams, such as
product and marketing. For some, the centralized data team is a stepping
stone on the journey to a distributed model, but the centralized and
decentralized models aren’t always mutually exclusive.
For PEBMED, decentralizing the data team was a process of incremental
steps. They first deployed a centralized team to build business-critical data
models, then augmented their product teams with two data analysts, before
moving to a system of distributed pods in 2020.
Animoto is a cloud-based, DIY video creation solution
that makes it easy to make impressive videos in minutes.
Option 2.5: The ‘hybrid’ data team
Taking a different approach, Animoto decided on a ‘hybrid’ between centralized and distributed structures. Describing their system as ‘semi-embedded’, Animoto has a central analytics team while at the same time equipping other teams with data ‘ambassadors’.
The ambassadors are responsible for data analytics within each team, as well
as coordinating a unified approach to data for the company overall. The
system works well, and means that Animoto has the best of both worlds
when it comes to the structure of their data function.
“On each team, we have somebody who is an ambassador. We’re
trying to democratize the data and we trust this person to be
more advanced and to help the other members of the team with
data analytics.” – Animoto
Omio (formerly GoEuro) is Europe’s leading online travel platform for booking
the fastest, cheapest and easiest journeys via train, bus and plane.
Option 3: Multiple ‘federated’ data teams
There are times when one data team just isn’t enough. As a company scales
and data volumes increase, it can be necessary to divide and conquer data
responsibilities to keep up with business demands.
At Omio, there is not just one data team, but three, divided by discipline.
Firstly, there is a large data engineering team that provides a central source
of business intelligence. Secondly, a smaller data team that supports
marketing, and finally a team dedicated to data science and insights – one of
the primary consumers of data provided by the data engineering team.
This ‘federated’ model allows Omio to operate a central data function while maintaining smaller contingents that serve other parts of the business. Each contingent can operate with a level of independence, without relying on one large central team that slows down operations.
Omio may eventually transition to a fully distributed structure where each
team has its own data engineers and analysts. As they continue to grow, Omio
is focused on expanding their data capabilities, without bloating an individual
team so much that it is no longer agile enough to meet business demands.
“It’s a problem of scale. We started with four people last year,
we’re not a large team, and the demands are increasing. But it
doesn’t make sense to keep adding more and more people to
this big, central group.” – Omio
How Snowplow helps data teams
(of all shapes and sizes)
Whether you already have a thriving data function or you’re looking to
expand your data team, here’s how Snowplow can help you power your data
journey to success.
• Unified data collection: Snowplow enables you to unify your
data collection strategy and establish a shared tracking
methodology across the business.
• Data quality you can trust: With complete, accurate data from
Snowplow, your data team has access to high-quality data they can
rely on.
• Empowered data consumers: Snowplow helps you to empower
your data consumers such as analysts and data scientists with clean,
well-structured data that’s ready for use.
• Freedom and flexibility: Snowplow gives you complete freedom
to collect and model your data on your terms, with no vendor lock-in
or prescribed rules. That means you can manage your data delivery
in a way that makes the most sense for your business.
CHAPTER 3
BREAKING
COMMUNICATION
BARRIERS WITH A
UNIVERSAL LANGUAGE
As companies increasingly invest in building out their data capability, it’s
important to keep the ultimate goal of collecting and analyzing data front of
mind: to take data-informed actions that drive business value.
However, as we discovered by talking with our customers, communication
barriers are the single biggest reason data doesn’t get actioned in the real world.
Several such barriers exist but let’s focus on two of the most important ones:
1 Front-end developers send non-uniform tracking, diluting the quality – and therefore the value – of the data and making it hard to consume
2 Data doesn’t get actioned because data consumers don’t know what the fields in the warehouse mean; this eventually leads to the organization losing trust in its insights
The status quo: unenforced event dictionaries
To explore why these two issues occur, it’s important to look at how most
companies implement tracking. Often, an unenforced event dictionary
created by the tracking designer is at the center of the tracking
implementation.
For this to work well, the creator/owner of the unenforced event dictionary
must clearly communicate the design intent – for example, that a search event with these properties should fire when search results are displayed, rather than when the search button is clicked. The design intent must be
made clear to both key stakeholders: the front-end developers and the
data consumers.
This approach does sometimes work, particularly when the dictionary owner
is invested in its long term success, perhaps as one of the data consumers.
However, the dictionary is often created by a specialist consultant and
ongoing ownership is unclear.
This results in long Slack threads with both sets of stakeholders asking what
rows in the sprawling event dictionary mean:
1 Devs can’t interpret the event dictionary, and their goals and incentives often don’t line up with ensuring tracking matches intent exactly; instead they are focused on getting “good enough” live on time.
2 Data consumers either can’t interpret the event dictionary or
aren’t sure if the values loading in the database match the data
dictionary intent.
The solution: a source of truth
in a universal language
Create one central source of truth – a ruleset for what data is allowed to load to the warehouse. This ruleset is created by a designer in a standard format (e.g. JSON schema) and can therefore be universally interpreted (human and machine readable) and maintained long after the designer’s departure.
Going back to the two sets of stakeholders:
1 A dev needs to set up tracking that conforms to the ruleset, because if they don’t, the data fails validation – and they can be held accountable by viewing the failed event logs.
2 This shifts the power to the consumers of the data, as they can collaborate to create the ruleset in a universal language (e.g. JSON schema). They then control the structure of the data in the warehouse (and other targets) and therefore have confidence in what the input to their models will look like. Furthermore, all new joiners to the data team know exactly what each field means.
No one needs to communicate design intent using their own tracking
conventions and no one is left to interpret this intent. As a result, no two
humans need to communicate directly – this breaks down communication
barriers to data being actioned.
What this could look like in practice
We can define what the events coming to your data warehouse look like
before they are even sent by writing a set of rules. For example, the ruleset for
a click event:
{
  "element_name": {
    "enum": [
      "share",
      "like",
      "submit_email",
      "rate",
      ...
      "close_popup"
    ],
    "description": "The name of the element that is clicked"
  },
  "value": {
    "type": ["string", "null"],
    "description": "What is the value associated with the click"
  },
  "element_location": {
    "type": ["string", "null"],
    "description": "Where on the screen is the button shown e.g. top, left"
  },
  "click_error_reason": {
    "type": ["string", "null"],
    "description": "If the click resulted in an error, what was the reason e.g. invalid character in text field"
  }
}
Prior to loading the data to your warehouse, each event is checked to see if it
conforms to the rules laid out. There are two ways of doing this:
• If you are using a 3rd party data collection vendor such as
GA – validate client side
• If you have 1st party data collection such as a home-built pipeline
or Snowplow – validate in the data collection pipeline, prior to
warehouse loading
Either method means the structure of data in the warehouse is controlled
strictly by those consuming it.
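In practice, the validation step can be sketched in a few lines. Below is a minimal, hand-rolled check against the illustrative click-event ruleset above; a real pipeline would use a full JSON Schema validator, so treat the function and ruleset names here as assumptions for illustration.

```python
# Minimal hand-rolled validator for the illustrative click-event ruleset.
# A production pipeline would use a full JSON Schema validator instead;
# this sketch only checks enums and nullable-string types.

CLICK_RULESET = {
    "element_name": {"enum": ["share", "like", "submit_email", "rate", "close_popup"]},
    "value": {"type": ["string", "null"]},
    "element_location": {"type": ["string", "null"]},
    "click_error_reason": {"type": ["string", "null"]},
}

def validate_event(event, ruleset=CLICK_RULESET):
    """Return a list of failure reasons; an empty list means the event is valid."""
    failures = []
    for prop, rule in ruleset.items():
        val = event.get(prop)
        if "enum" in rule and val not in rule["enum"]:
            failures.append(f"{prop}: {val!r} is not an allowed value")
        if "type" in rule:
            type_ok = (val is None and "null" in rule["type"]) or (
                isinstance(val, str) and "string" in rule["type"]
            )
            if not type_ok:
                failures.append(f"{prop}: {val!r} has the wrong type")
    return failures

good = {"element_name": "rate", "value": None,
        "element_location": None, "click_error_reason": "no_rating_selected"}
bad = {"element_name": "unknown_button", "value": 42,
       "element_location": None, "click_error_reason": None}

print(validate_event(good))  # []
print(validate_event(bad))   # two failure reasons
```

A good event loads to the warehouse, while a bad one is routed to the failed event logs along with reasons like the ones produced above.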
User ID, Platform, Timestamp and Event Name are a subset of properties sent automatically for every event; the remaining columns are custom properties of the click event.

User ID | Platform | Timestamp           | Event Name | Element_name | Value         | Element_location | Click_error_reason
Joe     | Web      | 2019-10-01 12:33:21 | Page_view  |              |               |                  |
Joe     | Web      | 2019-10-01 12:33:29 | Click      | submit_email | joe@email.com | homepage_footer  |
Joe     | iOS      | 2019-10-01 23:31:03 | Click      | rate         |               |                  | no_rating_selected
With this simple change to the setup – introducing an enforced ruleset – your front-end devs can finally QA your analytics in the same way as they would QA the rest of any build, by adding it to their integrated testing suite using something like the open-source tool Micro.
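Such a QA step might be sketched as follows: after driving the app through a tracked user flow against a local test pipeline, the suite asserts that no events failed validation. The shape of the counts summary here ({"good": n, "bad": n}) is an assumption for this sketch, not the actual API of any particular tool.

```python
# Sketch of an analytics QA step in an integrated test suite.
# `counts` stands for the good/bad event summary a local test pipeline
# reports after a tracked user flow has run; its exact shape here
# is an assumption for illustration.

def assert_no_bad_events(counts):
    """Fail the build if any event in this test run failed validation."""
    bad = counts.get("bad", 0)
    if bad > 0:
        raise AssertionError(
            f"{bad} event(s) failed validation; "
            "check the failed event logs before shipping."
        )

# A clean run passes silently...
assert_no_bad_events({"good": 12, "bad": 0})

# ...while a run with failed events blocks the code push.
try:
    assert_no_bad_events({"good": 10, "bad": 2})
except AssertionError as exc:
    print(exc)
```

Wired into CI, a check like this makes broken tracking fail the build just like any other regression.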
How Snowplow approaches enforced workflows
Validating data up front enforces workflows around the ruleset of definitions.
At Snowplow, we have done some thinking around these workflows.
Snowplow is a first-party data delivery platform that validates events in the
pipeline prior to loading to targets. Good events load to the warehouse (and
other targets) while bad events are stored for debugging and reprocessing.
Snowplow tracking can also be versioned – definitions can be updated
according to semantic versioning with all changes automatically manifesting
in the warehouse table structure.
Typical tracking workflow:
1 Collaborate in a tracking design workbook
2 Upload the rules (event and entity definitions) to the pipeline
3 Test tracking against these rules in a sandbox environment
4 Set up integrated tests to ensure each code push takes analytics
into account
5 Set up alerting for any spike in events failing validation
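Step 5 above might be sketched as a simple threshold check on the share of failed events; the 1% threshold and the count inputs are illustrative assumptions, not Snowplow defaults.

```python
# Sketch of step 5: alert when the share of events failing validation spikes.
# The 1% threshold and the count inputs are illustrative assumptions.

def failed_event_ratio(good_count, bad_count):
    """Fraction of recent events that failed validation."""
    total = good_count + bad_count
    return bad_count / total if total else 0.0

def should_alert(good_count, bad_count, threshold=0.01):
    """Alert when more than `threshold` of recent events failed validation."""
    return failed_event_ratio(good_count, bad_count) > threshold

print(should_alert(99_950, 50))     # False: 0.05% failing is within tolerance
print(should_alert(95_000, 5_000))  # True: a 5% failure rate warrants an alert
```

In practice the counts would come from the pipeline's failed event logs, aggregated over a sliding window, with the alert routed to a channel the data team actually watches.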
Summary
The case for enforced rulesets:
• Front-end devs don’t need to interpret an unenforced event
dictionary packed full of naming conventions
• Consumers of the raw data don’t need to guess what keys and values mean
• High-quality analytics in every code push, given the wealth of QA
tooling that exists when working with machine-readable rulesets
• Far less data cleaning required since data is validated up-front
CHAPTER 4
REDUCING DATA
DOWNTIME WITH
DATA OBSERVABILITY
Data downtime is a hot topic in data at the moment, and for obvious reasons.
The cost of data downtime – a term coined by Monte Carlo to refer to periods
where data is partial, erroneous, missing or otherwise inaccurate – can be
significant for companies who rely on behavioral data for decision making.
If making key strategic decisions based on inaccurate data or wasting
valuable time finding and diagnosing issues with data sounds commonplace,
then your company suffers from data downtime.
But how exactly does data downtime occur? And what can we do to eliminate it?
A real-life example of data downtime at Acme
Every Monday morning at 9am, a weekly strategy meeting takes place at Acme, with attendees dialling in from around the world. Ralph, the SVP of Commerce, runs through the numbers for the past week, and key decisions are made for the week and month ahead. The report includes data from multiple sources: from online and offline sales to payments, promotions and so on.
The report lands in Ralph’s inbox ahead of the meeting every Monday, giving him time to look through the data and prepare. However, this week there is a problem. Ralph believes the numbers look off; he was expecting much better performance last week. He sends an urgent email to the entire data team questioning the accuracy of the data and requesting that the issue be resolved as soon as possible.
The team frantically tries to find the problem. Was Ralph correct? Is the data
inaccurate, or is data missing? The search is made harder by the complex
data stack at Acme, with multiple sources, pipelines, data modelling jobs and
siloed teams all feeding this one business-critical report. Where is the issue
or the bottleneck?
It takes valuable time to find, root-cause and resolve the issue, and by the
time it is resolved, the weekly strategy meeting has already taken place.
Ralph has lost confidence in the data, and so has the rest of the global
Leadership team, who had to run blind in this week’s strategy meeting.
The spiralling cost of detecting data quality
issues too late
This kind of scenario is not uncommon, but what is most damaging is
how far downstream the data quality issue is detected, making it significantly
more costly to Acme. It is far better to spot an issue, debug and resolve it
at the point it occurs and as far upstream as possible – in order to minimise
data downtime.
You do not want Ralph, or any other data consumer, spotting your data
quality issues, or worse, using incomplete or inaccurate data to drive key
decisions. Towards the end of the graph, once the data is out and being used
by a plethora of data consumers, the damage is difficult to contain, at least
without eroding trust in the data.
The same applies for data being used in real time. If your product
recommendations engine isn’t using the freshest data, then your users are
going to be served outdated recommendations, negatively impacting the
user experience and harming your bottom line.
The need for data observability
The challenge outlined above is the exact problem that data observability
aims to fix. Data observability gives you transparency and control over the
health of your data pipeline, such that when an issue does occur you can
quickly understand:
1 Where is the problem?
2 Who needs to resolve it?
Knowing this information makes it possible to find and resolve issues far
more quickly and minimize data downtime.
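The two questions above can be answered in one pass if each pipeline stage carries an owner alongside its health signal. As a hedged sketch (the stage names, owners and 15-minute SLA below are invented for illustration, not a real Snowplow configuration), a simple freshness check might look like this:

```python
# Toy freshness check over a pipeline: a stale stage immediately tells
# you both where the problem is and who needs to resolve it.
# Stage names, owners and the 15-minute SLA are hypothetical.

FRESHNESS_SLA_SECONDS = 15 * 60

def stale_stages(stages, now):
    """Return (stage, owner, seconds_stale) for every stage breaching the SLA."""
    breaches = []
    for stage in stages:
        age = now - stage["last_event_ts"]
        if age > FRESHNESS_SLA_SECONDS:
            breaches.append((stage["name"], stage["owner"], age))
    return breaches

# Timestamps are plain seconds for the example.
now = 10_000
pipeline = [
    {"name": "collector", "owner": "platform-team",  "last_event_ts": 9_900},
    {"name": "enrich",    "owner": "platform-team",  "last_event_ts": 9_800},
    {"name": "loader",    "owner": "warehouse-team", "last_event_ts": 8_000},
]

print(stale_stages(pipeline, now))
# [('loader', 'warehouse-team', 2000)]
```

Routing each breach to the stage's owner means the right team is alerted before a consumer like Ralph ever sees stale numbers.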
But, how is data observability any different to monitoring?
The best way to describe the difference is that monitoring covers the ‘known
unknowns’, whereas observability covers the ‘unknown unknowns’.
MONITORING (known unknowns)
• Tells you when something is wrong
• Assumes you know what questions to ask

OBSERVABILITY (unknown unknowns)
• Doesn’t assume that something is wrong
• Assumes we don’t know what all the questions are to ask
To take one example: as a Data Engineer, I know that I need to monitor the
CPU usage of a microservice. But what is the complete landscape of things
that could go wrong that could impact the delivery of complete and
accurate data?
It is impossible to predict every issue that could arise, and this is where
observability steps in. Data observability assumes we don’t know what all the
questions are to ask, and instead gives us visibility of the things that really
matter so that when something does go wrong we can investigate and
resolve it quickly.
Our approach to observability at Snowplow
What Ralph and the rest of the business really care about is whether they can
trust the data. Is it complete and is it fresh? Crucially, data observability
should align technical metrics with business outcomes so that business and
engineering teams are talking the same language and moving in the same
direction.
Our approach to observability at Snowplow is to focus on two key metrics:
throughput and latency. These are emitted from each part of the pipeline, so
if a bottleneck occurs at any point it is far simpler to diagnose, allowing you
to take corrective action immediately.
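To illustrate why per-stage throughput makes bottlenecks simpler to diagnose, a sketch follows: compare each stage's throughput with the stage upstream of it and flag the largest relative drop. The stage names and event rates are invented for the example and do not reflect Snowplow's actual instrumentation.

```python
# Locate a throughput bottleneck by comparing each stage with its
# upstream neighbour. Stage names and event rates are hypothetical.

def find_bottleneck(throughput):
    """throughput: ordered list of (stage_name, events_per_minute),
    upstream first. Returns the stage with the largest relative drop
    versus its upstream neighbour, and the size of that drop."""
    worst_stage, worst_drop = None, 0.0
    for (up_name, up_rate), (down_name, down_rate) in zip(throughput, throughput[1:]):
        drop = (up_rate - down_rate) / up_rate if up_rate else 0.0
        if drop > worst_drop:
            worst_stage, worst_drop = down_name, drop
    return worst_stage, worst_drop

stages = [("collector", 10_000), ("enrich", 9_900), ("loader", 4_000)]
print(find_bottleneck(stages))  # flags 'loader', which drops ~60% of events
```

With latency tracked the same way per stage, the slowest or leakiest component stands out immediately instead of being inferred from a broken report downstream.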
Our plan is to make Snowplow the most observable behavioural data
pipeline available. We have already added observability to our BigQuery
Loader, and we’ll soon be launching it within our RDB Loader and Enrich
assets too.
CHAPTER 5
HOW DATA
STORYTELLING CAN
MAKE YOUR INSIGHTS
MORE EFFECTIVE
So, you have done it. Your trackers are in place, your data is clean, modeled
and easily available. All the information you need is at your fingertips.
But what about the business side of your company? What about your
colleagues in different departments and teams, who may be completely
unaware of where this data comes from and what it means? Ideally they
would be making data-driven decisions as well, with a solid understanding
of where the numbers you’re producing come from.
This is where good data storytelling comes in.
What is data storytelling?
Storytelling is one of the oldest forms of education. Humans
struggle to process too much complex information, but are great
at remembering and retelling stories.
Data storytelling is the practice of transforming data into easily
understandable insights. You can have the most advanced technologies at
your disposal, but those won’t provide any value until you can tell the story
behind the numbers you’re producing.
Having worked with many online retailers, especially those coming from a
more traditional brick-and-mortar background, I can tell you that most raw
numbers don’t mean anything to a lot of people. Concepts like conversion
rates and marketing attribution are hard to measure in a regular store, and
even harder to grasp for those unfamiliar with online spaces.
But the stories behind the numbers are absolutely universal.
For example, a 30% increase in conversion rates might sound good, but on its
own, it’s a useless number. Perhaps the number of purchases went down
rapidly and only your most loyal customers kept ordering, or maybe you had
a very successful marketing attribution project which increased your
conversion rate beyond the 15% you had hoped for. Both stories could come
from the same number, but they provide very different insights depending on
who you’re telling them to.
Storytelling is powerful. Even someone who has only worked in a physical
store will recognize the effect of customers intrigued by a new display or
good demo, as well as the strength of word-of-mouth advertising, whether
that is done through conversation or a social media share.
This practice extends to all forms of showing and explaining data, whether
you are presenting at a meeting, building dashboards or writing guides.
Whatever form the information is presented in, it will always benefit from a
narrative that takes the consumer on a journey from data to business
outcomes.
Why should I do it?
As mentioned earlier, humans remember and retell stories far better than
raw facts, and this is exactly what you want to happen with your data. You
want people to remember the important information, act on it and draw the
right conclusions that will help them in their roles. The business-critical
choices people make need to be rooted in the data you provide them with.
And this can only happen if they remember and understand what they have
been told.
Beyond that, stories can be retold. New coworkers, other teams and
customers can all be provided with this information by anyone who can retell
the story. And if that happens, it can actually relieve the data team of
repetitive work, so they can focus on more exciting things like enhancing the
data or driving powerful data use cases.
Lastly, providing data as a story means that data literacy is no longer
expected or required from every team. Teams which are affected by the data,
but not directly involved in collecting or consuming it can still benefit from a
strong narrative. This way they do not have to invest in additional resources
to build understanding, and can instead focus on their everyday work.
How do I tell a good (data) story?
Creating a good data story relies on three key aspects:
1 Know your audience
2 Build a compelling narrative
3 Make use of clear visuals
As with any presentation, knowing your audience is the best way to
communicate effectively. If you’re telling your story to a board of directors it
will look totally different from telling it to a team of customer service reps,
even if both stories rely on the same data set. So before you dive into creating
your narrative, it’s vital you empathize with the audience whom you’re
presenting to.
Ask yourself who they are, what knowledge they already have about the
subject and why they would be interested in what you’re about to tell them.
Emotions are strong drivers in humans, so consider which emotions you
want to play on. Are the numbers you’re showing something to be
celebrated, or are you spinning a cautionary tale?
With that in mind, also consider the context in which you’re telling your story.
Going back to the earlier example about conversion, just highlighting a
number or metric won’t really tell anyone anything.
Think about what background information your audience needs to
appreciate what you are telling them. Any good storyteller knows that JRR
Tolkien’s The Hobbit is a very entertaining story, but everything that happens
becomes much more powerful if you have any notion of the plot in the Lord
of the Rings. Obviously the suggestion is not to write an extensive trilogy, but
do remember to properly set the scene. The right context will set your
audience up for success.
Once you know your audience and you have a good understanding of their
needs you can focus on building a robust narrative. Like any good story there
will be a beginning, middle and end. In the beginning you want to set the
scene and make sure the audience has the right context. Give them the
information they need to understand the meat of the story.
Context is key
After an introduction we come to the core of the story. This is where the data
is revealed, which is another part worth thinking through in advance. Make
sure to reveal the data in an understandable order that the audience can
follow.
Your audience has to understand where something comes from, as well as
understand what all of this information is leading up to. Few people will
remember numbers out of context, but they will have a much better time
remembering them if they have the right context. And if they’re led through a
journey with a logical reveal at the end, they will be much more engaged with
the content as well.
Timing is also crucial. Give your audience the time to process the data and
information you’re sharing. Humans need time to process numbers, visuals and
complex information. Don’t bombard them with number after number. Give the
data context and allow people time to reach their own conclusions as well.
Wrap it up with the important parts
Finally, at the end, wrap everything up. The data has been given proper
context and has been revealed, and now you drive home the insights and
take-home messages. Keep this simple. The best endings are short, sweet,
and easy to remember.
So with the story planned out, the next step is to think about your visuals.
Visuals are a great way to support learning, remembering and understanding.
An important thing to remember is that visuals should add to your story, not
distract from it. Keep them clean and simple. Too many colors, details and
moving parts will only serve as a distraction. While your audience is trying to
figure out what they are looking at, they are not listening to your story and
therefore missing important context or information. So illustrate the
important parts and leave out anything else.
So when you know your audience, your narrative and your visuals you can
tell a data story that captivates.
By doing this you create the opportunity for anyone listening to learn, to
remember and to act on the important information you have to share.
People remember stories, not raw numbers. And the rewards of telling a
good story last a long time: it forms strong relationships, and others can
continue to share your story, giving you and your team more time to focus
on the next project.
snowplowanalytics.com