NC Feb-Mar 2023
FEATURE: DATA PERSPECTIVES
petabytes of data globally within an enterprise, and for most, the challenge can seem daunting. Successful enterprises can navigate these requirements, but doing so can be costly.
Businesses in regulated industries must prove not only that they can securely store certain types of sensitive data, but also that, when permitted, the data no longer exists anywhere. If data that has supposedly been deleted still turns up in any form or location, it can still be recalled, which can expose the business to litigation. It does not matter whether the enterprise is aware of the rogue data's existence; it is still liable. The Data Life Cycle Management process is the sequence of the creation, use, retention, and eventual erasure of data. In some industries, this life cycle can span decades or more.
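The lifecycle sequence described above can be sketched as an ordered enumeration. This is purely illustrative; the class and stage names below are not a standard taxonomy, just one way to model the stages the article names.

```python
from enum import Enum

class DataLifecycle(Enum):
    """Stages of the data lifecycle, in the order the article gives them."""
    CREATE = 1
    USE = 2
    RETAIN = 3
    ERASE = 4

# Enum members iterate in definition order, matching the lifecycle sequence.
stages = [stage.name for stage in DataLifecycle]
print(" -> ".join(stages))  # CREATE -> USE -> RETAIN -> ERASE
```

In regulated industries, the RETAIN stage alone may span decades before ERASE is permitted.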
The cost and burden of managing data, tracking its movements, replication, and locations can place a massive strain on an organisation's ability to conduct its core business. The difficulty lies in the "use" phase of the data's life cycle: how do businesses make data useful, accessible, and analysable across a vast web of multinational regulations without losing track of it? The answer is perhaps simpler than expected: leave the data in place, where it is safe and controllable. Leave the original as the original.
Although analysing data-in-place sounds like an easy solution, this is not the first time the approach has been tried. Network latency comes into play: it is not sufficient simply to access the original data, wherever it persists, from anywhere. The race between network latency and data size has been a back-and-forth struggle throughout the history of computer networking. Even as the world gets digitally smaller, network latencies can make data seem too far away to be analysed efficiently by the high-performance analytical database engines already on the market.
WHAT IS THE SOLUTION TO NETWORK LATENCY?
There are three primary types of network latency: latency caused by distance, latency caused by congestion, and latency caused by the network design itself, whether intentional or accidental. Combinations of these latency types in the same network make the issue much worse, but any one of the three can make analytic access to data too slow to be useful, reducing the usable throughput needed to gain insight from critical data to intolerable levels.
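A quick sketch shows why latency, not just bandwidth, caps usable throughput. A TCP-style transfer can have at most one window of data in flight per round trip, so a single stream's throughput is bounded by window size divided by round-trip time, regardless of how fast the link is. The window size and round-trip times below are illustrative figures, not from the article.

```python
def max_throughput_mbps(window_bytes: float, rtt_ms: float) -> float:
    """Upper bound on single-stream throughput: one window per round trip."""
    bits_in_flight = window_bytes * 8
    rtt_seconds = rtt_ms / 1000
    return bits_in_flight / rtt_seconds / 1e6

# A 64 KiB window across a 1 ms LAN round trip:
lan = max_throughput_mbps(64 * 1024, 1.0)    # roughly 524 Mbps
# The same window across a 100 ms intercontinental WAN round trip:
wan = max_throughput_mbps(64 * 1024, 100.0)  # roughly 5.2 Mbps
print(f"LAN bound: {lan:.0f} Mbps, WAN bound: {wan:.1f} Mbps")
```

With a 100x longer round trip, the same connection delivers 100x less throughput, which is why distance-induced latency alone can make a fast WAN link feel unusable for analytics.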
The instinctive solution is to place the data near the processing engines where it is needed, copying it to local storage to give it local-performance access. However, this creates a whole new set of issues. Keeping track of where all of these copies are located, and removing them from every location once the data is no longer in use, can be difficult and costly. That includes tracking down local backups, off-site media copies, and disaster recovery replicas held in those remote locations.
The simplest and most practical solution is to leave the original data in place. This is possible today with a combination of technologies already on the market. When businesses choose the right combination, they can optimise latencies in Wide Area Networks (WANs) and potentially increase throughput more than sevenfold compared with the same WAN alone. Businesses could use as much as 95 per cent of the WAN connection to analyse data where it is stored, rather than copying and staging it closer to their analytic engines. That is global analytics with data-in-place, at scale.
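As a back-of-the-envelope reading of those figures: the 95 per cent utilisation and the sevenfold improvement come from the text above, while the 1 Gbps link speed is a hypothetical example chosen only to make the arithmetic concrete.

```python
# Hypothetical WAN link speed; the 0.95 and 7x figures are the article's claims.
link_gbps = 1.0

optimised_gbps = 0.95 * link_gbps   # up to 95% of the link carries useful data
baseline_gbps = optimised_gbps / 7  # the plain WAN delivers over 7x less

print(f"optimised: {optimised_gbps:.2f} Gbps, plain WAN: ~{baseline_gbps:.2f} Gbps")
```

Taken together, the two claims imply that an unoptimised WAN would be delivering only around 13 to 14 per cent utilisation on the same link.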
This frees IT teams to solve bigger problems instead of tracking where sensitive data is being copied; they can manage and control data where they need to. It can also reduce the regulatory burden, since some regulations permit the transient, in-flight use of data while prohibiting its persistence in other countries.
The cost savings and reduced management overhead could also play a role in planning data access methods. Combining the right technologies suits on-premises, private, hybrid, public, and multi-cloud environments, where long network latencies might otherwise prevent enterprises from fully leveraging access to their sensitive data. NC