
Chapter 12

In a TFX pipeline, a unit of data, called an artifact, is passed between components. A component typically consumes one or more input artifacts and produces one or more output artifacts. Every artifact has associated metadata that defines its type and properties. The artifact type defines the ontology of artifacts in the entire TFX system, while the artifact properties specify the ontology specific to an artifact type. Users have the option to extend the ontology globally or locally.

TFX pipeline components

The following diagram shows the flow of data between different TFX components:

Flow of data between TFX components

All the images in the TFX section have been adapted from the TensorFlow Extended official guide: https://www.tensorflow.org/tfx/guide.

The pipeline begins with ExampleGen, which ingests the input data and can also split it (for example, into training and evaluation sets). The data then flows to StatisticsGen, which calculates statistics over the dataset. Next comes SchemaGen, which examines the statistics and creates a data schema; then ExampleValidator, which looks for anomalies and missing values in the data; and Transform, which performs feature engineering on the dataset. The transformed dataset is then fed to the Trainer, which trains the model. The model's performance is evaluated using Evaluator and ModelValidator (in newer TFX releases, ModelValidator is deprecated and its validation role is handled by Evaluator). Finally, if the model passes validation, Pusher deploys it to the serving infrastructure.
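To make the artifact-passing model and the component order described above more concrete, here is a minimal plain-Python sketch. It is a toy, not TFX code: `Artifact`, `Component`, `run_pipeline` and the component functions are hypothetical names, the data is a made-up linear dataset, and every step is drastically simplified (the "Trainer" fits a one-variable least-squares line, and the Evaluator also decides whether to bless the model, so there is no separate ModelValidator step). What it does show is that each component declares the artifact types it consumes and produces, and that the runner checks those types and records lineage as data moves from ExampleGen to Pusher.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Toy TFX-style pipeline; Artifact and Component are hypothetical, not tfx classes.

@dataclass
class Artifact:
    type_name: str                                   # artifact type, e.g. "Examples"
    properties: Dict[str, Any] = field(default_factory=dict)  # type-specific metadata
    payload: Any = None                              # the data itself

@dataclass
class Component:
    name: str
    inputs: List[str]                                # artifact types consumed
    output: str                                      # artifact type produced
    fn: Callable[[Dict[str, Artifact]], Artifact]

def run_pipeline(components: List[Component]) -> Tuple[Dict[str, Artifact], List[str]]:
    store: Dict[str, Artifact] = {}                  # latest artifact of each type
    lineage: List[str] = []                          # which component produced what
    for c in components:
        missing = [t for t in c.inputs if t not in store]
        if missing:
            raise ValueError(f"{c.name} is missing input artifacts: {missing}")
        out = c.fn({t: store[t] for t in c.inputs})
        if out.type_name != c.output:
            raise TypeError(f"{c.name} produced {out.type_name}, expected {c.output}")
        store[out.type_name] = out
        lineage.append(f"{c.name}: {c.inputs} -> {c.output}")
    return store, lineage

ROWS = [{"x": float(i), "y": 2.0 * i + 1.0} for i in range(12)]  # made-up data, y = 2x + 1

def example_gen(_):
    # Ingest the rows and split them 2:1 into train and eval.
    split = {"train": [r for i, r in enumerate(ROWS) if i % 3 != 2],
             "eval": [r for i, r in enumerate(ROWS) if i % 3 == 2]}
    return Artifact("Examples", {"split_names": list(split)}, split)

def statistics_gen(a):
    # Per-feature (min, max) over the train split.
    train = a["Examples"].payload["train"]
    stats = {k: (min(r[k] for r in train), max(r[k] for r in train)) for k in train[0]}
    return Artifact("ExampleStatistics", {"split": "train"}, stats)

def schema_gen(a):
    # Infer a schema: every feature seen in the statistics must be a float.
    return Artifact("Schema", {}, {k: float for k in a["ExampleStatistics"].payload})

def example_validator(a):
    # Flag (split, row, feature) where a schema feature is missing or mistyped.
    schema = a["Schema"].payload
    bad = [(s, i, k) for s, rows in a["Examples"].payload.items()
           for i, r in enumerate(rows)
           for k, t in schema.items() if not isinstance(r.get(k), t)]
    return Artifact("ExampleAnomalies", {"count": len(bad)}, bad)

def transform(a):
    # Feature engineering: min-max scale x using the range seen in train.
    train = a["Examples"].payload["train"]
    lo, hi = min(r["x"] for r in train), max(r["x"] for r in train)
    out = {s: [dict(r, x_scaled=(r["x"] - lo) / (hi - lo)) for r in rows]
           for s, rows in a["Examples"].payload.items()}
    return Artifact("TransformedExamples", {"x_range": (lo, hi)}, out)

def trainer(a):
    # Fit y = w * x_scaled + b by ordinary least squares on the train split.
    rows = a["TransformedExamples"].payload["train"]
    mx = sum(r["x_scaled"] for r in rows) / len(rows)
    my = sum(r["y"] for r in rows) / len(rows)
    w = (sum((r["x_scaled"] - mx) * (r["y"] - my) for r in rows)
         / sum((r["x_scaled"] - mx) ** 2 for r in rows))
    return Artifact("Model", {"algorithm": "least_squares"}, (w, my - w * mx))

def evaluator(a):
    # Compute eval MSE and bless the model if it beats a hypothetical threshold.
    w, b = a["Model"].payload
    rows = a["TransformedExamples"].payload["eval"]
    mse = sum((w * r["x_scaled"] + b - r["y"]) ** 2 for r in rows) / len(rows)
    return Artifact("ModelBlessing", {"mse": mse, "blessed": mse < 1e-6}, mse < 1e-6)

def pusher(a):
    # "Deploy" (here: export the parameters) only if the model was blessed.
    ok = a["ModelBlessing"].payload
    return Artifact("PushedModel", {"pushed": ok}, a["Model"].payload if ok else None)

PIPELINE = [
    Component("ExampleGen", [], "Examples", example_gen),
    Component("StatisticsGen", ["Examples"], "ExampleStatistics", statistics_gen),
    Component("SchemaGen", ["ExampleStatistics"], "Schema", schema_gen),
    Component("ExampleValidator", ["Examples", "Schema"], "ExampleAnomalies", example_validator),
    Component("Transform", ["Examples"], "TransformedExamples", transform),
    Component("Trainer", ["TransformedExamples"], "Model", trainer),
    Component("Evaluator", ["Model", "TransformedExamples"], "ModelBlessing", evaluator),
    Component("Pusher", ["Model", "ModelBlessing"], "PushedModel", pusher),
]

store, lineage = run_pipeline(PIPELINE)
print("\n".join(lineage))
```

Running it prints one lineage line per component, starting with `ExampleGen: [] -> Examples`. Removing a component whose output a later step needs (for example, dropping Transform) makes `run_pipeline` raise a `ValueError`, which mirrors how typed artifacts define the dependencies between components. In real TFX, an orchestrator rather than a simple loop schedules the components, and artifact metadata and lineage are recorded in ML Metadata rather than in a Python list.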

TensorFlow and Cloud

TFX libraries

TFX provides several Python packages that are used to create pipeline components. Quoting from the TensorFlow Extended User Guide (https://www.tensorflow.org/tfx/guide): "These packages are the libraries which you will use to create the components of your pipelines so that your code can focus on the unique aspects of your pipeline."

The libraries included in TFX are:

• TensorFlow Data Validation (TFDV) is a library for analyzing and validating machine learning data
• TensorFlow Transform (TFT) is a library for preprocessing data with TensorFlow
• TensorFlow is used for training models with TFX
• TensorFlow Model Analysis (TFMA) is a library for evaluating TensorFlow models
• TensorFlow Metadata (TFMD) provides standard representations for metadata that are useful when training machine learning models with TensorFlow
• ML Metadata (MLMD) is a library for recording and retrieving metadata associated with the workflows of ML developers and data scientists

The following diagram shows the relationships between the TFX libraries and the pipeline components:

Figure 7: Relationships between TFX libraries and pipeline components, visualized
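One way to read the relationships between libraries and components is as a small lookup table. The sketch below is an illustrative simplification consistent with the TFX guide's component descriptions (StatisticsGen, SchemaGen and ExampleValidator build on TFDV, Transform on TFT, Trainer on TensorFlow, Evaluator on TFMA). The names `COMPONENT_LIBRARY`, `SHARED_LIBRARIES`, `components_using` and `by_library` are hypothetical, and TFMD and MLMD are kept in a separate list because they act as shared infrastructure rather than backing a single component.

```python
from collections import defaultdict
from typing import Dict, List

# Simplified, illustrative mapping: the TFX library each component mainly builds on.
# Only components that map onto one of the libraries listed above are included.
COMPONENT_LIBRARY: Dict[str, str] = {
    "StatisticsGen": "TFDV",
    "SchemaGen": "TFDV",
    "ExampleValidator": "TFDV",
    "Transform": "TFT",
    "Trainer": "TensorFlow",
    "Evaluator": "TFMA",
}

# Libraries that act as shared infrastructure rather than backing one component:
# TFMD supplies standard metadata representations (such as the schema), and
# MLMD records the metadata of pipeline runs.
SHARED_LIBRARIES: List[str] = ["TFMD", "MLMD"]

def components_using(library: str) -> List[str]:
    """Return the components that mainly build on `library`, in pipeline order."""
    return [comp for comp, lib in COMPONENT_LIBRARY.items() if lib == library]

def by_library() -> Dict[str, List[str]]:
    """Invert the mapping: library name -> list of components built on it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for comp, lib in COMPONENT_LIBRARY.items():
        grouped[lib].append(comp)
    return dict(grouped)

print(by_library())
```

Because Python dictionaries preserve insertion order, `components_using("TFDV")` returns the three data-validation components in the order they run in the pipeline.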
