Transcoding 101 (Tutorial) - PBS
<strong>Transcoding</strong> <strong>101</strong>
Introductions<br />
• Who is Rhozet<br />
– Spun out from Canopus in 2004<br />
– Maker of ProCoder and Carbon Coder<br />
– Provides transcoding for Yahoo!, Amazon, Microsoft, Hulu, CBS,<br />
NBC, Turner, BBC, Fox, Discovery, Lifetime, etc.<br />
– Acquired by Harmonic in 2007<br />
• Who is Harmonic<br />
– Leading equipment provider to cable, satellite, and IPTV networks<br />
– Publicly traded (NASDAQ:HLIT)<br />
– 690 people, $365M in revenue
This <strong>Transcoding</strong> Thing…<br />
• The process of converting one format to another<br />
– Facilitates moving media across production, post-production,<br />
archival, and delivery ecosystems<br />
– Acts as the “glue” between different manufacturers<br />
– Provides future proofing<br />
• Allows repurposing and monetization of assets<br />
– Every destination viewer has different requirements<br />
– Allows for the automated creation of custom assets<br />
(commercials, promos, logos, etc.)<br />
• Enables advanced workflows<br />
– Sending a file to 10 people who all see it differently depending<br />
on their needs (preview, timecode, language…)<br />
– Database integration<br />
• An engine that can be integrated into various applications<br />
and devices
<strong>Transcoding</strong> Terminology<br />
• Codecs<br />
• Profiles<br />
• Containers<br />
• Formats<br />
• Platforms
Codecs<br />
• Codec = Compressor/Decompressor<br />
– The software or hardware engine that moves uncompressed frames<br />
into the compressed domain (and vice versa)<br />
• Typically “lossy”<br />
– Reduction in information at each encode<br />
• Typically asymmetrical<br />
– Decompression is often 10x (or more) faster than compression<br />
• Techniques<br />
– Subsampling<br />
• 4:2:0 vs. 4:2:2 or 4:4:4<br />
• 8 bit vs. 10 bit color resolution<br />
– Transformation and simplification<br />
• Discard high-frequency changes in color<br />
• DCT (MPEG-2)<br />
• Wavelet (JPEG-2000)<br />
• Intra-frame = within a single frame<br />
– Motion analysis and estimation<br />
• Most video frames are similar to the ones around them<br />
• Inter-frame = between multiple frames
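To make the subsampling and bit-depth trade-off concrete, here is a minimal sketch that counts the bits in one uncompressed frame. The function name and the table of samples per pixel are illustrative assumptions, not part of any codec API; the ratios follow from the usual meaning of 4:4:4, 4:2:2, and 4:2:0.

```python
# Average samples per pixel: one luma sample plus two chroma samples
# stored at full, half, or quarter resolution.
SAMPLES_PER_PIXEL = {
    "4:4:4": 3.0,  # chroma at full resolution
    "4:2:2": 2.0,  # chroma halved horizontally
    "4:2:0": 1.5,  # chroma halved horizontally and vertically
}


def uncompressed_frame_bits(width, height, subsampling, bit_depth):
    """Return the number of bits in one uncompressed frame (illustrative helper)."""
    return int(width * height * SAMPLES_PER_PIXEL[subsampling] * bit_depth)


if __name__ == "__main__":
    for scheme in ("4:4:4", "4:2:2", "4:2:0"):
        for depth in (8, 10):
            bits = uncompressed_frame_bits(720, 480, scheme, depth)
            print(f"{scheme} at {depth} bit: {bits / 8 / 1024:.0f} KiB per frame")
```

Under these assumptions, 4:2:0 carries half the samples of 4:4:4 before any transform or motion estimation is applied.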
Codecs (cont.)<br />
• GOP – Group of Pictures<br />
– Set of frames (max 15 for MPEG-2)<br />
– All P and B references to frames are within the GOP<br />
• I: intra-frame<br />
– Only data within the frame is used<br />
• P: predicted frame<br />
– Variation from previous I or P frame<br />
– ~twice the compression of I frame<br />
• B: bi-directionally predicted frame<br />
– Variation from previous or future frames<br />
– ~twice the compression of P frame<br />
I P B B P B B P B B
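Taking the rules of thumb above at face value (a P frame is about half the size of an I frame, a B frame about half the size of a P frame), the size of a GOP can be estimated relative to intra-only coding. This is only a back-of-the-envelope sketch; real frame sizes depend heavily on content and encoder settings, and the names below are made up for illustration.

```python
# Relative frame sizes taken from the rule of thumb on the slide:
# P is roughly half of I, B is roughly half of P.
RELATIVE_SIZE = {"I": 1.0, "P": 0.5, "B": 0.25}


def gop_relative_size(pattern):
    """Sum the relative sizes of the frames in a GOP pattern such as 'IPBB'."""
    return sum(RELATIVE_SIZE[frame] for frame in pattern)


def savings_vs_intra_only(pattern):
    """Fraction of data saved compared with coding every frame as an I frame."""
    return 1 - gop_relative_size(pattern) / len(pattern)


if __name__ == "__main__":
    pattern = "IPBBPBBPBB"  # the example GOP shown above
    print(gop_relative_size(pattern))       # size in "I-frame units"
    print(savings_vs_intra_only(pattern))   # fraction saved vs. 10 I frames
```

For the ten-frame pattern on the slide, this estimate comes out at 4 I-frame units instead of 10.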
Codecs (cont.)<br />
• Codecs using intra-frame compression<br />
– DV, MPEG-2 (IMX), AVC-Intra, JPEG-2000, DNxHD, etc.<br />
– Typically acquisition and editing formats<br />
• Codecs using inter-frame compression<br />
– H.264, MPEG-2 LongGOP, WMV, VP6, etc.<br />
– Typically distribution formats<br />
• Standards like H.264 only specify how to decode the<br />
content<br />
– Allows for compatibility while leaving room for innovation in<br />
compression techniques
Profiles (and Levels)<br />
• A “Profile” defines a specific type of compression for a particular<br />
codec<br />
– Defines the syntax that is supported<br />
– The decoder must match the encoder’s profile support<br />
– A codec vendor does not have to support all possible profiles<br />
– A “Level” defines maximum resolution and data rate<br />
• H.264 Examples<br />
– Baseline Profile (BP): limited computing power required for decode<br />
– High Profile (HiP): primary profile for broadcast and Blu-ray<br />
– High 4:2:2 Profile (Hi422P): 4:2:2 chroma<br />
• MPEG-2 Examples<br />
– Main Profile@Main Level (MP@ML): standard def at max 15Mb/s<br />
– Main Profile@High Level (MP@HL): up to HD at 80Mb/s<br />
– 4:2:2 Profile@High Level (422P@HL): supports 4:2:2 chroma
Containers<br />
• AKA “wrappers”<br />
• A container can contain multiple types of codecs<br />
• A container can contain more than just video<br />
– Animation, music, speech, text, subtitles, etc.<br />
• A container is used to identify, interleave, and synchronize<br />
the various components<br />
– Critically important for successful playback<br />
– Most of the idiosyncrasies of a particular device or distribution<br />
medium are expressed in the container specifications<br />
– Single biggest source of incompatibility is in containers rather<br />
than codecs<br />
• Example containers<br />
– QuickTime, AVI, ASF, WMV, MXF, M2TS, M2PS, MP4, VOB,<br />
LXF, GXF, WAV, 3GPP
Formats<br />
• The combination of a container and a specified set of<br />
codecs (essence) and metadata<br />
– Example: M2TS with H.264 (HP) video and MPEG-1 Layer 2<br />
audio<br />
• In more detail includes parameters<br />
– M2TS<br />
– H.264 video<br />
• 720x480, 29.97fps, upper field first<br />
• CBR, 3 Mbps data rate<br />
• High profile, 3.2 Level, ATSC closed-captioning<br />
• …and about 50 other parameters<br />
– MPEG-1 Layer 2 audio<br />
• Stereo, 16 bits per sample, 48 kHz sample rate<br />
• 128 Kbps data rate
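One way to picture a format is as a small nested record: a container plus the parameters of each essence track. The sketch below models the M2TS example above with hypothetical class and field names (they do not come from any transcoder's API) and estimates the payload size, ignoring container overhead.

```python
from dataclasses import dataclass


@dataclass
class VideoSpec:
    codec: str
    width: int
    height: int
    fps: float
    profile: str
    level: str
    bitrate_kbps: int


@dataclass
class AudioSpec:
    codec: str
    channels: int
    sample_rate_hz: int
    bitrate_kbps: int


@dataclass
class Format:
    container: str
    video: VideoSpec
    audio: AudioSpec

    def total_kbps(self):
        """Combined essence bitrate; container overhead is not modelled."""
        return self.video.bitrate_kbps + self.audio.bitrate_kbps

    def approx_size_mb(self, seconds):
        """Approximate payload size in megabytes (8000 kilobits per MB)."""
        return self.total_kbps() * seconds / 8000


# The example format from the slide, with only a handful of its parameters.
EXAMPLE = Format(
    container="M2TS",
    video=VideoSpec("H.264", 720, 480, 29.97, "High", "3.2", 3000),
    audio=AudioSpec("MPEG-1 Layer 2", 2, 48000, 128),
)

if __name__ == "__main__":
    print(EXAMPLE.total_kbps(), "kbps")
    print(EXAMPLE.approx_size_mb(60), "MB per minute")
```

A real format definition would carry the remaining parameters (field order, rate control mode, closed captioning, and so on) in the same way.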
Platforms<br />
• The device on which a particular format will be played back,<br />
archived, edited, etc.<br />
• Formats can be platform dependent or independent<br />
– MPEG-1 is platform independent<br />
– Flash and WMV are platform dependent (Flash Media Player and<br />
Windows Media Player respectively)<br />
• Just to make things confusing, codecs, containers, formats,<br />
and platforms can all be named similarly<br />
– For example, MPEG-2 is both a codec and a container
The <strong>Transcoding</strong> Pipeline<br />
DeMultiplex → Video Decode → Video Transform → Video Encode → Multiplex<br />
DeMultiplex → Audio Decode → Audio Transform → Audio Encode → Multiplex<br />
• Multiplex = “wrapping” in the container<br />
• Transform = scale, frame rate, crop, logos, concatenation, filtering, etc.<br />
• Different transcoders can yield very different results even if they use the same codecs
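The shape of the pipeline can be sketched as two independent chains of stages between a demultiplex step and a multiplex step. Everything here is a toy: the stage functions only record their names, and `make_stage` and `transcode` are hypothetical helpers rather than a real transcoder interface.

```python
def make_stage(name):
    """Build a placeholder stage that only records its name (hypothetical helper)."""
    def run(history):
        # Return a new list so the source stream is never modified in place.
        return history + [name]
    return run


def transcode(source, video_stages, audio_stages, container):
    """Run each elementary stream through its own chain, then re-wrap."""
    # DeMultiplex: split the source into its elementary streams.
    video, audio = source["video"], source["audio"]
    for stage in video_stages:
        video = stage(video)
    for stage in audio_stages:
        audio = stage(audio)
    # Multiplex: wrap both processed streams in the target container.
    return {"container": container, "video": video, "audio": audio}


source = {"container": "M2TS", "video": [], "audio": []}
result = transcode(
    source,
    [make_stage("decode"), make_stage("scale"), make_stage("encode")],
    [make_stage("decode"), make_stage("normalize"), make_stage("encode")],
    "MP4",
)

if __name__ == "__main__":
    print(result)
```

Keeping the video and audio chains separate mirrors the diagram: the transform steps differ per stream, while demultiplexing and multiplexing are shared.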
Transformation In The Transcode<br />
Basic Video Operations<br />
– Frame size conversion<br />
– Frame rate conversion<br />
– Color space conversion<br />
– Aspect ratio conversion<br />
– Interlace/De-interlace conversion<br />
– Telecine / inverse telecine<br />
– PAL/NTSC conversion<br />
– SD/HD conversion<br />
– Cropping<br />
Video Processing<br />
– Fade in/out<br />
– Black/white correction<br />
– Blur<br />
– Color correction<br />
– Gamma correction<br />
– NTSC-safe<br />
– Median<br />
– Rotate<br />
– Sharpen<br />
– Temporal noise reduction<br />
Audio Processing<br />
– Normalize<br />
– Fade In/Out<br />
– Low-pass<br />
– Volume<br />
– Dynamic range compressor<br />
Additional Operations<br />
– Timecode imprint<br />
– Subtitle/CC imprint<br />
– XML controllable titler<br />
– Metadata transport and conversion<br />
– Line 21/CC preservation/conversion<br />
– Quality checking<br />
– Logo insertion<br />
– 601/709 color space support<br />
– Video capture board support<br />
– Multiple simultaneous target outputs<br />
– Edit decision list support<br />
– Remote job submission<br />
– Batch processing<br />
– Watch folder automation<br />
– Segment extraction/insertion<br />
– FTP delivery
The Full <strong>Transcoding</strong> Workflow<br />
Acquire (Capture, FTP, MAM, etc.) → QA → Transcode (Decode, Transform, Encode) → QA → Deliver (DRM, FTP, MAM, etc.)<br />
• QA steps can be automated, manual, or skipped<br />
• <strong>Transcoding</strong> can be controlled manually, with watch folders, or via APIs
Calculating Performance<br />
• Stream-based systems<br />
– Live transformation of video<br />
– Realtime supporting one or more streams<br />
– Typically dedicated hardware<br />
• File-based systems<br />
– Can be slower or faster than real-time<br />
– Typically generic hardware<br />
– Dependency on CPU and network storage performance<br />
– Benchmarks are extremely dependent on target settings<br />
• One pass versus two pass<br />
• Size of GOP<br />
• 70+ parameters in H.264…
Calculating Performance (cont.)<br />
• Estimate number of hours of source content per day<br />
• Perform test encodes from sample source file to approved target<br />
type<br />
– Test on appropriate machine<br />
– Calculate in units of RT (real time)<br />
– 10 minute source transcoding in 5 minutes = 2X RT<br />
– 10 minute source transcoding in 20 minutes = .5X RT<br />
– Check CPU utilization – can do more than one transcode at a time<br />
• Example<br />
– 16 hours per day of incoming source, SD MPEG-2<br />
– Transcode to 4 outputs<br />
• Transcode performance is 1.5RT, 2.2RT, .8RT, and .5RT for the targets<br />
– 16 hrs/1.5 + 16 hrs/2.2 + 16 hrs/.8 + 16 hrs/.5 = 70 hrs transcoding/day<br />
– Each machine does 24 hrs of transcoding/day<br />
– 70/24 ≈ 2.92<br />
– 3 machines to do all the formats
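The sizing arithmetic above can be written as a short calculation. This sketch assumes, as the example does, that each machine runs one transcode at a time for a full 24 hours; the function names are illustrative.

```python
import math


def transcode_hours_per_day(source_hours, rt_factors):
    """Hours of transcoding needed to produce every target from the daily source.

    rt_factors holds the measured speed of each target in units of real time,
    e.g. 2.0 means a 10 minute source transcodes in 5 minutes.
    """
    return sum(source_hours / rt for rt in rt_factors)


def machines_needed(source_hours, rt_factors, hours_per_machine=24):
    """Round the required machine count up to a whole machine."""
    total = transcode_hours_per_day(source_hours, rt_factors)
    return math.ceil(total / hours_per_machine)


if __name__ == "__main__":
    targets = [1.5, 2.2, 0.8, 0.5]  # RT factors from the example
    print(round(transcode_hours_per_day(16, targets), 1), "hours of transcoding per day")
    print(machines_needed(16, targets), "machines")
```

If CPU utilization shows headroom for concurrent transcodes, `hours_per_machine` can be raised accordingly, for example to 48 for two simultaneous jobs.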
The Great Thing About Standards…<br />
H.264, MXF, DPX, Flash, AAC, M2TS, MPEG-2, DPS, WMV, VOB, Dolby, DVCPro100, VC-1, MPEG-4, AVC-Intra, DV50, M2PS, 3GPP, JPEG-2000, DNxHD,<br />
3G2, ASF, F4V, OPAtom, DV25, DVCPro, AVI, HDV, MP4, GXF, OP1a, QuickTime, LXF, WAV, MPEG-1, AC-3, Omneon, DivX, AVCHD
Why Can’t We Just Use One Format<br />
• Specific purposes<br />
– Acquisition/editing (highest quality, generational fidelity, direct<br />
frame access)<br />
– Distribution (bandwidth, acceptable quality)<br />
• Hardware restrictions<br />
– Set top boxes<br />
– Cable bandwidth<br />
– Mobile phone processing power<br />
• Money<br />
– Manufacturer “lock-in”<br />
– Platform ownership<br />
– Royalties<br />
– Rhozet needs your business
What’s Next<br />
• Reduction in bitrate for the same quality<br />
• Will there be a codec twice as “good” as H.264?<br />
Ghanbari et al, University of Sussex
JPEG-2000 vs. I-frame H.264<br />
• Mathematical “quality”<br />
– PSNR – Peak Signal-to-Noise Ratio<br />
– Comparing compressed image with pristine source<br />
– H.264 is slightly better at all bitrates<br />
• Blurring<br />
– H.264 better at all bitrates<br />
• Blocking<br />
– JPEG-2000 better at low bit rates since it applies its transform to the<br />
entire image rather than to macroblocks<br />
• At high bit rates, these two formats are indistinguishable<br />
– For archival purposes, consider them identical<br />
– Principal deciding factor is wrappers and workflow
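For reference, PSNR compares a compressed image with its pristine source through the mean squared error: PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible sample value. A minimal sketch for 8-bit samples held in flat lists (the function name and input layout are assumptions for illustration):

```python
import math


def psnr(reference, compressed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if not reference or len(reference) != len(compressed):
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error between the pristine source and the compressed image.
    mse = sum((r - c) ** 2 for r, c in zip(reference, compressed)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    source = [100, 100, 100, 100]
    decoded = [101, 99, 100, 102]
    print(f"{psnr(source, decoded):.2f} dB")
```

Higher values mean the compressed image is numerically closer to the source; PSNR does not by itself capture blurring or blocking, which is why those are assessed separately.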
Place Your Bets…<br />
• Acquisition<br />
– H.264 (AVC-Intra)<br />
• Television<br />
– H.264 in M2TS<br />
• Web<br />
– H.264 in MP4/F4V<br />
– WMV/VC-1 in ASF<br />
• Mobile<br />
– H.264 in MP4, 3GPP<br />
• Archiving<br />
– Whatever you acquired in<br />
• <strong>Transcoding</strong><br />
– Even if everything is in H.264 you will still be transcoding
Beyond <strong>Transcoding</strong><br />
• Watermarking & Fingerprinting<br />
• DRM<br />
• Smooth Streaming<br />
• Royalties<br />
• ROI
Watermarking and Fingerprinting<br />
• Watermarking<br />
– Invisible information embedded into image data, typically<br />
embedding data in color frequency information<br />
– Can be used to track individual assets<br />
– Philips/Teletrax (now Civolution), Thomson NexGuard, Dolby<br />
Cinea, etc.<br />
– Embedder, investigator, manager, database<br />
– Watermarks can “step” on each other<br />
• Fingerprinting<br />
– No information embedded into file<br />
– Audio and video are sampled to create “fingerprint” that can be<br />
searched and matched against central database<br />
– No tracking ability for individual versions of assets<br />
– Civolution, Vobile, Audible Magic, YouTube, etc.
Digital Rights Management (DRM)<br />
• Technology to restrict unauthorized use or distribution of<br />
content<br />
– Most computer-based systems use a combination of a<br />
license server and public key encryption which ties specific<br />
content to a specific machine or device<br />
– Reduces but does not eliminate piracy<br />
– All broadly deployed technologies have been beaten<br />
• Popular Systems<br />
– FairPlay (Apple, iTunes and iPod specific)<br />
– Windows Media DRM (Microsoft, Windows or Silverlight)<br />
– Flash DRM (Adobe, Flash specific)<br />
– MagicGate (Sony, PSP and MemoryStick specific)
Emerging Streaming Technologies<br />
• Adobe Dynamic Streaming<br />
– Available in new Flash Media Server<br />
– True streaming protocol RTMP (Adobe created)<br />
– Switches between multiple files encoded at different bitrates<br />
• Microsoft Smooth Streaming<br />
– Not really “streaming”, but rather smart download<br />
– Uses HTTP rather than RTSP<br />
– Encodes video at different bitrates in many small “chunks”<br />
– Player requests “chunks” of different bitrates depending on<br />
available connection speed<br />
– Requires Silverlight player and Microsoft IIS server
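The chunk-based approach can be illustrated with a simple client-side rule: before requesting the next chunk, pick the highest encoded bitrate that fits within the measured connection speed. This is a generic sketch, not Microsoft's actual heuristic; the bitrate ladder, the safety factor, and the function name are all assumptions.

```python
def pick_bitrate(available_kbps, measured_kbps, safety=0.8):
    """Choose the bitrate for the next chunk request.

    available_kbps: bitrates the content was encoded at (hypothetical ladder).
    measured_kbps:  recently observed download throughput.
    safety:         fraction of the throughput we are willing to use.
    """
    budget = measured_kbps * safety
    candidates = [b for b in sorted(available_kbps) if b <= budget]
    # Fall back to the lowest bitrate when even that exceeds the budget.
    return candidates[-1] if candidates else min(available_kbps)


if __name__ == "__main__":
    ladder = [350, 700, 1500, 3000]
    for throughput in (300, 2000, 10000):
        print(throughput, "kbps measured ->", pick_bitrate(ladder, throughput), "kbps chunk")
```

Because every chunk is an ordinary HTTP download, the player can change its choice at each chunk boundary as the connection speed changes.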
H.264 Royalties<br />
• H.264 patent pool administered by MPEG-LA<br />
– Four categories: title-by-title, subscription, free TV, free internet<br />
• Title-by-title (includes VOD and Disc)<br />
– No royalty for 1M<br />
• Free TV<br />
– $2,500 one-time per AVC transmission encoder or…<br />
– $2,500 annual per Broadcast Market of 100K to 500K, $5K for 500K to 1M,<br />
$10K > 1M<br />
• Free Internet<br />
– No royalty before 2011<br />
– After that, no more than for “economic equivalent” of free television<br />
– Unclear exactly how that applies
• You have questions…<br />
Q&A
Thank You!<br />
• Demos, white papers, case studies, performance guides,<br />
etc. are all available at www.rhozet.com