MPEG-2 video compression

Esa Eklund
Media Brewery
Telecommunications Software and Multimedia Laboratory
Helsinki University of Technology

 

History of MPEG standard

The Moving Picture Experts Group (MPEG) is a working group under ISO/IEC in charge of the development of international standards for compression, decompression, processing and coded representation of moving pictures, audio and their combination. So far MPEG has produced the following standards:

MPEG-1 was issued in 1992. It was designed for video played mainly from local storage, such as CD-ROMs or hard disks. Its data rate of 1.5 Mbit/s was too high for the network applications of the time.

At the time MPEG-1 was released, work on MPEG-2 was already well under way. MPEG-2 was issued in 1994, and its initial target was a standard for coding of TV pictures at CCIR Rec. 601 resolution at data rates below 10 Mbit/s. In 1992 the scope of MPEG-2 was enlarged to cover coding of HDTV as well. Typical MPEG-2 compression ratios are in the range of 1:30.

The video coding scheme used in MPEG-2 is again generic and similar to that of MPEG-1, but with further refinements and special consideration of interlaced sources. Furthermore, many functionalities such as "scalability" were introduced. In order to keep implementation complexity low for products not requiring the full range of video input formats supported by the standard (from SIF to HDTV resolutions), so-called "Profiles", describing functionalities, and "Levels", describing resolutions, were introduced to provide separate MPEG-2 conformance points.

The MPEG-3 standard was originally planned for HDTV. However, these requirements could be met within the MPEG-2 definition, and MPEG-3 was abandoned.

MPEG-4 standardizes algorithms and tools for coding and flexible representation of audio-visual data to meet the challenges of future multimedia applications and their requirements. The standard is planned to be available by the end of 1998.

 

Other standards for video compression:

QuickTime

Developed by Apple Computer, QuickTime is a cross-platform product. Compression is achieved through codecs such as Cinepak, Indeo and Sorenson. The compression ratio depends on the codec used and may vary from 1:5 up to MPEG level and higher. Version 3 supports multiple video tracks as well as other multimedia components and interactivity. QuickTime VR (QTVR) is also available for 360-degree panorama pictures. QuickTime is widely used in digital video editing and CD-ROM products.

MJPEG

MJPEG (Motion JPEG) consists of JPEG-encoded still frames, so the video is only intra-frame compressed. The MJPEG format is also used in video editing systems.

 

The Basics of Video

Live video consists of still frames produced in a serial manner, generally more than 15 frames per second, to create the illusion of a moving image. Each frame is divided into two fields, the top field consisting of the odd lines of the image and the bottom field of the even lines. A frame is displayed so that the odd lines are drawn on the screen first and then the even lines. This process is called interlacing.
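
The field split described above can be sketched in a few lines of Python (an illustration only, with 1-based line numbering as in the text; not part of any video standard):

```python
# Sketch of interlacing: a frame is a list of scan lines; the top field
# takes lines 1, 3, 5, ... and the bottom field takes lines 2, 4, 6, ...
def split_fields(frame):
    """Return (top_field, bottom_field) for a list of scan lines."""
    top = frame[0::2]     # odd-numbered lines (1-based)
    bottom = frame[1::2]  # even-numbered lines (1-based)
    return top, bottom

frame = [f"line {n}" for n in range(1, 7)]
top, bottom = split_fields(frame)
print(top)     # ['line 1', 'line 3', 'line 5']
print(bottom)  # ['line 2', 'line 4', 'line 6']
```

The display then draws the top field first and the bottom field after it, which is why fast motion produces the familiar "comb" artifacts on interlaced material.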

There are two major standards used worldwide with some variations.

PAL video systems are used in Europe. A PAL field consists of 288 lines with a resolution of 768 pixels per line. The field display rate is 50 Hz, which adds up to 25 frames per second, and the display resolution is therefore 768*576.

NTSC video is used in North America and Japan. An NTSC field consists of 243 lines with a resolution of 720 pixels per line. The field rate is about 60 Hz, which adds up to 29.97 (or 30) frames per second, depending on the implementation. The display resolution is 720*486.
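
The arithmetic above follows directly from the two-fields-per-frame structure; a small sketch (using 59.94 Hz as the exact NTSC field rate, an assumption the text rounds to 60 Hz):

```python
# Two interlaced fields make one frame, so frame rate = field rate / 2,
# and frame height = 2 * field height.
PAL_FIELD_RATE = 50.0    # fields per second
NTSC_FIELD_RATE = 59.94  # fields per second (often rounded to 60)

pal_fps = PAL_FIELD_RATE / 2     # 25.0 frames per second
ntsc_fps = NTSC_FIELD_RATE / 2   # 29.97 frames per second

pal_frame = (768, 288 * 2)       # (width, height) -> 768*576
ntsc_frame = (720, 243 * 2)      # (width, height) -> 720*486
```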

 

In both PAL and NTSC systems the actual video signal is wider, but it is cropped on the video screen. Figure 1 shows how much of the signal is shown in the digital television standard.

Figure 1. PAL and NTSC resolutions according to digital television standard CCIR 601.

 

MPEG compression

Video data is compressed in several phases in the MPEG process. First the RGB video signal is transformed into YUV components, which adapt better to the physiology of the human eye. Then the compression takes place as intra-frame compression and inter-frame compression. Intra-frame compression uses only the information within a frame to compress the picture. Inter-frame compression uses motion prediction and common elements in succeeding frames to compress the video further. Sound is also compressed, using CD-quality sound as the source.

 

Intra-frame compression

A video frame is presented in RGB format, in which there is a separate signal for the red, green and blue components of the image. However, this representation does not consider the physiology of the human eye. The human eye is more sensitive to changes in brightness than in colour. Therefore the first step in the compression is to transform the RGB video signal to YUV format.

The YUV signal presents the video as a luminance (brightness) component and two chrominance components. Since the human eye is more sensitive to changes in luminance than in chrominance, the chrominance components can safely be downsampled without compromising the image quality.
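
The colour transform is a fixed matrix multiplication. The sketch below uses the ITU-R BT.601 luma coefficients in their full-range (JPEG-style) form as an approximation; broadcast MPEG actually scales the result into a limited range with offsets, which is omitted here for clarity:

```python
# Sketch of the R'G'B' -> Y'CbCr transform (BT.601 coefficients,
# full-range approximation; Cb/Cr are offset by 128 to stay non-negative).
def rgb_to_ycbcr(r, g, b):
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

# Mid-grey has no colour difference: both chroma values sit at 128.
print(rgb_to_ycbcr(128, 128, 128))
```

Note how the Y equation weights green most heavily; this is exactly the eye's sensitivity to brightness that motivates the transform.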

MPEG-1 supports 4:2:0 downsampling, in which both the horizontal and vertical chrominance resolution is half of the luminance resolution. Each chrominance component thus has one quarter as many samples as the luminance component, so the U and V components together carry only half as much information as the Y component.

MPEG-2 also supports studio-quality 4:2:2 downsampling, in which only the horizontal chrominance resolution is half of the luminance resolution.
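
The two subsampling modes can be sketched as simple averaging over a chroma plane (plain averaging is an assumption for illustration; real encoders use better decimation filters):

```python
# Sketch of chroma subsampling on a plane given as a list of rows.
def subsample_420(plane):
    """4:2:0 - average each 2x2 block: half resolution in both directions."""
    h, w = len(plane), len(plane[0])
    return [[(plane[y][x] + plane[y][x + 1]
              + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]

def subsample_422(plane):
    """4:2:2 - average horizontal pairs: half horizontal resolution only."""
    return [[(row[x] + row[x + 1]) / 2 for x in range(0, len(row), 2)]
            for row in plane]

chroma = [[0, 2],
          [4, 6]]
print(subsample_420(chroma))  # [[3.0]]         - one sample per 2x2 block
print(subsample_422(chroma))  # [[1.0], [5.0]]  - one sample per row pair
```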

The next step of the compression is to divide the picture into macroblocks of 16*16 pixels, which are further compressed using the Discrete Cosine Transformation (DCT). The DCT is a mathematical operation similar to the Fourier transformation and is applied to 8*8 blocks within each macroblock. The result is an 8*8 matrix of frequency coefficients, with the low-frequency information in the top-left corner of the matrix and the high-frequency information towards the bottom-right corner.
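
A direct (unoptimized) form of the 8*8 forward DCT can be written as follows; real encoders use fast factorized versions, so this O(N^4) sketch is for illustration only:

```python
import math

def dct_8x8(block):
    """Naive forward 2-D DCT (type II) of an 8x8 block of samples."""
    N = 8
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):          # vertical spatial frequency
        for u in range(N):      # horizontal spatial frequency
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            cu = math.sqrt(1.0 / N) if u == 0 else math.sqrt(2.0 / N)
            cv = math.sqrt(1.0 / N) if v == 0 else math.sqrt(2.0 / N)
            out[v][u] = cu * cv * s
    return out

# A flat block concentrates all its energy in the DC (top-left)
# coefficient; every other coefficient is essentially zero.
coeffs = dct_8x8([[4] * 8 for _ in range(8)])
```

This energy concentration is the point of the DCT: natural image blocks vary smoothly, so most of their energy lands in the low-frequency, top-left coefficients.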

The resulting DCT coefficient matrix is quantized, so that a limited number of bits, typically 8, is used to describe the value of each matrix element; MPEG-2 supports up to 11 bits for the quantized values. This is the phase where most of the compression occurs.
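
Quantization divides each coefficient by a step size taken from a quantizer matrix and rounds the result; the step values below are made up for illustration, not taken from the MPEG default matrices:

```python
# Sketch of uniform quantization of DCT coefficients. Larger steps for
# high-frequency coefficients turn most of them into zeros, which is
# where the compression gain comes from.
def quantize(coeffs, qmatrix):
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qmatrix)]

def dequantize(levels, qmatrix):
    """Decoder side: the rounding error is what makes this step lossy."""
    return [[lv * q for lv, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qmatrix)]

print(quantize([[100, 3]], [[16, 16]]))    # [[6, 0]] - the 3 is lost
print(dequantize([[6, 0]], [[16, 16]]))    # [[96, 0]] - close to [[100, 3]]
```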

 

Inter-frame compression

Inter-frame compression exploits the similarities between succeeding frames and the fact that many macroblocks remain unchanged from frame to frame or simply move within the frame.

When a similar macroblock is found in a reference frame, a motion estimation vector is created for it. If the matched block has changed slightly, the differences can be calculated and further compressed using the DCT technique.
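
The search for a similar macroblock can be sketched as exhaustive block matching with a sum-of-absolute-differences (SAD) cost; real encoders use much faster search strategies, so this is an illustration of the idea only:

```python
# Sketch of block-matching motion estimation: find the offset (dx, dy)
# in the reference frame whose block best matches the current block.
def sad(ref, cur, bx, by, dx, dy, size):
    """Sum of absolute differences between the candidate and the block."""
    return sum(abs(ref[by + dy + y][bx + dx + x] - cur[by + y][bx + x])
               for y in range(size) for x in range(size))

def motion_vector(ref, cur, bx, by, size, search):
    """Exhaustive search over a +/-search window; returns (dx, dy)."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if (0 <= by + dy and by + dy + size <= len(ref)
                    and 0 <= bx + dx and bx + dx + size <= len(ref[0])):
                cost = sad(ref, cur, bx, by, dx, dy, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]

# A 2x2 block of 9s that moved one pixel to the right between frames:
ref = [[0, 9, 9, 0], [0, 9, 9, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
cur = [[9, 9, 0, 0], [9, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(motion_vector(ref, cur, 0, 0, 2, 1))  # (1, 0)
```

The decoder then copies the referenced block along the vector, and only the (DCT-compressed) residual differences need to be transmitted.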

 

MPEG stream

An MPEG video stream consists of three kinds of frames. I-frames are intra-frame coded and are independent frames. P-frames, or predicted frames, are created using motion prediction from the preceding I- or P-frame. The third class of frames are B-frames, or bi-directional frames, which depend on both the previous and the succeeding reference frame around them.

A typical MPEG data stream could consist of the following sequence of frames:

I B B P B B P B B P B B I ...

It should be noted that because B-frames depend on the frames around them, the actual transmission order of the frames is different. In the above case it would be

I P B B P B B P B B I B B ...
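
The reordering rule is simply that each B-frame is sent after the later reference frame it depends on, which can be sketched as (frame labels here are illustrative):

```python
# Sketch of display-order -> transmission-order reordering: B frames are
# held back until the next anchor (I or P) frame has been emitted.
def transmission_order(display):
    out, pending_b = [], []
    for f in display:
        if f.startswith("B"):
            pending_b.append(f)   # wait for the next anchor frame
        else:
            out.append(f)         # I or P: emit it, then the waiting Bs
            out.extend(pending_b)
            pending_b = []
    return out + pending_b

display = "I1 B2 B3 P4 B5 B6 P7 B8 B9 P10 B11 B12 I13".split()
print(" ".join(transmission_order(display)))
# I1 P4 B2 B3 P7 B5 B6 P10 B8 B9 I13 B11 B12
```

This matches the transmission sequence shown above and explains why a decoder needs to buffer frames before display.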

I-frames are usually sent twice a second. They are needed to correct possible errors which can appear during transmission; such errors show up as small blocks on the screen for a fraction of a second. I- and P-frames can be sent more often if so desired, or B-frames can be dropped altogether. This will, however, increase the data rate of the MPEG stream.

 

References

Thomas Sikora, MPEG Digital Video-Coding Standards, IEEE Signal Processing Magazine, September 1997.

Moving Picture Experts Group Information Page, http://www.vol.it/MPEG/.

Summary: An introduction to the video and audio compression standard MPEG, http://www.mpeg1.de/mpegfaq/.

MPEG Pointers and Resources, http://www.mpeg.org/index.html/.