Audiotek

Notes about the DV video format

DV is a professional and consumer digital videotape standard created by a consortium in which Matsushita (Panasonic) is often listed as the first stakeholder. MiniDV is a smaller cassette format which allows up to two hours to be recorded on a tape not much larger than a matchbox, and tapes of this size are common in consumer camcorders and small professional cameras. All standard DV formats use a tape width of 6.3 mm. DV is also known as Digital Video Cassette (DVC.)

DV competes with the Sony-controlled Digital 8 format in the consumer domain and with DVCam and DVCPro in the professional domain. DVCam and DVCPro cameras and decks can play DV tapes, and MiniDV cassettes provided a suitable adaptor is used; these formats are often viewed as legitimate extensions to DV for high-end professional users. All of them are digital tape types which use very similar bit-level encodings. The differences lie at the image filtering and compression level, and at the mechanical level (e.g. the distance between the video tracks on the tape.)

For a solid grounding in video, you should read the Audiotek Notes about the VHS video format and Notes about the SVHS video format. These notes can provide a deeper understanding of digital video for readers of Notes on the DVD video format, and should be considered a companion to that document.

Who are consumers and professionals?

If you are asking this question, you are likely to be a consumer. If you are ready to put up and defend an answer, you are almost certainly a professional. (Or at least, as some professionals would put it, a "prosumer.") Some hobbyists verge on being professionals, and it is fair to describe them as such even if they don't derive a living from their work. The important distinction is that professionals care very deeply about the quality of the end product, and will go to great lengths and expense to achieve it. Consumers care too, of course, but usually only to the extent that "VHS/Video 8 is bad" and "Digital is good."

DV is interesting as it straddles the boundary between consumer and professional in a rather unique way. DV has, more than any other format, put broadcast-quality capture and editing equipment into the hands of consumers at a moderate cost. At the same time it has revolutionized the professional production chain, allowing faster editing, more creative composition and mind-blowingly high-quality results.

The format itself, therefore, cannot be used to distinguish a professional from a consumer; only the complexity of the equipment can begin to gauge that. There is a wide range of equipment which uses the MiniDV format, from micro cameras which professionals would only consider using in a hidden camera exposé, right up to 3CCD^[1] over-the-shoulder cameras with optical viewfinders.

What does DV offer over its consumer competitors?

DV leaves all analogue capture formats well behind, including Hi 8 and SVHS. However, low-end digital video cameras can to some extent produce results as weak as those of high-end Hi 8 and SVHS cameras, particularly in terms of colour accuracy. DV is capable of producing an analogue-equivalent horizontal resolution of 500 lines on a display device with a 4:3 aspect ratio. This is marginally greater than what is often termed "broadcast quality." Colour resolution outstrips the analogue formats by 2:1 or better (but bear in mind the aforementioned caveat that low-end equipment may hamstring this.)

Audio in DV is also far superior to its analogue counterparts. DV offers either two channels at 48 kHz with 16 bit samples (better than CD quality,) or four channels at 32 kHz with 12 bit samples. DV supports audio dubbing in the four-channel mode, which is beyond Hi 8 and SVHS equipment: only linear audio dubbing with a relatively poor analogue resolution is possible on these legacy formats. DV audio is not compressed, and suffers no generational loss.

The Sony Digital 8 format has exactly the same digital characteristics as MiniDV, but can be recorded onto Video 8 or Hi 8 tapes, the Sony 8 mm wide tape formats. The tape transport speed is increased from the analogue formats to allow for the digital data to be stored more reliably. This means that a 90 minute tape (PAL) will hold only 60 minutes. (A 120 minute NTSC tape will also hold only 60 minutes.)

The main advantage MiniDV offers over Digital 8 is that a wider range of equipment supports the MiniDV format, and this includes professional DV gear which uses the 6.3 mm tape width—DV, DVCam and DVCPro. Digital 8 is only useful when backwards compatibility with the analogue transports of Video 8 and Hi 8 is required.

The physics of DV

DV, DVCam and DVCPro all use 6.3 mm wide tapes. (Digital 8 uses an 8 mm wide tape.) DV can be recorded at SP or LP speed, but compatibility with professional DVCam and DVCPro decks and cameras is only assured if SP speed is used. (Digital 8 tapes are not compatible with DV, DVCam or DVCPro decks or cameras, and only support an SP speed.)

A DV cassette is 125 × 78 × 14.6 mm. A MiniDV cassette is 66 × 48 × 12 mm, which is a similar size to Digital Audio Tape (DAT.) DVCPro "L" profile cassettes are the same size as DV cassettes, but use a different tape formulation. DVCPro also has its own "M" profile cassettes used for field work: these are a little larger than MiniDV cassettes.

Metal Evaporated (ME) formats

In the manufacture of Metal Evaporated (ME) tapes, iron is evaporated onto the base plastic film, which can result in a relatively abrasive surface which requires specially formulated lubricants to protect the tape and heads from wear. ME is cheaper to produce than the alternative Metal Particle (MP.)

DV SP uses a track pitch of 10 µm recorded at a rate of 18.81 mm/s, and has no control track. DV tapes typically store four hours of video. MiniDV tapes typically store 60 minutes of video.

DV LP uses a track pitch of 6.7 µm recorded at a rate of 12.60 mm/s, and has no control track. MiniDV 60 minute tapes store 90 minutes of video in LP mode.

DVCam uses a track pitch of 15 µm recorded at a rate of 28.21 mm/s, and has no control track. It would be more informatively named "DV Fine," as it is essentially an additional mode beyond DV SP that can be employed when recording to standard DV tapes. DVCam fits three hours on a four hour DV tape, and 40 minutes on a MiniDV cassette. DVCam is backed by Sony.
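Since the length of tape in a cassette is fixed, running time in each mode should scale inversely with linear tape speed. The following sketch assumes exactly that and nothing more, using the nominal speeds quoted above; `TAPE_SPEED_MM_S` and `minutes_in_mode` are illustrative names, not part of any standard.

```python
# Nominal linear tape speeds quoted in the text, in millimetres per second.
TAPE_SPEED_MM_S = {
    "DV SP": 18.81,
    "DV LP": 12.60,
    "DVCam": 28.21,
}


def minutes_in_mode(sp_minutes: float, mode: str) -> float:
    """Estimate running time for a tape rated at `sp_minutes` in DV SP.

    Assumes running time depends only on tape speed: the tape length is
    fixed, so time = length / speed.
    """
    tape_length_mm = sp_minutes * 60 * TAPE_SPEED_MM_S["DV SP"]
    return tape_length_mm / TAPE_SPEED_MM_S[mode] / 60


if __name__ == "__main__":
    # A 60 minute MiniDV cassette in each mode.
    for mode in TAPE_SPEED_MM_S:
        print(f"{mode}: {minutes_in_mode(60, mode):.0f} minutes")
```

For a 60 minute MiniDV cassette this gives roughly 90 minutes in LP and 40 minutes in DVCam, in line with the figures above.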

Metal Particle (MP) formats

In the manufacture of Metal Particle (MP) tapes, iron particles are bonded onto the base plastic film, and coated with an anti-oxidant formulation. MP is therefore more expensive to produce than ME, but it is more conducive to a long life for both tapes and equipment due to its glass-like finish.

DVCPro, or SMPTE (Society of Motion Picture and Television Engineers) D7, uses a track pitch of 18 µm recorded at a rate of 33.82 mm/s, and has control and audio cue tracks to better support linear and insert editing. DVCPro "L" profile cassettes typically store 123 minutes, while "M" profile cassettes store 63 minutes. DVCPro is backed by Matsushita (Panasonic.)

Physics are independent of colour system

There is no distinction made at the physical level between television colour systems like PAL and NTSC. For comparison, SVHS and VHS use a 39 µm (PAL) track pitch (on a 12.6 mm wide tape,) and Video 8 and Hi 8 use a 24 µm (PAL) track pitch.

How is DV encoded?

DV is defined in the "Blue Book," later revised to form IEC 61834, a publication of the International Electrotechnical Commission. DV achieves 5:1 compression over the raw digital signal components by applying a Discrete Cosine Transform (DCT) to 8 × 8 blocks of each frame of video and then quantizing and variable-length coding the resulting coefficients, in a similar way to the Joint Photographic Experts Group's (JPEG) method for compressing still images. Unlike MPEG, DV makes no attempt to compress along the temporal axis, so each frame in DV is effectively a "key frame." Like JPEG and MPEG, DV is a lossy compression scheme which sacrifices some image detail, ideally limited to detail to which the human eye is less sensitive.
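To illustrate the transform and quantization steps, here is a minimal, unoptimized sketch of an orthonormal 8 × 8 DCT-II with a single uniform quantizer. It is not the DV codec: real DV adds weighting, per-block quantization choices and variable-length coding, none of which is modelled here, and the quantizer step of 16 is an arbitrary assumption. The point is that a smooth block collapses into a handful of non-zero coefficients, and the rounding in `quantize` is where detail is lost.

```python
import math

N = 8  # block size: 8 x 8 samples, as in JPEG


def _scale(k: int) -> float:
    # Orthonormal DCT-II scale factors.
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(pos: int, freq: int) -> float:
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (list of lists)."""
    return [[_scale(u) * _scale(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                         for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct_2d(coeffs):
    """Inverse of dct_2d (orthonormal DCT-III)."""
    return [[sum(_scale(u) * _scale(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def quantize(coeffs, step: float):
    """Uniform quantizer: the lossy step that discards fine detail."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step: float):
    return [[q * step for q in row] for row in levels]


if __name__ == "__main__":
    # A smooth horizontal ramp, typical of low-detail picture areas.
    block = [[16 * y for y in range(N)] for _ in range(N)]
    levels = quantize(dct_2d(block), step=16)
    nonzero = sum(1 for row in levels for q in row if q != 0)
    print("non-zero coefficients:", nonzero, "of", N * N)
    rebuilt = idct_2d(dequantize(levels, step=16))
    worst = max(abs(a - b) for ra, rb in zip(block, rebuilt) for a, b in zip(ra, rb))
    print("worst reconstruction error:", round(worst, 2))
```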

The 5:1 ratio brings the video down to about 25 Mbit/s. With audio, subcode and other overhead added, the DV stream contains 3.6 MB of data for each second of video, which works out to 144 kB per frame (PAL) or about 120 kB per frame (NTSC). This ratio is supposed to be the digital equivalent of a recorded bandwidth of 5.7 MHz with a signal-to-noise ratio exceeding 54 dB. This places DV better than UVW Betacam, a professional analogue acquisition format, and much better than analogue television broadcast bandwidths, which are typically only 4 MHz.
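The arithmetic behind these rates can be sketched in a few lines. This assumes 8-bit samples, DV's 720 × 576 (PAL) and 720 × 480 (NTSC) frame sizes, and NTSC's 30000/1001 frame rate; it takes the 3.6 MB/s stream rate from the text as given rather than deriving it, and the names are illustrative.

```python
from fractions import Fraction

# (luma width, luma height, frames per second)
FORMATS = {
    "PAL": (720, 576, Fraction(25)),
    "NTSC": (720, 480, Fraction(30000, 1001)),
}

BITS_PER_SAMPLE = 8
# 4:2:0 (PAL) and 4:1:1 (NTSC) both keep one Cb and one Cr sample for every
# four luma samples, so a pixel carries 1 + 1/4 + 1/4 = 3/2 samples on average.
SAMPLES_PER_PIXEL = Fraction(3, 2)
COMPRESSION_RATIO = 5
STREAM_BYTES_PER_SECOND = 3_600_000  # the 3.6 MB/s figure quoted in the text
TAPE_SECONDS = 60 * 60               # a 60 minute MiniDV tape


def raw_video_bits_per_second(fmt: str) -> Fraction:
    """Uncompressed rate of the subsampled 8-bit video, before DCT coding."""
    width, height, fps = FORMATS[fmt]
    return width * height * SAMPLES_PER_PIXEL * BITS_PER_SAMPLE * fps


def bytes_per_frame(fmt: str) -> Fraction:
    """Share of the whole DV stream (video, audio, subcode) per frame."""
    _, _, fps = FORMATS[fmt]
    return STREAM_BYTES_PER_SECOND / fps


if __name__ == "__main__":
    for fmt in FORMATS:
        raw = raw_video_bits_per_second(fmt)
        print(f"{fmt}: raw {float(raw) / 1e6:.1f} Mbit/s, "
              f"5:1 -> {float(raw / COMPRESSION_RATIO) / 1e6:.1f} Mbit/s, "
              f"{float(bytes_per_frame(fmt)) / 1e3:.0f} kB per frame")
    print(f"60 minute tape: {STREAM_BYTES_PER_SECOND * TAPE_SECONDS / 1e9:.2f} GB")
```

Run as a script, it reports a raw PAL rate of about 124 Mbit/s, which the 5:1 ratio brings down to roughly 25 Mbit/s, and just under 13 GB for a 60 minute tape.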

At 3.6 MB/s, a 60 minute MiniDV tape stores almost 13 GB of compressed video data, far in excess of the capacity of DVD-9, the dual layer DVD format. A 60 minute tape needs this much data because DV compresses individual frames only and, in order to keep a high quality, errs on the side of caution with respect to how much it compresses each frame, so as not to lose image detail in the capture phase. This is less important in DVD, as it is at the delivery end of the digital content spectrum. The luminance signal is recorded at full resolution, 720 × 576 (PAL) or 720 × 480 (NTSC), but the chrominance signal is reduced to one quarter bandwidth using either a 4:2:0 (PAL) or 4:1:1 (NTSC) subsampling scheme.

It is worth looking at how colour subsampling works. It is used because the human eye is much less sensitive to colour (chrominance) variations than it is to light level (luminance) variations. Therefore, a better overall image quality can be achieved if more of the available bitstream is dedicated to luminance than to chrominance. Even analogue video systems attempt to favour luminance over chrominance. Studio digital video (SMPTE 4:2:2, as recorded by D1) chooses a ratio of 2:1 with its 4:2:2 colour subsampling. DV chooses 4:1, as does DVD with its 4:2:0 subsampling. The differences are summarized by the following diagrams:

The intersections of grid lines show where in time and space samples are taken. Each sphere represents a sample in one of the Y (luminance) or Cb, Cr (chrominance) domains taken at the adjacent grid intersection. Because PAL and NTSC frames are interlaced, with odd lines coming from one field (a) and even lines from another field (b), and the two fields are separated in time by 0.02 s (PAL) or about 0.017 s (NTSC), there is some additional complexity in when (i.e. at what time) the chrominance samples are taken.
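The sample counts behind these schemes can be worked out from the J:a:b notation. This sketch uses the usual reading of that notation (a region J pixels wide and two rows high) and ignores the field timing subtleties described above; the function and table names are illustrative. It shows that 4:2:0 and 4:1:1 keep the same number of chroma samples, one per four luma samples, arranged differently, while 4:2:2 keeps twice as many.

```python
from fractions import Fraction


def chroma_fraction(j: int, a: int, b: int) -> Fraction:
    """Samples in each chroma plane (Cb or Cr) per luma sample.

    A J:a:b scheme describes a region J pixels wide and two rows high. It
    holds 2 * J luma samples, `a` chroma samples (of each of Cb and Cr) in
    the first row and `b` in the second.
    """
    return Fraction(a + b, 2 * j)


# How far each chroma plane is divided horizontally and vertically.
PLANE_DIVISORS = {
    "4:2:2": (2, 1),  # half horizontal resolution
    "4:2:0": (2, 2),  # half horizontal and half vertical resolution
    "4:1:1": (4, 1),  # quarter horizontal resolution
}


def chroma_plane_size(width: int, height: int, scheme: str) -> tuple[int, int]:
    """Dimensions of each Cb/Cr plane for a luma frame of width x height."""
    dx, dy = PLANE_DIVISORS[scheme]
    return width // dx, height // dy


if __name__ == "__main__":
    for scheme in PLANE_DIVISORS:
        j, a, b = (int(part) for part in scheme.split(":"))
        print(scheme, "chroma samples per luma sample:", chroma_fraction(j, a, b))
    print("PAL 4:2:0 chroma plane:", chroma_plane_size(720, 576, "4:2:0"))
    print("NTSC 4:1:1 chroma plane:", chroma_plane_size(720, 480, "4:1:1"))
```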

Beyond the compressed video data, the DV bitstream also contains subcode information including timecode, locked or unlocked audio samples, and redundant data for the purpose of error correction. This error correction protects against both transient read errors and more permanent tape dropouts. Head clogs and other sustained errors cannot be compensated for, however, so DV can still fail to deliver a continuous clear image at times. Visible or audible dropouts are supposed to occur at a rate of less than one per 60 minutes, and in practice, with well maintained tapes and equipment, the rate is much lower still.

What is FireWire?

Just like SVHS and Hi 8 introduced their own type of cable—S-Video—to better maintain the quality offered by their new system, so too DV has introduced its own type of cable for the same reason—FireWire. Much is to be gained from keeping video captured on digital equipment in the digital domain, and FireWire, or IEEE 1394^[2], can do that and much more. The document describing how DV streams are transported over FireWire is IEC 61833.

FireWire is an extremely fast serial bus which supports local area networking of a small number of nodes (up to 63 on a single bus.) The original standard, IEEE 1394-1995, and its 2000 revision, IEEE 1394a, support bitrates of up to 400 Mbit/s, which means that FireWire can transfer 50 MB of data every second. This is the equivalent of 13 concurrent DV streams with change to spare. IEEE 1394b, which has yet to be widely adopted, supports 800 Mbit/s to 3.2 Gbit/s, which at the top rate is good enough for eight simultaneous full-bandwidth uncompressed SMPTE D1 video streams!
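The bandwidth arithmetic in the paragraph above can be reproduced as follows. The sketch counts only whole streams in the raw bus rate and ignores protocol overhead and bus arbitration, so real-world capacity will be lower; it also assumes the commonly quoted 270 Mbit/s serial rate for one uncompressed D1 stream. The names are illustrative.

```python
DV_STREAM_MBIT_S = 3.6 * 8   # 3.6 MB/s DV stream expressed in Mbit/s
D1_MBIT_S = 270              # assumed rate of one uncompressed D1 stream


def max_streams(bus_mbit_s: float, stream_mbit_s: float) -> int:
    """Whole streams that fit in the raw bus rate (no protocol overhead)."""
    return int(bus_mbit_s // stream_mbit_s)


if __name__ == "__main__":
    print("1394a at 400 Mbit/s:", 400 / 8, "MB/s,",
          max_streams(400, DV_STREAM_MBIT_S), "DV streams")
    for rate in (800, 1600, 3200):
        print(f"1394b at {rate} Mbit/s:", max_streams(rate, D1_MBIT_S), "D1 streams")
```

At the 3.2 Gbit/s top rate the raw arithmetic allows eleven D1 streams.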

Most camcorders and decks, and DVD and DVHS recording devices, have a DV input/output in the form of a 4-pin FireWire port. This allows lossless DV dubbing between consumer equipment, although DVD cannot carry the full DV bitrate (DVD video peaks at about 9.8 Mbit/s, against DV's 25 Mbit/s), so there will be some loss in the transfer to that medium. This is partly offset by DVD's use of full MPEG-2 compression, which also exploits similarities between frames, although first generation decks may struggle to take advantage of MPEG-2's better compression methods if forced to encode in real time.

Another option afforded by FireWire which offers immense advantages over other forms of editing is Non-Linear Editing (NLE.) By transferring rushes or dailies across FireWire into a computer system for storage, and by then employing software tools in the assembly of footage into the finished product, very complex video compositions can be produced in next to no time. The computer storage space needed for a number of hours of DV is easily within the reach of consumers, which puts professional-quality video production capacity in the hands of more people than ever before.

How is locked audio different to unlocked audio?

In locked audio, the video field clock is locked to the audio sampling clock, so that there is a defined number of audio samples for each video field. Because this number of samples may be fractional, it may be several fields before an audio sample and a video field boundary coincide. The minimum time (measured in fields) between two such coincidences is defined to be the length of the audio frame.

Audio frame length (in video fields)
Colour system	48 kHz sampling	32 kHz sampling
PAL	1 field	1 field
NTSC	10 fields	30 fields

The DV specification also allows unlocked audio so that consumer DV equipment does not have to be held to exacting professional standards. In unlocked audio, the number of audio samples accompanying a field is allowed to vary by 12 or 13, and the total drift from the video field clock can be as many audio samples as are taken in 1.3 fields. However, in the long run, there should be no consistent drift.

In practice, DV equipment uses unlocked audio when recording from its own inputs, and all DVCam and DVCPro equipment uses locked audio. This will not normally be a problem, particularly if a DV device is fed locked audio from FireWire, as in this case locked audio is recorded. NLEs sometimes use unlocked audio and sometimes locked audio, and sometimes offer a choice. If 44.1 kHz is chosen as an output sample rate, the audio is necessarily unlocked as there is no feasible audio frame length to use for the locking.

You may need to worry about drift in unlocked audio when using some NLEs to edit long clips. When the audio data is separated from the video data, all bets are off. However, so long as the DV stream remains intact, the rules will be followed, and everything will be fine.

Beyond DV

On the consumer front, JVC's DVHS is beyond DV in terms of both data rate and capacity. It is not likely to exceed even DV's capacity for editing, however. It is more like a digital bit bucket and less like a video format. DVHS is also very bulky—but then so is Digital-S (see below,) as it uses the same cassette size.

Because professionals like to push the envelope and have equipment that consumers and prosumers could never dream of having, there are a number of existing standard definition (SDTV) professional formats beyond DV, DVCam and DVCPro (D7.) The most common of these are Matsushita's DVCPro 50 and JVC's Digital-S (SMPTE D9.) These formats are strikingly similar: each is essentially two DV codecs working in parallel, and the result is often described as virtually lossless. The main difference is that DVCPro 50 sticks with the 6.3 mm tape and Digital-S uses JVC's favourite tape width of 12.6 mm. DVCPro 50 tape (which is exactly the same formulation as DVCPro tape in most instances) runs past the heads twice as fast to match Digital-S's doubled width. It is also worth noting that Digital-S uses physically different cassettes to DVHS even though it resembles them on superficial examination: it is much more robust, and the tape transport uses sapphire guide roller flanges and cleaning blades.

Just beyond SDTV is 720 × 576 P, effectively PAL resolution in a progressive scan mode. Some DVCPro cameras and decks support this as an option, but DVD is really the only delivery format which can take advantage of the additional resolution. Otherwise, you are looking at an HDTV upconversion, and in that case you might as well be using HDTV.

What formats for HDTV? Would you believe DVCPro HD and D9 HD? These double the bitrate again, to four times that of DV, and offer 1280 × 1080 I with colour subsampled at 4:2:2. Sony offers HDCam, which raises the bitrate by another 30 Mbit/s and produces 1440 × 1080 I with colour subsampled at 3:1:1. In all these formats 576 P at 25 Hz, 576 P at 50 Hz, and 720 P at 25 Hz (the other DVB-T HDTV resolutions) are also supported, provided the equipment in use is flexible enough. HDCam has very little in common with the DV family of formats.

Notes about Audiotek MiniDV cassettes and numbering schemes

The catalogue of Audiotek MiniDV cassettes is the ATMO catalogue. The catalogue consists almost exclusively of original material produced by Audiotek Productions. Numbers are allocated consecutively starting at 001 (and ending at 899) as each cassette's contents are finalized. Cassettes may be freely deleted from the catalogue, and the number reallocated (or not) as required.

Metainformation captured in the ATMO catalogue

As is the case with all videotape catalogues (including ATKV and ATKW,) the catalogue lists the programmes, which may be further divided into subprogrammes, that appear on each cassette. Programme is a synonym for series when applied to dailies, rushes and television shows, and for movie when applied to cinematic entertainment. Subprogrammes are defined as episodes within a series, titled sequences within dailies and rushes, acts or segments from TV programmes and chapters from movies. There is no requirement that a programme consist of subprogrammes; that is, a programme may be self-contained. When translated to a videodisc medium, a programme is usually equated with a title and a subprogramme with a chapter, but there is scope for subprogrammes to be converted to entire titles, as chapters have less relevance in some catalogues (ATKU, for example.)

For each programme/subprogramme pair, the following metainformation is captured:

a second-accurate timecode
the designation Vision for continuous video, or Photo for still frames
a programme name, and a date and time or other metainformation in parentheses (a performer name may be included for musical programmes, prefixing the title of the programme and followed by a colon)
a subprogramme name, which may be in inverted commas when it is a title for a sequence of scenes in dailies or rushes—otherwise it is either a recognised title or a clear description of the subprogramme^[3],[4]:
1. An episode title, with an episode number in box brackets, prefixed by a hash. The episode number is relative to the start of the first series named by the programme name, unless prefixed by a number followed by a hyphen—this number then identifies which season of the series the episode belongs to.
2. An official episode title, with an official production code in box brackets not prefixed by a hash.
3. A song title and performer name, separated by a hyphen and a space (to be set as a colon or an em-dash,) even in the case of a parody. Details of the performance needed to distinguish it from others ("live" is common to distinguish guest performances on variety programmes or in concerts from video clips) may appear in parentheses after the song title. The performer name may be omitted if included in the programme name.
4. A description or name of the guest or segment within a news, current affairs, interview, variety entertainment or documentary programme.
the video colour system and shape (separated by a slash, with the shape measured in scanlines) and recording speed used (and, optionally, the equipment model number in parentheses)
the audio sampling method, and sample rate and sample width (separated by a slash)
a series of ratings for perceived degradation (see Notes on the SVHS video format)
a series of ratings for perceived gradation (see Notes on the SVHS video format)
a series of generational counts for objectively measuring degradation (see Notes on the DVD Video format)
the nominal horizontal resolution of the subprogramme
the aspect ratio of the programme (the width if the height is one unit, rounded to one of 1.33, 1.78 or 2.35)
the footprint of the programme (the area of the active frame if the total available frame is one unit)
the colour saturation of the programme (the colour saturation if full saturation is one unit)
the hue of the programme, being the name of the colour that the white point tends towards
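Two of the picture measurements in the list above are simple ratios. The sketch below assumes that "rounded to one of 1.33, 1.78 or 2.35" means choosing the nearest of the three values; the function names are hypothetical and not part of any Audiotek tool.

```python
NOMINAL_ASPECTS = (1.33, 1.78, 2.35)


def nominal_aspect(width: float, height: float) -> float:
    """Width for unit height, rounded to the nearest catalogue value."""
    ratio = width / height
    return min(NOMINAL_ASPECTS, key=lambda nominal: abs(nominal - ratio))


def footprint(active_width: float, active_height: float,
              frame_width: float, frame_height: float) -> float:
    """Area of the active picture when the whole frame has unit area."""
    return (active_width * active_height) / (frame_width * frame_height)


if __name__ == "__main__":
    # A 2.35:1 film letterboxed inside a 16:9 frame.
    print(nominal_aspect(2.35, 1), round(footprint(16, 16 / 2.35, 16, 9), 2))
```

For example, a 2.35:1 film letterboxed inside a 16:9 frame has a footprint of about 0.76.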

For each videotape, the following metainformation is captured:

the index in the catalogue
the length of the media if recorded at SP speed
the title of the production
the production company
the tape brand, media type code and formulation type (either ME or MP)
the predominant audio sampling method, and sample rate and sample width (separated by a slash)

Special range in ATMO catalogue for project master tapes

The range 900-999 in the ATMO catalogue is reserved for production project master tapes which are used to deliver programmes to external agents or for internal backup purposes. Typically these tapes are used to produce DVDs for external dissemination. Numbers in this range are permanently allocated on a programme by programme basis, and there is no requirement that any cassette catalogued actually physically exist in the Audiotek library.
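The two ATMO number ranges can be expressed as a small validation helper. The function names, the category labels and the "ATMO 007" display form are hypothetical; the text does not say how catalogue numbers are written on labels.

```python
GENERAL_RANGE = range(1, 900)    # 001-899: allocated consecutively as cassettes are finalized
MASTER_RANGE = range(900, 1000)  # 900-999: project master tapes, allocated per programme


def atmo_range(number: int) -> str:
    """Say which part of the ATMO catalogue a number falls in."""
    if number in GENERAL_RANGE:
        return "general"
    if number in MASTER_RANGE:
        return "project master"
    raise ValueError(f"{number} is outside the ATMO range 001-999")


def format_atmo(number: int) -> str:
    """Render a validated number with three digits, e.g. 'ATMO 007'."""
    atmo_range(number)  # raises ValueError if out of range
    return f"ATMO {number:03d}"


if __name__ == "__main__":
    for n in (1, 899, 900, 999):
        print(format_atmo(n), atmo_range(n))
```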

^[1] 3CCD (Charge-Coupled Device) cameras have a sampling array for each primary colour.

^[2] IEEE stands for the Institute of Electrical and Electronics Engineers, an international body which publishes electrical and electronic standards.

^[3] Metainformation which otherwise identifies the segment should appear in parentheses. The word "fragment" may appear when a segment is incomplete due to recording beginning in the middle of the subprogramme or ending prematurely. The word "edit" is used to connote that edits have been made to make the subprogramme shorter than it otherwise may have been.

^[4] Subprogramme names may be prefixed by a number followed by a period to indicate a track or chapter number.


Copyright in the material (literary, programmatic, graphic and otherwise) comprising this XHTML document and embedded external elements is claimed by the author, and its publication on this web site does not waive that copyright. The material may not be copied in any form (including printed and electronic forms) except for the copying that occurs during the normal course of an HTTP transaction. Anything other than temporary storage in a cache is expressly prohibited.

Author and editor: Kade "Archer" Hansson; e-mail: archer@kaserver5.org

Last updated: Friday 31st October 2003