Pop-on and roll-up are the two styles of presentation commonly used in television captioning. Paint-on captions are sometimes used for special effects but are much less common. Accurate Secretarial LLC endorses the use of the pop-on style for all television captioning. WebVTT supports pop-on as well as paint-on captions.
Pop-on captions, as their name suggests, appear in boxes one or two lines long, usually at the bottom of the screen, though they may be positioned elsewhere on the screen to indicate the current speaker. Each box then pops off again and is replaced by the next caption box. When the audio content cannot be confined to a single pop-on box, one box is replaced by another at logical breaks in the text (ends of phrases or sentences). Pop-on captions include edited audio information to aid viewer comprehension of the program. This may include dialog, narration, sound effects, indications of offscreen activity, etc. Pop-on captions are synchronized with the audio content of the program and are thus typically used for prerecorded programs that include multiple speakers.
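As a rough illustration of how text might be divided into pop-on boxes at logical breaks, the following Python sketch packs phrases into boxes of at most two lines. The function names, the 32-character line width, and the choice of punctuation marks treated as phrase endings are assumptions made for this example, not part of any captioning standard described here.

```python
import re
import textwrap


def _flush(text, boxes, max_chars, max_lines):
    """Wrap text into lines and append them to boxes, max_lines per box."""
    lines = textwrap.wrap(text, max_chars)
    for i in range(0, len(lines), max_lines):
        boxes.append(lines[i:i + max_lines])


def split_pop_on(text, max_chars=32, max_lines=2):
    """Split text into pop-on boxes, breaking at phrase or sentence ends.

    max_chars is an assumed line width; max_lines follows the
    one-or-two-line boxes described above.
    """
    # Treat punctuation followed by whitespace as a logical break (assumption).
    phrases = [p for p in re.split(r"(?<=[.!?,;:])\s+", text.strip()) if p]
    boxes, current = [], ""
    for phrase in phrases:
        candidate = f"{current} {phrase}".strip()
        if current and len(textwrap.wrap(candidate, max_chars)) > max_lines:
            # The next phrase would overflow the box, so close the box here.
            _flush(current, boxes, max_chars, max_lines)
            current = phrase
        else:
            current = candidate
    if current:
        _flush(current, boxes, max_chars, max_lines)
    return boxes


if __name__ == "__main__":
    sample = ("Good evening, everyone. Tonight we look at how captions "
              "are made, and why timing matters.")
    for box in split_pop_on(sample):
        print("\n".join(box))
        print("--")
```

A phrase too long for a single box is simply wrapped and spread over consecutive boxes; a human captioner would choose a more natural break and would also assign display times to each box.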
Roll-up captions are used predominantly in conjunction with live television broadcasts, such as news reports and live sporting events. Up to four lines of captioned text appear in a box at the bottom of the screen. While the box remains stable on the screen, captions appear word by word and scroll toward the top of the box, where the top line disappears and is replaced by the line below it. In effect, each new line appears at the bottom of the box as it is created and pushes the older text upward.
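The scrolling behavior described above can be modeled with a small fixed-size buffer. The Python sketch below is only an illustration: the class name `RollUpBox`, the 32-character line width, and the word-fitting rule are assumptions for this example, while the four-line limit comes from the description above.

```python
from collections import deque


class RollUpBox:
    """Toy model of a roll-up caption box (hypothetical helper)."""

    def __init__(self, max_lines=4, max_chars=32):
        self.max_chars = max_chars  # assumed line width
        # A deque with maxlen silently drops the oldest (top) line when full.
        self.lines = deque([""], maxlen=max_lines)

    def add_word(self, word):
        """Append a word to the bottom line, rolling up if it does not fit."""
        bottom = self.lines[-1]
        candidate = f"{bottom} {word}".strip()
        if not bottom or len(candidate) <= self.max_chars:
            self.lines[-1] = candidate
        else:
            # Start a new bottom line; older lines move up one position.
            self.lines.append(word)
        return list(self.lines)


if __name__ == "__main__":
    box = RollUpBox(max_lines=2, max_chars=10)
    for w in "one two three four five six".split():
        print(box.add_word(w))
```

Each call to `add_word` returns the lines currently visible, top to bottom, so the sequence of return values shows the text appearing word by word and the top line dropping off once the box is full.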
Because roll-up captions are created to reflect audio content generated in real time, the text always lags somewhat behind the words being spoken onscreen. This can be disconcerting to some viewers, not unlike the effect of watching a program and discovering the audio is on a several-second delay. Viewers who use lip-reading in addition to reading captions may also find this problematic.
In addition, the time constraints involved in transcribing a program as it is being broadcast invariably result in a loss of accuracy in transcription, including misspellings, misunderstood words, or simply mistaken keystrokes. The time pressure also frequently requires substantial editing by the transcriptionist, for example, when multiple speakers talk at the same time or when multiple audio inputs overlap one another.