Chapter 1:
An Embodied Multimodal Approach to Visualizing Captions and Subtitles

Throughout this book, I visualize closed and open captions and subtitles as instruments of connection that embody how we all communicate with each other through multiple modes and languages, including bodies, voices, and signs. To provide a foundation and a framework for visualizing captions and subtitles as the embodiment of accessible multimodal communication, this chapter first connects the key themes of embodiment, space, access, multimodality and caption studies, and interdependence, and then transitions to details of an embodied multimodal approach to analyzing the design of captions and subtitles. That segues into a journey through different captioned and subtitled screens. These examples demonstrate the value of captions and subtitles in our collaborative work to connect and communicate across different modes, languages, and identities.

SPEECHLESS: EMBODYING THE THEME

To illustrate the affordances of captions and subtitles as the embodiment of accessible multimodal communication, this chapter’s discussion is intertwined with my own embodied responses to a series of videos that integrate captions in meaningful ways. The Fall 2016 television season included Speechless, a mainstream show about a character, JJ, with cerebral palsy who uses a wheelchair, played by Micah Fowler, who has cerebral palsy in real life—a rarity in popular media⁠1. In the days leading up to the show’s premiere on ABC, the show’s Facebook page released a series of short promotional videos designed for mobile viewing. These videos were composed mostly of short scenes from the show with open captions in large white and blue fonts placed on screen in synchronization with the characters’ speech, although the most rhetorically effective one was designed around genuine interaction with words on screen.

The prominent and visible use of open captions and textual synchronization with speech in these promotional videos embodies how the character JJ communicates through eye contact with letters and words. JJ wears a laser pointer that is attached to his glasses and aims his pointer at words and letters on his communication board to form sentences that the person next to him then reads and vocalizes. By embedding open captions on screen, the videos recreate how JJ—and those who track the sentences he forms on his communication board—verbalizes thoughts through eye contact with written text. Spoken words appear on screen one by one in synchronization with the characters’ speech, and they are purposefully oversized to attract sighted viewers’ eyes. Most of the videos with open captions use a mix of white and blue fonts reflecting the show’s thematic colors, as shown in Figures 1.1 and 1.2. The images show an open captioned scene that features a dialogue between JJ’s mother and Kenneth, a school employee at JJ’s new school who later starts working for the family.

Kenneth is speaking to JJ's mother. Capitalized open captions state, "They even had a big meeting." The word “BIG” is larger than the other words. The first four words are white while the last two words are blue.

Figure 1.1: Open captions in blue and white that embody the colors of Speechless

JJ's mother responds to Kenneth with the capitalized open captions saying, "What other school?" The first and last words are white while “other” is blue.

Figure 1.2: Open captions in blue and white that embody the colors of Speechless

In this promotional video, the blue and white words appear on screen in successive lines. This aesthetic color choice could facilitate reading speed because the contrast between blue and white makes each line stand out against—and complement—the ones above and below it. This color scheme embodies the thematic colors of the show, blue and white, and accentuates the connections that viewers make between this scene and the entire show.

When I watched the pilot episode a few days after the promotional videos were released, this same scene was shown in its original format with closed captions at the bottom of the screen instead of open captions. I did not feel the same connection to the broadcast version because the quick back-and-forth conversation between these two characters did not register as effectively without the open captions. While it might not have been practical to watch this entire 30-minute episode with rapid-fire open captions, the implementation of open captions in social media contexts succeeded in connecting with this viewer.

The proliferation of open captions on social media and the related affordances of open captions are reviewed in further depth in Chapter 7. Before we reach that point, we can start with the fundamentals of our exploration and the importance of attending to the embodiment of human beings, including performers on screen and those of us in the audience. My embodiment as a Deaf viewer may have accentuated the appeal of aesthetic captions in Speechless’ promotional videos for me, and my experiences as a rhetoric and composition scholar may have made me commend how the captions embody the main character’s communication practices. And these coalesce to strengthen my recognition of the potential for captions and subtitles in rhetorically and aesthetically conveying a message and connecting with audiences. Captions and subtitles can embody the communication practices of speakers, signers, and those who use communication boards.

EMBODIMENT AND EMBODIED RHETORICS

A core message in this book is that captions and subtitles can be the embodiment of multimodal communication and that we can learn from Deaf-embodied experiences, or embodied rhetorics, to strengthen captioning and subtitling practices. To continue and build on the previous chapter’s introduction to embodiment, the term embodiment includes how we experience the world differently through our body and how our body influences the ways that we interpret the world (Knoblauch & Moeller, 2022; Melonçon, 2013; Wilson & Lewiecki-Wilson, 2001). We interpret and navigate the world because of the bodies that we have and our “particular senses and experiences” (Wysocki, 2012, p. 3). People with different bodies communicate differently and may choose to communicate differently because of their bodies—as I may because of my embodiment as a Deaf rhetoric and composition scholar. This difference in embodiments is crucial for recognizing how different captioning and subtitling approaches can make multimodal communication accessible.

Scholarship in rhetoric and composition and related fields strengthens our understanding of embodiment and embodied rhetorics, including Knoblauch and Moeller (2022), who note the complexity of these terms and how they are defined by scholars. As I discuss in the introduction, this is not a limitation, but rather a benefit because the complexity of the term embodiment reflects the complexity of being a human being. And the intricacy of being a human being is reflected in the variety of captioning and subtitling approaches, including those at the bottom of the screen or those around the screen. As discussed throughout this book, captions and subtitles of different types can embody the different ways that we each communicate with each other—and even at different moments in the same conversation, such as when someone switches from spoken English to ASL or from English to Spanish and vice versa.

The design of captions and subtitles can embody an individual’s communication practices, emotions, and conversations with others on screen. This reflects how embodiment is “a result of connection and interaction,” and includes “the experience of being a being with a body” and “the experience of orienting one’s body in space among others” (Knoblauch & Moeller, 2022, p. 8). Throughout this book’s study of captions and subtitles, we should be conscious of our own embodiments and the embodiments of those on screen. After all, as Johnson et al. (2015) write in a key concept statement on embodiment, embodied rhetorics, and embodying feminist rhetorics in scholarship, “our bodies inform our ways of knowing,” and embodiment “conveys an awareness or consciousness about how bodies—our own and others’—figure in our work” (p. 39). They add that researchers “can make all bodies and the power dynamics invested in their (in)visibility visible” (p. 39). Through this book’s study of captioning and subtitling approaches, we all should strive to be even more attuned to our own and different individuals’ bodies and identities.

To make embodiments visible and value differences, this book shows how captions and subtitles can be designed in ways that honor a creator’s or performer’s identity and make them visible. Traditional lines at the bottom of the screen can transcribe Spanish words instead of stating “[Speaking non-English],” which renders the original language invisible for those reading the captions. In other contexts, creative subtitles can embody the four-dimensional language of ASL, including the dimension of time, and make the dimensions visible beyond linear text. Various captioning and subtitling approaches affirm the value of different embodiments and different individuals’ embodied rhetorics, or how we communicate and actively make meaning.

I use rhetoric and rhetorics in this book to refer to how each individual might communicate through various, different, and multiple modes in interaction with audiences to make meaning and accomplish a particular purpose, including to connect with audiences. My definition of embodied rhetorics aligns with Knoblauch and Moeller’s (2022) definition: “multilayered, encompassing linguistic and textual markers of the body, the body itself as rhetoric, discussions of visual or textual representations of the body, and bodily communicative practices” (p. 10). Later in this book, I use the term embodied rhetorics to highlight how captions and subtitles can make it possible for viewers to sense and access the embodied rhetorics of performers or creators of a video. To return to an earlier example, ASL music videos with dynamic visual text are multilayered as visual lyrics move in interaction with musicians’ bodies and facial expressions and the multisensory sonic rhythm. The pulsating words on screen provide access to the embodied rhetorics of the musicians, including how they communicate the song through ASL.

Embodied rhetorics can become salient when considering how ASL is an embodied language created through the movement and interpretation of coordinated gestures and facial expressions. Brueggemann (2009) argues that we can learn about language and rhetoric through the study of ASL “as a nonprint, nonwritten, visual, and embodied language” (p. 34) and articulates the value of deafness through ASL and English (2013). Sanchez (2015) writes that “spoken and written language can be separated from the body, whereas there can be no disembodiment of ASL” (p. 25). Deaf rhetorics are, then, embodied rhetorics in which signers essentially make meaning through our bodies. And several of the examples throughout later chapters of this book, especially Chapter 3, show how subtitles can provide viewers with access to signers’ embodied rhetorics.

By foregrounding Deaf rhetorics in this study of captions and subtitles, I intend to argue for the benefits of captions and subtitles for our multimodal and multilingual world along with the importance of accessibility for DHH individuals. This process reflects Knoblauch’s (2012) suggestion that attending to our embodied knowledge “can highlight difference instead of erasing it in favor of an assumed privileged discourse” (p. 62) and Wilson and Lewiecki-Wilson’s (2001) “embodied rhetoric of difference” (p. 18). I further endorse Kerschbaum’s (2014) rhetoric of difference, which calls on us to “acknowledge [our] responsibilities to others in communication” and to be open to different possibilities for interacting with others (p. 118) as well as the ensuing attention to disability in “everyday movements in the world” (2022, p. 22). Different possibilities include open and closed captions and subtitles.

When I explore embodied rhetorics on screen, I build on Jay Dolmage’s (2014) presentation of the “central role of the body in rhetoric—as the engine for all communication” (p. 3)—and more crucially, on how “all rhetoric is embodied” (p. 89) since, among other points, “rhetoric, as a tool and an art and a way to move, mediates and is mediated by the body” (p. 89). Disability studies scholarship informs our appreciation of embodied rhetoric and communication by reminding us of how our dis/abilities shape how we know, communicate, and move in different ways, and of how rhetoric should be understood not as the “flawless delivery” of ideas but “as the embodied struggle for meaning” (Dolmage, 2014, p. 235). Throughout this book’s review of various examples of captions and subtitles, we must remember that rhetoric is not simply about accurately conveying information in one direction to a viewer—rather, we can explore how creators, performers, and audiences interact with and through captions and subtitles to connect.

Embodiment is a process of experiencing the world amongst fellow human beings, and as a Deaf individual, I am always aware that we all live in a predominantly hearing world where sound is a major source of communication. I recognize that my embodiment as a Deaf woman influences my research approaches as I make “sense with, of, and through other embodied people and our social worlds,” as Ellingson (2017) underscores in concluding Embodiment in Qualitative Research (p. 196). Captioned and subtitled access to sound resonates with scholars in rhetoric and composition and sound studies who approach sound as an embodied experience that can be sensed through multiple senses and through the body in our spaces (for example, Buckner & Daley, 2018; Fargo Ahern, 2013; Hocks & Comstock, 2017; Selfe, 2009; and Shipka, 2006). Notably, Ceraso’s (2018) approach to embodied listening intrinsically values the multisensory experience of engaging and composing with sound. Through creating a space in our conversations and compositions for open and closed captions and subtitles, we can visualize and verbalize access to sound and signs in textual form.

Throughout the rest of this book, different embodiments and embodied rhetorics (or lived experiences as well as communication practices) will be honored as we explore examples of effective strategies and new possibilities for captioning and subtitling accessible multimodal communication.

Speechless: Embodying Multimodal Communication

Let’s turn to another promotional video for Speechless. While this video adds captions to pre-recorded scenes, it embodies both the overall experience of Speechless as a show that challenges conventional portrayals of communication and the video’s intended context of online dissemination.

One of the videos, “This father-son heart to heart…,” shows JJ’s father and younger brother holding a spoken conversation in the car (Speechless, 2016a). The open captions, along with emoticons and icons, appear on screen throughout their conversation, including one mistyped word, “dissappointed” [sic]. Because open captions are embedded into the video file, they cannot be corrected as easily as closed captions that can be turned on and off; the inclusion of a mistyped word should serve as a reminder to composers to always verify spelling during the video editing process.

This same video also includes occasional emoticons and icons, as when the father talks about their house, and an icon of a house appears below the text, or when icons appear to symbolize “right” and “lazy,” as shown in the following images (Figures 1.3 and 1.4).

JJ's father speaks as white capitalized open captions appear next to him that read, "We've got the old house for a couple of weeks. . .". A white icon of a house appears below the text.

Figure 1.3: Open captions and symbols, including an emoji, that illustrate meaning multimodally in a video about Speechless

JJ's younger brother is talking. White capitalized open captions appear below his chin on both sides of the frame. The captions on the left say, "I get to be right" with a red checkmark above the text; the right-side captions show a snoring emoticon with, “You get to be lazy" in white.

Figure 1.4: Open captions and symbols, including an emoji, that illustrate meaning multimodally in a video about Speechless

The icons supplement the words and may not be fully necessary, but they do embody the textual and multimodal communication that occurs with JJ and through this show, showcasing the creators’ awareness of how to appeal to social media and online viewers. One commonality holds across these videos: the captions appear in the space below or between bodies to avoid covering arguably the most important visual element in these scenes: the faces of the performers.

SPACE

If we are to design a space, literally, for captions and subtitles on screens and design spaces for captions and subtitles in our real-world conversations, then we should consider the concept of space itself. I have argued previously for the value of designing a space for captions and subtitles so that these words become integral components of our screens (Butler, 2018b). My interpretation of space is informed by my identity as a Deaf multimodal composition scholar who processes and moves through the world of communication in predominantly visual ways, including through ASL. Members of Deaf culture, myself included, value eye contact, and our “push [of the boundaries of vision] springs from the innate human need to communicate” (Bahan, 2007, p. 99)⁠2.

I also apply the architectural principles of Deaf Space—the intentional design of architectural space for Deaf values and embodied, visual-spatial communication (Hansel Bauman, 2015)—to show how spaces can be created for integral subtitles to recreate embodied communication. When spaces are designed for visual and embodied experiences, such as creating open walkways allowing individuals to see each other from different floors, individuals can extend their connective space.

When composers design a space for captions and subtitles on our screens, they are designing Deaf Space “as viewed through the lens of visual ways of seeing the world and the enhancing of one’s place in space” (Leigh et al., 2014, p. 358). As hearing architect Hansel Bauman (2015) explains in a video, Deaf Space creates “a greater connection” between people and buildings, or “body and design,” and Deaf people have for many years known innately “how to alter the environment so that it fits their way and their embodiment.” Deaf Space as “embodied design” (Bauman, 2015) can likewise inform the design of integral captions and subtitles that bring together written, signed, and spoken words on screen—and this Deaf experience can augment the relatively limited discussion of captions and subtitles in our fields.

In what turns out to be a demonstration of how captions and subtitles can intensify connections, the embodied experiences of Deaf sound artist Christine Sun Kim are illustrated through an online video entitled Exploring the Sound of Silence with Christine Sun Kim (Uproxx, 2016). This video introduces audiences to Kim, whose performances and art engage with sound, particularly the vibrations of sound as felt through the body. Each time Kim signs to the camera, open subtitles are embedded on screen in the space to embody how she experiences sound.

The Uproxx (2016) video opens with a clip of Kim performing and the subtitles running across the middle of the screen reading, “What is sound?” with a wavelength image running horizontally across the screen to visualize the performance of sound. Viewers then watch Kim engage with her art and performances as voice-over and large captions on screen translate into English what she says. The sans serif subtitles are conspicuously large, drawing attention to themselves, and they are alternately white or black depending on the composition of the background to accentuate readability. A few words or lines appear on screen at once and each successive line appears to complete the sentence. To provide visual access to her emphasis, some phrases or words appear larger than the words around them, and the other words are already relatively large for captions, as in the following screen captures (Figures 1.5 and 1.6). The saliency of the large open subtitles on screen—which often occupy half of the screen itself—foregrounds the visual and spatial experience of sound. The design scheme in which subtitles seem to blend on and blend off screen reflects the appearance and disappearance of sounds and signs.

Christine Sun Kim is seated as she signs. Lowercase open captions appear on the right side: “because when I think about what is sound?” The last three words are larger than the others for emphasis.

Figure 1.5: Open subtitles that emphasize the meaning of sound and silence

Kim continues, with the captions reading, “I then start to think about what is silence?” The last three words are larger than the others for emphasis.

Figure 1.6: Open subtitles that emphasize the meaning of sound and silence

This video is illuminative because only Kim’s statements are incorporated into the space of the screen, while the captions for sound descriptions, such as “(train noise)” and “(muffled train station noise)” and hearing speakers, are kept at the bottom of the screen. This juxtaposition amplifies the multimodal and visual nature of Kim’s signs and her own embodied interactions with the world of sound and silence—which is key since this video intentionally centers on her experiences. At the same time, the incorporation of highly salient words on screen in different scenes shows the potential for designing a space for captions in videos with spoken languages.

This video’s incorporation of open subtitles for mainstream audiences underscores that we can intentionally draw from Deaf experiences to design a more accessible world of multimodal communication. We can expand spaces in our conversations about captions and subtitles and access, including through traditional open and closed captions and subtitles as several examples later in this book reveal, and intensify the value of our captioned and subtitled connections.

SPEECHLESS: INTEGRAL CAPTIONS

The first two major elements of my approach, embodiment and space, are key in the analysis of how captions and subtitles can support accessible communication practices, especially when examining captions that are integral to the meaning of a video. Embodiment and space come to the forefront in what I find to be Speechless’ most interactive and appealing use of open and integral captions. This particular promotional video, “Meet the dynamic duo…,” shows Cedric Yarbrough standing next to Micah Fowler, who plays JJ and appears with his communication board. Yarbrough plays Kenneth, who has been hired to vocalize JJ’s words in the show.

Yarbrough greets viewers with a large “HEY!” that appears next to him on screen. He then speaks directly to viewers, explaining who they are and how they communicate. Open captions appear and disappear from the space around him in alternating blue and white fonts as he speaks: “In the show, I play Kenneth (JJ’s voice). JJ spells out what he wants to say on his board and I speak it for him. We’ll show you…” (see Figures 1.7 and 1.8).

Cedric, standing next to Micah, a young white male with short hair and glasses, says with open captions next to him, "In the show I play." The first line is in white and the second line is in blue.

Figure 1.7: Large captions in blue and white that show the audience how the characters communicate in a video about Speechless

The same scene shows the captions, "In the show I play Kenneth (JJ's voice)." The first and third lines are in white while the second and fourth lines are in blue text.

Figure 1.8: Large captions in blue and white that show the audience how the characters communicate in a video about Speechless

Fowler then directs his laser pointer to letters on his communication board and Yarbrough reads the words out loud as they appear on screen.

This promotional video goes beyond simply placing captions on screen; the actor begins to actually interact with the captions, as the following images show (Figures 1.9, 1.10, and 1.11). After Yarbrough introduces the name of their show, Speechless, with the first syllable in white and the second syllable in blue, he uses his right arm to “push” the word off screen.

Cedric, standing next to Micah, is speaking. The word "Speechless” appears at the bottom, with the first part of the word in white text and the second part in blue.

Figure 1.9: Interacting with the title of the show, Speechless, by pushing it offscreen

Cedric, standing next to Micah, is speaking. Cedric's arm is pushing the word “Speechless” off screen with only part of the word shown.

Figure 1.10: Interacting with the title of the show, Speechless, by pushing it offscreen

Cedric, standing next to Micah, is speaking. Cedric's arm is pushing the word “Speechless” off screen with only part of the word shown.

Figure 1.11: Interacting with the title of the show, Speechless, by pushing it offscreen

These physical and textual interactions—two of which are shown in the following images—are effective in making viewers experience the construction of meaning through multiple modes: speech, bodies, and visual text. After Yarbrough pushes the word Speechless off screen, his arm returns to his side and he points to the viewer. In Figure 1.12, Yarbrough’s gesture guides viewers’ eyes to the next word that appears on screen next to him.

The second screenshot (Figure 1.13) is taken from later in the promotional video. When the sentence Fowler is spelling out starts to become suggestive, Yarbrough uses his arms to physically brush the words on screen away and to say to a laughing Fowler, “Keep this clean!” The natural, informative, and lighthearted feel of the scene organically welcomes viewers into their communication dynamics.

Cedric is standing next to Micah, speaking as he points to the area next to him where the word "In" appears.

Figure 1.12: Interacting with captions by waving them offscreen

Cedric is standing next to Micah, waving away the word “come”

Figure 1.13: Interacting with captions by waving them offscreen

While the other two Speechless promotional videos embedded open captions into prerecorded scenes from the pilot episode, this particular promotional video was intentionally designed to interact with the words on screen. The fact that Yarbrough physically brushes the captions away clearly indicates that the action was premeditated. This promotional video also reorients the camera to a portrait orientation so that there is space for both the seated Fowler and the standing Yarbrough with captions appearing above Fowler’s head (and thus next to Yarbrough’s face) or below Yarbrough’s head (and thus next to Fowler’s face).

Embodied meaning is constructed through the synchronization of eye contact and words, just as the character JJ verbalizes meaning through the movement of his eyes in contact with his communication board. The captions in this singular video are integral: space has been designed for the captions during the production and editing process, the captions provide visual access to their bodies and facial expressions, and they embody the multimodal nature of communication in this show. Yarbrough and Fowler immerse viewers in their interdependent process of supporting each other to communicate to their online audience.

I shared the Speechless clip of Yarbrough and Fowler with deaf and hard of hearing (DHH) participants as a small component of a larger research study devoted to DHH individuals’ perspectives on captions (Butler, 2020). Participants had different responses to the rate at which the stylized captions in this Speechless clip appeared; a factor seemed to be their hearing level. Some who used their hearing approved of and enjoyed the pacing of the captions, while some profoundly Deaf participants felt that the words appeared too rapidly on screen, especially when popping up on opposite sides of the screen. At the same time, some recognized that the design choices reflected the social media context as well as the potential for the world to become accustomed to captions on screen. With the rise of vertical short-form videos and the increase in visual communication in social media videos, DHH and hearing individuals may be becoming more attuned to dynamic captions, as explored further in a later chapter. As captioning practices evolve, we should continue to recognize the importance of balancing aesthetic/alternative approaches (or stylized appearance, as commonly found in social media videos) with accessible approaches (Butler, 2020).

JUXTAPOSITION: SPACE AND EMBODIMENT IN A VIDEO

The fundamental themes of space and embodiment coalesce in the Speechless video, further reinforcing how captions and subtitles can be seen as the embodiment of accessible multimodal communication in our videos. Below is a link to a video; as you view it, consider the affordances, or benefits, of designing a space for captions and subtitles so that they embody multimodal and multilingual messages in accessible ways.

In this short video, I capitalize on the affordances of subtitled videos to reinforce the value of space and embodiment. At one moment in the video, I begin to sign, “Now, let’s think about space.” After I sign, “…about,” I look next to me and place my hands under the subtitles to emphasize that line. Then, as one hand remains under the subtitles for emphasis, my other hand signs the word “space” and I move that hand around the line. By using both hands to reinforce the concept of space in more ways than one, I explicitly show and celebrate my embodiment and embodied rhetorics. Creators can likewise consider different strategies for foregrounding multimodal and embodied communication with subtitles on screen.

By integrating subtitles into the space around me, and signing around the subtitles, I am advocating for the value of my embodiment as a Deaf multimodal composition scholar in ways that might not be possible with more traditional subtitles found at the bottom of the screen. Video creators can carefully consider the most suitable approach for captioning and subtitling their own videos in different contexts. While we might not always integrate captions or subtitles into the center of the screen, we can all consider how we can use captions and subtitles to embody the ways in which we experience the world and interact with each other through multiple modes and languages.

The key themes of space and embodiment connect with the other themes in my embodied multimodal approach, including the theme of access.

ACCESS

I recognize the limitations of endorsing captions as access since captions are a visual mode of communication. I consciously focus on visual access—which is not full access—when valuing captions and subtitles in improving the accessibility of multimodal communication. This focus is informed by Kleege (1999, 2005, 2016a, 2016b), who vividly makes readers sense her embodied experiences as a legally blind individual and understand that the transmission of messages through reading, writing, and communication can occur and be accessed in multiple and different ways with the goal of connecting to another individual outside the self. Kleege⁠3 (2005) also informs my use of visual terms, particularly when she writes about how our “language” has been “designed by and for the sighted” (p. 180) and that Helen Keller argued that “to deny her the use of seeing-hearing vocabulary would be to deny her the ability to communicate at all” (p. 185). With full awareness of the sighted language that I use, I ask readers to work with me and attend to how we exclude certain bodies from our compositions and to how we can design access into our compositions for different bodies.

My exploration of captions and subtitles demonstrates the benefits and the limitations of different styles and approaches, in full alignment with the work of disability studies scholars, particularly Jay Dolmage (2008, 2009, 2017), who have articulated that access is always a work-in-progress and a process, not a checklist to satisfy and complete. A collaborative web article shows the value of designing inclusive multimodal compositions while attending directly to the needs of users and creators with disabilities (Yergeau et al., 2013). As Yergeau states regarding participatory design of access, we must include different bodies in “the design of social and virtual spaces,” and design is “an act of embodiment and reclamation” (Yergeau et al., 2013, n.p.). Such scholarly works remind us of a main purpose of accessibility and multimodality: making rhetorical and aesthetic decisions that enhance a composition while always ethically attending to viewers/readers with specific embodiments.

This book can further contribute to a culture of access in rhetoric and composition and related spaces (Brewer et al., 2014; Hubrig et al., 2020; Womack, 2017) as well as in digital media accessibility and participation through access to captioned media (Ellcessor, 2012, 2016, 2018). We can pull captions and subtitles to the center of our online and offline conversations.

In my theoretical framework, I build on Kerschbaum’s call for us to “transform the reactive dimensions of providing access” by communicating meaning as equally as possible across multiple modes (Yergeau et al., 2013, n.p.). Like Kerschbaum, I do not challenge the use of modes to complement each other, “such as the way a musical score can enhance an audience’s feeling or mood alongside visual cues during a well-realized film” (n.p.). Rather, this approach critiques “the way that multimodality almost universally celebrates using multiple modes without considering what happens if a user cannot access one or more of them” (n.p.). As Dolmage (2017) asks in his disability studies-informed interrogation of multimodality, “In what ways will the [multimodal] text move, move through, or move past (which bodies)? Reception needs to be reconsidered in terms of accessibility—this expands the author’s responsibility” (p. 113). Furthermore, “Which bodies can take up texts and move (with) them?” (p. 114).

Throughout this book, I center Deaf individuals, who are arguably more mindful of the value of captions, given our reliance on them to access spoken content—an awareness that leads many of us to caption and subtitle our own videos. Hamraie (2016) argues that universal design—through which we design for differences—should not use disability-neutral terms such as “all users” (p. 297) because such discourse suggests that design is stable and does not need to be adapted for individual differences. Instead, we should be informed by critical disability theories that “claim disability, treat disabled users as valuable knowers and experts, understand accessibility as an aesthetic and functional resource, and foreground the political, cultural, and social value of disability embodiments” (Hamraie, 2016, p. 304). In this book, the exploration of captions and subtitles is appropriately informed by Deaf experiences to increase the accessibility of multimodal communication across different spaces while strengthening our understanding of the value of open and closed captions and subtitles for composers and audiences with a range of hearing levels.

My line of research and this book intersect with the translation and accessibility work of filmmaker and academic researcher Pablo Romero-Fresco of Universidade de Vigo in Spain. He has published consistently on media accessibility and subtitles, including an article (Romero-Fresco, 2021) that builds on my exploration of Gallaudet: The Film (Bauman & Commerson, 2010) and Sean Zdenek’s (2018) Kairos article. In two more recent articles, Romero-Fresco and Dangerfield (2022) provide an academic study of creative and alternative media access, and Romero-Fresco (2022) examines creative media accessibility, including subtitling practices that “seek to become an artistic contribution in their own right and to enhance user experience in a creative or imaginative way” (Romero-Fresco & Dangerfield, 2022, p. 23; Romero-Fresco, 2022, p. 305). These explorations of practices that show the subjective, creative, transformative, and individualistic nature of subtitles also include Romero-Fresco’s studies of subtitled and captioned media created by Deaf and disabled creators. As he points out, more analysis, study, and training in this area need to occur.

Romero-Fresco’s research and activities include an in-depth exploration of how films are translated and subtitled in different languages. His intensive work includes a commitment to accessible filmmaking, “the consideration of translation and/or accessibility” (2019, p. 5) during the creation of the original media so that filmmakers can make choices in how their films are translated and made accessible across languages and other versions (p. 6). In other words, there can be more of a collaboration between filmmakers and translators that benefits “persons with sensory disabilities and foreign viewers” (p. 17). These choices extend to creative subtitles, with attention to font, size, placement, rhythm, display mode, and effects (pp. 209–210). By making the filmmaking process more accessible, filmmakers can strengthen not only “access to content,” but also “access to creation” (p. 14).

The aesthetic benefits of integrating subtitles are supported by Fox’s 2017 study of placement strategies in various commercial films and her 2018 study of how minimizing the distance between subtitles and the intended focus point on screen can improve viewers’ attention to visuals and the aesthetic experience. In this book, I guide us through different trends and styles to demonstrate the benefits and limitations of closed and open captions and subtitles in all their forms and placements on screen so that we can expand our commitment to captions and subtitles in our conversations and compositions—and thereby embody the value of accessible multimodal communication.
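For creators who want to experiment with the placement strategies that Fox studies, the WebVTT caption format used for web video already offers some positional control through cue settings such as line, position, and align. The sketch below is a minimal, hypothetical illustration—the timings and cue text are invented, not drawn from any media discussed in this book—of how a caption can be moved away from the conventional bottom of the screen and closer to an on-screen focus point:

```vtt
WEBVTT

NOTE A cue placed near the top of the frame,
beside a speaker's face rather than at the bottom.

00:00:01.000 --> 00:00:03.500 line:20% position:70% align:center
[door creaks open]

NOTE A conventional bottom-center cue, for comparison.

00:00:04.000 --> 00:00:06.200 line:85% position:50% align:center
- Did you hear that?
```

Because rendering of these settings varies across browsers and video players, creators experimenting with integral placement should test their cues on the platforms their audiences actually use.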

MULTIMODALITY AND CAPTIONS

Multimodal studies in rhetoric and composition—as reflected in Palmeri’s 2012 history of the field, contemporary scholarship in multimodality, and Zdenek’s (2011, 2015, 2018) line of work—reveal the importance of written text in interaction with other modes. I have long perceived captions and subtitles as a bridge across multiple modes: aural, visual, gestural, spatial, and linguistic (Arola et al., 2013). In other words, written text on screen embodies multimodal communication through sound, visuals, gestures, space, and language. Captions and subtitles seem like a natural extension of Selfe’s (2009) argument for aurality as an important part of multimodal composition, especially since when we compose captions and subtitles, we are designing multiple modes—visual, textual, and other modes—to reach our audiences (Kress, 2003, 2005; Kress & van Leeuwen, 2001; Takayoshi & Selfe, 2007).

When attending carefully to the role of captions and subtitles in bridging sound, visuals, body language, and other aspects on screen, we can recognize how the modes interact and contribute to what Halbritter (2012) defines as multidimensional rhetoric: “rhetoric that integrates a variety of modes, media, and genres—sound, images, language, music, etc.” (p. 26). In aural-visual compositions, as Halbritter makes clear, the layers of aural media and visual media overlap and interact in certain ways as they participate in the overall rhetorical mission of the filmmaker’s composition (p. 104). For instance, what is the significance of a change in the aural layer when the visual layer remains the same, such as when the music changes abruptly but the characters remain still? If the scene is uncaptioned, viewers who rely on captions might not be able to answer that question.

Any study of captions must credit Zdenek for bringing a rhetorical approach to captioning studies. His 2015 book, Reading Sounds, thoroughly investigates the rhetorical choices that professional captioners of films and television shows make in transcribing sound in written form when creating closed captions, such as how different captioners might describe the same sound as [dramatic music], [music playing], [♪], or other effects in square brackets for DHH audiences. Professional captioners and amateur video creators, such as students in Lueck’s (2013) composition course, do “make significant rhetorical decisions about how voice, identity, language(s), and meaning are represented on the screen” (n.p.). It is certainly important to consider the choices professional captioners make in describing music, ambient noise, and other non-speech elements of our compositions to make meaning accessible to viewers. However, Zdenek reminds us that television shows and films are captioned by captioners who are independent from, and often never in touch with, the directors/producers of these media, an argument that Udo and Fels (2010) make when pointing out that captioning is usually not part of the creative process of television and film. The examples explored in this book as embodied, multimodal moments are, more often than not, captions and subtitles that are more central to the creative process or composition than the closed captions created by the professional captioners whom Zdenek studies.

In contrast to Zdenek’s (2015) extensive study of nonspeech sounds, usually found within brackets in closed captions, this book is a curation of open and closed captions and subtitles that embody accessible communication—such as moments in which creators, performers, and audiences connect through words on screen. Challenging the conventions of captions, Zdenek’s continuing work brings us his published experiments using Adobe After Effects to create enhanced and animated captions for popular clips from movies and television shows, as well as his call for us to “make room for captions” and imagine “futures [for captioning] that … elevate the needs of viewers who are deaf and hard of hearing” (“Designing Captions,” 2018, n.p.). He asks, “What if we didn't simply argue for the importance of closed captioning but treated it as a significant variable in our multimodal analyses and productions?” (n.p.).

INTERDEPENDENCE

At the heart of communication across differences, including accessible communication and multilingual communication, is the interdependent nature of our connections. In meaningful, multilayered conversations and compositions, each participant (from creator to performers to audiences) in a dialogue contributes to the construction of meaning.

The use of interdependence in this book draws from a 2020 article I collaborated on with Laura Gonzales in which we brought together multimodality, multilingualism, and accessibility in writing studies through interdependent and intersectional approaches. This article extended the recognition of intersectionality and interdependence and built on Gonzales’ inspiring accomplishments in making her translation studies and interviews accessible across languages and modes, including multimodal and multilingual videos in Spanish and English with subtitles and captions (Gonzales, 2018).

Building on Julie Jung’s 2014 exploration of interdependency in writing studies, Gonzales and I defined interdependency: “In contrast to independence, interdependency is a product of the human condition in which we all rely on other human beings in various ways through different relationalities” (n.p.). Jung’s work on interdependency was in turn dependent on the work of disability studies scholars who fully recognized the interdependent nature of our lives; Jung pointed to the value of depending on each other and the possibilities of forming new connections (2014).

Captions and subtitles embody the interdependence of these connections, and this book brings together scholarship on interdependency in disability studies (Jung, 2014; Price, 2012; Price & Kerschbaum, 2016; Wheeler, 2018), disability justice (Berne et al., 2018; Hamraie, 2013; Mingus, 2011), digital writing research (VanKooten, 2019), and multilingual and multimodal writing (Gonzales & Butler, 2020). Interdependency and access dovetail in Price and Kerschbaum’s (2016) collaborative description of their “interdependent disability-studies (DS) methodology” (p. 20) and “our commitment to collective access—i.e., access not just for our participants alone, or for us alone, but for all of us together” (p. 28).

Just as Price and Kerschbaum worked with and relied on each other and their participants to ensure access for different bodies and experiences, I ask readers to consider the valuable role of captions and subtitles in supporting the interdependent nature of multilayered communication. Throughout the examples in this book, we will explore how individuals are interdependent in the process of understanding meaning across languages and modes of communication and how captions and subtitles enhance the connections we form with each other.

EMBODIED MULTIMODAL APPROACH

Space, embodiment, access, multimodality and caption studies, and interdependence coordinate to form the foundation for our journey through (closed and open) captioned and subtitled screens. They are core components of the six criteria in my embodied multimodal framework, which serves as a paradigm for identifying and evaluating the accessibility, effectiveness, and applicability of captioning and subtitling styles that embody the interdependent process of accessing and connecting meaning across modes of communication.

The embodied multimodal framework attends directly to captions and subtitles and accessible, embodied, multimodal communication as informed by multimodal analysis approaches that analyze how modes are interrelated and work meaningfully together (Maier et al., 2007; Norris, 2004; Norris & Maier, 2014; Scollon & LeVine, 2004). Norris (2004) explains how the multimodal analysis framework makes apparent when modes are “interlinked and often interdependent” (p. 102); Maier et al. (2007) point out how the approach includes “the connections across communicative modes” (p. 454); and De Saint-George (2004) notably attends to the “spatial emplacement” of language use in action (p. 72). We communicate within and through different spaces with other bodies to construct meaning, and this meaning can be shaped by the spaces themselves.

Furthermore, the embodied multimodal approach recognizes the moments in which captions and subtitles are designed to be integral components of a video, as opposed to captions and subtitles added to a video after its creation, while also identifying the effectiveness of traditional captions and subtitles in embodying multimodal communication with audiences. That recognition supports the analysis of the benefits and limitations of different caption and subtitle styles in this work.

Since many of the scenes analyzed in this book foreground multilingual communication, including ASL on screen, the embodied multimodal approach is supplemented by an understanding of Fraiberg’s (2010, 2013) multilingual multimodal approach. Fraiberg’s approach studies how individuals mediate meanings through various modes of communication, including languages, in the same space as they construct meanings “in a multilingual dialogue not only with one another, but with other texts…” (p. 23). The fluidity of language that Fraiberg celebrates is sensed in the spatial arrangement of captions and subtitles on screen, which in turn reflects Fleckenstein’s celebration of the “multidimensionality” of imageword, or how we make meaning through the relations between words and images (2003, p. 21). As a salient example, to borrow from Sanchez (2015), ASL is “dramatically, visible” (p. 28) and can help us understand the “intersections between images, bodies, and text” (p. 31). As discussed throughout the book, creators, performers, and audiences can engage in a multilingual, multi-textual dialogue with ASL, captions and subtitles, and other embodied modes of communication.

The six criteria I developed are based on years of research. The criteria began as five that I developed for the explicit purposes of analyzing and designing integral captions and subtitles, particularly those that creators intentionally design a space for within the screen (Butler, 2018b). In a later collaborative research project (2023), my colleague Stacy Bick, a visual communications senior lecturer, and I interviewed DHH filmmaking students about their experiences composing videos with sound and captions; the analysis revealed the depths to which DHH creators considered the communication preferences of a predominantly hearing audience and the access needs of DHH audiences and themselves. Our findings led to the addition of a sixth criterion, on audience awareness, for the analysis and development of accessible multimodal compositions.

These criteria, attending specifically to the visualization of captions and subtitles, are:

1. Space for Captions and Subtitles and Access: Space has been reasonably designed for captions and subtitles.

This may mean that literal space on screen was dedicated to captions and subtitles (as when performers act with the awareness that integral captions and subtitles will be added next to their bodies in post-production).

This may mean that space has been committed to captions and subtitles (as in media that incorporate open subtitles alongside closed captions).

This may mean that space has been opened in dialogues (as in conversations about captions and subtitles or increased attention to the value of captions and subtitles).

2. Visual or Multiple Modes of Access: Captions and subtitles create or support visual access or multiple modes of access to the meaning of the video.

This may mean that captions and subtitles are placed strategically in key locations around screen (as when captions and subtitles are integrated next to faces or bodies).

This may mean that captions and subtitles enable readers to access meaningful sounds or moments occurring on screen.

3. Embodied Rhetorics and Experiences: Captions and subtitles enable audiences to experience meaning through the body in different ways.

This may mean that captions and subtitles enable viewers to sense performers’ embodied rhetorics (as in integral subtitles that move alongside signing bodies).

This may mean that captions and subtitles enable viewers to experience sound through the body.

4. Multimodal and/or Multilingual Communication: Captions and subtitles support the interdependent nature of multimodal and multilingual communication and the interconnection of modes, languages, and meaning.

This may mean that captions and subtitles capture different languages in written form.

This may mean that captions and subtitles represent meaning through different modes, such as emphasis of sounds or signs, or through interaction with other modes.

5. Rhetorical and Aesthetic Principles: Captions and subtitles reasonably enhance the rhetorical and aesthetic qualities of the video.

This may mean that captions and subtitles are clearly incorporated to support the purpose of the video and to communicate the creator’s message.

This may mean that captions and subtitles are clearly and aesthetically presented, including through appearance and readability.

6. Audience Awareness: Captions and subtitles are presented in a way that demonstrates awareness of how different audiences would engage with and access the composition in different ways.

This may mean that captions and subtitles are included to support audiences’ understanding of a different language.

This may mean that captions and subtitles are included to strengthen audiences’ connection.

This may mean that captions and subtitles are presented to show the value of access.

Each criterion structures our journey through the collection of examples in this book as we explore the effectiveness of captioned and subtitled moments to appreciate the central role of captions and subtitles in engaging with differences throughout our multilayered spaces of communication.

1. A study released in 2016 (the year Speechless premiered) found that 95% of disabled television characters are played by able-bodied actors (Anderson, 2016).

2. While I value eye contact, I recognize that everyone experiences eye gaze differently and we cannot always expect eye contact. Price (in an individually written section in Price & Kerschbaum, 2016) reflectively interrogates the importance placed on eye gaze and describes “eye contact as exhausting” because “processing information on faces is cognitively demanding” (p. 43). We must recognize the diversity of experiences in our exploration of the design of captions and subtitles on screen.

3. Kleege (1999) distinguishes between literal vision, or “the ability to receive and process visual stimuli,” and figurative vision, or “the ability to have, manipulate, and communicate ideas” (p. 145).