Gazetteer of Southern Vowels

This site was created to allow you to interact with data extracted from the Digital Archive of Southern Speech. Please report bugs, suggestions, or comments to Joey Stanley at joeystan@uga.edu.

Where do the data come from?

The Digital Archive of Southern Speech (DASS) is an audio corpus of semi-spontaneous linguistic atlas interviews (Kretzschmar et al. 2013) derived from the Linguistic Atlas of the Gulf States (Pederson et al. 1986). It contains speech from 64 natives (34 men and 30 women, born 1886–1965) of 8 Southern US states. This sample contains a mixture of ethnicities, social classes, education levels, and ages.

As of October 2019, transcription, forced alignment, and acoustic analysis of DASS have been completed. For insight into the methods, see Renwick et al. (2017) and Olsen et al. (2017). We use the Montreal Forced Aligner for forced alignment and FAVE for formant extraction. We have removed all filters from FAVE so that all vowel tokens, whether they be from unstressed syllables or stopwords, are included here. Currently, this site displays 1,374,992 vowel tokens from 63 speakers.

You may download the audio, transcriptions, TextGrids, speaker bios, and other information at the DASS portion of the Linguistic Atlas Project website (lap.uga.edu).

The corpus can be licensed from the Linguistic Data Consortium, and the Linguistic Atlas Project also hosts mp3s, speaker biographies, and more.

What does this site do?

Currently, the site has four main pages:

  • Vowel Plot Comparison: On this page, you can subset the DASS data by many demographic attributes and view the corresponding speakers' vowel tokens plotted in F1, F2 space. You can also subset by stress, vowel, word, and following consonant and choose which normalization technique (if any), filtering, and transcription system should be used. The plots are highly customizable, so you can change how the data are displayed. Two graphs are included on this page to facilitate side-by-side comparison of subgroups, given a large enough screen. Below each graph are tables that give basic summaries of the speakers and the vowels selected.
  • Interactive Vowel Plot: Here you can focus on specific portions of individual speakers' vowel space and see words rather than just points. If you click on the plot itself, a table at the top will display the five points nearest to where you clicked, showing you exact formant measurements, the word, and the speaker associated with that observation.
  • Point Pattern Analysis: This is an alternative way of viewing the vowel space, pioneered by Kretzschmar. On this page, you can subset the data in the same way as on the other two pages and see a scatterplot in F1, F2 space. The underlaid grid indicates how many observations lie in each cell, and you can control the number of rows and columns in the grid. Below the plot is a chart of the cell counts, plotted in decreasing order of density. The resulting chart follows an Asymptotic Curve (or simply, "A-Curve").
  • Speaker Info: The speaker info page allows you to explore the metadata and distribution of speakers in DASS. The map has some flexibility as to how various demographic categories are displayed.
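
Two of the computations described in the list above, finding the points nearest to a click and counting observations per grid cell, can be sketched in a few lines. The site itself is written in R, so the Python below is only an illustration and not the site's code. The function names, the dictionary fields, the sample tokens, and the use of plain Euclidean distance in Hz are all assumptions made for this example.

```python
from collections import Counter
from math import hypot

# Invented sample tokens; real DASS measurements will differ.
tokens = [
    {"word": "bit", "F1": 400, "F2": 2000},
    {"word": "bet", "F1": 550, "F2": 1800},
    {"word": "bat", "F1": 700, "F2": 1700},
    {"word": "boot", "F1": 350, "F2": 1000},
    {"word": "bought", "F1": 650, "F2": 1000},
    {"word": "bid", "F1": 420, "F2": 1950},
]


def nearest_tokens(tokens, click_f2, click_f1, k=5):
    """Return the k tokens closest to a clicked point (Euclidean distance in Hz)."""
    return sorted(
        tokens,
        key=lambda t: hypot(t["F2"] - click_f2, t["F1"] - click_f1),
    )[:k]


def grid_counts(tokens, f1_range, f2_range, n_rows, n_cols):
    """Count tokens per grid cell; cells are keyed by (row, col)."""
    f1_min, f1_max = f1_range
    f2_min, f2_max = f2_range
    counts = Counter()
    for t in tokens:
        if not (f1_min <= t["F1"] <= f1_max and f2_min <= t["F2"] <= f2_max):
            continue  # token falls outside the gridded area
        # Scale each formant to a cell index; clamp the upper edge into the last cell.
        row = min(int((t["F1"] - f1_min) / (f1_max - f1_min) * n_rows), n_rows - 1)
        col = min(int((t["F2"] - f2_min) / (f2_max - f2_min) * n_cols), n_cols - 1)
        counts[(row, col)] += 1
    return counts


def a_curve(counts, n_rows, n_cols):
    """Cell counts in decreasing order, including empty cells."""
    cells = [counts.get((r, c), 0) for r in range(n_rows) for c in range(n_cols)]
    return sorted(cells, reverse=True)


closest = nearest_tokens(tokens, click_f2=1980, click_f1=410, k=2)
curve = a_curve(grid_counts(tokens, (300, 800), (800, 2200), 2, 2), 2, 2)
```

With real data and a finer grid, plotting `curve` against its rank gives the kind of decreasing chart described for the Point Pattern Analysis page.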

New content is being added regularly, so check back for additional features. See the bottom of this page for updates on recent changes.

How is this site powered?

This site is built in Shiny, a web application framework for R. With Shiny, users can draw on the computational power of the R programming language without having to learn R or install it on their computers. The app is bundled up and put on the web to take advantage of the interactive capabilities of web browsers. See the bottom of this page for a list of the specific packages used to process the data and create this site.

How is this project funded?

This research is supported by: NSF BCS #1625680 to co-PIs Kretzschmar and Renwick, the University of Georgia Graduate School, and the American Dialect Society.

Who is involved?

The PIs for this project are William Kretzschmar and Margaret E. L. Renwick, of the University of Georgia. Our team includes several graduate student researchers: Mike Olsen, Rachel Olsen, Lisa Lipani, Jeremy Shi, and Joey Stanley, with assistance from Josh McNeill and Keiko Bridwell. We also had several dozen undergraduate student workers, funded by ADS or NSF, who did most of the transcribing work.

Contact information

For more information, please contact Joey Stanley at joeystan@uga.edu.

How can I cite this resource?

If you use or refer to this website, you must cite the Gazetteer of Southern Vowels as follows:

  • Stanley, Joseph A., Margaret E. L. Renwick, William A. Kretzschmar Jr., Rachel M. Olsen, & Michael Olsen. (2018). “The Gazetteer of Southern Vowels.” The American Dialect Society Annual Meeting. Salt Lake City, UT.

Bibliography

The following is an ongoing list of research that is directly related to DASS or utilizes its data.

Publications (alphabetical)

  • Kretzschmar, William A., Paulina Bounds, Jacqueline Hettel, Lee Pederson, Ilkka Juuso, Lisa Lena Opas-Hänninen, and Tapio Seppänen (2013). "The Digital Archive of Southern Speech (DASS)." Southern Journal of Linguistics, 27 (2). 17–38.
  • Olsen, Rachel M., Michael L. Olsen, & Margaret E. L. Renwick (2018). "The impact of sub-region on /aɪ/ weakening in the U.S. South." Proceedings of Meetings on Acoustics 31, 060005; doi: https://doi.org/10.1121/2.0000879.
  • Olsen, Rachel M., Michael Olsen, Joseph A. Stanley, Margaret E. L. Renwick, & William A. Kretzschmar, Jr. (2017). "Methods for transcription and forced alignment of a legacy speech corpus." Proceedings of Meetings on Acoustics 30, 060001; doi: http://dx.doi.org/10.1121/2.0000559.
  • Pederson, L., McDaniel, S. L., and Adams, C. M. (Eds.) (1986). Linguistic Atlas of the Gulf States, University of Georgia Press, Athens, Georgia, Vols. 1–7.
  • Renwick, Margaret E. L. & Rachel M. Olsen (2017). "Analyzing dialect variation in historical speech corpora." The Journal of the Acoustical Society of America 142, 406; doi: https://doi.org/10.1121/1.4991009.
  • Renwick, Margaret E. L. & Joseph A. Stanley (2017). “Static and dynamic approaches to vowel shifting in the Digital Archive of Southern Speech.” Proceedings of Meetings on Acoustics 30, 060003; doi: http://dx.doi.org/10.1121/2.0000582.

Conference Presentations (chronological)

  • Stanley, Joseph A. & Margaret E. L. Renwick (2020). "Back vowel distinctions and dynamics in Southern US English." 94th Annual Meeting of the Linguistic Society of America. New Orleans, LA.
  • Kretzschmar, William A., Margaret E. L. Renwick, Joseph A. Stanley, Katie Kuiper, Lisa Lipani, Michael Olsen, & Rachel Olsen (2020). "The View of Southern Vowels from Large-Scale Data." The American Dialect Society Annual Meeting. New York City, NY.
  • Olsen, Rachel Miller (2020). "Social identity is a pitch: Expressing who you are through prosody." The American Dialect Society Annual Meeting. New York City, NY.
  • Jones, Jonathan & Margaret E. L. Renwick (2020). "Heterogeneity in Southern speech: Evidence from the Mississippi Delta." The American Dialect Society Annual Meeting. New York City, NY.
  • Bigott, Bailey & Margaret E. L. Renwick (2019). "Diving into DASS: A multimedia exploration of Southern Speech." 6th Annual Linguistics Conference at UGA. Athens, GA.
  • Stanley, Joseph A. (2019). "Real Time Vowel Shifts in Georgia English." 6th Annual Linguistics Conference at UGA. Athens, GA.
  • Jones, Jonathan & Margaret E. L. Renwick (2019). "Detecting Southern vowel features with GIS mapping." 6th Annual Linguistics Conference at UGA. Athens, GA.
  • Kretzschmar Jr., William A. and Joseph A. Stanley (2019). "Visualization of Big Data phonetics." Digital Humanities Conference 2019. Utrecht, the Netherlands.
  • Lipani, Lisa, Yuanming Shi, Joshua McNeill, & Margaret E. L. Renwick (2019). "Noise reduction in a legacy speech corpus." Poster presentation at the 177th Meeting of the Acoustical Society of America (ASA). Louisville, KY.
  • Stanley, Joseph A. & Margaret E. L. Renwick (2019). "Social factors in Southern US speech: Acoustic analysis of a large-scale legacy corpus." 93rd Annual Meeting of the Linguistic Society of America. New York City, NY.
  • Olsen, Rachel, Joseph A. Stanley, Mike Olsen, Lisa Lipani, & Margaret E. L. Renwick. (2019). "Reconciling perception with production in Southern speech." The American Dialect Society Annual Meeting. New York City, NY.
  • Stanley, Joseph A. & Margaret E. L. Renwick (2018). "Finding pockets of social variation in the Digital Archive of Southern Speech." 5th Annual Linguistics Conference at UGA. Athens, GA.
  • Olsen, Rachel M. & Margaret E. L. Renwick (2018). "The Impact of Social Factors on Vowel Duration in Natural Southern Speech." 85th Meeting of the SouthEastern Conference on Linguistics. Blacksburg, VA.
  • Stanley, Joseph A., Margaret E. L. Renwick, William A. Kretzschmar Jr., Rachel M. Olsen, & Michael Olsen (2018). “The Gazetteer of Southern Vowels.” The American Dialect Society Annual Meeting. Salt Lake City, UT.
  • Foster, Shawn, Joseph A. Stanley, & Margaret E. L. Renwick (2017). "Vowel Mergers in the American South." Poster presentation at the 174th Meeting of the Acoustical Society of America (ASA). New Orleans, LA.
  • Olsen, Rachel, Michael Olsen, & Margaret E. L. Renwick. (2017). "Acoustically quantifying /ai/ monophthongization in four southern dialect regions." Poster presentation at the 174th Meeting of the Acoustical Society of America (ASA). New Orleans, LA.
  • Olsen, Rachel M. & Michael Olsen (2017). "Lexical frequency effects on the southern shift in the Digital Archive of Southern Speech." New Ways of Analyzing Variation (NWAV) 46. Madison, WI.
  • Olsen, Rachel M. & Margaret E. L. Renwick (2017). "Linking acoustic correlates of rhoticity to perception: How the past informs the present." New Ways of Analyzing Variation (NWAV) 46. Madison, WI.
  • Renwick, Margaret E. L., Michael Olsen, Rachel M. Olsen, & Joseph A. Stanley (2017). "Transcription and forced alignment of the Digital Archive of Southern Speech." Poster presentation at the 173rd Meeting of the Acoustical Society of America (ASA). Boston, MA.
  • Renwick, Margaret E. L. & Joseph A. Stanley (2017). “A historical perspective on vowel shifting: Acoustic analysis of the Digital Archive of Southern Speech.” Poster presentation at the 173rd Meeting of the Acoustical Society of America (ASA). Boston, MA.
  • Kretzschmar, William A., Joseph A. Stanley, & Katherine Kuiper (2017). "Automated Large-Scale Phonetic Analysis: DASS." 84th Meeting of the SouthEastern Conference on Linguistics. Charleston, SC.
  • Olsen, Rachel M., Michael Olsen, Joseph A. Stanley, & Margaret E. L. Renwick (2017). "Transcribing the Digital Archive of Southern Speech: Methods and Preliminary Analysis." 84th Meeting of the SouthEastern Conference on Linguistics. Charleston, SC.
  • Olsen, Rachel M., Michael Olsen, Katherine Kuiper, Joseph A. Stanley, Margaret E. L. Renwick, & William A. Kretzschmar, Jr. (2017). “New Perspectives on Historical Southern Speech.” Panel presented at the 2017 Integrative Research and Ideas Symposium. Athens, GA.

R Packages

  • Eric Bailey (2015). shinyBS: Twitter Bootstrap Components for Shiny. R package version 0.61.
  • Original S code by Richard A. Becker, Allan R. Wilks. R version by Ray Brownrigg. Enhancements by Thomas P Minka and Alex Deckmyn. (2016). maps: Draw Geographical Maps. R package version 3.1.1.
  • Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan McPherson (2017). shiny: Web Application Framework for R. R package version 1.0.5. See also shiny.rstudio.com
  • Hadley Wickham (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. (R package version 2.2.1 is used here.) See also ggplot2.tidyverse.org
  • Hadley Wickham (2016). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.2.0. See also stringr.tidyverse.org
  • Hadley Wickham, Romain Francois, Lionel Henry and Kirill Müller (2017). dplyr: A Grammar of Data Manipulation. R package version 0.7.4. See also dplyr.tidyverse.org
  • Hadley Wickham, Jim Hester and Romain Francois (2017). readr: Read Rectangular Text Data. R package version 1.1.1. See also readr.tidyverse.org
  • Yihui Xie (2016). DT: A Wrapper of the JavaScript Library 'DataTables'. R package version 0.2.
  • Achim Zeileis (2014). ineq: Measuring Inequality, Concentration, and Poverty. R package version 0.2-13.

Change Log

Version 1.6 (November 26, 2019)

  • DASS has now been fully processed, so the GSV now contains the full dataset. Some minor textual changes have been made to this page to reflect the completed work, including a link to the DASS portion of the Linguistic Atlas website where you can download audio and transcriptions.
  • In fact, this site now contains two versions of the data, FAVE and DARLA. You can read more about them, and toggle between them using the new "corpus" tab.
  • Added several new references: LCGAU6, LSA2020, ASA, ADS.
  • Minor textual changes and bug fixes.

Version 1.5 (April 25, 2019)

  • The underlying dataset has been completely redone! We've completely revamped the methodology to produce what we hope to be a cleaner dataset. The transcriptions have been spell-checked, we're using the Montreal Forced Aligner for forced alignment, and an in-house version of FAVE for formant extraction. This brings the total number of observations from about 988,000 to over 1,642,211. We will soon have a complete version of this modified dataset online.
  • On the Point Pattern Analysis page, you have much more control over the shading of the cells. They are now discrete by default to make it easier to see the differences (but you can change it to continuous or to no color). The number of discrete shades is 4 by default (the data appears to follow a 75-25 rule), but you can change that. And you can now set the color of the darkest shade; when you do, the other shades will be interpolated.
  • On the Point Pattern Analysis page, you can now change the size and opacity of the grid labels.
  • Added several new references: an ASA presentation and a DH2019 presentation.
  • Minor text changes in the help popovers.

Version 1.4.2 (October 3, 2018)

  • Added several new references: A JASA paper, an LSA presentation, and an ADS presentation.

Version 1.4.1 (September 12, 2018)

  • It is now easier to change the x- and y-axis ranges. Fill-in-the-blank boxes have been converted into sliders with ranges.

Version 1.4 (September 11, 2018)

  • It is now possible to download the images! A new "Download" tab has been created that allows you to specify the height, width, quality, and format.
  • The "Plot Option" tab was split into "Plot" and "Customization." The Customization tab was rearranged slightly in preparation for additional options.
  • Added Stanley & Renwick's (2019) LSA presentation to the bibliography.

Version 1.3 (August 28, 2018)

  • We've added measurements from over 200,000 vowels to the corpus, bringing the total to 988,217. All speakers are now represented in the corpus with the exception of speaker 850 who had pretty awful data for some reason.
  • Updated the "Joey's filter" procedure to the latest development.
  • A third filtering option, the Mahalanobis distance, is now available.
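
As a rough illustration of what a Mahalanobis distance filter does, the sketch below computes each (F1, F2) point's distance from the group mean, scaled by the group's covariance, and drops points beyond a cutoff. This is a generic Python sketch, not the site's R implementation: the function names, the cutoff of 2.5, the sample measurements, and the idea of applying it to one group of tokens at a time are all assumptions.

```python
from math import sqrt
from statistics import mean


def mahalanobis_distances(points):
    """Mahalanobis distance of each (F1, F2) point from the group mean."""
    n = len(points)
    m1 = mean(p[0] for p in points)
    m2 = mean(p[1] for p in points)
    # Sample covariance matrix [[s11, s12], [s12, s22]].
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    det = s11 * s22 - s12 ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular; need more varied points")
    distances = []
    for f1, f2 in points:
        d1, d2 = f1 - m1, f2 - m2
        # d^T S^-1 d, using the closed-form inverse of a 2x2 matrix.
        squared = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
        distances.append(sqrt(max(squared, 0.0)))
    return distances


def filter_outliers(points, threshold=2.5):
    """Keep points whose Mahalanobis distance is at most the (hypothetical) threshold."""
    distances = mahalanobis_distances(points)
    return [p for p, d in zip(points, distances) if d <= threshold]


# Invented measurements: eight tokens clustered together plus one bad measurement.
points = [
    (390, 2000), (410, 2000), (400, 1980), (400, 2020),
    (390, 1980), (410, 2020), (390, 2020), (410, 1980),
    (800, 1000),
]
kept = filter_outliers(points, threshold=2.5)
```

Because the outlier itself inflates the covariance estimate, the distances in a small group are bounded, so a sensible cutoff depends on how many tokens are in the group.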

Version 1.2 (June 5, 2018)

  • Updated the PPA page so that the bottom corner of the grid (A1) is in a reasonable place instead of being determined by the max(F1) and max(F2) measurements.
  • In the "Plot Options" tab, the "Fit all data" button has been changed to "Fit Vowel Space". Instead of zooming WAAAY out to fit all the bad data, it zooms to a comfortable vowel space that is consistent across speakers.
  • This standard vowel space is the default rather than the axes adjusting to accommodate all the selected data. This will make comparing different subsets easier, and when looking at a single vowel it'll provide some context as to what portion of the vowel space it occupies.
  • Added this change log ;)

Corpus

We have processed DASS in two different ways. Both methods used identical audio files as input, which were transcribed manually and double- and triple-checked for accuracy.
First, we processed the data using the DARLA pipeline. At the time, DARLA was using ProsodyLab for forced alignment, and then doing formant extraction with FAVE-Extract. Note that DARLA automatically filters out some data, including (most) stopwords, tokens with a short duration, and those with high bandwidth. Because of this filtering, the DARLA corpus is substantially smaller (726,102 tokens) and the plots will often appear more open because there are fewer points plotted.
The other option is to use an in-house pipeline. The audio and transcriptions were force-aligned using the Montreal Forced Aligner and then processed with FAVE-Extract. We did not do any filtering whatsoever, so there is data from every single vowel token (1,374,992 tokens).
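
To see why filtering of the kind described for DARLA shrinks a corpus, here is a minimal Python sketch that drops stopwords, short tokens, and tokens with high formant bandwidth. It is not DARLA's code: the stopword list, both thresholds, the field names, and the sample tokens are hypothetical values chosen only for illustration.

```python
# Hypothetical stopword list; DARLA's actual list is not reproduced here.
STOPWORDS = {"the", "a", "and", "of"}


def darla_style_filter(tokens, min_duration=0.05, max_bandwidth=300.0,
                       stopwords=STOPWORDS):
    """Keep tokens that pass three checks (all thresholds are assumptions)."""
    kept = []
    for t in tokens:
        if t["word"].lower() in stopwords:
            continue  # stopword
        if t["duration"] < min_duration:
            continue  # too short, in seconds
        if max(t["B1"], t["B2"]) > max_bandwidth:
            continue  # formant bandwidth too high, in Hz
        kept.append(t)
    return kept


# Invented tokens, one failing each check and one passing all three.
sample = [
    {"word": "the", "duration": 0.08, "B1": 90.0, "B2": 120.0},
    {"word": "cat", "duration": 0.03, "B1": 85.0, "B2": 100.0},
    {"word": "ride", "duration": 0.15, "B1": 80.0, "B2": 450.0},
    {"word": "time", "duration": 0.20, "B1": 70.0, "B2": 110.0},
]
kept_words = [t["word"] for t in darla_style_filter(sample)]
```

The unfiltered in-house version corresponds to skipping this step entirely, which is why it retains every vowel token.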
