Please wait a few moments for the site to load. If the site appears to be frozen, it's because the GSV is still loading and processing a bunch of data.


Gazetteer of Southern Vowels

This site was created to allow you to interact with data extracted from the Digital Archive of Southern Speech. Please report bugs, suggestions, or comments to Joey Stanley at joeystan@uga.edu.

Where do the data come from?

The Digital Archive of Southern Speech (DASS) is an audio corpus of semi-spontaneous linguistic atlas interviews (Kretzschmar et al. 2013) derived from the Linguistic Atlas of the Gulf States (Pederson et al. 1986). It contains speech from 64 natives (34 men and 30 women, born 1886–1965) of 8 Southern US states. This sample contains a mixture of ethnicities, social classes, education levels, and ages.

As of October 2019, transcription, forced alignment, and acoustic analysis of DASS have been completed. For insight into the methods, see Renwick et al. (2017) and Olsen et al. (2017). We use the Montreal Forced Aligner for forced alignment and FAVE for formant extraction. We have removed all filters from FAVE so that all vowel tokens, including those from unstressed syllables and stopwords, are included here. Currently, this site displays 1,673,205 vowel tokens from 74 speakers.

You may download the audio, transcriptions, TextGrids, speaker bios, and other information at the DASS portion of the Linguistic Atlas Project website (lap.uga.edu).

The corpus can be licensed from the Linguistic Data Consortium, while the Linguistic Atlas Project hosts the mp3s, speaker biographies, and more.

What does this site do?

Currently, the site has the following main pages:

  • Vowel Plot Comparison: On this page, you can subset the DASS data by many demographic attributes and view the corresponding speakers' vowel tokens plotted in F1, F2 space. You can also subset by stress, vowel, word, and following consonant, and choose which normalization technique (if any), filtering, and transcription system should be used. The plots are highly customizable, and you can change how the data are displayed. Two graphs are included on this page to facilitate side-by-side comparison of subgroups, given a large enough screen. Below each graph are tables that give basic summaries of the speakers and the vowels selected.
  • Point Pattern Analysis: This is an alternative way of viewing the vowel space, pioneered by Kretzschmar. On this page, you can subset the data in the same way as on the other pages and see a scatterplot in F1, F2 space. The underlaid grid indicates how many observations lie in each cell, and you can control the number of rows and columns in the grid. Below the plot is a chart of the distribution of the grid cells, plotted in decreasing order of density. The resulting chart follows an Asymptotic Curve (or simply, "A-Curve").
  • Speaker Info: The speaker info page allows you to explore the metadata and distribution of speakers in DASS. The map has some flexibility as to how various demographic categories are displayed.
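
The grid counting behind the Point Pattern Analysis page can be sketched roughly as follows. This is only an illustration in Python, not the site's actual R code: the function names `grid_counts` and `a_curve`, the axis ranges, and the sample measurements are all assumptions made up for the example.

```python
from collections import Counter


def grid_counts(points, f1_range, f2_range, n_rows, n_cols):
    """Count how many (F1, F2) points fall in each cell of an n_rows x n_cols grid.

    Points outside the given ranges are ignored.
    """
    f1_min, f1_max = f1_range
    f2_min, f2_max = f2_range
    cell_h = (f1_max - f1_min) / n_rows
    cell_w = (f2_max - f2_min) / n_cols
    counts = Counter()
    for f1, f2 in points:
        if not (f1_min <= f1 <= f1_max and f2_min <= f2 <= f2_max):
            continue
        # Clamp so a point exactly on the upper edge lands in the last cell.
        row = min(int((f1 - f1_min) / cell_h), n_rows - 1)
        col = min(int((f2 - f2_min) / cell_w), n_cols - 1)
        counts[(row, col)] += 1
    return counts


def a_curve(counts, n_rows, n_cols):
    """Return the per-cell counts in decreasing order, including empty cells."""
    all_counts = [counts.get((r, c), 0) for r in range(n_rows) for c in range(n_cols)]
    return sorted(all_counts, reverse=True)


# Hypothetical (F1, F2) measurements in Hz, invented for illustration.
tokens = [(300, 2200), (320, 2250), (310, 2300), (700, 1200), (720, 1250), (500, 1000)]
counts = grid_counts(tokens, f1_range=(200, 1000), f2_range=(500, 2500), n_rows=2, n_cols=2)
curve = a_curve(counts, n_rows=2, n_cols=2)
print(curve)  # [3, 2, 1, 0]
```

Plotting the sorted counts as a bar or line chart gives the kind of decreasing curve described above; with real data, a few cells typically hold most of the tokens while many cells hold few or none.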

New content is being added regularly, so check back for additional features. See the bottom of this page for updates on recent changes.

How is this site powered?

This site is built in Shiny, a web application framework for R. With Shiny, users can use the computational power of the R programming language without having to learn R or install it on their computers. The R code is bundled and served on the web, so the interactive features run in an ordinary web browser. See the bottom of this page for a list of specific packages that are used to process this data and create this site.

How is this project funded?

This research is supported by: NSF BCS #1625680 to co-PIs Kretzschmar and Renwick, the University of Georgia Graduate School, and the American Dialect Society.

Who is involved?

The PIs for this project are Bill Kretzschmar and Margaret E. L. Renwick of the University of Georgia. Our team has several graduate student researchers, including Mike Olsen, Rachel Olsen, Katie Kuiper, Lisa Lipani, Yuanming (Jeremy) Shi, and Joey Stanley, with assistance from Josh McNeill and Keiko Bridwell. We also had several dozen undergraduate student workers, funded by ADS or NSF, who did most of the transcribing work.

Contact information

For more information, please contact Joey Stanley at joeystan@uga.edu.

How can I cite this resource?

If you use or refer to this website, you must cite the Gazetteer of Southern Vowels as follows:

  • Stanley, Joseph A., Margaret E. L. Renwick, William A. Kretzschmar Jr., Rachel M. Olsen, & Michael Olsen. (2018). “The Gazetteer of Southern Vowels.” The American Dialect Society Annual Meeting. Salt Lake City, UT.

If you use or refer to the DASS transcriptions themselves (not a part of this website), you must cite them as follows:

  • William A. Kretzschmar Jr., Margaret E. L. Renwick, Lisa M. Lipani, Michael L. Olsen, Rachel M. Olsen, Yuanming Shi, and Joseph A. Stanley. (2019). “Transcriptions of the Digital Archive of Southern Speech.” Linguistic Atlas Project, University of Georgia. http://www.lap.uga.edu/Projects/DASS2019/

Bibliography

The following is an ongoing list of research that is directly related to DASS or utilizes its data.

Publications (alphabetical)

  • Dudley, Leah M. (2019). "Analyzing dialectal differences in relation to geography in the American South." CURO Thesis, UGA.
  • Kretzschmar, William A., Paulina Bounds, Jacqueline Hettel, Lee Pederson, Ilkka Juuso, Lisa Lena Opas-Hänninen, and Tapio Seppänen (2013). "The Digital Archive of Southern Speech (DASS)." Southern Journal of Linguistics, 27 (2). 17–38.
  • Olsen, Rachel M., Michael L. Olsen, & Margaret E. L. Renwick (2018). "The impact of sub-region on /aɪ/ weakening in the U.S. South." Proceedings of Meetings on Acoustics 31, 060005; doi: https://doi.org/10.1121/2.0000879.
  • Olsen, Rachel M., Michael Olsen, Joseph A. Stanley, Margaret E. L. Renwick, & William A. Kretzschmar, Jr. (2017). "Methods for transcription and forced alignment of a legacy speech corpus." Proceedings of Meetings on Acoustics 30, 060001; doi: http://dx.doi.org/10.1121/2.0000559.
  • Pederson, L., McDaniel, S. L., and Adams, C. M. (Eds.) (1986). Linguistic Atlas of the Gulf States, University of Georgia Press, Athens, Georgia, Vols. 1–7.
  • Renwick, Margaret E. L. & Rachel M. Olsen (2017). "Analyzing dialect variation in historical speech corpora." The Journal of the Acoustical Society of America 142, 406; doi: https://doi.org/10.1121/1.4991009.
  • Renwick, Margaret E. L. & Joseph A. Stanley (2017). “Static and dynamic approaches to vowel shifting in the Digital Archive of Southern Speech.” Proceedings of Meetings on Acoustics 30, 060003; doi: http://dx.doi.org/10.1121/2.0000582.
  • Renwick, Margaret E. L. & Joseph A. Stanley (2020). “Modeling dynamic trajectories of tense vs. lax vowels in the American South.” Journal of the Acoustical Society of America 147, 1, (579–595); doi: http://dx.doi.org/10.1121/10.0000549.

Conference Presentations (chronological)

  • Renwick, Margaret E. L. & Joseph A. Stanley (2020). "100 years of speech in Georgia." Workshop on Language, Technology, and Society series. Georgia Institute of Technology, Atlanta, GA (delivered remotely).
  • Jones, Jonathan & Margaret E. L. Renwick (2020). "Mapping Southern spoken dialect features with Geographic Information Systems." Poster presentation at the 179th Meeting of the Acoustical Society of America (ASA). Chicago, IL.
  • Stanley, Joseph A. & Margaret E. L. Renwick (2020). "Back vowel distinctions and dynamics in Southern US English." 94th Annual Meeting of the Linguistic Society of America. New Orleans, LA.
  • Kretzschmar, William A., Margaret E. L. Renwick, Joseph A. Stanley, Katie Kuiper, Lisa Lipani, Michael Olsen, & Rachel Olsen (2020). "The View of Southern Vowels from Large-Scale Data." The American Dialect Society Annual Meeting. New York City, NY.
  • Olsen, Rachel Miller (2020). "Social identity is a pitch: Expressing who you are through prosody." The American Dialect Society Annual Meeting. New York City, NY.
  • Jones, Jonathan & Margaret E. L. Renwick (2020). "Heterogeneity in Southern speech: Evidence from the Mississippi Delta." The American Dialect Society Annual Meeting. New York City, NY.
  • Bigott, Bailey & Margaret E. L. Renwick (2019). "Diving into DASS: A multimedia exploration of Southern Speech." 6th Annual Linguistics Conference at UGA. Athens, GA.
  • Stanley, Joseph A. (2019). "Real Time Vowel Shifts in Georgia English." 6th Annual Linguistics Conference at UGA. Athens, GA.
  • Jones, Jonathan & Margaret E. L. Renwick (2019). "Detecting Southern vowel features with GIS mapping." 6th Annual Linguistics Conference at UGA. Athens, GA.
  • Kretzschmar Jr., William A. & Joseph A. Stanley (2019). "Visualization of Big Data phonetics." Digital Humanities Conference 2019. Utrecht, the Netherlands.
  • Lipani, Lisa, Yuanming Shi, Joshua McNeill, Margaret E. L. Renwick (2019). "Noise reduction in a legacy speech corpus." Poster presentation at the 177th Meeting of the Acoustical Society of America (ASA). Louisville, KY.
  • Stanley, Joseph A. & Margaret E. L. Renwick (2019). "Social factors in Southern US speech: Acoustic analysis of a large-scale legacy corpus." 93rd Annual Meeting of the Linguistic Society of America. New York City, NY.
  • Olsen, Rachel, Joseph A. Stanley, Mike Olsen, Lisa Lipani, & Margaret E. L. Renwick (2019). "Reconciling perception with production in Southern speech." The American Dialect Society Annual Meeting. New York City, NY.
  • Stanley, Joseph A. & Margaret E. L. Renwick (2018). "Finding pockets of social variation in the Digital Archive of Southern Speech." 5th Annual Linguistics Conference at UGA. Athens, GA.
  • Olsen, Rachel M. & Margaret E. L. Renwick (2018). "The Impact of Social Factors on Vowel Duration in Natural Southern Speech." 85th Meeting of the SouthEastern Conference on Linguistics. Blacksburg, VA.
  • Stanley, Joseph A., Margaret E. L. Renwick, William A. Kretzschmar Jr., Rachel M. Olsen, & Michael Olsen (2018). “The Gazetteer of Southern Vowels.” The American Dialect Society Annual Meeting. Salt Lake City, UT.
  • Foster, Shawn, Joseph A. Stanley, & Margaret E. L. Renwick (2017). "Vowel Mergers in the American South." Poster presentation at the 174th Meeting of the Acoustical Society of America (ASA). New Orleans, LA.
  • Olsen, Rachel, Michael Olsen, & Margaret E. L. Renwick (2017). "Acoustically quantifying /ai/ monophthongization in four southern dialect regions." Poster presentation at the 174th Meeting of the Acoustical Society of America (ASA). New Orleans, LA.
  • Olsen, Rachel M. & Michael Olsen (2017). "Lexical frequency effects on the southern shift in the Digital Archive of Southern Speech." New Ways of Analyzing Variation (NWAV) 46. Madison, WI.
  • Olsen, Rachel M. & Margaret E. L. Renwick (2017). "Linking acoustic correlates of rhoticity to perception: How the past informs the present." New Ways of Analyzing Variation (NWAV) 46. Madison, WI.
  • Renwick, Margaret E. L., Michael Olsen, Rachel M. Olsen, & Joseph A. Stanley (2017). "Transcription and forced alignment of the Digital Archive of Southern Speech." Poster presentation at the 173rd Meeting of the Acoustical Society of America (ASA). Boston, MA.
  • Renwick, Margaret E. L. & Joseph A. Stanley (2017). “A historical perspective on vowel shifting: Acoustic analysis of the Digital Archive of Southern Speech.” Poster presentation at the 173rd Meeting of the Acoustical Society of America (ASA). Boston, MA.
  • Kretzschmar, William A., Joseph A. Stanley, & Katherine Kuiper (2017). "Automated Large-Scale Phonetic Analysis: DASS." 84th Meeting of the SouthEastern Conference on Linguistics. Charleston, SC.
  • Olsen, Rachel M., Michael Olsen, Joseph A. Stanley, & Margaret E. L. Renwick (2017). "Transcribing the Digital Archive of Southern Speech: Methods and Preliminary Analysis." 84th Meeting of the SouthEastern Conference on Linguistics. Charleston, SC.
  • Olsen, Rachel M., Michael Olsen, Katherine Kuiper, Joseph A. Stanley, Margaret E. L. Renwick, & William A. Kretzschmar, Jr. (2017). “New Perspectives on Historical Southern Speech.” Panel presented at the 2017 Integrative Research and Ideas Symposium. Athens, GA.

R Packages

  • Eric Bailey (2015). shinyBS: Twitter Bootstrap Components for Shiny. R package version 0.61.
  • Original S code by Richard A. Becker, Allan R. Wilks. R version by Ray Brownrigg. Enhancements by Thomas P Minka and Alex Deckmyn. (2016). maps: Draw Geographical Maps. R package version 3.1.1.
  • Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie and Jonathan McPherson (2017). shiny: Web Application Framework for R. R package version 1.0.e. See also shiny.rstudio.com
  • Hadley Wickham (2009). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. (R package version 2.2.1 is used here.) See also ggplot2.tidyverse.org
  • Hadley Wickham (2016). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.2.0. See also stringr.tidyverse.org
  • Hadley Wickham, Romain Francois, Lionel Henry and Kirill Müller (2017). dplyr: A Grammar of Data Manipulation. R package version 0.7.4. See also dplyr.tidyverse.org
  • Hadley Wickham, Jim Hester and Romain Francois (2017). readr: Read Rectangular Text Data. R package version 1.1.1. See also readr.tidyverse.org
  • Yihui Xie (2016). DT: A Wrapper of the JavaScript Library 'DataTables'. R package version 0.2.
  • Achim Zeileis (2014). ineq: Measuring Inequality, Concentration, and Poverty. R package version 0.2-13.

Change Log

Version 1.7.1 (January 25, 2021)

  • Removed Speaker 850 and added 856C as a replacement since 850's data was no good.

Version 1.7 (January 25, 2021)

  • The site now contains the completed DASS dataset, including speaker 850, who was previously excluded. Previously, you could toggle between DARLA-processed data and MFA + FAVE processed data, but we've decided to only use the latter.
  • In addition, a new subset of LAGS is now available. It's a 10-speaker sample from southeastern Georgia processed by the UGA team in 2020.
  • Added a few additional citations: Dudley's 2019 CURO thesis, Jones & Renwick's 2020 ASA poster, and Renwick & Stanley's 2020 invited talk.

Version 1.6.3 (August 5, 2020)

  • Max F1 in the grid moved from 1100 to 1150 to make cell size in Hz tidier.

Version 1.6.2 (July 23, 2020)

  • Added a note at the top saying the page is loading.
  • Fixed a bug with plot axes not flipping.

Version 1.6.1 (February 14, 2020)

  • Due to a clerical error, we thought we had the complete dataset. We were wrong and only about 80% of the data was in the GSV. Now the data should be complete. With the FAVE corpus, it went from about 1.3 million to 1.6 million tokens. An even larger corpus for you to work with!

Version 1.6 (November 26, 2019)

  • DASS has now been fully processed, so the GSV now contains the full dataset. Some minor textual changes have been made to this page to reflect the completed work, including a link to the DASS portion of the Linguistic Atlas website where you can download audio and transcriptions.
  • In fact, this site now contains two versions of the data, FAVE and DARLA. You can read more about them, and toggle between them using the new "corpus" tab.
  • Added several new references: LCUGA6, LSA2020, ASA, ADS.
  • Minor textual changes and bug fixes.

Version 1.5 (April 25, 2019)

  • The underlying dataset has been completely redone! We've completely revamped the methodology to produce what we hope to be a cleaner dataset. The transcriptions have been spell-checked, we're using the Montreal Forced Aligner for forced alignment, and an in-house version of FAVE for formant extraction. This brings the total number of observations from about 988,000 to over 1,642,211. We will soon have a complete version of this modified dataset online.
  • On the Point Pattern Analysis page, you have much more control over the shading of the cells. They are now discrete by default to make it easier to see the differences (but you can change it to continuous or to no color). The number of discrete shades is 4 by default (the data appears to follow a 75-25 rule), but you can change that. And you can now set the color of the darkest shade; when you do, the other shades will be interpolated.
  • On the Point Pattern Analysis page, you can now change the size and opacity of the grid labels.
  • Added several new references: an ASA presentation and a DH2019 presentation.
  • Minor text changes in the help popovers.

Version 1.4.2 (October 3, 2018)

  • Added several new references: A JASA paper, an LSA presentation, and an ADS presentation.

Version 1.4.1 (September 12, 2018)

  • It is now easier to change the x- and y-axis ranges. Fill-in-the-blank boxes have been converted into sliders with ranges.

Version 1.4 (September 11, 2018)

  • It is now possible to download the images! A new "Download" tab has been created that allows you to specify the height, width, quality, and format.
  • The "Plot Option" tab was split into "Plot" and "Customization." The Customization tab was rearranged slightly in preparation for additional options.
  • Added Stanley & Renwick's (2019) LSA presentation to the bibliography.

Version 1.3 (August 28, 2018)

  • We've added measurements from over 200,000 vowels to the corpus, bringing the total to 988,217. All speakers are now represented in the corpus with the exception of speaker 850 who had pretty awful data for some reason.
  • Updated the "Joey's filter" procedure to the latest development.
  • A third filtering option, the Mahalanobis distance, is now available.

Version 1.2 (June 5, 2018)

  • Updated the PPA page so that the bottom corner of the grid (A1) is in a reasonable place instead of being determined by the max(F1) and max(F2) measurements.
  • In the "Plot Options" tab, the "Fit all data" button has been changed to "Fit Vowel Space". Instead of zooming WAAAY out to fit all the bad data, it zooms to a comfortable vowel space that is consistent across speakers.
  • This standard vowel space is the default rather than the axes adjusting to accommodate all the selected data. This will make comparing different subsets easier, and when looking at a single vowel it'll provide some context as to what portion of the vowel space it occupies.
  • Added this change log ;)

The GSV comes with two datasets for you to view.
The first is DASS, the Digital Archive of Southern Speech. This is the main corpus that the GSV was designed for. Please see the "About" page for more information on how the data was processed.
In 2020, the team at UGA processed an additional 10 speakers from LAGS who come from southeastern Georgia. Demographically, they are balanced similarly to DASS. There are five women and five men. Two are Black and the rest are "Non-Black". They were born between 1894 and 1954 and have a range of education levels, social classes, classifications, and Kurath types.
