Documentation for pronouncing

Pronouncing is a simple interface for the CMU Pronouncing Dictionary. The library is designed to be easy to use, and has no external dependencies. For example, here’s all you need to do in order to find rhymes for a given word:

>>> import pronouncing
>>> pronouncing.rhymes("climbing")
['diming', 'liming', 'priming', 'rhyming', 'timing']

Read the documentation here: https://pronouncing.readthedocs.org.

I made this library because I wanted to be able to use the CMU Pronouncing Dictionary in my projects without having to install the grand behemoth that is NLTK. It’s designed to be friendly to beginner programmers who want to get started with creative language generation and analysis, and to experts who want to make quick prototypes of projects that deal with English pronunciation.

Installation

Install with pip like so:

pip install pronouncing

You can also download the source code and install manually:

python setup.py install

Tutorial and Cookbook

This tutorial will demonstrate how to perform several common tasks with the Pronouncing library and provide a few examples of how the library can be used creatively.

Word pronunciations

Let’s start by using Pronouncing to get the pronunciation for a given word. Here’s the code:

>>> import pronouncing
>>> pronouncing.phones_for_word("permit")
[u'P ER0 M IH1 T', u'P ER1 M IH2 T']

The pronouncing.phones_for_word() function returns a list of all pronunciations for the given word found in the CMU pronouncing dictionary. Pronunciations are given using a special phonetic alphabet known as ARPAbet. Here’s a list of ARPAbet symbols and what English sounds they stand for. Each token in a pronunciation string is called a “phone.” The numbers after the vowels indicate the vowel’s stress. The number 1 indicates primary stress; 2 indicates secondary stress; and 0 indicates unstressed. (Wikipedia has a good overview of how stress works in English, if you’re interested.)

Sometimes, the pronouncing dictionary has more than one pronunciation for the same word. “Permit” is a good example: it can be pronounced either with the stress on the first syllable (“do you have a permit to program here?”) or on the second syllable (“will you permit me to program here?”). For this reason, the pronouncing.phones_for_word() function returns a list of possible pronunciations. (You’ll need to come up with your own criteria for deciding which pronunciation is best for your purposes.)
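One possible criterion is the stress pattern. The sketch below is only an illustration and not part of the library: stress_digits and pick_by_stress are hypothetical helper names, and the two pronunciations are copied from the “permit” example above rather than looked up. It returns the first pronunciation whose stress digits match a regular expression, falling back to the first entry when nothing matches.

```python
import re

def stress_digits(phones):
    """Return only the stress digits from a space-separated phone string."""
    return re.sub(r"[^012]", "", phones)

def pick_by_stress(pronunciations, pattern):
    """Return the first pronunciation whose stress digits match `pattern`.

    Falls back to the first pronunciation (or None for an empty list).
    """
    for phones in pronunciations:
        if re.search(pattern, stress_digits(phones)):
            return phones
    return pronunciations[0] if pronunciations else None

# Pronunciations as listed for "permit" in the example above.
permit = ["P ER0 M IH1 T", "P ER1 M IH2 T"]

noun = pick_by_stress(permit, r"^1")  # primary stress on the first syllable
verb = pick_by_stress(permit, r"^0")  # first syllable unstressed
print(noun)
print(verb)
```

Other criteria (always taking the first entry, or preferring the shortest pronunciation) would follow the same shape.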

Here’s how to calculate the most common sounds in a given text:

>>> import pronouncing
>>> from collections import Counter
>>> text = "april is the cruelest month breeding lilacs out of the dead"
>>> count = Counter()
>>> words = text.split()
>>> for word in words:
...   pronunciation_list = pronouncing.phones_for_word(word)
...   if len(pronunciation_list) > 0:
...     count.update(pronunciation_list[0].split(" "))
...
>>> count.most_common(5)
[(u'AH0', 4), (u'L', 4), (u'D', 3), (u'R', 3), (u'DH', 2)]
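The counts above treat the same vowel with different stress levels (AH0 and AH1, say) as separate sounds. If you would rather merge them, one option is to strip the stress digit before counting. This is a minimal sketch: base_phones is a hypothetical helper name, and the phone strings are the two “permit” pronunciations shown earlier, typed in by hand instead of looked up.

```python
from collections import Counter

def base_phones(phones):
    """Split a phone string and drop any trailing stress digit from each phone."""
    return [phone.rstrip("012") for phone in phones.split()]

count = Counter()
for phones in ["P ER0 M IH1 T", "P ER1 M IH2 T"]:
    count.update(base_phones(phones))

print(count.most_common(2))
```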

Counting syllables

To get the number of syllables in a word, first get one of its pronunciations with pronouncing.phones_for_word() and pass the resulting string of phones to the pronouncing.syllable_count() function, like so:

>>> import pronouncing
>>> pronunciation_list = pronouncing.phones_for_word("programming")
>>> pronouncing.syllable_count(pronunciation_list[0])
3

The following example calculates the total number of syllables in a text (assuming that all of the words are found in the pronouncing dictionary):

>>> import pronouncing
>>> text = "april is the cruelest month breeding lilacs out of the dead"
>>> phones = [pronouncing.phones_for_word(p)[0] for p in text.split()]
>>> sum([pronouncing.syllable_count(p) for p in phones])
15
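The comprehension above indexes [0] on every lookup, so it would fail on a word with no pronunciations (the length check in the sound-counting example suggests that unknown words come back as an empty list). The sketch below shows one way to skip such words and report them. The name total_syllables and the toy stand-in functions are hypothetical; the lookup and counting functions are passed in as arguments so the example runs without the dictionary installed.

```python
def total_syllables(text, phones_for_word, syllable_count):
    """Sum syllables over the words of `text`, skipping unknown words.

    Returns a tuple: (total, list of words that had no pronunciation).
    """
    total = 0
    missing = []
    for word in text.split():
        pronunciations = phones_for_word(word)
        if pronunciations:
            total += syllable_count(pronunciations[0])
        else:
            missing.append(word)
    return total, missing

# Stand-ins for illustration only; with the library installed you would pass
# pronouncing.phones_for_word and pronouncing.syllable_count instead.
toy_dict = {"permit": ["P ER0 M IH1 T", "P ER1 M IH2 T"]}

def toy_phones_for_word(word):
    return toy_dict.get(word, [])

def toy_syllable_count(phones):
    # One syllable per vowel; vowels are the phones that end in a stress digit.
    return sum(1 for phone in phones.split() if phone[-1] in "012")

total, missing = total_syllables("permit xyzzy", toy_phones_for_word, toy_syllable_count)
print(total, missing)
```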

Meter

Pronouncing includes a number of functions to help you isolate metrical characteristics of a text. You can use the pronouncing.stresses() function to get a string that represents the “stress pattern” of a string of phones:

>>> import pronouncing
>>> phones_list = pronouncing.phones_for_word("snappiest")
>>> pronouncing.stresses(phones_list[0])
u'102'

A “stress pattern” is a string that contains only the stress values from a sequence of phones. (The numbers indicate the level of stress: 1 for primary stress, 2 for secondary stress, and 0 for unstressed.)

You can use the pronouncing.search_stresses() function to find words based on their stress patterns. For example, to find words that have two dactyls in them (“dactyl” is a metrical foot consisting of one stressed syllable followed by two unstressed syllables):

>>> import pronouncing
>>> pronouncing.search_stresses("100100")
[u'afroamerican', u'afroamericans', u'interrelationship', u'overcapacity']

You can use regular expression syntax inside of the patterns you give to pronouncing.search_stresses(). For example, to find all words wholly consisting of two anapests (unstressed, unstressed, stressed), with “stressed” meaning either primary stress or secondary stress:

>>> import pronouncing
>>> pronouncing.search_stresses("^00[12]00[12]$")
[u'neopositivist', u'undercapitalize', u'undercapitalized']
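Patterns like these can also be assembled programmatically. The sketch below is not part of the library: FEET and feet_pattern are hypothetical names, and “stressed” is taken to mean primary or secondary stress, as in the example above. The resulting string could be handed to pronouncing.search_stresses(); here it is only checked against a few stress strings with the standard re module.

```python
import re

# Stress regex fragment for a few metrical feet; "[12]" means any stressed syllable.
FEET = {
    "iamb": "0[12]",
    "trochee": "[12]0",
    "dactyl": "[12]00",
    "anapest": "00[12]",
}

def feet_pattern(foot, repeat):
    """Build an anchored regex matching exactly `repeat` copies of `foot`."""
    return "^(?:" + FEET[foot] + "){" + str(repeat) + "}$"

two_anapests = feet_pattern("anapest", 2)
print(two_anapests)
print(bool(re.search(two_anapests, "001002")))  # two anapests
print(bool(re.search(two_anapests, "100100")))  # two dactyls, so no match
```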

The following example rewrites a text, replacing each word with a random word that has the same stress pattern:

>>> import pronouncing
>>> import random
>>> text = 'april is the cruelest month breeding lilacs out of the dead'
>>> out = list()
>>> for word in text.split():
...   pronunciations = pronouncing.phones_for_word(word)
...   pat = pronouncing.stresses(pronunciations[0])
...   replacement = random.choice(pronouncing.search_stresses("^"+pat+"$"))
...   out.append(replacement)
...
>>> ' '.join(out)
u"joneses kopf whats rathbun p's gavan midpoint nill goh the pont's"

Rhyme

Pronouncing includes a simple function, pronouncing.rhymes(), which returns a list of words that (potentially) rhyme with a given word. You can use it like so:

>>> import pronouncing
>>> pronouncing.rhymes("failings")
[u'mailings', u'railings', u'tailings']

The pronouncing.rhymes() function returns a list of all possible rhymes for the given word, i.e., words that rhyme with any of the given word’s pronunciations. If you only want rhymes for one particular pronunciation, the pronouncing.rhyming_part() function gives a smaller part of a string of phones that can be used with pronouncing.search() to find rhyming words. The following code demonstrates how to find rhyming words for two different pronunciations of “uses”:

>>> import pronouncing
>>> pronunciations = pronouncing.phones_for_word("uses")
>>> sss = pronouncing.rhyming_part(pronunciations[0])
>>> zzz = pronouncing.rhyming_part(pronunciations[1])
>>> pronouncing.search(sss + "$")[:5]
[u"bruce's", u'juices', u'medusas', u'produces', u"tuscaloosa's"]
>>> pronouncing.search(zzz + "$")[:5]
[u'abuses', u'cabooses', u'disabuses', u'excuses', u'induces']
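To make the idea of a “rhyming part” concrete: the API reference describes it as everything from the vowel in the stressed syllable nearest the end of the word up to the end of the word. The sketch below reimplements that description with plain string handling. It is an illustration under that reading, not necessarily how the library does it; rhyming_part_sketch and phones_rhyme are hypothetical names, and the phone strings are typed in by hand.

```python
def rhyming_part_sketch(phones):
    """Return the phones from the last stressed vowel to the end.

    A vowel counts as stressed if its stress digit is 1 or 2. If no phone
    is stressed, the whole string is returned unchanged.
    """
    phone_list = phones.split()
    for i in range(len(phone_list) - 1, -1, -1):
        if phone_list[i][-1] in "12":
            return " ".join(phone_list[i:])
    return phones

def phones_rhyme(phones_a, phones_b):
    """Treat two pronunciations as rhyming when their rhyming parts are equal."""
    return rhyming_part_sketch(phones_a) == rhyming_part_sketch(phones_b)

print(rhyming_part_sketch("P ER1 P AH0 L"))
print(phones_rhyme("P ER0 M IH1 T", "S AH0 B M IH1 T"))
```

Note that under this simple comparison a pronunciation also “rhymes” with itself, so you may want to exclude identical words when building a rhyme list.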

Use the in operator to check to see if one word rhymes with another:

>>> import pronouncing
>>> "wheeze" in pronouncing.rhymes("cheese")
True
>>> "geese" in pronouncing.rhymes("cheese")
False

The following example rewrites a text, replacing each word with a rhyming word (when a rhyming word is available):

>>> import pronouncing
>>> import random
>>> text = 'april is the cruelest month breeding lilacs out of the dead'
>>> out = list()
>>> for word in text.split():
...   rhymes = pronouncing.rhymes(word)
...   if len(rhymes) > 0:
...     out.append(random.choice(rhymes))
...   else:
...     out.append(word)
...
>>> print(' '.join(out))
april wiles's duh coolest month ceding pontiac's krout what've worthey wehde

Next steps

Hopefully this is just the beginning of your rhyme- and meter-filled journey. Consult the Pronouncing API Reference for more information about individual functions in the library.

Pronouncing is just one possible interface for the CMU pronouncing dictionary, and you may find that for your particular purposes, a more specialized approach is necessary. In that case, feel free to peruse Pronouncing’s source code for helpful hints and tidbits.

Pronouncing API Reference

pronouncing.init_cmu(filehandle=None)

Initialize the module’s pronunciation data.

This function is called automatically the first time you attempt to use another function in the library that requires loading the pronunciation data from disk. You can call this function manually to control when and how the pronunciation data is loaded (e.g., you’re using this module in a web application and want to load the data asynchronously).

Parameters: filehandle – a filehandle with CMUdict-formatted data
Returns: None

pronouncing.parse_cmu(cmufh)

Parses an incoming file handle as a CMU pronouncing dictionary file.

(Most end-users of this module won’t need to call this function explicitly, as it’s called internally by the init_cmu() function.)

Parameters: cmufh – a filehandle with CMUdict-formatted data
Returns: a list of 2-tuples pairing a word with its phones (as a string)

pronouncing.phones_for_word(find)

Get the CMUdict phones for a given word.

Because a given word might have more than one pronunciation in the dictionary, this function returns a list of all possible pronunciations.

>>> import pronouncing
>>> pronouncing.phones_for_word("permit")
['P ER0 M IH1 T', 'P ER1 M IH2 T']

Parameters: find – a word to find in CMUdict.
Returns: a list of phone strings that correspond to that word.

pronouncing.rhymes(word)

Get words rhyming with a given word.

This function may return an empty list if no rhyming words are found in the dictionary, or if the word you pass to the function is itself not found in the dictionary.

>>> import pronouncing
>>> pronouncing.rhymes("conditioner")
['commissioner', 'parishioner', 'petitioner', 'practitioner']

Parameters: word – a word
Returns: a list of rhyming words

pronouncing.rhyming_part(phones)

Get the “rhyming part” of a string with CMUdict phones.

“Rhyming part” here means everything from the vowel in the stressed syllable nearest the end of the word up to the end of the word.

>>> import pronouncing
>>> phones = pronouncing.phones_for_word("purple")
>>> pronouncing.rhyming_part(phones[0])
'ER1 P AH0 L'

Parameters: phones – a string containing space-separated CMUdict phones
Returns: a string with just the “rhyming part” of those phones

pronouncing.search(pattern)

Get words whose pronunciation matches a regular expression.

This function searches the CMU dictionary for pronunciations matching a given regular expression. (Word boundary anchors are automatically added before and after the pattern.)
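The effect of the word boundary anchors can be seen with the standard re module alone. The sketch below assumes the anchors behave like \b in Python regular expressions, and it uses a small hand-written list of (word, phones) pairs in place of the real dictionary; toy_search and toy_entries are hypothetical names.

```python
import re

# Hand-written stand-in for the dictionary: (word, phones) pairs.
toy_entries = [
    ("purple", "P ER1 P AH0 L"),
    ("permit", "P ER0 M IH1 T"),
    ("permit", "P ER1 M IH2 T"),
]

def toy_search(pattern, entries):
    """Return words whose phones contain `pattern` starting and ending on phone edges."""
    regexp = re.compile(r"\b" + pattern + r"\b")
    return [word for word, phones in entries if regexp.search(phones)]

print(toy_search("ER1 P", toy_entries))   # whole phone ER1 followed by whole phone P
print(toy_search("R1 P", toy_entries))    # "R1" is only a fragment of the phone "ER1"
print(toy_search("IH1 T$", toy_entries))  # "$" ties the match to the end of the phones
```

Because of the anchors, a pattern cannot start or end in the middle of a phone, which is why the second search finds nothing.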

>>> import pronouncing
>>> 'interpolate' in pronouncing.search('ER1 P AH0')
True

Parameters: pattern – a string containing a regular expression
Returns: a list of matching words

pronouncing.search_stresses(pattern)

Get words whose stress pattern matches a regular expression.

This function is a special case of search() that searches only the stress patterns of each pronunciation in the dictionary. You can get stress patterns for a word using the stresses_for_word() function.

>>> import pronouncing
>>> pronouncing.search_stresses('020120')
['gubernatorial']

Parameters: pattern – a string containing a regular expression
Returns: a list of matching words

pronouncing.stresses(s)

Get the vowel stresses for a given string of CMUdict phones.

Returns only the vowel stresses (i.e., digits) for a given phone string.

>>> import pronouncing
>>> pronouncing.stresses(pronouncing.phones_for_word('obsequious')[0])
'0100'

Parameters: s – a string of CMUdict phones
Returns: string of just the stresses

pronouncing.stresses_for_word(find)

Get a list of possible stress patterns for a given word.

>>> import pronouncing
>>> pronouncing.stresses_for_word('permit')
['01', '12']

Parameters: find – a word to find
Returns: a list of possible stress patterns for the given word.

pronouncing.syllable_count(phones)

Count the number of syllables in a string of phones.

To find the number of syllables in a word, call phones_for_word() first to get the CMUdict phones for that word.

>>> import pronouncing
>>> phones = pronouncing.phones_for_word("literally")
>>> pronouncing.syllable_count(phones[0])
4

Parameters: phones – a string containing space-separated CMUdict phones
Returns: integer count of syllables in list of phones

Credits and Acknowledgements

Lead developer: Allison Parrish <allison@decontextualize.com>.

This package was originally developed as part of my Spring 2015 research fellowship at ITP. Thank you to the program and its students for their interest and support!

History

0.2.0 (2018-07-01)

  • Removed dictionary data from this package in favor of a dependency on David L. Day’s very nice cmudict package.
  • Many fixes and improvements from hugovk (thanks!)

0.1.5 (2017-04-13)

  • Messed up the PyPI upload. Yay!

0.1.4 (2017-04-12)

  • Improved performance when retrieving rhyming words. (Based on pull request proposed by WillPiledriver.)

0.1.3 (2017-01-17)

  • Various tweaks and performance improvements.

0.1.2 (2015-06-23)

  • Pre-compiled regex for improved performance. (Contributed by John Wiseman.)

0.1.1 (2015-06-12)

  • First release on PyPI.