Word formatting for DNS v10 and lower

The dragonfly.engines.backend_natlink.dictation_format.WordParserDns10 class converts DNS v10 (and lower) recognition results to dragonfly.engines.backend_natlink.dictation_format.Word objects which contain written and spoken forms together with formatting information. For example:

>>> from dragonfly.engines.backend_natlink.dictation_format import WordParserDns10
>>> # Make sure the engine is connected before using WordParserDns10.
>>> from dragonfly import get_engine
>>> engine = get_engine("natlink")
>>> engine.connect()
>>> parser_dns10 = WordParserDns10()
>>> print(parser_dns10.parse_input("hello"))
Word('hello')
>>> print(parser_dns10.parse_input(".\\full-stop"))
Word('.', 'full-stop', no_space_before, two_spaces_after, cap_next, not_after_period)

Nonexistent words can be parsed, but don’t have any formatting info:

>>> print(parser_dns10.parse_input("nonexistent-word"))
Word('nonexistent-word')

This helper function allows for a compact notation of input string tests:

>>> from dragonfly.engines.backend_natlink.dictation_format import WordFormatter
>>> def format_dictation_dns10(input, spoken_form=False):
...     if isinstance(input, basestring):
...         input = input.split()
...     wf = WordFormatter(parser=WordParserDns10())
...     return wf.format_dictation(input, spoken_form)
>>> format_dictation_dns10("hello world")
u'hello world'

The following tests cover in-line capitalization:

>>> format_dictation_dns10("\\All-Caps hello world")
u'HELLO world'
>>> format_dictation_dns10("\\Caps-On hello world")
u'Hello World'
>>> format_dictation_dns10("\\Caps-On hello of the world")
u'Hello of the World'
>>> format_dictation_dns10("hello \\Caps-On of the world")
u'hello Of the World'
>>> format_dictation_dns10("\\Caps-On hello world \\Caps-Off goodbye universe")
u'Hello World goodbye universe'
>>> format_dictation_dns10("hello \\All-Caps-On world goodbye \\All-Caps-Off universe")
u'hello WORLD GOODBYE universe'
>>> format_dictation_dns10("hello \\All-Caps-On world \\Caps-On goodbye universe")
u'hello WORLD Goodbye Universe'
>>> format_dictation_dns10("hello \\All-Caps-On world goodbye \\Caps-Off universe")
u'hello WORLD GOODBYE universe'

The following tests cover in-line spacing:

>>> format_dictation_dns10("\\No-Space hello world")
u'hello world'
>>> format_dictation_dns10("hello \\No-Space world")
u'helloworld'
>>> format_dictation_dns10("\\No-Space-On hello world")
u'helloworld'
>>> format_dictation_dns10("\\No-Space-On hello world goodbye \\No-Space-Off universe")
u'helloworldgoodbye universe'
>>> format_dictation_dns10("\\No-Space-On hello world \\No-Space-Off goodbye universe")
u'helloworld goodbye universe'
>>> format_dictation_dns10("\\No-Space-On hello world \\space-bar goodbye universe")
u'helloworld goodbyeuniverse'

>>> # The following are different from some DNS installations!
>>> format_dictation_dns10("hello \\No-Space-On world goodbye universe")
u'helloworldgoodbyeuniverse'
>>> format_dictation_dns10("hello \\No-Space-On world \\space-bar goodbye universe")
u'helloworld goodbyeuniverse'

The following tests cover punctuation and other symbols that influence spacing and capitalization of surrounding words:

>>> format_dictation_dns10("hello \\New-Line world")
u'hello\nworld'
>>> format_dictation_dns10("hello \\New-Paragraph world")
u'hello\n\nWorld'
>>> format_dictation_dns10("hello .\\full-stop world")
u'hello. World'
>>> format_dictation_dns10("hello ,\\comma world")
u'hello, world'
>>> format_dictation_dns10("hello .\\full-stop \\New-Line world")
u'hello.\nWorld'
>>> format_dictation_dns10("hello -\\hyphen world")
u'hello-world'
>>> format_dictation_dns10("hello (\\left-paren world")
u'hello (world'
>>> format_dictation_dns10("hello )\\right-paren world")
u'hello) world'
>>> format_dictation_dns10("hello \\\\backslash world")
u'hello\\world'

The “.” character at the end of certain words is “swallowed” by following words that start with that same character:

>>> format_dictation_dns10(["hello", "etc.\\et cetera", "world"])
u'hello etc. world'
>>> format_dictation_dns10(["hello", "etc.\\et cetera", ".\\full-stop", "world"])
u'hello etc. World'
>>> format_dictation_dns10(["hello", "etc.\\et cetera", "...\\ellipsis", "world"])
u'hello etc... world'
>>> # The following are different from some DNS installations!
>>> format_dictation_dns10("hello .\\full-stop ...\\ellipsis world")
u'hello... world'
>>> format_dictation_dns10("hello ...\\ellipsis .\\full-stop world")
u'hello... World'

Letters and numbers spoken in line within dictation allow efficient spelling of for example words not present in the dictionary:

>>> format_dictation_dns10(["a\\alpha", "b\\bravo",
...                         "c\\charlie", "d\\delta",
...                         "e\\echo",
...                         "x\\xray", "z\\zulu",
...                         "y\\yankee"])
u'abcdexzy'

Words may contain spaces in their written and/or spoken forms. For example a custom word added by the user might have the following form with a space in both spoken and written forms:

>>> format_dictation_dns10(["custom written\\custom spoken"])
u'custom written'
>>> format_dictation_dns10(["custom written\\custom spoken",
...                         "\\All-Caps",
...                         "custom written\\custom spoken",
...                         "\\Cap",
...                         "custom written\\custom spoken"])
u'custom written CUSTOM written Custom written'
>>> format_dictation_dns10(["custom written\\custom spoken",
...                         "\\Caps-On",
...                         "custom written\\custom spoken",
...                         "\\All-Caps-On",
...                         "custom written\\custom spoken",
...                         "\\All-Caps-Off",
...                         "custom written\\custom spoken"])
u'custom written Custom Written CUSTOM WRITTEN custom written'

Certain words, such as numbers, are not formatted according to the same rules as “normal” words, i.e. those which specified written and spoken forms and formatting info:

#    >>> format_dictation_dns10("one two three")
#    u'123'

The spoken form of words is given instead of the written form, if specified:

>>> format_dictation_dns10(["custom written\\\\custom spoken"], True)
u'custom spoken'
>>> format_dictation_dns10(["custom written\\\\custom spoken",
...                         "\\all-caps\\all caps",
...                         "custom written\\\\custom spoken",
...                         "\\cap\\cap",
...                         "custom written\\\\custom spoken"], True)
u'custom spoken all caps custom spoken cap custom spoken'
>>> format_dictation_dns10(["custom written\\\\custom spoken",
...                         "\\caps-on\\caps on",
...                         "custom written\\\\custom spoken",
...                         "\\all-caps-on\\all caps on",
...                         "custom written\\\\custom spoken",
...                         "\\all-caps-off\\all caps off",
...                         "custom written\\\\custom spoken"], True)
u'custom spoken caps on custom spoken all caps on custom spoken all caps off custom spoken'