Topics created by clv1@mastodon.social

@clv1 @lexiconista Huge thanks for this list. I just went through my TGSpeechBox TTS code with a fine-toothed comb, and yeah... we have a few of these lurking:

- Turkish İ/ı: our dictionary lookup uses ASCII tolower(). "İstanbul" won't match "istanbul". Oops.
- German ordinals: in "3. Mai", that dot looks like end-of-sentence to our clause splitter. Sorry, May 3rd just became two sentences.
- Chinese: no spaces between words means our entire word-based pipeline sees one giant "word." Current plan: double-space = word boundary, single space = phoneme separator. Simple, no NLP library needed.

None of these are blocking 3.0, but they're fun reminders that English is not the default state of human language. And thanks to your article, they will all get fixed.
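For anyone curious what fixes along these lines might look like, here is a rough Python sketch of the three cases. It is not TGSpeechBox code: the function names (`lower_for_lookup`, `split_sentences`, `split_words`), the `lang` parameter, and the month-name heuristic are all made up for illustration, and the ordinal check only covers the "digit, dot, German month name" pattern.

```python
import re

# Turkish: map dotted/dotless capitals explicitly before the generic lower().
# (Hypothetical helper; assumes the caller knows the text is Turkish.)
_TR_MAP = str.maketrans({"İ": "i", "I": "ı"})

def lower_for_lookup(word, lang="en"):
    if lang == "tr":
        word = word.translate(_TR_MAP)
    return word.lower()

# German: do not treat "3. Mai" as a sentence end.
# Heuristic assumption: a dot after a digit, followed by a month name,
# is an ordinal marker rather than a full stop.
_MONTH_RE = re.compile(
    r"(?:Januar|Februar|März|April|Mai|Juni|Juli|August|"
    r"September|Oktober|November|Dezember)\b"
)

def split_sentences(text):
    sentences, start = [], 0
    for m in re.finditer(r"[.!?]\s+", text):
        is_ordinal = (
            text[m.start()] == "."
            and m.start() > 0
            and text[m.start() - 1].isdigit()
            and _MONTH_RE.match(text, m.end()) is not None
        )
        if is_ordinal:
            continue  # keep scanning; this dot stays inside the sentence
        sentences.append(text[start:m.start() + 1])
        start = m.end()
    if start < len(text):
        sentences.append(text[start:])
    return sentences

# Chinese: two or more spaces separate words, a single space separates
# the units inside a word (phonemes, per the plan above).
def split_words(text):
    words = re.split(r" {2,}", text.strip())
    return [w.split(" ") for w in words if w]
```

With these definitions, `lower_for_lookup("İstanbul", lang="tr")` gives `"istanbul"`, and `split_sentences("Wir treffen uns am 3. Mai. Bis dann!")` keeps "3. Mai" inside the first sentence. The month-name check will still get a real sentence that ends in a number and is followed by a month name wrong, so treat it as a starting point.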


clv1@mastodon.social


Great read for someone interested in programming and linguistics like me; thanks!