Great read for someone interested in programming and linguistics like me; thanks!

    clv1@mastodon.social wrote (#1):

    RE: https://mastodon.ie/@lexiconista/116279296365565394

    Great read for someone interested in programming and linguistics like me; thanks!

      tamasg@mindly.social wrote (#2):

      @clv1 @lexiconista Huge thanks for this list. I just went through my TGSpeechBox TTS code with a fine-toothed comb, and yeah...
      We have a few of these lurking:
      - Turkish İ/ı: our dictionary lookup uses ASCII tolower(), which only maps A-Z, so "İstanbul" won't match "istanbul". Oops.
      - German ordinals: in "3. Mai", that dot looks like end-of-sentence to our clause splitter. Sorry, May 3rd just became two sentences.
      - Chinese: no spaces between words means our entire word-based pipeline sees one giant "word." Current plan: double space = word boundary, single space = phoneme separator. Simple, no NLP library needed.
      None of these are blocking 3.0, but they're fun reminders that English is not the default state of human language. And thanks to your article, they will all get fixed.
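
For anyone following along, here is a rough Python sketch of how the three fixes above might look. It is not TGSpeechBox code: the function names, the month list, and the "digits + dot + month name" ordinal heuristic are all assumptions made for illustration, and the phoneme format is only one possible reading of the double-space plan.

```python
import re
import unicodedata

# Hypothetical helper: Turkish-aware lowercasing for dictionary lookup.
# Only correct for text known to be Turkish (or Azerbaijani), because it
# maps plain "I" to dotless "ı".
def lower_turkish(text):
    text = unicodedata.normalize("NFC", text)  # compose "I" + U+0307 into "İ"
    return text.replace("İ", "i").replace("I", "ı").lower()

# Assumed heuristic: a dot after digits is an ordinal marker when the next
# word is a German month name. Real text needs more cases than this.
GERMAN_MONTHS = {
    "Januar", "Februar", "März", "April", "Mai", "Juni", "Juli",
    "August", "September", "Oktober", "November", "Dezember",
}
_BOUNDARY = re.compile(r"([.!?])\s+")

def split_sentences_de(text):
    sentences = []
    start = 0
    for m in _BOUNDARY.finditer(text):
        before = text[start:m.start()]
        next_word = re.match(r"\w+", text[m.end():])
        is_ordinal = (
            m.group(1) == "."
            and re.search(r"\d$", before) is not None
            and next_word is not None
            and next_word.group(0) in GERMAN_MONTHS
        )
        if is_ordinal:
            continue  # "3. Mai": keep going, this dot does not end the sentence
        sentences.append(text[start:m.end(1)])
        start = m.end()
    tail = text[start:]
    if tail:
        sentences.append(tail)
    return sentences

# Assumed input format: phonemes separated by one space, words by two.
def parse_phoneme_words(text):
    return [word.split() for word in text.split("  ") if word.strip()]

print(lower_turkish("İstanbul"))
print(split_sentences_de("Wir treffen uns am 3. Mai. Bis dann!"))
print(parse_phoneme_words("n i3  h ao3"))
```

The Turkish helper is deliberately language-specific: applying it to English text would turn "I" into "ı", so it should only run when the input language is known.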
