danestange@caneandable.social

@Tamasg Eloquence on iOS, by default in VO's dictionary support, handles those weird ways of writing English like ˈɛ.lə.kwəns, but only in dictionary mode.

danestange@caneandable.social

@Tamasg May I ask, can we add whatever these Apple things are to dictionaries as pronunciations? Damn it, ˈdæ.mət; Stange, ˈstæŋ.i. Basically I used VO's built-in dictionary and spoke how I wanted things to be sounded, like dammit is spoken the old English way, woman same thing, ˈwʊ.mən.

danestange@caneandable.social

@Tamasg I will say, the UK english is quite differently cool now. I'll have to test the CA pack for english again. Great work as usual.

danestange@caneandable.social

@Tamasg There must be a bug in mine, should I reinstall it?

danestange@caneandable.social

@Tamasg Lol US sounds the exact same still on latest with these changes.

danestange@caneandable.social

@Tamasg Wow, thanks for this! You're awesome!!

danestange@caneandable.social

@Tamasg Holy fucklet! So wow, this thing is gonna be insane!! I wonder what the espeak accent will sound like? This might mean english accents can happen, speechbox's speechplayer can drive a scottish accent finally. I'm so curious and excited to see this insanity if you push it or are satisfied with it, unless you end up handcrafting and handtuning each language still which hmm. It's all up to you obviously. I am so excited man, this is one of my biggest obsessions apart from openclaw. I wonder if I could probably ask openclaw these questions given it has the repo on my mac? Hmm. though you already have your own agent and stuff like this and have gone way in the weeds. You know more about it than an AI might. You also have plans for it and fun things that us users have no idea about.

danestange@caneandable.social

@Tamasg I wonder where NVSP/TGSB got the main voice from? He sounds very interesting. Can his voice be mathematically turned into someone else? Could his s's, k's, p's, d's and t's and sh's be turned to faidy ones like wintalker's? I have so many random thoughts. Could it get female voices like wintalker or dectalk have? The trouble is there's not much open formant TTS research in this field in 2026, but I'd imagine you've already thought about all this stuff. It's not about faidy, there's some magic to how mark made the unvoiced sounds sound so human at 11k upped to 22.

danestange@caneandable.social

@Tamasg Do you know what's also crazy? It's being inconsistent and hard to track. Earlier, it was spacing on, ye ye ye ye? It did it at the end there. When it says do you, screw you, that does it too, screw oo? Do oo? True you, hmm. Let's see. Do I, do e, do o? nope. But do you causes it like screw you does. Oo you, ooyou rather than oo you.

danestange@caneandable.social

@Tamasg The bug happens with the words, do you? It spaces out and says do hoo without the h. Do you do you do you do you do you? Do ya oh ya ye ye ye ya. All spaced and funny. It's nothing too serious, it's just that when reading it fast it's hard to catch the word because the vowel gets eaten.

danestange@caneandable.social

@Tamasg The latest builds sound a bit more jumpy, like a bouncing ball or magnets that squeak when togetherised, when making some voicings like speechplayer did, where it didn't before. The yee yo ye yo does that, there again I use classic pitch mode. The d sound also seems to have been speechplayerised at certain pitches where it didn't before, but these are probably things you know. Did you know I use this thing as my main synth on everything? I really, oh you got it to say it right rather than spaced out. Speechplayer gets a little too stoned.

danestange@caneandable.social

@Tamasg It isn't saying the y phoneme, it says e with a gap over and over, like do you do you do you, same bugly. Do you yee or yee ya ye ya.

danestange@caneandable.social

@Tamasg Also, I have a manic speechbox bug and I need to figure out how it works. It's a vowel bug. Here's a speechbird. yee yee yee yee yee yee yee yee yee yee?

danestange@caneandable.social

@Tamasg Yay, I'm glad the woilds being nice to you today. You're awesome!! Do ye know that?

danestange@caneandable.social

@Tamasg Yep, that’s the catch, unfortunately. I looked a little deeper and UDPipe itself is fine from a code-license standpoint, since the library is MPL-2.0 and it ships as a C++ library too, which is why it looked like such a nice fit for TGSpeechBox. But the pretrained linguistic models are explicitly non-commercial and distributed under CC BY-NC-SA, so your read is dead on there.

That probably shifts the “best future path” a bit. If the goal is “drop-in multilingual POS tagging with commercially safe pretrained models,” Stanza looks more promising on paper because the toolkit is Apache 2.0, supports 70+ languages, and can do tokenization, POS, lemmas, and dependency parsing in one pipeline. It also documents training your own POS models, so even if some resources behind particular packages get messy, the project itself is at least set up for retraining.

For a lighter-weight fallback, RDRPOSTagger is still interesting too. It’s very fast, supports about 80 languages with pretrained tagging models, and is much more in the “practical tagger” bucket than the full neural-stack cathedral humans keep building because apparently moderation is illegal. I’d still want to inspect the exact model/data licensing before blessing it for a product, though, because the repo page is clearer about capabilities than downstream model terms.

So I think your current conclusion is the sane one: UDPipe still looks architecturally right, but only if you train your own models from commercially usable corpora. Otherwise Stanza may be the better place to look next, especially if the first version is just an offline “heteronym disambiguation shim” rather than full deep syntax. That seems like the least cursed path for Speechbox.

danestange@caneandable.social

@Tamasg ChatGPT says, with netical research: Tamas, I went and read through TGSpeechBox a bit because apparently I enjoy voluntarily inspecting other people’s architecture choices now.

Your instinct is right: this is not a YAML-level feature. TGSpeechBox is split into a C++ DSP engine plus a C++ frontend with YAML packs, and the NVDA add-on currently uses eSpeak for text→IPA before the frontend turns IPA into timed frames. The repo docs are pretty explicit that the old Python runtime path is gone, and that the current runtime path is frontend + packs, with eSpeak feeding it upstream. So POS disambiguation really does want to live in a new pre-phonemizer text-analysis stage, not inside the pack rules.

Also, a lot of the clever stuff TGSpeechBox already does is downstream of that point: the text parser can insert syllable boundaries from a stress dictionary, the prominence pass inherits stress marks coming from eSpeak, and multiple timing/coarticulation passes operate on the IPA/token stream after phonemization. That means a POS layer would be architecturally cleanest if it runs before eSpeak and either picks pronunciations for known heteronyms or annotates tokens so the phonemizer/front end can choose the right path.

If I were picking one open-source tagger as the best fit for TGSpeechBox specifically, I’d start with UDPipe 2. It is multilingual, trainable, and available as a C++ library as well as other bindings, with pretrained models for nearly all Universal Dependencies treebanks. For a project that is already mostly C++ and ships across Windows, Linux, Apple platforms, Android, and SAPI wrappers, that matters a lot more than raw benchmark glamour. It gives you a realistic path to “small native analysis stage before eSpeak” instead of “drag a Python or Java runtime into accessibility software and regret your life choices later.”

My second choice would be spaCy, but mostly as a prototype path, not the long-term runtime answer. spaCy’s pipelines include POS tagging and are designed to be efficient in speed and size, so it’s a good place to prove whether POS-based heteronym disambiguation actually moves the needle for English. But it is still fundamentally a Python-first stack, which feels awkward next to TGSpeechBox’s current native frontend/DSP architecture. Great for validating the idea quickly, less great if you want the feature to feel native everywhere.

For a very lightweight English-only proof of concept, NLTK’s averaged perceptron tagger is the simplest thing worth trying. It is a greedy averaged perceptron tagger, easy to wire up, and small enough to test the architecture without building a whole NLP subsystem first. I would treat it as a research scaffold, though, not the final answer for a multilingual speech engine.

I would not start with Apache OpenNLP unless you already wanted a Java component for other reasons. It does have a POS tagger and can use a tag dictionary to constrain predictions, which is nice in principle, but the JVM dependency feels like the wrong kind of excitement for this codebase. Nobody wakes up hoping to debug Java glue inside a fast screen-reader speech stack.

So my actual recommendation would be:

Best native fit: UDPipe 2
Best quick experiment: spaCy
Cheapest English-only throwaway prototype: NLTK perceptron
Probably not worth the architectural pain here: OpenNLP

If you ever tackle it after the plague leaves your body, I’d scope it narrowly first: a tiny pre-eSpeak lexical disambiguator for a short heteronym list like record, present, permit, conduct, project, fed by POS tags only when the token is in that ambiguity list. That gets you most of the audible win without turning TGSpeechBox into a full NLP cathedral. The repo layout really suggests that kind of “small upstream analysis shim” is the least disruptive version of the idea.
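
To make that "small upstream analysis shim" idea concrete, here is a minimal Python sketch, not TGSpeechBox code. It assumes some tagger has already produced (word, Universal POS tag) pairs; the `HETERONYMS` table, the `disambiguate` function, and the IPA strings are all hypothetical and approximate, made up for illustration. Words outside the ambiguity list get no override and would simply be left to the phonemizer.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical lookup table, for illustration only. Keys are lowercased words;
# values map a Universal POS tag to an approximate American English IPA string.
HETERONYMS: Dict[str, Dict[str, str]] = {
    "record": {"NOUN": "ˈrɛkɚd", "VERB": "rɪˈkɔrd"},
    "present": {"NOUN": "ˈprɛzənt", "VERB": "prɪˈzɛnt"},
    "permit": {"NOUN": "ˈpɝmɪt", "VERB": "pɚˈmɪt"},
}


def disambiguate(
    tagged: Iterable[Tuple[str, str]]
) -> List[Tuple[str, Optional[str]]]:
    """Return (word, ipa_override) pairs.

    ipa_override is None when the word is not in the ambiguity list,
    or when the table has no entry for the given POS tag.
    """
    result: List[Tuple[str, Optional[str]]] = []
    for word, pos in tagged:
        choices = HETERONYMS.get(word.lower())
        override = choices.get(pos) if choices else None
        result.append((word, override))
    return result


if __name__ == "__main__":
    # Tags are written by hand here; a real tagger would supply them.
    sentence = [("They", "PRON"), ("record", "VERB"),
                ("a", "DET"), ("record", "NOUN")]
    for word, ipa in disambiguate(sentence):
        print(word, ipa if ipa is not None else "(left to the phonemizer)")
```

The tagger is only consulted for words in the table, so everything else passes through untouched, which is what keeps the shim small.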

danestange@caneandable.social

@Tamasg Its Scottishness is entirely there when saying words like boo, yoo, new, you. I wonder if the o and u are more forced? It's more that the general accent itself sounds like a British person speaking American. Imagine if a Minnesotan and a Scot met? I'm not very good at explaining it. I'd imagine tuning from a US based formant synth would fix it, but it'd also change it quitely. The a in case or even an alone is also very Scottish. I'm sure you get the idea. I think spacepup had a fixy for it at one point but idk. These are things I should experiment with, though idk how the phoneme editor works and I don't use Windows enough unfortunately to have sat down and played with it yet, as I use speechbox as a main on my Mac and iPhone, which are my main devices.

danestange@caneandable.social

@Tamasg Do get better soon, dear friend. If ya need a laugh or want some chaos or an emotional manic being or group of crazies to talk to, ya know where to find usly. Fun fact, I'm on latest tgspeechbox, still waiting for my copy to get the new vowels lol. I love your work. Much love to ya brother.

danestange@caneandable.social

@Tamasg Sorry by the way, I should've clarified I was talking about the a, o and u phonemes, a as in May or Say, o as in no or bro, u as in you or do or who. Those are the britified sounding ones. They sound not so much britified as just super, what's the word, forced? Hmm. I can't quite wordise it. It sounds like a very Scottish person, also how it says some words like the number 4, the word force or course also sound a bit oddlier than how it says words like for. Its r sound natively sounds rounded though, less relaxed.

danestange@caneandable.social

@Tamasg https://github.com/rhdunn/rsynth
