This is the last big version of CATT before the app officially launches, so I thought I'd just jump in and talk about it a bit. If you’re new to the app, try it here: https://cattbycatt.netlify.app/
There are some big-ish changes in this version, namely the new listening model and the CSS customizer updates.
The listening models
The listening model (Automatic Speech Recognition - ASR if you want to be fancy) is a fundamental part of this app, and for the last few months, I've been trying to get my hands on a relatively good one.
Chirp (or some predecessor of it) is the model Google uses for the Web Speech API. It is free, and because of that, it's the default model. This app would not have existed without it. So even though its accuracy isn't very good, and it doesn't produce any punctuation, I'm thankful that it exists anyway.
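For the curious, here is a minimal sketch of how a browser app can wire up the Web Speech API for captions. It is not CATT's actual code: the function name attachCaptions and the local interfaces are assumptions made for illustration, declared by hand so the sketch does not rely on DOM typings. Only the properties used on the recognition object (lang, continuous, interimResults, onresult, start) come from the Web Speech API itself.

```typescript
// Minimal shapes for the parts of the Web Speech API used below.
// Declared locally (an assumption of this sketch) so it compiles without DOM typings.
interface RecognitionAlternative {
  readonly transcript: string;
}
interface RecognitionResult {
  readonly isFinal: boolean;
  readonly length: number;
  readonly [index: number]: RecognitionAlternative;
}
interface RecognitionEvent {
  readonly resultIndex: number;
  readonly results: ArrayLike<RecognitionResult>;
}
interface RecognitionLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  onresult: ((event: RecognitionEvent) => void) | null;
  start(): void;
}

// Hypothetical helper: configures a recognizer and forwards each transcript chunk.
function attachCaptions(
  recognition: RecognitionLike,
  inputLanguage: string,
  onCaption: (text: string, isFinal: boolean) => void,
): void {
  recognition.lang = inputLanguage;   // BCP 47 tag such as "en-US" or "ja-JP"
  recognition.continuous = true;      // keep listening after the first phrase
  recognition.interimResults = true;  // also deliver partial, not-yet-final text
  recognition.onresult = (event) => {
    // Only results from resultIndex onward have changed since the last event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      const best = result?.[0];
      if (!result || !best) continue;
      onCaption(best.transcript.trim(), result.isFinal);
    }
  };
  recognition.start();
}

// Browser usage (Chromium exposes the constructor as webkitSpeechRecognition):
// const Ctor = (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
// attachCaptions(new Ctor(), "en-US", (text, isFinal) => console.log(isFinal, text));
```

Note that nothing in this flow adds punctuation; the text arrives exactly as the browser's recognizer returns it.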
Whisper is OpenAI's open ASR model, and the variant this app uses is hosted on Groq. It is fast, almost absurdly so, despite not being a streaming model. The only problem is that it doesn't respect the language parameter: no matter which language is set as the input language, the model tries to transcribe all languages at the same time. It could have been an incredibly good listening model, but that disobedience, along with its hallucinations, holds it back for this project.
Gemini is... well, Gemini. It is Google's multimodal AI model, not really an ASR model and more of a general-purpose LLM, but harnessed here to do this one specific job. It respects the parameters set by the prompt more, but it is quite a bit slower, especially at startup.
Which brings us to the new one: Deepgram's Nova-3. It is a new model and needs time to prove itself, so I am not going to bias your opinion by singing its praises; just try it for yourself first. What I will say, though, is that it has punctuation, it is a streaming model, and from what I've seen, it hallucinates less. In theory, it should be a good option for people struggling to get a good transcription.
The CSS customizer
Just some quick updates. You know, some of the new features I list here should have been here so much sooner, but hey, better late than never, right?
Presets: The CSS customizer now has presets for you to choose from. There are three of them: Default (the classic white sans-serif text and blue glow), Cozy (warm colors, slab-serif fonts) and Whimsy (hot pink accent, bubbly font, big stroke). Of course, you can always make your own tweaks, or just start fresh with the Custom option.
Stroke and shadow: Wow. This app has been in existence for almost a year, and only now can the CSS it generates have a text stroke. Before this, the only way your text could stand out from the background was by having a glow. Imagine that. Well, it's here now.
Custom font: Well, this is actually pretty cool. You can now add your own font: Google, Adobe, self-hosted, whatever you want. And the app will remember those font settings.
Guide on making your own CSS: This is for the nerds. It shows the class names used on the export pages, so if you want to make your own CSS with your own sick coding skills, you can. Or you can have an LLM do it for you.
Caption styled with CSS Customizer
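To give a rough idea of what the customizer's output involves, here is a minimal sketch that builds a caption stylesheet with a custom font, a text stroke and a glow. It is an illustration and not the app's real generator: the function buildCaptionCss, the CaptionStyle fields, the ".caption" selector and the font URL are all assumptions, since the actual class names live in the in-app guide. The CSS properties themselves (-webkit-text-stroke, paint-order, text-shadow, @import) are standard.

```typescript
// Hypothetical options object; the field names are assumptions of this sketch.
interface CaptionStyle {
  fontFamily: string;
  fontImportUrl?: string;  // stylesheet URL for a hosted font, if any
  textColor: string;
  strokeColor?: string;
  strokeWidthPx?: number;
  glowColor?: string;
  glowBlurPx?: number;
}

// Builds a small stylesheet as a string for the given selector.
function buildCaptionCss(selector: string, style: CaptionStyle): string {
  const lines: string[] = [];
  if (style.fontImportUrl) {
    // @import rules must come before any style rules in a stylesheet.
    lines.push(`@import url("${style.fontImportUrl}");`);
  }
  lines.push(`${selector} {`);
  lines.push(`  font-family: "${style.fontFamily}", sans-serif;`);
  lines.push(`  color: ${style.textColor};`);
  if (style.strokeColor && style.strokeWidthPx) {
    lines.push(`  -webkit-text-stroke: ${style.strokeWidthPx}px ${style.strokeColor};`);
    // Where paint-order is supported on text, this draws the stroke behind the
    // fill so a thick outline does not eat into the letter shapes.
    lines.push(`  paint-order: stroke fill;`);
  }
  if (style.glowColor) {
    const blur = style.glowBlurPx ?? 8;  // default blur radius is an arbitrary choice
    lines.push(`  text-shadow: 0 0 ${blur}px ${style.glowColor};`);
  }
  lines.push(`}`);
  return lines.join("\n");
}

// Example: something in the spirit of a pink, heavily stroked preset.
// The selector and font URL below are placeholders, not the app's real values.
const exampleCss = buildCaptionCss(".caption", {
  fontFamily: "Baloo 2",
  fontImportUrl: "https://fonts.googleapis.com/css2?family=Baloo+2&display=swap",
  textColor: "#ffffff",
  strokeColor: "#ff4fa3",
  strokeWidthPx: 6,
  glowColor: "rgba(255, 79, 163, 0.6)",
});
console.log(exampleCss);
```

Leaving out the stroke or glow fields simply omits those declarations, which is roughly how a glow-only style like the old default would be expressed.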
UI Design
If you’ve used the app before this version, you’d probably be surprised that I only mention this now. Yes, the app has a new, much sleeker look, which owes a lot to the logo design by LUMINA (@1UMIN4 on Twitter). Yes, that LUMINA. I really like the direction she had for the design and tried to follow it as closely as possible, but I am as lousy a designer as I am a coder, even with the help of AI, so I think it will take another version or two to fully implement the style.
On another note about the UI: the official 1.0 release, and later the native version, will bring a lot of UI changes aimed at making the app as intuitive to use as possible.
Afterword
As I've mentioned before, this will be the last version before the app officially launches. I will be sharing more about the launch later, and what will change (or not change) with it. One big thing is monetization, because, you know, inference costs money, and now that the app kind of actually works without breaking too often, I finally feel okay about charging people for it. Even so, as mentioned in the first doc ever released for this app, I am still committed to charging you only what the inference provider and payment portal charge me, with as little extra as humanly possible for stuff like maintenance. And who knows, if the app gets big enough, or if I get big enough, maybe we can afford a free tier! I'll share more about what the payment model's gonna look like in a separate post (because I have no idea myself either), so hang tight for that.
So yea, exciting things are going to happen. For now, thank you for reading, and thank you for following the development of CATTbyCatt.