My English is not the best but not the worst either. But I realized I can't improve it beyond a certain level! I believe that to truly learn a language, you need to be exposed to it often. Vocabulary is the key factor if you really want to improve in any language.
My experience is that when I read a book to improve my English vocabulary, I encounter unfamiliar words so often that my reading gets disrupted. I go look for the meaning, come back, put it in context, re-read it, etc. It didn't work for me. So I tried listening to audiobooks - I listen to the book and read along, and whenever I encounter an unfamiliar word, I write it down. I end up with 50 or so words in 2-3 pages and I ask ChatGPT to give me their meanings. I read them, take the book, and now read it myself. That helps for sure, but after a while I still lose those words because I never encounter them again. Well then, in order to not forget those words, I need some kind of exercise, right? A flashcard app maybe? Well, I still need to go out there, ask ChatGPT to create questions, put them in a flashcard app, etc. It's still time-consuming and this is supposed to be fun!
I need to be exposed to English in my daily life. I just need to save the words somewhere and, whenever I want, be able to practice them in a fun way, in Duolingo style maybe? Then I wondered: wouldn't it be better to store words in their own context? I mean, say I read Harry Potter and have a list of words I encountered in it, say I watch Breaking Bad and have a list of words I encountered watching it. I believe seeing those words together and practicing them together makes it easier to remember them.
But I shouldn't be the one adding the meaning of the word and generating the exercises, right? It should all be automated. The exercise part will be handled by an LLM for sure, but for the meaning of the word, could I fetch it from a dictionary? I really don't like dictionary definitions, though, and one word can have multiple meanings depending on its context. So I need to use an LLM for this task too and get the word's meaning in its own context.
You create a list for your context, you add words, meanings get added automatically, and you see each word added in a different color (coloring is also a method used to remember words). It all takes seconds. And whenever you want to practice these lists, you can use learn mode to learn and quiz mode to test your knowledge. So I basically built this app (thanks to Claude 3.5 Sonnet). I want it to be like Duolingo. Of course I still have a long way to go, but I wanted to share it in hopes of getting contributors.
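To make the "list per context, meaning per context" idea concrete, here is a minimal sketch in Python. It is not the app's actual code: the names `WordEntry`, `ContextList`, and `build_meaning_prompt`, as well as the prompt wording, are assumptions made up for illustration, and the call to the LLM itself is left out.

```python
from dataclasses import dataclass, field


@dataclass
class WordEntry:
    word: str
    meaning: str = ""  # would be filled in later from the LLM's reply


@dataclass
class ContextList:
    # The context the words were met in, e.g. "Harry Potter" or "Breaking Bad".
    context: str
    entries: list[WordEntry] = field(default_factory=list)

    def add(self, word: str) -> WordEntry:
        """Store a new word (whitespace trimmed) and return its entry."""
        entry = WordEntry(word=word.strip())
        self.entries.append(entry)
        return entry


def build_meaning_prompt(word: str, context: str) -> str:
    """Build a prompt asking for the meaning of `word` as used in `context`.

    The wording is hypothetical; any phrasing that names both the word
    and its context would serve the same purpose.
    """
    return (
        f'Explain what "{word}" means in the context of {context}. '
        "Give one short definition and one example sentence."
    )
```

Usage would look like `lst = ContextList("Breaking Bad")`, then `build_meaning_prompt(lst.add("cook").word, lst.context)`, with the resulting string sent to whichever LLM is configured.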
You can read more in the repository. I would love to get your thoughts on this.
The best way I've found to identify vocabulary most important to my life is through journaling in the language I'm trying to learn. Describing exactly what I did that day, my thoughts, etc, as best I can.
I had thought of doing the journal entries digitally and gathering dictionary headwords from them, whether they're written in my mother tongue (English) or not, then using the resulting word lists to drill vocab.
Traditionally you'd use a lemmatizer with a morphosyntactic tagger for the language to identify the dictionary words. AI is serviceable these days for easily identifying dictionary words in long-form text in many languages, though honestly I would be surprised if AI already outperforms the traditional methods.
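As a rough illustration of the "journal entry to headword list" step, here is a toy sketch in Python. The lemma table is hand-written for this example only and the `headwords` function is a hypothetical name; a real setup would replace the table lookup with a proper lemmatizer and tagger for the language (or an LLM call).

```python
import re
from collections import Counter

# Toy lemma table, hand-written for this example only.
# A real lemmatizer would cover the whole language and use part-of-speech tags.
TOY_LEMMAS = {
    "went": "go",
    "goes": "go",
    "walked": "walk",
    "walking": "walk",
    "dogs": "dog",
    "was": "be",
    "is": "be",
}


def headwords(text: str, lemmas: dict[str, str] = TOY_LEMMAS) -> Counter:
    """Count dictionary headwords in `text`.

    Tokens are runs of letters (so "don't" splits into "don" and "t");
    words missing from the table are kept unchanged.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(lemmas.get(tok, tok) for tok in tokens)
```

Calling `headwords(entry).most_common()` on a journal entry then gives the headwords sorted by how often they were used, which is one possible starting point for a drill list.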
Good luck and have fun :)
Apologies, I should have linked beforehand.
- LangTurbo (by @sebnun) - langturbo.com : Learn through podcasts with transcriptions and contextual word definitions
- Nuenki (by @Alex-Programs) - nuenki.app : Browser extension that translates appropriate-difficulty sentences across websites, with hover-for-definitions feature
- Manabi Reader (by @wahnfrieden) - reader.manabi.io : Japanese-focused integrated reader with SRS and Anki integration
- (by @muth02446) - Spanish: appicenter.net/Apps/VocabES/ - English: appicenter.net/Apps/VocabEN/ : Uses spaced repetition and audio for basic vocabulary learning
- Vocabuo (by @kebsup) - vocabuo.com : Combines SRS flashcards with ebook/YouTube/website reader, using AI for content generation
- LingoStories (by @laurentlb) - github.com/laurentlb/lingostories/ : Open-source language learning tool
- Turkish Learning Tool (by @learning-tr) : Browser extension for colloquial translations with audio and pronunciation features
- Language Reactor (by @davidzweig) : Planning to open-source soon, looking for contributors
Note: the list above was summarized by Claude 3.5 Sonnet.
Somebody has already suggested adding spaced repetition and audio, which I agree with completely.
One more suggestion: In addition to having the LLM give you the meaning and example for the context in which you originally saw the word, also ask it to provide the word’s other main meanings and examples of it being used in those senses. You might encounter a word first in a slang or technical sense; while it’s useful to learn that meaning, it’s also important to learn other, more common meanings.
Below are some examples of words you might encounter first in technical contexts but would also be worth knowing in their more general meanings. (Examples suggested and defined by ChatGPT o1.)
canonical
Religious/General: Relating to a canon (e.g., church law) or a recognized body of works.
Math/Computing: Conforming to a standard or simplest form (e.g., “canonical form” of an equation).
resolution
General: A firm decision or determination (often heard in “New Year’s resolution”).
Tech/Imaging: The detail an image holds, typically measured in pixels, dots per inch (DPI), etc.
protocol
Diplomatic/General: The official procedure or set of rules governing state or ceremonial events.
Computing: A set of conventions and rules for transmitting data between electronic devices.
flux
General: Continuous movement or change, often implying instability.
Physics/Engineering: The amount of some quantity (e.g., heat, magnetism) passing through a given area over time.
In one instance, I was having it correct akkusativ/dativ/nominativ sentences and it would say the sentence is in one case when I knew it was in another case. I'd ask ChatGPT if it was sure, and then it would change its answer. If pressed further, it would again change its answer.
I was originally quite excited about using an LLM for my language practice, but now I'm pretty cautious with it.
It is also why I'm very skeptical of AI-based language learning apps, especially if the creator is not a native speaker.
I was asking only for the meanings of the words and phrases, though. I didn’t ask for things like pronunciations, grammatical categories, etc. In the past, when I’ve tried to get that kind of granular information from LLMs, there were indeed errors, presumably because of tokenization issues.
A few days ago, I ran some similar tests with Japanese, asking for readings of kanji and jukugo in an extended text. All of the models I had tried before for such tasks had screwed up. This time, however, ChatGPT o1 scored 100%. It also was able to analyze sentence grammar accurately, unlike the other models I tried. I was impressed.
At current API prices, though, o1 might be a bit too expensive for such a task.
- A more user-friendly approach to running the app
- LiteLLM integration so we can use any LLM (it's done! thanks to @enessusan00!)
- Running the database locally
- Customizable language preferences (e.g., learning German through Turkish)
- A live version where anyone can easily try the app
- A protection mechanism for LLM responses to ensure getting valid JSON
- Fixing small bugs
- Customizable exercise types (ability to enable/disable specific question formats)
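For the "protection mechanism for LLM responses" item, one possible shape is sketched below in Python. The exercise schema (`question`, `options`, `answer`) and the function name `parse_exercise` are assumptions for illustration, not the app's actual format.

```python
import json
from typing import Optional

# Hypothetical schema for one multiple-choice exercise.
REQUIRED_KEYS = {"question", "options", "answer"}


def parse_exercise(raw: str) -> Optional[dict]:
    """Return the exercise dict found in `raw`, or None if it is unusable.

    LLMs often wrap JSON in extra prose, so only the text between the
    first "{" and the last "}" is parsed.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        data = json.loads(raw[start : end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
        return None
    options = data["options"]
    # The correct answer must be one of the listed options.
    if not isinstance(options, list) or data["answer"] not in options:
        return None
    return data
```

A caller could retry the request a limited number of times whenever `parse_exercise` returns `None`, rather than showing a broken question to the user.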
We'll be focusing on improving the app as much as we can, but help would be greatly appreciated! We'll be structuring the repository to make it easier for everyone to contribute together.
I'm truly amazed by all the insights and suggestions shared here. There are so many great ideas. Thank you all again for the support and for making this discussion so enriching. I'll keep sharing updates here! All the amazing suggestions shared here will be added to the roadmap in the README!
I originally planned to add some kind of SRS to it, but I found that I learned much better just reading things in context instead of explicitly using SRS to memorize them. Steve Kaufmann (creator of LingQ) explains this better here [2]
And for the second part, I'm planning to include the SRS features @markvdb pointed out in the comments. Combining contextual learning with SRS would be interesting, I guess.
"To recognize 99% of all the words in Netflix's subtitles, you'd need to know 37,247 words"
Interesting approach! I really don't know how they managed to gather this list, but it's an interesting and clever method.
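I don't know how the quoted figure was actually produced, but a coverage number like that can in principle be computed from a word-frequency count. A minimal Python sketch, where `words_needed` is a hypothetical helper and the frequencies are made-up toy data:

```python
from collections import Counter


def words_needed(freqs: Counter, target: float) -> int:
    """Return how many of the most frequent words cover `target` of all tokens.

    `freqs` maps each word to its number of occurrences; `target` is a
    fraction between 0 and 1 (0.99 for "99% of all the words").
    """
    total = sum(freqs.values())
    covered = 0
    for rank, (_, count) in enumerate(freqs.most_common(), start=1):
        covered += count
        if covered >= target * total:
            return rank
    return len(freqs)


# Made-up toy counts, only to show the call.
toy = Counter({"the": 50, "a": 30, "cat": 15, "flux": 5})
print(words_needed(toy, 0.75))
```

Because common words dominate any corpus, the last few percent of coverage typically require far more words than the first ninety, which is consistent with a large number like the one quoted.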
*other than those blocked for privacy reasons
* https://www.appicenter.net/Apps/VocabES/
* https://www.appicenter.net/Apps/VocabEN/
Uses spaced repetition and audio but is not personalized, which is less of an issue for basic words. The hard part is getting good example sentences and "cross links". I had thought about using AI for that but have not followed through.
My app [1] is basically a combination of SRS flashcards with an ebook/YouTube/Website reader. Unlike Anki though, AI creates example sentences, definitions, images and audio.
I find it interesting that you want to get inspired by Duolingo. My approach is to have the most efficient grind possible - no gamification. I've found Duolingo was wasting so much of my time with exercises that did not really teach me anything and took a long time to complete + the XP points/levels etc. were quite distracting.
(E.g., your vocabuo website prominently points to possible promo codes.)
You seem to focus on the English use-case. In my experience, getting exposure to other languages can be much more difficult, especially when you're not fluent yet. It would be interesting to see how to approach it: ideally, questions and answers should be in the target language, but the questions have to be very simple.
As someone else mentioned, having audio would be very useful. At some point, you could consider a hands-free mode: it reads the question out loud, pauses a few seconds, then gives the answer.
On the latter part, there used to be a hard mode, at least in the browser version, where you could have it force you to hand-type every word. I always really liked that, but then they got rid of it. Of course with the heart system these days, I wouldn't last 5 minutes if I tried to do it that way, so such is life I suppose.
I actually had this idea of using Duolingo's style exercises, but now with your comment, I realize some might not be appropriate for individual learners with different goals.
The cool thing would be to have customizable exercise types, where users can choose which ones they want and which ones they don't want!
I will add this to the roadmap in the README, pointing out this comment! Thanks again!
Here's a few random suggestions:
- Spaced repetition. Again, Anki style.
- Audio. Can you make it easy to record a phrase, Anki style? Or maybe even make AI pronounce them correctly?
I would like something like that.
I believe the spaced repetition feature must be prioritized because that's the most important thing in this app. I mean, what's the purpose of seeing the words over and over again if I already have confidence with them?
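To show what "not seeing words I'm already confident with" could look like, here is a minimal Leitner-box sketch in Python. It is one simple form of spaced repetition, not necessarily what the app will use; the interval values and the names `Card`, `review`, and `due_cards` are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Days to wait before the next review, per box. Values are illustrative.
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    word: str
    box: int = 0          # higher box = better known = longer wait
    due: date = date.min  # new cards are due immediately


def review(card: Card, correct: bool, today: date) -> None:
    """Move the card up one box on success, back to box 0 on failure."""
    card.box = min(card.box + 1, len(INTERVALS) - 1) if correct else 0
    card.due = today + timedelta(days=INTERVALS[card.box])


def due_cards(cards: list[Card], today: date) -> list[Card]:
    """Return only the cards whose review date has arrived."""
    return [c for c in cards if c.due <= today]
```

Each quiz session would then draw from `due_cards(...)` only, so words answered correctly several times in a row show up less and less often.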
For the pronunciation feature, I had similar work before and there are great open source tools and libraries we can build upon that analyze your pronunciation and spot where you made mistakes. We can use open source TTS libraries to pronounce the correct version.
I also would definitely want to see audio questions in exercises similar to Duolingo, and it would be great to work on those features.
It also has audio and pronunciation. It is around the halfway mark in the demo.
Note: Bu benim en sevdiğim şarkılardan biridir!! (it's one of my favorite songs)
*thanks again!!
The app would work for any language, but the definitions and exercises will be written in English. I created a list just now for German words and added the German word "Zeitreise". It generated this definition:
<<"Zeitreise" in a German mystery series means time travel. It refers to the act of a character or characters moving through time, either to the past or the future, often as a central element of the mystery's plot.>>
Exercises were asked in English.
"What does "Zeitreise" mean?":
- Time travel
- Train journey
- Long wait
- Difficult puzzle
Maybe a feature where you can choose the language would be cool. I mean, someone might prefer to learn German using German, or say Spanish using Turkish.
Again, thank you for pointing it out. I will update the README and hopefully add an inference-language preference feature.
A feature where it supports TR -> EN and vice versa would be amazing!
I'm on a Duolingo family plan, studying Ukrainian. It keeps throwing more and more difficult words without really building my knowledge and experience with the previous words.
I'm not sure if I can't hear the words correctly (it's possible, I'm partially deaf and it sounds like the voices it provides are low quality). I'm not sure if I'm not pronouncing them correctly (it often doesn't accept my pronunciations). Its feedback for improvement is extremely limited. For example, no matter how hard or slow or fast I pronounce it, it pretty much never accepts один ("oden" = "one") when I speak it.
When I was in 3rd through 6th grade of school, I learned English pronunciations using Spalding phonetics [0]. There are about 70 or so English phonograms if I recall. It would be handy to have that for other languages. It specifically taught how to put letters together to form sounds, and which combinations of letters are synonymous for sounds (but not for spelling, which was a separate class based much more on memorization of rules and exceptions). I excelled in both of these classes.
I've also sometimes asked ChatGPT for translations of words. It seems semi-OK. But it's much better to ask Ukrainian friends and colleagues. Friends and colleagues don't have a lot of time or patience to teach though. And they'd often throw additional meaning or context that was difficult to understand (for example, English has much less assignment of gender to words).
Not too much later in life (8th grade or so), I started writing software. I was homeschooled then, and had a lot of time on my hands. So I'd write software for most of the day every day for months at a time. There came a point where I stopped thinking in English and started thinking in objects and code relationships. I didn't realize it until my mother asked me what I was doing and I had to think to translate to English.
I've heard similar anecdotes: you start to become a native speaker when you can think in that language. I want that from Duolingo but haven't yet achieved it after 2.5 years. I imagine what's missing is just as @cat_multiverse said [1]: I don't really use the Ukrainian language in my daily life and should just start doing so, even if it's just a journal. But without any feedback about correct pronunciation or grammar, I worry that I would end up with my own mini language instead of a truly Ukrainian one.
Am I dictating, in the sense of speaking free-form words that it then writes down? No.
Duolingo will present a word and ask me to verbalize it into the microphone.
Duolingo will present a phrase or sentence and ask me to verbalize it into the microphone.
I know that Duolingo will do this also for learning German.