Potential dictionaries #1

opened 2024-09-01 18:14:16 +02:00 by Anri · 1 comment
- ODS8 from 2021: https://github.com/Thecoolsim/French-Scrabble-ODS8/blob/main/French%20ODS%20dictionary.txt
  - 411430 words
  - No more info than that, since it's a `.txt`
- freelang.com: https://github.com/Taknok/French-Wordlist/blob/master/francais.txt
  - 22739 words
  - No more info than that, since it's a `.txt`
- lexique.org: http://www.lexique.org/databases/Lexique383/Lexique383.zip
  - 142695 words
  - A lot of info, since an Excel file is available
  - Some words are missing (example: fleuriste not found)
  - Pain in the ass to parse an `.xlsb` file (maybe?)
  - We have to retain only one form per verb
- Wiktionary (all titles): https://dumps.wikimedia.org/frwiktionary/latest/frwiktionary-latest-all-titles.gz
  - 37381 words
  - They are clean words, kinda
  - Needs some parsing, not too hard
  - Some words are missing (example: fleuriste not found) with a simple `français/` search query
  - Definitions are easy to find since it's Wiktionary: [https://fr.wiktionary.org/wiki/`<word>`](https://fr.wiktionary.org/wiki/exemple)
- Wiktionary (full articles): https://dumps.wikimedia.org/frwiktionary/latest/frwiktionary-latest-pages-articles.xml.bz2
  - 6866791 words
  - Needs to be cleaned and parsed, could take some time (~2m30, I guess)
  - Searching for `{{S|nom|fr}}` may work? This big file (7.4G) is hard to read; I recommend using `less`
  - All included, I guess
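The `{{S|nom|fr}}` idea for the full articles dump could be tried without ever decompressing the 7.4G file to disk. Below is a minimal Python sketch, assuming the usual MediaWiki export layout (`<page>` containing `<title>`, `<ns>` and `<revision><text>`) and assuming French noun sections open with `{{S|nom|fr}}` or a variant with extra parameters such as `{{S|nom|fr|num=1}}`. `french_nouns` and `FRENCH_NOUN` are hypothetical names, not part of any existing tool, and the pattern has not been checked against the real dump.

```python
import bz2
import re
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Union

# Assumption: a French noun section starts with {{S|nom|fr}} or {{S|nom|fr|...}}.
# The lookahead on "|" or "}}" avoids matching other language codes like "fro".
FRENCH_NOUN = re.compile(r"\{\{S\|nom\|fr(?:\||\}\})")


def _local(tag: str) -> str:
    """Drop the MediaWiki export namespace, whose version varies between dumps."""
    return tag.rsplit("}", 1)[-1]


def french_nouns(dump: Union[str, IO[bytes]]) -> Iterator[str]:
    """Stream a pages-articles .xml.bz2 dump and yield titles of main-namespace
    pages whose wikitext contains a French noun section (hypothetical helper)."""
    with bz2.open(dump, "rb") as xml_stream:
        context = ET.iterparse(xml_stream, events=("start", "end"))
        _event, root = next(context)  # the <mediawiki> root element
        for event, elem in context:
            if event != "end" or _local(elem.tag) != "page":
                continue
            title, ns, text = None, None, ""
            for child in elem.iter():
                name = _local(child.tag)
                if name == "title":
                    title = child.text
                elif name == "ns":
                    ns = child.text
                elif name == "text":
                    text = child.text or ""
            # ns 0 is the main namespace; this skips template and appendix pages.
            if ns == "0" and title and FRENCH_NOUN.search(text):
                yield title
            # Drop every finished page so memory stays roughly flat on a huge file.
            root.clear()
```

Usage would look like `for word in french_nouns("frwiktionary-latest-pages-articles.xml.bz2"): ...`. Clearing the root after each page is the part that matters on a file this size: without it, `iterparse` keeps every parsed page in memory.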
Anri changed title from Potential dictionaries to Potential dictionary 2024-09-01 18:24:00 +02:00
Anri changed title from Potential dictionary to Potential dictionaries 2024-09-01 18:24:06 +02:00

We could start with a basic `.txt` file at first, nail the app, and then explore more advanced dictionaries.
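For that first step, a plain word list only needs to be loaded into a set. Below is a minimal sketch, assuming the file is UTF-8 with one word per line (not verified for the ODS8 or freelang files). `normalize`, `load_words`, `load_word_file` and the `fold_accents` option are hypothetical names. Accent folding is only there in case a list stores words without accents (as Scrabble-oriented lists may), so lookups can be compared on an accent-free form.

```python
import unicodedata
from pathlib import Path
from typing import Iterable, Set


def normalize(word: str, fold_accents: bool = False) -> str:
    """Trim, NFC-normalize and lowercase a word; optionally strip accents (é -> e)."""
    word = unicodedata.normalize("NFC", word.strip()).lower()
    if fold_accents:
        # NFD splits "é" into "e" + a combining accent, which is then dropped.
        decomposed = unicodedata.normalize("NFD", word)
        word = "".join(c for c in decomposed if not unicodedata.combining(c))
    return word


def load_words(lines: Iterable[str], fold_accents: bool = False) -> Set[str]:
    """Build a lookup set from one-word-per-line text, skipping blank lines."""
    return {normalize(line, fold_accents) for line in lines if line.strip()}


def load_word_file(path: str, fold_accents: bool = False) -> Set[str]:
    # Assumption: the file is UTF-8 encoded with one word per line.
    with Path(path).open(encoding="utf-8") as handle:
        return load_words(handle, fold_accents)
```

Something like `words = load_word_file("francais.txt")` followed by `"fleuriste" in words` would then be enough for the app. A set keeps membership checks constant-time on average, and swapping in a richer dictionary later only means replacing the loader.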
Anri pinned this 2024-09-01 19:53:49 +02:00
Reference: Anri/xtoyr#1