Data Accuracy Overhaul: From Black-Box to Verified Data
What Happened
When we first launched, our adjective definitions and examples came from aggregated public sources. Convenient, yes. Transparent? No. We knew this wasn’t sustainable if we wanted users to trust the platform.
So we rebuilt our entire data layer from the ground up.
The Old Problem
Our original approach had a fundamental flaw: we couldn’t tell you exactly where each definition came from. Users had to take our word that the information was accurate. That’s not transparency. That’s just hoping people believe you.
For a platform designed to help people find precise language, imprecise sourcing was unacceptable.
The New Foundation
We now source definitions and examples from three established databases, each under a license that permits commercial use:
1. Definitions: WordNet + Wiktionary
WordNet (Princeton University’s Cognitive Science Laboratory) is our primary source. It’s used across academia and industry because its structure is rigorous and its definitions are precise.
Where WordNet’s coverage has gaps, Wiktionary fills them. It’s community-maintained, and it captures cultural nuance that more rigid databases miss.
Every definition now displays its source. You know exactly where it came from.
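A minimal sketch of that fallback, assuming each source has already been loaded into a simple word-to-definition mapping. The `Definition` record, the `lookup_definition` function, and the placeholder entries are hypothetical names used for illustration only, not the platform's actual code:

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Definition:
    word: str
    text: str
    source: str  # label shown next to the definition


def lookup_definition(
    word: str,
    wordnet: Mapping[str, str],
    wiktionary: Mapping[str, str],
) -> Optional[Definition]:
    """Return the WordNet definition if present, else the Wiktionary one."""
    key = word.lower()
    # Sources are tried in priority order; the first hit wins.
    for source, entries in (("WordNet", wordnet), ("Wiktionary", wiktionary)):
        text = entries.get(key)
        if text:
            return Definition(word=key, text=text, source=source)
    return None  # neither source covers this word


# Hypothetical in-memory stand-ins with placeholder text, not real glosses.
WORDNET = {"candid": "<WordNet gloss for 'candid'>"}
WIKTIONARY = {
    "candid": "<Wiktionary gloss for 'candid'>",
    "chill": "<Wiktionary gloss for 'chill'>",
}
```

With these stand-ins, `lookup_definition("candid", WORDNET, WIKTIONARY)` carries the source label `"WordNet"`, while `"chill"` falls through to `"Wiktionary"`. Keeping the label on the record itself means the display layer never has to guess where a definition came from.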
2. Examples: Three-Layer Sourcing
Tatoeba gives us real-world usage: over 10 million sentences in 400+ languages. We filter these algorithmically to find authentic examples in which each adjective describes people. These are not cherry-picked marketing lines; they reflect actual usage.
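To give a feel for what such a filter does, here is a deliberately naive sketch. It keeps a sentence only when the adjective appears alongside a person-referring word. The word list and the function names are assumptions made for this example, and simple co-occurrence is a stand-in for real NLP: it cannot confirm that the adjective actually modifies the person.

```python
import re
from typing import Iterable, List

# Hypothetical, intentionally small list of person-referring words.
PERSON_WORDS = {
    "he", "she", "they", "i", "we", "you",
    "man", "woman", "person", "people", "friend", "teacher",
}


def tokens(sentence: str) -> List[str]:
    """Lowercase the sentence and split it into simple word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def describes_person(sentence: str, adjective: str) -> bool:
    """True if the adjective and a person-referring word both occur."""
    words = tokens(sentence)
    return adjective.lower() in words and any(w in PERSON_WORDS for w in words)


def filter_examples(sentences: Iterable[str], adjective: str) -> List[str]:
    """Keep only sentences that pass the person-context heuristic."""
    return [s for s in sentences if describes_person(s, adjective)]
```

Given "She is generous with her time." and "The portion was generous.", only the first sentence survives for the adjective "generous". A production pipeline would more likely rely on part-of-speech tagging or dependency parsing than on a word list.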
Wiktionary provides curated formal examples and idiomatic patterns that show how educated writers use these words.
AI-generated examples (manually reviewed by our team) fill remaining gaps. Every generated example gets human approval before it goes live.
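The layering can be sketched as a priority sort with an approval gate. This assumes Tatoeba examples are preferred over Wiktionary ones, which the description above implies but does not state outright; the `Example` record, `SOURCE_PRIORITY`, and `select_examples` are hypothetical names for illustration:

```python
from dataclasses import dataclass
from typing import Iterable, List

# Assumed priority order: lower numbers are preferred.
SOURCE_PRIORITY = {"Tatoeba": 0, "Wiktionary": 1, "AI-generated": 2}


@dataclass(frozen=True)
class Example:
    text: str
    source: str
    approved: bool = True  # only checked for AI-generated examples


def select_examples(candidates: Iterable[Example], limit: int = 3) -> List[Example]:
    """Pick up to `limit` examples, never publishing unreviewed AI text."""
    publishable = [
        ex for ex in candidates
        if ex.source != "AI-generated" or ex.approved
    ]
    # sort() is stable, so examples from the same source keep their order.
    publishable.sort(key=lambda ex: SOURCE_PRIORITY[ex.source])
    return publishable[:limit]
```

Because unapproved AI-generated examples are dropped before sorting, they can never reach the published list, even when the other two sources return nothing.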
3. Licensing: Everything Is Commercial-Friendly
| Source | License | Status |
|---|---|---|
| WordNet | WordNet License (BSD-style) | ✅ Free for commercial use |
| Wiktionary | CC BY-SA 4.0 & GFDL | ✅ Commercial use permitted with attribution and share-alike |
| Tatoeba | CC BY 2.0 France | ✅ Commercial use permitted with attribution |
We’re explicit about attribution because that’s what trust looks like.
Technical Changes
- WordNet queries use the Natural library to pull exact adjective definitions
- Wiktionary scraping runs on scrapers and processors we built for the job; they respect the Wikimedia Foundation’s terms by sending proper User-Agent headers and limiting request rates
- Tatoeba processing filters millions of sentences through NLP to find person-relevant contexts
The details matter because cutting corners on technical implementation undermines the whole point.
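As an illustration of the User-Agent and request-rate point, the sketch below builds a request with a descriptive User-Agent and enforces a minimum gap between requests. It uses only the Python standard library. The bot name, contact URL, and the limiter class are placeholders invented for this example, and the right interval depends on the terms of the site being fetched:

```python
import time
import urllib.request

# Hypothetical value: use your own project name and a real contact address.
USER_AGENT = "ExampleAdjectiveBot/1.0 (https://example.org/contact)"


class MinIntervalLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock  # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None    # time of the previous request, if any

    def wait(self) -> float:
        """Sleep if the last request was too recent; return seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


def build_request(url: str) -> urllib.request.Request:
    """Attach the descriptive User-Agent to every outgoing request."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url: str, limiter: MinIntervalLimiter, timeout: float = 10.0) -> bytes:
    """Wait for the limiter, then fetch the page body."""
    limiter.wait()
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

A caller would create one limiter, for example `MinIntervalLimiter(1.0)`, and pass it to every `fetch` call so that all requests share the same pacing.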
Why This Matters
For accuracy: You’re reading adjective definitions from authoritative databases. They reflect established linguistic sources, not our judgment alone.
For context: Examples come from real usage, not corporate copy. You see how words actually function in writing and conversation.
For confidence: Every source is licensed for commercial use and clearly attributed. You can use our recommendations without wondering about legal or ethical issues.
For trust: We rebuilt this not because we had to, but because “good enough” shouldn’t exist when precision is the whole point.
What’s Next
This rebuild isn’t a one-time event. It’s the foundation for everything we do. Every new adjective, every refinement, follows this same standard.
We’re also monitoring our data sources. When Wiktionary updates, when Tatoeba expands, when linguistic consensus shifts, we shift with it.
Have questions about our data sources? Check our contact page. Questions about specific definitions or examples? We want to hear them.
Learn more about our ongoing commitment to data quality in our data quality policy.
Published: 2025-11-21