• Author: AdjectivesToDescribeAPerson.com Team
  • Date: 2025-11-21
  • Abstract: This article documents how we rebuilt our data infrastructure from black-box sources to verified foundations, sourcing adjective definitions from WordNet and Wiktionary while using Tatoeba and human-reviewed AI examples for authentic context—all with transparent, commercial-friendly licensing.

Data Accuracy Overhaul: From Black-Box to Verified Data

What Happened

When we first launched, our adjective definitions and examples came from aggregated public sources. Convenient, yes. Transparent? No. We knew this wasn’t sustainable if we wanted users to trust the platform.

So we rebuilt our entire data layer from the ground up.

The Old Problem

Our original approach had a fundamental flaw: we couldn’t tell you exactly where each definition came from. Users had to take our word that the information was accurate. That’s not transparency. That’s just hoping people believe you.

For a platform designed to help people find precise language, imprecise sourcing was unacceptable.

The New Foundation

We now source definitions and examples from three authoritative databases, each with clear commercial licensing, and use human-reviewed AI examples only where those databases leave gaps:

1. Definitions: WordNet + Wiktionary

WordNet (Princeton University’s Cognitive Science Laboratory) is our primary source. It’s used across academia and industry because its structure is rigorous and its definitions are precise.

Where WordNet’s coverage has gaps, Wiktionary fills them. It’s community-maintained, and it brings cultural nuance that more rigid databases miss.

Every definition now displays its source. You know exactly where it came from.
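
To make the fallback concrete, here is a minimal Python sketch. It is an illustration, not the production code: the two dictionaries are hypothetical stand-ins for the real WordNet and Wiktionary lookups, the glosses are placeholders, and the names `Definition` and `lookup_definition` are invented for this example.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Definition:
    adjective: str
    text: str
    source: str  # displayed next to the definition on the page


# Hypothetical in-memory stand-ins for the two datasets (placeholder glosses).
WORDNET: Mapping[str, str] = {"candid": "placeholder gloss (WordNet)"}
WIKTIONARY: Mapping[str, str] = {
    "candid": "placeholder gloss (Wiktionary)",
    "hangry": "placeholder gloss (Wiktionary)",
}


def lookup_definition(
    adjective: str,
    wordnet: Mapping[str, str] = WORDNET,
    wiktionary: Mapping[str, str] = WIKTIONARY,
) -> Optional[Definition]:
    """Return the WordNet definition if one exists, else the Wiktionary one."""
    key = adjective.strip().lower()
    # Order matters: WordNet is primary, Wiktionary only fills gaps.
    for source, table in (("WordNet", wordnet), ("Wiktionary", wiktionary)):
        text = table.get(key)
        if text:
            return Definition(adjective=key, text=text, source=source)
    return None  # no verified definition: better to show nothing than guess
```

Because the source name travels with the definition object, the page template never has to infer where a definition came from.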

2. Examples: Three-Layer Sourcing

Tatoeba gives us real-world usage: over 10 million sentences in 400+ languages. We filter these algorithmically to find authentic examples in which each adjective describes a person. These are sentences people actually wrote, not cherry-picked marketing language.
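
As a rough idea of what such a filter involves, the Python sketch below scans rows in the tab-separated layout of the Tatoeba sentence export (id, language code, text) and keeps English sentences where the adjective follows a person-like subject and a linking verb. The regular-expression heuristic is an assumption made for illustration; a real pipeline would more plausibly rely on part-of-speech tagging or parsing. The sample rows used in testing are invented, not actual Tatoeba entries.

```python
import csv
import io
import re
from typing import List, Tuple

# Crude stand-ins for "the subject is a person" and "linking verb".
PERSON_SUBJECTS = r"(?:he|she|they|i|you|we|my \w+|his \w+|her \w+)"
COPULAS = r"(?:is|are|was|were|am|seems?|looks?)"


def person_pattern(adjective: str) -> "re.Pattern[str]":
    """Match e.g. 'She is (very) generous' for the given adjective."""
    return re.compile(
        rf"\b{PERSON_SUBJECTS}\s+{COPULAS}\s+(?:\w+\s+)?{re.escape(adjective)}\b",
        re.IGNORECASE,
    )


def filter_sentences(tsv_text: str, adjective: str, lang: str = "eng") -> List[Tuple[int, str]]:
    """Return (sentence_id, text) pairs where the adjective describes a person."""
    pattern = person_pattern(adjective)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    hits = []
    for row in reader:
        if len(row) != 3:
            continue  # skip malformed rows rather than guessing
        sentence_id, row_lang, text = row
        if row_lang == lang and pattern.search(text):
            hits.append((int(sentence_id), text))
    return hits
```

Keeping the sentence id alongside the text matters for attribution, since each kept example can then be traced back to its original entry.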

Wiktionary provides curated formal examples and idiomatic patterns that show how educated writers use these words.

AI-generated examples (manually reviewed by our team) fill remaining gaps. Every generated example gets human approval before it goes live.
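
The layering and the approval gate can be sketched in a few lines of Python. This is a hypothetical model, not the actual publishing code: the `Example` class, the `publishable` function, and the exact ordering of Tatoeba ahead of Wiktionary are assumptions; the only rules taken from the text above are that AI examples come last and never go live without human approval.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Example:
    text: str
    source: str             # "Tatoeba", "Wiktionary", or "AI"
    approved: bool = False  # human sign-off; only checked for AI examples


# Lower number = preferred. AI examples only fill remaining gaps.
PRIORITY = {"Tatoeba": 0, "Wiktionary": 1, "AI": 2}


def publishable(examples: Iterable[Example], limit: int = 3) -> List[Example]:
    """Pick up to `limit` examples, dropping unapproved AI ones."""
    eligible = [e for e in examples if e.source != "AI" or e.approved]
    # sorted() is stable, so examples keep their original order within a source.
    return sorted(eligible, key=lambda e: PRIORITY[e.source])[:limit]
```

Defaulting `approved` to `False` means a generated example that skips review is excluded automatically instead of slipping through.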

3. Licensing: Everything Is Commercial-Friendly

| Source | License | Status |
|---|---|---|
| WordNet | WordNet License (BSD-style) | ✅ Free for commercial use |
| Wiktionary | CC BY-SA 4.0 & GFDL | ✅ Commercial use permitted with attribution and share-alike |
| Tatoeba | CC BY 2.0 France | ✅ Commercial use permitted with attribution |

We’re explicit about attribution because that’s what trust looks like.
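
One way to keep attribution from being forgotten is to generate it from a single license registry. The Python sketch below assumes a hypothetical `LICENSES` mapping and `attribution_lines` helper; the wording of the generated lines is illustrative and is not legal advice on what each license requires.

```python
from typing import Iterable, List

# Hypothetical registry: one entry per data source in use.
LICENSES = {
    "WordNet": "WordNet License, BSD-style",
    "Wiktionary": "CC BY-SA 4.0 & GFDL",
    "Tatoeba": "CC BY 2.0 France",
}


def attribution_lines(sources_used: Iterable[str]) -> List[str]:
    """Build one attribution line per distinct source shown on a page."""
    lines = []
    for source in sorted(set(sources_used)):
        if source not in LICENSES:
            # Fail loudly: content without a recorded license must not ship.
            raise KeyError(f"No license recorded for source: {source}")
        lines.append(f"{source} ({LICENSES[source]})")
    return lines
```

Raising an error for an unknown source is a deliberate choice in this sketch: a page cannot be rendered with data whose license nobody has recorded.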

Technical Changes

  • WordNet queries use the Natural library to pull exact adjective definitions
  • Wiktionary scraping runs on scrapers and processors we built for the purpose, and it respects the Wikimedia Foundation’s terms by sending a proper User-Agent header and limiting request rates
  • Tatoeba processing filters millions of sentences through NLP to find person-relevant contexts

The details matter because cutting corners on technical implementation undermines the whole point.
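
For readers curious what "proper User-Agent headers and request rates" can look like in practice, here is a minimal Python sketch using only the standard library. Everything specific in it is an assumption: the bot name and contact details in `USER_AGENT` are placeholders, the one-second interval is an arbitrary conservative choice rather than a published limit, and fetching wikitext through the MediaWiki `api.php` endpoint is one possible approach, not necessarily the one used here.

```python
import time
import urllib.parse
import urllib.request

# Placeholder identity: a descriptive User-Agent with contact information.
USER_AGENT = "ExampleAdjectiveBot/1.0 (https://example.com/contact; bot@example.com)"
API_URL = "https://en.wiktionary.org/w/api.php"


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(page_title: str) -> urllib.request.Request:
    """Build a request for a page's wikitext with an identifying User-Agent."""
    query = urllib.parse.urlencode(
        {"action": "parse", "page": page_title, "prop": "wikitext", "format": "json"}
    )
    return urllib.request.Request(f"{API_URL}?{query}", headers={"User-Agent": USER_AGENT})


def fetch(page_title: str, limiter: RateLimiter) -> str:
    """Wait for the rate limiter, then fetch the raw JSON response."""
    limiter.wait()
    with urllib.request.urlopen(build_request(page_title), timeout=10) as response:
        return response.read().decode("utf-8")
```

Sharing one `RateLimiter(1.0)` instance across all calls to `fetch` keeps the whole crawler, not just a single loop, under the chosen request rate.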

Why This Matters

For accuracy: You’re reading adjective definitions from authoritative databases. Not our judgment alone, but established linguistic sources.

For context: Examples come from real usage, not corporate copy. You see how words actually function in writing and conversation.

For confidence: Every source is licensed for commercial use and clearly attributed. You can use our recommendations without wondering about legal or ethical issues.

For trust: We rebuilt this not because we had to, but because “good enough” shouldn’t exist when precision is the whole point.

What’s Next

This rebuild isn’t a one-time event. It’s the foundation for everything we do. Every new adjective, every refinement, follows this same standard.

We’re also monitoring our data sources. When Wiktionary updates, when Tatoeba expands, when linguistic consensus shifts, we shift with it.


Have questions about our data sources? Check our contact page. Questions about specific definitions or examples? We want to hear them.

Learn more about our ongoing commitment to data quality in our data quality policy.

