Sova — Public Discourse Intelligence

100k+

Community scopes

18mo

Production track record

2.1M

Records delivered

Public discourse intelligence

THE

SIGNAL

IN THE NOISE

Sova collects, structures, and enriches public discourse data at scale — delivering clean, annotated datasets your AI pipelines and research teams can actually use.

Request dataset View pricing

What we do

UNSTRUCTURED
DISCOURSE.
STRUCTURED
INTELLIGENCE.

Continuous ingestion

High-cadence collection across Reddit and a proprietary network of regional forums, specialized interest communities, and curated intelligence feeds. Scoped by community, keyword, topic cluster, engagement threshold, or geography signal.

Structured enrichment

Every document passes through multi-stage annotation (sentiment scoring, topic classification, and engagement metadata, with entity extraction in early access) before it reaches your pipeline.

Unmatched coverage depth

Broad global coverage with deep specialization in emerging markets and underrepresented regions. Discourse that doesn't exist in any other commercial dataset: sourced, structured, and ready.

Flexible delivery

Parquet, NDJSON, REST, or direct data share to Snowflake, BigQuery, or S3. Format is a configuration, not a constraint.
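For teams consuming the NDJSON delivery, here is a minimal Python sketch of a reader. It assumes one complete JSON object per line, which is the usual NDJSON convention. The function name `read_ndjson` and the two-record payload are illustrative only and are not part of any Sova client library.

```python
import io
import json
from typing import Iterator, TextIO


def read_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield one record per non-empty line of an NDJSON stream."""
    for line_number, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line failed so a bad export is easy to locate.
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
        yield record


# Hypothetical payload shaped like the preview records on this page.
sample = io.StringIO(
    '{"id": "af9q1r4", "source": "reddit", "sentiment_score": -0.42}\n'
    '\n'
    '{"id": "rf_7843921", "source": "regional_forum", "sentiment_score": 0.83}\n'
)
records = list(read_ndjson(sample))
print([r["id"] for r in records])
```

Because the reader is a generator, the same function works on a file opened with `open(path, encoding="utf-8")` without loading the whole export into memory.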

The data layer

EVERY FIELD.
DOCUMENTED.

Every document in the pipeline passes through multi-stage annotation before delivery. Your team receives structured intelligence — not raw dumps requiring months of preprocessing.

Full schema documentation, data dictionaries, and sample records are provided before any purchase is confirmed. If the data doesn't perform on your benchmark, we don't want the contract.

Named entity extraction and dense vector embeddings are in active development and scheduled for Q3 2026. Both are available now to early-access partners on the Enterprise tier.

Field             Description                                                Status
────────────────────────────────────────────────────────────────────────────────────
source            Origin network: reddit, regional_forum, intelligence_feed  Live
raw_structured    Content, author signals, engagement, timestamps            Live
sentiment_score   Polarity, confidence, intensity weighting per doc          Live
topic_labels      Multi-label taxonomy: standard or custom schema            Live
named_entities    Brands, products, people, locations (structured)           Q3 2026
embeddings_dense  Per-doc vectors for RAG, clustering, similarity            Q3 2026
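To make the live enrichment fields concrete, here is a small Python sketch that selects documents by topic label and sentiment. The field names `sentiment_score`, `sentiment_confidence`, and `topic_labels` come from this page's schema. The helper `select_records`, its default thresholds, and the confidence values in the sample rows are assumptions made for illustration.

```python
from typing import Iterable, List


def select_records(
    records: Iterable[dict],
    topic: str,
    max_polarity: float = 0.0,
    min_confidence: float = 0.7,
) -> List[dict]:
    """Keep records tagged with `topic` whose polarity is at or below
    `max_polarity` and whose model confidence is at least `min_confidence`."""
    selected = []
    for record in records:
        if topic not in record.get("topic_labels", []):
            continue
        if record["sentiment_score"] > max_polarity:
            continue
        # Records without a confidence value are treated as unconfident.
        if record.get("sentiment_confidence", 0.0) < min_confidence:
            continue
        selected.append(record)
    return selected


# Scores and labels follow this page's preview; confidence values are invented.
docs = [
    {"id": "af9q1r4", "sentiment_score": -0.42, "sentiment_confidence": 0.91,
     "topic_labels": ["monetary_policy", "small_business", "nigeria"]},
    {"id": "rf_7844103", "sentiment_score": -0.29, "sentiment_confidence": 0.55,
     "topic_labels": ["oil", "government", "accountability", "nigeria"]},
    {"id": "rf_7843921", "sentiment_score": 0.83, "sentiment_confidence": 0.95,
     "topic_labels": ["entrepreneurship", "logistics", "nigeria", "sme"]},
]
negative_ng = select_records(docs, topic="nigeria")
print([d["id"] for d in negative_ng])
```

The confidence cut-off is a tuning choice. A stricter value trades recall for fewer misclassified documents in the selection.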

Live data preview

WHAT THE
DATA
ACTUALLY LOOKS LIKE.

Three source types. One schema. Below is a representative slice of real enriched output, in the same structure you'd receive on day one, with fictional authors and anonymised IDs.

reddit_africa_finance_posts.ndjson · source: reddit · 3 of 2,841 records

"id": "af9q1r4",  "source": "reddit",  "subreddit": "r/Nigeria",  "flair": "Economy"
"title": "CBN raised rates again — what does this actually mean for small businesses?"
"score": 4821,  "upvote_ratio": 0.94,  "num_comments": 318
"created_utc": 1776420880,  "collected_at": "2026-04-18T07:06:31Z"
"sentiment_score": -0.42,  "topic_labels": ["monetary_policy","small_business","nigeria"]
────────────────────────────────────────────────────────
"id": "af9s7p2",  "source": "reddit",  "subreddit": "r/Kenya",  "flair": "Politics"
"title": "Gen Z protests changed the budget conversation — is it sticking?"
"score": 7203,  "upvote_ratio": 0.97,  "num_comments": 541
"created_utc": 1776466272,  "collected_at": "2026-04-18T07:06:31Z"
"sentiment_score": 0.61,  "topic_labels": ["civic_action","government","kenya"]
────────────────────────────────────────────────────────
"id": "af8m3n9",  "source": "reddit",  "subreddit": "r/Ethiopia",  "flair": "Development"
"title": "Addis tech scene is real now — here's what's actually being built"
"score": 3190,  "upvote_ratio": 0.93,  "num_comments": 204
"created_utc": 1776369987,  "collected_at": "2026-04-18T07:06:31Z"
"sentiment_score": 0.78,  "topic_labels": ["tech","startups","ethiopia"]

regional_forum_threads.ndjson · source: regional_forum · 3 of 14,200 records

"id": "rf_7843921",  "source": "regional_forum",  "board": "Business"
"title": "How I turned ₦50k into a logistics business in 18 months — full breakdown"
"author_posts": 1842,  "replies": 214,  "views": 18400
"created_at": "2026-04-14T11:22:00Z",  "collected_at": "2026-04-18T06:00:00Z"
"sentiment_score": 0.83,  "topic_labels": ["entrepreneurship","logistics","nigeria","sme"]
────────────────────────────────────────────────────────
"id": "rf_7844103",  "source": "regional_forum",  "board": "Politics"
"title": "NNPC quarterly report dropped — Nigerians are reading it this time"
"author_posts": 6201,  "replies": 389,  "views": 31200
"created_at": "2026-04-15T08:45:00Z",  "collected_at": "2026-04-18T06:00:00Z"
"sentiment_score": -0.29,  "topic_labels": ["oil","government","accountability","nigeria"]
────────────────────────────────────────────────────────
"id": "rf_7845600",  "source": "regional_forum",  "board": "Technology"
"title": "MTN's API for developers is actually good — comparison with Airtel and Glo"
"author_posts": 920,  "replies": 97,  "views": 8800
"created_at": "2026-04-16T14:11:00Z",  "collected_at": "2026-04-18T06:00:00Z"
"sentiment_score": 0.55,  "topic_labels": ["telecom","developer_tools","nigeria"]

intelligence_feed_records.ndjson · source: intelligence_feed · 3 of 6,100 records

"id": "if_bd_20260418_001",  "source": "intelligence_feed",  "outlet": "[business · NG]"
"title": "Nigeria's FX reserves cross $38bn for first time since 2020"
"country": "NG",  "published_at": "2026-04-18T06:30:00Z",  "collected_at": "2026-04-18T07:00:00Z"
"sentiment_score": 0.71,  "topic_labels": ["macroeconomics","forex","nigeria"]
────────────────────────────────────────────────────────
"id": "if_dn_20260418_044",  "source": "intelligence_feed",  "outlet": "[business · KE]"
"title": "Kenya's mobile money interoperability goes live — what changes for consumers"
"country": "KE",  "published_at": "2026-04-17T14:20:00Z",  "collected_at": "2026-04-18T07:00:00Z"
"sentiment_score": 0.64,  "topic_labels": ["fintech","mobile_money","kenya"]
────────────────────────────────────────────────────────
"id": "if_ea_20260418_017",  "source": "intelligence_feed",  "outlet": "[regional · MULTI]"
"title": "EAC trade volumes up 12% — Ethiopia's corridor is driving most of the growth"
"country": "MULTI",  "published_at": "2026-04-17T09:05:00Z",  "collected_at": "2026-04-18T07:00:00Z"
"sentiment_score": 0.58,  "topic_labels": ["trade","regional_economy","east_africa"]

unified_enriched.parquet · schema · All live fields · all sources

Field                    Type         Status      Notes
────────────────────────────────────────────────────────────────────────────────
id                       string       ● live      Unique per source
source                   string       ● live      reddit | regional_forum | intelligence_feed
title                    string?      ● live      Thread/article title; null for comments
body                     string       ● live      Post text, comment body, or article summary
author_id                string?      ● live      Pseudonymised; null for news RSS
community                string?      ● live      Subreddit, Nairaland board, or outlet name
country                  string?      ● live      ISO-3166-1 alpha-2; inferred where absent
score                    int32?       ● live      Upvotes / views; null for news RSS
engagement               int32?       ● live      Replies or num_comments
published_at             timestamp    ● live      Original publish time (UTC)
collected_at             timestamp    ● live      Pipeline ingestion time (UTC)
sentiment_score          float32      ● live      Polarity: -1.0 → +1.0
sentiment_confidence     float32      ● live      Model confidence 0.0–1.0
topic_labels             list[str]    ● live      Multi-label; standard or custom taxonomy
named_entities           list[obj]    ◌ Q3 2026   {entity, type, salience}
embeddings_dense         float32[768] ◌ Q3 2026   Per-doc vectors for RAG / clustering
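The per-source previews use source-specific keys (`subreddit`, `board`, `outlet`, `created_utc`, and so on), while the unified schema uses shared column names. The Python sketch below shows one way such records could be mapped onto the unified columns. The mapping in `FIELD_MAP` is an assumption inferred from the Notes column above, not a published specification, and `to_unified` is a hypothetical helper. The sample input is an abridged copy of the first Reddit preview record.

```python
from datetime import datetime, timezone
from typing import Optional

# Assumed mapping from source-specific preview keys to unified column names.
FIELD_MAP = {
    "reddit": {"community": "subreddit", "score": "score", "engagement": "num_comments"},
    "regional_forum": {"community": "board", "score": "views", "engagement": "replies"},
    "intelligence_feed": {"community": "outlet", "score": None, "engagement": None},
}


def _published_at(record: dict) -> Optional[str]:
    """Return the publish time as an ISO-8601 UTC string, if one is present."""
    if "created_utc" in record:  # Reddit preview records use epoch seconds
        moment = datetime.fromtimestamp(record["created_utc"], tz=timezone.utc)
        return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    return record.get("published_at") or record.get("created_at")


def to_unified(record: dict) -> dict:
    """Map one source-specific record onto the unified column names."""
    source = record["source"]
    if source not in FIELD_MAP:
        raise ValueError(f"unknown source: {source!r}")
    mapping = FIELD_MAP[source]

    def pick(column: str):
        key = mapping[column]
        return record.get(key) if key is not None else None

    return {
        "id": record["id"],
        "source": source,
        "title": record.get("title"),
        "community": pick("community"),
        "country": record.get("country"),
        "score": pick("score"),
        "engagement": pick("engagement"),
        "published_at": _published_at(record),
        "collected_at": record.get("collected_at"),
        "sentiment_score": record.get("sentiment_score"),
        "topic_labels": record.get("topic_labels", []),
    }


reddit_row = to_unified({
    "id": "af9q1r4", "source": "reddit", "subreddit": "r/Nigeria",
    "score": 4821, "num_comments": 318, "created_utc": 1776420880,
    "collected_at": "2026-04-18T07:06:31Z", "sentiment_score": -0.42,
    "topic_labels": ["monetary_policy", "small_business", "nigeria"],
})
print(reddit_row["community"], reddit_row["engagement"], reddit_row["published_at"])
```

Columns that the previews do not show (`body`, `author_id`, `sentiment_confidence`) are left out of the sketch rather than guessed at.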

Packages

START SMALL.
SCALE FAST.

Brandwatch starts at $1,000/mo
Meltwater runs $6k–$15k/yr
Sova delivers more. For less.

One-time

SNAPSHOT

A single structured export built to your scope. Ideal for training runs, research, or market studies.

$299

per export · up to 100k records

Reddit + regional forum network
Custom scope & date range
Sentiment + topic enrichment
Parquet, NDJSON, or CSV
Schema docs included
5-day turnaround

Continuous

MONITOR

Scoped feed refreshed daily. Entry point for brand monitoring, consumer intelligence, and ongoing research.

$499

per month · no lock-in

Reddit + regional forum network
Up to 250k records/month
Daily delivery
Sentiment + topic enrichment
Scope adjustments included
Email support

Recommended

INTEL

High-volume continuous feed with full enrichment suite and dedicated support. For live AI pipelines and serious intelligence programs.

$1,800

per month · billed monthly

Full source network incl. intelligence feeds
Unlimited volume in scope
Hourly or real-time delivery
Full enrichment suite
Multiple concurrent scopes
Dedicated account contact
Priority SLA

Enterprise

PARTNER

White-label data supply for platforms reselling intelligence. Custom schemas, infrastructure, and SLAs built around your stack.

$5k+

per month · custom contract

Unlimited scope & volume
White-label output
Custom enrichment models
Snowflake / BigQuery share
Dedicated infrastructure
Uptime SLA + QBRs

Contact

EVERY DEAL
STARTS WITH A
SAMPLE.

We send a sample before you commit to anything.

Tell us your scope and use case. We return a representative sample dataset within 48 hours — benchmark the quality before any decisions are made.

Email: hello@sova-data.xyz

Response time: Within 48 hours

Samples: Always before purchase

Starts at $299 one-time · $499/mo
