LedgerBERT-Market-Sentiment

Model Description

Model Summary

LedgerBERT-Market-Sentiment is a fine-tuned version of LedgerBERT (https://huggingface.co/ExponentialScience/LedgerBERT) specialized for sentiment analysis of cryptocurrency and distributed ledger technology (DLT) content. The model classifies text into three market direction sentiment categories: bullish (positive market outlook), bearish (negative market outlook), and neutral (balanced or unclear market direction).

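The model is intended for analyzing cryptocurrency news headlines, social media posts, and other DLT-related content where market sentiment matters. Because it was fine-tuned on news headlines and descriptions, its performance on other text types, such as social media posts, may differ.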

Model type: BERT-base encoder for sequence classification
Language: English
License: Creative Commons Attribution-NonCommercial 4.0 International (CC-BY-NC 4.0)
Base model: LedgerBERT (ExponentialScience/LedgerBERT)
Fine-tuning dataset: DLT-Sentiment-News (23,301 examples)
Task: Multi-class sentiment classification (3 classes)

Model Architecture

Architecture: BERT-base for sequence classification
Parameters: 110 million
Hidden size: 768
Number of layers: 12
Attention heads: 12
Vocabulary size: 30,522 (SciBERT vocabulary)
Max sequence length: 512 tokens
Output: 3-class logits (bullish, bearish, neutral)
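As a rough sanity check on the 110 million figure, the parameter count of a BERT-base sequence classifier can be tallied from the architecture values listed above. This is a sketch under stated assumptions: the feed-forward size of 3,072 and the two token-type embeddings are standard BERT-base values that this card does not list, and the vocabulary size is the one given above.

```python
def bert_classifier_param_count(vocab_size=30522, hidden=768, layers=12,
                                intermediate=3072, max_positions=512,
                                type_vocab=2, num_labels=3):
    """Count trainable parameters of a BERT-base sequence classifier.

    intermediate=3072 and type_vocab=2 are assumed standard BERT-base values;
    they are not listed in this model card.
    """
    layer_norm = 2 * hidden  # LayerNorm gamma and beta
    # Word, position and token-type embeddings, plus the embedding LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Self-attention: Q, K, V and output projections (weights + biases) + LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Feed-forward block: up-projection, down-projection, LayerNorm.
    feed_forward = (hidden * intermediate + intermediate
                    + intermediate * hidden + hidden + layer_norm)
    encoder = layers * (attention + feed_forward)
    pooler = hidden * hidden + hidden
    classifier = hidden * num_labels + num_labels  # 3-class output head
    return embeddings + encoder + pooler + classifier


print(bert_classifier_param_count())  # 109484547, i.e. about 110 million
```

On a loaded checkpoint, `sum(p.numel() for p in model.parameters())` gives the exact count; it will differ slightly from this estimate if `model.config.vocab_size` differs from the value used here.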

Intended Uses

Primary Use Cases

This model is designed for sentiment analysis tasks in the cryptocurrency and DLT domain:

Market sentiment analysis: Analyzing sentiment in cryptocurrency news articles, headlines, and market commentary
Social media monitoring: Understanding market direction sentiment in tweets, Reddit posts, and forum discussions
News aggregation: Automatically categorizing cryptocurrency news by market sentiment
Research applications: Studying sentiment trends and their relationship to market dynamics
Content filtering: Organizing DLT content based on market outlook

Example Applications

# Analyzing news headlines
"Bitcoin surges to new all-time high" → Bullish
"Ethereum faces regulatory scrutiny" → Bearish
"Stablecoin market remains stable" → Neutral

# Social media sentiment
"To the moon! 🚀" → Bullish
"Another crypto winter incoming" → Bearish
"Waiting for clear market direction" → Neutral

Out-of-Scope Uses

Investment decisions: This model should NOT be used as the sole basis for making investment or trading decisions
Financial advice: Not suitable for providing personalized financial or investment recommendations
Real-time trading: Should not be used for automated high-frequency trading systems
Market manipulation: Must not be used to coordinate or facilitate market manipulation
General sentiment analysis: Optimized for market direction sentiment; may not perform well on general emotional sentiment

Training Details

Training Data

The model was fine-tuned on the DLT-Sentiment-News dataset, which contains:

Size: 23,301 examples
Tokens: 1.85 million tokens (average 79.51 tokens per example)
Temporal coverage: January 2021 to May 2025
Source: CryptoPanic platform cryptocurrency news headlines and descriptions
Labels: Crowdsourced votes from active cryptocurrency community members
Classification method: Percentile-based labeling (25th and 75th percentiles as boundaries)
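The percentile-based labeling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the dataset's actual pipeline: the vote field names (`bullish_votes`, `bearish_votes`), the net-score formula, the median-vote filter (loosely modeled on the filtering mentioned under Data Collection Biases), and the treatment of scores exactly on a boundary are all hypothetical.

```python
import statistics


def label_market_direction(examples):
    """Assign bullish/bearish/neutral labels using 25th/75th percentile cut-offs.

    Hypothetical inputs: each example is a dict with "bullish_votes" and
    "bearish_votes" counts. The median-vote filter and the net score below
    are illustrative assumptions, not the dataset's documented procedure.
    """
    # Assumed quality filter: keep examples with at least the median number of votes.
    totals = [ex["bullish_votes"] + ex["bearish_votes"] for ex in examples]
    min_votes = statistics.median(totals)
    kept = [ex for ex, total in zip(examples, totals) if total >= min_votes and total > 0]

    # Assumed net score in [-1, 1]: (bullish - bearish) / total votes.
    scores = [
        (ex["bullish_votes"] - ex["bearish_votes"]) / (ex["bullish_votes"] + ex["bearish_votes"])
        for ex in kept
    ]

    # 25th and 75th percentiles of the scores (linear interpolation) as boundaries.
    p25, _, p75 = statistics.quantiles(scores, n=4, method="inclusive")

    labels = []
    for score in scores:
        if score < p25:
            labels.append("bearish")
        elif score > p75:
            labels.append("bullish")
        else:
            labels.append("neutral")  # on or between the boundaries
    return kept, labels


examples = [
    {"bullish_votes": 9, "bearish_votes": 1},  # score 0.8
    {"bullish_votes": 1, "bearish_votes": 9},  # score -0.8
    {"bullish_votes": 6, "bearish_votes": 4},  # score 0.2
    {"bullish_votes": 4, "bearish_votes": 6},  # score -0.2
    {"bullish_votes": 1, "bearish_votes": 0},  # below the median vote count, dropped
]
kept, labels = label_market_direction(examples)
print(labels)  # ['bullish', 'bearish', 'neutral', 'neutral']
```

Because the boundaries are percentiles of the score distribution rather than fixed thresholds, roughly a quarter of the kept examples fall into each extreme class regardless of how bullish or bearish the period was overall.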

Sentiment dimension used for this model:

Market Direction: bullish, bearish, neutral

Because the labels come from votes by cryptocurrency community members rather than general crowdworkers, they are likely to reflect domain knowledge more closely.

Note: The DLT-Corpus used to pre-train LedgerBERT contains no news articles, so fine-tuning on this dataset also tests how well LedgerBERT generalizes to an out-of-domain text type.

For more details on the dataset used for fine-tuning, see: https://huggingface.co/datasets/ExponentialScience/DLT-Sentiment-News

Training Procedure

Fine-tuning hyperparameters:

Epochs: 3
Learning rate: 2×10⁻⁵
Warmup steps: 500
Batch size: 8 per device (training and evaluation)
Train/test split: 90% training, 10% testing
Optimizer: AdamW with fused operations
Precision: bfloat16
Max sequence length: 512 tokens (tokenizer default)
Truncation: Enabled
Padding: Enabled
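The hyperparameters above map onto Hugging Face `transformers.TrainingArguments` fields roughly as follows. This is a hedged sketch, not the authors' training script: the output directory name and the `text` column name are hypothetical, no random seed is given in this card, and `build_training_args` imports `transformers` lazily so the mapping itself can be inspected without it installed.

```python
# Fine-tuning hyperparameters from the list above, keyed by the matching
# transformers.TrainingArguments field names.
TRAINING_KWARGS = {
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "warmup_steps": 500,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "optim": "adamw_torch_fused",  # AdamW with fused operations
    "bf16": True,                  # bfloat16 precision
}
TEST_SIZE = 0.1  # 90% training / 10% testing


def split_dataset(dataset):
    """Split a datasets.Dataset 90/10 (no seed is given in the card)."""
    return dataset.train_test_split(test_size=TEST_SIZE)


def tokenize_batch(batch, tokenizer, text_column="text"):
    """Tokenize with truncation and padding; "text" is a hypothetical column name."""
    return tokenizer(batch[text_column], truncation=True, padding=True, max_length=512)


def build_training_args(output_dir="ledgerbert-market-sentiment"):
    """Build TrainingArguments; output_dir is a hypothetical path."""
    from transformers import TrainingArguments  # imported lazily; needs transformers
    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Note that `optim="adamw_torch_fused"` needs a PyTorch build with fused AdamW, and `bf16=True` needs hardware with bfloat16 support.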

Limitations and Biases

Known Limitations

Temporal lag: Not suitable for real-time sentiment analysis; trained on historical data (2021-2025)
Context dependency: Headlines and descriptions lack full article context, which may affect sentiment interpretation
Language coverage: English only; does not support other languages
Sarcasm and irony: May misread sarcastic or ironic language common in cryptocurrency discourse (e.g., "HFSP", short for "Have Fun Staying Poor")
Evolving terminology: Cryptocurrency memes and terminology evolve rapidly; may not capture newest slang
Market volatility: Sentiment can change rapidly after news publication; static predictions may become outdated quickly

Potential Biases

The model may reflect biases present in the training data:

Platform bias: Data from CryptoPanic users only; may not represent broader market sentiment
User bias: Active crypto community members may have different perspectives than general investors
Temporal bias: Training data spans 2021-2025, reflecting specific market conditions (bull markets, bear markets, crypto winters)
Source bias: Certain news sources or cryptocurrencies may be over-represented in the training data
Geographic bias: English-language news sources are over-represented
Market condition bias: Dataset reflects specific market cycles that may not generalize to all conditions

Data Collection Biases

Vote manipulation: Despite quality filters, coordinated voting on the source platform cannot be completely ruled out
Minimum vote threshold: Filtering by median votes may exclude less popular but valid sentiment signals
Percentile-based labeling: Classification boundaries (25th/75th percentiles) are somewhat arbitrary

How to Use

Basic Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "ExponentialScience/LedgerBERT-Market-Sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example texts
texts = [
    "Bitcoin reaches new all-time high amid institutional adoption",
    "SEC announces crackdown on cryptocurrency exchanges",
    "Ethereum network upgrade proceeding as planned"
]

# Classify sentiment
for text in texts:
    inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True, padding=True)
    
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
        predicted_class = predictions.argmax(dim=-1).item()
    
    # Map the class index to its name using the label mapping stored in the model config
    sentiment = model.config.id2label[predicted_class]
    confidence = predictions[0, predicted_class].item()
    
    print(f"Text: {text}")
    print(f"Sentiment: {sentiment} (confidence: {confidence:.3f})\n")

Batch Processing

from transformers import pipeline

# Create sentiment analysis pipeline
classifier = pipeline(
    "text-classification",
    model="ExponentialScience/LedgerBERT-Market-Sentiment",
    tokenizer="ExponentialScience/LedgerBERT-Market-Sentiment"
)

# Process multiple texts
texts = [
    "DeFi protocol launches new staking mechanism",
    "Major cryptocurrency exchange faces liquidity crisis",
    "Blockchain adoption continues in enterprise sector"
]

results = classifier(texts, truncation=True, max_length=512)

for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Sentiment: {result['label']} (score: {result['score']:.3f})\n")
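For trend studies, pipeline outputs over a batch of texts can be summarized as label proportions. A minimal standard-library sketch, assuming each result is a dict with `label` and `score` keys as returned above; the label names depend on the checkpoint's `id2label` mapping, and the example scores are made up for illustration.

```python
from collections import Counter


def label_shares(results):
    """Return the fraction of predictions per label.

    `results` is a list of dicts with "label" and "score" keys, the shape
    returned by a text-classification pipeline for a list of inputs.
    """
    counts = Counter(result["label"] for result in results)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()} if total else {}


# Hypothetical pipeline output for four headlines (scores are made up).
example_results = [
    {"label": "bullish", "score": 0.91},
    {"label": "bearish", "score": 0.84},
    {"label": "bullish", "score": 0.77},
    {"label": "neutral", "score": 0.62},
]
print(label_shares(example_results))  # {'bullish': 0.5, 'bearish': 0.25, 'neutral': 0.25}
```

Computing these shares per day or per week over a stream of headlines gives a simple sentiment time series.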

Integration with News Feeds

import feedparser  # third-party package: pip install feedparser
from transformers import pipeline

# Initialize classifier
classifier = pipeline(
    "text-classification",
    model="ExponentialScience/LedgerBERT-Market-Sentiment"
)

# Example: analyze a cryptocurrency news RSS feed
feed_url = "https://example-crypto-news.com/rss"  # placeholder; substitute a real feed URL
feed = feedparser.parse(feed_url)

for entry in feed.entries[:5]:  # Process first 5 entries
    title = entry.title
    result = classifier(title, truncation=True, max_length=512)[0]
    
    print(f"Headline: {title}")
    print(f"Market Sentiment: {result['label']} ({result['score']:.2%})")
    print(f"Link: {entry.link}\n")
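To sort headlines into sentiment categories (the news aggregation use case listed earlier), the per-headline results from the loop above can be collected and grouped by label. In this sketch, `group_by_sentiment` and its `min_score` cutoff of 0.6 are hypothetical choices, not part of the model; predictions below the cutoff go into an `uncertain` bucket rather than being forced into a category.

```python
from collections import defaultdict


def group_by_sentiment(headlines, results, min_score=0.6):
    """Group headlines by predicted label; low-confidence ones go to "uncertain".

    `results` holds one pipeline result dict ("label", "score") per headline.
    min_score is a hypothetical threshold, not a value from this model card.
    """
    groups = defaultdict(list)
    for headline, result in zip(headlines, results):
        key = result["label"] if result["score"] >= min_score else "uncertain"
        groups[key].append(headline)
    return dict(groups)


# Hypothetical inputs: in practice, append `title` and `result` inside the feed loop.
headlines = ["Headline A", "Headline B", "Headline C"]
results = [
    {"label": "bullish", "score": 0.90},
    {"label": "bearish", "score": 0.55},
    {"label": "bullish", "score": 0.70},
]
print(group_by_sentiment(headlines, results))
# {'bullish': ['Headline A', 'Headline C'], 'uncertain': ['Headline B']}
```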

Citation

If you use LedgerBERT-Market-Sentiment in your research, please cite:

@article{hernandez2025dlt-corpus,
  title={DLT-Corpus: A Large-Scale Text Collection for the Distributed Ledger Technology Domain},
  author={Hernandez Cruz, Walter and Devine, Peter and Vadgama, Nikhil and Tasca, Paolo and Xu, Jiahua},
  year={2025}
}

Related Resources

Base Model (LedgerBERT): https://huggingface.co/ExponentialScience/LedgerBERT
Training Dataset: https://huggingface.co/datasets/ExponentialScience/DLT-Sentiment-News
DLT-Corpus Collection: https://huggingface.co/collections/ExponentialScience/dlt-corpus-68e44e40d4e7a3bd7a224402

Additional Fine-tuned Models

LedgerBERT can also be fine-tuned for other sentiment dimensions available in the DLT-Sentiment-News dataset (https://huggingface.co/datasets/ExponentialScience/DLT-Sentiment-News):

Content Characteristics (liked, disliked, neutral)
Engagement Quality (important, lol, neutral)

Model Card Contact

For questions or feedback about LedgerBERT-Market-Sentiment, please open an issue on the GitHub repository: https://github.com/dlt-science/DLT-Corpus

⚠️ Important Disclaimer: This model is provided for research and educational purposes only. It should not be used as financial advice or as the sole basis for investment decisions. Cryptocurrency markets are highly volatile and unpredictable. Always conduct your own research and consult with qualified financial advisors before making investment decisions.

Model size: 0.1B params (Safetensors)
Tensor type: F32


Model tree for ExponentialScience/LedgerBERT-Market-Sentiment

Base model: allenai/scibert_scivocab_cased
Fine-tuned from: ExponentialScience/LedgerBERT