LedgerBERT-Market-Sentiment
Model Description
Model Summary
LedgerBERT-Market-Sentiment is a fine-tuned version of LedgerBERT (https://huggingface.co/ExponentialScience/LedgerBERT) specialized for sentiment analysis of cryptocurrency and DLT-related content. The model classifies text into three market direction sentiment categories: bullish (positive market outlook), bearish (negative market outlook), and neutral (balanced or unclear market direction).
The model is best suited to cryptocurrency news headlines, social media posts, and other DLT-related content where market sentiment matters.
- Model type: BERT-base encoder for sequence classification
- Language: English
- License: Creative Commons Attribution-NonCommercial 4.0 International (CC-BY-NC 4.0)
- Base model: LedgerBERT (ExponentialScience/LedgerBERT)
- Fine-tuning dataset: DLT-Sentiment-News (23,301 examples)
- Task: Multi-class sentiment classification (3 classes)
Model Architecture
- Architecture: BERT-base for sequence classification
- Parameters: 110 million
- Hidden size: 768
- Number of layers: 12
- Attention heads: 12
- Vocabulary size: 30,522 (SciBERT vocabulary)
- Max sequence length: 512 tokens
- Output: 3-class logits (bullish, bearish, neutral)
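As a rough cross-check of the figures above, the parameter count can be derived from the listed dimensions. The sketch below assumes the standard BERT-base layout (feed-forward size 3072, two token-type embeddings, a pooler layer), none of which is stated in this card, and the function name is hypothetical.

```python
def bert_classifier_param_count(
    vocab_size=30_522, hidden=768, layers=12, max_positions=512,
    intermediate=3_072,  # assumption: standard BERT-base feed-forward size
    type_vocab=2,        # assumption: standard two segment embeddings
    num_labels=3,
):
    """Count trainable parameters of a BERT encoder with a linear classification head."""
    layer_norm = 2 * hidden  # scale and bias
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Q, K, V and output projections; the number of heads does not change the count
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    feed_forward = (
        (hidden * intermediate + intermediate)
        + (intermediate * hidden + hidden)
        + layer_norm
    )
    pooler = hidden * hidden + hidden
    classifier = hidden * num_labels + num_labels
    return embeddings + layers * (attention + feed_forward) + pooler + classifier


total = bert_classifier_param_count()
print(f"{total:,} parameters (~{total / 1e6:.1f} million)")
```

Under these assumptions the total lands just under 110 million, which is consistent with the rounded figure listed above.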
Intended Uses
Primary Use Cases
This model is designed for sentiment analysis tasks in the cryptocurrency and DLT domain:
- Market sentiment analysis: Analyzing sentiment in cryptocurrency news articles, headlines, and market commentary
- Social media monitoring: Understanding market direction sentiment in tweets, Reddit posts, and forum discussions
- News aggregation: Automatically categorizing cryptocurrency news by market sentiment
- Research applications: Studying sentiment trends and their relationship to market dynamics
- Content filtering: Organizing DLT content based on market outlook
Example Applications
# Analyzing news headlines
"Bitcoin surges to new all-time high" → Bullish
"Ethereum faces regulatory scrutiny" → Bearish
"Stablecoin market remains stable" → Neutral
# Social media sentiment
"To the moon! 🚀" → Bullish
"Another crypto winter incoming" → Bearish
"Waiting for clear market direction" → Neutral
Out-of-Scope Uses
- Investment decisions: This model should NOT be used as the sole basis for making investment or trading decisions
- Financial advice: Not suitable for providing personalized financial or investment recommendations
- Real-time trading: Should not be used for automated high-frequency trading systems
- Market manipulation: Must not be used to coordinate or facilitate market manipulation
- General sentiment analysis: Optimized for market direction sentiment; may not perform well on general emotional sentiment
Training Details
Training Data
The model was fine-tuned on the DLT-Sentiment-News dataset, which contains:
- Size: 23,301 examples
- Tokens: 1.85 million tokens (average 79.51 tokens per example)
- Temporal coverage: January 2021 to May 2025
- Source: CryptoPanic platform cryptocurrency news headlines and descriptions
- Labels: Crowdsourced votes from active cryptocurrency community members
- Classification method: Percentile-based labeling (25th and 75th percentiles as boundaries)
Labels for the sentiment dimension used by this model:
- Market Direction: bullish, bearish, neutral
The dataset provides domain expertise through crowdsourced annotations from cryptocurrency users, making the labels more relevant than general crowdworker annotations.
Note: News articles are absent from the DLT-Corpus used to pre-train LedgerBERT, so fine-tuning on this dataset serves as an out-of-domain test of how well the model's language understanding generalizes.
For more details on the dataset used for fine-tuning, see: https://huggingface.co/datasets/ExponentialScience/DLT-Sentiment-News
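The percentile-based labeling mentioned above can be sketched as follows. This is a minimal illustration, not the dataset authors' code: the per-headline score (here a hypothetical net vote count), the treatment of values exactly on a boundary, and the function name are all assumptions.

```python
from statistics import quantiles


def percentile_labels(scores):
    """Label each score as bearish, neutral, or bullish using the 25th/75th percentiles."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    labels = []
    for score in scores:
        if score < q1:
            labels.append("bearish")
        elif score > q3:
            labels.append("bullish")
        else:
            # Assumption: scores exactly on a boundary count as neutral
            labels.append("neutral")
    return labels


# Hypothetical net vote scores (e.g. bullish votes minus bearish votes) per headline
example_scores = [-5, -2, -1, 0, 0, 1, 2, 3, 8]
example_labels = percentile_labels(example_scores)
print(list(zip(example_scores, example_labels)))
```

Because the boundaries are computed from the data itself, roughly a quarter of the examples fall into each of the two outer classes regardless of how the raw votes are distributed.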
Training Procedure
Fine-tuning hyperparameters:
- Epochs: 3
- Learning rate: 2×10⁻⁵
- Warmup steps: 500
- Batch size: 8 per device (training and evaluation)
- Train/test split: 90% training, 10% testing
- Optimizer: AdamW with fused operations
- Precision: bfloat16
- Max sequence length: 512 tokens (tokenizer default)
- Truncation: Enabled
- Padding: Enabled
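To make the schedule implied by these settings concrete, the sketch below collects them under keyword names that mirror `transformers.TrainingArguments` and works out the resulting step counts. Whether the authors used exactly these arguments is an assumption, as are the single-device setup, the absence of gradient accumulation, and the rounding of the 10% split.

```python
import math

# Keyword names follow transformers.TrainingArguments; values come from the list above.
training_kwargs = {
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "warmup_steps": 500,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "optim": "adamw_torch_fused",  # assumption: the fused AdamW option in transformers
    "bf16": True,
}

dataset_size = 23_301
test_size = math.ceil(dataset_size * 0.10)  # assumption: the 10% split is rounded up
train_size = dataset_size - test_size

# Assumption: one device and no gradient accumulation
steps_per_epoch = math.ceil(train_size / training_kwargs["per_device_train_batch_size"])
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
warmup_share = training_kwargs["warmup_steps"] / total_steps

print(f"train={train_size}, test={test_size}")
print(f"steps per epoch={steps_per_epoch}, total steps={total_steps}")
print(f"warmup covers {warmup_share:.1%} of training")
```

Under these assumptions, the 500 warmup steps amount to a small fraction of the full run.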
Limitations and Biases
Known Limitations
- Temporal lag: Not suitable for real-time sentiment analysis; trained on historical data (2021-2025)
- Context dependency: Headlines and descriptions lack full article context, which may affect sentiment interpretation
- Language coverage: English only; does not support other languages
- Sarcasm and irony: May struggle with nuanced language common in cryptocurrency discourse (e.g., "HFSP" - Have Fun Staying Poor)
- Evolving terminology: Cryptocurrency memes and terminology evolve rapidly; may not capture newest slang
- Market volatility: Sentiment can change rapidly after news publication; static predictions may become outdated quickly
Potential Biases
The model may reflect biases present in the training data:
- Platform bias: Data from CryptoPanic users only; may not represent broader market sentiment
- User bias: Active crypto community members may have different perspectives than general investors
- Temporal bias: Training data spans 2021-2025, reflecting specific market conditions (bull markets, bear markets, crypto winters)
- Source bias: Certain news sources or cryptocurrencies may be over-represented in the training data
- Geographic bias: English-language news sources are over-represented
- Market condition bias: Dataset reflects specific market cycles that may not generalize to all conditions
Data Collection Biases
- Vote manipulation: Despite quality filters, coordinated voting on the source platform cannot be completely ruled out
- Minimum vote threshold: Filtering by median votes may exclude less popular but valid sentiment signals
- Percentile-based labeling: Classification boundaries (25th/75th percentiles) are somewhat arbitrary
How to Use
Basic Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load model and tokenizer
model_name = "ExponentialScience/LedgerBERT-Market-Sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Example texts
texts = [
    "Bitcoin reaches new all-time high amid institutional adoption",
    "SEC announces crackdown on cryptocurrency exchanges",
    "Ethereum network upgrade proceeding as planned"
]
# Classify sentiment
for text in texts:
    inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = predictions.argmax(dim=-1).item()
    # Map the class index to its label via the model config rather than a hard-coded list.
    # If the config only holds generic names (e.g. LABEL_0), substitute your own mapping.
    sentiment = model.config.id2label[predicted_class]
    confidence = predictions[0][predicted_class].item()
    print(f"Text: {text}")
    print(f"Sentiment: {sentiment} (confidence: {confidence:.3f})\n")
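The softmax and argmax steps in the loop above can be illustrated without downloading the model. The logits and the label order below are made up for the example and do not come from the model.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits and label order, for illustration only
id2label = {0: "bearish", 1: "bullish", 2: "neutral"}
logits = [0.3, 2.1, -0.4]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(id2label[best], round(probs[best], 3))
```

The reported confidence is simply the probability of the winning class, so a value close to 1/3 means the model barely prefers one label over the others.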
Batch Processing
from transformers import pipeline
# Create sentiment analysis pipeline
classifier = pipeline(
    "text-classification",
    model="ExponentialScience/LedgerBERT-Market-Sentiment",
    tokenizer="ExponentialScience/LedgerBERT-Market-Sentiment"
)
# Process multiple texts
texts = [
    "DeFi protocol launches new staking mechanism",
    "Major cryptocurrency exchange faces liquidity crisis",
    "Blockchain adoption continues in enterprise sector"
]
results = classifier(texts, truncation=True, max_length=512)
for text, result in zip(texts, results):
    print(f"Text: {text}")
    print(f"Sentiment: {result['label']} (score: {result['score']:.3f})\n")
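Batch results like those above are often summarized before use. The following sketch counts labels and groups low-confidence predictions separately. The 0.6 threshold, the `uncertain` bucket, and the sample results are illustrative assumptions, not recommendations from the model authors.

```python
from collections import Counter


def summarize(results, min_score=0.6):
    """Count predicted labels, grouping predictions below min_score as 'uncertain'."""
    counts = Counter()
    for result in results:
        label = result["label"] if result["score"] >= min_score else "uncertain"
        counts[label] += 1
    return counts


# Hand-written stand-ins for pipeline output; labels and scores are hypothetical
sample_results = [
    {"label": "bullish", "score": 0.91},
    {"label": "bearish", "score": 0.84},
    {"label": "neutral", "score": 0.48},
    {"label": "bullish", "score": 0.55},
]
summary = summarize(sample_results)
print(dict(summary))
```

Separating uncertain predictions keeps weak signals from being read as a clear market view, in line with the out-of-scope uses listed earlier.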
Integration with News Feeds
import feedparser
from transformers import pipeline
# Initialize classifier
classifier = pipeline(
    "text-classification",
    model="ExponentialScience/LedgerBERT-Market-Sentiment"
)
# Example: Analyze cryptocurrency news feed
feed_url = "https://example-crypto-news.com/rss"
feed = feedparser.parse(feed_url)
for entry in feed.entries[:5]:  # Process first 5 entries
    title = entry.title
    result = classifier(title, truncation=True, max_length=512)[0]
    print(f"Headline: {title}")
    print(f"Market Sentiment: {result['label']} ({result['score']:.2%})")
    print(f"Link: {entry.link}\n")
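For larger feeds, the per-entry calls above can be replaced by a single batched call. The sketch below is one possible arrangement: `classify_headlines` and `fake_classifier` are hypothetical names, and the stand-in classifier exists only so the example runs without downloading the model. With a real pipeline, `batch_size`, `truncation`, and `max_length` are passed through in the same way.

```python
def classify_headlines(titles, classifier, batch_size=8):
    """Classify non-empty headlines in one pipeline call and pair each with its result."""
    cleaned = [t.strip() for t in titles if t and t.strip()]
    if not cleaned:
        return []
    results = classifier(cleaned, batch_size=batch_size, truncation=True, max_length=512)
    return list(zip(cleaned, results))


# Stand-in for a transformers pipeline so the sketch runs offline
def fake_classifier(texts, **kwargs):
    return [{"label": "neutral", "score": 1.0} for _ in texts]


pairs = classify_headlines(["  Exchange lists new token ", "", None], fake_classifier)
print(pairs)
```

Dropping empty or missing titles before classification avoids passing invalid inputs to the tokenizer when a feed entry lacks a headline.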
Citation
If you use LedgerBERT-Market-Sentiment in your research, please cite:
@article{hernandez2025dlt-corpus,
title={DLT-Corpus: A Large-Scale Text Collection for the Distributed Ledger Technology Domain},
author={Hernandez Cruz, Walter and Devine, Peter and Vadgama, Nikhil and Tasca, Paolo and Xu, Jiahua},
year={2025}
}
Related Resources
- Base Model (LedgerBERT): https://huggingface.co/ExponentialScience/LedgerBERT
- Training Dataset: https://huggingface.co/datasets/ExponentialScience/DLT-Sentiment-News
- DLT-Corpus Collection: https://huggingface.co/collections/ExponentialScience/dlt-corpus-68e44e40d4e7a3bd7a224402
Additional Fine-tuned Models
LedgerBERT can also be fine-tuned for other sentiment dimensions available in the DLT-Sentiment-News dataset (https://huggingface.co/datasets/ExponentialScience/DLT-Sentiment-News):
- Content Characteristics (liked, disliked, neutral)
- Engagement Quality (important, lol, neutral)
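A minimal sketch of how such a fine-tuning run might set up its label mappings follows. The dimension keys, the index order within each label set, and the helper names are assumptions. `load_classifier` downloads the base weights and attaches an untrained classification head, so it is defined here but not called.

```python
# Label sets as listed above; the index order within each set is an assumption
LABEL_SETS = {
    "market_direction": ["bearish", "bullish", "neutral"],
    "content_characteristics": ["disliked", "liked", "neutral"],
    "engagement_quality": ["important", "lol", "neutral"],
}


def build_label_maps(dimension):
    """Return (id2label, label2id) dictionaries for one sentiment dimension."""
    labels = LABEL_SETS[dimension]
    id2label = dict(enumerate(labels))
    label2id = {label: idx for idx, label in id2label.items()}
    return id2label, label2id


def load_classifier(dimension, base_model="ExponentialScience/LedgerBERT"):
    """Load the base encoder with a freshly initialised 3-class head (downloads weights)."""
    from transformers import AutoModelForSequenceClassification  # imported lazily

    id2label, label2id = build_label_maps(dimension)
    return AutoModelForSequenceClassification.from_pretrained(
        base_model, num_labels=len(id2label), id2label=id2label, label2id=label2id
    )


id2label, label2id = build_label_maps("engagement_quality")
print(id2label)
```

Storing the mappings in the model config at load time means downstream code can read label names from `model.config.id2label` instead of relying on a hard-coded order.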
Model Card Contact
For questions or feedback about LedgerBERT-Market-Sentiment, please open an issue on the GitHub repository: https://github.com/dlt-science/DLT-Corpus
⚠️ Important Disclaimer: This model is provided for research and educational purposes only. It should not be used as financial advice or as the sole basis for investment decisions. Cryptocurrency markets are highly volatile and unpredictable. Always conduct your own research and consult with qualified financial advisors before making investment decisions.
Model tree for ExponentialScience/LedgerBERT-Market-Sentiment: base model allenai/scibert_scivocab_cased