# BERT Reward Model for CoT Filtering
A BERT-based regression model fine-tuned to score how well a natural language description matches a SQL query and its reasoning chain (Chain-of-Thought).
## Model Description
This model is based on bert-base-uncased and has been fine-tuned for regression to predict similarity scores in the range [0, 1]. The model takes as input a concatenation of:
- SQL query
- Reasoning/Chain-of-Thought explanation
- Predicted natural language description
It outputs a similarity score indicating how well the predicted natural language description matches the ground truth.
## Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("DarianNLP/bert_sequel_beagles")
model = AutoModelForSequenceClassification.from_pretrained(
    "DarianNLP/bert_sequel_beagles",
    num_labels=1,
    problem_type="regression",
)
model.eval()

# Prepare input: concatenate SQL, reasoning, and predicted NL description
sql = "SELECT movie_title FROM movies WHERE movie_release_year = 1945"
reasoning = "think: The SQL selects the movie title..."
predicted_nl = "What was the most popular movie released in 1945?"
input_text = f"SQL: {sql}\nReasoning: {reasoning}\nNL: {predicted_nl}"

# Tokenize and predict
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

# Map the raw logit to a [0, 1] similarity score
similarity_score = torch.sigmoid(outputs.logits).item()
print(f"Predicted similarity: {similarity_score:.3f}")
```
## Training Details
- Base Model: bert-base-uncased
- Training Dataset: Custom CoT dataset with corruptions (7,342 examples)
- Train/Val/Test Split: 75% / 12.5% / 12.5%
- Training Loss: MSE (Mean Squared Error); a fine-tuning sketch follows below
- Evaluation Metrics:
  - MSE: 0.0238
  - MAE: 0.1229
  - RMSE: 0.1543
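For reference, here is a minimal sketch of how a regression fine-tune with these settings is typically set up with the Hugging Face `Trainer`. The toy dataset, column names (`text`, `label`), and hyperparameters below are illustrative assumptions, not the exact recipe used for this model:

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=1,
    problem_type="regression",  # Trainer then uses MSE loss, matching the card
)

# Toy stand-ins for the real 75% / 12.5% train/validation splits
train_ds = Dataset.from_dict({
    "text": ["SQL: SELECT 1\nReasoning: think: ...\nNL: example question?"],
    "label": [1.0],
})
val_ds = Dataset.from_dict({
    "text": ["SQL: SELECT 2\nReasoning: think: ...\nNL: another question?"],
    "label": [0.0],
})

def preprocess(batch):
    # Tokenize the concatenated input and attach the float target in [0, 1]
    enc = tokenizer(batch["text"], truncation=True, max_length=512)
    enc["labels"] = [float(x) for x in batch["label"]]
    return enc

args = TrainingArguments(
    output_dir="bert-cot-reward",
    learning_rate=2e-5,              # illustrative value
    num_train_epochs=3,              # illustrative value
    per_device_train_batch_size=16,  # illustrative value
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds.map(preprocess, batched=True, remove_columns=["text", "label"]),
    eval_dataset=val_ds.map(preprocess, batched=True, remove_columns=["text", "label"]),
    tokenizer=tokenizer,
)
trainer.train()
```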
## Limitations
- Maximum input length: 512 tokens (BERT's limit); longer inputs are silently truncated (see the check after this list)
- Trained on a specific domain (SQL to NL translation with CoT)
- Performance may vary on out-of-domain data
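Because inputs are truncated at 512 tokens and the NL segment comes last in the suggested input format, very long SQL queries or reasoning chains can push the description out of the window entirely. A small sanity check before scoring (a sketch reusing `tokenizer` and `input_text` from the Usage section):

```python
# Warn when an input would be truncated; the NL segment is dropped first
token_count = len(tokenizer(input_text)["input_ids"])
if token_count > 512:
    print(f"Warning: {token_count} tokens; everything past 512 will be truncated.")
```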
## Citation
If you use this model, please cite:
```bibtex
@misc{bert_cot_reward_model,
  title={BERT Reward Model for Chain-of-Thought Filtering},
  author={Darian Lee},
  year={2025},
}
```