BERT Reward Model for CoT Filtering

A BERT-based regression model fine-tuned to score triples of SQL query, reasoning chain (Chain-of-Thought), and natural language description, predicting how well the natural language description matches the ground truth.

Model Description

This model is based on bert-base-uncased and has been fine-tuned for regression to predict similarity scores in the range [0, 1]. The model takes as input a concatenation of:

  • SQL query
  • Reasoning/Chain-of-Thought explanation
  • Predicted natural language description

From this input, the model outputs a similarity score indicating how well the predicted natural language description matches the ground truth.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("DarianNLP/bert_sequel_beagles")
model = AutoModelForSequenceClassification.from_pretrained(
    "DarianNLP/bert_sequel_beagles",
    num_labels=1,
    problem_type="regression"
)
model.eval()

# Prepare input
sql = "SELECT movie_title FROM movies WHERE movie_release_year = 1945"
reasoning = "think: The SQL selects the movie title..."
predicted_nl = "What was the most popular movie released in 1945?"

input_text = f"SQL: {sql}\nReasoning: {reasoning}\nNL: {predicted_nl}"

# Tokenize and predict
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    # Apply sigmoid to map the raw output into the [0, 1] similarity range
    similarity_score = torch.sigmoid(outputs.logits).item()

print(f"Predicted similarity: {similarity_score:.3f}")

Training Details

  • Base Model: bert-base-uncased
  • Training Dataset: Custom CoT dataset with corruptions (7,342 examples)
  • Train/Val/Test Split: 75% / 12.5% / 12.5%
  • Training Loss: MSE (Mean Squared Error); see the fine-tuning sketch after this list
  • Evaluation Metrics:
    • MSE: 0.0238
    • MAE: 0.1229
    • RMSE: 0.1543
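
The exact training script is not part of this card. As a rough sketch under the settings listed above (bert-base-uncased, single-label regression, MSE loss), a fine-tune with the Hugging Face Trainer could look like the following; the dataset contents, column names, and hyperparameters are illustrative assumptions, not the values actually used. Note also that the usage example applies a sigmoid at inference, so the actual training may have folded a sigmoid into the loss, whereas the plain setup below computes MSE on the raw logits.

# Illustrative regression fine-tuning sketch (examples and hyperparameters are assumptions)
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1, problem_type="regression"  # uses MSE loss
)

def preprocess(batch):
    # "text" is the concatenated SQL / Reasoning / NL string, "score" the target in [0, 1]
    enc = tokenizer(batch["text"], truncation=True, max_length=512)
    enc["labels"] = [float(s) for s in batch["score"]]
    return enc

# Toy stand-in for the real dataset of 7,342 clean/corrupted CoT examples
toy = {
    "text": [
        "SQL: SELECT movie_title FROM movies WHERE movie_release_year = 1945\n"
        "Reasoning: think: The SQL selects titles of movies from 1945.\n"
        "NL: Which movie titles were released in 1945?",
        "SQL: SELECT movie_title FROM movies WHERE movie_release_year = 1945\n"
        "Reasoning: think: The SQL selects titles of movies from 1945.\n"
        "NL: What was the most popular movie released in 1945?",
    ],
    "score": [0.95, 0.30],
}
train_ds = Dataset.from_dict(toy).map(preprocess, batched=True,
                                      remove_columns=["text", "score"])

args = TrainingArguments(output_dir="bert-cot-reward", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args, train_dataset=train_ds, tokenizer=tokenizer)
trainer.train()

The reported MSE, MAE, and RMSE can then be obtained by running trainer.predict on the held-out validation and test splits and comparing the predictions against the target scores.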

Limitations

  • Maximum input length: 512 tokens (BERT's limit); see the truncation check after this list
  • Trained on a specific domain (SQL to NL translation with CoT)
  • Performance may vary on out-of-domain data
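
Inputs longer than 512 tokens are silently truncated by the truncation=True call in the usage example above, and truncation cuts text from the end, i.e. from the NL part of the concatenation. A quick, purely illustrative check (reusing tokenizer and input_text from that example) makes this visible:

# Warn when the concatenated input exceeds BERT's 512-token limit (illustrative check)
n_tokens = len(tokenizer(input_text, truncation=False)["input_ids"])
if n_tokens > 512:
    print(f"Input has {n_tokens} tokens; it will be truncated to 512 and the NL part may be cut off.")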

Citation

If you use this model, please cite:

@misc{bert_cot_reward_model,
  title={BERT Reward Model for Chain-of-Thought Filtering},
  author={Darian Lee},
  year={2025},
}