BERT Reward Model for CoT Filtering

A BERT-based regression model fine-tuned to score triples of SQL query, reasoning chain (Chain-of-Thought), and natural language description, predicting how well the natural language description matches the ground truth.

Model Description

This model is based on bert-base-uncased and has been fine-tuned for regression to predict similarity scores in the range [0, 1]. The model takes as input a concatenation of:

  • SQL query
  • Reasoning/Chain-of-Thought explanation
  • Predicted natural language description

From this input, the model outputs a similarity score indicating how well the predicted natural language description matches the ground truth.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("DarianNLP/bert_sequel_beagles")
model = AutoModelForSequenceClassification.from_pretrained(
    "DarianNLP/bert_sequel_beagles",
    num_labels=1,
    problem_type="regression"
)
model.eval()

# Prepare input
sql = "SELECT movie_title FROM movies WHERE movie_release_year = 1945"
reasoning = "think: The SQL selects the movie title..."
predicted_nl = "What was the most popular movie released in 1945?"

input_text = f"SQL: {sql}\nReasoning: {reasoning}\nNL: {predicted_nl}"

# Tokenize and predict
inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
    # Apply sigmoid to map the raw output into the [0, 1] similarity range
    similarity_score = torch.sigmoid(outputs.logits).item()

print(f"Predicted similarity: {similarity_score:.3f}")

Training Details

  • Base Model: bert-base-uncased
  • Training Dataset: Custom CoT dataset with corruptions (7,342 examples)
  • Train/Val/Test Split: 75% / 12.5% / 12.5%
  • Training Loss: MSE (Mean Squared Error); see the fine-tuning sketch after this list
  • Evaluation Metrics:
    • MSE: 0.0238
    • MAE: 0.1229
    • RMSE: 0.1543
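
The exact training script is not part of this card. As a rough sketch under the settings listed above (bert-base-uncased, single-label regression, MSE loss), a fine-tune with the Hugging Face Trainer could look like the following; the dataset contents, column names, and hyperparameters are illustrative assumptions, not the values actually used. Note also that the usage example applies a sigmoid at inference, so the actual training may have folded a sigmoid into the loss, whereas the plain setup below computes MSE on the raw logits.

# Illustrative regression fine-tuning sketch (examples and hyperparameters are assumptions)
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1, problem_type="regression"  # uses MSE loss
)

def preprocess(batch):
    # "text" is the concatenated SQL / Reasoning / NL string, "score" the target in [0, 1]
    enc = tokenizer(batch["text"], truncation=True, max_length=512)
    enc["labels"] = [float(s) for s in batch["score"]]
    return enc

# Toy stand-in for the real dataset of 7,342 clean/corrupted CoT examples
toy = {
    "text": [
        "SQL: SELECT movie_title FROM movies WHERE movie_release_year = 1945\n"
        "Reasoning: think: The SQL selects titles of movies from 1945.\n"
        "NL: Which movie titles were released in 1945?",
        "SQL: SELECT movie_title FROM movies WHERE movie_release_year = 1945\n"
        "Reasoning: think: The SQL selects titles of movies from 1945.\n"
        "NL: What was the most popular movie released in 1945?",
    ],
    "score": [0.95, 0.30],
}
train_ds = Dataset.from_dict(toy).map(preprocess, batched=True,
                                      remove_columns=["text", "score"])

args = TrainingArguments(output_dir="bert-cot-reward", num_train_epochs=3,
                         per_device_train_batch_size=16, learning_rate=2e-5)
trainer = Trainer(model=model, args=args, train_dataset=train_ds, tokenizer=tokenizer)
trainer.train()

The reported MSE, MAE, and RMSE can then be obtained by running trainer.predict on the held-out validation and test splits and comparing the predictions against the target scores.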

Limitations

  • Maximum input length: 512 tokens (BERT's limit); see the truncation check after this list
  • Trained on a specific domain (SQL to NL translation with CoT)
  • Performance may vary on out-of-domain data
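
Inputs longer than 512 tokens are silently truncated by the truncation=True call in the usage example above, and truncation cuts text from the end, i.e. from the NL part of the concatenation. A quick, purely illustrative check (reusing tokenizer and input_text from that example) makes this visible:

# Warn when the concatenated input exceeds BERT's 512-token limit (illustrative check)
n_tokens = len(tokenizer(input_text, truncation=False)["input_ids"])
if n_tokens > 512:
    print(f"Input has {n_tokens} tokens; it will be truncated to 512 and the NL part may be cut off.")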

Citation

If you use this model, please cite:

@misc{bert_cot_reward_model,
  title={BERT Reward Model for Chain-of-Thought Filtering},
  author={Darian Lee},
  year={2025},
}