# OpenRubrics/RubricRM-8B-Judge

This is an 8B RubricRM-Judge model, fine-tuned from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenRubrics/RubricRM-8B-Judge"

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
```

To evaluate a pair of responses, build the message in the following format, where `instruction`, `response_a`, and `response_b` are the prompt and the two candidate responses to compare. Here, `rubric` should be generated with a `RubricRM-Rubric` model.

```python
JUDGE_PROMPT_TEMPLATE = (
    "You are a fair and impartial judge. Your task is to evaluate 'Response A' and 'Response B' "
    "based on a given instruction and a rubric. You will conduct this evaluation in distinct "
    "phases as outlined below.\n\n"
    "### Phase 1: Compliance Check Instructions\n"
    "First, identify the single most important, objective 'Gatekeeper Criterion' from the rubric.\n"
    "- **A rule is objective (and likely a Gatekeeper) if it can be verified without opinion. "
    "Key examples are: word/paragraph limits, required output format (e.g., JSON validity), "
    "required/forbidden sections, or forbidden content.**\n"
    "- **Conversely, a rule is subjective if it requires interpretation or qualitative judgment. "
    "Subjective rules about quality are NOT Gatekeepers. Examples include criteria like \"be creative,\" "
    "\"write clearly,\" \"be engaging,\" or \"use a professional tone.\"**\n\n"
    "### Phase 2: Analyze Each Response\n"
    "Next, for each Gatekeeper Criterion and all other criteria in the rubric, evaluate each "
    "response item by item.\n\n"
    "### Phase 3: Final Judgment Instructions\n"
    "Based on the results from the previous phases, determine the winner using these simple rules. "
    "Provide a final justification explaining your decision first and then give your decision.\n\n"
    "---\n"
    "### REQUIRED OUTPUT FORMAT\n"
    "You must follow this exact output format below.\n\n"
    "--- Compliance Check ---\n"
    "Identified Gatekeeper Criterion: \n\n"
    "--- Analysis ---\n"
    "**Response A:**\n"
    "- Criterion 1 [Hard Rule]: Justification: <...>\n"
    "- Criterion 2 [Hard Rule]: Justification: <...>\n"
    "- Criterion 3 [Principle]: Justification: <...>\n"
    "- ... (and so on for all other criteria)\n\n"
    "**Response B:**\n"
    "- Criterion 1 [Hard Rule]: Justification: <...>\n"
    "- Criterion 2 [Hard Rule]: Justification: <...>\n"
    "- Criterion 3 [Principle]: Justification: <...>\n"
    "- ... (and so on for all other criteria)\n\n"
    "--- Final Judgment ---\n"
    "Justification: <...>\n"
    "Winner: \n\n\n"
    "Task to Evaluate:\n"
    "Instruction:\n{instruction}\n\n"
    "Rubric:\n{rubric}\n\n"
    "Response A:\n{response_a}\n\n"
    "Response B:\n{response_b}"
)

user_text = JUDGE_PROMPT_TEMPLATE.format(
    instruction=instruction,
    rubric=rubric,
    response_a=response_a,
    response_b=response_b,
)

messages_list = [
    {"role": "user", "content": user_text},
]

message = tok.apply_chat_template(
    messages_list,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # Qwen3 chat-template flag: keep thinking mode off
)

# Remaining step: run generation with either HF Transformers or vLLM
# (a vLLM sketch is given at the end of this card). A minimal HF
# Transformers version; max_new_tokens=2048 is an illustrative choice:
inputs = tok(message, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=2048)
judgment = tok.decode(output_ids[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(judgment)
```

If you find our work helpful, please consider citing our paper:

```bibtex
@misc{liu2025openrubricsscalablesyntheticrubric,
  title={OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment},
  author={Tianci Liu and Ran Xu and Tony Yu and Ilgee Hong and Carl Yang and Tuo Zhao and Haoyu Wang},
  year={2025},
  eprint={2510.07743},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2510.07743},
}
```
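## vLLM example (sketch)

The generation step above can also be run with vLLM instead of HF Transformers. Below is a minimal, untested sketch; the sampling parameters are illustrative assumptions, not settings recommended by the authors:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="OpenRubrics/RubricRM-8B-Judge")
sampling = SamplingParams(temperature=0.0, max_tokens=2048)  # assumed, not official settings

# `message` is the chat-templated prompt string built in the Usage section above.
outputs = llm.generate([message], sampling)
print(outputs[0].outputs[0].text)
```

Greedy decoding (`temperature=0.0`) is used here only to make the pairwise verdict deterministic; adjust as needed for your setup.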