Need help deploying AutoRefine-3B with a retrieval server in this HF Space

#1
by yrshi - opened

Hi @pngwn ,

I’m glad you’re here! I’m trying to deploy the AutoRefine-3B model from my NeurIPS 2025 paper, Search and Refine During Think: Facilitating Knowledge Refinement for Improved Retrieval-Augmented Reasoning.

My setup is similar to Search-R1: the agent answers user queries while interacting with a local retrieval server, which can require up to 80 GB of GPU memory and 70+ GB of disk space. I can successfully run both the server and demo in dev mode, but I’m not sure how to handle this when deploying it as a Space App.
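The interaction described above, where the agent alternates between generating and querying the retrieval server, can be sketched roughly as follows. This is a minimal sketch that assumes Search-R1-style `<search>...</search>` tags and a hypothetical `<documents>` wrapper for the returned passages. It is not AutoRefine's actual code. `generate` and `retrieve` stand in for the model call and the retrieval-server call.

```python
import re
from typing import Callable, List

# Assumed tag convention (Search-R1 style); AutoRefine's real template may differ.
SEARCH_RE = re.compile(r"<search>(.*?)</search>", re.DOTALL)


def run_agent_turns(
    generate: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
    prompt: str,
    max_turns: int = 4,
) -> str:
    """Alternate generation and retrieval until the model stops issuing searches."""
    context = prompt
    for _ in range(max_turns):
        output = generate(context)
        context += output
        queries = SEARCH_RE.findall(output)
        if not queries:
            break  # no search call: the model answered from what it already has
        # Simplifying assumption: serve only the last search call in this chunk.
        docs = retrieve(queries[-1].strip())
        # Hypothetical wrapper tag for the retrieved passages.
        context += "<documents>" + "\n".join(docs) + "</documents>"
    return context
```

Keeping `retrieve` as a plain callable means the same loop works whether the retriever runs in-process, on localhost in dev mode, or in a separate deployment.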

Here’s what I’ve tried so far:

  • Uploaded my retrieval server code and related files to the Files section.
  • Started the retrieval server manually in dev mode and set up my demo in app.py → but the deployed Space still seems to use the default app.py.
  • Attempted to upload the 70+ GB corpus via git-lfs → but the dev machine crashed and had to reboot — all uploaded data was lost afterward.

Could you please advise on how to deploy a model that depends on a local retrieval server with large corpus files?
Is such an operation currently supported on Hugging Face Spaces, or should I explore an alternative deployment setup?

Thanks so much for your help and support! 😀

Best,
Yaorui, 3rd-year PhD student at USTC

Hey there! Thanks for reaching out.

My first instinct would be to separate the retrieval server from the inference server. You could try deploying two Spaces: the inference side should be simple enough to deploy on its own, and it could then connect to the retrieval server.
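With the two-Space split, the inference app would call the retrieval Space over HTTP instead of localhost. Below is a minimal client sketch under several assumptions. The `RETRIEVAL_URL` environment variable is hypothetical. The `/retrieve` endpoint, the `{"queries": [...], "topk": k}` request body, and the `{"result": [[{"contents": ...}]]}` response shape are modeled loosely on a Search-R1-style server and are not a documented API.

```python
import json
import os
import urllib.request
from typing import List

# Hypothetical env var pointing at the retrieval Space; falls back to a local dev server.
RETRIEVAL_URL = os.environ.get("RETRIEVAL_URL", "http://127.0.0.1:8000")


def remote_retrieve(
    queries: List[str],
    topk: int = 3,
    base_url: str = RETRIEVAL_URL,
    timeout: float = 30.0,
) -> List[List[str]]:
    """POST queries to an assumed /retrieve endpoint and return passage texts per query."""
    payload = json.dumps({"queries": queries, "topk": topk}).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/retrieve",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumed response shape: {"result": [[{"contents": "..."}, ...], ...]}
    return [[doc["contents"] for doc in docs] for docs in body["result"]]
```

Because only the URL changes between dev mode and the split deployment, the same client can be tested locally first. If the retrieval Space is private, the request would likely also need an HF access token in an `Authorization` header.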

Spaces certainly isn’t optimised for this use case specifically, so you may have more luck exploring other deployment options dedicated to this task. If you wanted to try Spaces, though, I would recommend using persistent storage (although 80 GB is a lot and could be expensive). You could use the persistent storage to ensure that you don’t need to re-upload on every restart or crash. That should help with the DX while you are iterating, but dedicated tools may be more efficient in terms of both time and cost.
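The "fetch once, reuse on restart" idea behind persistent storage can be sketched as below. Persistent storage on Spaces is mounted at `/data`. The `DATA_ROOT` override, the `corpus` directory name, and the `.complete` marker file are hypothetical choices. `fetch` is whatever actually copies the corpus and index into place.

```python
import os
from pathlib import Path
from typing import Callable

# Persistent storage on Spaces is mounted at /data; DATA_ROOT override is hypothetical.
DATA_ROOT = Path(os.environ.get("DATA_ROOT", "/data"))


def ensure_corpus(
    fetch: Callable[[Path], None],
    root: Path = DATA_ROOT,
    name: str = "corpus",
) -> Path:
    """Fetch the corpus into persistent storage once and reuse it on later restarts."""
    target = root / name
    marker = target / ".complete"  # written only after fetch() returns without error
    if marker.exists():
        return target  # already fetched on a previous run: skip the expensive download
    target.mkdir(parents=True, exist_ok=True)
    fetch(target)  # if this is interrupted, no marker is written and the next start retries
    marker.touch()
    return target
```

The marker file means a crash halfway through the download is retried on the next start instead of being mistaken for a complete corpus. If the corpus were hosted in a Hub dataset repo (an assumption), `fetch` could wrap `huggingface_hub.snapshot_download(repo_id=..., repo_type="dataset", local_dir=str(target))`.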

I’m certainly not an MLOps expert but I hope this is helpful and gives you somewhere to start looking!

Let me know if there is anything else I can help with!
