Thought on architecture

#6
by sometimesanotion - opened

This is already one of my favorite models for its size! Could a derivative of it have dense layers appended at the head, yielding a model that can be finetuned easily without expert collapse? I see there are already two dense layers at the foot.

Is that a sensible strategy for a model deployed on laptops and workstations? Is LiquidAI interested in making such models?

sometimesanotion changed discussion status to closed
