Open-Foundation-Models/PolyReLU_1B

@@ -1,46 +1,47 @@
----
-license: apache-2.0
-language:
-- en
-library_name: transformers
----
-# Introduction
-This repository contains the checkpoints of ICLR 2025 paper **[“Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models](https://arxiv.org/pdf/2411.03884)”.**
-In this work, we introduce a novel activation function called **Polynomial Composition (PolyCom)**, which enhances the expressiveness of large language models (LLMs) through dynamic polynomial compositions. Our method significantly improves the performance of dense and mixture of experts (MoE) models across a variety of downstream tasks, without adding significant computational overhead.
-# Datasets and Training
-We use the [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) dataset and pretrain the PolyCom model on 250B tokens. For more training details, please refer to [the source code](https://github.com/BryceZhuo/PolyCom).
-# Inference
-Here is an example of how to use the PolyCom model for inference:
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model = AutoModelForCausalLM.from_pretrained(path_of_model, device_map="cuda",trust_remote_code=True)
-tokenizer = AutoTokenizer.from_pretrained(path_of_model, padding_side="right",trust_remote_code=True)
-prompt = "Hello, my name is"
-input_ids = tokenizer.encode(prompt, return_tensors='pt').to('cuda')
-greedy_output = model.generate(input_ids)
-print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
-```
-# Citing this work
-If you find this work helpful or use it in your research, please consider citing our paper:
-```bibtex
-@inproceedings{zhuo2025polycom,
-  title={Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models},
-  author={Zhijian Zhuo and Ya Wang and Yutao Zeng and Xiaoqing Li and Xun Zhou and Jinwen Ma},
-  booktitle={ICLR 2025},
-  year={2025}
-}
 ```

+---
+language:
+- en
+library_name: transformers
+license: apache-2.0
+pipeline_tag: text-generation
+---
+# Introduction
+This repository contains the checkpoints for the ICLR 2025 paper **[“Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models”](https://arxiv.org/pdf/2411.03884).**
+In this work, we introduce a novel activation function called **Polynomial Composition (PolyCom)**, which enhances the expressiveness of large language models (LLMs) through dynamic polynomial compositions. Our method significantly improves the performance of dense and mixture-of-experts (MoE) models across a variety of downstream tasks without adding significant computational overhead.
+# Datasets and Training
+We use the [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) dataset and pretrain the PolyCom model on 250B tokens. For more training details, please refer to [the source code](https://github.com/BryceZhuo/PolyCom).
+# Inference
+Here is an example of how to use the PolyCom model for inference:
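+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Assumed to be this repository's Hub ID; a local checkpoint directory also works.
+path_of_model = "Open-Foundation-Models/PolyReLU_1B"
+# trust_remote_code=True lets transformers load the custom model code shipped with the checkpoint.
+model = AutoModelForCausalLM.from_pretrained(path_of_model, device_map="cuda", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(path_of_model, padding_side="right", trust_remote_code=True)
+
+prompt = "Hello, my name is"
+input_ids = tokenizer.encode(prompt, return_tensors="pt").to("cuda")
+# Generate with the model's default generation settings.
+greedy_output = model.generate(input_ids)
+print(tokenizer.decode(greedy_output[0], skip_special_tokens=True))
+```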
+# Citing this work
+If you find this work helpful or use it in your research, please consider citing our paper:
+```bibtex
+@inproceedings{zhuo2025polycom,
+  title={Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models},
+  author={Zhijian Zhuo and Ya Wang and Yutao Zeng and Xiaoqing Li and Xun Zhou and Jinwen Ma},
+  booktitle={ICLR 2025},
+  year={2025}
+}
 ```
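
The Introduction in the model card describes PolyCom only in words. As a rough illustration, here is a minimal sketch, assuming the PolyReLU variant suggested by the repository name is a weighted sum of powers of ReLU, that is, `sum_i a_i * ReLU(x)**i`. The function name `poly_relu`, the order, and the coefficient values are hypothetical and chosen only for illustration; the paper and the linked source code remain the authoritative definition.

```python
from typing import Sequence


def relu(x: float) -> float:
    """Standard ReLU on a scalar."""
    return x if x > 0.0 else 0.0


def poly_relu(x: float, coeffs: Sequence[float]) -> float:
    """Assumed form: sum over i of coeffs[i] * relu(x)**i.

    coeffs[0] is the constant term; in a real model the coefficients
    would be learned parameters rather than fixed numbers.
    """
    r = relu(x)
    total = 0.0
    power = 1.0  # relu(x)**0
    for a in coeffs:
        total += a * power
        power *= r
    return total


# Hypothetical third-order coefficients, for illustration only.
coeffs = [0.0, 1.0, 0.5, 0.25]
print(poly_relu(2.0, coeffs))   # 0 + 1*2 + 0.5*4 + 0.25*8 = 6.0
print(poly_relu(-1.0, coeffs))  # negative inputs keep only the constant term: 0.0
```

With `coeffs = [0.0, 1.0]` the sketch reduces to plain ReLU, which shows how higher-order terms extend the standard activation under this assumption.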