Spaces: orionweller/human-mlm-clm-predictor

orionweller committed on Mar 4

Commit 8f1d1e1 (verified) · 1 parent: 7e7cce3

Update README.md

Files changed (1): README.md +40 -39

@@ -11,44 +11,45 @@ license: mit
 short_description: See if you can predict the masked tokens / next token!
 ---
-## MLM and NTP Testing App
-This Hugging Face Gradio space tests users on two fundamental NLP tasks:
-Masked Language Modeling (MLM) - Guess the masked words in a text
-Next Token Prediction (NTP) - Predict how a text continues
-#### Features
-Switch between MLM and NTP tasks with a simple radio button
-Adjust masking/cutting ratio to control difficulty
-Sample texts from the cc_news dataset (100 samples)
-Track and display user accuracy for both tasks
-Detailed feedback on answers
-#### How to Use
-##### For MLM Task
-Select "mlm" in the Task Type radio button
-Adjust mask ratio as desired (higher = more difficult)
-Click "New Sample" to get a text with [MASK] tokens
-Enter your guesses for the masked words, separated by spaces or commas
-Click "Check Answer" to see your accuracy
-##### For NTP Task
-Select "ntp" in the Task Type radio button
-Adjust cut ratio as desired (higher = more text is hidden)
-Click "New Sample" to get a partial text
-Type your prediction of how the text continues
-Click "Check Answer" to see your accuracy and the actual continuation
-#### Statistics
-The app keeps track of your accuracy for both tasks
-Click "Reset Stats" to start fresh
-#### Technical Details
-Uses HuggingFace's cc_news dataset (vblagoje/cc_news)
-Employs streaming to efficiently sample 100 documents
-Uses BERT tokenizer for consistent tokenization

+# MLM and NTP Testing App
+This Hugging Face Gradio space tests users on two fundamental NLP tasks:
+1. **Masked Language Modeling (MLM)** - Guess the masked words in a text
+2. **Next Token Prediction (NTP)** - Predict how a text continues
+## Features
+- Switch between MLM and NTP tasks with a simple radio button
+- Adjust masking/cutting ratio to control difficulty
+- Sample texts from the cc_news dataset (100 samples, limited to 2 sentences)
+- Track and display user accuracy for both tasks
+- Detailed feedback on answers
+- Token-by-token prediction for NTP task with immediate feedback
+## How to Use
+### For MLM Task
+1. Select "mlm" in the Task Type radio button
+2. Adjust mask ratio as desired (higher = more difficult)
+3. Click "New Sample" to get a text with [MASK] tokens
+4. Enter your guesses for the masked words, separated by spaces or commas
+5. Click "Check Answer" to see your accuracy
+### For NTP Task
+1. Select "ntp" in the Task Type radio button
+2. Adjust cut ratio as desired (higher = more text is hidden)
+3. Click "New Sample" to get a partial text
+4. Type your prediction for the next token/word
+5. Click "Check Answer" to see if you're correct
+6. Continue predicting the next tokens one by one
+## Statistics
+- The app keeps track of your accuracy for both tasks
+- Click "Reset Stats" to start fresh
+## Technical Details
+- Uses HuggingFace's mlfoundations/dclm-baseline-1.0-parquet dataset
+- Employs streaming to efficiently sample 100 documents
+- Uses BERT tokenizer for consistent tokenization
+- Limits samples to two sentences for better user experience
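The MLM flow described in the new README (mask a share of the tokens, accept guesses separated by spaces or commas, report accuracy) can be illustrated with a minimal sketch. This is not the Space's actual code: the README says the app uses a BERT tokenizer, whereas this sketch assumes plain whitespace splitting so it runs without extra dependencies, and the names `mask_tokens`, `parse_guesses`, and `score` are hypothetical.

```python
import random
import re

MASK = "[MASK]"


def mask_tokens(text, mask_ratio, seed=None):
    """Replace roughly `mask_ratio` of the whitespace tokens with [MASK].

    Assumption: whitespace tokens stand in for the BERT tokens the app uses.
    Returns the masked text and the hidden tokens in left-to-right order.
    """
    rng = random.Random(seed)
    tokens = text.split()
    if not tokens:
        return "", []
    # Always mask at least one token, never more than are available.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    answers = [tokens[i] for i in positions]
    for i in positions:
        tokens[i] = MASK
    return " ".join(tokens), answers


def parse_guesses(raw):
    """Split user input on spaces and/or commas, dropping empty pieces."""
    return [g for g in re.split(r"[,\s]+", raw.strip()) if g]


def score(guesses, answers):
    """Fraction of masked positions guessed correctly (case-insensitive)."""
    if not answers:
        return 0.0
    correct = sum(g.lower() == a.lower() for g, a in zip(guesses, answers))
    return correct / len(answers)
```

A higher `mask_ratio` hides more tokens, which matches the README's note that a higher ratio is more difficult. Missing guesses count as wrong because `score` divides by the number of masked positions, not by the number of guesses supplied.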