datablations
https://github.com/huggingface/datablations
AI & ML interests
Scaling Data-Constrained Language Models
Recent Activity
Muennighoff submitted a paper about 2 hours ago: Composer 2 Technical Report
craffel authored a paper 3 months ago: TokSuite: Measuring the Impact of Tokenizer Choice on Language Model Behavior
thomwolf authored a paper 6 months ago: Robot Learning: A Tutorial
Team members: 9
datablations's datasets: 13
datablations/scripts • Viewer • Updated Jun 15, 2023 • 3.48M • 1.25k
datablations/oscar-subsets • Viewer • Updated Jun 14, 2023 • 365k • 490
datablations/c4-subsets • Viewer • Updated Jun 14, 2023 • 729k • 871 • 6
datablations/c4-filter-megatron • Updated May 28, 2023 • 346
datablations/oscar-filter-megatron • Updated May 27, 2023 • 190
datablations/python-megatron • Updated May 22, 2023 • 8.29k • 1
datablations/subsets • Viewer • Updated May 10, 2023 • 365k • 111
datablations/oscar-filter • Viewer • Updated May 10, 2023 • 432M • 756
datablations/oscar-dedup-expanded • Viewer • Updated May 10, 2023 • 432M • 140
datablations/mup • Updated Apr 24, 2023 • 1.01k
datablations/c4-filter • Viewer • Updated Feb 1, 2023 • 365M • 254
datablations/c4-filter-small • Viewer • Updated Jan 17, 2023 • 100k • 35
datablations/oscar-filter-small • Viewer • Updated Nov 24, 2022 • 100k • 7