Researchers propose using Neural Cellular Automata (NCA) as a synthetic pre-pre-training data source for language models, bypassing the need for natural language text. NCA trajectories — generated by randomly sampled neural networks acting as transition rules on a grid — are tokenized and used to train transformers.
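The generation pipeline described above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the function names, the single-hidden-layer MLP rule, the toroidal 3x3 neighborhood, and the uniform binning tokenizer are all assumptions made for clarity.

```python
import numpy as np

def random_nca_trajectory(grid_size=16, channels=4, steps=8, seed=0):
    """Generate one NCA trajectory: a randomly initialized MLP acts as the
    transition rule, updating every cell from its 3x3 neighborhood.
    (Illustrative sketch; architecture details are assumptions.)"""
    rng = np.random.default_rng(seed)
    in_dim = 9 * channels  # flattened 3x3 neighborhood
    # Randomly sampled transition rule: a one-hidden-layer MLP.
    w1 = rng.normal(0.0, 0.3, (in_dim, 32))
    w2 = rng.normal(0.0, 0.3, (32, channels))

    state = rng.uniform(-1.0, 1.0, (grid_size, grid_size, channels))
    trajectory = [state]
    for _ in range(steps):
        # Wrap-around padding gives a toroidal grid.
        padded = np.pad(state, ((1, 1), (1, 1), (0, 0)), mode="wrap")
        # Gather each cell's 3x3 neighborhood and flatten it.
        neigh = np.stack(
            [padded[i:i + grid_size, j:j + grid_size]
             for i in range(3) for j in range(3)],
            axis=-2,
        )  # shape: (H, W, 9, C)
        flat = neigh.reshape(grid_size, grid_size, in_dim)
        # Apply the same rule at every cell; tanh keeps values in [-1, 1].
        state = np.tanh(np.tanh(flat @ w1) @ w2)
        trajectory.append(state)
    return trajectory

def tokenize(trajectory, n_bins=16):
    """Discretize cell values in [-1, 1] into integer tokens for a transformer."""
    seq = np.concatenate([s.ravel() for s in trajectory])
    return np.clip(((seq + 1.0) / 2.0 * n_bins).astype(int), 0, n_bins - 1)
```

Each sampled rule (a fresh `seed`) yields a different dynamical system, so the token stream that a transformer trains on is drawn from a distribution over dynamics rather than from a single fixed process.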
Table of contents
- We're running out of text
- Neural Cellular Automata as synthetic fuel
- The surprising payoff
- What drives the transfer?
- A purer training signal
- Beyond one-size-fits-all
- Citation