Tianyi Zhou,
Deqing Fu,
Mahdi Soltanolkotabi,
Robin Jia,
Vatsal Sharan
tzhou029@usc.edu,
deqingfu@usc.edu,
soltanol@usc.edu,
robinjia@usc.edu,
vsharan@usc.edu
Fourier Number Embedding (FNE)
FNE directly maps numbers into their Fourier representations, bypassing the tokenization step entirely, with better efficiency and accuracy.
We train Llama-3.2-1B from scratch with different number embedding methods and evaluate it on a range of arithmetic tasks. Our Fourier Number Embedding (FNE) method delivers significant gains in both data efficiency and parameter efficiency, reaching 99% accuracy with 64× less data than traditional embeddings; it also outperforms fine-tuned Llama-3.2 models and achieves perfect accuracy.
Figure: Comparison of accuracy trends for various arithmetic tasks with respect to model size and data size.
As discussed in our previous work [Zhou et al. (NeurIPS 2024)],
LLMs naturally learn Fourier features during pre-training, and these features enable them to perform arithmetic with perfect accuracy.
However, due to the limitations of tokenization, LLMs can only embed numbers up to 520 as single tokens.
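You can see this limitation directly by inspecting how a tokenizer splits numbers. The short sketch below uses the GPT-2 tokenizer from Hugging Face transformers purely as an illustration; the exact single-token cutoff differs from tokenizer to tokenizer.

```python
from transformers import AutoTokenizer

# Illustrative only: GPT-2's BPE vocabulary is used here as an example;
# the single-token cutoff depends on the tokenizer in question.
tok = AutoTokenizer.from_pretrained("gpt2")

for n in [519, 520, 521, 12345]:
    # Numbers beyond the tokenizer's single-token range are split into pieces,
    # so the model never sees them as one atomic unit.
    print(n, tok.tokenize(str(n)))
```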
Below, we provide a simplified illustration of how pre-trained LLMs embed numbers and how this leads to Fourier Number Embedding (FNE).
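As a concrete companion to that illustration, here is a minimal, self-contained sketch of the idea behind FNE: a number is mapped to cosine/sine pairs at base-10 periods, so each digit is represented exactly and can be read back off the embedding. The function names, the choice of periods 10^1, ..., 10^m, and the decoding routine are illustrative assumptions for exposition, not the exact implementation from our codebase.

```python
import numpy as np


def fourier_number_embedding(x: float, num_digits: int = 5) -> np.ndarray:
    """Map a number to cosine/sine features at base-10 periods.

    For each period T_i = 10^i (i = 1..num_digits), the pair
    [cos(2*pi*x / T_i), sin(2*pi*x / T_i)] encodes x mod T_i,
    i.e. the i lowest digits of x, exactly, in a single vector.
    """
    feats = []
    for i in range(1, num_digits + 1):
        angle = 2.0 * np.pi * x / (10.0 ** i)
        feats.extend([np.cos(angle), np.sin(angle)])
    return np.array(feats)


def decode_digits(emb: np.ndarray, num_digits: int = 5) -> int:
    """Recover the integer from its Fourier embedding, digit by digit."""
    digits = []
    for i in range(num_digits):
        cos_v, sin_v = emb[2 * i], emb[2 * i + 1]
        # The phase of (cos, sin) at period 10^(i+1) determines x mod 10^(i+1);
        # subtracting the already-decoded lower digits pins down digit i.
        phase = np.arctan2(sin_v, cos_v) % (2.0 * np.pi)
        value_mod = phase / (2.0 * np.pi) * 10.0 ** (i + 1)
        lower = sum(d * 10 ** j for j, d in enumerate(digits))
        digits.append(int(round((value_mod - lower) / 10.0 ** i)) % 10)
    return sum(d * 10 ** j for j, d in enumerate(digits))


if __name__ == "__main__":
    x = 42315
    emb = fourier_number_embedding(x, num_digits=5)
    print(decode_digits(emb, num_digits=5))  # -> 42315
```

In an actual model, such a feature vector would be padded or projected to the hidden dimension and used in place of the tokenized number embedding, which is what lets FNE bypass the tokenization step.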
If you find this project useful, please cite our work as follows:

@article{zhou2024fne,
  title   = {FNE: Precise Single-Token Number Embeddings via Fourier Features},
  author  = {Zhou, Tianyi and Fu, Deqing and Soltanolkotabi, Mahdi and Jia, Robin and Sharan, Vatsal},
  journal = {arXiv preprint arXiv:???},
  year    = {2025},
  url     = {???}
}