LLMs from scratch - Rust
This project aims to provide Rust code that follows the incredible text Build a Large Language Model (From Scratch) by Sebastian Raschka. The book provides arguably the clearest step-by-step walkthrough for building a GPT-style LLM. Listed below are the titles of the book's 7 chapters.
- Understanding large language models
- Working with text data
- Coding attention mechanisms
- Implementing a GPT model from scratch to generate text
- Pretraining on unlabeled data
- Fine-tuning for classification
- Fine-tuning to follow instructions
The code provided in the book (see the associated GitHub repo) is written entirely in PyTorch (understandably so). In this project, we translate all of the PyTorch code into Rust using the Candle crate, a minimalist ML framework.
Usage
The recommended way to use this project is to clone this repo and use Cargo to run the examples and exercises.
# SSH
# HTTPS
It is important to note that we use the same datasets that Sebastian uses in his book. Use the command below to download the data into a subfolder called data/, which the examples and exercises will use later on.
Navigating the code
Users can read the code either in their chosen IDE via the cloned repo or through the project's docs.
NOTE: The import style used in all of the examples and exercises modules does not follow convention. Specifically, relevant imports are made inside the main() method of every Example and Exercise implementation. This is done for educational purposes, so that readers of the book know precisely which imports are needed for the example or exercise at hand.
Running Examples and Exercises
After cloning the repo, you can cd to the project's root directory and execute the main binary.
# Run code for Example 05.07
# Run code for Exercise 5.5
If you are using a CUDA-enabled device, you can turn on the cuda feature via the --features cuda flag:
# Run code for Example 05.07
# Run code for Exercise 5.5
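Cargo features are compile-time switches: passing --features cuda enables any code guarded by #[cfg(feature = "cuda")] or the cfg! macro. The sketch below shows the general pattern with a hypothetical select_device function and DeviceKind enum; it uses only the standard library and returns a label instead of a real Candle device, so treat it as an illustration of feature gating rather than the project's device-selection code. In a real Cargo project the feature must also be declared under [features] in Cargo.toml.

```rust
/// Hypothetical device label; a real project would return a Candle device.
#[derive(Debug, PartialEq)]
enum DeviceKind {
    Cpu,
    Cuda(usize),
}

/// Pick a device based on whether the crate was compiled with `--features cuda`.
fn select_device() -> DeviceKind {
    // `cfg!` expands to `true` only when the `cuda` feature is enabled at compile time.
    if cfg!(feature = "cuda") {
        DeviceKind::Cuda(0) // first GPU (ordinal 0 is an assumption)
    } else {
        DeviceKind::Cpu
    }
}

fn main() {
    println!("running on {:?}", select_device());
}
```

Built without the flag, this prints "running on Cpu"; built with --features cuda, the same source selects the GPU branch without any runtime configuration.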
Listing Examples
To list the Examples, use the following command:
A snippet of the output is pasted below.
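A listing like this only needs a lookup from an ID such as 05.07 to a description and a runnable body. The sketch below shows one way to structure that with a BTreeMap, so IDs come out in sorted order; the Entry struct, the build_registry, list, and run functions, and the placeholder descriptions are all hypothetical and do not reflect the crate's internal code.

```rust
use std::collections::BTreeMap;

/// Hypothetical registry entry: a one-line description plus the code to run.
struct Entry {
    description: &'static str,
    run: fn() -> String,
}

/// Build a hypothetical registry keyed by IDs in the "chapter.number" style.
fn build_registry() -> BTreeMap<&'static str, Entry> {
    let mut registry = BTreeMap::new();
    // Descriptions and bodies are placeholders, not the crate's real examples.
    registry.insert(
        "05.07",
        Entry { description: "placeholder for Example 05.07", run: || String::from("ran 05.07") },
    );
    registry.insert(
        "02.01",
        Entry { description: "placeholder for Example 02.01", run: || String::from("ran 02.01") },
    );
    registry
}

/// Format one line per registered ID; BTreeMap iterates keys in sorted order.
fn list(registry: &BTreeMap<&'static str, Entry>) -> Vec<String> {
    registry
        .iter()
        .map(|(id, e)| format!("{id} | {}", e.description))
        .collect()
}

/// Look up an ID and run it, or report that it is unknown.
fn run(registry: &BTreeMap<&'static str, Entry>, id: &str) -> Result<String, String> {
    registry
        .get(id)
        .map(|e| (e.run)())
        .ok_or_else(|| format!("unknown id: {id}"))
}

fn main() {
    let registry = build_registry();
    for line in list(&registry) {
        println!("{line}");
    }
    match run(&registry, "05.07") {
        Ok(out) => println!("{out}"),
        Err(e) => eprintln!("{e}"),
    }
}
```

Using a sorted map keeps the listing order stable regardless of registration order, which matters when the output is rendered as a table.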
Listing Exercises
One can similarly list the Exercises using:
# first few lines of output
[Alternative Usage] Installing from crates.io
Alternatively, users can install this crate directly via cargo install (be sure to have Rust and Cargo installed first; see here for installation instructions):
Once installed, users can run the main binary to run the various Exercises and Examples.
# Run code for Example 05.07
# Run code for Exercise 5.5