These are notes from an introductory workshop on generating text with large language models.

Approaches to Text Generation

  • Markov chains (Stochastic/Probabilistic)
  • RNNs (Recurrent Neural Networks)
  • LSTMs (Long Short-Term Memory [Type of RNN])
  • Transformers (see "Attention Is All You Need")
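A first-order Markov chain is simple enough to sketch in a few lines of Python: build a table mapping each word to the words that follow it in the training text, then walk the table by picking a random successor at each step. The corpus and function names below are illustrative, not from the workshop materials.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length=10, seed=None):
    """Walk the chain from `start`, picking a random successor each step."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:  # dead end: no word ever followed this one
            break
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = "the cat sat on the mat the cat ran on the rug"
chain = build_chain(corpus)
print(generate(chain, "the", length=8, seed=1))
```

Because the model only ever looks at the previous word, the output is locally plausible but globally incoherent, which is exactly the limitation RNNs and transformers address.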

Large Language Models

  • Transformer Architecture
  • Trained on huge datasets (16GB to 745+GB)
  • Long training time (e.g., an estimated 355 GPU-years on a Tesla V100)
  • Expensive (roughly $4,600,000 in compute costs)
  • Reusable
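The "reusable" point is the practical payoff: once someone pays the training cost, anyone can load the pretrained weights. A minimal sketch, assuming the Hugging Face `transformers` library (one common way to do this, not necessarily the workshop's) and the small, freely downloadable "gpt2" model:

```python
# Sketch: reusing a pretrained model via Hugging Face's `transformers`
# library (install with `pip install transformers`). "gpt2" is a small
# publicly available model chosen here for illustration.
try:
    from transformers import pipeline
    generator = pipeline("text-generation", model="gpt2")
    result = generator("Once upon a time", max_new_tokens=20)[0]["generated_text"]
except Exception as exc:  # library missing or no network: fail gracefully
    result = f"(could not run model: {exc})"
print(result)
```

Generating text this way costs seconds on a laptop, even though training the underlying model cost millions of dollars.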

Further Reading:

Data (and Biases)

  • Trained on the internet
    • Common Crawl
    • WebText2
    • Digitized Books
    • Wikipedia
    • The Pile
  • Garbage in, garbage out
  • Known Racial, Gendered, and Religious Biases


Surprising Performance

  • Illusion of meaning
  • Translation
  • Text transformation
  • Tuning

Getting Started with Colab

  • Virtual environment
  • Interactive
  • Shareable
  • Free GPU
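You can confirm the free GPU is actually attached to your Colab runtime (enable it via Runtime > Change runtime type > GPU). This sketch assumes PyTorch, which Colab preinstalls by default, and falls back cleanly if it is missing:

```python
# Quick check of the Colab runtime: is a GPU attached?
# Assumes PyTorch (preinstalled on Colab); degrades gracefully elsewhere.
try:
    import torch
    has_gpu = torch.cuda.is_available()
except ImportError:
    has_gpu = False
print("GPU available:", has_gpu)
```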

Workshop Notebook:

Black Boxed

  • Nobody knows how it works
  • Predictive
  • Not explanatory

Environmental Cost

  • Energy intensive
    • Massive data
    • Long Training time
  • GPT-3 produced an estimated 552 metric tons of carbon dioxide
  • Roughly equivalent to the annual emissions of 120 cars
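The car comparison checks out arithmetically, assuming the commonly cited EPA figure of about 4.6 metric tons of CO2 per passenger car per year:

```python
# Sanity-check the comparison: average passenger car emits ~4.6 metric
# tons of CO2 per year (EPA estimate; an assumption, not from the notes).
tons_per_car_per_year = 4.6
equivalent_cars = 552 / tons_per_car_per_year
print(round(equivalent_cars))  # prints 120
```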

Further Reading:

Privacy Risk

  • Based on data scraped without permission
  • Reverse-engineered to disclose sensitive information
  • Can be queried to uncover training data

Further Reading:

Other Resources

