These are notes from an introductory workshop on generating text with large language models.

Approaches to Text Generation

  • Markov chains (Stochastic/Probabilistic; see the sketch after this list)
  • RNNs (Recurrent Neural Networks)
  • LSTMs (Long Short-Term Memory [Type of RNN])
  • Transformers (see “Attention Is All You Need”)
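A Markov chain is the simplest of these approaches: it picks each next word at random from the words observed to follow the current word in some sample text. A minimal order-1 sketch in Python (the tiny corpus string below is a placeholder, not material from the workshop):

    import random
    from collections import defaultdict

    def build_chain(text):
        """Map each word to the list of words observed to follow it (order-1 chain)."""
        words = text.split()
        chain = defaultdict(list)
        for current, following in zip(words, words[1:]):
            chain[current].append(following)
        return chain

    def generate(chain, start, length=20):
        """Walk the chain, sampling each next word from the observed followers."""
        word = start
        output = [word]
        for _ in range(length - 1):
            followers = chain.get(word)
            if not followers:  # dead end: this word was never followed by anything
                break
            word = random.choice(followers)
            output.append(word)
        return " ".join(output)

    # Tiny placeholder corpus; any longer text works the same way.
    corpus = "the cat sat on the mat and the cat ate the fish on the mat"
    print(generate(build_chain(corpus), start="the"))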

Large Language Models

  • Transformer Architecture
  • Trained on huge datasets (16 GB to 745+ GB)
  • Long training time (roughly 355 GPU-years on Tesla V100 GPUs)
  • Expensive (~$4,600,000 in computing costs)
  • Re-usable once trained (see the sketch after this list)
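Re-usability is what makes these models practical for a workshop: rather than spending GPU-years training from scratch, you download a published checkpoint and generate text immediately. A minimal sketch, assuming the Hugging Face transformers library and the freely available GPT-2 checkpoint (the workshop notebook may use a different model or library):

    # In Colab: !pip install transformers
    from transformers import pipeline

    # Download a pretrained checkpoint once, then re-use it for generation.
    generator = pipeline("text-generation", model="gpt2")

    result = generator(
        "The workshop began with a short history of",  # placeholder prompt
        max_length=40,            # total length of prompt plus continuation, in tokens
        num_return_sequences=1,
    )
    print(result[0]["generated_text"])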

Further Reading: https://dl.acm.org/doi/pdf/10.1145/3442188.3445922

Data (and Biases)

  • Trained on the internet
    • Common Crawl (https://commoncrawl.org/)
    • WebText2
    • Digitized Books
    • Wikipedia
    • The Pile (https://pile.eleuther.ai/)
  • Garbage in, garbage out
  • Known Racial, Gendered, and Religious Biases

Source: https://arxiv.org/pdf/2101.00027.pdf

Surprising Performance

  • Fluent output gives an illusion of meaning
  • Translation (see the sketch after this list)
  • Text transformation
  • Tuning
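Tasks like translation can be run through the same kind of pretrained pipeline, just with a different task name. A hedged sketch, again assuming the Hugging Face transformers library (which downloads a small T5 model by default for this task); this is illustrative, not necessarily what the workshop notebook does:

    from transformers import pipeline

    # English-to-French translation with a pretrained sequence-to-sequence model.
    translator = pipeline("translation_en_to_fr")

    output = translator("Large language models can translate text surprisingly well.")
    print(output[0]["translation_text"])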

Getting Started with Colab

  • colab.research.google.com
  • Virtual environment
  • Interactive
  • Shareable
  • Free GPU (see the check below)

Workshop Notebook: https://colab.research.google.com/drive/1Y8thkankYotdrUs3_K1R96UDxSJ2e7p0?usp=sharing
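A common first step in any Colab notebook is confirming that the free GPU runtime is actually attached (Runtime > Change runtime type > GPU). A small check, assuming PyTorch, which Colab ships preinstalled:

    import torch

    # Reports whether a CUDA device is visible to this runtime, and which one.
    if torch.cuda.is_available():
        print("GPU available:", torch.cuda.get_device_name(0))
    else:
        print("No GPU attached; running on CPU")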

Black Boxed

  • Internal workings are not fully understood, even by the model's creators
  • Predictive
  • Not explanatory

Environmental Cost

  • Energy intensive
    • Massive data
    • Long Training time
  • GPT-3 produced an estimated 552 metric tons of carbon dioxide
  • Roughly equivalent to the emissions of 120 cars over the course of a year

Further Reading: https://arxiv.org/pdf/2104.10350.pdf

Privacy Risk

  • Based on data scraped without permission
  • Reverse-engineered to disclose sensitive information
  • Can be queried to uncover training data

Further Reading: https://ieeexplore.ieee.org/document/9152761

Other Resources

  • https://www.decontextualize.com/
  • https://ml4a.github.io/guides/
  • https://towardsdatascience.com/
