  • digital garden local server commands:
 npx quartz build --serve --port 8081    

Artificial Intelligence for Social Good (AISG)

It is focused on the application of Artificial Intelligence (AI) technologies and techniques to address societal and humanitarian challenges, with the ultimate goal of improving the well-being of individuals and communities. It is closely aligned with the 17 United Nations’ Sustainable Development Goals (SDGs). Some of them are:

  1. SDG 3 - Good Health and Well-Being
  2. SDG 4 - Quality Education
  3. SDG 11 - Sustainable Cities and Communities

deep learning

  • Frameworks: Tensorflow / PyTorch / JAX
  • It usually requires a good GPU.

JAX is a new deep learning framework that uses GPU and TPU to accelerate computations.

The number of trainable parameters in a neural network is called the model capacity. The model capacity is a measure of how much information the model can store. A model with a high capacity can learn more complex functions, but it is also more likely to overfit the training data.

Tensorflow is a deep learning framework developed by Google. It’s a Python library that allows you to define and run computations involving tensors. Tensors are multi-dimensional arrays of numbers. They are the basic data structure in Tensorflow.

In tensorflow, the model capacity can be calculated using the code from here.
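A minimal sketch for a tf.keras model (the two-layer model below is a made-up example, not the model from the linked code):

import tensorflow as tf

# Hypothetical small model, only for illustration.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1),
])

# Total number of trainable parameters, i.e. the model capacity discussed above.
trainable = sum(tf.keras.backend.count_params(w) for w in model.trainable_weights)
print(trainable)  # model.summary() prints the same breakdown per layer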

Here’s a quick and easy way to benchmark your hardware using Tensorflow.

# Source your virtual environment
source ~/venv/bin/activate
 
# Install Tensorflow
pip install tensorflow
 
# Download script
curl https://raw.githubusercontent.com/abreheret/tensorflow-models/master/tutorials/image/mnist/convolutional.py -o model.py
 
# Make script compatible with Tensorflow 2.0
sed -i 's/import tensorflow as tf/import tensorflow.compat.v1 as tf\ntf.disable_eager_execution()/g' model.py
 
# Run script
python model.py

Now you can see how long it takes to run the script at each step on your hardware. Here’s my output:

Step 0 (epoch 0.00), 139.6 ms
Minibatch loss: 8.334, learning rate: 0.010000
Minibatch error: 85.9%
Validation error: 84.6%
Step 100 (epoch 0.12), 5.9 ms
Minibatch loss: 3.229, learning rate: 0.010000
Minibatch error: 6.2%
Validation error: 8.0%
Step 200 (epoch 0.23), 5.7 ms
Minibatch loss: 3.371, learning rate: 0.010000
Minibatch error: 10.9%

Here are the results from my various hardware:

| Hardware | Time per step |
| --- | --- |
| Raspberry Pi 4 | 460 ms |
| m1 mac mini | 60 ms |
| GeForce 2080 Ti | 5 ms |
| NVIDIA V100S | 4 ms |
  • hydra for managing configuration files.

  • nccl is the backend for multi-GPU training.

  • tensorboard for visualizing training metrics. Install it using pip install tensorboard.

  • Distributed Data Parallel for multi-GPU training.

  • Loss functions:

In pytorch, the model capacity can be calculated using the code from here.

from prettytable import PrettyTable
 
def count_parameters(model):
    table = PrettyTable(["Modules", "Parameters"])
    total_params = 0
    for name, parameter in model.named_parameters():
        if not parameter.requires_grad:
            continue
        params = parameter.numel()
        table.add_row([name, params])
        total_params += params
    print(table)
    print(f"Total Trainable Params: {total_params}")
    return total_params
    
count_parameters(net)  # net is your torch.nn.Module instance
  • If you get a CUDA error: no kernel image is available for execution on the device, a possible reason could be a mismatch between the system’s CUDA version and torch’s CUDA version. Ensure that both are the same by checking nvidia-smi and torch.version.cuda.

  • If you get an Unexpected key(s) in state_dict: ... error, a possible reason is that the DataParallel module was used to run the code in a multi-GPU environment: the line model = torch.nn.DataParallel(model) adds a module prefix to the keys in the state_dict. To fix it, ensure that DataParallel wraps the model before torch.load is called:

model = yourModel(**cfg["model_args"])
model_path = '/home/path'
model = torch.nn.DataParallel(model)
model.load_state_dict(torch.load(model_path))

Tensorflow can use a GPU to accelerate computations. To enable GPU support, you need to install the GPU version of Tensorflow. You can do this by running:

conda install -c anaconda tensorflow-gpu
conda create --name tf_gpu tensorflow-gpu
conda activate tf_gpu

and now you can run Tensorflow on your GPU.

sudo dpkg -i cuda-repo-ubuntu1810-10-1-local-10.1.105-418.39_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda
 
sudo find / -name 'libcudart.so*' # this should return the required cuda version
tar -xvf cudnn-linux-x86_64-8.x.x.x_cudaX.Y-archive.tar.xz
sudo cp cudnn-*-archive/include/cudnn*.h /usr/local/cuda/include 
sudo cp -P cudnn-*-archive/lib/libcudnn* /usr/local/cuda/lib64 
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

Torch can also use a GPU to accelerate computations. Installation instructions:

pip install torch torchvision torchaudio

and then test it out using:

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device at 0x7efce0b03be0>
>>> torch.cuda.get_device_name(0)
'Tesla V100S-PCIE-32GB'

Monitor GPU usage and temperature using gpustat -cp --watch which can be installed using pip install gpustat.

NVIDIA GPUs have a compute capability associated with each of them; here are the ones that I have tried and their compute capability:

| NVIDIA GPU | Compute Capability (higher is better) |
| --- | --- |
| RTX A5000 | 8.6 |
| GTX 1660 Ti | 7.5 |
| V100 | 7.0 |
| Jetson Nano | 5.3 |

Apple GPUs can be used for deep learning via the Metal Performance Shaders (MPS) framework. PyTorch ships an mps backend to support this.
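A minimal sketch to check for and use the mps device (assuming a recent PyTorch build with MPS support):

import torch

# Fall back to CPU if the MPS backend is unavailable.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.randn(8, 8, device=device)
print(x.device)  # prints mps:0 on Apple-silicon Macs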

devops

Self-hosting is a great way to learn about DevOps.

  • Google Cloud Platform (GCP)

Installing gsutil on Linux:

curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-437.0.1-linux-x86_64.tar.gz
tar -xf google-cloud-cli-437.0.1-linux-x86_64.tar.gz
chmod +x ./google-cloud-sdk/install.sh
./google-cloud-sdk/install.sh

CNN

CNN stands for Convolutional Neural Network. Compared to traditional NNs that use pre-defined features, CNNs learn filters to extract features from an image.

Downsampling the image is useful for saving memory and computation. A pooling layer is used to reduce the size of the image (or feature map); max pooling does this by taking the maximum pixel value in each window.

Backpropagation is used to update the weights of the network by moving them in the direction that improves the output, i.e. reduces the loss.

A convolutional layer is used to extract features from the image. This is done by sliding a filter over the image, multiplying the filter element-wise with the underlying patch and summing the result, then moving the filter to the next position and repeating the process. The output of the convolutional layer is a feature map.
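A minimal sketch of a convolutional layer followed by max pooling (the shapes are a made-up example):

import torch
import torch.nn as nn

# One convolutional layer producing 16 feature maps, then pooling that halves the spatial size.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=2)

image = torch.randn(1, 3, 32, 32)   # a fake 32x32 RGB image
features = conv(image)              # (1, 16, 32, 32) feature maps
downsampled = pool(features)        # (1, 16, 16, 16) after taking window maxima
print(features.shape, downsampled.shape)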

Traditional CNNs lose spatial information when the image is compressed, because the feature maps are eventually flattened into a vector. The encoder-decoder architecture is used to preserve the spatial information of the image: the encoder is a CNN that compresses the image into a compact representation, and the decoder is a CNN that decompresses that representation back into an image. The encoder and decoder are trained together, and the decoder typically mirrors the encoder's architecture with the downsampling steps reversed.

CNNs are well suited to image inputs; Transformers are well suited to text inputs.

docker

  • Installing Docker on a Linux machine
curl -fsSL https://get.docker.com -o install-docker.sh
sh install-docker.sh --dry-run
sudo sh install-docker.sh
  • Building a Docker image
docker build -t <image-name> .
  • Running a Docker image
docker run -it <image-name>
  • Tips:

  • Use Build Cache when building images iteratively to retrieve cached layers and save time.

  • docker-compose is a tool for defining and running multi-container Docker applications. It is used to run multiple containers at the same time.

  • healthcheck is a command that is run periodically to check the health of the container and restart it if it is unhealthy. The healthcheck can also be used to check if the container is ready to accept connections and start the other containers that depend on it.

grafana

Grafana is a tool for visualizing time series data. It is a web application that can be used to display data from Prometheus and Loki.

Grafana Cloud is a good hosted solution for managing and organizing log files. I use it for monitoring my Raspberry Pi server. It is free for up to 10,000 log lines per day.

  • Setup

Grafana provides a node exporter for monitoring the Raspberry Pi. The exporter is a simple Go program that exposes system metrics via HTTP. It is available for Linux, Windows, and macOS.

  • Install Node Exporter

Grafana Cloud also provides a Linux integration for monitoring the Raspberry Pi. The integration is a simple bash script that runs the node exporter and sends the metrics to Grafana Cloud. It is available for Debian, Ubuntu, and Raspbian.

Kubernetes

Kubernetes is a container orchestrator that is useful for running Docker images without downtime.

certbot is a tool for generating SSL certificates for domains, including the domains used in the kubernetes cluster.

metalLB is a load balancer for clusters running on bare metal; it load-balances traffic to the services running in the cluster.

k3s is a lightweight kubernetes distribution that is well suited to running a cluster on a single node, such as a Raspberry Pi. k3s comes preinstalled with traefik, which acts as the ingress controller on ports 80 and 443, so these ports are not available for other services.

I have deployed the following services in the cluster:

linux

  • Distributions (Distros)

A relatively easy way to try the different flavours of linux is to use distributions. Some of the popular ones are Mint, Ubuntu, Debian, CentOS/RHEL, Arch, Gentoo and Slackware.

  • Cronjobs are effective for scheduling tasks for a later time. They are also a great way to automate tasks that need to be done regularly. I use cronguru to help me write them. It’s also a great way to learn how they work.

  • List the users using less /etc/passwd. The first column is the username.

  • Window Manager

i3wm is a good tiling window manager. It is very customizable and has a lot of features. It is also very lightweight. I use the i3-gaps version of i3wm. It can be installed on Ubuntu from a PPA.

sudo add-apt-repository ppa:regolith-linux/release
sudo apt update
sudo apt install i3-gaps

Ubuntu 22.04 Jammy Jellyfish does not have a PPA for i3-gaps. It can be installed from source. First, install the necessary dependencies:

sudo apt install libxcb1-dev libxcb-keysyms1-dev libpango1.0-dev \
  libxcb-util0-dev libxcb-icccm4-dev libyajl-dev \
  libstartup-notification0-dev libxcb-randr0-dev \
  libev-dev libxcb-cursor-dev libxcb-xinerama0-dev \
  libxcb-xkb-dev libxkbcommon-dev libxkbcommon-x11-dev \
  autoconf libxcb-xrm0 libxcb-xrm-dev automake libxcb-shape0-dev

Next, clone the i3-gaps repository and build it:

:::caution The installation guide is outdated. Its build commands,

autoreconf --force --install
rm -rf build
mkdir build
cd build
../configure --prefix=/usr --sysconfdir=/etc
make
sudo make install

fail with the following errors:

autoreconf: error: 'configure.ac' is required
-bash: ../configure: No such file or directory
make: *** No targets specified and no makefile found.  Stop.
make: *** No rule to make target 'install'.  Stop.

:::

Instead, use:

cd /tmp
git clone https://www.github.com/Airblader/i3.git i3-gaps
cd i3-gaps
git checkout gaps && git pull
sudo apt install meson asciidoc
meson -Ddocs=true -Dmans=true ../build
meson compile -C ../build
sudo meson install -C ../build

:::danger RDP won’t work with i3wm by default. I ended up installing ubuntu-desktop and changing the default WM from GNOME to i3wm.

:::

  • dwm is another tiling manager made by suckless org.
  • I use vim for light text editing and VSCode for more complex tasks. I also use vim for editing files on the server.

  • I like using SSH key based authentication. It is more secure than using passwords. I use ssh-copy-id to copy my SSH key to the server. It is available on Ubuntu by default.

  • To check if a port is in use:

sudo lsof -i -P -n | grep ${PORT}

It will show the process ID (PID) of the process using the port.

tmux is a terminal multiplexer. It allows you to create multiple windows and panes in a single terminal. It is very useful for running multiple processes at the same time.

  • Installing color themes on Gnome terminal. My favourite theme is Gruvbox.

I use this friendly utility, Gogh, to install the theme on my terminal emulator.

  • Controlling CPU and GPU fan speed

:::note

TODO.

:::

  • Printing

cupsd is the CUPS scheduler daemon that manages printers. Printing can be notoriously difficult to set up. Some of the issues that I have faced are (along with their solutions):

  • No suitable Destination Host found by cups-browsed, 1325417

  • How to delete empty directories?

First, get a list of empty directories:

find . -type d -empty -print

Then, delete them:

find . -type d -empty -delete
  • Tools

  • rsync can be used for syncing files between source and destination.

  • scp can be used to copy files between server and local pc over ssh.

Machine learning

Machine learning is about using existing data to make the machine learn to perform tasks at a human level of efficiency. It involves the use of traditional algorithms to make predictions.

Some common subsets:

  • Supervised Learning
  • Semi-supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning
  • Deep Learning
  • Transfer Learning

Machine learning involves changing numbers in a function (possibly randomly, possibly in a structured way) to make some output number as good as you can get it. You’re either trying to minimise or maximise the output number. That output number is how “good” the machine learning algorithm is. Frequently it will be the error rate for some decision problem and you want to minimise the error rate (make as few mistakes as possible).


  • Supervised Learning

Further classification:

  • regression: simple linear | multiple linear | ridge | lasso
  • classification: logistic
  • k-nearest neighbor
  • naive bayes
  • linear discriminant analysis
  • support vector machine
  • decision trees
  • total_error = bias^2 + variance
  • overfitting = high variance
  • underfitting = high bias

to avoid overfitting/underfitting:

  • cross-validation: leave-one out (LOO) | k-fold
  • multi-layer perceptron
  • feed-forward neural network

  • Unsupervised Learning

  • clustering methods: k-means | k-medoids | hierarchical


  • Scikit-learn

Scikit-learn is a Python library for traditional machine learning.

  • Numpy

If you face this error:

AttributeError: module 'numpy' has no attribute 'object'

A quick and easy workaround is to downgrade to numpy v1.23 by

pip install numpy==1.23

which still supports np.object.

m1 mac stuff

  • Installing ubuntu on m1 mac Currently, I am using multipass to install ubuntu 20.04 on my m1 mac mini.

  • Connecting to ubuntu instance through SSH instance of multipass shell This requires generating a SSH keypair on my host system and then copying the public key to the ubuntu instance.

  • Mounting the home directory to the ubuntu instance This is done using the multipass mount $HOME docker command (docker being the instance name). This mounts the home directory to the home directory in the ubuntu instance.

  • Minor bug with mounting On macOS Monterey, the mount command does not give the correct permissions. Go to System Preferences -> Security & Privacy -> Privacy -> Full Disk Access and add multipassd to the list of applications. This should fix the issue.

  • Accessing GUI using RDP

  1. Install xrdp and ubuntu-desktop on the ubuntu instance using sudo apt install ubuntu-desktop xrdp.
  2. Set password for the default user using sudo passwd ubuntu.
  3. Get the IP address from multipass list.
  4. Access the GUI using Microsoft Remote Desktop app on macOS.
  • Installing python on m1 mac

I use Anaconda for virtual environments, which provides the conda command to install python packages. Due to the ARM architecture of m1 mac, Tensorflow support has been buggy for a while. It would give the following error:

Illegal instruction: 4 (SIGKILL)

when installing tensorflow using conda install tensorflow.

  • Installing Tensorflow on m1 mac

I tried various methods to install tensorflow and here’s the one that worked.

conda install anaconda=2022.05
conda create -n tf tensorflow
conda activate tf
  • Monospace font on m1 mac I use Fira Code as my monospace font. I installed it using brew tap homebrew/cask-fonts && brew install --cask font-fira-code, followed by pasting the following lines in vscode settings.json:
"editor.fontFamily": "Fira Code",
"editor.fontLigatures": true,
  • Enabling hibernation on m1 mac Running
sudo pmset -a hibernatemode 25

enables hibernation. The hibernatemode can be set to 0, 3, or 25. 0 means the system only sleeps: memory stays powered and nothing is written to disk. 3 means memory stays powered during sleep but its contents are also written to disk as a safeguard. 25 means the memory contents are written to disk and memory is powered off, i.e. true hibernation.

The default value is 0 on mac mini. To check the current value, run pmset -g | grep hibernatemode.

  • Notes
  • NTFS formatted drives are mounted read-only on macOS. I had to format my external hard drive to FAT32 to be able to use it on my m1 mac.

traj prediction

Class of problem: Spatio-temporal sequence to sequence generation task

Assumption: The vehicle can be represented as a single point whose location is the centroid of the vehicle. The line passing through the centroid is called the centerline.

  • K = number of trajectories to predict

Datasets:

  • Nuscenes
  • Argoverse 1
  • Argoverse 2
  • Waymo

Argoverse 1 Baselines:

  • Constant Velocity
  • Nearest Neighbor
  • LSTM
  • LSTM (with map prior)

Feature Engineering:

  • Social Features
  • Map Features / Spatial Features

Metrics:

  • minFDE - minimum final displacement error over the K predicted trajectories
  • minADE - minimum average displacement error over the K predicted trajectories
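A minimal sketch of how these two metrics are computed (array shapes are assumptions, not a dataset API):

import numpy as np

def min_ade_fde(pred, gt):
    """pred: (K, T, 2) predicted trajectories, gt: (T, 2) ground truth."""
    err = np.linalg.norm(pred - gt[None], axis=-1)  # (K, T) displacement errors
    ade = err.mean(axis=1)                          # average error per hypothesis
    fde = err[:, -1]                                # final-timestep error per hypothesis
    return ade.min(), fde.min()                     # best of the K hypotheses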

Simulators:

  • highway-env (low fidelity)
  • waymax by Waymo Research
  • nocturne by Facebook Research
  • CARLA (high fidelity)

latex

  • Text placed under (part of) an equation is written with an \underbrace
  • To write multiple lines in a subscript, use \underbrace{...}_{\substack{\text{Some long text that} \\ \text{should be multiline}}}.
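A minimal snippet combining both (assuming the amsmath package is loaded):

y = \underbrace{\beta_0 + \beta_1 x}_{\substack{\text{Some long text that} \\ \text{should be multiline}}}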

math

  • Expectation

Expectation is the weighted average of a random variable. It is denoted by $\mathbb{E}[X]$ and is defined as:

$$\mathbb{E}[X] = \sum_{x} x \, P(X = x)$$

For problems where $P(X=x)$ is a constant, we can simplify the equation to:

$$\mathbb{E}[X] = \frac{1}{n} \sum_{i=1}^{n} x_i$$

where $\frac{1}{n}$ is the probability of each outcome. This is similar to the mean.
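A tiny numeric check of the above (a made-up fair-die example):

import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
p = np.full(6, 1 / 6)        # uniform P(X = x)
print(np.sum(x * p))         # weighted average: 3.5
print(x.mean())              # same value, since every outcome is equally likely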

Expectation is distributive over addition (linearity of expectation):

$$\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$$

Law of total expectation states that the expectation of a random variable is the weighted average of its conditional expectations under different distributions:

$$\mathbb{E}[X] = \mathbb{E}_{Y}\big[\,\mathbb{E}[X \mid Y]\,\big]$$

  • Bellman Equation

Bellman’s principle of optimality states that whatever the initial state and initial action may be, the remaining actions must constitute an optimal policy with regard to the state resulting from that initial state and action. Essentially,

$$V^{*}(s) = \max_{a}\Big[R(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^{*}(s')\Big]$$

This implies that the problem has an optimal substructure, since the optimal policy for the entire problem contains the optimal policies for its subproblems.

multi armed bandits

  • Multi-Armed Bandit (MAB) is a problem in which a fixed limited set of resources must be allocated between competing (alternative) choices in a way that maximizes their expected gain, when each choice’s properties are only partially known at the time of allocation, and may become better understood as time passes or by allocating resources to it. We wish to minimize cumulative regret and maximize cumulative reward.

  • The exploration-exploitation dilemma is the problem of balancing between exploiting what is already known and exploring what is not known. This is a problem in multi-armed bandits.

  • Greedy Policy
  • A greedy policy always takes the action with the highest estimated reward. It is the simplest policy to implement, and also the most naive: it is suboptimal because it ignores future rewards and keeps exploiting a single action.
  • Epsilon Greedy Policy
  • An epsilon-greedy policy takes a random action with probability epsilon and the greedy action with probability 1 - epsilon.
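A minimal sketch of an epsilon-greedy action picker (function and argument names are my own):

import numpy as np

def epsilon_greedy(q_values, epsilon=0.1, rng=None):
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: random arm
    return int(np.argmax(q_values))              # exploit: greedy arm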

  • Policy gradient methods are a class of reinforcement learning algorithms that can directly optimize the policy function used by the agent to select actions. They are an alternative to value-based methods such as Q-learning, which optimize a value function that estimates future rewards. Policy gradient methods are based on the idea that the policy should be updated in a direction that increases the expected total reward.

  • Softmax is a function that takes a vector of K real numbers and normalizes it into a probability distribution of K probabilities proportional to the exponentials of the inputs. It is used in the softmax policy.

  • UCB Algorithm
  • The UCB algorithm maintains a confidence bound for each action value. The bound gets tighter as the number of times the action is selected increases, and the algorithm stops selecting actions whose bound does not indicate a good value.
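A minimal sketch of UCB1-style action selection (names and the constant inside the square root are assumptions):

import numpy as np

def ucb_select(counts, values, t):
    counts = np.asarray(counts, dtype=float)
    values = np.asarray(values, dtype=float)
    if (counts == 0).any():
        return int(np.argmin(counts))         # try every arm at least once
    bonus = np.sqrt(2 * np.log(t) / counts)   # confidence bound, shrinks as counts grow
    return int(np.argmax(values + bonus))     # optimism in the face of uncertainty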

  • Hoeffding’s inequality is a result in probability theory that bounds the probability that the sum (or mean) of bounded independent random variables deviates from its expected value by more than a given amount. It is used in the UCB algorithm.

  • Derivation of the UCB algorithm is here.

  • Bayesian UCB (BUCB)
  • Bayesian bandits keep track of the model’s distribution, by modelling the reward distribution as a beta distribution.

  • Bayesian UCB is a variant of the UCB algorithm that uses a beta distribution to model the reward distribution.

networking

  • netstat -i lists the network interfaces and their statistics
  • ip a or ip addr shows the IP addresses assigned to each interface

activation functions

  • They are used to introduce non-linearity into the neural network. Most of them squash values into a bounded range (sigmoid into (0, 1), tanh into (-1, 1)); ReLU instead clips negative values to zero. The most common activation functions are sigmoid, tanh, relu, and softmax. Generally, tanh works better than sigmoid.

optimization

Optimisation is a sub-field of AI that focuses on finding the best solution (e.g., maximising or minimising a particular objective) for a given problem, often subject to constraints.

  • “Gradient descent” is fancy math wording for “if you want to find the bottom, go downhill”.
  • Gradient is a Math operator that takes in a bunch of numbers and spits out a vector that points in the direction the numbers increase fastest. That’s mathematically the “most uphill” direction. If you go opposite that, you’re going downhill, that’s gradient descent.
  • Run a bunch of versions of your algorithm, so you get a bunch of output numbers. Figure out the gradient of those output numbers. Then go the opposite direction and that will tend to be towards the minimum output value. Then do that over and over again until you can’t get the output number any smaller. You’ve found the “local minimum”, anywhere you go from there is “up”.
  • This may or may not be the “global minimum”, the best possible value you can find. This is analogous to standing in a valley and going downhill until you hit the stream at the bottom, but there’s a valley next door that’s even deeper. You’d have to have started in that other valley to find that minimum. Finding local minima is relatively easy, finding global minima (and proving they’re global) is difficult. Hence we find the local and, if it’s good enough (particularly common in machine learning), we call that good.
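A minimal numeric sketch of gradient descent on the made-up function f(x) = (x - 3)^2:

def grad(x):
    return 2 * (x - 3)        # derivative of (x - 3)^2

x, lr = 10.0, 0.1             # starting point and learning rate
for _ in range(100):
    x -= lr * grad(x)         # step opposite the gradient, i.e. downhill
print(x)                      # ends up near the (global) minimum at x = 3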

python

  • What is python good for?

I use Python for most of my programming tasks because of its simplicity and readability. It’s good for scripting, automation, data science, and machine learning.

My commonly used libraries are:

  • numpy

  • pandas

  • matplotlib

  • scikit-learn

  • tensorflow

  • keras

  • pytorch

  • How to run shell commands from Python?

os.system is an easy way to run shell commands from Python. It is available in the os module. It is not recommended for untrusted input, for security reasons.

import os
os.system("ls -l")

A better way to run shell commands is to use the subprocess module. It is more secure and allows you to capture the output of the command.

import subprocess
subprocess.run(["ls", "-l"])
  • How to install dependencies and ignore errors?
cat requirements.txt | xargs -n 1 pip install
  • How to list the installed packages in the current environment?
pip list
  • What are the different ways to install Python packages?

  • conda works great but has issues on some platforms

  • mamba is a reimplementation of conda and is reported to be noticeably faster

raspberry pi

  • I run DietPi on my Raspberry Pi 4. The GPIO pin layout is available at pinout.xyz. I use a 5v fan to cool the Pi along with a 5v 2A power supply. The ideal temperature is 40-50 degrees Celsius and the pi shuts off at 60 degrees Celsius to prevent damage to the CPU.

RNN

  • perceptron without recurrence: y(t) = f(x(t))

  • perceptron with recurrence (RNNs) with 1-step memory: y(t) = f(x(t), h(t-1))

  • 1 step memory isn’t enough for long term dependencies

  • Backpropagation through time

  • Exploding gradient problem

  • Vanishing gradient problem

  • Long Short Term Memory (LSTM)

  • developed by Hochreiter and Schmidhuber

  • type of RNN

  • solves vanishing gradient problem

  • has a memory cell

  • has 3 gates: input, output, forget

  • has a hidden state

  • has a cell state

  • uses sigmoid and tanh activation functions

  • outperforms RNNs and MLPs in many tasks

  • Limitations

  • Encoding issues

  • Slow, not parallel

  • Small memory

Building an autonomous mobile robot (AMR) from scratch

The robot is built using the following components:

  • Arduino Uno R3, for sensor interfacing
  • Nvidia Jetson Nano, for wifi connectivity and processing power
  • L298N Motor Driver, for controlling the motors
  • 4x DC Motors + 4x Wheels, for locomotion
  • Ultrasonic Sensor, for obstacle detection
  • 10000 mAh Power Bank, for power supply
  • Acrylic Sheet, for the chassis
  • Jumper Wires, for connecting the components
  • Capacitor, for preventing voltage spikes
  • Interfacing between Arduino and Jetson Nano

The goal is to send sensor data from the Arduino to the Jetson Nano, and to send motor control signals from the Jetson Nano back to the Arduino. This is done using the serial communication protocol.
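A minimal sketch of the Jetson-side code (assuming pyserial, that the Arduino enumerates as /dev/ttyUSB0 at 9600 baud, and that it prints one ultrasonic reading per line; the FWD command is hypothetical):

import serial

with serial.Serial("/dev/ttyUSB0", 9600, timeout=1) as port:
    distance = port.readline().decode().strip()  # one sensor reading from the Arduino
    print(f"obstacle distance: {distance} cm")
    port.write(b"FWD\n")                         # send a motor command back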

  • sustainable transportation

The goal of sustainable transportation is to reduce the environmental footprint of transportation systems.

Key components of sustainable transportation include:

  1. Reduce air pollution, greenhouse gas emission, and habitat disruption.
  2. Reduce dependency on fossil fuels.
  3. Alleviate traffic congestion to lower emissions.
  4. Use technology to predict the traffic flow which can help in re-routing and minimising the resource consumption.
  5. Design economically sustainable transportation initiatives, such as autonomous mobile robots, that are environmentally friendly logistics and mobility solutions.

Our work at the Intelligent Computation and Complex Networks (ICCN) Lab has focused on the same.

Datasets:

  1. SIND: A Drone Dataset at Signalized Intersection in China
  2. CitySim: A Drone-Based Vehicle Trajectory Dataset for Safety Oriented Research and Digital Twins

reinforcement learning

  • What is Reinforcement Learning?

Reinforcement Learning is a set of methods that “learn how to behave (optimally)” in an environment, whereas MDP is a formal representation of such environment. In practice, environments are usually partially observable and are called partially observable Markov decision processes (POMDPs).

RL can be used for optimization problems only when the environment is Markovian.


  • The Markov property states that the probability of the next state depends only on the current state and the action taken, and not on the sequence of events that preceded it. This property was introduced by Andrey Markov in 1906.
  • Markov Decision Process (MDP) is a mathematical framework for modeling decision making in situations where outcomes follow Markovian property. If the problem can be well described as a MDP, then reinforcement learning (RL) may be a good framework to use to find solutions. Conversely, if the problem cannot be mapped onto a MDP, then the theory behind RL makes no guarantees of any useful result.

  • A POMDP tuple is defined as:

    • $S$ is the set of states
    • $A$ is the set of actions
    • $T$ is the transition function
    • $R$ is the reward function
    • $\gamma$ is the discount factor
  • In practice, POMDPs are computationally intractable. Therefore, we use RL to approximate the solution.

  • Every reinforcement learning (RL) problem consists of:

    • State space
    • Action set
    • Reward function
    • Discount factor
  • The transition function is not explicitly given. It is learned by the agent through interaction with the environment.

  • History in RL is a sequence of states and actions.

:::tip Any goal can be formalized as the objective of maximizing a cumulative reward, provided the reward function accurately reflects the goal. :::

:::caution There’s no supervision or labelled data in RL; rewards and values define which actions to take in each state. :::

  • A mapping from states to actions is called a policy.

  • Policies can be stochastic, meaning that they assign probabilities to each action.

  • Value function has a discount factor, which is a number between 0 and 1. It is used to discount future rewards and to balance the importance of immediate rewards against future rewards. A discount factor of 0 means that only immediate rewards are considered. A discount factor of 1 means that all future rewards are considered equally. This leads to the Bellman equation.

  • Therefore, the value function is the expected (discounted) reward from the current state.

  • The value function contains the model that will predict the next action.

  • Use of Neural Networks in RL

  • When the state space is big, we can use a function approximator to approximate the value function. This is called function approximation, and it allows us to use Deep Learning to approximate the value function (a neural network approximator). A game of Go has roughly $10^{170}$ possible states, and a moving helicopter has a continuous state space, hence we need to use a function approximator.

  • RL experience is not i.i.d. because the next state depends on the current state and the action taken.

  • The agent policy affects the data it receives. This is due to the active nature of RL.

  • SOTA keywords

  • Dueling Deep Q Networks (DDQN)

  • Double Deep Q Networks (DDQN)

  • Deep Q Networks (DQN)

  • Policy Gradient (PG)

  • Actor Critic (AC)

  • Advantage Actor Critic (A2C)

  • Asynchronous Advantage Actor Critic (A3C)

  • Proximal Policy Optimization (PPO)

  • Trust Region Policy Optimization (TRPO)

:::tip Atari Enduro requires a Dueling Deep Q Network (DDQN) since the next state is non-deterministic. :::

The traditional Q-learning model does not scale, since it requires access to an explicit Q-table. Instead, we can use a Neural Network to approximate the Q-table. This is called Deep Q-Learning. The network takes in the state as input and outputs the Q-values for each action, i.e. it estimates how good each possible action is from the state provided.
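A minimal sketch of such a Q-network (state_dim and n_actions are placeholders for whatever environment is used):

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, state_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return self.net(state)              # one Q-value estimate per action

q = QNetwork()
state = torch.randn(1, 4)                   # a fake observation
action = q(state).argmax(dim=1)             # greedy action from the approximated Q-values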

  • History

A relatively newer algorithm is Asynchronous Advantage Actor Critic (A3C).

  • This algorithm is an extension of the Actor-Critic algorithm.
  • The Actor-Critic algorithm is an extension of the Policy Gradient algorithm.
  • The Policy Gradient algorithm is an extension of the REINFORCE algorithm.
  • The REINFORCE algorithm is an extension of the Monte Carlo algorithm.
  • The Monte Carlo algorithm is an extension of the Dynamic Programming algorithm.
  • The Dynamic Programming algorithm is an extension of the Bellman Equation.
  • According to the Bellman Equation, the long-term reward for a given action is equal to the reward from the current action combined with the expected reward from the future actions taken at the following time steps.

The Bellman Equation is given by:

$$Q(s, a) = r(s, a) + \gamma \, \mathbb{E}_{s'}\big[\max_{a'} Q(s', a')\big]$$

  • Applications

Intelligent Traffic Control Systems is an application of Reinforcement Learning. The goal is to maximize the throughput of the traffic system. SUMO is a popular simulator for ITCS.

Cognitive radio networks are another application of Reinforcement Learning. The goal is to maximize the throughput of the network while minimizing the interference. It’s an application of multi-armed bandits.

transformers

  • Sequence modelling: given input and output sequences, find weights $\theta$ that can predict the output tokens from the input tokens, i.e. maximise

$$P(Y \mid X; \theta)$$

where $Y$ is the sequence of output tokens and $X$ is the sequence of input tokens.

  • temporal relations

  • attention

  • Transformers were initially used for NLP with the goal of replacing RNNs in seq2seq models (e.g. machine translation). The input tokens and output tokens belonged to distinct vocabularies, and the models were trained to maximize the likelihood of the output sequence given the input sequence.

  • The main advantage of Transformers is that they are very good at capturing long-range dependencies. This is because the attention mechanism allows each output token to attend to all input tokens. In contrast, RNNs can only attend to the previous tokens in the sequence.
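A minimal sketch of scaled dot-product attention, the mechanism that lets every output token attend to every input token (dimensions are made up):

import torch
import torch.nn.functional as F

def attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # similarity of each output to each input
    weights = F.softmax(scores, dim=-1)                     # attention distribution per output token
    return weights @ v                                      # weighted sum of the values

q = torch.randn(5, 16)    # 5 output tokens
k = torch.randn(9, 16)    # 9 input tokens
v = torch.randn(9, 16)
out = attention(q, k, v)  # each output mixes information from all 9 inputs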

Types of transformers:

  • Encoder-Decoder Transformer This is the original transformer architecture.

  • Decoder-only Transformer Also called Autoregressive Transformer

  • Decision Transformer Here, the long-range dependencies are between the tokens in the past time steps and the current token. The tokens in the future time steps are not relevant for the prediction task.

For discrete action spaces, the output layer is a softmax layer with the error being the cross-entropy loss. For continuous action spaces, the output layer is a linear layer with the error being the mean squared error.

There is no positional encoding of sine and cosine functions in the input layer.

web

Frameworks: