hey, I’m Ashok.
28.03.24
- overleaf docker container: github link.
- texstudio also works well: `sudo apt install texstudio`
27.03.24
- trajectory stitching involves piecing together parts of different trajectories.
- it helps offline rl match the performance of online rl.
- sub-optimal trajectories can be stitched into a trajectory that performs better than any of them.
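A toy sketch of the idea (the `stitch` helper and data shapes are made up for illustration, not from any rl library): if two sub-optimal trajectories pass through a common state, the good prefix of one can be joined to the good suffix of the other:

```python
def stitch(traj_a, traj_b):
    """Stitch two trajectories at the first state they share.

    Each trajectory is a list of (state, reward) steps; returns traj_a's
    prefix up to the shared state followed by traj_b's suffix from it.
    """
    positions_b = {state: i for i, (state, _) in enumerate(traj_b)}
    for i, (state, _) in enumerate(traj_a):
        if state in positions_b:
            return traj_a[:i] + traj_b[positions_b[state]:]
    return traj_a  # no shared state: nothing to stitch

# two sub-optimal trajectories that cross at state "s2"
a = [("s0", 1), ("s2", 0), ("s4", 0)]
b = [("s1", 0), ("s2", 0), ("s5", 5)]
print(stitch(a, b))  # [("s0", 1), ("s2", 0), ("s5", 5)]
```

Offline rl methods don't do this splicing literally; value-based learning achieves it implicitly by backing up returns across the shared state.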
26.03.24
- in latex, `\include{}` adds a new page; use `\input{}` instead.
- embedded firmware just means the arduino code.
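A minimal sketch of the `\include` vs `\input` difference (the file name `chapter1.tex` is hypothetical):

```latex
% main.tex (hypothetical)
\documentclass{article}
\begin{document}
% \include{chapter1} would start the content on a new page;
% \input{chapter1} pastes chapter1.tex inline with no page break.
\input{chapter1}
\end{document}
```

`\input` can also be nested inside another `\input`ed file, while `\include` cannot be nested.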
24.03.24
- rpi pico supports micropython and it's only $6. ¯\_(ツ)_/¯
- it's also dual-core, so it can multitask.
- the `simpla` package in SUMO does not work with `libsumo`.
23.03.24
- `STM32F103RB` has 128KB flash and a 72MHz clock speed. It was about $14.
- micropython requires a minimum of 256KB flash.
- micro:bit v2 has 512KB flash, 128KB RAM, and a 64MHz clock speed. It has an nRF52 chip.
- micro:bit can be programmed using micropython.
22.03.24
- db9 is a serial port connector. db15 is a vga connector. T_T
21.03.24
- pdf on how to use rplidar on windows to scan the environment.
20.03.24
- nvim config is stored at `~/.config/nvim/init.vim`.
- minimal vim/nvim config:

```vim
syntax on
set tabstop=4
set shiftwidth=4
set expandtab
set autoindent
set number
set ruler
```
- for the error `AttributeError: module 'tensorflow_probability' has no attribute 'substrates'`, use `import tensorflow_probability.substrates.jax as tfp`.
- Parse local TensorBoard data into a pandas DataFrame.
19.03.24
- MAE loss is less sensitive to outliers.
- MSE loss penalises large errors.
- MAE is not differentiable at zero, whereas huber loss is better because it's differentiable everywhere.
- images: mae vs mse vs huber; huber at different values of δ can become MSE-like or MAE-like.
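A pure-python sketch of the three losses on a single error `e` (`delta` is Huber's threshold; the function names are mine):

```python
def mae(e):
    return abs(e)

def mse(e):
    return e * e

def huber(e, delta=1.0):
    # quadratic (MSE-like) for |e| <= delta, linear (MAE-like) beyond it
    if abs(e) <= delta:
        return 0.5 * e * e
    return delta * (abs(e) - 0.5 * delta)

# small error: huber behaves like half of MSE; large error: grows linearly like MAE
print(huber(0.5), mse(0.5))    # 0.125 0.25
print(huber(10.0), mae(10.0))  # 9.5 10.0
```

Below `delta` Huber is quadratic, above it linear: that's what makes it robust to outliers yet differentiable everywhere.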
- in vim, switch between splits with `Ctrl-W` + `[hjkl]`, and reload the current file using `:e`.
- ai inference hardware is getting better. tenstorrent sells the e150 for 75k inr (shipping included).
- quantization reduces the size of the model and makes it less memory hungry.
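A minimal sketch of affine (asymmetric) quantization, the scheme behind int8 model compression; the helper names are mine:

```python
def quantize(xs, num_bits=8):
    # map floats in [min, max] onto integer codes in [0, 2^num_bits - 1]
    qmax = 2 ** num_bits - 1
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / qmax or 1.0  # avoid zero scale when all values are equal
    zero_point = round(-lo / scale)  # the integer code that represents 0.0
    codes = [round(x / scale) + zero_point for x in xs]
    return codes, scale, zero_point

def dequantize(codes, scale, zero_point):
    return [(c - zero_point) * scale for c in codes]

xs = [-1.0, 0.0, 0.5, 2.0]
codes, scale, zp = quantize(xs)
print(codes)                        # 8-bit codes: 1 byte each instead of 4 for float32
print(dequantize(codes, scale, zp)) # close to xs, up to small rounding error
```

Each weight now costs 1 byte instead of 4, which is where the memory savings come from; the price is a small rounding error per weight.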
18.03.24
- rpi pins max output is 3.3v.
- how to monitor the rpi temperature?
- is gpio cleanup necessary?
16.03.24
- gpio pin layout is actually this way: (image)
- 5v to 3.3v converter: HW-122 (AMS1117-3.3).
- the converter can be used for rpi-to-arduino serial communication.
15.03.24
- ring attention is useful for increasing the context size.
- `miniforge` works better on raspberry pi.
- pinout.xyz for pin layout.
13.03.24
- UART is a serial communication protocol.
- Enabling serial on RPi 4: `sudo raspi-config` > Interfacing Options > Serial > No (to the login shell) > Yes (to the serial hardware), then reboot.
- GPIO connections:
  - TX of RPi to RX of USB-to-TTL
  - RX of RPi to TX of USB-to-TTL
  - GND of RPi to GND of USB-to-TTL
- `minicom` can be used to access the serial console of RPi (`sudo apt install minicom`).
- `minicom -b 115200 -o -D /dev/ttyUSB0` starts minicom with baud rate 115200 and device `/dev/ttyUSB0`.
- disable hardware flow control in minicom using `Ctrl+A` > `O` > `Serial port setup` > `F` > `No`.
12.03.24
- the notes belong to different categories; can I use an LLM to classify them without any labels? Each bullet point is a note and the category is the label.
- the categories could be:
  - Embedded
  - ML
  - GPU/Infra
  - Programming
  - Latex
  - Unlabelled
11.03.24
- to reduce matplotlib xticks:

```python
num_xticks = 5  # number of x-ticks to show
step = len(time_steps) // num_xticks
plt.xticks(time_steps[::step], rotation=45, fontsize=15)  # show only selected time steps
```
- usb-c power delivery (pd) can deliver variable voltage and current using software negotiation.
- a power delivery trigger board can be used to negotiate power delivery and get a fixed voltage and current.
- `\usepackage{graphicx}` and `\usepackage{subcaption}` for subfigures in latex.
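A sketch of how the two packages fit together (the image file names are placeholders):

```latex
% preamble
\usepackage{graphicx}
\usepackage{subcaption}

% two side-by-side subfigures
\begin{figure}
  \begin{subfigure}{0.48\textwidth}
    \includegraphics[width=\linewidth]{left.png}
    \caption{first plot}
  \end{subfigure}
  \hfill
  \begin{subfigure}{0.48\textwidth}
    \includegraphics[width=\linewidth]{right.png}
    \caption{second plot}
  \end{subfigure}
  \caption{both plots side by side}
\end{figure}
```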
10.03.24
- how to flash a blank `stm32f030f4p6` chip?
- blinking an led is the hello world of embedded systems.
- today’s commit deletes the old format files.
- `nvidia-driver-520` is compatible with `cuda-11.8`; `nvidia-driver-495` is compatible with `cuda-11.5`.
- to switch the display driver from `nvidia` to `intel`, use `nvidia-prime`:

```bash
sudo apt install nvidia-prime
sudo prime-select intel
```
- install cuda 11.8:

```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run
```

and update the path using:

```bash
export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```
- when building cuda libraries using ninja, if you get an error like:

```
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
```

then install `gcc-10` and `g++-10` and make them the default:

```bash
sudo apt install gcc-10 g++-10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 10
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 10
```

(environment: Ubuntu 22.04.1 LTS, cuda compilation tools release 11.8 V11.8.89, gcc 11.3.0, g++ 11.3.0)
- `~/.bash_aliases` is a file to store bash aliases and exports such as `export PATH` and `export LD_LIBRARY_PATH`.
- to install pytorch with cuda support:

```bash
conda install pytorch=*=*cuda* cudatoolkit -c pytorch
```
09.03.24
- there are no desktop ARM processors.
- a usb-to-ttl converter (`pl2303hx`) can be used to access the serial console of a raspberry pi.
- ssh gives a virtual console whereas the serial console gives a physical console.
- the serial console doesn't require wifi or hdmi.
- `arm` is also `risc`.
08.03.24
- embedded languages: c, c++, rust
- rust can run bare metal on raspberry pi using the `no_std` and `no_main` crate-level attributes.
- bare metal means running code without an operating system.
07.03.24
- lora is half-duplex by default; it cannot send and receive at the same time.
- analog pins on arduino can be used as digital pins too.
- arduino D0 and D1 pins although set aside for TX and RX can also be used as digital pins.
05.03.24
- nvidia display driver is different from nvidia cuda driver.
- the cuda version shown in `nvidia-smi` is the highest version the driver supports, not the installed version; `nvcc --version` gives the installed cuda version.
04.03.24
- the neo6m gps module receives signals from gps satellites and gives the location in NMEA format.
- it has a cold start time of 27s and a hot start time of 1s. on my desk, it took 2-5 minutes to get a fix.
- once fixed, it saves it to the eeprom and can be retrieved on the next boot.
- the eeprom battery is a coin cell.
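A sketch of pulling latitude/longitude out of an NMEA `$GPGGA` sentence (the sample sentence is a standard textbook illustration, not real output from my module):

```python
def parse_gpgga(sentence):
    # $GPGGA,time,lat,N/S,lon,E/W,fix,...  lat is ddmm.mmmm, lon is dddmm.mmmm
    fields = sentence.split(",")
    lat = float(fields[2][:2]) + float(fields[2][2:]) / 60.0
    if fields[3] == "S":
        lat = -lat
    lon = float(fields[4][:3]) + float(fields[4][3:]) / 60.0
    if fields[5] == "W":
        lon = -lon
    return lat, lon

print(parse_gpgga("$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"))
# ≈ (48.1173, 11.5167)
```

A real parser should also check the `*47`-style checksum and the fix-quality field before trusting the position.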
03.03.24
- `einsum` is cool. It uses the Einstein summation convention to perform matrix operations. `torch.einsum('ij,jk->ik', a, b)` is equivalent to `torch.matmul(a, b)`.
- its drawbacks are that it's not optimized on gpu (yet) and it doesn't allow brackets in the expression.
```python
>>> a = torch.rand(3, 5)
>>> a
tensor([[0.7912, 0.6213, 0.6479, 0.2060, 0.9857],
        [0.9950, 0.7826, 0.6850, 0.6712, 0.0524],
        [0.4367, 0.8872, 0.9622, 0.0159, 0.4960]])
>>> b = torch.rand(5, 3)
>>> b
tensor([[0.4560, 0.9680, 0.1179],
        [0.9072, 0.8982, 0.2926],
        [0.5526, 0.2779, 0.5810],
        [0.4366, 0.8061, 0.0065],
        [0.4744, 0.6915, 0.5326]])
>>> torch.einsum('ij,jk -> ik', a, b)
tensor([[1.8401, 2.3517, 1.1779],
        [1.8601, 2.4338, 0.7766],
        [1.7780, 1.8429, 1.1344]])
>>> torch.matmul(a, b)
tensor([[1.8401, 2.3517, 1.1779],
        [1.8601, 2.4338, 0.7766],
        [1.7780, 1.8429, 1.1344]])
```
- `stm32f030f4p6`, as per the naming convention, means:
  - `stm32`: family of microcontrollers
  - `f`: series = general purpose
  - `0`: core = ARM Cortex-M0
  - `30`: line number
  - `f`: pin count = 20
  - `4`: flash size = 16KB
  - `p`: package type = TSSOP
  - `6`: temperature range = -40 to 85 °C
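The decoding above as a toy parser (field meanings are copied from the note, not a full ST catalogue):

```python
def decode_stm32(part):
    # toy decoder for names shaped like stm32f030f4p6:
    # family + series + core + line + pin-count + flash + package + temperature
    assert part.startswith("stm32")
    rest = part[5:]  # e.g. "f030f4p6"
    return {
        "series": rest[0],       # 'f' = general purpose
        "core": rest[1],         # '0' = ARM Cortex-M0
        "line": rest[2:4],       # '30'
        "pins": rest[4],         # 'f' = 20 pins
        "flash": rest[5],        # '4' = 16KB
        "package": rest[6],      # 'p' = TSSOP
        "temperature": rest[7],  # '6' = -40 to 85 °C
    }

print(decode_stm32("stm32f030f4p6"))
```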
02.03.24
- The `stm32f030f4p6` chip is SMD, in a TSSOP-20 footprint.
- I also bought SMD-to-THT adapters, which are called breakout boards, and soldered the chip to one.
- STM32 nucleo boards come with a built-in st-link programmer and debugger.
- images: `stm32f030f4p6` soldered onto a breakout board; `stm32f030f4p6` with rpi v4 for scale.
01.03.24
- v100s has 5120 cuda cores and 640 tensor cores
- quadro rtx 5000 has 3072 cuda cores and 384 tensor cores
- tensor cores are more important for deep learning than cuda cores
- installing miniconda:

```bash
# install miniconda
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
source ~/.bashrc
```
- installing nvidia gpu drivers:

```bash
# install nvidia drivers
sudo apt update && sudo apt upgrade
sudo apt autoremove nvidia* --purge
ubuntu-drivers devices
sudo apt install nvidia-driver-525
sudo reboot
nvidia-smi

# install pytorch with cuda support
pip install torch torchvision torchaudio
```
- ICs come in different packages: DIP, SOP, QFP, TQFP
29.02.24
- softmax suffers from numerical instability due to floating point precision error:

```python
>>> import torch
>>> m = torch.nn.Softmax(dim=1)
>>> a = torch.tensor([[ 0.4981e3, 0.5018, -0.7310]])
>>> m(a)
tensor([[1., 0., 0.]])
```

- normalization is a way to solve numerical instability:

```python
>>> torch.nn.functional.normalize(a)
tensor([[ 1.0000,  0.0010, -0.0015]])
>>> m(torch.nn.functional.normalize(a))
tensor([[0.5762, 0.2122, 0.2117]])
```
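The usual fix is to subtract the max before exponentiating: softmax is shift-invariant, so the output is identical, whereas `normalize` rescales the inputs and therefore changes the resulting distribution. A pure-python sketch (function names are mine, not from torch):

```python
import math

def naive_softmax(xs):
    # overflows for large inputs: math.exp(1000) raises OverflowError in float64
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(xs):
    # subtracting max(xs) leaves the output unchanged but keeps exp() in range
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(stable_softmax([1000.0, 0.5018, -0.7310]))  # ~[1.0, 0.0, 0.0], no overflow
```

float32 `exp` already overflows around x ≈ 88, which is why the torch example above saturates to `[1., 0., 0.]`.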
28.02.24
- color sensors (TCS34725, TCS3200) can detect intensity of R,G,B individually
- because of open source, risc v is cheaper than arm and runs linux too
- microcontroller (arduino, stm32) vs single board computer (raspberry pi, beaglebone)
- many models perform better when the input data is (approximately) gaussian, e.g. after standardization.
27.02.24
- the `warmup_step` hyperparameter keeps the learning rate low for the first few steps and then ramps it up.
- transformer = encoder + decoder + attention
- `K` is the context window size in the attention mechanism: the number of tokens that each token attends to.
- attention in transformers has quadratic time complexity.
- flash attention still does quadratic compute but uses linear memory; it is much faster in practice due to better IO.
- An Attention Free Transformer has linear time complexity.
- `wandb` can also be self-hosted inside a docker container.
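The warmup idea above as a minimal sketch (linear warmup to a constant rate; `base_lr` and `warmup_steps` are illustrative names):

```python
def lr_at_step(step, base_lr=1e-3, warmup_steps=100):
    # ramp linearly from ~0 up to base_lr over the first warmup_steps, then hold
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# the first steps use a tiny learning rate, which stabilises early training
print([round(lr_at_step(s), 6) for s in (0, 49, 99, 500)])  # [1e-05, 0.0005, 0.001, 0.001]
```

Real schedules usually decay after warmup (cosine, inverse-sqrt); this only shows the warmup part.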
26.02.24
- cpu architectures: x86, x86_64, arm, arm64, risc-v
- famous arm dev board: stm32
- risc-v is open source and is gaining popularity
- LuckFox Pico Plus RV1103 is a risc-v dev board with ethernet and can run linux
- softmax not summing to 1 T_T
- how to make LoRa full duplex?
25.02.24
- rl implementations: stable-baselines3
- cleanrl has single file implementations of rl algorithms
- tianshou is a pytorch based rl library
- Through Hole Technology (THT) vs Surface Mount Technology (SMT)
24.02.24
- Found this Machine Learning Theory Notes GDrive