
hey, I’m Ashok.

Email / CV / Google Scholar / GitHub / LinkedIn

28.03.24

  • overleaf docker container: github link.
  • texstudio also works well, sudo apt install texstudio.

27.03.24

  • trajectory stitching involves piecing together parts of the different trajectories.
  • it helps offline rl match the performance of online rl.
  • sub-optimal trajectories can be stitched together into a trajectory that outperforms any single one of them.

26.03.24

  • in latex, \include{} adds a new page, instead use \input{}.
  • embedded firmware just means the arduino code.

24.03.24

  • rpi pico supports micropython and it’s only $6. ¯\_(ツ)_/¯
  • it’s also dual-core, so it can multi-task.
  • simpla package in SUMO does not work with libsumo.

23.03.24

  • STM32F103RB has 128KB flash and 72MHz clock speed. It was about $14.
  • micropython requires a minimum of 256KB flash.
  • micro:bit v2 has 512KB flash, 128KB RAM, and 64MHz clock speed. It has nRF52 chip.
  • micro:bit can be programmed using micropython.

22.03.24

  • db9 is a serial port connector. db15 is a vga connector. T_T

21.03.24

  • pdf on how to use rplidar on windows to scan the environment.

20.03.24

  • nvim config is stored at ~/.config/nvim/init.vim.
  • minimal vim/nvim config:
syntax on
set tabstop=4
set shiftwidth=4
set expandtab
set autoindent
set number
set ruler

19.03.24

  • MAE loss is less sensitive to outliers.

  • MSE loss penalises large errors.

  • MAE is not differentiable at zero, whereas huber loss is differentiable everywhere, which makes it the better-behaved middle ground.

  • images:

    1. mae vs mse vs huber
    2. huber at different values of δ can become MSE or MAE.
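the relationship between the three losses is easy to see by evaluating them on the same residuals. a minimal pure-python sketch (the δ of 1.0 below is an arbitrary choice; this uses the standard huber definition with the ½ factor):

```python
def mse(r):
    # mean squared error contribution of a single residual
    return r * r

def mae(r):
    # mean absolute error contribution of a single residual
    return abs(r)

def huber(r, delta=1.0):
    # quadratic (like MSE) for |r| <= delta, linear (like MAE) beyond it
    if abs(r) <= delta:
        return 0.5 * r * r
    return delta * (abs(r) - delta)

for r in [0.1, 0.5, 2.0, 10.0]:
    print(f"r={r:5.1f}  mse={mse(r):7.2f}  mae={mae(r):5.2f}  huber={huber(r):6.2f}")
```

for the outlier r=10, mse blows up while huber grows only linearly, like mae.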
  • in vim, switch between splits: Ctrl-W + [hjkl].

  • and reload the current file using :e.

  • ai inference hardware is getting better. tenstorrent sells e150 for 75k inr (shipping included).

  • quantization reduces the size of the model and makes it less memory hungry.

18.03.24

  • rpi pins max output is 3.3v.
  • how to monitor the rpi temperature?
  • is gpio cleanup necessary?

16.03.24

  • gpio pin layout is actually this way (image: rpi v4 gpio pinout).
  • 5v to 3.3v converter: HW-122 (AMS1117-3.3).

  • the converter can be used for rpi to arduino serial communication.

15.03.24

  • ring attention is useful for increasing the context size.
  • miniforge works better on raspberry pi.
  • pinout.xyz for pin layout.

13.03.24

  • UART is a serial communication protocol.
  • Enabling serial on RPi 4:
    • sudo raspi-config
    • Interfacing Options > Serial > No > Yes
    • Reboot
  • GPIO connections:
    • TX of RPi to RX of USB to TTL
    • RX of RPi to TX of USB to TTL
    • GND of RPi to GND of USB to TTL
  • minicom can be used to access the serial console of RPi. (sudo apt install minicom)
  • minicom -b 115200 -o -D /dev/ttyUSB0 to start minicom with baud rate 115200 and device /dev/ttyUSB0
  • disable hardware flow control in minicom using Ctrl+A > O > Serial port setup > F > No

12.03.24

  • the notes belong to different categories. can I use an LLM to classify them without any labels? each bullet point is a note and the category is the label.
  • the categories could be:
    1. Embedded
    2. ML
    3. GPU/Infra
    4. Programming
    5. Latex
    6. Unlabelled
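one label-free approach is zero-shot prompting: list the categories in the prompt and let the model pick one. a sketch of the idea (the llm callable here is a hypothetical stand-in for whatever API gets used; a stub is shown for demonstration):

```python
CATEGORIES = ["Embedded", "ML", "GPU/Infra", "Programming", "Latex", "Unlabelled"]

def build_prompt(note: str) -> str:
    # ask the model to answer with exactly one category name
    options = ", ".join(CATEGORIES)
    return (
        f"Classify the note into exactly one of: {options}.\n"
        f"Note: {note}\n"
        "Answer with the category name only."
    )

def classify(note: str, llm) -> str:
    # llm is any callable prompt -> text; fall back to Unlabelled on junk output
    answer = llm(build_prompt(note)).strip()
    return answer if answer in CATEGORIES else "Unlabelled"

# stub llm for demonstration only
print(classify("in latex, \\include{} adds a new page, use \\input{}", lambda p: "Latex"))
```

the fallback matters: anything the model returns outside the fixed label set gets bucketed as Unlabelled instead of polluting the categories.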

11.03.24

  • to reduce matplotlib xticks:
num_xticks = 5  # number of x-ticks to show
step = max(1, len(time_steps) // num_xticks)  # guard against lists shorter than num_xticks
plt.xticks(time_steps[::step], rotation=45, fontsize=15)  # show only every step-th time step
  • usb-c power delivery (pd) can deliver variable voltage and current using software negotiation.

  • power delivery trigger board can be used to negotiate power delivery and get a fixed voltage and current.

  • \usepackage{graphicx} and \usepackage{subcaption} for subfigures in latex.
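a minimal subfigure skeleton using those two packages (the filenames and widths are placeholders):

```latex
\usepackage{graphicx}
\usepackage{subcaption}

\begin{figure}
  \begin{subfigure}{0.48\textwidth}
    \includegraphics[width=\linewidth]{fig-a}
    \caption{first}
  \end{subfigure}
  \hfill
  \begin{subfigure}{0.48\textwidth}
    \includegraphics[width=\linewidth]{fig-b}
    \caption{second}
  \end{subfigure}
  \caption{two subfigures side by side}
\end{figure}
```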

10.03.24

  • how to flash a blank stm32f030f4p6 chip?
  • blinking led is the hello world of embedded systems
  • today’s commit deletes the old format files.

  • nvidia-driver-520 is compatible with cuda-11.8.
  • nvidia-driver-495 is compatible with cuda-11.5.
  • to switch display driver from nvidia to intel, use nvidia-prime:
sudo apt install nvidia-prime
sudo prime-select intel
  • install cuda 11.8:
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run

and update path using:

export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
  • when building cuda libraries using ninja if you get an error:
/usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
  435 |         function(_Functor&& __f)
      |                                                                                                                                                 ^
/usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
  530 |         operator=(_Functor&& __f)
      |                                                                                                                                                  ^
/usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’

then install gcc-10 and g++-10:

sudo apt install gcc-10 g++-10
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-10 10
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-10 10

and verify the versions:

Ubuntu 22.04.1 LTS
Cuda compilation tools, release 11.8, V11.8.89
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
g++ (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0
  • ~/.bash_aliases is a file for bash aliases; it’s also a handy place to keep exports like PATH and LD_LIBRARY_PATH.

  • To install pytorch with cuda support:

conda install pytorch=*=*cuda* cudatoolkit -c pytorch

09.03.24

  • there are no mainstream desktop ARM processors (apple silicon aside).
  • a usb to ttl converter pl2303hx can be used to access the serial console of a raspberry pi.
  • ssh gives virtual console whereas serial console gives physical console.
  • serial console doesn’t require wifi or hdmi.
  • arm is also risc.

08.03.24

  • embedded languages: c, c++, rust
  • rust can run bare metal on raspberry pi using no_std and no_main crate-level attributes
  • bare metal can be used to run code without an operating system

07.03.24

  • lora modules are half-duplex by default: a single radio can’t send and receive at the same time.
  • analog pins on arduino can be used as digital pins too.
  • arduino D0 and D1 pins although set aside for TX and RX can also be used as digital pins.

05.03.24

  • nvidia display driver is different from nvidia cuda driver.
  • the cuda version in nvidia-smi is the driver’s maximum supported version, not the installed toolkit version.
  • nvcc --version gives the installed cuda version.

04.03.24

  • neo6m gps module receives signals from gps satellites and gives the location in NMEA format.
  • it has a cold start time of 27s and a hot start time of 1s. on my desk, it took 2-5 minutes to get a fix.
  • once fixed, it saves the fix data to the eeprom, from which it can be retrieved on the next boot.
  • the eeprom backup battery is a coin cell.
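NMEA sentences are plain comma-separated text, so parsing one is straightforward. a minimal $GPGGA parser sketch (the sample sentence is a commonly used illustrative one, not output from my module):

```python
def parse_gpgga(sentence: str):
    # $GPGGA,time,lat,N/S,lon,E/W,fix,satellites,...
    fields = sentence.split(",")
    if not fields[0].endswith("GGA"):
        raise ValueError("not a GGA sentence")

    def to_degrees(value: str, hemisphere: str, deg_digits: int) -> float:
        # NMEA packs coordinates as (d)ddmm.mmmm: degrees then decimal minutes
        degrees = float(value[:deg_digits])
        minutes = float(value[deg_digits:])
        decimal = degrees + minutes / 60.0
        return -decimal if hemisphere in ("S", "W") else decimal

    return {
        "time_utc": fields[1],
        "lat": to_degrees(fields[2], fields[3], 2),   # latitude uses 2 degree digits
        "lon": to_degrees(fields[4], fields[5], 3),   # longitude uses 3
        "fix": int(fields[6]),
        "satellites": int(fields[7]),
    }

# illustrative sentence, not real module output
print(parse_gpgga("$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,"))
```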

03.03.24

  • einsum is cool. It uses the Einstein summation convention to perform tensor operations.
  • torch.einsum('ij,jk->ik', a, b) is equivalent to torch.matmul(a, b)
  • its drawbacks are that it’s not optimized on gpu (yet), and it doesn’t allow brackets in the expression.
>>> a = torch.rand(3, 5)
>>> a
tensor([[0.7912, 0.6213, 0.6479, 0.2060, 0.9857],
        [0.9950, 0.7826, 0.6850, 0.6712, 0.0524],
        [0.4367, 0.8872, 0.9622, 0.0159, 0.4960]])
>>> b = torch.rand(5, 3)
>>> b
tensor([[0.4560, 0.9680, 0.1179],
        [0.9072, 0.8982, 0.2926],
        [0.5526, 0.2779, 0.5810],
        [0.4366, 0.8061, 0.0065],
        [0.4744, 0.6915, 0.5326]])
>>> torch.einsum('ij,jk -> ik', a,b)
tensor([[1.8401, 2.3517, 1.1779],
        [1.8601, 2.4338, 0.7766],
        [1.7780, 1.8429, 1.1344]])
>>> torch.matmul(a, b)
tensor([[1.8401, 2.3517, 1.1779],
        [1.8601, 2.4338, 0.7766],
        [1.7780, 1.8429, 1.1344]])
  • stm32f030f4p6 as per the naming convention means:
    • stm32 is the family of microcontrollers
    • f is the series = General purpose
    • 0 is the core = ARM Cortex-M0
    • 30 is the line number
    • f is the pin count = 20
    • 4 is the flash size = 16KB
    • p is the package type = TSSOP
    • 6 is the temperature range = -40 to 85 degree celsius
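the decode above can be mechanised. a toy decoder for this one part-number shape (the lookup tables only cover the field values seen here, so this is a sketch, not a general STM32 decoder):

```python
def decode_stm32(part: str) -> dict:
    # works for names shaped like stm32f030f4p6
    p = part.lower()
    assert p.startswith("stm32")
    body = p[len("stm32"):]          # e.g. "f030f4p6"
    return {
        "family": "stm32",
        "series": {"f": "general purpose"}[body[0]],
        "core": {"0": "ARM Cortex-M0"}[body[1]],
        "line": body[2:4],           # e.g. "30"
        "pins": {"f": 20}[body[4]],
        "flash_kb": {"4": 16}[body[5]],
        "package": {"p": "TSSOP"}[body[6]],
        "temp_range_c": {"6": (-40, 85)}[body[7]],
    }

print(decode_stm32("stm32f030f4p6"))
```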

02.03.24

  • The stm32f030f4p6 chip is SMD and in TSSOP-20 footprint.
  • I also bought SMD to THT adapters which are called breakout boards and soldered the chip to it.
  • STM32 nucleo boards come with a built-in st-link programmer and debugger.

images:

  1. stm32f030f4p6 soldered onto a breakout board
  2. stm32f030f4p6 with rpi v4 for scale

01.03.24

  • v100s has 5120 cuda cores and 640 tensor cores
  • quadro rtx 5000 has 3072 cuda cores and 384 tensor cores
  • tensor cores are more important for deep learning than cuda cores
  • installing miniconda:
# install miniconda
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
source ~/.bashrc
  • installing nvidia gpu drivers:
# install nvidia drivers
sudo apt update && sudo apt upgrade
sudo apt autoremove nvidia* --purge
ubuntu-drivers devices
sudo apt install nvidia-driver-525
sudo reboot
nvidia-smi
# install pytorch with cuda support
pip install torch torchvision torchaudio
  • ICs come in different packages: DIP, SOP, QFP, TQFP

29.02.24

  • softmax suffers from numerical instability due to floating point precision error
>>> import torch
>>> m = torch.nn.Softmax(dim=1)
>>> a = torch.tensor([[ 0.4981e3,  0.5018, -0.7310]])
>>> m(a)
tensor([[1., 0., 0.]])
  • normalizing the inputs is one way to avoid the overflow (though note it changes the resulting distribution)
>>> torch.nn.functional.normalize(a)
tensor([[ 1.0000,  0.0010, -0.0015]])
>>> m(torch.nn.functional.normalize(a))
tensor([[0.5762, 0.2122, 0.2117]])
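the standard fix is the max-subtraction (log-sum-exp) trick rather than L2-normalising the inputs: softmax(x) is mathematically identical to softmax(x - c) for any constant c, so subtracting max(x) keeps exp() in range without changing the answer. a pure-python sketch:

```python
import math

def softmax_naive(xs):
    # overflows for large inputs: math.exp(1000) raises OverflowError
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_stable(xs):
    # softmax(x) == softmax(x - c) for any constant c; pick c = max(x)
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax_stable([498.1, 0.5018, -0.7310]))  # dominated by the huge first logit
```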

28.02.24

  • color sensors (TCS34725, TCS3200) can detect intensity of R,G,B individually
  • because the ISA is open source, risc-v chips can be cheaper than arm, and some run linux too
  • microcontroller (arduino, stm32) vs single board computer (raspberry pi, beaglebone)
  • models often perform better when input data is standardized to be roughly gaussian

27.02.24

  • the warmup_steps hyperparameter keeps the learning rate low for the first few steps and then ramps it up to the target value
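linear warmup is only a couple of lines. a sketch (the base_lr and warmup_steps values here are arbitrary):

```python
def lr_at(step, base_lr=1e-3, warmup_steps=100):
    # ramp linearly from ~0 to base_lr over the first warmup_steps, then hold
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

for s in [0, 49, 99, 100, 500]:
    print(s, lr_at(s))
```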
  • transformer = encoder + decoder + attention
  • K is the context window size in the attention mechanism which is the number of tokens that each token attends to.
  • attention in transformers has quadratic time complexity
  • flash attention is still quadratic in time but linear in memory: it never materializes the full attention matrix
  • An Attention Free Transformer also has linear time complexity
  • wandb can be self-hosted too inside the docker container
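the quadratic cost of standard attention is easy to see in a naive implementation: the score matrix has one entry per (query, key) pair, so n tokens cost n² scores. a pure-python single-head sketch (queries, keys, and values are assumed to share the same dimension):

```python
import math

def attention(queries, keys, values):
    # scores form an n x n matrix: every token attends to every token,
    # which is where the quadratic time (and memory) comes from
    n, d = len(queries), len(queries[0])
    out = []
    for q in queries:
        # scaled dot-product scores against every key
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        # numerically stable softmax over the scores
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # output is the attention-weighted average of the values
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(d)])
    return out
```

flash attention computes the same result, but tiles this loop so the n x n matrix never exists in memory at once.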

26.02.24

  • cpu architectures: x86, x86_64, arm, arm64, risc-v
  • famous arm microcontroller family: stm32
  • risc-v is open source and is gaining popularity
  • LuckFox Pico Plus RV1103 is a risc-v dev board with ethernet and can run linux
  • softmax not summing to 1 T_T
  • how to make LoRa full duplex?

25.02.24

  • rl implementations: stable-baselines3
  • cleanrl has single file implementations of rl algorithms
  • tianshou is a pytorch based rl library
  • Through Hole Technology (THT) vs Surface Mount Technology (SMT)

24.02.24