"GPU Accelerated Polars — Intuitively and Exhaustively Explained" by Daniel Warfield, in Intuitively and Exhaustively Explained. Fast Dataframes for Big Problems. Sep 17, 2024.
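As a hedged, minimal sketch of the idea behind that entry (running a Polars lazy query on the GPU), assuming Polars is installed with its optional GPU engine (e.g. `pip install polars[gpu]`); the `engine="gpu"` argument and fallback behaviour may vary by version:

```python
# Minimal sketch: build a lazy Polars query as usual, then ask the optional
# GPU engine to execute it. Data here is made up for illustration.
import polars as pl

lf = pl.LazyFrame({
    "group": ["a", "b", "a", "b"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

query = lf.group_by("group").agg(pl.col("value").mean().alias("mean_value"))

# Depending on the Polars version, unsupported queries may fall back to the
# CPU engine instead of failing outright.
df = query.collect(engine="gpu")
print(df)
```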
"PyTorch training optimizations: 5× throughput with GPU profiling and memory analysis" by Ali Shafique. Training optimization techniques are critical in machine learning because they enhance efficiency, speed up convergence, ensure stability… Apr 29, 2024.
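Not the article's code, but a hedged sketch of the kind of GPU profiling it refers to, using `torch.profiler` (the toy model and batch size are made up):

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Sequential(
    torch.nn.Linear(512, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 10)
)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
x = torch.randn(64, 512, device=device)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    profile_memory=True,   # track allocator activity per op
    record_shapes=True,
) as prof:
    for _ in range(10):
        loss = model(x).sum()
        loss.backward()

# Sort by GPU time to see which kernels are worth optimizing first.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```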
"CUDA for AI — Intuitively and Exhaustively Explained" by Daniel Warfield, in Intuitively and Exhaustively Explained. Parallelized AI from scratch in CUDA. Jun 14, 2024.
"Why Deep Learning Models Run Faster on GPUs: A Brief Introduction to CUDA Programming" by Lucas de Lima Nogueira, in TDS Archive. For those who want to understand what .to("cuda") does. Apr 17, 2024.
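As a tiny illustration of what `.to("cuda")` is about (a sketch, not taken from the article): it moves tensors and module parameters from host memory to GPU memory so that later operations run as CUDA kernels.

```python
import torch

if torch.cuda.is_available():
    x_cpu = torch.randn(1024, 1024)                  # lives in host RAM
    x_gpu = x_cpu.to("cuda")                         # copied to GPU memory
    model = torch.nn.Linear(1024, 1024).to("cuda")   # parameters moved too
    y = model(x_gpu)                                 # matmul runs on the GPU
    print(y.device)                                  # cuda:0
```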
"Flash Attention: Underlying Principles Explained" by Florian June, in Towards AI. Flash Attention is an efficient and precise Transformer acceleration technique; this article explains its underlying principles. Dec 17, 2023.
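A hedged aside, not from the article: PyTorch 2.x exposes a fused attention op, `torch.nn.functional.scaled_dot_product_attention`, which can dispatch to a FlashAttention-style kernel on supported GPUs (typically with fp16/bf16 inputs).

```python
import torch
import torch.nn.functional as F

batch, heads, seq_len, head_dim = 2, 8, 1024, 64
device = "cuda" if torch.cuda.is_available() else "cpu"
q = torch.randn(batch, heads, seq_len, head_dim, device=device)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Computes softmax(q @ k^T / sqrt(d)) @ v; a fused backend avoids
# materializing the full seq_len x seq_len attention matrix.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # (batch, heads, seq_len, head_dim)
```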
"150x faster Pandas with NVIDIA's RAPIDS cuDF" by Vishal Rajput, in AIGuys. A 150x jump in Pandas performance, using NVIDIA's RAPIDS cuDF to leverage GPU power. Nov 10, 2023.
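A minimal sketch of the cuDF idea (not the article's benchmark): a pandas-like DataFrame API whose operations execute on the GPU. Assumes RAPIDS cuDF is installed and a compatible NVIDIA GPU is present; the data is made up.

```python
import cudf

df = cudf.DataFrame({
    "key": ["a", "b", "a", "c", "b"],
    "value": [1, 2, 3, 4, 5],
})

# Same shape of code you would write with pandas, but the groupby and
# aggregation run as GPU kernels.
result = df.groupby("key")["value"].mean()
print(result)
```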
"Matrix Multiplication on GPU" by Andy Lo, in TDS Archive. How to achieve state-of-the-art matrix multiplication performance in CUDA. Oct 9, 2023.
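The article works in CUDA C++; as a rough illustration of the starting point only (a naive, one-thread-per-output-element kernel, nothing like the tiled versions such articles build toward), here is the same idea written with Numba's CUDA JIT in Python:

```python
import numpy as np
from numba import cuda

@cuda.jit
def naive_matmul(A, B, C):
    # One thread computes one element of C.
    row, col = cuda.grid(2)
    if row < C.shape[0] and col < C.shape[1]:
        acc = 0.0
        for k in range(A.shape[1]):
            acc += A[row, k] * B[k, col]
        C[row, col] = acc

M, K, N = 256, 256, 256
A = np.random.rand(M, K).astype(np.float32)
B = np.random.rand(K, N).astype(np.float32)
C = np.zeros((M, N), dtype=np.float32)

threads = (16, 16)
blocks = ((M + threads[0] - 1) // threads[0],
          (N + threads[1] - 1) // threads[1])
naive_matmul[blocks, threads](A, B, C)   # Numba copies arrays to/from the GPU

print(np.allclose(C, A @ B, rtol=1e-3, atol=1e-3))
```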
"How to Build a Multi-GPU System for Deep Learning in 2023" by Antonis Makropoulos, in TDS Archive. This story provides a guide on how to build a multi-GPU system for deep learning and hopefully save you some research time and… Sep 16, 2023.
"Simplifying GPU Management for Data Scientists With Genv" by Ekin Karabulut, in Better Programming. A walkthrough of how to get the most out of your GPUs. May 11, 2023.
"Speeding Deep Learning inference by up to 20X" by Abhishek Kushwaha. If your engineering team is not using Nvidia TRT for your deep learning model deployment, then you should stop everything and read this… Feb 19, 2023.
"Training Larger Models Over Your Average GPU With Gradient Checkpointing in PyTorch" by Vikas Kumar Ojha, in Geek Culture. Most of us have faced situations where our model is too big to train on our GPU; this blog explains how to solve it through an example. Jan 30, 2023.
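A minimal, hedged sketch of gradient checkpointing in PyTorch (not the blog's exact code): activations inside a checkpointed segment are not stored during the forward pass and are recomputed during backward, trading extra compute for lower GPU memory.

```python
import torch
from torch.utils.checkpoint import checkpoint

block1 = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())
block2 = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())

x = torch.randn(32, 1024, requires_grad=True)

# Each block's intermediate activations are recomputed on the backward pass.
h = checkpoint(block1, x, use_reentrant=False)
h = checkpoint(block2, h, use_reentrant=False)
loss = h.sum()
loss.backward()
print(x.grad.shape)
```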
"Dynamic MIG Partitioning in Kubernetes" by Michele Zanotti, in TDS Archive. Maximize GPU utilization and reduce infrastructure costs. Jan 26, 2023.
"How to Increase GPU Utilization in Kubernetes with NVIDIA MPS" by Michele Zanotti, in TDS Archive. Integrating NVIDIA Multi-Process Service (MPS) in Kubernetes to share GPUs among workloads for maximizing utilization and reducing… Feb 2, 2023.
"Run Very Large Language Models on Your Computer" by Benjamin Marie, in Towards AI. With PyTorch and Hugging Face's device_map. Dec 22, 2022.
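A hedged sketch of the device_map idea with Hugging Face Transformers and Accelerate (not the article's exact code): weights are sharded automatically across available GPUs, CPU RAM, and optionally disk. The model name below is just a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-3b"  # placeholder; pick any causal LM

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # let Accelerate decide where each layer lives
)

inputs = tokenizer("GPUs are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```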
"100x Faster Machine Learning Model Ensembling with RAPIDS cuML and Scikit-Learn Meta-Estimators" by Nick Becker, in RAPIDS AI. Learn how to use RAPIDS cuML with scikit-learn's ensemble model APIs to achieve more than 100x faster boosting, bagging, and stacking. Aug 18, 2020.
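A heavily hedged sketch of the pattern that post describes, not its benchmark: the ensemble logic stays in scikit-learn while a cuML base estimator trains and predicts on the GPU. Whether this runs as written depends on having compatible scikit-learn and RAPIDS cuML versions installed; the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from cuml.linear_model import LogisticRegression  # GPU-backed, sklearn-like API

X = np.random.rand(10_000, 20).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 1.0).astype(np.int32)

# scikit-learn's bagging wrapper treats the cuML estimator like any other
# object implementing fit/predict.
ensemble = BaggingClassifier(estimator=LogisticRegression(), n_estimators=10)
ensemble.fit(X, y)
print(ensemble.score(X, y))
```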
"RAPIDS 22.12 Release" by Nick Becker, in RAPIDS AI. Making sure your holiday season is full of presents. Dec 14, 2022.
"Achieve GPU Grade Performance on CPUs With SparseML" by Vikas Kumar Ojha, in Geek Culture. This blog explains model optimization for achieving GPU-grade performance on multi-core CPUs. Nov 11, 2022.
"Why Spiking Neural Networks are the next leap in AI" by Jacob Lavoie. Why spiking neural nets on neuromorphic hardware could beat GPU and TPU energy consumption to sustainably deploy artificial intelligence. Sep 14, 2022.
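The post itself is conceptual, but as a hedged illustration of what "spiking" means computationally, here is a minimal leaky integrate-and-fire (LIF) neuron simulated in plain Python; all constants are arbitrary.

```python
import numpy as np

def lif_neuron(input_current, beta=0.9, threshold=1.0):
    """Simulate one LIF neuron: the membrane potential leaks each step,
    integrates its input, and emits a binary spike when it crosses threshold."""
    membrane = 0.0
    spikes = []
    for current in input_current:
        membrane = beta * membrane + current   # leak + integrate
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0                     # reset after firing
        else:
            spikes.append(0)
    return np.array(spikes)

rng = np.random.default_rng(0)
current = rng.uniform(0.0, 0.3, size=50)
print(lif_neuron(current))                     # sparse 0/1 spike train
```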