Implementing virtual list view with variable row heights

For an immediate mode UI framework

2025.10

I want to talk about how I implemented a virtual list view for shi•rei, the immediate mode GUI framework I'm working on.

A few weeks ago, this episode of the Wookash Podcast came out, where the subject was immediate mode UI.

Near the very end, there was a question about things that are challenging to implement in immediate mode, and one of the things that came up was a virtual list where the items do not all have the same height.

The following is a 1-minute clip of the conversation:

At the time, I was in the middle of tackling this exact problem for shirei, so I left a comment on Twitter mentioning that, and sort of promising to write it up if I managed to get it working.

I see around 2:54:00 you discuss what is essentially a virtual list where items inside are of variable height

Interestingly that is something I am trying to tackle right now.

If it works out it might be worth writing up about it

— Hasen Judi (@Hasen_Judi) October 13, 2025

I think I have it mostly figured out, so this article is the writeup I promised!

Demo

So first of all, let me show you a couple of demos:

(1) Loading and displaying a large text file (200MB):

(2) Loading a directory with hundreds of images and scrolling through them:

Background

shi•rei is an immediate mode style UI framework where you build the UI by laying out flex-box like containers (recursively) as rows or columns.

The way you build the UI is by creating a new container, setting properties on it, adding child containers to it, then closing it.

Container sizing goes through two passes:

Intrinsic size is computed recursively from bottom to top. When you close a container, we compute its intrinsic size right away by iterating on its children, collecting their sizes on both dimensions, adding padding and gaps.

It's a bit more complicated due to wrapping, but hopefully you get the idea.

Extrinsic sizing works the opposite way. It goes top down. We cannot apply extrinsic sizing to a container unless it has been applied to its parent container first.

This means, extrinsic sizing can only be applied after all the UI elements for this frame have been laid out.

Scrolling

It's possible for the content size to exceed the container size, specifically, when the container has a max size.

By default, content that overflows can still be rendered; it just won't affect the sizing process.

If the clip flag is set, the content will be clipped, and the only way to bring it into view is to "scroll" the container.

While the API is immediate mode, containers do have state that persists across frames if they have the same id.

Container ids are assigned automatically based on the container's position in the tree, but the caller has the option to set their own id.

One of the state variables that persists across frames is the scroll offset.

We also have a scrollbar widget that can reflect the scroll offset and control it. Note that the widget is implemented in "user space". Anyone can implement their own scrollbar. All the data and APIs needed to implement such a widget are available for anyone to use.

Here's a video showing two containers with text content being scrolled. The demo also shows a list view with dummy items getting scrolled, along with a scrollbar.

Just to make sure there's no confusion: the video above does not show a virtual list container. It's just the default mechanism where a container can have content larger than its size and it has a scroll offset.

We can create a normal container with many children, but given that most of the content will not be rendered, it can be very wasteful to lay them all out, and this can negatively impact UI responsiveness (depending on machine specs and list size).

That list with dummy items can make the UI quite sluggish if we put in hundreds of items.

Initial motivation

I don't know how to design system components "platonically". I need to have something specific that I'm struggling to implement, such that I would benefit from a certain kind of component existing, then I can go on to work on that component.

I also tend to take into account potentially similar problems, and try to design the component in a way where it is not overfitted to the one specific problem I'm currently working on. Instead, I try to make it a bit more general purpose.

My first use case was a directory tree listing for a small "disk usage" utility, meant to be a showcase (example program) for the GUI framework.

We scan directories recursively to find out which directories are taking the most disk space, and the UI shows all the directories, and lets you expand or collapse them.

Very similar to what Ramon (raysan5) was describing in the clip I took of the UI podcast.

Here is a screenshot of what the UI currently looks like. It was much more plain when I started working on this problem, but I don't have a screenshot from that time, so this should do. I'm not trying to flaunt my design skills (I don't have any), I'm just trying to show what kind of view we are talking about here.

Immediately I faced the problem that the UI becomes janky / unresponsive when you have several hundred rows in a scrollable container!

And in this case, the solution was rather straightforward: a virtual list container where items have a fixed height, and you as a caller just supply a render function per item.

There's a direct mapping from scroll offset to position in the list. We know exactly how big the content needs to be, and we know how big to make the empty spacers above and below the fold. It's all so trivial.

As a reminder, what we need to compute in order to display a virtual list:
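For the fixed-height case, these quantities can be sketched roughly as follows. This is not the shirei code: `FixedWindow` and `fixedWindow` are hypothetical names, and plain `float32` stands in for the framework's `f32`.

```go
package main

import "fmt"

// FixedWindow (hypothetical) describes which rows to lay out and how
// much empty space to reserve around them.
type FixedWindow struct {
	First       int     // index of the first row to lay out
	End         int     // one past the last row to lay out
	SpaceBefore float32 // height of the spacer above the first row
	SpaceAfter  float32 // height of the spacer below the last row
}

// fixedWindow maps a scroll offset to a window of rows, assuming every
// row is exactly rowHeight pixels tall.
func fixedWindow(itemCount int, rowHeight, viewHeight, scrollOffset float32) FixedWindow {
	if itemCount <= 0 || rowHeight <= 0 {
		return FixedWindow{}
	}
	first := int(scrollOffset / rowHeight)
	first = max(0, min(first, itemCount-1))
	// Enough rows to cover the viewport, plus two to account for
	// partially visible rows at the top and bottom edges.
	end := min(itemCount, first+int(viewHeight/rowHeight)+2)
	return FixedWindow{
		First:       first,
		End:         end,
		SpaceBefore: float32(first) * rowHeight,
		SpaceAfter:  float32(itemCount-end) * rowHeight,
	}
}

func main() {
	// 1000 rows of 20px each, a 100px viewport, scrolled to 250px.
	fmt.Printf("%+v\n", fixedWindow(1000, 20, 100, 250))
}
```

The spacer heights plus the laid-out rows always add up to `itemCount * rowHeight`, which is what keeps the scrollbar stable.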

Large text view

Then I had the idea of reusing this container to display a large text file.

See, it was already a known limitation of the framework that the default text view would get slower and slower as the string size gets larger and larger, until eventually grinding to a halt when you try to display an 8MB text file (on my machine: a MacBook Air with the Apple M2 chip).

I had to cripple the text view to truncate the text so its size is no more than 16KB.

To give some background, text processing goes through three stages: shaping, line wrapping, and display.

Shaping goes through a Go port of HarfBuzz. It is not super fast per se — but processing is only done once, and then cached, so it doesn't usually matter.

Part of shaping involves breaking the text into segments, so what we get as the output of the shaping process is a list of shaped segments.

Line wrapping works on these segments, grouping them into lines such that each line's cumulative width does not exceed the maximum width parameter. It's a pretty straightforward for loop. I don't have exact measurement numbers, but I do remember running some measurements on the whole shaping process, and line breaking was only a tiny percentage of the time consumed.
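To give a feel for that loop, here is a minimal greedy sketch. It assumes a hypothetical `Segment` type that only carries a width; the real shaped segments hold much more, and the real loop also has to deal with whitespace and break opportunities.

```go
package main

import "fmt"

// Segment is a hypothetical stand-in for a shaped segment.
type Segment struct {
	Text  string
	Width float32
}

// wrapLines greedily groups segments into lines whose cumulative width
// does not exceed maxWidth. A segment wider than maxWidth ends up on a
// line of its own.
func wrapLines(segments []Segment, maxWidth float32) [][]Segment {
	var lines [][]Segment
	var current []Segment
	var width float32
	for _, seg := range segments {
		// Start a new line if this segment would overflow the current one.
		if len(current) > 0 && width+seg.Width > maxWidth {
			lines = append(lines, current)
			current = nil
			width = 0
		}
		current = append(current, seg)
		width += seg.Width
	}
	if len(current) > 0 {
		lines = append(lines, current)
	}
	return lines
}

func main() {
	segs := []Segment{{"a", 40}, {"b", 40}, {"c", 40}, {"d", 100}, {"e", 20}}
	for i, line := range wrapLines(segs, 100) {
		fmt.Println(i, len(line))
	}
}
```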

And, one other thing: after segments are broken into lines, an important processing step is applied for bidirectional text layout support: contiguous segments of the direction opposite to the paragraph direction are reversed.

This can only happen after line wrapping; thus are the bidi rules.

The "display" step is just arranging the wrapped lines into rows to be laid out inside a column container, and within each line container, we lay out the segments horizontally.

This has implications for performance when displaying a very large text file: we are building a huge tree structure every frame, but the vast majority of the data in it is not even shown on the screen; the only function it ends up serving is to help the scrollable container determine the size of its content, which will influence the scrollbar display.

First iteration

I thought text was a perfect example of a list where items all have the same height. At least, plain text should be.

You just need to process the string: pass it through the shaping engine, and apply line wrapping. After that, you know exactly how many items (lines) you have, and you can display each line as its own item in the list, and we already had text rendering, so incorporating it into the virtual list should be very easy!

Except, it turned out, the processing phase would take a very long time.

Once processing was done, the display and scrolling worked perfectly! It was very smooth and responsive.

How long was the process time? Well, when I tried it on a 10MB file, it took 10 seconds to apply shaping and line wrapping.

That rendered the solution unviable.

Large text file view .. but .. the initial processing time is very slow 😭 pic.twitter.com/RKyd6S76uV

— Hasen Judi (@Hasen_Judi) October 9, 2025

While the Go port of HarfBuzz is not the fastest thing in the world, it works fine for small amounts of text — good enough for what you could fit on a screen.

We could replace it with a better, faster library such as kb_text_shape (which is probably a good idea anyway), but it wouldn't really solve the core problem.

See, even if text processing was 10x faster, there would still be some file size for which it would take several seconds, and we'd still need to do something about that.

So the solution obviously has to be to not do this pre-processing. In other words, we should not be running the entire text through the shaping engine.

Does the user care about this?

One of the most profound ideas I got out of Casey Muratori's refterm lecture is to not spend CPU time doing work whose results the user does not care about.

Relevant clip:

Why are we running the shaping process on 10MB of text if the user is not going to see any of it?

Let's step back a bit and think about what the user actually wants from the scroll bar and a scrollable container.

It is this last point that would benefit from knowing all the heights of all the items ahead of time, but there's a crucial point we can't overlook:

For a large enough view, the scrollbar should be thought of more as a rough visual indicator rather than some kind of pixel-accurate seeker.

When the user clicks halfway through the scrollbar, are they seeking to a specific item? No, they are asking to be shown, roughly speaking, what items exist halfway through the list.

So now we are dealing with a virtual list where items have different heights and we do not know all of them ahead of time, nor can we!

Large text view - second iteration

If we are not going to run the shaping process on the entire text, then our next best bet is to break the text by '\n' so that we get a known number of lines, but whose heights we don't know ahead of time, because text can wrap, increasing the height of each line.

Breaking text into lines is still a kind of pre-processing, but it's surprisingly fast!

On my machine, it takes about 400ms to process 200MB of text this way, producing around 9.5 million lines of text. Not bad.
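A line-splitting pass of this kind might look like the sketch below. It is an assumption about the approach, not the article's code: it records byte ranges instead of allocating a string per line, and `LineSpan` and `splitLines` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
)

// LineSpan (hypothetical) is the byte range of one line in the buffer.
// End is exclusive and does not include the trailing '\n'.
type LineSpan struct{ Start, End int }

// splitLines scans the buffer once and records where each line starts
// and ends, without copying any text.
func splitLines(data []byte) []LineSpan {
	var lines []LineSpan
	start := 0
	for start < len(data) {
		i := bytes.IndexByte(data[start:], '\n')
		if i < 0 {
			// Last line has no trailing newline.
			lines = append(lines, LineSpan{start, len(data)})
			break
		}
		lines = append(lines, LineSpan{start, start + i})
		start += i + 1
	}
	return lines
}

func main() {
	data := []byte("first\n\nthird line")
	for _, l := range splitLines(data) {
		fmt.Printf("%q\n", data[l.Start:l.End])
	}
}
```

The item count for the virtual list is then just `len(lines)`, and shaping is only needed for the handful of lines that are actually displayed.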

It's also worth mentioning that I already had another problem in mind that would benefit from this kind of virtual list, which is how to display a large list of pictures (hundreds or even thousands of them), and have the view load instantly, and respond to user input with no delay.

The challenge now is, how do we implement the virtual list with variable and unknown item heights, such that it can support both smooth scrolling and random-access scrolling?

Formulating the challenge

Given a list of items and a scroll offset, the decision that needs to be made every single frame in an immediate mode virtual list container is:

This gives you everything you need to start rendering the view; namely:

The first obvious idea is to use a "guess" for an average item height, which we can use to compute the total height of the content.

For now, it doesn't really matter where that value comes from. We'll just assume we have such a value.

To support random access scrolling, we can use the same logic we used for the virtual list where all items have the same height.
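As a rough sketch of that random-access mapping: treat every item as if it had the assumed average height and divide. The name `guessAnchor` is hypothetical, and `float32` stands in for the framework's `f32`.

```go
package main

import "fmt"

// ItemOffset pairs an item index with the vertical offset of its top edge.
type ItemOffset struct {
	Index  int
	Offset float32
}

// guessAnchor estimates which item sits at scrollOffset by pretending
// every item is avgHeight pixels tall. The result is only a guess: the
// real item at that position may be a different one.
func guessAnchor(itemCount int, avgHeight, scrollOffset float32) ItemOffset {
	if itemCount <= 0 || avgHeight <= 0 {
		return ItemOffset{}
	}
	index := int(scrollOffset / avgHeight)
	index = max(0, min(index, itemCount-1))
	return ItemOffset{Index: index, Offset: float32(index) * avgHeight}
}

func main() {
	// 1000 items with an assumed average height of 24px.
	fmt.Printf("%+v\n", guessAnchor(1000, 24, 12010))
}
```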

This logic unfortunately breaks down if the user is just smooth scrolling.

Consider a situation where the first element in the scrolling window is taller than the assumed average height. What happens when you scroll down?

The expected behavior is the elements move up exactly by the amount of pixels given in the user input. But if we apply the rough scroll position method, the "first item" in the visible window will suddenly "jump" to the next item before reaching the bottom of the current item.

On the other hand, if you are given the problem of smooth scrolling from the top of the list where the total height is known but individual item heights are not known, it should be quite simple to implement. You just keep track of the heights of the items you've seen, and you adjust accordingly as you move up or down.

Here we were scrolling relative to the first item, but the math can be generalized to any random anchor point.

Instead of counting from the first item, we count starting from the anchor. If the anchor item has a known offset anchor_y, then we can easily find the offset of some other item relative to anchor_y such that:
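One way to read this relation: an item below the anchor sits at anchor_y plus the heights of the items in between, and an item above the anchor sits at anchor_y minus them. A minimal sketch, where `offsetOf` and the `height` callback are hypothetical names:

```go
package main

import "fmt"

// ItemOffset pairs an item index with the vertical offset of its top edge.
type ItemOffset struct {
	Index  int
	Offset float32
}

// offsetOf returns the top offset of item `index`, measured relative to
// a known anchor, by summing real item heights between the two.
func offsetOf(anchor ItemOffset, index int, height func(i int) float32) float32 {
	y := anchor.Offset
	// Target is below the anchor: add the heights of items in between.
	for i := anchor.Index; i < index; i++ {
		y += height(i)
	}
	// Target is above the anchor: subtract the heights of items in between.
	for i := anchor.Index - 1; i >= index; i-- {
		y -= height(i)
	}
	return y
}

func main() {
	// Heights cycle through 10, 20, 30.
	height := func(i int) float32 { return float32(10 * (i%3 + 1)) }
	anchor := ItemOffset{Index: 3, Offset: 60}
	fmt.Println(offsetOf(anchor, 5, height), offsetOf(anchor, 1, height))
}
```

Only the items between the anchor and the target are measured, so the cost is proportional to the scroll distance rather than to the list size.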

To support both smooth and random access scrolling, we need to know whether the scroll event was smooth or "jumpy". If it's jumpy, we use the random access scrolling logic; if it's smooth, we apply the smooth scrolling method.

Every time we apply scrolling, whether it's smooth or random access, we can just set the anchor to the first item in the scroll window.
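For the smooth case, finding the new first item can be sketched as a short walk from the previous anchor until we reach the item that contains the scroll offset. This is an illustration under assumptions, not the shirei implementation; `firstVisible` and the `height` callback are hypothetical.

```go
package main

import "fmt"

// ItemOffset pairs an item index with the vertical offset of its top edge.
type ItemOffset struct {
	Index  int
	Offset float32
}

// firstVisible walks from the anchor to the item whose vertical extent
// contains scrollOffset, using real item heights along the way.
func firstVisible(anchor ItemOffset, scrollOffset float32, itemCount int, height func(i int) float32) ItemOffset {
	cur := anchor
	// Walk down while the current item ends at or above the scroll offset.
	for cur.Index < itemCount-1 {
		h := height(cur.Index)
		if cur.Offset+h > scrollOffset {
			break
		}
		cur.Offset += h
		cur.Index++
	}
	// Walk up while the current item starts below the scroll offset.
	for cur.Index > 0 && cur.Offset > scrollOffset {
		cur.Index--
		cur.Offset -= height(cur.Index)
	}
	return cur
}

func main() {
	// Heights cycle through 10, 20, 30.
	height := func(i int) float32 { return float32(10 * (i%3 + 1)) }
	anchor := ItemOffset{Index: 3, Offset: 60}
	fmt.Printf("%+v\n", firstVisible(anchor, 95, 100, height))
	fmt.Printf("%+v\n", firstVisible(anchor, 25, 100, height))
}
```

The returned value can be stored as the next frame's anchor, which is what keeps consecutive smooth scrolls cheap.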

Obtaining item heights

I skipped over the question of how to get item heights, and worked with the assumption that we can just "obtain" it by rendering the item.

In shi•rei, this is very much possible: as soon as we are done with rendering one item, we can obtain its intrinsic height. So in my first iteration, I tried to lean heavily on this ability.

If we could just obtain item heights automatically, it would make our API that much easier to use. You just pass the number of items and a function to render an item by index!

However, as I was going through the implementation, I kept hitting various walls and I ended up deciding to require the caller to supply another function that would return the height for items.

It does make the virtual list component a bit more cumbersome to use, but it makes it more robust and reliable.

What problems does this solve? Consider for instance what happens when the user tries to smoothly scroll up: we need to know the heights of several items that are above the current first item in the viewing window.

There's also the question of how to determine the "assumed" average item height. At first I tried just taking the average of the items we've seen, but then as we scroll down and see more items, this average could change, causing the scroll thumb to change randomly in size and position.

So I settled on taking the average height of the first and last N items in the view. This makes the value more meaningful, and produces a pixel-accurate total height value for lists with itemCount <= N*2.
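A sketch of that estimate, with hypothetical names (`estimateAverageHeight`, `height`) and `float32` in place of `f32`. When the list has at most 2N items, every item is measured exactly once, which is why the total comes out pixel-accurate.

```go
package main

import "fmt"

// estimateAverageHeight averages the heights of the first n and last n
// items. If the list has at most 2n items, it averages all of them.
func estimateAverageHeight(itemCount, n int, height func(i int) float32) float32 {
	if itemCount <= 0 || n <= 0 {
		return 0
	}
	var sum float32
	count := 0
	add := func(from, to int) {
		for i := from; i < to; i++ {
			sum += height(i)
			count++
		}
	}
	if itemCount <= 2*n {
		add(0, itemCount) // small list: measure everything
	} else {
		add(0, n)                   // first n items
		add(itemCount-n, itemCount) // last n items
	}
	return sum / float32(count)
}

func main() {
	height := func(i int) float32 { return float32(10 * (i + 1)) }
	avg := estimateAverageHeight(10, 2, height)
	fmt.Println(avg, avg*10) // estimated average and total height
}
```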

This basically takes care of everything we've discussed so far!

Handling resize

One important aspect we haven't discussed is resizing. The content of the item depends on the width available. In the case of text, a narrower width results in more lines, thus taller height. In the case of images, it results in smaller images, thus smaller height.

So both the render function and the item height function need to take the view width as a parameter.

We also need to keep the scroll position visually stable: the first item should remain as the first item, and the position of the scroll thumb should also remain stable.

I did not realize at first that I would need to handle this, but I noticed something really strange happening when I resized the pictures virtual list: the scroll thumb offset would go all the way downwards, approaching the bottom! This is probably because I was updating the assumed average height but not the offset for the anchor. I didn't realize at first that this is what was happening, so I tried several complicated approaches, but what I settled on was to simply forcibly maintain the ratio offset / content_height for the anchor, the first item, and the scroll offset.

Let offset0 and height0 be the values before the resize, and let height be the new total height. We obtain height by recomputing the heights of the first and last N items, averaging them, and multiplying the average by the number of items. We then need to set the offset such that:

offset / height = offset0 / height0

Which means:

offset = height * offset0 / height0
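As a tiny worked example of this rule (the function name `rescaleOffset` is hypothetical): if the content was 2000px tall with an offset of 500px, and a resize makes the estimated content height 3000px, the offset becomes 750px, so the ratio stays at one quarter.

```go
package main

import "fmt"

// rescaleOffset keeps offset/height constant across a resize:
// offset / height == offset0 / height0.
func rescaleOffset(offset0, height0, height float32) float32 {
	if height0 <= 0 {
		return 0 // nothing to preserve if the old content had no height
	}
	return height * offset0 / height0
}

func main() {
	fmt.Println(rescaleOffset(500, 2000, 3000))
}
```

The same function would be applied to each offset that needs to stay visually stable: the anchor offset, the first item's offset, and the scroll offset.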

We can also update the anchor point to be the first visible item.

Code organization

The idea presented here might seem neat, but I went through several iterations until I settled on this. During these iterations, I had many "failed" attempts, or rather, ideas that didn't work out so great. The process was quite messy, and sometimes I got confused as to what exactly was going on, as the code was getting quite messy, and I was confusing input variables, computed values, and retained state. It was really quite messy for my brain, with too many inline computations.

The implementation did not really solidify until I solved this problem of the "messiness".

We all know about Clean Code and the SOLID principles and Domain Driven Design, right? I had to bite my tongue as I implemented these amazing ideas to clean up the code!!11

Just kidding.

I just defined a few struct types (all inline, btw) and some functions that operate on them, more or less.

The total amount of code in the virtual list container ended up being about 250 lines, a significant portion of which are explanatory comments.

For reference, this is the public interface:

func VirtualListView(
    itemCount int,
    itemIdFn func(index int) any,
    itemHeightFn func(index int, width f32) f32,
    itemViewFn func(index int, width f32),
)

The first struct I define is (index, offset), which we use for computations related to finding an anchor point and finding the first item relative to the anchor point.

type ItemOffset struct {
    Index  int
    Offset f32
}

Next, we have the virtual list's internal state. shi•rei lets you define container specific UI state using an API based on the same idea as React hooks.

type VirtualListState struct {
    Anchor ItemOffset

    // state used to handle width resizing
    TotalHeight f32

    // known view state; used to detect changes
    ScrollOffset f32
    Width        f32
}

Inside the virtual list component, we do not attempt to handle input events directly; instead, we detect changes to the scroll offset and the width.

This is in keeping with the general event handling strategy for shi•rei: instead of thinking about input as discrete events, it's just data that can change from frame to frame.

We define the following functions to operate on data. I'll leave the implementation as an exercise to the reader!

(Note that these are inner functions, so they get access to the parameters passed to VirtualListView)

computeAverageHeight := func(width f32) f32
itemOffsetFromAnchor := func(width f32, anchor ItemOffset, scrollOffset f32) ItemOffset
anchorFromOffset := func(width f32, avgHeight f32, scrollOffset f32) ItemOffset

We recompute the "average height" and "total height" every frame. We get the average value by averaging the heights of the first and last N items, and we multiply this average by the itemCount to get the total height.

This means we call itemHeightFn(..) on those items every frame, so this function better be pretty fast! It doesn't need to be super fast though.

For text lines, knowing the height requires full shaping to be applied, and we have already established that shaping would be expensive to apply to the entire text ahead of time.

For image files, we can get the image size without reading the full file: just the header portion. Go's image package provides DecodeConfig exactly for this purpose, and it's pretty fast.

What's not mentioned here is that I have defined some "immediate mode" style functions to read files without managing their life cycles, so that I can say LoadImage(...) and/or LoadImageConfig(...) every frame without actually loading the file every single time. This is a separate topic that might be worth writing about.

Edge cases

The last point worth mentioning is what to do about edge cases. If you use the heuristic mentioned to handle random-access scrolling, and then you continue to smoothly scroll from there until you hit the top or bottom of the view, we might get incorrect values for 'space before' and 'space after' heights.

There could either be too much space outside the edge, or not enough space!

Having too much space would look bad: think about empty space before the first line (or picture), or empty space after the last line (or picture).

Having too little space would just break functionality! Imagine if we thought there's only 2 pixels of space before the third picture; then the user will not be able to scroll up!

To resolve having too much space, I simply force the space before to be 0 if the first item to render is the first item in the list (index 0), and force the space after to be 0 if the last item rendered is the last item in the list.

To resolve having too little space, I do different things on top and bottom.

If we have too little space on top, I re-calculate the offset by using the zero item as the anchor. The heuristic I use to detect this is that the first item's offset is smaller than the average height even though it's not the first item in the list.

first := itemOffsetFromAnchor(width, state.Anchor, state.ScrollOffset)

// edge case 1 (top)
if first.Index == 0 {
    first.Offset = 0
}
if first.Offset < avgHeight && first.Index != 0 {
    first = itemOffsetFromAnchor(width, ItemOffset{}, state.ScrollOffset)
}

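For the bottom, I use two values for space after: the value we get from the total height, and, when that value turns out to be too small, the remaining item count multiplied by the average height.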

spaceAfter := max(0, state.TotalHeight-(spaceBefore+renderedHeight))

// edge case 2 (bottom)
if endIndex == itemCount {
    spaceAfter = 0
}
if endIndex != itemCount && spaceAfter < avgHeight {
    remainingCount := itemCount - endIndex
    spaceAfter = f32(remainingCount) * avgHeight
}

There might be some jumpiness but in practice it doesn't seem to feel too disruptive. We could probably do something better, but for now this does not feel that urgent to me.

Here's a demo with a somewhat medium size text file. You can see the thumb jumping artifact as we approach the bottom when we continuously smooth scroll from the middle to the bottom.

Implementation outline

If you've followed the logic so far, the code for the virtual list view should be fairly straightforward.

Instead of pasting the code as-is, with all of its framework specific quirks and details, I think a high level outline is more useful.

Here's what the virtual list view's rendering function does every frame:

When I say we "render the items" or "render the box", what I actually mean is adding a layout container, like this:

LayoutId(id, TW(FixSize(width, height)), func() {
    itemViewFn(idx, width)
})

which doesn't really render anything directly — it just prepares layout data! The actual rendering is handled in a different part of the system.

This might be another topic for another day.

For now, here's the code for the virtual list view as of the time of this article: shirei/widgets/scroll.go:173

❀ ✒ ❀