
[–]teerre (0 children)

Very cool blog! Great job explaining it. It was easy to follow

[–]ejrh (1 child)

I guess this is for user space programming, but I was always taught that nothing "big and/or complicated" should happen in an interrupt. Instead, you should do no more than set a flag and rely on the normal non-interrupt code to check it and call the appropriate big and complicated function.

The usual example was that anything requiring malloc or free was too big and complicated. Running an eBPF program certainly seems big and complicated enough. But I guess kernel programmers are made of sterner stuff and they just have to provide for this? I have a feeling that the eBPF hooks for performance events wouldn't be practical if they used the traditional approach.
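For readers who have not seen the "set a flag and do the work later" pattern, here is a minimal user-space sketch in C, using a signal handler as a stand-in for an interrupt. The names `work_pending`, `on_signal`, `handle_event`, and `poll_pending_work` are made up for illustration, and this is only one way to structure it.

```c
#include <signal.h>
#include <stdlib.h>

/* Set by the handler, consumed by the normal loop. volatile sig_atomic_t is
   the object type the C standard guarantees can be written from a handler. */
volatile sig_atomic_t work_pending = 0;

/* The handler does nothing except record that the event happened. */
void on_signal(int signo)
{
    (void)signo;
    work_pending = 1;
}

/* Hypothetical "big and complicated" work. It calls malloc/free, so it
   must not run inside the handler. */
int handle_event(void)
{
    char *buf = malloc(64);
    if (buf == NULL)
        return -1;
    free(buf);
    return 0;
}

/* Called from the normal (non-interrupt) loop.
   Returns 1 if work was done, 0 if nothing was pending, -1 on failure. */
int poll_pending_work(void)
{
    if (!work_pending)
        return 0;
    /* Clear before doing the work, so an event that arrives while we are
       busy sets the flag again and is picked up on the next poll. */
    work_pending = 0;
    return handle_event() == 0 ? 1 : -1;
}
```

A main loop would install the handler with `signal(SIGINT, on_signal)` and call `poll_pending_work()` once per iteration. The trade-off is latency: the work only happens when the loop gets around to checking the flag.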

[–]admalledd (0 children)

You are generally correct. The idea falls apart here, though, because these were NMIs from performance-sampling tools, so they have to do some work in the interrupt. As the author stated, eBPF devs shouldn't need to care about this specific case, since it's kind of the whole point of eBPF tracing existing. So the kernel devs have to be extra defensive, and it seems a spot or two were missed, oops!
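To make "extra defensive" a bit more concrete: one common pattern for code that may interrupt a lock holder and therefore cannot wait is to try the lock once and drop the work on contention instead of spinning. The sketch below is user-space C11 under that assumption. It is not the kernel's actual code or the fix from the post, and `buf_lock`, `samples_recorded`, `samples_dropped`, and `record_sample_nonblocking` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock guarding a shared sample buffer. */
atomic_flag buf_lock = ATOMIC_FLAG_INIT;

/* Only modified while buf_lock is held. */
unsigned long samples_recorded = 0;

/* Modified without the lock, so it is atomic. */
atomic_ulong samples_dropped = 0;

/* Meant to be called from a context that must never wait for the lock,
   for example one that may have interrupted the current lock holder.
   Returns true if the sample was recorded, false if it was dropped. */
bool record_sample_nonblocking(void)
{
    /* test_and_set returns the previous value: true means the lock is
       already held. Spinning here could deadlock, so give up instead. */
    if (atomic_flag_test_and_set_explicit(&buf_lock, memory_order_acquire)) {
        atomic_fetch_add(&samples_dropped, 1);
        return false;
    }

    samples_recorded++; /* critical section: kept as short as possible */

    atomic_flag_clear_explicit(&buf_lock, memory_order_release);
    return true;
}
```

The cost is a lost sample under contention, which is usually acceptable for a sampling profiler and much better than a handler that spins forever on a lock held by the code it interrupted.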

[–]joolzg67_b (0 children)

I worked on a port of Nucleus RTOS in the late 80s. I was asked to get it running ASAP and had it running in one day.

Got a call a few months later saying "random crashes" are happening.

Went in and found the interrupt now being used for the RTOS was an NMI, so I added a flag to check whether interrupts were disabled and, if so, ignored the NMI.

Voila fixed.

[–]DowntownCap6204 (0 children)

love when a bug goes from “profiler freezes the box” to a tiny eBPF repro and a 250ms spinlock timeout

[–]drcforbin (0 children)

TIL I love low level debugging porn. "Yay! Job’s done. Or is it?"