You know what would be really nice (but nobody is ever going to build)?
An oscilloscope that replaces the ungodly slow USB3/1000BASE-T PC interface port with NVLink.
Forget PCIe and Thunderbolt... 900 GB/s of bandwidth straight from the ADC to my GPU? Sign me up.
For comparison... my 16 GHz LeCroy oscilloscope puts out 40 Gsps * 4 channels * 8 bits of raw ADC samples, not counting the flatness corrections done in gateware/firmware.
That's 160 GB/s or 1.28 Tbps of raw samples.
That would fit even in NVLink 2.0, let alone the current gen4/5 stuff.
Imagine four channels of 16 GHz bandwidth waveform data straight into a (very large) GPU nonstop... We'd have to do a hell of a lot of optimization to ngscopeclient to keep up and probably add multi-GPU support but it would be so much fun lol.
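For anyone checking the arithmetic, a quick Python sketch of the raw-rate numbers above:

```python
# Back-of-envelope check of the raw ADC data rate quoted above.
SAMPLE_RATE_GSPS = 40   # samples per second per channel, in Gsps
CHANNELS = 4
BITS_PER_SAMPLE = 8

bits_per_second = SAMPLE_RATE_GSPS * 1e9 * CHANNELS * BITS_PER_SAMPLE
gigabytes_per_second = bits_per_second / 8 / 1e9
terabits_per_second = bits_per_second / 1e12

print(f"{gigabytes_per_second:.0f} GB/s")   # 160 GB/s
print(f"{terabits_per_second:.2f} Tbps")    # 1.28 Tbps
```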
-
@azonenberg I'm always kind of wary of silicon manufacturers' proprietary high-speed buses, because they teeeend to be slightly use-case-specific and don't deal well with edge cases outside that. Anyways, when I saw NVLink my first reaction was "wait, is this HyperTransport, but with expensive modern transceivers?"; it isn't, but my guess is that from a systems perspective, you'd be better off going for AMD's Infinity Fabric, which seems to make stronger coherence statements (not sure). But, you
-
@azonenberg are mostly only optimizing for nvidia GPUs anyways, so that might be a moot point.
-
@funkylab Well I mean I would *like* a ludicrously high bandwidth portable interface, but the vendors aren't building it.
Realistically, I think the best you can do portably today is 100GbE with RoCE.
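For scale, a quick sketch of how far a single 100GbE link falls short of the raw 1.28 Tbps figure from earlier in the thread:

```python
# How many saturated 100GbE links the raw 1.28 Tbps sample stream would need.
scope_raw_tbps = 1.28   # from the 40 Gsps x 4 ch x 8 bit figure above
link_gbps = 100
links_needed = scope_raw_tbps * 1000 / link_gbps
print(links_needed)  # 12.8 -> call it 13 fully saturated 100G links
```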
-
@azonenberg oh you *can* buy 800 Gb/s interfaces; I don't know what their host sides look like, if any exist for non-network-vendor stuff (this is mostly aggregated-traffic equipment, e.g. linking racks or DC 1 to DC 2)
-
@funkylab yeah exactly. 100G with a normal PCIe interface is available today; I have a 100G pipe to my desk and have saturated it with iperf in benchmarks.
And the NIC has RoCE offload capabilities, although I'm not using it yet.
-
@funkylab You can go all the way up to 800G if you have a host system with PCIe gen6 and sufficiently deep pockets (I do not)
-
@azonenberg so my Tek 11801C with its ability to connect to an external sampling head array might present a problem? Oh no wait, it's sampling too slow.
-
@scribblesonnapkins well, equivalent-time sampling is easy to handle with today's tech because the number of actual samples acquired per second is low.
Equally, a scope that acquires high-speed data and buffers it in memory before processing at a much slower rate is something we can handle today.
But the vision is to be able to do real-time, or at least lower-dead-time, processing at much higher data rates. ThunderScope almost maxes out 10GbE; my vision is to be able to keep up with 25/40/100G eventually.
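A rough sense of the ThunderScope number, assuming it streams about 1 Gsps of 8-bit samples in aggregate (my assumption, not stated in the thread):

```python
# Hypothetical figures: assumes ThunderScope streams ~1 Gsps of 8-bit
# samples in aggregate (an assumption, not confirmed above).
samples_per_sec = 1e9
bits_per_sample = 8
stream_gbps = samples_per_sec * bits_per_sample / 1e9
utilization_of_10gbe = stream_gbps / 10
print(f"{stream_gbps:.0f} Gbps -> {utilization_of_10gbe:.0%} of a 10GbE link")
# 8 Gbps -> 80% of a 10GbE link
```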
-
@azonenberg I doubt your pockets will be deep enough for NVlink things involving anything but GPUs

-
@funkylab oh I know, NVLink doesn't even let you get the PHY chiplets (the protocol itself is undocumented) unless you have NDAs and a partnership with Nvidia etc.
But I can dream...
-
@azonenberg I was assuming that (given infinite money) you could probably buy an Nvidia server platform that has network->VRAM piping (I assume this because I presume that's what Nvidia bought Mellanox for)
-
@funkylab That's where RoCE comes in.
But Ethernet today tops out at 800 Gbps, while the latest NVLink can do 14.4 Tbps.
-
@funkylab NVLink is the fantasy; the actually achievable real-world implementation is to make the scope speak RoCE, put a Mellanox NIC in the client, and RDMA the incoming Ethernet frames straight into VRAM.
But it still has to cross over PCIe and gets bottlenecked on that bandwidth.
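A rough sketch of that PCIe bottleneck, using raw x16 link rates and ignoring encoding and protocol overhead:

```python
# Where PCIe caps an RDMA-into-VRAM pipeline: raw per-direction throughput
# of a x16 link, ignoring encoding and protocol overhead.
LANES = 16
GT_PER_LANE = {"gen4": 16, "gen5": 32, "gen6": 64}  # GT/s per lane

per_direction_gbytes = {gen: gt * LANES / 8 for gen, gt in GT_PER_LANE.items()}
for gen, gbytes in per_direction_gbytes.items():
    print(f"PCIe {gen} x16: ~{gbytes:.0f} GB/s")
# Even gen6 x16 (~128 GB/s) sits below the 160 GB/s raw sample stream above.
```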
-
@azonenberg I was trying to make a joke with the first part, "Oh no wait, it's sampling too slow."
But the second part about the ThunderScope is cool.
-
@azonenberg How about CXL4? That claims 242GB/s and is at least designed for external connectivity.
-
@penguin42 If somebody makes a GPU with CXL I'll be all over it.
Until then I'm stuck with what I can get my hands on. Realistically, that's PCIe and RoCE
-
@azonenberg @funkylab I am curious what fields use oscilloscopes at the level you build and test for? I am guessing radio & perhaps medical? I've only ever used them for basic electronics back in the 90s, so the performance of your stuff is just stunning.
-
@CliffsEsport @funkylab My focus is mostly on the high speed digital side of things, so networking, high speed buses, etc. Modern digital interfaces are absurdly fast.
Even USB 3.0 was 5 Gbps per pair, and that's pretty slow compared to modern stuff. PCIe gen6 runs at 64 Gbps per lane.
DisplayPort goes up to 20 Gbps per lane now.
But understanding complex issues around these buses involves recording a lot of data, processing it fast, looking at packet captures and physical layer signal quality, etc. There's always room to crunch more data faster.