You know what would be really nice (but nobody is ever going to build)?
Oscilloscope that replaces the ungodly slow USB3/1000BASE-T PC interface port with NVLink.
Forget PCIe and Thunderbolt... 900 GB/s of bandwidth straight from the ADC to my GPU? Sign me up.
-
For comparison... my 16 GHz LeCroy oscilloscope puts out 40 Gsps * 4 channels * 8 bits of raw ADC samples, not counting the flatness corrections done in gateware/firmware.
That's 160 GB/s or 1.28 Tbps of raw samples.
That would even fit in NVLink 2.0 much less the current gen4/5 stuff.
Imagine four channels of 16 GHz bandwidth waveform data straight into a (very large) GPU nonstop... We'd have to do a hell of a lot of optimization to ngscopeclient to keep up and probably add multi-GPU support but it would be so much fun lol.
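The arithmetic above can be sketched in a few lines. The NVLink figures below are nominal aggregate per-GPU bandwidths as commonly quoted for each generation, from memory — treat them as assumptions, not datasheet values:

```python
# Back-of-envelope check: raw ADC output of a 4-channel, 40 Gsps, 8-bit
# scope against nominal aggregate per-GPU NVLink bandwidth (assumed figures).

SAMPLE_RATE = 40e9       # samples/s per channel
CHANNELS = 4
BITS_PER_SAMPLE = 8

bits_per_sec = SAMPLE_RATE * CHANNELS * BITS_PER_SAMPLE
bytes_per_sec = bits_per_sec / 8

print(f"raw stream: {bytes_per_sec / 1e9:.0f} GB/s = {bits_per_sec / 1e12:.2f} Tbps")
# -> raw stream: 160 GB/s = 1.28 Tbps

nvlink_gb_s = {"2.0 (V100)": 300, "3.0 (A100)": 600, "4 (H100)": 900}
for gen, bw in nvlink_gb_s.items():
    verdict = "fits" if bytes_per_sec / 1e9 <= bw else "does not fit"
    print(f"NVLink {gen}: {bw} GB/s -> {verdict}")
```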
-
@azonenberg I'm always kind of wary of silicon manufacturers' proprietary high-speed buses, because they tend to be slightly use-case-specific and don't deal well with edge cases outside that. Anyway, when I saw NVLink my first reaction was "wait, is this HyperTransport, but with expensive modern transceivers?"; it isn't, but my guess is that from a systems perspective you'd be better off going for AMD's Infinity Fabric, which seems to make stronger coherence statements (not sure). But you're mostly only optimizing for NVIDIA GPUs anyway, so that might be a moot point.
-
@funkylab Well, I mean, I would *like* a ludicrously high-bandwidth portable interface, but the vendors aren't building it.
Realistically, I think the best you can do portably today is 100GbE with RoCE.
-
@azonenberg oh, you *can* buy 800 Gb/s interfaces; I don't know what their host sides look like, if any exist for non-network-vendor stuff (this is mostly aggregated-traffic equipment, i.e. linking racks, or DC 1 to DC 2)
-
@funkylab Yeah, exactly. 100G with a normal PCIe interface is available today; I have a 100G pipe to my desk and have saturated it with iperf in benchmarks.
And the NIC has RoCE offload capabilities, although I'm not using it yet.
-
@funkylab You can go all the way up to 800G if you have a host system with PCIe gen6 and sufficiently deep pockets (I do not)
-
@azonenberg so my Tek 11801C, with its ability to connect to an external sampling head array, might present a problem? Oh no wait, it's sampling too slow.
-
@scribblesonnapkins Well, equivalent-time sampling is easy to handle with today's tech because the number of actual samples acquired per second is low.
Equally, a scope that acquires high-speed data and buffers it in memory before processing at a much slower rate is something we can handle today.
But the vision is to be able to do real-time, or at least lower-dead-time, processing at much higher data rates. ThunderScope almost maxes out 10GbE; my vision is to be able to keep up with 25/40/100G eventually.
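As a rough sketch of what "keeping up" means here, this is how many raw 8-bit samples per second fit through each link speed, ignoring Ethernet/IP/RoCE framing overhead (which costs a few percent in practice):

```python
# Rough capacity check: raw 8-bit samples/s that fit through each
# Ethernet link speed, ignoring protocol framing overhead.

BITS_PER_SAMPLE = 8

for gbps in (10, 25, 40, 100):
    gsps = gbps / BITS_PER_SAMPLE  # Gbit/s divided by bits/sample = Gsamples/s
    print(f"{gbps:>3}GbE -> ~{gsps:.2f} Gsps sustained")
```

A ThunderScope-class stream of roughly 1 Gsps at 8 bits is 8 Gbps, which is why it nearly saturates 10GbE; four 40 Gsps channels are far beyond even 100G.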
-
@azonenberg I doubt your pockets will be deep enough for NVLink things involving anything but GPUs
-
@funkylab Oh, I know. NVIDIA doesn't even let you get the NVLink PHY chiplets (the protocol itself is undocumented) unless you have NDAs and a partnership with them, etc.
But I can dream...
-
@azonenberg I was assuming that (given infinite money) you could probably buy an NVIDIA server platform that has network->VRAM piping (I assume this because I presume that's what NVIDIA bought Mellanox for)
-
@funkylab That's where RoCE comes in.
But Ethernet today tops out at 800 Gbps, while the latest NVLink can do 14.4 Tbps.
-
@funkylab NVLink is the fantasy; the actually achievable real-world implementation is to make the scope speak RoCE, put a Mellanox NIC in the client, and RDMA the incoming Ethernet frames straight into VRAM.
But it still has to cross PCIe and gets bottlenecked on that bandwidth.
-
@azonenberg I was trying to make a joke with the first part, "Oh no wait, it's sampling too slow."
But the second part about the ThunderScope is cool.
-
@azonenberg How about CXL4? That claims 242 GB/s and is at least designed for external connectivity.
-
@penguin42 If somebody makes a GPU with CXL, I'll be all over it.
Until then I'm stuck with what I can get my hands on. Realistically, that's PCIe and RoCE.
-
@azonenberg @funkylab I am curious what fields use oscilloscopes at the level you build and test for? I am guessing radio & perhaps medical? I've only ever used them for basic electronics back in the 90s, so the performance of your stuff is just stunning.