'Consider a system with no DRAM' replaced by a 'recycling fiber loop': John Carmack envisages bold future to avoid AI-driven RAM crisis

tal@lemmy.today · 5 days ago

I bet that if someone went to The Internet Archive, they could pay them to get timestamped snapshots of professionally-spidered stuff at zero load to the websites. I’m sure that it’d cost something for all the hard drives and probably something for labor, but so does spidering the whole Internet yourself. The people running the bots clearly have the funds available to run them at massive scale.

tal@lemmy.today · 6 days ago

looks at slides

I see where the anime catgirl logo that Anubis uses came from.

tal@lemmy.today · edit-2 6 days ago

What makes this worse is that git servers are the most pathologically vulnerable to the onslaught of doom from modern internet scrapers because remember, they click on every link on every page.

The especially disappointing thing is that, for the specific case that Xe was running into, a better-written scraper could just recognize that this is a public git repository and just git clone the thing and get all the useful code without the overhead. Like, it’s not even “this scraper is scraping data that I don’t want it to have”, but “this scraper is too dumb to just scrape the thing efficiently and is blowing both the scraper’s resources and the server’s resources downloading innumerable redundant copies of the data”.

It’s probably just as well, since the protection is relevant for other websites, and he probably wouldn’t have done it if he hadn’t been getting his git repo hammered, but…

EDIT: Plus, I bet that the scraper was requesting a ton of files at once from the server, since he said that it was unusable. Like, you have a zillion servers to parallelize requests over. You could write a scraper that requested one file at a time per server, which is common courtesy, and you’d still be bandwidth constrained if you’re schlorping up the whole Internet. Xe probably wouldn’t have even noticed.
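A minimal sketch of the "one request at a time per server" idea, in Python with asyncio. This is an illustration only: `polite_fetch_all` and the injected `fetch` coroutine are hypothetical names, and a real crawler would also want rate limits, robots.txt handling, and retries.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit


async def polite_fetch_all(urls, fetch):
    """Run fetch(url) for every URL, with at most one request in flight per host.

    `fetch` is any coroutine function taking a URL (hypothetical; plug in
    whatever HTTP client you actually use).
    """
    locks = defaultdict(asyncio.Lock)  # one lock per hostname, created on demand

    async def fetch_one(url):
        host = urlsplit(url).hostname
        async with locks[host]:  # serializes requests to the same host
            return await fetch(url)

    # Different hosts still proceed in parallel; results keep the input order.
    return await asyncio.gather(*(fetch_one(u) for u in urls))
```

Requests to different hosts still run concurrently, so total throughput stays high while each individual server only ever sees one connection from the scraper.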

tal@lemmy.today · 6 days ago

https://en.wikipedia.org/wiki/National_Helium_Reserve

The National Helium Reserve, also known as the Federal Helium Reserve, was a strategic reserve of the United States, which once held over 1 billion cubic meters (about 170,000,000 kg)[a] of helium gas.

The Bureau of Land Management (BLM) transferred the reserve to the General Services Administration (GSA) as surplus property, but a 2022 auction[10] failed to finalize a sale.[11] On June 22, 2023, the GSA announced a new auction of the facilities and remaining helium.[12] The auction of the last helium assets was due to take place in November, 2023.[13] Though the last of the Cliffside reserve was to be sold by November 2023, more natural gas was discovered at the site than was previously known, and the Bureau of Land Management extended the auction to January 25, 2024 to allow for increased bids.[14] In 2024 the remaining reserve was sold to the highest bidder, Messer Group.[15]

Arguably not the best timing on that.

tal@lemmy.today · 6 days ago

Sure. What that guy is using is actually not the most-interesting diagram style, IMHO, for automatic layout of network maps, if you want large-scale stuff, which is where the automatic layout gets more interesting. I have some scripts floating around somewhere that will generate very large network maps — run a bunch of traceroutes, geolocate IPs, dump the results into an sqlite database, and then generate an automatically laid-out Internet network map. I don’t want to go to the trouble of anonymizing the addresses and locations right now, but if you have a graphviz graph and want to try playing with it, I used:

goes looking

Ugh, it’s Python 2, a decade-and-a-half old, and never got ported to Python 3. Lemme gin up an example for the non-hierarchical graphviz stuff:

graph.dot:

graph foo {
    a--b
    a--d
    b--c
    d--e
    c--e
    e--f
    b--d
}

Processed with:

$ sfdp -Goverlap=prism -Gsep=+5 -Gesep=+4 -Gremincross -Gpack -Gsplines=true -Tpdf -o graph.pdf graph.dot

Generates something like this:

That’ll take a ton of graphviz edges and nicely lay them out while trying to avoid crossing edges and stuff, in a non-hierarchical map. Get more complicated maps that it can’t use direct lines on, it’ll use splines to curve lines around nodes. You can create massive network maps like this. Note that I was last looking at graphviz’s automated layout stuff about 15 years ago, so it’s possible that they have better layout algorithms now, but this can deal with enormous numbers of nodes and will do reasonable things with them.

I just grabbed his example because it was the first graphviz network map example that came up on a Web search.
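For the "traceroutes in, graphviz out" step described above, here is a rough Python 3 sketch. It is not the old Python 2 script, just an assumed shape for it: `paths_to_dot` is a hypothetical helper that takes lists of hop addresses and emits an undirected graph in dot format, with duplicate links collapsed.

```python
def paths_to_dot(paths, name="netmap"):
    """Turn traceroute-style hop lists into an undirected graphviz graph.

    `paths` is an iterable of hop lists, e.g. [["10.0.0.1", "192.0.2.1"], ...].
    Each pair of consecutive hops becomes one edge; duplicates are dropped.
    """
    edges = set()
    for hops in paths:
        for a, b in zip(hops, hops[1:]):
            if a != b:
                # Sort the endpoints so a--b and b--a count as the same edge.
                edges.add(tuple(sorted((a, b))))

    lines = [f"graph {name} {{"]
    for a, b in sorted(edges):
        # Quote node names: IP addresses are not valid bare dot identifiers.
        lines.append(f'    "{a}" -- "{b}"')
    lines.append("}")
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file such as graph.dot and fed to the sfdp command shown earlier.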

tal@lemmy.today · 6 days ago

We faced an unprecedented bot problem

When the Digg beta launched, we immediately noticed posts from SEO spammers noting that Digg still carried meaningful Google link authority. Within hours, we got a taste of what we’d only heard rumors about. The internet is now populated, in meaningful part, by sophisticated AI agents and automated accounts. We knew bots were part of the landscape, but we didn’t appreciate the scale, sophistication, or speed at which they’d find us. We banned tens of thousands of accounts. We deployed internal tooling and industry-standard external vendors. None of it was enough. When you can’t trust that the votes, the comments, and the engagement you’re seeing are real, you’ve lost the foundation a community platform is built on.

This isn’t just a Digg problem. It’s an internet problem. But it hit us harder because trust is the product.

It’s a social media problem. It’s going to be hard to provide pseudonymity and low-cost accounts relatively freely while also countering bots that spam the system to manipulate it. The model worked well in an era before very human-like bots were easy to produce.

It might be possible to build webs of trust with pseudonyms. You can make a new pseudonym, but its influence and visibility get tied to, for example, what the users or curators that you trust themselves trust, so the pseudonym has less weight until it acquires reputation. I do not think that a single global trust “score” will work, because you can always have bot webs of trust.
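To make the per-viewer idea concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration (the `personal_trust` name, the decay factor, the hop limit); it is not how any existing platform scores accounts. Trust is computed from one viewer's point of view and fades with each hop, so a cluster of bots vouching for each other carries no weight unless someone the viewer trusts links into it.

```python
def personal_trust(trusts, viewer, decay=0.5, max_hops=3):
    """Compute trust scores as seen by one viewer.

    `trusts` maps each user to the users they trust directly.
    Scores start at 1.0 for the viewer and are multiplied by `decay`
    per hop; users not reachable within `max_hops` get no score at all.
    """
    scores = {viewer: 1.0}
    frontier = [viewer]
    for _ in range(max_hops):
        next_frontier = []
        for user in frontier:
            for other in trusts.get(user, ()):
                if other not in scores:  # first visit = shortest path = best score
                    scores[other] = scores[user] * decay
                    next_frontier.append(other)
        frontier = next_frontier
    return scores
```

With `trusts = {"me": ["alice"], "alice": ["bob"], "bot1": ["bot2"], "bot2": ["bot1"]}`, the viewer "me" sees alice and bob with decreasing weight, while bot1 and bot2 never appear in the result, however much they trust each other.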

Unfortunately, the tools to unmask pseudonyms are also getting better, and throwing away pseudonyms occasionally or using more of them is one of the reasonable counters to unmasking, and that doesn’t play well with relying more on reputation.

tal@lemmy.today · 6 days ago

So that when setting up a new system, you can migrate all your user configuration easily, while also version-controlling it.

tal@lemmy.today · 6 days ago

The old lighting wasn’t that great anyway. If I were to just put lighting on a DMX512-controlled network, then all of it could be synchronized to whole-house audio…

tal@lemmy.today · edit-2 6 days ago

It seems like a good time to learn graphviz’s dot format for the network layout diagrams, with automated layout.

https://mamchenkov.net/wordpress/2015/08/20/graphviz-dot-erds-network-diagrams/

tal@lemmy.today · 6 days ago

Probably a good idea to switch over to WPA-Enterprise using Authentik’s RADIUS server support and let all of the users of your wireless access point log in with their own network credentials, while you’re at it.

tal@lemmy.today · 7 days ago

You have all your devices attached to a console server, with a serial console set up on each device’s serial port and, if they support accessing the BIOS via a serial console, that enabled too so that you can reach it remotely, right? Either a dedicated hardware console server, or some server on your network with a multiport serial card or a USB to multiport serial adapter or something like that, right? So that if networking fails on one of those other devices, you can fire up minicom or similar on the serial console server and get into the device and fix whatever’s broken?

Oh, you don’t. Well, that’s probably okay. I mean, you probably won’t lose networking on those devices.

tal@lemmy.today · 7 days ago

You have remote power management set up for the systems in your homelab, right? A server set up that you can reach to power-cycle other servers, so that if they wedge in some unusable state and you can’t be physically there, you can still reboot them? A managed/smart PDU or something like that? Something like one of these guys?

Oh. You don’t. Well, that’s probably okay. I mean, probably nothing will go wrong and leave a device in need of being forcibly rebooted when you’re physically away from home.

tal@lemmy.today · 7 days ago

You have squid or some other forward http proxy set up to share a cache among all the devices on your network set up to access the Web, to minimize duplicate traffic?

And you have a shared caching DNS server set up locally, something like BIND?

Oh. You don’t. Well, that’s probably okay. I mean, it probably doesn’t matter that your devices are pulling duplicate copies of data down. Not everyone can have a network that minimizes latency and avoids inefficiency across devices.

tal@lemmy.today · 7 days ago

All of those systems in your homelab…they aren’t all pulling down their updates multiple times over your network link, right? You’re making use of a network-wide cache? For Debian-family systems, something like Apt-Cacher NG?

Oh. You’re not. Well, that’s probably okay. I mean, not everyone can have their environment optimized to minimize network traffic.

tal@lemmy.today · 7 days ago

You have an intrusion detection system set up, right? A server watching your network’s traffic, looking for signs that systems on your network have been compromised, and to warn you? Snort or something like that?

Oh. You don’t. Well, that’s probably okay. I mean, probably nothing on your network has been compromised. And probably nothing in the future will be.

tal@lemmy.today · 7 days ago

All of your systems are set up, but are they capable of being redeployed using a configuration management software package? Ansible or something like that?

Oh. They’re not. Well, that’s probably okay. I mean, you could probably go manually reproduce configurations, more or less.

tal@lemmy.today · 7 days ago

logging is probably down

You do, of course, have a dedicated rsyslogd server? An isolated system to which logs are sent, so that if someone compromises another one of your systems, they can’t wipe traces of that compromise from those systems?

Oh. You don’t. Well, that’s okay. Not every lab can be complete. That Raspberry Pi over there in the corner isn’t actually doing anything, but it’s probably happy where it is. You know, being off, not doing anything.
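For what it's worth, pointing a Python service at a central log host is only a few lines with the standard library. A minimal sketch, assuming the log server accepts plain syslog over UDP on port 514 (that has to be enabled on the rsyslog side); `make_remote_logger` and the host address are placeholders, not anything from a real setup:

```python
import logging
import logging.handlers


def make_remote_logger(name, host, port=514):
    """Return a logger that forwards records to a remote syslog server.

    Uses SysLogHandler's default transport (UDP), so nothing here
    confirms that the log host actually received the message.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Something like `make_remote_logger("backup-job", "192.0.2.10").info("nightly backup finished")` would then send the message off-box as well as wherever else the logger is configured to write.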

tal@lemmy.today · 8 days ago

OpenAI exec highlights the rising importance of AI compute in tech job compensation.

In other news, Roblox executive thinks that having companies pay employees partly in Robux would be a great idea.

tal@lemmy.today · edit-2 12 days ago

Neural net computation has predictable access patterns, so instead of using the thing as a random access memory with latency incurred by waiting for the bit you want to get around to you, I expect that you can load the memory appropriately such that you always have the appropriate bit showing up at the time you need it. I’d guess that it probably needs something like the ability to buffer a small amount of data to get and keep multiple fiber coils in sync, since thermal expansion changes their lengths.

The Hacker’s Jargon File has an anecdote about doing something akin to that with drum memory, “The Story of Mel”.

http://www.catb.org/~esr/jargon/html/story-of-mel.html
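For a sense of scale on the fiber loop idea, the amount of data "stored" in the loop is just bandwidth times the time light takes to go around once. A back-of-the-envelope sketch in Python; the 200 km length, 256 Tb/s rate, and refractive index of about 1.468 are illustrative assumptions, not figures from this thread:

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum


def fiber_loop(length_m, bits_per_second, refractive_index=1.468):
    """Return (lap_time_s, bytes_in_flight) for a recirculating fiber loop.

    Light in glass travels at c / refractive_index, so one lap takes
    length * n / c seconds; everything transmitted during that time is
    "in flight" and acts as the loop's storage capacity.
    """
    lap_time_s = length_m * refractive_index / C_VACUUM_M_PER_S
    bytes_in_flight = bits_per_second * lap_time_s / 8
    return lap_time_s, bytes_in_flight


lap, capacity = fiber_loop(200_000, 256e12)
print(f"lap time {lap * 1e3:.2f} ms, capacity {capacity / 1e9:.1f} GB")
```

Under those assumptions a lap takes roughly a millisecond and the loop holds around 30 GB. The lap time is also the worst-case wait for a random access, which is why arranging the data so it arrives in the order it is needed matters so much.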

tal@lemmy.today · 12 days ago

I’m assuming that the point is the bandwidth.

goes looking for HBM bandwidth

https://en.wikipedia.org/wiki/High_Bandwidth_Memory

It says that HBM 4, which came out one year ago, can do 2 TiB/s.

tal@lemmy.today · 12 days ago

'Consider a system with no DRAM' replaced by a 'recycling fiber loop': John Carmack envisages bold future to avoid AI-driven RAM crisis

tal@lemmy.today · 14 days ago

It might be a good thing for the Internet to get intrinsic resistance to DDoS attacks.