Recent Events for foo.be MainPageDiary (Blog)

FeedCollection

Justin Mason

2025-03-11

  • 13:00 UTC Arguments about AI summarisation: This is from a W3C discussion thread where AI summarisation and minuting of meetings was proposed, and it lays out some interesting issues with LLM summarisation:

    “Sure, I’m as excited about new tech as the next person, but I want to express my concerns (sorry to point out some elephants in the room):

    Ethics – major large language models rely on stolen training data, and they use low-wage workers to ‘train’ them, at the expense of the well-being of those workers.

    Environment – apart from the raw-material usage that comes with an increase in processing power, LLMs use a lot more energy and water than human scribes and summarisers do (both during training and at point of use). Magnitudes more, not negligible, such that major tech cos are building/buying nuclear power plants and areas near data centres suffer from water shortages and price hikes. Can we improve disability rights while disregarding environmental effects?

    Quality – we’ve got a lot of experts in our group: who are sometimes wrong, sure, but it seems like a disservice to their input, knowledge and expertise to pipe their speech through LLMs. From the couple of groups I’ve been in that used AI summaries, I’ve seen them: a. miss the point a lot of the time (it looks reasonable but doesn’t match up with what people said/meant); b. ‘normalise’ what was said to what most people would say, so it biases towards what’s more common in training data rather than towards the smart things individuals in this group often bring up (normalising seems orthogonal to innovation?); c. create summaries that are either very long and woolly, with many unnecessary words, or short but incorrect.

    If we’re considering whether it’s technically possible, I’d urge us to consider the problems with these systems too, including in ethics, environmental impact and quality.”

    The “normalising” risk is one that hadn’t occurred to me, but it makes perfect sense given how LLMs operate. Tags: llms ai summarisation w3c discussion meetings automation transcription

2025-03-07

  • 17:00 UTC Fighting the AI scraperbot scourge: how LWN are dealing with the flood of AI scrapers driven by botnets. Tags: botnets scrapers ai llms web lwn ops
  • 10:00 UTC AWS WAF adds JA4 fingerprinting. TIL: “A JA4 TLS client fingerprint contains a 36-character fingerprint of the TLS Client Hello, which is used to initiate a secure connection from clients. The fingerprint can be used to build a database of known good and bad actors to apply when inspecting HTTP[S] requests. These new features enhance your ability to identify and mitigate sophisticated attacks by creating more precise rules based on client behavior patterns. By leveraging both JA4 and JA3 fingerprinting capabilities, you can implement robust protection against automated threats while maintaining legitimate traffic flow to your applications.” A small classification sketch follows below. Tags: fingerprinting http https tls ja3 ja4 inspection networking firewalls waf web
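
A small illustrative Python sketch of the “database of known good and bad actors” idea mentioned above. All fingerprint values are made-up placeholders, and in AWS WAF itself this logic would be expressed as a rule matching the JA4 field rather than as application code:

    # Illustrative only: classify a request by looking up its JA4 string
    # in known-good / known-bad sets. Fingerprints below are placeholders.
    DENY_LIST = {"t13d1516h2_8daaf6152771_e5627efa2ab1"}    # hypothetical bad bots
    ALLOW_LIST = {"t13d0312h2_55b375c5d22e_06cda9e17597"}   # hypothetical known-good

    def classify(ja4: str) -> str:
        assert len(ja4) == 36, "a JA4 string is 36 characters, underscores included"
        ja4_a, ja4_b, ja4_c = ja4.split("_")   # ClientHello summary + two hashes
        if ja4 in DENY_LIST:
            return "block"
        if ja4 in ALLOW_LIST:
            return "allow"
        return "inspect"   # unknown: fall back to rate limits, challenges, etc.

    print(classify("t13d1516h2_8daaf6152771_e5627efa2ab1"))   # -> block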

2025-03-06

  • 15:50 UTC WireMock: I could have done with knowing about this before implementing mock APNs, Huawei, Microsoft and FCM push APIs over the last few years! “An open-source tool for API mock testing, with over 5 million downloads per month. It can help you to create stable test and development environments, isolate yourself from flaky 3rd parties and simulate APIs that don’t exist yet.” Nice features include running in-process in a JVM, standalone, or in a Docker container; GraphQL and gRPC support; and fault and latency injection. https://library.wiremock.org/ is a library of pre-built API mocks other people have previously made. See the admin-API sketch after this list. Tags: mocking testing mocks integration-testing wiremock tools coding apis
  • 11:30 UTC KIP-932: Queues for Kafka. KIP-932 adds a long-awaited capability to the Apache Kafka project: queue-like semantics, including the ability to acknowledge messages one by one. This positions Kafka for use cases such as job queuing, for which it hasn’t historically been a good fit. “As multiple members of a share group can process the messages from a single topic partition, the partition count no longer limits the degree of consumer parallelism. The number of consumers in a group can quickly be increased and decreased as needed, without having to repartition the topic. […] Available as an early-access feature as of the [unreleased] Kafka 4.0 release, Kafka queues are not recommended for production usage yet, and there are several limitations worth calling out: most importantly, the lack of DLQ support. More control over retry timing would be desirable, too. As such, I don’t think Kafka queues in their current form will make users of established queue solutions such as Artemis or RabbitMQ migrate to Kafka. It is a very useful addition to the Kafka feature set nevertheless, coming in handy for instance for teams already running Kafka who are looking for a solution for simple queuing use cases, avoiding having to stand up and operate a separate system just for these. This story will become even more compelling if the feature gets built out and improved in future Kafka releases.” A share-group simulation follows after this list. Tags: kafka queueing queues architecture
  • 10:40 UTC HAJPAQUE: “Hardware Acceleration for JSON Parsing, Querying and Schema Validation” — “State-of-the-art analytics pipelines can now process data at a rate that exceeds 50 Gbps owing to recent advances in RDMA, NVM, and network technology (notably Infiniband). The peak throughput of the best-performing software solutions for parsing, querying, and validating JSON data is 20 Gbps, which is far lower than the current requirement. We propose a novel [hardware]-based accelerator that ingests 16 bytes of JSON data at a time and processes all 16 bytes in parallel, as opposed to competing approaches that process such data byte by byte. Our novel solution comprises lookup tables, parallel sliding windows, and recursive computation. Together, they ensure that our online pipeline does not encounter any stalls while performing all the operations on JSON data. We ran experiments on several widely used JSON benchmarks/datasets and demonstrated that we can parse and query JSON data at 106 Gbps (@28 nm).” (Via Rob.) A toy sketch of the chunked-scanning idea follows after this list. Tags: accelerators papers asics json parsing throughput performance via:rsynnott
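
For the WireMock entry above: a minimal Python sketch that registers a stub through the standalone server’s JSON admin API, assuming WireMock is listening on localhost:8080 (e.g. started with `docker run -p 8080:8080 wiremock/wiremock`). The /v1/push endpoint and response body are invented for the example; fixedDelayMilliseconds is the latency-injection field mentioned above.

    # Minimal sketch: stub out a hypothetical push API on a local WireMock server.
    import requests

    stub = {
        "request": {"method": "POST", "url": "/v1/push"},
        "response": {
            "status": 200,
            "jsonBody": {"result": "accepted"},
            "headers": {"Content-Type": "application/json"},
            "fixedDelayMilliseconds": 250,   # simulate a slow third-party push API
        },
    }
    # Register the stub via WireMock's admin API...
    requests.post("http://localhost:8080/__admin/mappings", json=stub).raise_for_status()
    # ...then the code under test gets a canned, delayed response:
    print(requests.post("http://localhost:8080/v1/push", json={"token": "x"}).json())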
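
For the KIP-932 entry: none of the Python Kafka clients expose share groups yet, so the sketch below is only an in-memory simulation of the semantics described above: several consumers draw records from one “partition” and acknowledge them individually, which classic consumer groups cannot do.

    # Conceptual simulation of share-group semantics -- NOT the Kafka API.
    import queue
    import threading

    partition = queue.Queue()              # stand-in for a single topic partition
    for i in range(10):
        partition.put(f"record-{i}")

    def share_consumer(name: str) -> None:
        while True:
            try:
                record = partition.get(timeout=0.5)   # any group member may take it
            except queue.Empty:
                return
            print(f"{name} processed {record}")
            partition.task_done()                     # per-record acknowledgement

    # Three consumers on ONE partition: parallelism beyond the partition count.
    threads = [threading.Thread(target=share_consumer, args=(f"c{n}",)) for n in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()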
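
And for HAJPAQUE: the paper’s accelerator is silicon, but the core trick it shares with SIMD JSON parsers (examine a whole 16-byte block per step and mark the structural characters) can be sketched in Python. The bitmaps here are computed byte by byte; hardware performs the 16 comparisons simultaneously.

    # Toy sketch of chunked structural-character scanning (the simdjson-style idea).
    STRUCTURAL = set(b'{}[]:,"')   # bytes that shape a JSON document

    def structural_bitmaps(data: bytes, width: int = 16):
        """Yield (offset, bitmap): bit i is set when chunk byte i is structural."""
        for off in range(0, len(data), width):
            bits = 0
            for i, b in enumerate(data[off:off + width]):
                if b in STRUCTURAL:        # hardware evaluates all 16 at once
                    bits |= 1 << i
            yield off, bits

    doc = b'{"name": "kcat", "tags": ["kafka", "cli"]}'
    for off, bits in structural_bitmaps(doc):
        print(f"offset {off:3d}: {bits:016b}")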

2025-03-05

  • 11:10 UTC The history behind “Assassin’s Creed: Valhalla”: History Hit, the UK historical podcast company, are recording a podcast in which they dig into the extensive historical background used in the various “Assassin’s Creed” videogames. This episode digs into the history which animates “Assassin’s Creed: Valhalla”, set in Britain and Ireland around 800-900 CE during the time of the Great Heathen Army’s invasion, and it’s fascinating stuff. Tags: history podcasts ireland britain vikings assassins-creed videogames games

2025-03-04

  • 11:50 UTC kafkacat and visidata: Two excellent tools in one blog post. VisiData “is a commandline tool to work with data in all sorts of formats, including from stdin”; in this example it’s taking lines of JSONL and producing an instant histogram of values from the stream: “Once visidata is open, use the arrow keys to move to the column on which you want to build a histogram and press Shift-F.” Since it works with pipes, if you leave the -e off the kafkacat invocation you get a live stream of messages from the Kafka topic, and VisiData will continue to update as messages arrive (although I think you need to replot the histogram if you want it to refresh). On top of that, there’s kcat, “netcat for Kafka”, “a swiss-army knife of tools for inspecting and creating data in Kafka”, even supporting on-the-fly decode of Avro messages: https://github.com/edenhill/kcat. An example pipeline follows below. Tags: kcat kafka streams visidata tools cli avro debugging
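
A sketch of the pipeline the post describes; the broker address and topic name are placeholders:

    # Consume a topic to its end (-e) and pivot the JSONL into VisiData;
    # drop -e to keep the stream live so the sheet updates as messages arrive.
    kcat -b localhost:9092 -t mytopic -C -e | vd -f jsonl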

2025-03-03

  • 18:00 UTC Answers for AWS Survey for 2025: the most-used AWS services; mainly SNS and SQS, and everyone hates Jenkins. Tags: aws sqs sns architecture cloud-computing surveys
  • 18:00 UTC Ruff: “An extremely fast Python linter and code formatter, written in Rust. Ruff aims to be orders of magnitude faster than alternative tools while integrating more functionality behind a single, common interface. Ruff can be used to replace Flake8 (plus dozens of plugins), Black, isort, pydocstyle, pyupgrade, autoflake, and more, all while executing tens or hundreds of times faster than any individual tool.” Typical commands follow below. Tags: formatting coding python tools lint code
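
For reference, typical Ruff usage from the shell; check and format are its two main subcommands, and the comments note which tools each replaces:

    pip install ruff
    ruff check --fix .    # lint with autofixes (covers Flake8/isort/pyupgrade-style rules)
    ruff format .         # Black-compatible code formatting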

Paul Graham