February 19, 2018
February 17, 2018
February 15, 2018
Be ready to love : #Devuan ASCII 2.0.0-beta is out!
So what's new in Devuan 2.0 ASCII Beta?
- #OpenRC is installable using the expert install path (thanks Maemo Leste!)
- eudev has replaced systemd-udev (thanks Gentoo!)
- elogind has been added as an alternative to consolekit (thanks Gentoo!)
- Desktop users can choose among fully functional XFCE (Default), KDE, Cinnamon, LXQT, MATE, and LXDE desktops
- CLI-oriented users can select the "Console productivity" task that installs a fully-featured set of console-based utils and tools.
- A .vdi disk image is now provided for use with VirtualBox.
- ARM board kernels have been updated to 4.14 and 4.15 for most boards.
An annotated digest of the top "Hacker" "News" posts for the second week of February, 2018.
February 08, 2018 (comments)
Google is excited about the progress they've made in their unending quest to ensure that trustworthiness is a quality that can only be ascribed by Google. Hackernews is disturbed at the idea that someone might be running arbitrary code in a web browser without HTTPS, which is the only possible way to determine if code should be run. One Hackernews figures out that browser vendors have effectively become the gatekeepers of all internet access, but everyone decides that there's nobody else qualified to do the job.
February 09, 2018 (comments)
A ragtag band of internet drifters releases some kind of weird offline YouTube clone that doesn't even have a messaging app built in. Hackernews celebrates the duration of the project development, but is miffed that it doesn't seem to work with Apple TV.
February 10, 2018 (comments)
An Internet posts an oral history of one of Google's legendary project management successes. Starting from open protocols and widely-understood software, Google slowly and steadily replaced every aspect of the program with unusable garbage. Hackernews spends a while disagreeing about which of Google's eighteen indistinguishable chat programs is best, then proceeds to list every single communications-related piece of software ever deployed. Everyone agrees that anything is better than email, which is why nobody on Hackernews has ever had an email account.
February 11, 2018 (comments)
A webshit lectures us about problems with keeping track of user accounts, almost all of which were directly caused by the garbage software the webshit has chosen to embrace. Border skirmishes break out between the Schlaueste Demokratische Republik and Союз Стандартизованных Социалистических Республик. When arguing about e-mail address parsing gets tiresome, Hackernews switches to optimizing InnoDB layout for looking up user IDs. One Hackernews writes a doctoral thesis about the injustice of unique usernames, and then has to defend it against a panel of Hackernews who are incapable of imagining any other approach.
February 12, 2018 (comments)
Facebook, a webshit founded on the idea that everyone on Earth should upload all of their personal information to Facebook, begins development on a special edition of its software for the German market. Hackernews is fully aroused, since the only thing more alluring than bikeshedding a legal issue is bikeshedding a foreign legal issue. Even more delicious: bikeshedding hypothetical legal issues that might arise after the instantiation of an upcoming new law! When the orgy dies down, Hackernews lights a Marlboro and engages in pillow talk about how foolish the German government must be to clash with an organization with as many lawyers as Facebook has.
February 13, 2018 (comments)
It is a period of civil war. Rebel Hackernews, striking from a hidden browser, have experienced their first disagreement with the evil Google Empire. During the comment thread, rebel Hackernews managed to object to secret plans for Google's ultimate weapon, AMP, a webshit protocol with enough lock-in to destroy an entire internet. Pursued by the Empire's sinister agents, rebel Hackernews race online aboard their Macbooks, custodians of the stubborn objections that can save their protocols and restore freedom to the internet.....
February 14, 2018 (comments)
A webshit posts a summary of that one week in high school your physics teacher talked about audio. Hackernews is so mesmerized by the moving colors and pretty sounds that they can barely incorrect each other about digital signal processing, so they spend some time trading webshit school lessons instead.
February 14, 2018
February 13, 2018
Two unforgettable days with the students of the Faculté de Génie Industriel at the Université de Douala in Cameroon.
February 12, 2018
February 11, 2018
This is an experiment we made some months ago.
It’s about a damaged EPROM. It was inserted backwards into its socket, so it got burned. We saw it and we had an idea…
The EPROM had been cracked open by the owner. A microscope image showed us that the connections to +Vcc and ground were interrupted, so the EPROM wasn’t recognized by our instruments.
Our idea was to try reconnecting the pins to the pads on the silicon die by applying two tiny wires.
The result wasn’t good on this EPROM: we could read it, but the content was blanked out, maybe because of the current spike caused by the wrong insertion. We think it is completely damaged.
So we tried the same repair on another purposely damaged EPROM… and we had a success: we made the pin connection, the EPROM was successfully read, and we were also able to reprogram it.
So we think this kind of data recovery from a burned EPROM is sometimes worthwhile.
February 08, 2018
An annotated digest of the top "Hacker" "News" posts for the first week of February, 2018.
February 01, 2018 (comments)
Mozilla half-asses another fundamental privacy feature. No explanation is given as to why users would want this protection only in specific circumstances, instead of, for instance, at all times everywhere. Hackernews initially wants this feature enabled full-time, but slowly realizes that this would deprive all their employers of delicious customer data. After some debate, they decide that advertising agencies are insurmountable opponents and only the government can save them.
February 02, 2018 (comments)
An Internet posts a deep dive into a supremely creepy hobby. Hackernews has a nice chuckle at the people who think creepy hobbies are anything but the natural progression of human society toward the ultimate utopia where nobody can trust any of their senses.
February 03, 2018 (comments)
A webshit guesses about how a web browser works and complains that the ad agency which makes the browser isn't helpful enough about blocking ads. Hackernews writes, and then bikesheds, science fiction about possible malfeasance on the part of the ad agency. Another Hackernews figures out that the ad agency is only enabling the blocker on sites that don't comply with the "recommendations" of a cartel operated by the ad agency.
February 04, 2018 (comments)
A webshit posts a brief history of the crayons that webshits use to scribble on your browser. Hackernews, unsatisfied with the pedantry of the original article, argues about the etymology of various keywords. The rest of the comments are various Hackernews expressing relief that CSS has evolved past all of this box-model garbage to the platonic ideal of just using grids (née tables) for everything.
February 05, 2018 (comments)
The United States government continues the war against its own users. Hackernews is utterly outraged at the idea that some corporation somewhere can track and monitor their activity without express consent and then aggregate that data and then market predictive services to third parties based on that data even if that corporation is not based in Silicon Valley.
February 06, 2018 (comments)
Tesla finally launches a product as advertised. Hackernews, based on Youtube videos, reverse-engineers a spacecraft down to the metallurgical level and then sagely debates the maintenance characteristics of an aircraft that exited service before any of them learned to read and none of them have seen in operation, mostly derived from blog posts by people they're pretty sure knew a guy.
February 07, 2018 (comments)
February 07, 2018
Ahoy, programming-language tinkerfolk! Today's rambling missive chews the gnarly bones of "inline caches", in general but also with particular respect to the Guile implementation of Scheme. First, a little intro.
Inline caches are a language implementation technique used to accelerate polymorphic dispatch. Let's dive in to that.
By implementation technique, I mean that the technique applies to the language compiler and runtime, rather than to the semantics of the language itself. The effects on the language do exist though in an indirect way, in the sense that inline caches can make some operations faster and therefore more common. Eventually inline caches can affect what users expect out of a language and what kinds of programs they write.
Consider strings. If you want to access a string's codepoints by index, then you want an array. But if the codepoints are all below 256, maybe you should represent them as bytes to save space, whereas maybe instead as 4-byte codepoints otherwise? Or maybe even UTF-8 with a codepoint index side table.
The right representation (form) of a string depends on the myriad ways that the string might be used. The string-append operation is polymorphic, in the sense that the precise code for the operator depends on the representation of the operands -- despite the fact that the meaning of string-append is monomorphic!
Anyway, that's the problem. Before inline caches came along, there were two solutions: callouts and open-coding. Both were bad in similar ways. A callout is where the compiler generates a call to a generic runtime routine. The runtime routine will be able to handle all the myriad forms and combination of forms of the operands. This works fine but can be a bit slow, as all callouts for a given operator (e.g. string-append) dispatch to a single routine for the whole program, so they don't get to optimize for any particular call site.
One tempting thing for compiler writers to do is to effectively inline the string-append operation into each of its call sites. This is "open-coding" (in the terminology of the early Lisp implementations like MACLISP). The advantage here is that maybe the compiler knows something about one or more of the operands, so it can eliminate some cases, effectively performing some compile-time specialization. But this is a limited technique; one could argue that the whole point of polymorphism is to allow for generic operations on generic data, so you rarely have compile-time invariants that can allow you to specialize. Open-coding of polymorphic operations instead leads to code bloat, as the string-append operation is just so many copies of the same thing.
However, for any given reference y.z in the source code, there is a finite set of concrete representations of y that will actually flow to that call site at run-time. Inline caches allow the language implementation to specialize the y.z access for its particular call site. For example, at some point in the evaluation of a program, y may be seen to have representation R1 or R2. For R1, the z property may be stored at offset 3 within the object's storage, and for R2 it might be at offset 4. The inline cache is a bit of specialized code that compares the type of the object being accessed against R1, in that case returning the value at offset 3; otherwise against R2, returning the value at offset 4; and otherwise falling back to a generic routine. If this isn't clear to you, Vyacheslav Egorov wrote a fine article describing and implementing the object representation optimizations enabled by inline caches.
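The behavior of such a property-access IC can be sketched in Python (a toy simulation of the idea, not Guile's or V8's actual machinery; the shape names R1/R2 and the offsets come from the example above):

```python
# Toy model: an object is a (shape, storage) pair; the shape determines
# at which offset each named property lives in the flat storage.
SHAPES = {
    "R1": {"z": 3},  # under representation R1, property z is at offset 3
    "R2": {"z": 4},  # under representation R2, property z is at offset 4
}

def make_property_ic(prop):
    """One inline cache per call site: remembers shape -> offset."""
    cache = {}

    def access(obj):
        shape, storage = obj
        offset = cache.get(shape)
        if offset is None:               # miss: fall back to the generic path
            offset = SHAPES[shape][prop]
            cache[shape] = offset        # ...and specialize for next time
        return storage[offset]

    return access

ic = make_property_ic("z")               # the IC for some particular y.z site
r1 = ("R1", [0, 0, 0, "val-r1"])         # z lives at offset 3
r2 = ("R2", [0, 0, 0, 0, "val-r2"])      # z lives at offset 4
assert ic(r1) == "val-r1" and ic(r2) == "val-r2"
```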
Inline caches also serve as input data to later stages of an adaptive compiler, allowing the compiler to selectively inline (open-code) only those cases that are appropriate to values actually seen at any given call site.
The classic formulation of inline caches from Self and early V8 actually patched the code being executed. An inline cache might be allocated at address 0xcabba9e5 and the code emitted for its call-site would be jmp 0xcabba9e5. If the inline cache ended up bottoming out to the generic routine, a new inline cache would be generated that added an implementation appropriate to the newly seen "form" of the operands and the call-site. Let's say that new IC (inline cache) would have the address 0x900db334. Early versions of V8 would actually patch the machine code at the call-site to be jmp 0x900db334 instead of jmp 0xcabba9e5.
Patching machine code has a number of disadvantages, though. It is inherently target-specific: you will need different strategies to patch x86-64 and armv7 machine code. It's also expensive: you have to flush the instruction cache after the patch, which slows you down. That is, of course, if you are allowed to patch executable code at all; on many systems that's impossible. And writable machine code is a potential attack surface on any system that may be exposed to remote code execution.
For all of these reasons, the modern take on inline caches is to implement them as a memory location that can be atomically modified. The call site is just jmp *loc, as if it were a virtual method call. Modern CPUs have "branch target buffers" that predict the target of these indirect branches with very high accuracy so that the indirect jump does not become a pipeline stall. (What does this mean in the face of the Spectre v2 vulnerabilities? Sadly, God only knows at this point. Saddest panda.)
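In higher-level terms, the jmp *loc scheme is just an indirect call through a mutable cell that the runtime can swap out. A minimal Python model of that idea (the stub names and the int-only fast path are my own illustration, not Guile's code):

```python
def generic_add(a, b):
    # Fully generic slow path; also responsible for upgrading the cache:
    # once it sees two ints, it re-points the call site at a specialist.
    if isinstance(a, int) and isinstance(b, int):
        site[0] = int_add_stub
    return a + b

def int_add_stub(a, b):
    # Specialized stub: int fast path, bail out to the generic routine.
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    return generic_add(a, b)

site = [generic_add]      # the mutable memory location ("loc")

def call_site(a, b):
    return site[0](a, b)  # the moral equivalent of `jmp *loc`

assert call_site(1, 2) == 3       # first call goes through the slow path...
assert site[0] is int_add_stub    # ...and patches the slot in passing
```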
cry, the beloved country
I am interested in ICs in the context of the Guile implementation of Scheme, but first I will make a digression. Scheme is a very monomorphic language. Yet, this monomorphism is entirely cultural. It is in no way essential. Lack of ICs in implementations has actually fed back and encouraged this monomorphism.
Let us take as an example the case of property access. If you have a pair in Scheme and you want its first field, you do (car x). But if you have a vector, you do (vector-ref x 0).
What's the reason for this nonuniformity? You could have a generic ref procedure, which when invoked as (ref x 0) would return the field in x associated with 0. Or (ref x 'foo) to return the foo property of x. It would be more orthogonal in some ways, and it's completely valid Scheme.
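Such a generic ref is easy to imagine. A Python sketch of the idea (the dispatch cases are illustrative only, not a proposal for Scheme semantics):

```python
def ref(x, key):
    # One generic accessor standing in for car / vector-ref / property
    # lookup; it dispatches on the operand's concrete representation.
    if isinstance(x, (list, tuple)):
        return x[key]            # like (vector-ref x key)
    if isinstance(x, dict):
        return x[key]            # like (ref x 'foo)
    raise TypeError(f"ref: unsupported representation {type(x).__name__}")

assert ref([10, 20, 30], 0) == 10      # (ref x 0)
assert ref({"foo": 42}, "foo") == 42   # (ref x 'foo)
```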
We don't write Scheme programs this way, though. From what I can tell, it's for two reasons: one good, and one bad.
The good reason is that saying vector-ref means more to the reader. You know more about the complexity of the operation and what side effects it might have. When you call ref, who knows? Using concrete primitives allows for better program analysis and understanding. The bad reason is performance: most Scheme implementations lack inline caches, so a generic ref would be slow, and the language has therefore steered its users toward the monomorphic operators.
it gets worse
On the most basic level, Scheme is the call-by-value lambda calculus. It's well-studied, well-understood, and eminently flexible. However the way that the syntax maps to the semantics hides a constrictive monomorphism: that the "callee" of a call refer to a lambda expression.
Concretely, in an expression like (a b), in which a is not a macro, a must evaluate to the result of a lambda expression. Perhaps by reference (e.g. (define a (lambda (x) x))), perhaps directly; but a lambda nonetheless. But what if a is actually a vector? At that point the Scheme language standard would declare that to be an error.
The semantics of Clojure, though, would allow for ((vector 'a 'b 'c) 1) to evaluate to b. Why not in Scheme? There are the same good and bad reasons as with ref. Usually, the concerns of the language implementation dominate, regardless of those of the users who generally want to write terse code. Of course in some cases the implementation concerns should dominate, but not always. Here, Scheme could be more flexible if it wanted to.
what have you done for me lately
Although inline caches are not a miracle cure for performance overheads of polymorphic dispatch, they are a tool in the box. But what, precisely, can they do, both in general and for Scheme?
To my mind, they have five uses. If you can think of more, please let me know in the comments.
First, there is the classic case: named property access on objects whose representations aren't known at compile-time, as in the y.z example above.
Next, there are the arithmetic operators: addition, multiplication, and so on. Scheme's arithmetic is indeed polymorphic; the addition operator + can add any number of complex numbers, with a distinction between exact and inexact values. On a representation level, Guile has fixnums (small exact integers, no heap allocation), bignums (arbitrary-precision heap-allocated exact integers), fractions (exact ratios between integers), flonums (heap-allocated double-precision floating point numbers), and compnums (inexact complex numbers, internally a pair of doubles). Also in Guile, arithmetic operators are "primitive generics", meaning that they can be extended to operate on new types at runtime via GOOPS.
The usual situation though is that any particular instance of an addition operator only sees fixnums. In that case, it makes sense to only emit code for fixnums, instead of the product of all possible numeric representations. This is a clear application where inline caches can be interesting to Guile.
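A minimal model of such a call-site-specializing arithmetic IC, in Python (the state names are invented; a real implementation would emit machine code rather than set flags):

```python
def make_add_ic():
    # Per-call-site state: pretend we only "compile" the cases actually
    # observed, starting with nothing and widening on demand.
    state = {"kind": None}

    def generic_add(a, b):
        return a + b  # stands in for the full numeric-tower dispatch

    def add(a, b):
        if isinstance(a, int) and isinstance(b, int):
            if state["kind"] is None:
                state["kind"] = "fixnum-only"  # specialize on first sight
            return a + b                       # the fixnum fast path
        state["kind"] = "megamorphic"          # saw another representation
        return generic_add(a, b)

    add.state = state
    return add

add = make_add_ic()
assert add(1, 2) == 3
assert add.state["kind"] == "fixnum-only"      # the typical site: ints only
assert add(0.5, 2) == 2.5
assert add.state["kind"] == "megamorphic"      # the atypical site: widened
```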
Third, there is a very specific case related to dynamic linking. Did you know that most programs compiled for GNU/Linux and related systems have inline caches in them? It's a bit weird but the "Procedure Linkage Table" (PLT) segment in ELF binaries on Linux systems is set up in a way that when e.g. libfoo.so is loaded, the dynamic linker usually doesn't eagerly resolve all of the external routines that libfoo.so uses. The first time that libfoo.so calls frobulate, it ends up calling a procedure that looks up the location of the frobulate procedure, then patches the binary code in the PLT so that the next time frobulate is called, it dispatches directly. To dynamic language people it's the weirdest thing in the world that the C/C++/everything-static universe has at its cold, cold heart a hash table and a dynamic dispatch system that it doesn't expose to any kind of user for instrumenting or introspection -- any user that's not a malware author, of course.
But I digress! Guile can use ICs to lazily resolve runtime routines used by compiled Scheme code. But perhaps this isn't optimal, as the set of primitive runtime calls that Guile will embed in its output is finite, and so resolving these routines eagerly would probably be sufficient. Guile could use ICs for inter-module references as well, and these should indeed be resolved lazily; but I don't know, perhaps the current strategy of using a call-site cache for inter-module references is sufficient.
Fourthly (are you counting?), there is a general case of the former: when you see a call (a b) and you don't know what a is. If you put an inline cache in the call, instead of having to emit checks that a is a heap object and a procedure and then emit an indirect call to the procedure's code, you might be able to emit simply a check that a is the same as x, the only callee you ever saw at that site, and in that case you can emit a direct branch to the function's code instead of an indirect branch.
Here I think the argument is less strong. Modern CPUs are already very good at indirect jumps and well-predicted branches. The value of a devirtualization pass in compilers is that it makes the side effects of a virtual method call concrete, allowing for more optimizations; avoiding indirect branches is good but not necessary. On the other hand, Guile does have polymorphic callees (generic functions), and call ICs could help there. Ideally though we would need to extend the language to allow generic functions to feed back to their inline cache handlers.
Finally, ICs could allow for cheap tracepoints and breakpoints. If at every breakable location you included a jmp *loc, and the initial value of *loc was the next instruction, then you could patch individual locations with code to run there. The patched code would be responsible for saving and restoring machine state around the instrumentation.
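A toy model of such patchable breakpoint slots, again in Python (the location names are invented for illustration):

```python
def nop():
    pass  # initial target: just fall through to the next instruction

# One patchable slot per breakable location; each `jmp *loc` at such a
# location indirects through its slot.
breakpoints = {"loop-head": nop}

def instrumented_step(location, work):
    breakpoints[location]()   # the jmp *loc at this breakable location
    return work()

hits = []
assert instrumented_step("loop-head", lambda: 1) == 1   # unpatched: no-op
breakpoints["loop-head"] = lambda: hits.append("hit")   # patch in a tracepoint
assert instrumented_step("loop-head", lambda: 42) == 42
assert hits == ["hit"]
```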
Honestly I struggle a lot with the idea of debugging native code. GDB does the least-overhead, most-generic thing, which is patching code directly; but it runs from a separate process, and in Guile we need in-process portable debugging. The debugging use case is a clear area where you want adaptive optimization, so that you can omit debugging ceremony from the hottest code, knowing that you can fall back on some earlier tier. Perhaps Guile should bite the bullet and go this way too.
In Guile, monomorphic as it is in most things, probably only arithmetic is worth the trouble of inline caches, at least in the short term.
Another question is how much to specialize the inline caches to their call site. On the extreme side, each call site could have a custom calling convention: if the first operand is in register A and the second is in register B and they are expected to be fixnums, and the result goes in register C, and the continuation is the code at L, well then you generate an inline cache that specializes to all of that. No need to shuffle operands or results, no need to save the continuation (return location) on the stack.
The opposite would be to call ICs as if they were normal procedures: shuffle arguments into fixed operand registers, push a stack frame, and when the IC returns, shuffle the result into place.
Honestly I am leaning mostly toward the simple solution. I am concerned about code and heap bloat if I specialize to every last detail of a call site. Also, maximum speed comes with an adaptive optimizer, and in that case simple lower tiers are best.
To compare these impressions, I took a look at V8's current source code to see where they use ICs in practice. When I worked on V8, the compiler was entirely different -- there were two tiers, and both of them generated native code. Inline caches were everywhere, and they were gnarly; every architecture had its own implementation. Now in V8 there are two tiers, not the same as the old ones, and the lowest one is a bytecode interpreter.
As an adaptive optimizer, V8 doesn't need breakpoint ICs. It can always deoptimize back to the interpreter. In actual practice, to debug at a source location, V8 will patch the bytecode to insert a "DebugBreak" instruction, which has its own support in the interpreter. V8 also supports optimized compilation of this operation. So, no ICs needed here.
V8 does use inline caches for property access (loads and stores). Besides that there is an inline cache used in calls which is just used to record callee counts, and not used for direct call optimization.
The dynamic linking and relocation points don't apply to V8 either, as it doesn't receive binary code from the internet; it always starts from source.
twilight of the inline cache
There was a time when inline caches were recommended to solve all your VM problems, but it would seem now that their heyday is past.
ICs are still a win if you have named property access on objects whose shape you don't know at compile-time. But improvements in CPU branch target buffers mean that it's no longer imperative to use ICs to avoid indirect branches (modulo Spectre v2), and creating direct branches via code-patching has gotten more expensive and tricky on today's targets with concurrency and deep cache hierarchies.
Besides that, the type feedback component of inline caches seems to be taken over by explicit data-driven call-site caches, rather than executable inline caches, and the highest-throughput tiers of an adaptive optimizer burn away inline caches anyway. The pressure on an inline cache infrastructure now is towards simplicity and ease of type and call-count profiling, leaving the speed component to those higher tiers.
In Guile the bounded polymorphism on arithmetic combined with the need for ahead-of-time compilation means that ICs are probably a code size and execution time win, but it will take some engineering to prevent the calling convention overhead from dominating cost.
Time to experiment, then -- I'll let y'all know how it goes. Thoughts and feedback welcome from the compilerati. Until then, happy hacking :)
February 05, 2018
I am on my way back from FOSDEM and thought I would share with y'all some impressions from talks in the Networking devroom. I didn't get to go to all that many talks -- FOSDEM's hallway track is the hottest of them all -- but I did hit a select few. Thanks to Dave Neary at Red Hat for organizing the room.
Ray Kinsella -- Intel -- The path to data-plane micro-services
The day started with a drum-beating talk that was very light on technical information.
Essentially Ray was arguing for an evolution of network function virtualization -- instead of running VNFs on bare metal as was done in the days of yore, people started to run them in virtual machines, and now they run them in containers -- so what's next? Ray is saying that "cloud-native VNFs" are the next step.
Cloud-native VNFs would move from "greedy" VNFs that take charge of the cores available to them to some kind of resource sharing. "Maybe users value flexibility over performance", says Ray. It's the Care Bears approach to networking: (resource) sharing is caring.
In practice he proposed two ways that VNFs can map to cores and cards.
One was in-process sharing, which, if I understood him properly, means running network functions as nodes within a single VPP process. Basically in this case VPP or DPDK is the scheduler and multiplexes two or more network functions in one process.
The other was letting Linux schedule separate processes. In networking, we don't usually do it this way: we run network functions on dedicated cores on which nothing else runs. Ray was suggesting that perhaps network functions could be more like "normal" Linux services. Ray doesn't know if Linux scheduling will work in practice. Also it might mean allowing DPDK to work with 4K pages instead of the 2M hugepages it currently requires. This obviously has the potential for more latency hazards and would need some tighter engineering, and ultimately would have fewer guarantees than the "greedy" approach.
Interesting side things I noticed:
All the diagrams show Kubernetes managing CPU node allocation and interface assignment. I guess in marketing diagrams, Kubernetes has completely replaced OpenStack.
One slide showed guest VNFs differentiated between "virtual network functions" and "socket-based applications", the latter ones being the legacy services that use kernel APIs. It's a useful terminology difference.
The talk identifies user-space networking with DPDK (only!).
Finally, I note that Conway's law is obviously reflected in the performance overheads: because there are organizational isolations between dev teams, vendors, and users, there are big technical barriers between them too. The least-overhead forms of resource sharing are also those with the highest technical consistency and integration (nodes in a single VPP instance).
Magnus Karlsson -- Intel -- AF_XDP
This was a talk about getting good throughput from the NIC to userspace, but by using some kernel facilities. The idea is to get the kernel to set up the NIC and virtualize the transmit and receive ring buffers, but to let the NIC's DMA'd packets go directly to userspace.
The performance goal is 40 Gbps for thousand-byte packets, or 25 Gbps for traffic with only the smallest packets (64 bytes). The fast path does "zero copy" on the packets if the hardware has the capability to steer the subset of traffic associated with the AF_XDP socket to that particular process.
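As a back-of-the-envelope check on those numbers (assuming standard Ethernet framing overhead of 20 bytes per packet for preamble and inter-frame gap; the helper below is my own):

```python
def packets_per_second(gbps, payload_bytes, overhead_bytes=20):
    """Wire-rate packet count, counting preamble + inter-frame gap."""
    bits_per_packet = (payload_bytes + overhead_bytes) * 8
    return gbps * 1e9 / bits_per_packet

# 25 Gbps of minimum-size (64-byte) packets is ~37 million packets per
# second; 40 Gbps of thousand-byte packets is a far gentler ~4.9 Mpps.
assert round(packets_per_second(25, 64) / 1e6, 1) == 37.2
assert round(packets_per_second(40, 1000) / 1e6, 1) == 4.9
```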
The AF_XDP project builds on XDP, a newish thing where a little kind of bytecode can run on the kernel or possibly on the NIC. One of the bytecode commands (REDIRECT) causes packets to be forwarded to user-space instead of handled by the kernel's otherwise heavyweight networking stack. AF_XDP is the bridge between XDP on the kernel side and an interface to user-space using sockets (as opposed to e.g. AF_INET). The performance goal was to be within 10% or so of DPDK's raw user-space-only performance.
The benefits of AF_XDP over the current situation would be that you have just one device driver, in the kernel, rather than having to have one driver in the kernel (which you have to have anyway) and one in user-space (for speed). Also, with the kernel involved, there is a possibility for better isolation between different processes or containers, when compared with raw PCI access from user-space.
AF_XDP is what was previously known as AF_PACKET v4, and its numbers are looking somewhat OK. Though it's not upstream yet, it might be interesting to get a Snabb driver here.
I would note that kernel-userspace cooperation is a bit of a theme these days. There are other points of potential cooperation or common domain sharing, storage being an obvious one. However I heard more than once this weekend the kind of "I don't know, that area of the kernel has a different culture" sort of concern as that highlighted by Daniel Vetter in his recent LCA talk.
François-Frédéric Ozog -- Linaro -- Userland Network I/O
This talk is hard to summarize. Like the previous one, it's again about getting packets to userspace with some support from the kernel, but the speaker went really deep and I'm not quite sure what in the talk is new and what is known.
François-Frédéric is working on a new set of abstractions for relating the kernel and user-space. He works on OpenDataPlane (ODP), which is kinda like DPDK in some ways. ARM seems to be a big target for his work; that x86-64 is also a target goes without saying.
His problem statement was, how should we enable fast userland network I/O, without duplicating drivers?
François-Frédéric was a bit negative on AF_XDP because (he says) it is so focused on packets that it neglects other kinds of devices with similar needs, such as crypto accelerators. Apparently the challenge here is accelerating a single large IPsec tunnel -- because the cryptographic operations are serialized, you need good single-core performance, and making use of hardware accelerators seems necessary right now for even a single 10Gbps stream. (If you had many tunnels, you could parallelize, but that's not the case here.)
He was also a bit skeptical about standardizing on the "packet array I/O model" which AF_XDP and most NICs use. What he means here is that most current NICs move packets to and from main memory with the help of a "descriptor array" ring buffer that holds pointers to packets. A transmit array stores packets ready to transmit; a receive array stores maximum-sized packet buffers ready to be filled by the NIC. The packet data itself is somewhere else in memory; the descriptor only points to it. When a new packet is received, the NIC fills the corresponding packet buffer and then updates the "descriptor array" to point to the newly available packet. This requires at least two memory writes from the NIC to memory: at least one to write the packet data (one per 64 bytes of packet data), and one to update the DMA descriptor with the packet length and possible other metadata.
Although these writes go directly to cache, there's a limit to the number of DMA operations that can happen per second, and with 100Gbps cards, we can't afford to make one such transaction per packet.
François-Frédéric promoted an alternative I/O model for high-throughput use cases: the "tape I/O model", where packets are just written back-to-back in a uniform array of memory. Every so often a block of memory containing some number of packets is made available to user-space. This has the advantage of packing in more packets per memory block, as there's no wasted space between packets. This increases cache density and decreases DMA transaction count for transferring packet data, as we can use each 64-byte DMA write to its fullest. Additionally there's no side table of descriptors to update, saving a DMA write there.
Apparently the only cards currently capable of 100 Gbps traffic, the Chelsio and Netcope cards, use the "tape I/O model".
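One way to see the difference between the two I/O models is to count 64-byte DMA writes per packet. This sketch uses a deliberately simplified cost model of my own construction:

```python
import math

def dma_writes_descriptor_model(pkt_bytes, line=64):
    # One 64-byte write per cache line of packet data, plus one write
    # to update the descriptor with length and metadata.
    return math.ceil(pkt_bytes / line) + 1

def dma_writes_tape_model(pkt_bytes, line=64, batch=32):
    # Packets packed back-to-back: data lines amortize across a batch,
    # and there is no per-packet descriptor write.
    return math.ceil(pkt_bytes * batch / line) / batch

# Minimum-size 64-byte packets: 2 DMA writes per packet with descriptors,
# but only 1 with the tape model -- half the transaction count.
assert dma_writes_descriptor_model(64) == 2
assert dma_writes_tape_model(64) == 1.0
```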
Incidentally, the DMA transfer limit isn't the only constraint. Something I hadn't fully appreciated before was memory write bandwidth. I had thought that because the NIC transfers packet data directly into cache, it wouldn't necessarily cause any write traffic to RAM. Apparently that's not the case. Later over drinks (thanks to Red Hat's networking group for organizing), François-Frédéric asserted that the DMA transfers would eventually use up DDR4 bandwidth as well.
A NIC-to-RAM DMA transaction will write one cache line (usually 64 bytes) to the socket's last-level cache. This write will evict whatever was there before. As far as I can tell, there are three cases of interest here. The best case is where the evicted cache line is from a previous DMA transfer to the same address. In that case it's modified in the cache and not yet flushed to main memory, and we can just update the cache instead of flushing to RAM. (Do I misunderstand the way caches work here? Do let me know.)
However, if the evicted cache line is from some other address, we might have to flush it to RAM if it's dirty. That causes memory write traffic. But if the cache line is clean, it was probably loaded as part of a memory read operation, which means we're evicting part of the network function's working set; that will later cause memory read traffic as the data gets loaded in again, and write traffic to flush out the DMA'd packet data cache line.
François-Frédéric simplified the whole thing to equate packet bandwidth with memory write bandwidth, that yes, the packet goes directly to cache but it is also written to RAM. I can't convince myself that that's the case for all packets, but I need to look more into this.
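If that pessimistic view holds -- every received byte eventually written back to RAM -- the arithmetic is easy to check. A rough sketch (the DDR4 figure is the textbook peak for one channel, not a measured number):

```c
/* Memory write load under the pessimistic assumption that all
   received packet data is eventually written back to RAM. */
static double rx_write_gbytes_per_s(double gbps)
{
    return gbps / 8.0;
}

/* Peak bandwidth of one DDR4 channel: an 8-byte-wide bus times the
   transfer rate in mega-transfers per second. */
static double ddr4_channel_gbytes_per_s(double mega_transfers)
{
    return mega_transfers * 8.0 / 1000.0;
}
```

On these figures, 100 Gbps of receive traffic is 12.5 GB/s of writes, against a DDR4-2400 channel peak of 19.2 GB/s -- one direction of one port already eats most of a channel, before the application touches a byte.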
Of course the cache pressure and the memory traffic is worse if the packet data is less compact in memory; and worse still if there is any need to copy data. Ultimately, processing small packets at 100Gbps is still a huge challenge for user-space networking, and it's no wonder that there are only a couple devices on the market that can do it reliably, not that I've seen either of them operate first-hand :)
Talking with Snabb's Luke Gorrie later on, he thought that it could be that we can still stretch the packet array I/O model for a while, given that PCIe gen4 is coming soon, which will increase the DMA transaction rate. So that's a possibility to keep in mind.
At the same time, apparently there are some "coherent interconnects" coming too which will allow the NIC's memory to be mapped into the "normal" address space available to the CPU. In this model, instead of having the NIC transfer packets to the CPU, the NIC's memory will be directly addressable from the CPU, as if it were part of RAM. The latency to pull data in from the NIC to cache is expected to be slightly longer than a RAM access; for comparison, RAM access takes about 70 nanoseconds.
For a user-space networking workload, coherent interconnects don't change much. You still need to get the packet data into cache. True, you do avoid the writeback to main memory, as the packet is already in addressable memory before it's in cache. But, if it's possible to keep the packet on the NIC -- like maybe you are able to add some kind of inline classifier on the NIC that could directly shunt a packet towards an on-board IPsec accelerator -- in that case you could avoid a lot of memory transfer. That appears to be the driving factor for coherent interconnects.
At some point in François-Frédéric's talk, my brain just died. I didn't quite understand all the complexities that he was taking into account. Later, after he kindly took the time to dispel some more of my ignorance, I now understand more of it, though not yet all of it :) The concrete "deliverable" of the talk was a model for kernel modules and user-space drivers that uses the paradigms he was promoting. It's a work in progress from Linaro's networking group, with some support from NIC vendors and CPU manufacturers.
Luke Gorrie and Asumu Takikawa -- SnabbCo and Igalia -- How to write your own NIC driver, and why
This talk had the most magnificent beginning: a sort of "repent now ye sinners" sermon from Luke Gorrie, a seasoned veteran of software networking. Luke started by describing the path of righteousness leading to "driver heaven", a world in which all vendors have publicly accessible datasheets which parsimoniously describe what you need to get packets flowing. In this blessed land it's easy to write drivers, and for that reason there are many of them. Developers choose a driver based on their needs, or they write one themselves if their needs are quite specific.
But there is another path, says Luke, that of "driver hell": a world of wickedness and proprietary datasheets, where even when you buy the hardware, you can't program it unless you're buying a hundred thousand units, and even then you are smitten with the cursed non-disclosure agreements. In this inferno, only a vendor is practically empowered to write drivers, but their poor driver developers are only incentivized to get the driver out the door deployed on all nine architectural circles of driver hell. So they include some kind of circle-of-hell abstraction layer, resulting in a hundred thousand lines of code like a tangled frozen beard. We all saw the abyss and repented.
Luke described the process that led to Mellanox releasing the specification for its ConnectX line of cards, something that was warmly appreciated by the entire audience, users and driver developers included. Wonderful stuff.
My Igalia colleague Asumu Takikawa took the last half of the presentation, showing some code for the driver for the Intel i210, i350, and 82599 cards. For more on that, I recommend his recent blog post on user-space driver development. It was truly a ray of sunshine in dark, dark Brussels.
Ole Trøan -- Cisco -- Fast dataplanes with VPP
This talk was a delightful introduction to VPP, but without all of the marketing; the sort of talk that makes FOSDEM worthwhile. Usually at more commercial, vendory events, you can't really get close to the technical people unless you have a vendor relationship: they are surrounded by a phalanx of salesfolk. But in FOSDEM it is clear that we are all comrades out on the open source networking front.
The speaker expressed great personal pleasure at having been able to work on open source software; his relief was palpable. A nice moment.
He had some kind words about Snabb too, saying at one point that "of course you can do it on snabb as well -- Snabb and VPP are quite similar in their approach to life". He trolled the horrible complexity diagrams of many "NFV" stacks whose components reflect the org charts that produce them more than the needs of the network functions in question (service chaining anyone?).
He did get to drop some numbers as well, which I found interesting. One is that recently they have been working on carrier-grade NAT, aiming for 6 terabits per second. Those are pretty big boxes and I hope they are getting paid appropriately for that :) For context he said that for a 4-unit server, these days you can build one that does a little less than a terabit per second. I assume that's with ten dual-port 40Gbps cards, and I would guess to power that you'd need around 40 cores or so, split between two sockets.
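My guessed configuration pencils out; to be clear, these numbers are my own back-of-envelope, not figures from the talk:

```c
/* Aggregate throughput of a hypothetical 4-unit server stuffed with
   multi-port NICs.  All parameters here are my guesses. */
static double box_gbps(unsigned cards, unsigned ports_per_card,
                       double gbps_per_port)
{
    return cards * ports_per_card * gbps_per_port;
}
```

Ten dual-port 40 Gbps cards give box_gbps(10, 2, 40.0) = 800 Gbps -- "a little less than a terabit per second", as claimed -- and spread over 40 cores that's a manageable 20 Gbps per core.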
Finally, he finished with a long example on lightweight 4-over-6. Incidentally this is the same network function my group at Igalia has been building in Snabb over the last couple years, so it was interesting to see the comparison. I enjoyed his commentary that although all of these technologies (carrier-grade NAT, MAP, lightweight 4-over-6) have the ostensible goal of keeping IPv4 running, in reality "we're day by day making IPv4 work worse", mainly by breaking the assumption that because you received traffic from port P on IP M, you can send traffic to M from another port or another protocol and have it reach its target.
All of these technologies also have problems with IPv4 fragmentation. Getting it right is possible but expensive. Instead, Ole mentions that he and a cross-vendor cabal of dataplane people have a "dark RFC" in the works to deprecate IPv4 fragmentation entirely :)
OK that's it. If I get around to writing up the couple of interesting Java talks I went to (I know right?) I'll let yall know. Happy hacking!
Limpia los volcanes, llueve las flores / Water the flowers - clean the volcanoes
via UbuWeb -> https://twitter.com/ubuweb/status/958762627177635840
February 02, 2018
February 01, 2018
An annotated digest of the top "Hacker" "News" posts for the last week of January, 2018.
January 22, 2018 (comments)
An unpopulated state has strong opinions about packet routing. Hackernews gets into an argument about whether artisanal locally-grown internet is worth the extra money, but the argument is just an excuse to get back to what Hackernews does best: aggressively misunderstanding the law. The rest of the comments comprise a debate on whether a hypothetical Montana population would develop some sort of rudimentary cultural identity, just like the population of Iowa didn't.
January 23, 2018 (comments)
With a clamor of posts that set the swallows soaring, the remembrance of an author comes to Hackernews in the city Mountain View, bright-towered by the bay.
January 24, 2018 (comments)
Some academics feed pills to a negligible number of old people. Hackernews erupts into trench warfare between those who learned about science from Wikipedia and those who learned about science from working in the web advertising industry. Every soldier in the battlefield dutifully loads and fires reference materials into the enemy, none of which are relevant to the article at hand. Inevitably, the fracas dissolves into extremely gullible Hackernews peddling nutritional supplements and superstitions about health care.
January 25, 2018 (comments)
An Internet, looking for a lost satellite, finds a different lost satellite. NASA explains that the satellite in question experienced a malfunction whose only resolution was to restart the whole apparatus, which turns out to be how Hackernews deals with all their problems too.
January 26, 2018 (comments)
An Internet posts a short story about running a mail server based on four software packages, three of which are extraneous. Hackernews, as with all articles about doing anything for yourself, cries like tiny babies about how hard it is to send and receive email. The few Hackernews who express puzzlement at the wave of whining are derided by Hackernews who haven't used anything but GMail since 2006. A few spammers ask real human beings for tips on evading spam filters.
January 27, 2018 (comments)
A surveillance concierge service produces mortar-targeting maps for free. Hackernews is flummoxed that soldiers are people -- and not just that, but people who apparently buy and use things just like other people! Consensus coalesces: none of this is the surveillance company's fault, and the government should just turn off the internet. Hackernews then spends several days LARPing as military information security agents.
January 28, 2018 (comments)
Intel continues to be a pack of incompetent morons. Hackernews veers off onto a tangent about how ALL updates for EVERYTHING are fucked, except when they're not, or it's the user's fault, or you should have picked a different product, but not that one either. At some point there is a hundred-comment-long thread about Ubuntu, which causes a spate of posts attempting to invent UNIX from first principles. The rest of the comments are Hackernews whining that nothing happens when they click "check for updates."
January 29, 2018 (comments)
An Internet finds a new hole to shove Linux into. Hackernews agrees that Linux is a better choice than whatever Intel came up with, but gets distracted by an argument about whether it's better to replace a simple, well-understood tool with a slow, insecure, undocumented tool because someone said it was newer.
January 30, 2018 (comments)
Some rich people decide to get slightly richer in the near future. Hackernews invents a society in which indoor plumbing can be introduced to societies outside of the Bay Area, but realizes the plan is doomed to failure and settles in to argue about economics instead.
January 31, 2018 (comments)
AMD accidentally made money last year, bringing its debt load all the way down to $1.3 billion. Every single Hackernews who has purchased an AMD product posts about purchasing an AMD product, and the rest argue about video games.
January 31, 2018
The Advanced class of Linux Friends
has five students on scholarship. The goal is to make this class stable and to enable those who have graduated from the beginners' class to become very skillful. These students have a natural curiosity to learn new things, and they do most of the talking and work in the class. A variety of learning models are used in the advanced class, such as direct instruction and peer-to-peer learning. We will progressively move to project-based learning and school-to-school learning as soon as the crisis comes to an end.
January 30, 2018
January 29, 2018
Let's take a look at my annotated copy of the FOSDEM 2018 main talk schedule, shall we?
Twenty-five minutes you'll never get back.
Some bureaucrats arrive to explain to everyone how important bureaucracy is. One of them takes credit for introducing Java and XML to IBM, as though that is a praiseworthy achievement instead of grounds for a war crimes tribunal. The talk description focuses heavily on namedropping corporations known for ramming code into production whether there is consensus or not.
Some academics would like anyone at all to listen to them. Their thesis is that the European Union is the right organization to reinvent the internet, presumably based on the wild success experienced during the reinvention of Europe. Nobody involved appears to have actually done anything, either in the past or the present.
A Red Hat employee who is good at ARM processors would like to lecture everyone about something. We don't get to know what, because the talk description is blank, but we can guess: the speaker's pet technologies are the solution to all our problems, assuming we can just ignore the crippling problems of those pet technologies. This is merely speculation, but it doesn't really matter which PR fluff Red Hat chooses to excrete; they paid a lot to be a "cornerstone sponsor" and by God they're going to get a keynote slot.
Ten MORE minutes you'll never get back.
Bosch, an automotive manufacturer, sends a drone to ask that programmers line up behind Bosch. Bosch has decided to base their automated driving software on Eclipse, which allows us to infer some important facts: This software will never finish loading, much less begin to work. Your car will have the ask.com toolbar installed on every update. Very soon, nobody will ever hear of OpenADx ever again, but not before it makes the news for causing massive freeway-blocking traffic fatalities because someone's car crossed a time zone.
A programmer tells the harrowing tale of doing things other than programming, fucking them all up, and trying again. The talk description does not indicate that the speaker will present a solution to one of the oldest open problems in the software engineering discipline: "why anyone should trust software written by PHP programmers."
A bureaucratic parasite tries to convince the world that the United Nations should be taken seriously as a software project management organization. The talk describes the parasite's domain as "a multilateral participatory program" and "in collaboration with partners around the world," which is ancient United Nations code for "we are going to demand resources that we can use to demand further resources."
Another bureaucrat arrives to teach us how to like our jobs, providing those jobs are "cloning Borland Delphi and giving it to the Apache Software Foundation" or "cloning Microsoft Office and giving it to the Apache Software Foundation." No advice is offered for "living with what you've done, everyday."
An academic has spent some number of years shoving unix into a github repository, and now would like to read it aloud. Presumably some Greek student is furious that they had to pay money to attend this lecture last year.
A person nobody cares about will read the names of some software nobody cares about.
A self-described "industry thought leader" invites you to "experience his musings." In fact it's another academic attempting to convince us of the importance of his job.
Some students recreate an ancient computer without any of the things that made it interesting.
An Internet invites you to simplify your program's configuration file by importing a 130,000-line software grenade, over which you have no control. Wasteful and complicated constructs like "text files" can finally be discarded in favor of a globally-addressable nested key-value datastore which requires new software to be written for every type of configuration data.
Some academics arrive to pitch their latest invention: interprocess communications in the form of a packet-switched network. All you need to get started is a packet-switched network, on top of which they may pile untold complexity. Enthusiasts of doctoral thesis-defense trial runs will be sure to love this presentation.
A person who maintains software targeting hardware that does not in fact exist shows up to talk about the burgeoning field of hardware fanfiction. The speaker has successfully tricked several GNU projects into supporting this nonexistent architecture, which was a natural fit for their nonexistent Hurd operating system. This is the first time on record that a complete compute stack, from absent silicon to absent operating system to absent users, has ever been announced to be released Real Soon Now.
Obviously the only place that bugs cannot survive is within software that is not ever run, so the presenter would like to discuss the many different approaches to ensuring that unused software is subjected to a mind-numbing array of bureaucratic oversight, outdated standards documents, and half-assed formal verification procedures.
A webshit would like to brag about handing control of a software project over to a monte-carlo approximation of a project manager. The software project in question is webshit designed to expose root control of your computer to a web browser. This talk is the first multi-scale integrated model of terrible decision making and questionable practice.
Someone has shoved enough bullshit into the linux kernel, the Mesa graphical library, and the Android user space that they finally work together, if you break a lot of stuff and add a lot of otherwise-unnecessary software. The speaker is here to gloat about being involved with cramming so much garbage into so many disparate projects.
Glossing over the reasons they skipped "making LibreOffice work well anywhere", a company devoted to taking credit for other people's work is here to take credit for shoving a Microsoft Office clone into a web browser.
A comedy routine in which a professional database janitor pretends to honestly believe that MySQL is capable of doing anything quickly or scaling in a manner other than "run sixty of these and have fifty-nine of them lie to clients." After the talk, several spontaneous "how to migrate to PostgreSQL" talks will break out in the hallway outside, closely attended by sweating project managers who were not previously aware of what a trash pile MySQL and its advocates are.
Elasticsearch is a distributed customer-data exposure tool with a ransomware-friendly webshit interface. The company who charges money to clean up the mess has sent one of its less useful drones to drum up audience interest in the implementation details that make their product respond reliably and quickly to random internet assailants scanning AWS for data left unprotected by morons.
The Gluster team at Red Hat, bereft of customers, whiles away the hours by pretending it takes any work at all to reap performance benefits from faster hardware. The thesis seems to be that shitty software was acceptable when the hardware was shitty, but since storage platforms have improved, the bad programming and awful architecture of their project has become more obvious. Evidently it takes three people to apologize.
A Python arrives to tell us that Python 2 is irretrievably fucked and everyone should switch to Python 3, just like they've been telling us for the past decade. The primary products of the speaker's own employer rely entirely on Python 2, just like they have for the past decade. The talk will include plans for replacing Python 3 with Python 4, by the team who fucked up the previous transition so badly that Python 2 has been "deprecated" longer than the release interval between 2.0 and 3.0.
A monster uses a Python subset as a C++ templating language. The monster will be on hand to explain how to secure funding for similar monstrosities.
The speaker presents a horrible chimera of a programming language, wherein the drawbacks and limitations of Python are augmented by the drawbacks and limitations of C. The result is a language that introduces header files to Python and requires breathtaking amounts of boilerplate. The primary goal of Cython appears to be transforming the programming experience from "implementing a solution to a given problem" to "trying to guess when to turn off exception handling so that your code runs marginally faster."
Since nobody uses the eighteen million web services that Mozilla starts, ignores, deprecates, and discontinues each month, Mozilla has devoted actual resources to creating software devoted to making themselves feel like they have users. The speaker is willing to educate the audience on how to replace market penetration with a few hundred lines of code. The talk is in the Python track because of an implementation detail and because there is no "software nobody wants" track.
Security and Encryption
A buzzword enthusiast will talk about some software nobody will use, designed to run on hardware nobody wants. Another refugee from the "software nobody wants" track FOSDEM has once again failed to implement.
An IBM arrives to teach us how to use hardware he doesn't use. Lots of words will be expounded about integrating all kinds of software into this hardware chip, but a cursory glance at the speaker's own website reveals all this crap is such a pain in the ass that he just uses ssh-agent anyway.
Red Hat explains how they're fixing all their previous fuckups with linux disk encryption.
An LDAP programmer arrives to explain why we should all give a shit about the Bitcoin knockoff whose primary use is chewing through your processor whenever you watch videos on Youtube.
Someone has noticed that most of the problems with computers are caused by people.
The speaker seems to be confused regarding employment; he either works for Greenpeace or Mozilla, but since the software on which this talk focuses appears to function as intended, we can assume he does not work for Mozilla.
This talk focuses on the only hardware project at FOSDEM that actually exists in the physical world. This speaker does work for Mozilla, but his title is "Community Architect." Apparently Mozilla has automated their user-ignoring toolkit sufficiently that the people in charge of it have time to reach orbit, where they can pretend people haven't been doing this since the Kennedy administration.
January 28, 2018
Much of the current hysteria about the technology industry is due to its highly ambiguous relationship with its users. Driven by the logics of both compassion and indifference, this relationship has always been erratic yet functional. These two clashing rationales, for example, allowed technology companies, frequently painted as Dr Evil, to claim the mantle of Mother Theresa. However, as the unresolved contradictions of these logics pile up, we can’t fail to notice the incoherence of the industry’s overall social vision.
The compassion story has some truth to it. Tech giants have pegged their business models on our ability to consume. Thus, their interests are somewhat aligned with ours: we need a paycheque to buy what's being advertised. A charitable comparison might be to Henry Ford paying his workers enough to buy his cars; a less charitable one might be to slave owners keeping slaves fed so as not to lose them to exhaustion. However, unlike Ford or slave owners, our tech moguls want someone else to fund their preferred solutions (eg the universal basic income).
January 26, 2018
Scaletta Zanclea: Siddharte organizes a free workshop
The 2018 edition of the "Trasformatorio – Fourth International Laboratory of Experimental, Performative and Site-Specific Arts" is in preparation, organized by the Dutch foundation Dyne.org in collaboration with the Associazione Culturale Siddharte and the ...
January 22, 2018
An annotated digest of the top "Hacker" "News" posts for the third week of January, 2018.
January 15, 2018 (comments)
An Internet is lost in the SERP e-market and can no longer search happily. Hackernews briefly explodes with a litany of complaints about Google's failure to meet their expectations in basically every market Google has entered. Fortunately, they all come to their senses and chant the standard praises unto their Lords, lest the cloud be rent asunder by the wrath of the Googly appendage.
January 16, 2018 (comments)
Mozilla devotes yet more resources to things that are merely tangentially related to the only Mozilla product anyone has ever cared about. They expect their problem report to remain in NEW state in the Federal bug tracker indefinitely. Hackernews gets to armchair lawyering just in case the legal opinions of cloistered programmer drones ever become relevant. The consensus is that all of these politicians just need to study the OSI model and all these problems will go away.
January 17, 2018 (comments)
Mozilla posts more condescending cartoons to illustrate their new breakthrough: they can now compile code nobody writes much faster than they could before. Hackernews is staggered by this tremendous technical achievement -- to the degree that valid technical objections are derided as off-topic and heaped with scorn.
January 18, 2018 (comments)
A software project posts a release announcement explaining that their arbitrary schedule caused them to drop several features on the floor. Hackernews is tremendously excited that they can now poorly run Photoshop on computers that could have run Photoshop perfectly, before Hackernews got their hands on them. Dozens of pages of technical details are posted to enable others to run expensive software in the least convenient possible manner. Some Hackernews express intent to purchase hundreds of dollars of new hardware to give this a try.
January 19, 2018 (comments)
An Internet writes security software. To fit with Apple's overarching development recommendations, the software is trivially bypassed and jam-packed with XML. Hackernews complains that none of this class of software makes it easy enough to hand control of your network over to strangers. When this gets boring they switch over to arguing about licenses for another hundred pages.
January 20, 2018 (comments)
Some journalists write alternate-history fiction about a dimension where Intel could possibly ever be held responsible for anything. Hackernews either doesn't realize it is fiction or seamlessly changes into a fanfiction forum, enumerating all of the individual market segments where Intel could (but will not) lose any significant amount of business. The rest of the comments are barely comprehensible lectures incorrecting each other about how processors work.
January 21, 2018 (comments)
A webshit is mad at Sarajevo's city council. Hackernews likes the pretty pictures. Other Hackernews get into the spirit of the webshit's armchair lawyering. Neither the original article nor the resulting comment threads are worth even loading in a web browser, much less reading. No technology is discussed.
January 17, 2018
Greetings, fellow Schemers and compiler nerds: I bring fresh nargery!
A couple years ago I made a list of compiler tasks for Guile. Most of these are still open, but I've been chipping away at the one labeled "instruction explosion":
Now we get more to the compiler side of things. Currently in Guile's VM there are instructions like vector-ref. This is a little silly: there are also instructions to branch on the type of an object (br-if-tc7 in this case), to get the vector's length, and to do a branching integer comparison. Really we should replace vector-ref with a combination of these test-and-branches, with real control flow in the function, and then the actual ref should use some more primitive unchecked memory reference instruction. Optimization could end up hoisting everything but the primitive unchecked memory reference, while preserving safety, which would be a win. But probably in most cases optimization wouldn't manage to do this, which would be a lose overall because you have more instruction dispatch.
Well, this transformation is something we need for native compilation anyway. I would accept a patch to do this kind of transformation on the master branch, after version 2.2.0 has forked. In theory this would remove almost all high-level instructions from the VM, making the bytecode closer to a virtual CPU, and likewise making it easier for the compiler to emit native code, as it's working at a lower level.
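To make the quoted plan concrete, here's a C-like rendering of an exploded vector-ref. The object layout and tag value are invented for illustration -- Guile's real heap representation differs -- but the shape is the point: one checked instruction decomposes into a type-tag branch, a bounds branch, and a primitive unchecked memory reference.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy heap object: a header word holding a type tag, a length word,
   and the elements inline.  Tag value and layout are made up. */
#define TAG_VECTOR 0x0d

struct heap_vector {
    uint64_t tag;      /* type tag */
    uint64_t len;      /* number of elements */
    uint64_t elts[];   /* elements follow inline */
};

/* Exploded vector-ref: each `if` corresponds to one of the
   test-and-branch instructions the checked instruction hides; the
   final array access is the primitive unchecked memory reference
   that an optimizer could be left with after hoisting the checks. */
static uint64_t vector_ref(struct heap_vector *v, uint64_t idx)
{
    if (v->tag != TAG_VECTOR) abort();   /* type-tag branch */
    if (idx >= v->len) abort();          /* bounds branch */
    return v->elts[idx];                 /* unchecked ref */
}
```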
Now that I'm getting close to finished I wanted to share some thoughts. Previous progress reports on the mailing list.
a simple loop
As an example, consider this loop that sums the 32-bit floats in a bytevector. I've annotated the code with lines and columns so that you can correspond different pieces to the assembly.
   0       8   12     19
+-v-------v---v------v-
|
1| (use-modules (rnrs bytevectors))
2| (define (f32v-sum bv)
3|   (let lp ((n 0) (sum 0.0))
4|     (if (< n (bytevector-length bv))
5|         (lp (+ n 4)
6|             (+ sum (bytevector-ieee-single-native-ref bv n)))
7|         sum)))
The assembly for the loop before instruction explosion went like this:
L1:
  17    (handle-interrupts)                     at (unknown file):5:12
  18    (uadd/immediate 0 1 4)
  19    (bv-f32-ref 1 3 1)                      at (unknown file):6:19
  20    (fadd 2 2 1)                            at (unknown file):6:12
  21    (s64<? 0 4)                             at (unknown file):4:8
  22    (jnl 8)                                 ;; -> L4
  23    (mov 1 0)                               at (unknown file):5:8
  24    (j -7)                                  ;; -> L1
So, already Guile's compiler has hoisted the (bytevector-length bv) and unboxed the loop index n and accumulator sum. This work aims to simplify further by exploding bv-f32-ref.
exploding the loop
In practice, instruction explosion happens in CPS conversion, as we are converting the Scheme-like Tree-IL language down to the CPS soup language. When we see a Tree-IL primcall (a call to a known primitive), instead of lowering it to a corresponding CPS primcall, we inline a whole blob of code.
In the concrete case of bv-f32-ref, we'd inline it with something like the following:
(unless (and (heap-object? bv)
             (eq? (heap-type-tag bv) %bytevector-tag))
  (error "not a bytevector" bv))

(define len (word-ref bv 1))
(define ptr (word-ref bv 2))

(unless (and (<= 4 len)
             (<= idx (- len 4)))
  (error "out of range" idx))

(f32-ref ptr idx)
As you can see, there are four branches hidden in the bv-f32-ref: two to check that the object is a bytevector, and two to check that the index is within range. In this explanation we assume that the offset idx is already unboxed, but actually unboxing the index ends up being part of this work as well.
One of the goals of instruction explosion was that by breaking the operation into a number of smaller, more orthogonal parts, native code generation would be easier, because the compiler would only have to know about those small bits. However without an optimizing compiler, it would be better to reify a call out to a specialized bv-f32-ref runtime routine instead of inlining all of this code -- probably whatever language you write your runtime routine in (C, rust, whatever) will do a better job optimizing than your compiler will.
But with an optimizing compiler, there is the possibility of removing everything but the f32-ref itself. Guile doesn't quite get there, but almost; here's the post-explosion optimized assembly of the inner loop of f32v-sum:
L1:
  27    (handle-interrupts)
  28    (tag-fixnum 1 2)
  29    (s64<? 2 4)                             at (unknown file):4:8
  30    (jnl 15)                                ;; -> L5
  31    (uadd/immediate 0 2 4)                  at (unknown file):5:12
  32    (u64<? 2 7)                             at (unknown file):6:19
  33    (jnl 5)                                 ;; -> L2
  34    (f32-ref 2 5 2)
  35    (fadd 3 3 2)                            at (unknown file):6:12
  36    (mov 2 0)                               at (unknown file):5:8
  37    (j -10)                                 ;; -> L1
The first thing to note is that unlike the "before" code, there's no instruction in this loop that can throw an exception. Neat.
Next, note that there's no type check on the bytevector; the peeled iteration preceding the loop already proved that the bytevector is a bytevector.
And indeed there's no reference to the bytevector at all in the loop! The value being dereferenced in (f32-ref 2 5 2) is a raw pointer. (Read this instruction as, "sp[2] = *(float*)((uint8_t*)sp[5] + sp[2])".) The compiler does something interesting here: the f32-ref CPS primcall actually takes three arguments -- the garbage-collected object protecting the pointer, the pointer itself, and the offset. The object itself doesn't appear in the residual code, but including it in the f32-ref primcall's inputs keeps it alive as long as the f32-ref itself is alive.
Then there are the limitations. Firstly, instruction 28 tags the u64 loop index as a fixnum, but never uses the result. Why is this here? Sadly it's because the value is used in the bailout at L2. Recall this pseudocode:
(unless (and (<= 4 len)
             (<= idx (- len 4)))
  (error "out of range" idx))
Here the error ends up lowering to a throw CPS term that the compiler recognizes as a bailout and renders out-of-line; cool. But it uses idx as an argument, as a tagged SCM value. The compiler untags the loop index, but has to keep a tagged version around for the error cases.
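To picture what instruction 28 computes: fixnum tagging in immediate-integer schemes generally looks like the sketch below. The 2-bit tag value is an illustrative assumption, not a claim about Guile's exact bit layout.

```python
FIXNUM_TAG = 2  # illustrative 2-bit immediate tag, an assumption

def tag_fixnum(n):
    # What an instruction like (tag-fixnum dst src) computes: shift
    # the raw integer left and set the low tag bits, producing a
    # full tagged SCM value.
    return (n << 2) | FIXNUM_TAG

def untag_fixnum(scm):
    # Inverse operation: strip the tag to recover the raw integer.
    return scm >> 2
```

The loop keeps the raw untagged u64 index for its arithmetic and comparisons, and materializes the tagged version only because the out-of-line bailout needs a full SCM value to report.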
The right fix is probably some kind of allocation sinking pass that sinks the tag-fixnum to the bailouts. Oh well.
Additionally, there are two tests in the loop. Are both necessary? Turns out, yes :( Imagine you have a bytevector of length 1025. The loop continues until the last ref at offset 1024, which is within bounds of the bytevector, but only one byte is available at that offset rather than the four that an f32 load needs, so we have to throw an exception there. The compiler did as good a job as we could expect it to do.
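The length-1025 case can be checked directly with a sketch of the two tests; the names loop_test and bounds_test are mine, purely for illustration:

```python
# The two tests from the loop, for a bytevector of `length` bytes and
# a 4-byte f32 reference at byte offset idx.

def loop_test(idx, length):
    # The s64<? loop-continue test: keep going while idx < length.
    return idx < length

def bounds_test(idx, length):
    # The u64<? bounds test against len - 4: are four bytes available?
    return length >= 4 and idx <= length - 4
```

For idx 1024 and length 1025, loop_test passes but bounds_test fails: the loop-continue check alone cannot rule out a partial word at the end, so both tests must stay.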
is it worth it? where to now?
On the one hand, instruction explosion is a step sideways. The code is more optimal, but it's more instructions. Because Guile currently has a bytecode VM, that means more total interpreter overhead. Testing on a 40-megabyte bytevector of 32-bit floats, the exploded f32v-sum completes in 115 milliseconds compared to around 97 for the earlier version.
On the other hand, it is very easy to imagine how to compile these instructions to native code, either ahead-of-time or via a simple template JIT. You practically just have to look up the instructions in the corresponding ISA reference, is all. The result should perform quite well.
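The template-JIT idea can be sketched in Python, purely as an analogy: a real template JIT pastes pre-assembled native-code fragments, whereas this sketch pastes Python source snippets, and the opcode names and templates are simplified stand-ins.

```python
# Each VM opcode maps to a fixed code template over a flat slot array
# sp; "compiling" a bytecode sequence just pastes the filled-in
# templates together, with no register allocation, mirroring how a
# template JIT concatenates a canned native-code fragment per opcode.
TEMPLATES = {
    "uadd/immediate": "sp[{0}] = sp[{1}] + {2}",
    "fadd":           "sp[{0}] = sp[{1}] + sp[{2}]",
    "mov":            "sp[{0}] = sp[{1}]",
}

def template_compile(ops):
    body = "\n".join("    " + TEMPLATES[op].format(*args)
                     for (op, *args) in ops)
    src = "def compiled(sp):\n%s\n    return sp" % body
    namespace = {}
    exec(src, namespace)  # stands in for emitting native code
    return namespace["compiled"]
```

For instance, template_compile([("mov", 0, 1), ("uadd/immediate", 0, 0, 4)]) yields a function that, given sp = [0, 10], leaves sp[0] == 14: each opcode became one line of straight-line code, with no dispatch overhead in the loop.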
I will probably take a whack at a simple template JIT first that does no register allocation, then ahead-of-time compilation with register allocation. Getting the AOT-compiled artifacts to dynamically link with runtime routines is a sufficient pain in my mind that I will put it off a bit until later. I also need to figure out a good strategy for truly polymorphic operations like general integer addition; probably involving inline caches.
So that's where we're at :) Thanks for reading, and happy hacking in Guile in 2018!
January 16, 2018
January 15, 2018
An annotated digest of the top "Hacker" "News" posts for the second week of January, 2018.
January 08, 2018 (comments)
A journalist has run out of things to talk about, and so resorts to whining about smartphones. Hackernews gets engrossed in a discussion of the finer points of introducing their children to Apple products without letting them learn how to fend for themselves. Other Hackernews are angry that this newspaper article is not a peer-reviewed academic paper with a citation list. Most of the rest of the comments are from people who regard any public expression of worry as a direct act of psychological warfare on the reader.
January 09, 2018 (comments)
Some Internets are angry that Google runs the web. The few Hackernews with the temerity to disagree with Google are buried in an avalanche of straw men stuffed with red herrings. Several dozen Hackernews try to imagine what search engine optimization should look like, without ever stopping to question why "search engine optimization" is a thing that the web would ever need in the first place. The consensus seems to be that AMP is necessary because without it people won't do what Google wants, which is clearly an unsustainable position for humanity to take.
January 10, 2018 (comments)
The police are not required to enforce webshit terms of service. Hackernews is glad to have the cops off their tails, but can't stop for a sigh of relief: they're wasting all their breath bickering about commonly-used legal terms in an attempt to justify software piracy.
January 11, 2018 (comments)
Microsoft pays some nerds. Hackernews, unwilling to settle for reinventing chat programs from first principles, reinvents the chat program industry from first principles. Some Hackernews wrestle with whether Signal is the encrypted comms tool that will save the world, or whether it's just another NSA front. A sidebar is held to bitch about Skype's user interface.
January 12, 2018 (comments)
A graduate student in the Rust Evangelism Strike Force is given enough rope. Hackernews spends some time shopping for light-blinking accoutrements and then questions whether it's even possible to write software that is not just a reimplementation of existing software. A subgroup of these decide the fundamental purpose of an operating system is to render webshit. A roll call is held for every operating system Hackernews has ever heard of.
January 13, 2018 (comments)
An Internet writes about coping with the death of a family member, concluding with some sage advice: "You should follow me on Twitter". Hackernews discards this instruction and instead pastes quotes about the deceased from every other Twitter account they can find. The rest of the comments are people speaking well of the dead and other people arguing about whose fault the death was.
January 14, 2018 (comments)
A webshit is mad because a spam company isn't following instructions. Hackernews is familiar with the spam company because they have all accidentally given it money. As usual with "I'm too stupid to direct the flow of my own capital" discussions on Hackernews, the thread devolves into competitive bank-shilling. The rest of the comment threads are the old Hackernews standards: "this website is dying because I don't like it," "this is a user interface problem and not a fundamental design flaw," "I am smarter than everyone I know," and the inevitable "I am better than you because I don't use this service."
January 13, 2018
El Jarabe - Son de Madera
I am the downpour amid the mist, bright gardener of the divine flower, water of the spring that never ends
Allow me to interrupt your musical boredom: this is one of the most badass sones :) It was composed by Ramón Gutierrez, who performs and sings it along with Jose Tereso Vega Hernandez on jarana tercera and Natalia Arroyo Rodríguez on violin. The recording comes from a program on Canal 11, Mexico's public television network.
January 12, 2018
January 11, 2018
I remember in 2008 seeing Gerald Sussman, co-creator of the Scheme language, resignedly describing a sea change in the MIT computer science curriculum. In response to a question from the audience, he said:
The work of engineers used to be about taking small parts that they understood entirely and using simple techniques to compose them into larger things that do what they want.
But programming now isn't so much like that. Nowadays you muck around with incomprehensible or nonexistent man pages for software you don't know who wrote. You have to do basic science on your libraries to see how they work, trying out different inputs and seeing how the code reacts. This is a fundamentally different job.
Like many I was profoundly saddened by this analysis. I want to believe in constructive correctness, in math and in proofs. And so with the rise of functional programming, I thought that this historical slide from reason towards observation was just that, historical, and that the "safe" languages had a compelling value that would be evident eventually: that "another world is possible".
In particular I found solace in "langsec", an approach to assessing and ensuring system security in terms of constructively correct programs. One obvious application is parsing of untrusted input, and indeed the langsec.org website appears to emphasize this domain as one in which a programming languages approach can be fruitful. It is, after all, a truth universally acknowledged, that a program with good use of data types, will be free from many common bugs. So far so good, and so far so successful.
The basis of language security is starting from a programming language with a well-defined, easy-to-understand semantics. From there you can prove (formally or informally) interesting security properties about particular programs. For example, if a program has a secret k, but some untrusted subcomponent C of it should not have access to k, one can prove whether or not k can leak to C. This approach is taken, for example, by Google's Caja compiler to isolate components from each other, even when they run in the context of the same web page.
What's worse, we need to do basic science to come up with adequate mitigations to the Spectre vulnerabilities (side-channel exfiltration of the results of speculative execution). Retpolines, poisons and masks, et cetera: none of these are proven to work. They are simply observed to be effective on current hardware. Indeed mitigations are anathema to the correctness-by-construction approach: if you can prove that a problem doesn't exist, what is there to mitigate?
Spectre is not the first crack in the edifice of practical program correctness. In particular, timing side channels are rarely captured in language semantics. But I think it's fair to say that Spectre is the most devastating vulnerability in the langsec approach to security that has ever been uncovered.
Where do we go from here? I see but two options. One is to attempt to make the machines targeted by secure language implementations behave rigorously as architecturally specified, and in no other way. This is the approach taken by all of the deployed mitigations (retpolines, poisoned pointers, masked accesses): modify the compiler and runtime to prevent the CPU from speculating through vulnerable indirect branches (prevent speculative execution), or from using fetched values in further speculative fetches (prevent this particular side channel). I think we are missing a model and a proof that these mitigations restore the target architectural semantics, though.
However if we did have a model of what a CPU does, we would have another opportunity, which is to incorporate that model in the semantics of the target language of a compiler (e.g. micro-x86 versus x86). It could be that this model produces a co-evolution of the target architectures as well, whereby Intel decides to disclose and expose more of its microarchitecture to user code. Caching and other microarchitectural side effects would then become explicit rather than transparent.
Rich Hickey has this thing where he talks about "simple versus easy". Both of them sound good, but for him only "simple" is good, whereas "easy" is bad. It's the sort of subjective distinction that can lead to an endless string of Worse Is Better Is Worse Bourbaki papers, according to the perspective of the author. Anyway, transparent caching in the CPU has been marvelously easy for most application developers and fantastically beneficial from a performance perspective. People needing constant-time operations have complained, of course, but that kind of person always complains. Could it be, though, that actually there is some other, better-is-better kind of simplicity that should replace the all-pervasive, now-treacherous transparent caching?
I don't know. All I will say is that an ad-hoc approach to determining which branches and loads are safe and which are not is not a plan that inspires confidence. Godspeed to the langsec faithful in these dark times.